R Details Flashcards
(104 cards)
How can you join vectors together?
Use c() with the names of the vectors you want to join, e.g. for girls and boys:
> children =c(girls,boys)
How do you check the length of a vector?
> length(vectorname)
When joining vectors, what is key to remember?
Don’t put + signs between them, only commas. A + sign adds the elements together instead of joining the vectors.
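A minimal sketch of the difference between joining and adding. The values for girls and boys are made-up illustrations, not data from these cards:

```r
# Hypothetical example data (assumed values)
girls <- c(2, 1, 3)
boys  <- c(0, 2, 1)

children <- c(girls, boys)  # joins the two vectors: 6 elements
added    <- girls + boys    # element-wise addition: still 3 elements

length(children)
length(added)
```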
How to extract particular elements/numbers from a vector / data set?
> nameofvector[1]
The square brackets tell R which position in the vector you want to be shown
A range of elements is written:
> nameofvector[1:7]
How do you see a vector without certain elements?
> nameofvector[-1]
Shows the vector minus its first element
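The three indexing forms can be tried side by side. This is a small sketch on an assumed example vector x:

```r
# Hypothetical example vector (assumed values)
x <- c(10, 20, 30, 40, 50, 60, 70, 80)

first  <- x[1]    # the first element
range7 <- x[1:7]  # elements 1 to 7
rest   <- x[-1]   # everything except the first element

first
range7
rest
```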
Maximum value of the vector
> max(vectorname)
How to work out if any elements of the vector match a given number
> which(vectorname==7)
Will give you the position of those values that match
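A short sketch of max() and which() together, on an assumed example vector. Note that which() returns positions, not the values themselves:

```r
# Hypothetical example vector (assumed values)
x <- c(3, 7, 1, 7, 9)

biggest <- max(x)         # largest value in the vector
hits    <- which(x == 7)  # positions where the value equals 7

biggest
hits
```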
Change the name of a vector
> Vector = nameofvector
This copies the contents of nameofvector to the new name Vector; the old name still exists
How to calculate the sum of all elements
> sum(vector)
Mean of elements
> mean(vector)
Median of elements
> median(vector)
Variance of elements
> var(vector)
Standard deviation
> std = function(x) sqrt(var(x))
> std(vector)
This teaches R a custom function for the standard deviation; the built-in sd(vector) gives the same result
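Both var() and sd() in R use the sample (n - 1) denominator. A minimal sketch, on assumed example values, checking that the hand-written function agrees with the built-in sd():

```r
# Hypothetical example vector (assumed values)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Hand-written standard deviation: square root of the sample variance
std <- function(v) sqrt(var(v))

custom  <- std(x)
builtin <- sd(x)

isTRUE(all.equal(custom, builtin))
```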
Normality test example
Shapiro-Wilk test
When should you use the Shapiro-Wilk test?
To test the null hypothesis: the data are drawn from a normal population
The p-value is the probability of getting data at least this far from normal if the null hypothesis were true (it is not the probability that the data are normal)
A low p-value, below 0.05 (5%), allows us to reject the null hypothesis - this is evidence that the data are not normal
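In R the test is run with shapiro.test(). The sketch below uses simulated samples (an assumption for illustration only) and pulls out the p-value to compare against 0.05:

```r
set.seed(1)  # make the simulated data reproducible

# Simulated example data (assumed, for illustration only)
normal_sample <- rnorm(50)  # drawn from a normal distribution
skewed_sample <- rexp(50)   # drawn from a skewed (exponential) distribution

res_normal <- shapiro.test(normal_sample)
res_skewed <- shapiro.test(skewed_sample)

# Reject the null hypothesis of normality when the p-value is below 0.05
res_normal$p.value
res_skewed$p.value
```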
What do you do if your data is not Normal?
Calculate a non-parametric measure of data spread, e.g. the interquartile range
> IQR(vector)
Or
Median absolute deviation (MAD) - this finds the median of the absolute differences from the median and then multiplies by a constant (1.4826), which makes it comparable with the standard deviation
> mad(vector)
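A small sketch of both measures on assumed values with one extreme outlier. It also recomputes the MAD by hand to show where the constant 1.4826 enters:

```r
# Hypothetical example vector with one outlier (assumed values)
x <- c(1, 2, 3, 4, 100)

iqr_value <- IQR(x)  # 3rd quartile minus 1st quartile
mad_value <- mad(x)  # scaled median absolute deviation

# Manual version: median of |x - median(x)|, times 1.4826
manual_mad <- 1.4826 * median(abs(x - median(x)))

iqr_value
mad_value
manual_mad
```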
What is the code for summary and what does it show you?
> summary(vector)
Reports:
Minimum
Maximum
Median
Mean
1st quartile
3rd quartile
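A quick sketch on assumed values. R prints the six statistics in the order minimum, 1st quartile, median, mean, 3rd quartile, maximum:

```r
# Hypothetical example vector (assumed values)
x <- c(1, 2, 3, 4, 100)

s <- summary(x)
s
names(s)  # the labels R uses for the six statistics
```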
How do you graphically show that random data is approx normal?
“Normal probability plot”
Any curving will show that the distribution has short or long tails
The line is drawn through points formed by the 1st and 3rd quartiles
> qqnorm(vector, main="normal (0,1)")
> qqline(vector)
What does a data transformation do?
Attempts to make the data approximately normal so that parametric stats can be applied
If the data can't be transformed to normality, non-parametric stats have to be used
Common data transformation process
Logarithm of the data - log(x+1)
>qqnorm(log(vector+1))
>qqline(log(vector+1))
Test if it worked with a normality test
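A hedged sketch of the transform-then-test step on simulated right-skewed data (an assumption for illustration). Whether the transformation actually helps depends on the data, so the p-values should be inspected rather than assumed:

```r
set.seed(42)  # make the simulated data reproducible

# Simulated right-skewed example data (assumed, for illustration only)
raw <- rlnorm(100)

# Log transformation; the +1 keeps zero values from producing -Inf
transformed <- log(raw + 1)

# Compare the normality test before and after the transformation
p_raw         <- shapiro.test(raw)$p.value
p_transformed <- shapiro.test(transformed)$p.value

p_raw
p_transformed
```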
Bar charts in R
> barplot(vector)
How to generate a more informative barplot
> table(vector)
> barplot(table(vector))
How to change the scale on a barplot
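> barplot(table(vector)/length(vector))
Dividing the counts by the number of observations plots proportions (relative frequencies) instead of raw counts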
How to add labels to a barplot
> labels = as.vector(c("one", "two", "three"))
> barplot(table(vector)/length(vector), names.arg=labels, xlab="Number of children", ylab="relative frequency")
names.arg sets the label under each bar; xlab and ylab set the labels for the x axis and y axis
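Putting the barplot steps together as one sketch. The children vector is made-up example data, and the number of labels has to match the number of distinct values in it:

```r
# Hypothetical example data: number of children per family (assumed values)
children <- c(1, 2, 3, 1, 2, 2, 3, 1, 1, 2)

counts   <- table(children)           # how often each value occurs
rel_freq <- counts / length(children) # convert counts to proportions

labels <- c("one", "two", "three")    # one label per bar

barplot(rel_freq,
        names.arg = labels,
        xlab = "Number of children",
        ylab = "Relative frequency")
```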