Stats and Probability Flashcards
(36 cards)
What is data that can be named (labels or categories with no natural order) called?
Categorical Nominal.
What is data that can be put in a natural order called?
Categorical Ordinal.
What is data that can only take separate, countable values at set intervals (e.g. whole-number counts) called?
Discrete Numerical.
What is data that can be measured to any degree of accuracy called?
Continuous Numerical.
What is Systematic Sampling?
Each member of the population is numbered, and a fixed rule is used to draw the sample, e.g. every k-th member after a randomly chosen starting point.
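A minimal R sketch of systematic sampling, assuming a hypothetical population numbered 1 to 100 and a sample size of 10. The interval k is N / n, and only the starting point is random.

```r
set.seed(1)                       # make the random start reproducible
population <- 1:100               # hypothetical population, each member numbered 1..N
n <- 10                           # hypothetical sample size
k <- length(population) %/% n     # sampling interval: every k-th member (here 10)
start <- sample(1:k, 1)           # random starting point between 1 and k
sys_sample <- population[seq(from = start, by = k, length.out = n)]
sys_sample                        # 10 members, each 10 apart
```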
What is Stratified Sampling?
The population is divided into sub-groups (strata) based on known characteristics, and a random sample is drawn from each sub-group, often in proportion to its size.
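A minimal R sketch of stratified sampling with proportional allocation, assuming a hypothetical population of 100 people split into two strata ("A" with 60 members, "B" with 40). Taking 10% from each stratum keeps the sample's group mix the same as the population's.

```r
set.seed(2)
# Hypothetical population: 60 members in stratum "A", 40 in stratum "B"
pop <- data.frame(id = 1:100, group = rep(c("A", "B"), times = c(60, 40)))
frac <- 0.1                              # sample 10% of each stratum
strata <- split(pop, pop$group)          # one data frame per stratum
strat_sample <- do.call(rbind, lapply(strata, function(s) {
  # simple random sample within this stratum
  s[sample(nrow(s), size = round(frac * nrow(s))), ]
}))
table(strat_sample$group)                # A: 6, B: 4
```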
What is Cluster Sampling?
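The population is already organized into groups or areas (clusters); some clusters are chosen at random, and all (or a random sample) of the members of those clusters are included.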
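A minimal R sketch of one-stage cluster sampling, assuming a hypothetical population already grouped into 10 areas of 20 members each. Whole clusters are picked at random, rather than individuals.

```r
set.seed(3)
# Hypothetical population already grouped into 10 areas (clusters) of 20 members
pop <- data.frame(id = 1:200, area = rep(paste0("Area", 1:10), each = 20))
chosen <- sample(unique(pop$area), size = 3)    # randomly pick 3 whole clusters
cluster_sample <- pop[pop$area %in% chosen, ]   # keep every member of those clusters
nrow(cluster_sample)                            # 60: three clusters of 20
```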
What is Non-Random Sampling?
Members are not chosen by chance. May include advertising for volunteers (a self-selected sample) or using an opportunity sample (whoever happens to be available).
How to load a dataset on RStudio -
Session > Set Working Directory > Choose Directory, then read the file, e.g. name <- read.csv("file.csv").
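A minimal R sketch of loading a CSV file, assuming a hypothetical file "mydata.csv" that the code first writes to a temporary folder so the example is self-contained. In RStudio, the Session > Set Working Directory menu runs setwd() on the chosen folder for you.

```r
folder <- tempdir()   # hypothetical folder standing in for your own directory
write.csv(data.frame(X = c("a", "b", "a"), Y = c(1.5, 2.0, 3.2)),
          file.path(folder, "mydata.csv"), row.names = FALSE)
setwd(folder)                       # same effect as Session > Set Working Directory
mydata <- read.csv("mydata.csv")    # file name is relative to the working directory
str(mydata)                         # check column names and types
```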
How to calculate proportions -
prop.table(table(X))
(to display this as a percentage: prop.table(table(X)) * 100)
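A short R sketch with hypothetical categorical data: table() counts each category, prop.table() divides each count by the total, and multiplying by 100 gives percentages.

```r
X <- c("red", "blue", "red", "green", "red", "blue")  # hypothetical categorical data
counts <- table(X)                 # frequencies: blue 2, green 1, red 3
props <- prop.table(counts)        # each count divided by the total (6)
percent <- round(props * 100, 1)   # as percentages, rounded to 1 decimal place
percent                            # blue 33.3, green 16.7, red 50.0
```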
What to do when you are finished with a dataset -
detach(name) (removes a dataset that was added with attach(name) from the search path)
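A minimal R sketch of the attach/detach cycle, assuming a small hypothetical data frame called mydata. While attached, its columns can be used by name; detach() removes it again.

```r
mydata <- data.frame(height = c(150, 162, 171), weight = c(50, 61, 70))  # hypothetical
attach(mydata)     # columns can now be used by name
mean(height)       # 161
detach(mydata)     # finished: remove mydata from the search path
"mydata" %in% search()   # FALSE once detached
```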
How to produce a pie chart for (categorical) data -
pie(table(X), main="Title")
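A short R sketch with hypothetical travel-mode data. The pdf(NULL) line is an assumption for running outside RStudio: it opens an off-screen device so nothing is saved; in RStudio you can leave it and dev.off() out to see the chart in the Plots pane.

```r
X <- c("Bus", "Car", "Walk", "Car", "Bus", "Car")   # hypothetical categorical data
pdf(NULL)                                   # off-screen device; omit in RStudio
pie(table(X), main = "How students travel") # one slice per category, sized by count
dev.off()
```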
How to produce a bar chart for (categorical) data -
barplot(table(X), main="Title", xlab="Categories", ylab="Frequency")
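A short R sketch with hypothetical category labels, using the same call as the card. barplot() also returns the bar midpoints, which is handy if you later want to add text labels above the bars; pdf(NULL) is again only an off-screen device for non-interactive runs.

```r
X <- c("A", "B", "A", "C", "A", "B")   # hypothetical categorical data
pdf(NULL)                              # off-screen device; omit in RStudio
mids <- barplot(table(X), main = "Title", xlab = "Categories", ylab = "Frequency")
dev.off()
```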
How to produce a scatter plot for (numerical) data -
plot(X, Y, main="X and Y") (X goes on the bottom axis, Y on the side axis)
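A short R sketch with hypothetical paired data: the first argument to plot() is drawn on the bottom (x) axis and the second on the side (y) axis. pdf(NULL) opens an off-screen device for non-interactive runs.

```r
X <- c(1, 2, 3, 4, 5)               # hypothetical variable for the bottom axis
Y <- c(2.1, 3.9, 6.2, 7.8, 10.1)    # hypothetical variable for the side axis
pdf(NULL)                           # off-screen device; omit in RStudio
plot(X, Y, main = "X and Y", xlab = "X", ylab = "Y")
dev.off()
```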
How to produce a histogram for (numerical) data -
hist(X)
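A short R sketch with hypothetical numerical data. Besides drawing the chart, hist() returns the bin edges ($breaks) and the number of values in each bin ($counts), which is useful for checking what the bars show.

```r
X <- c(2, 3, 3, 4, 4, 4, 5, 5, 6, 9)   # hypothetical numerical data
pdf(NULL)                              # off-screen device; omit in RStudio
h <- hist(X, main = "Histogram of X", xlab = "X")
dev.off()
h$breaks   # bin edges chosen by R
h$counts   # how many values fall in each bin
```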
How to produce a box plot for (numerical) data -
boxplot(Y, X, main="Y and X", horizontal=TRUE, names=c("Y", "X"))
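A short R sketch using the card's call on two hypothetical variables. boxplot() invisibly returns the numbers it drew: $stats holds the lower whisker, Q1 (lower hinge), median, Q3 (upper hinge) and upper whisker for each variable, and $out lists the points plotted as outliers.

```r
Y <- c(12, 15, 14, 10, 18, 20, 13)   # hypothetical variable Y
X <- c(22, 25, 19, 30, 27, 24, 45)   # hypothetical variable X (45 sits far above the rest)
pdf(NULL)                            # off-screen device; omit in RStudio
bp <- boxplot(Y, X, main = "Y and X", horizontal = TRUE, names = c("Y", "X"))
dev.off()
bp$stats   # 5 rows (whisker, Q1, median, Q3, whisker), one column per variable
bp$out     # 45 is drawn as an outlier
```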
What does a box plot tell us?
- The box shows the interquartile range (IQR) – the middle 50% of the data (from Q1 to Q3).
- The line inside the box shows the median.
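- The whiskers extend from the box to the smallest and largest values that are not outliers; in R, points more than 1.5 × the box length (roughly the IQR) beyond the box are drawn separately as outliers.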
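A short R sketch of the numbers behind a box plot, using hypothetical data with one high value. quantile() and IQR() use R's default quantile method, while boxplot() itself uses Tukey's hinges via boxplot.stats(), so the two can differ slightly; both flag the same outlier here.

```r
x <- c(3, 5, 6, 7, 8, 9, 10, 30)        # hypothetical data with one high value
q <- quantile(x, c(0.25, 0.5, 0.75))    # Q1, median, Q3
iqr <- IQR(x)                           # Q3 - Q1: the width of the box
upper_fence <- q[[3]] + 1.5 * iqr       # values above this count as outliers
lower_fence <- q[[1]] - 1.5 * iqr       # values below this count as outliers
x[x > upper_fence | x < lower_fence]    # 30
boxplot.stats(x)$out                    # what boxplot() would draw as an outlier: 30
```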
Data distribution skewed to the right -
Positive Skew - Most of the data values are clustered toward the lower end (left side), and a few larger values (high outliers) pull the tail to the right, so the mean is usually greater than the median.
Data distribution skewed to the left -
Negative Skew - Most of the data values are clustered toward the higher end (right side), and a few smaller values (low outliers) pull the tail to the left, so the mean is usually less than the median.
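A short R sketch for checking the direction of skew, using hypothetical right-skewed data. It compares the mean with the median and computes a moment-based sample skewness with a hypothetical helper, skewness(), written here (not taken from a package): a positive value suggests right skew, a negative value left skew.

```r
# Hypothetical right-skewed data: most values small, one large value
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 15)
mean(x)     # 4.2, pulled up by the large value
median(x)   # 3
# Hypothetical helper: moment-based sample skewness
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5
skewness(x) > 0    # TRUE: positive (right) skew
skewness(-x) < 0   # TRUE: mirroring the data gives negative (left) skew
```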
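Correlation Coefficient Test -
cor.test(X, Y) (by default reports Pearson's correlation coefficient r, a p-value, and a 95% confidence interval)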
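A short R sketch with hypothetical, roughly linear data, showing how to pull the main results out of cor.test() rather than reading them off the printed summary.

```r
X <- c(1, 2, 3, 4, 5, 6, 7, 8)                    # hypothetical data
Y <- c(2.0, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1)    # hypothetical, roughly linear in X
result <- cor.test(X, Y)   # Pearson's r by default
result$estimate            # the correlation coefficient r
result$p.value             # p-value for H0: the true correlation is 0
result$conf.int            # 95% confidence interval for r
```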
Strong Negative Linear Relationship -
A correlation coefficient between -0.75 and -1 is a strong negative relationship.
Strong Positive Linear Relationship -
A correlation coefficient between 0.75 and 1 is a strong positive relationship.
Moderate Negative Linear Relationship -
Values between -0.3 and -0.75 are considered a moderate negative relationship.
Moderate Positive Linear Relationship -
Values between 0.3 and 0.75 are considered a moderate positive relationship.
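A hypothetical R helper, describe_r(), that applies the cut-offs on these cards to a correlation coefficient. Two assumptions are not stated on the cards: a value of exactly ±0.75 is treated as strong, and anything with |r| below 0.3 is labelled "weak or no linear relationship".

```r
# Hypothetical helper using the 0.3 and 0.75 cut-offs from the cards
describe_r <- function(r) {
  if (abs(r) < 0.3) return("weak or no linear relationship")  # assumption: below the cards' range
  strength <- if (abs(r) >= 0.75) "strong" else "moderate"    # assumption: 0.75 counts as strong
  direction <- if (r > 0) "positive" else "negative"
  paste(strength, direction, "linear relationship")
}
describe_r(-0.82)   # "strong negative linear relationship"
describe_r(0.5)     # "moderate positive linear relationship"
```

It can be combined with the earlier card, e.g. describe_r(cor(X, Y)).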