Stats and Probability Flashcards

(36 cards)

1
Q

What is data that can only be named (sorted into categories with no natural order) called?

A

Categorical Nominal.

2
Q

What is categorical data that can be put in a meaningful order called?

A

Categorical Ordinal.

3
Q

What is data that can only take distinct, separate values (increasing in set steps, such as counts) called?

A

Discrete Numerical.

4
Q

What is data that can be measured to any degree of accuracy called?

A

Continuous Numerical.

5
Q

What is Systematic Sampling?

A

Each member of the population is numbered, and a fixed rule is used to draw the sample, for example taking every 10th member after a random starting point.
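
A minimal sketch of the idea in R, assuming a hypothetical population of 100 numbered members and an interval of 10 (neither figure comes from the card):

```r
# Hypothetical population: members numbered 1 to 100
population <- 1:100
k <- 10                       # sampling interval: take every 10th member

set.seed(1)                   # makes the random start reproducible
start <- sample(1:k, 1)       # random starting point between 1 and k

# Take the member at 'start', then every k-th member after it
systematic_sample <- population[seq(from = start, to = length(population), by = k)]
systematic_sample
```

Whatever the random start is, the sample contains 10 members spaced exactly 10 apart.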

6
Q

What is Stratified Sampling?

A

The population is divided into sub-groups (strata) based on known characteristics, and a random sample is drawn from each sub-group.
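
One way this could look in R, assuming a hypothetical population with two sub-groups ("A" and "B") and a 10% sample from each. The data frame and the sampling fraction are illustrative only:

```r
# Hypothetical population: 60 members in group "A" and 40 in group "B"
population <- data.frame(
  id    = 1:100,
  group = rep(c("A", "B"), times = c(60, 40))
)

set.seed(1)                                  # reproducible draw
strata <- split(population$id, population$group)

# Draw 10% of each sub-group at random (6 from A, 4 from B)
sampled_ids <- unlist(lapply(strata, function(ids) {
  sample(ids, size = length(ids) %/% 10)
}))

stratified_sample <- population[population$id %in% sampled_ids, ]
table(stratified_sample$group)
```

Sampling the same fraction from each sub-group keeps the sample's make-up in line with the population's.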

7
Q

What is Cluster Sampling?

A

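Used where the population is already organized into groups or areas (clusters). Some clusters are chosen at random, and the members of the chosen clusters are sampled.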

8
Q

What is Non-Random Sampling?

A

Sampling where members are not chosen at random. It may include advertising for volunteers or using an opportunity sample (whoever is conveniently available).

9
Q

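How to load a dataset in RStudio -

A

Session > Set Working Directory > Choose Directory. This only points R at the folder that holds the file; the dataset itself is then read in with a function such as read.csv().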
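
A self-contained sketch of the read step, assuming the dataset is a CSV file. The file name survey.csv and its columns are hypothetical, and the file is written to a temporary folder here only so the example runs anywhere:

```r
# Create a small hypothetical CSV file so the example is self-contained
path <- file.path(tempdir(), "survey.csv")
write.csv(
  data.frame(colour = c("red", "blue", "red"), height = c(1.6, 1.7, 1.8)),
  path,
  row.names = FALSE
)

survey <- read.csv(path)   # read the file into a data frame
attach(survey)             # lets columns be used by name, e.g. height
mean(height)
detach(survey)             # tidy up when finished with the dataset
```

With the working directory already set through the menu, read.csv("survey.csv") would be enough, since R looks for the file in that folder.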

10
Q

How to calculate proportions-

A

prop.table(table(X))
(to display this as a percentage, multiply by 100: prop.table(table(X)) * 100)
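
A short worked example, using a hypothetical vector of colours as the categorical variable X:

```r
# Hypothetical categorical data
X <- c("red", "blue", "red", "green", "red", "blue", "red", "green")

counts      <- table(X)             # frequency of each category
proportions <- prop.table(counts)   # each count divided by the total
percentages <- proportions * 100    # the same values as percentages

round(percentages, 1)
```

Here "red" appears 4 times out of 8, so its proportion is 0.5 (50%), and the proportions always add up to 1.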

10
Q

What to do when you are finished with a dataset-

A

detach(name), where name is the dataset that was attached earlier with attach(name).

11
Q

How to produce a pie chart for (categorical) data -

A

pie(table(X), main="Title")

12
Q

How to produce a bar chart for (categorical) data -

A

barplot(table(X), main="Title", xlab="Categories", ylab="Frequency")

13
Q

How to produce scatter plot for (numerical) data -

A

plot(X, Y, main="X and Y")
(the first variable goes on the bottom axis, the second on the side axis)

14
Q

How to produce histogram for (numerical) data -

A

hist(X)
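
As an illustration, hist() also returns the bins it used, which helps when reading the chart. The data and the bin width of 2 below are hypothetical choices:

```r
# Hypothetical numerical data
X <- c(2, 3, 3, 4, 5, 5, 5, 6, 7, 9)

# Draw the histogram with bins of width 2 and keep the returned object
h <- hist(X, breaks = seq(0, 10, by = 2), main = "Histogram of X", xlab = "X")

h$breaks   # bin edges: 0, 2, 4, 6, 8, 10
h$counts   # number of values falling in each bin
```

By default each bin includes its upper edge, so the value 4 is counted in the 2 to 4 bin rather than the 4 to 6 bin.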

15
Q

How to produce box plot for (numerical) data -

A

boxplot(Y, X, main="Y and X", horizontal=TRUE, names=c("Y","X"))

16
Q

What does a box plot tell us?

A
  • The box shows the interquartile range (IQR) – the middle 50% of the data (from Q1 to Q3).
  • The line inside the box shows the median.
  • The whiskers extend from the box to the minimum and maximum values (excluding outliers).
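
The numbers behind a box plot can be inspected with boxplot.stats(). A small sketch with hypothetical data containing one unusually large value:

```r
# Hypothetical data with one large value
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

b <- boxplot.stats(X)

# Five values: lower whisker, lower edge of box, median, upper edge of box, upper whisker
b$stats

# Values plotted as separate outlier points
b$out
```

Here the upper whisker stops at 9 and the value 30 is reported as an outlier, because by default R treats points more than 1.5 times the box length beyond the box as outliers.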
17
Q

Data Distribution skewed to the right-

A

Positive Skew - Most of the data values are clustered toward the lower end (left side), and a few larger values (high outliers) pull the tail to the right.

18
Q

Data Distribution skewed to the left-

A

Negative Skew - Most of the data values are clustered toward the higher end (right side), and a few smaller values (low outliers) pull the tail to the left.
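
A rough way to see skew in numbers is to compare the mean with the median: the tail tends to drag the mean towards it. This is a rule of thumb rather than a definition, and the two datasets below are hypothetical:

```r
# Hypothetical data: most values low, one large value (tail to the right)
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 20)

# Hypothetical data: most values high, one small value (tail to the left)
left_skewed <- c(1, 17, 18, 18, 18, 19, 19, 20)

mean(right_skewed) > median(right_skewed)   # mean pulled above the median
mean(left_skewed)  < median(left_skewed)    # mean pulled below the median
```

Plotting either vector with hist() shows the same thing visually as a long tail on one side.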

19
Q

Correlation Coefficient Test-

A

cor.test(X,Y)
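
A brief example of reading the output, with hypothetical X and Y values that rise together:

```r
# Hypothetical paired numerical data
X <- c(1, 2, 3, 4, 5, 6)
Y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

result <- cor.test(X, Y)   # Pearson correlation by default

result$estimate   # the correlation coefficient
result$p.value    # p-value for the test that the true correlation is 0
```

For this data the coefficient is close to 1, which indicates a strong positive linear relationship.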

20
Q

Strong Negative Linear Relationship -

A

A correlation coefficient between -0.75 and -1 is a strong negative relationship.

21
Q

Strong Positive Linear Relationship -

A

A correlation coefficient between 0.75 and 1 is a strong positive relationship.

22
Q

Moderate Negative Linear Relationship -

A

Values between -0.3 and -0.75 are considered a moderate negative relationship.

23
Q

Moderate Positive Linear Relationship -

A

Values between 0.3 and 0.75 are considered a moderate positive relationship.

24
Q

No Linear Relationship -

A

Values close to zero (between -0.3 and 0.3) indicate that there is little or no linear relationship between variables X and Y.
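
The cut-offs used in this deck can be collected into one small helper. The function name describe_r is hypothetical, and since the deck's ranges meet at exactly 0.3 and 0.75, this sketch assumes those boundary values belong to the stronger category:

```r
# Hypothetical helper: label a correlation coefficient using the deck's cut-offs
describe_r <- function(r) {
  size <- abs(r)
  if (size < 0.3) {
    return("no linear relationship")
  }
  strength  <- if (size >= 0.75) "strong" else "moderate"
  direction <- if (r > 0) "positive" else "negative"
  paste(strength, direction, "linear relationship")
}

describe_r(-0.82)
describe_r(0.45)
describe_r(0.1)
```

These thresholds are a convention for describing results, not fixed statistical rules.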
25
Q

Line of Best Fit -

A

lm(Y~X) gives the intercept and the slope (the coefficient listed under X). Write the line of best fit as Y = (slope × X) + intercept.
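
A small sketch of turning the lm() output into an equation, using hypothetical data:

```r
# Hypothetical numerical data
X <- c(1, 2, 3, 4, 5)
Y <- c(3.1, 4.9, 7.2, 8.8, 11.0)

fit <- lm(Y ~ X)

intercept <- coef(fit)[["(Intercept)"]]   # where the line crosses the Y axis
slope     <- coef(fit)[["X"]]             # change in Y for each 1-unit rise in X

# Build the equation of the line as text
paste0("Y = ", round(slope, 2), " * X + ", round(intercept, 2))
```

For this data the slope is 1.97 and the intercept is 1.09, so the line is Y = 1.97 * X + 1.09.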
26
Q

Applying the Line of Best Fit -

A

abline(lm(Y~X)) draws the line of best fit on an existing scatter plot, so run plot(X, Y) first.
27
Q

Null Hypothesis-

A

The null hypothesis (H₀) is a statement that there is no effect, no difference, or no relationship between variables. It represents the idea you’re testing against.
28
Q

Alternative Hypothesis-

A

The alternative hypothesis (H₁) is a statement that contradicts the null hypothesis (H₀). It proposes that there is a real effect, a difference, or a relationship between variables.
29
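Q

p-value -

A

The p-value is the probability of getting results at least as extreme as those observed if the null hypothesis were true. If the p-value is less than 0.05, we reject the null hypothesis and describe the results as statistically significant.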
30
Q

95% confidence interval -

A

Gives a range of plausible values for the 'true' difference between the means, built so that 95% of intervals calculated this way would contain the true difference. If the interval does not include 0, a difference of 0 is not among the plausible values, so the result is statistically significant at the 5% level.
31
Q

t-test -

A

t.test(X,Y) Used to compare the means of two samples of numerical data that are (approximately) normally distributed.
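
An illustrative run with two hypothetical samples, showing where the p-value and the 95% confidence interval appear in the result:

```r
# Hypothetical numerical samples
X <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7)
Y <- c(6.9, 7.2, 6.8, 7.5, 7.1, 6.7, 7.3, 7.0)

result <- t.test(X, Y)   # two-sample t-test (Welch version by default)

result$p.value    # compare with 0.05
result$conf.int   # 95% confidence interval for mean(X) - mean(Y)
```

In this example the p-value is well below 0.05 and the whole interval lies below 0, so both point to a significant difference, with X lower than Y on average.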
32
Q

Paired t-test -

A

t.test(X, Y, paired=TRUE) Used when there is a meaningful pairing between the two numerical samples, for example the same subjects measured twice.
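
A short sketch with hypothetical before-and-after measurements on the same six subjects:

```r
# Hypothetical paired measurements (same subjects, measured twice)
before <- c(72, 80, 65, 90, 77, 84)
after  <- c(70, 75, 63, 85, 74, 80)

result <- t.test(before, after, paired = TRUE)

result$estimate   # mean of the paired differences (before - after)
result$p.value
```

The test works on the six differences (2, 5, 2, 5, 3, 4), whose mean is 3.5, rather than treating the two columns as unrelated samples.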
33
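Q

z-test -

A

prop.test(x=c(a,b), n=c(n1,n2)) Used to compare proportions of categorical data between two samples, where a and b are the counts of interest and n1 and n2 are the sample sizes. The 'prop 1' and 'prop 2' values in the output are the two sample proportions, which can be compared as percentages.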
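
For illustration, suppose 60 out of 100 people in one sample and 35 out of 100 in another gave a particular answer (hypothetical counts):

```r
# Hypothetical counts: a = 60 of n1 = 100, b = 35 of n2 = 100
counts <- c(60, 35)
totals <- c(100, 100)

result <- prop.test(x = counts, n = totals)

result$estimate         # "prop 1" and "prop 2", the two sample proportions
result$estimate * 100   # the same values as percentages
result$p.value
```

Here 'prop 1' is 0.60 and 'prop 2' is 0.35, and the small p-value suggests the difference between the two percentages is statistically significant.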
34
Q

Type 1 error -

A

Known as a false positive. The null hypothesis is rejected because the results appear to be significant, but in fact there is no real difference.
35
Q

Type 2 error -

A

Known as a false negative. There is a real difference, but the statistical test did not detect it, so the null hypothesis is not rejected.