Stats and Probability Flashcards by Jannatul Rahman

What is data that can be named, but not ordered, called?

Categorical Nominal.

What is data that can be put in an order called?

Categorical Ordinal.

What is data that can only take separate values, increasing at set intervals, called?

Discrete Numerical.

What is data that can be measured to any degree of accuracy called?

Continuous Numerical.

What is Systematic Sampling?

Each member of the population is numbered, and a fixed system is used to draw the sample, for example every 10th member after a random starting point.
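
As a rough illustration, one common system is to take every k-th member after a random start. The sketch below assumes a made-up population of 100 numbered members and an interval of 10; neither value comes from the card.

```r
# Hypothetical population: members numbered 1 to 100
population <- 1:100
k <- 10                      # assumed sampling interval

set.seed(1)                  # only so the random start is repeatable
start <- sample(1:k, 1)      # random starting point between 1 and k

# Take every k-th member from the starting point onwards
systematic_sample <- population[seq(from = start, to = length(population), by = k)]
systematic_sample
```

With these numbers the sample always contains 10 members, spaced 10 apart.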

What is Stratified Sampling?

The population is organized into sub-groups (strata) based on known characteristics, and a random sample is drawn from each sub-group.
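
A minimal sketch of the idea, assuming a made-up population split into two groups "A" and "B" and a 10% sample from each; the data frame and column names are hypothetical.

```r
# Hypothetical population: 60 members in group "A", 40 in group "B"
population <- data.frame(id = 1:100,
                         group = rep(c("A", "B"), times = c(60, 40)))

set.seed(1)                                   # for a repeatable example
strata <- split(population, population$group) # one data frame per sub-group

# Randomly draw 10% of the rows from each sub-group, then recombine
stratified_sample <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), size = round(0.1 * nrow(s))), ]
}))

table(stratified_sample$group)   # A: 6, B: 4
```

Sampling the same fraction from each sub-group keeps the sample in proportion to the population.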

What is Cluster Sampling?

The population is already organized into groups or areas (clusters), and whole clusters are chosen at random to form the sample.
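
One common form of cluster sampling picks whole groups at random and includes everyone in them. The sketch below assumes a made-up population of 20 classes with 5 students each; the names are hypothetical.

```r
# Hypothetical population: 20 classes (clusters) of 5 students each
population <- data.frame(student = 1:100,
                         class = rep(1:20, each = 5))

set.seed(1)                                            # for a repeatable example
chosen_classes <- sample(unique(population$class), 4)  # pick 4 whole classes

# Keep every student who belongs to a chosen class
cluster_sample <- population[population$class %in% chosen_classes, ]
nrow(cluster_sample)   # 20 students, from 4 classes
```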

What is Non-Random Sampling?

Members are not chosen by chance. This may include advertising for volunteers or using an opportunity sample.

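How to load a dataset in RStudio -

Session > Set Working Directory > Choose Directory. This only tells R which folder to look in; the file itself is then read from that folder, for example with read.csv().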
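
The menu step has a code equivalent, setwd(). The sketch below is one possible workflow, assuming the data is a CSV file; the file name example.csv and its columns are made up, and a temporary folder is used so the example is self-contained.

```r
# Build a small, made-up CSV so the sketch is self-contained
csv_path <- file.path(tempdir(), "example.csv")
write.csv(data.frame(height = c(150, 160, 170), sex = c("F", "M", "F")),
          csv_path, row.names = FALSE)

old_wd <- setwd(tempdir())         # code equivalent of Session > Set Working Directory
mydata <- read.csv("example.csv")  # read the file from the working directory
attach(mydata)                     # columns can now be used by name
mean_height <- mean(height)        # 160
detach(mydata)                     # tidy up when finished with the dataset
setwd(old_wd)                      # restore the previous working directory
```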

How to calculate proportions-

prop.table(table(X))
(to display this as a percentage, multiply by 100: prop.table(table(X))*100)
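
A short worked example, assuming a made-up categorical variable X with ten values:

```r
# Hypothetical categorical data: 5 red, 3 blue, 2 green
X <- c("red", "blue", "red", "green", "red",
       "blue", "red", "blue", "red", "green")

counts <- table(X)                   # frequency of each category
proportions <- prop.table(counts)    # blue 0.3, green 0.2, red 0.5
percentages <- round(proportions * 100, 1)
percentages                          # blue 30, green 20, red 50
```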

What to do when you are finished with a dataset-

detach(name)

How to produce a pie chart for (categorical) data -

pie(table(X), main="Title")

How to produce a bar chart for (categorical) data -

barplot(table(X), main="Title", xlab="Categories", ylab="Frequency")

How to produce a scatter plot for (numerical) data -

plot(X, Y, main="X and Y")
(X goes on the bottom axis, Y on the side axis)

How to produce a histogram for (numerical) data -

hist(X)

How to produce a box plot for (numerical) data -

boxplot(Y, X, main="Y and X", horizontal=TRUE, names=c("Y","X"))

What does a box plot tell us?

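The box shows the interquartile range (IQR): the middle 50% of the data (from Q1 to Q3).
The line inside the box shows the median.
The whiskers extend from the box to the minimum and maximum values (excluding outliers).
Outliers are plotted as separate points beyond the whiskers; by default R treats values more than 1.5 × IQR from the box as outliers.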
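
The numbers behind a box plot can be checked directly. The sketch below uses a made-up sample Y with one large value; boxplot.stats() returns the five values R draws (lower whisker, lower hinge, median, upper hinge, upper whisker) and any outliers. Note that R's hinges are close to, but not always identical to, the quartiles from quantile().

```r
# Hypothetical numerical data with one unusually large value
Y <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

quartiles <- quantile(Y, c(0.25, 0.5, 0.75))  # Q1 = 4, median = 6, Q3 = 8
iqr <- IQR(Y)                                 # 8 - 4 = 4

stats <- boxplot.stats(Y)
stats$stats   # 2 4 6 8 9  (whiskers stop at 2 and 9)
stats$out     # 30         (more than 1.5 * IQR above the box)
```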

Data Distribution skewed to the right-

Positive Skew - Most of the data values are clustered toward the lower end (left side), and a few larger values (high outliers) pull the tail to the right.

Data Distribution skewed to the left-

Negative Skew - Most of the data values are clustered toward the higher end (right side), and a few smaller values (low outliers) pull the tail to the left.
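
A quick, informal check for the two kinds of skew is to compare the mean with the median: a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below. This is a rule of thumb rather than a formal test, and both samples below are made up.

```r
# Hypothetical samples
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 20)         # one high outlier
left_skewed  <- c(1, 17, 18, 18, 18, 19, 19, 20)   # one low outlier

mean(right_skewed)     # 4.75
median(right_skewed)   # 3      -> mean > median, positive skew

mean(left_skewed)      # 16.25
median(left_skewed)    # 18     -> mean < median, negative skew
```

hist(right_skewed) and hist(left_skewed) would show the same pattern visually.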

Correlation Coefficient Test-

cor.test(X,Y)

Strong Negative Linear Relationship -

A correlation coefficient between -0.75 and -1 is a strong negative relationship.

Strong Positive Linear Relationship -

A correlation coefficient between 0.75 and 1 is a strong positive relationship.

Moderate Negative Linear Relationship -

Values between -0.3 and -0.75 are considered a moderate negative relationship.

Moderate Positive Linear Relationship -

Values between 0.3 and 0.75 are considered a moderate positive relationship.

No Linear Relationship -

Values close to zero (between -0.3 and 0.3) indicate that there is little or no linear relationship between variables X and Y.
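
The bands on these cards can be turned into a small helper. describe_correlation() below is a hypothetical function, not part of R, and the way it assigns the boundary values (exactly 0.3 or 0.75) is an assumption, since the cards leave that open. The data is made up.

```r
# Hypothetical helper: label a correlation coefficient using the card thresholds
describe_correlation <- function(r) {
  if (abs(r) < 0.3) {
    return("no linear relationship")
  }
  strength <- if (abs(r) >= 0.75) "strong" else "moderate"
  direction <- if (r > 0) "positive" else "negative"
  paste(strength, direction, "linear relationship")
}

# Made-up data where Y rises steadily with X
X <- c(1, 2, 3, 4, 5, 6)
Y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

result <- cor.test(X, Y)
r <- unname(result$estimate)   # the correlation coefficient
describe_correlation(r)        # "strong positive linear relationship"
```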

Line of Best Fit -

lm(Y~X) The output gives two coefficients, (Intercept) and X. Write the line of best fit as Y = (X coefficient) × X + (Intercept).
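
A small worked example of reading the equation off lm(), using made-up data that lies exactly on the line Y = 2X + 1 so the coefficients are easy to check:

```r
# Hypothetical data lying exactly on Y = 2*X + 1
X <- c(1, 2, 3, 4, 5)
Y <- c(3, 5, 7, 9, 11)

fit <- lm(Y ~ X)
intercept <- coef(fit)[["(Intercept)"]]   # 1
slope <- coef(fit)[["X"]]                 # 2

# Line of best fit: Y = slope * X + intercept
predicted_at_6 <- slope * 6 + intercept   # 13
```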

Applying the Line of Best Fit -

abline(lm(Y~X)) Draws the line of best fit on an existing scatter plot, so run it after plot(X, Y).

Null Hypothesis-

The null hypothesis (H₀) is a statement that there is no effect, no difference, or no relationship between variables. It represents the idea you’re testing against.

Alternative Hypothesis-

The alternative hypothesis (H₁) is a statement in statistics that contradicts the null hypothesis (H₀). It proposes that there is a real effect, a difference, or a relationship between variables.

p-value -

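The p-value is the probability of getting results at least as extreme as those observed if the null hypothesis were true. If the p-value is less than 0.05, we can reject the null hypothesis, and the results are statistically significant.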

95% confidence interval -

We can be 95% confident that the 'true' difference between the means lies within the given interval. If the interval does not include 0, a true difference of 0 is not consistent with the data at the 5% significance level. This confirms that the result is statistically significant.

t-test -

t.test(X,Y) Used to compare the means of two samples of numerical data that are normally distributed.
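
The p-value and 95% confidence interval described on the earlier cards can be read straight from the t.test() result. The two samples below are made up. By default t.test(X, Y) runs Welch's version of the test, which does not assume equal variances.

```r
# Hypothetical samples of numerical data
X <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7)
Y <- c(6.2, 6.8, 6.5, 7.1, 6.9, 6.4, 7.0, 6.6)

result <- t.test(X, Y)

result$p.value    # p-value for the difference between the means
result$conf.int   # 95% confidence interval for mean(X) - mean(Y)

significant <- result$p.value < 0.05
excludes_zero <- result$conf.int[1] > 0 || result$conf.int[2] < 0
```

For this data both checks agree: the p-value is below 0.05 and the interval does not contain 0.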

Paired t-test -

t.test(X, Y, paired=TRUE) Used when there is meaningful pairing between numerical data samples, such as before and after measurements on the same subjects.
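
A sketch of a paired test on made-up before and after scores for the same six subjects. A paired t-test is equivalent to a one-sample t-test on the differences, which the last lines illustrate.

```r
# Hypothetical scores for the same six subjects, before and after
before <- c(72, 80, 65, 90, 77, 84)
after  <- c(75, 84, 66, 95, 80, 89)

result <- t.test(before, after, paired = TRUE)
result$p.value

# Same test, written as a one-sample t-test on the differences
diffs <- before - after
same <- t.test(diffs)
all.equal(result$p.value, same$p.value)   # TRUE
```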

z-test -

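prop.test(x=c(a,b), n=c(n1,n2)) Used to compare proportions of categorical data between two samples, where a and b are the counts of interest and n1 and n2 are the sample sizes. The 'prop 1' and 'prop 2' values in the output are the sample proportions, and can be used to compare percentages between the two samples.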
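
A worked example with made-up counts: 50 out of 100 in the first sample and 30 out of 100 in the second have the characteristic of interest.

```r
# Hypothetical counts and sample sizes
a <- 50; n1 <- 100
b <- 30; n2 <- 100

result <- prop.test(x = c(a, b), n = c(n1, n2))

result$estimate   # prop 1 = 0.5, prop 2 = 0.3
result$p.value    # below 0.05 for these counts
```

Multiplying result$estimate by 100 gives the two percentages.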

type 1 error -

Known as a false positive. This occurs when the results appear to be significant and the null hypothesis is rejected, but in fact there is no real difference.

type 2 error -

Known as a false negative. There is a real difference, but the statistical test failed to detect it.

(36 cards)