# INFERENTIAL STATISTICS Flashcards

1
Q

inferential stats?

A

reach conclusions that extend beyond the immediate data set (generalise from a sample to the population)

2
Q

Bernoulli distribution?

A

important special case of a discrete distribution: a binary variable with only 2 possible outcomes (1 with probability p, 0 with probability 1 - p)

3
Q

population parameter?

A

fixed feature of a particular population e.g. pop mean, pop variance

4
Q

sample stats?

A

quantity that varies from one sample to another (used to estimate a population parameter from a random sample, as surveying the entire population is not practical)

5
Q

Law of large numbers?

A

as sample size n increases, the sample mean gets closer to population mean
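
A small simulation can illustrate this. The sketch below assumes fair-coin (Bernoulli, p = 0.5) draws; the seed and the sample sizes are arbitrary choices made for illustration.

```r
# Law of large numbers: the running sample mean of Bernoulli(0.5) draws
# settles near the population mean 0.5 as n grows.
set.seed(1)                              # arbitrary seed, for reproducibility
p <- 0.5                                 # assumed population mean
x <- rbinom(10000, size = 1, prob = p)   # 10,000 Bernoulli draws
running_mean <- cumsum(x) / seq_along(x) # sample mean after each draw
running_mean[c(10, 100, 1000, 10000)]    # sample mean at increasing n
```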

6
Q

Central limit theorem?

A

when the sample size is large (rule of thumb: n >= 30), the sampling distribution of the sample mean is approximately normal, regardless of the distribution we started out with
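
As a rough check, one can simulate many sample means from a clearly non-normal distribution. The sketch below assumes an exponential population with rate 1 (mean 1, SD 1); the seed and the number of replications are arbitrary.

```r
# Central limit theorem: means of samples drawn from a skewed (exponential)
# population are approximately normal, centred on the population mean,
# with standard deviation close to SD / sqrt(n).
set.seed(1)                                   # arbitrary seed
n <- 30                                       # sample size
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))
mean(sample_means)                            # close to the population mean, 1
sd(sample_means)                              # close to 1 / sqrt(n)
```

Calling hist(sample_means) afterwards should show a roughly bell-shaped histogram.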

7
Q

hypothesis testing

A

tells us how extreme our sample outcome is if the null hypothesis is true. Creates a rejection region, beyond which the sample is too extreme to maintain that the null hypothesis is true

8
Q

standardisation

A

Z = (x - mean)/SD, so that Z ~ N(0,1) when x is normally distributed
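
In R the same calculation is a one-liner. The data vector below is hypothetical, and the sample mean and SD stand in for the population values.

```r
# Standardisation: subtract the mean and divide by the standard deviation.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data
z <- (x - mean(x)) / sd(x)       # z-scores
mean(z)                          # approximately 0
sd(z)                            # 1
```

The built-in scale(x) performs the same centring and scaling.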

9
Q

test stat Z

A

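Z = (observed p - p0)/standard error, where standard error = sqrt(p0(1 - p0)/n) and p0 is the proportion under H0 (reject H0 at the 5% level, two-sided, if |Z| > 1.96)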
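
A worked sketch of this test for a proportion, using the standard error sqrt(p0(1 - p0)/n) as the denominator. All numbers (p0, n, p_obs) are hypothetical.

```r
# One-sample z-test for a proportion (hypothetical numbers).
p0 <- 0.5                        # proportion under H0
n <- 100                         # sample size
p_obs <- 0.6                     # observed sample proportion
se <- sqrt(p0 * (1 - p0) / n)    # standard error under H0
z <- (p_obs - p0) / se           # test statistic
p_value <- 2 * pnorm(-abs(z))    # two-sided p-value
abs(z) > 1.96                    # TRUE means reject H0 at the 5% level
```

Here z is about 2 and the two-sided p-value is about 0.0455, so both rules (|Z| > 1.96 and p-value < 0.05) lead to rejecting H0.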

10
Q

reject Ho?

A

p-value < 0.05 (at the 5% significance level)

11
Q

95% Confidence Interval

A

(estimate - 1.96 SE, estimate + 1.96 SE), where estimate is the sample statistic and SE its standard error; reject H0 if the hypothesised value is not in the range
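
A minimal sketch for a proportion, assuming the interval is centred on the observed proportion with its standard error estimated from the sample. All numbers are hypothetical.

```r
# 95% confidence interval for a proportion (hypothetical numbers).
n <- 100                               # sample size
p_obs <- 0.6                           # observed sample proportion
se <- sqrt(p_obs * (1 - p_obs) / n)    # estimated standard error
ci <- c(p_obs - 1.96 * se, p_obs + 1.96 * se)
ci                                     # roughly (0.504, 0.696)
p0 <- 0.5                              # hypothesised proportion
p0 < ci[1] | p0 > ci[2]                # TRUE means reject H0
```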

12
Q

import data from file?

A

13
Q

class of Auto?

A

class(Auto) gives "data.frame"

14
Q

structure of data?

A

str(Auto)

15
Q

A

16
Q

names of variables?

A

names(Auto)

17
Q

number of observations and variables?

A

dim(Auto)

18
Q

frequency of each value of the origin variable?

A

table(Auto$origin)

19
Q

recoding data for 'origin'? (check using table(Auto$originf))

A

Auto$originf = factor(Auto$origin,
                      labels = c("USA", "Europe", "Japan"))
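
To see how the labels are assigned, here is a sketch on a small hypothetical vector of origin codes. factor() matches labels to the sorted unique values, so 1, 2, 3 map to the three labels in order.

```r
# Hypothetical stand-in for the origin column (codes 1, 2, 3).
origin <- c(1, 3, 2, 1, 1, 3)
originf <- factor(origin, labels = c("USA", "Europe", "Japan"))
originf          # USA Japan Europe USA USA Japan
table(originf)   # counts per label
```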

20
Q

create new data.frame without variable ‘origin’?

A

new_data=subset(Auto,select=c(-origin))

21
Q

identify number of rows with missing values (NA)?

A

sum(!complete.cases(Auto)) counts rows with at least one NA; sum(is.na(Auto)) counts individual missing entries

22
Q

locate entries (which row and column) with missing values

A

which(is.na(Auto),arr.ind=TRUE)

23
Q

remove rows with missing values?

A

Auto=na.omit(Auto)
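
The three missing-value cards can be tried together on a small hypothetical data frame (df and its columns are made up for illustration). Note that the number of missing entries and the number of rows with missing values differ when a row has more than one NA.

```r
# Hypothetical data frame: row 2 has two NAs, row 3 has one.
df <- data.frame(mpg = c(18, NA, 24, 30),
                 horsepower = c(130, NA, NA, 90))
sum(is.na(df))                     # missing entries: 3
sum(!complete.cases(df))           # rows with at least one NA: 2
which(is.na(df), arr.ind = TRUE)   # row and column index of each NA
df_clean <- na.omit(df)            # drop every row containing an NA
dim(df_clean)                      # 2 rows, 2 columns remain
```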

24
Q

summarising data for a variable?

A

mean(Auto$variable)
median(Auto$variable), equivalently quantile(Auto$variable, 0.5)
max(Auto$variable), min(Auto$variable) (max minus min gives the range)
var(Auto$variable)
sd(Auto$variable)

25
Q

5 number summary?

A

quantile(Auto$variable) OR summary(Auto$variable) (summary also reports the mean)

26
Q

interquartile range?

A

IQR(Auto$variable)

27
Q

covariance and correlation of two variables?

A

attach(Auto)
cov(var1,var2)
cor(var1,var2)
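
The two quantities are linked: correlation is covariance rescaled by both standard deviations. A quick check on two hypothetical vectors:

```r
# Covariance and correlation of two hypothetical variables.
var1 <- c(1, 2, 3, 4, 5)
var2 <- c(2, 4, 5, 4, 5)
cov(var1, var2)                            # 1.5
cor(var1, var2)                            # about 0.775
cov(var1, var2) / (sd(var1) * sd(var2))    # same value as cor(var1, var2)
```

Without attach(), the same calls can be written as cov(Auto$var1, Auto$var2) or with(Auto, cov(var1, var2)).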

28
Q

barplot of variable?

A

barplot(summary(Auto$variable), xlab='label', ylab='frequency', col='wheat')

29
Q

histogram of variable?

A

hist(Auto$variable, breaks=20, xlab='variable (#bin=20)', ylab='frequency', main='', col='wheat')

30
Q

side by side graphs with 1 row and 2 columns?

A

par(mfrow=c(1,2))

31
Q

box plot of variable?

A

boxplot(Auto$variable, col='wheat', main='title', horizontal=TRUE)

32
Q

Detect outliers based on IQR: [Q1 - 1.5IQR, Q3 + 1.5IQR]?

A

boxplot.stats(Auto$variable)$out

33
Q

Locate the outliers in the dataset

A

outlier = boxplot.stats(Auto$variable)$out
outlier_row = which(Auto$variable %in% outlier)
Auto[outlier_row, ]
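
The same steps on a small hypothetical data frame (df and its values are made up), using %in% to match each value against the outliers reported by boxplot.stats:

```r
# Locate IQR-based outliers in a hypothetical column; 50 is the obvious one.
df <- data.frame(variable = c(10, 12, 11, 13, 12, 11, 50))
outlier <- boxplot.stats(df$variable)$out        # values beyond the whiskers
outlier_row <- which(df$variable %in% outlier)   # row numbers of those values
df[outlier_row, , drop = FALSE]                  # row 7, value 50
```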

34
Q

Detect outliers based on percentile: 2.5% - 97.5%

A

lower = quantile(Auto$variable, 0.025)
upper = quantile(Auto$variable, 0.975)
outlier_row = which(Auto$variable > upper | Auto$variable < lower)
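
A sketch of the percentile rule on hypothetical data (the integers 1 to 1000), flagging values below the 2.5th percentile or above the 97.5th. By construction this rule always flags about 5% of the observations, whether or not they are unusual.

```r
# Percentile-based outlier detection on hypothetical data.
x <- 1:1000
lower <- quantile(x, 0.025)                  # 2.5th percentile
upper <- quantile(x, 0.975)                  # 97.5th percentile
outlier_row <- which(x > upper | x < lower)  # positions outside the range
length(outlier_row)                          # 50, i.e. 5% of 1000
```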

35
Q

scatterplot?

A

plot(Auto$var1, Auto$var2, xlab='var1', ylab='var2')