Initial analysis of the data Flashcards

1
Q

What are all R commands?

A

Functions

2
Q

how do you get data into R?

A

read.table()

read.csv()

3
Q

how do you get data out of R?

A

write.csv()
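
A minimal sketch of a round trip through these functions. The data frame `scores` and its column names are invented for illustration, and a temporary file is assumed so nothing is written to the working directory.

```r
# Hypothetical example data; the column names are made up for illustration.
scores <- data.frame(id = 1:3, score = c(4.5, 3.2, 5.0))

# Write to a temporary file (row.names = FALSE avoids an extra index column).
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# read.csv() is read.table() with defaults suited to comma-separated files.
from_csv   <- read.csv(path)
from_table <- read.table(path, header = TRUE, sep = ",")

str(from_csv)
```

For a plain comma-separated file like this one, both readers should return the same data frame.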

4
Q

what is nominal data?

A

names of things

5
Q

what is ordinal data?

A

ordered names

6
Q

what is interval data?

A

numeric with no true zero (Celsius)

7
Q

what is ratio data?

A

numeric with true zero (kelvin)

8
Q

which 2 data classifications are categorical or discrete?

A

nominal and ordinal

9
Q

which 2 data classifications are continuous variables?

A

interval and ratio

10
Q

what is a number?

A

a value that can have decimals (class "numeric" in R)

11
Q

what is an integer?

A

whole number

12
Q

what is a character?

A

text (a string of characters), not a number
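
A short sketch of how R reports these three basic types. The variable names are arbitrary; `class()` is the base R function that returns the type label.

```r
decimal <- 3.14
whole   <- 42L      # the L suffix asks R for an integer
plain   <- 42       # without L, R stores a double
text    <- "forty"

class(decimal)   # "numeric"
class(whole)     # "integer"
class(plain)     # "numeric"
class(text)      # "character"
```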

13
Q

what is a vector?

A

set of values of the same data type (created with the combine function c())

14
Q

what is a list?

A

collection of different vectors or other data structures
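
The contrast between a vector and a list can be sketched as follows. The names `v`, `mixed` and `record` are illustrative only.

```r
# A vector holds one type; mixing types coerces every element to the most general one.
v     <- c(1, 2, 3)
mixed <- c(1, "a", TRUE)
class(mixed)          # "character"

# A list keeps each component as it is, whatever its type or length.
record <- list(name = "sample A", values = v, passed = TRUE)
str(record)
```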

15
Q

what is a factor?

A

categorical variable

fixed set of values
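
A small sketch linking factors to the nominal and ordinal classifications from the earlier cards. The `sizes` data are made up for illustration.

```r
sizes <- c("small", "large", "medium", "small")

# Nominal: the levels have no order (R sorts them alphabetically by default).
nominal <- factor(sizes)
levels(nominal)

# Ordinal: state the order of the levels explicitly.
ordinal <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
ordinal[1] < ordinal[2]   # comparisons are meaningful only for ordered factors
```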

16
Q

what are arrays?

A

n-dimensional structures whose elements all share one data type (homogeneous)

17
Q

what are matrices?

A

2D arrays; all elements share one type, most often numeric
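
In R a matrix is the two-dimensional case of an array, and like any array it holds a single type, which need not be numeric. A brief sketch with arbitrary values:

```r
m <- matrix(1:6, nrow = 2)                # 2 x 3, filled column by column
dim(m)

chars <- matrix(letters[1:4], nrow = 2)   # a character matrix is also valid
is.matrix(chars)

a <- array(1:24, dim = c(2, 3, 4))        # a 3-dimensional array
dim(a)
```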

18
Q

what is a data frame?

A

a list in which all component vectors have the same length
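
A minimal sketch of the "list of equal-length vectors" idea. The data frame `df` and its columns are hypothetical.

```r
df <- data.frame(
  name  = c("a", "b", "c"),
  value = c(1.5, 2.5, 3.5)
)

is.list(df)          # a data frame is a list underneath
sapply(df, length)   # every column has the same length

# Columns whose lengths cannot be recycled to match are rejected:
bad <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(bad, "try-error")
```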

19
Q

what is the R code for viewing the data?

A

head()

tail()
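
For example, using `mtcars`, a data set that ships with R (any data frame would do):

```r
first_rows <- head(mtcars)      # first 6 rows by default
last_rows  <- tail(mtcars, 3)   # last 3 rows

first_rows
last_rows
```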

20
Q

what is the R code for viewing a summary of the data?

A

summary()

21
Q

what is the R code for computing basic statistics?

A

sd()
var()
range()
IQR()
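
A quick sketch of these functions on a small made-up vector:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(x)    # 5
var(x)     # sample variance (divides by n - 1)
sd(x)      # square root of var(x)
range(x)   # minimum and maximum: 2 9
IQR(x)     # 75th percentile minus 25th percentile
```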

22
Q

What is the R code for the correlation?

A

cor()

23
Q

what does visualisation give you?

A

more holistic picture of the data

24
Q

what are summary statistics?

A

mean and median
standard deviation
quartiles
correlations

25
what is Anscombe's Quartet?
4 data sets with nearly identical summary statistics that look very different when plotted
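
The quartet ships with R as the data frame `anscombe` (columns `x1` to `x4` and `y1` to `y4`), so the point can be checked directly. A sketch, with `stats` as an arbitrary name for the result:

```r
# One column of summary statistics per data set in the quartet.
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), sd_y = sd(y), cor_xy = cor(x, y))
})

round(stats, 2)   # the four columns are nearly identical
```

Plotting each pair, for instance with `plot(anscombe$x1, anscombe$y1)`, is what reveals how different the four sets are.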
26
what does hist() do in R?
Plot a histogram
27
what do missing values suggest?
dirty data
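
A short sketch of finding and handling missing values, which R codes as `NA`. The vector `x` is made up.

```r
x <- c(3, NA, 7, 1, NA)

is.na(x)                # flags the missing entries
sum(is.na(x))           # 2 missing values
mean(x)                 # NA: missing values propagate
mean(x, na.rm = TRUE)   # drop them explicitly before averaging
```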
28
what is the best first visualisation of 2 variables?
scatter plots
29
what is a box and whisker plot?
a plot whose box spans the middle 50% of the data (the interquartile range), with a line at the median and whiskers reaching towards the extremes
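
The numbers behind the box can be inspected without drawing anything. A sketch using the `mpg` column of the built-in `mtcars` data; `plot = FALSE` is used here only to return the statistics, and dropping it draws the plot.

```r
x <- mtcars$mpg

quantile(x, c(0.25, 0.5, 0.75))   # quartiles: roughly the box edges and the median line

b <- boxplot(x, plot = FALSE)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
```

`boxplot()` places the box at the hinges computed by `fivenum()`, which are close to, but not always equal to, the quartiles from `quantile()`.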
30
why use a pairwise plot?
to show the relationship between every pair of variables at once, so relationships can be examined quickly
31
What does time series analysis have to have?
the same time period
32
what is the null hypothesis?
there is no difference
33
what is the alternative hypothesis?
there is a difference
34
what is the difference of means?
a comparison of the means of 2 data sets, judged against how much the two distributions overlap
35
what is the p value?
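the area under the tails of the curve: the probability, if the null hypothesis is true, of a result at least as extreme as the one observed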
36
if the p value is less than 0.05 what do you do?
reject the null hypothesis
37
Student's t-test assumes both populations are: a) normally distributed b) not normally distributed
a) normally distributed
38
what do you use if the data is not normally distributed?
Wilcoxon rank-sum test
39
what are the steps in hypothesis testing? (3)
calculate the test statistic; calculate the p value; if the p value is less than 0.05, reject the null hypothesis
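
The three steps can be sketched with a two-sample t-test. The samples `a` and `b` are simulated rather than real measurements, and the 0.05 threshold is the conventional choice, not a fixed rule.

```r
set.seed(1)                          # fixed seed so the simulation is reproducible
a <- rnorm(30, mean = 10, sd = 2)    # simulated group A
b <- rnorm(30, mean = 12, sd = 2)    # simulated group B

res <- t.test(a, b)      # Welch two-sample t-test is the default
res$statistic            # step 1: the test statistic
res$p.value              # step 2: the p value

alpha <- 0.05
res$p.value < alpha      # step 3: TRUE means reject the null hypothesis

# Rank-based alternative when normality is doubtful
wilcox.test(a, b)$p.value
```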
40
what is a type 1 error (false positive)?
rejecting the null hypothesis when it is true
41
what is a type 2 error (false negative)?
failing to reject the null hypothesis when it is false
42
what is significance?
the probability of a false positive (a type 1 error) when the null hypothesis is true
43
what is power?
the probability of a true positive: rejecting the null hypothesis when it is false
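
Power, significance, effect size and sample size are linked, which `power.t.test()` from the stats package makes concrete. The inputs below (a difference of 1 unit, a standard deviation of 2, 30 observations per group) are assumptions chosen purely for illustration.

```r
# Power of a two-sample t-test under the assumed inputs.
pw <- power.t.test(n = 30, delta = 1, sd = 2, sig.level = 0.05)
pw$power

# Leave n out and supply the power to solve for the sample size per group instead.
needed <- power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.8)
needed$n
```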
44
what is effect size?
the actual magnitude of the result
45
what does ANOVA stand for?
analysis of variance
46
what is ANOVA?
a generalisation of the difference-of-means test to more than two groups
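
A minimal one-way ANOVA sketch using `PlantGrowth`, a data set that ships with R (30 plant weights in three groups). The name `fit` is arbitrary.

```r
# Does mean weight differ across the three groups (ctrl, trt1, trt2)?
fit <- aov(weight ~ group, data = PlantGrowth)

summary(fit)   # F statistic and p value for the null "all group means are equal"
```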
47
what percentage of confidence interval do most people use?
95%
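
In R, 95% is also the default confidence level of `t.test()`. A brief sketch on the `mpg` column of the built-in `mtcars` data:

```r
ci <- t.test(mtcars$mpg)$conf.int   # conf.level = 0.95 unless stated otherwise
ci
attr(ci, "conf.level")

# A higher confidence level gives a wider interval.
ci99 <- t.test(mtcars$mpg, conf.level = 0.99)$conf.int
ci99
```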
48
would you visualise before or after model building?
before