03. Review of Basic Data Analytic Methods Using R Flashcards

1
Q

What is a factor

A

A factor is a vector whose values are limited to a fixed set of categories, called levels (much like a data-validation drop-down list). When a factor is first created, R takes the distinct values present as its levels; after that, every value assigned is checked against those levels, and a value that is not a level becomes NA with a warning.
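
A minimal sketch of this behaviour in base R; the `sizes` vector and its values are made up for illustration.

```r
# Create a factor; the distinct values present become its levels.
sizes <- factor(c("small", "large", "small", "medium"))
levels(sizes)   # levels are sorted alphabetically by default

# Assigning a value that is not an existing level is rejected:
# R stores NA instead and issues a warning.
sizes[2] <- "huge"
is.na(sizes[2])
```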

2
Q

What is a list

A

A list is a generic vector whose elements can be of different data types (including other vectors or lists)
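
As a small illustration (the `person` list and its fields are hypothetical), a list can mix a character string, a number and a numeric vector:

```r
# Each element of a list may have its own type and length.
person <- list(name = "Ada", age = 36, scores = c(90, 85, 77))

class(person$name)      # type of the first element
class(person$age)       # type of the second element
length(person)          # number of elements in the list
person[["scores"]][2]   # second value inside the 'scores' element
```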

3
Q

What is an array

A

An array can only contain one type of data. Arrays can be multi-dimensional, i.e. rows, columns, sheets, workbooks and so on
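
A short sketch of a three-dimensional array in base R; the name `cube` and its dimensions are chosen only for illustration.

```r
# 24 integers arranged as 2 rows x 3 columns x 4 "sheets".
# Values are filled column by column, sheet by sheet.
cube <- array(1:24, dim = c(2, 3, 4))

dim(cube)       # the three dimension sizes
cube[2, 3, 1]   # row 2, column 3, sheet 1
```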

4
Q

What is a dataframe

A

A data frame is a table whose columns are vectors or factors, all of the same length. Each column holds a single data type, but different columns can hold different data types.
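
A minimal example, assuming a made-up `shop` table with one text column, one factor column and one numeric column:

```r
# Three columns of equal length, each with its own type.
shop <- data.frame(
  customer = c("a", "b", "c"),
  offer    = factor(c("none", "offer1", "offer2")),
  spend    = c(12.5, 30.0, 22.1)
)

str(shop)     # shows the type of every column
nrow(shop)    # number of rows
ncol(shop)    # number of columns
```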

5
Q

What is a record

A

A single row of a table or data frame (one observation, with a value for each column)

6
Q

What is a pairs plot

A

It plots every variable against every other variable in a grid of scatterplots. Also known as a SPLOM or scatterplot matrix
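
One way to draw such a plot is base R's `pairs()`; the sketch below uses the built-in `iris` dataset purely as example data.

```r
# Keep the four numeric measurement columns of the built-in iris data.
measurements <- iris[, 1:4]

# Draw one scatterplot for every pair of columns.
pairs(measurements, main = "Pairs plot of the iris measurements")
```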

7
Q

What do T-Tests use

A

Samples of the population (not the full population)

8
Q

What are T-Tests used for

A

Samples of the population (not the full population) are tested against a null hypothesis, i.e. to check whether an observed difference is statistically significant
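
A rough sketch of the idea with simulated data: the `population` vector, the sample size of 30 and the null value of 50 are all assumptions made for the example.

```r
set.seed(1)

# A hypothetical population that we normally could not measure in full.
population <- rnorm(10000, mean = 50, sd = 10)

# Draw a sample and test it against the null hypothesis "the mean is 50".
sample_values <- sample(population, 30)
result <- t.test(sample_values, mu = 50)

result$p.value     # probability of data this extreme if the null were true
result$parameter   # degrees of freedom (sample size minus one)
```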

9
Q

What can an ANOVA test be used for

A

ANOVA is used in hypothesis testing when you have more than two samples (groups) to compare

10
Q

What is hypothesis testing

A

Using sample data to decide between the null hypothesis and the alternative hypothesis

11
Q

What is statistical power

A

Statistical power is the probability that a test correctly rejects the null hypothesis when the alternative hypothesis is true (1 minus the Type II error rate)
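
As an illustration, power here means the probability of rejecting the null when a real effect exists. Base R's `power.t.test()` can estimate it for a two-sample t-test; the effect size (`delta = 1`), standard deviation and group sizes below are assumed values.

```r
# Power of a two-sample t-test with 10 observations per group...
small_study <- power.t.test(n = 10, delta = 1, sd = 1, sig.level = 0.05)

# ...and with 40 observations per group, same effect size.
large_study <- power.t.test(n = 40, delta = 1, sd = 1, sig.level = 0.05)

small_study$power
large_study$power   # more data gives higher power
```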

12
Q

What is a parametric distribution

A

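The data is assumed to follow a known distribution described by a small set of parameters. For the parametric tests in this deck, that is the normal distribution, described by its mean and standard deviation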

13
Q

How is the standard deviation calculated

A

The standard deviation is the square root of the variance
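
A quick check in base R with made-up numbers. Note that `var()` and `sd()` both use the sample (n - 1) denominator.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

var(x)   # sample variance
sd(x)    # sample standard deviation

# The standard deviation equals the square root of the variance.
all.equal(sd(x), sqrt(var(x)))
```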

14
Q

The T-Test

A

Parametric. Can be a one-sample or a two-sample test.

“Student’s t-test” usually refers to the two-sample t-test that assumes both samples have equal variances.
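
A minimal sketch of both forms using base R's `t.test()`; the two groups of numbers and the null value `mu = 5` are invented for the example.

```r
group_a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
group_b <- c(6.2, 6.8, 5.9, 7.1, 6.5, 6.6)

# One-sample test: is the mean of group_a different from 5?
one_sample <- t.test(group_a, mu = 5)

# Two-sample test assuming equal variances in both groups.
two_sample <- t.test(group_a, group_b, var.equal = TRUE)

one_sample$parameter   # degrees of freedom: n - 1
two_sample$parameter   # degrees of freedom: n1 + n2 - 2
two_sample$p.value
```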

15
Q

Welch’s test

A

Parametric, but copes with the two samples having different standard deviations (so it is always a two-sample test)
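
In R, `t.test()` runs Welch's test by default, because its `var.equal` argument defaults to `FALSE`. The two samples below are made up, with visibly different spreads.

```r
narrow <- c(10.1, 9.8, 10.3, 10.0, 9.9, 10.2)
wide   <- c(8.0, 14.5, 11.2, 6.9, 13.8, 12.1)

# No var.equal argument, so the variances are not assumed to be equal.
welch <- t.test(narrow, wide)

welch$method      # names the test that was run
welch$parameter   # Welch's adjusted (usually fractional) degrees of freedom
```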

16
Q

Wilcoxon Rank Sum Test

A

Non-parametric. It tests whether two populations of numbers are identically distributed.
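
A small sketch with base R's `wilcox.test()`; the values in `x` and `y` are invented so that the two samples do not overlap at all.

```r
x <- c(1.1, 2.3, 2.9, 3.4, 4.0)
y <- c(5.2, 6.1, 6.8, 7.7, 8.3)

res <- wilcox.test(x, y)

res$statistic   # W is 0 here because every x is below every y
res$p.value     # a small p-value suggests the distributions differ
```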

17
Q

ANOVA

A

We use ANOVA tests to perform multiple comparisons across more than two populations of data.
Example: You have an online shop that shows customers one of two offers or no offer at all, and you want to find out whether the offers affect the number of purchases.
The point of the ANOVA is to determine whether the variance in the dataset comes from the spread of values within each group or from the spread of values between groups.
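
The online shop example could be sketched as below with base R's `aov()`. The purchase counts are simulated, and the column names `offer` and `count` are assumptions made for the illustration.

```r
set.seed(42)

# Simulated purchases for 20 customers in each of the three groups.
purchases <- data.frame(
  offer = factor(rep(c("none", "offer1", "offer2"), each = 20)),
  count = c(rpois(20, 3), rpois(20, 4), rpois(20, 6))
)

# One-way ANOVA: does the mean purchase count differ between offers?
fit <- aov(count ~ offer, data = purchases)
summary(fit)   # the Pr(>F) column holds the p-value
```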

18
Q

What is the p value

A

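The p-value is the probability of seeing data at least as extreme as the sample, assuming the null hypothesis is true. It is not the probability that the null is true. A low p-value means the data would be very unlikely under the null, which is evidence for rejecting it.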

19
Q

What is a vector

A

A vector holds values of a single class (data type). It can represent a single column of a data frame
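
A short illustration of the single-class rule in base R, with made-up values: when types are mixed, R coerces everything to one common class.

```r
v <- c(1, 2, 3)
class(v)       # "numeric"

# Mixing a number, a string and a logical coerces all values to character.
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"
mixed
```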

20
Q

What is a matrix

A

A two dimensional array
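
For example, in base R (the name `m` and its values are arbitrary):

```r
# 6 integers arranged in 2 rows and 3 columns, filled column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)        # two dimensions: rows and columns
is.array(m)   # TRUE: a matrix is an array with two dimensions
m[2, 3]       # row 2, column 3
```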

21
Q

If we reject the NULL hypothesis but it is actually true what type of error is this

A

Type 1 error
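
A rough simulation of this kind of error, under assumed settings (2000 repetitions, samples of 20, significance level 0.05). Every sample is drawn with the null hypothesis actually true, so each rejection is a Type 1 error.

```r
set.seed(123)
alpha <- 0.05

# The null (mean = 0) is true in every repetition.
p_values <- replicate(2000, t.test(rnorm(20, mean = 0), mu = 0)$p.value)

# Share of repetitions where the true null was wrongly rejected.
type1_rate <- mean(p_values < alpha)
type1_rate   # expected to be close to alpha
```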

22
Q

How does the ANOVA test work

A

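Much like clustering, it compares the spread between groups with the spread within groups: the between-groups mean sum of squares (between-group variance) is divided by the within-groups mean sum of squares (within-group variance) to give the F statistic. A large F means the group means differ by more than the within-group spread would explain.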
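
The two mean sums of squares can be computed by hand and compared with the result from `aov()`. This is a sketch with invented data; the three groups and their values are assumptions.

```r
groups <- list(
  none   = c(2, 3, 4, 3, 2),
  offer1 = c(4, 5, 4, 6, 5),
  offer2 = c(7, 6, 8, 7, 9)
)
values <- unlist(groups)
labels <- factor(rep(names(groups), each = 5))

grand_mean <- mean(values)
k <- length(groups)   # number of groups
n <- length(values)   # total number of observations

# Between-groups: how far each group mean is from the grand mean.
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
# Within-groups: how far each value is from its own group mean.
ss_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_manual   <- ms_between / ms_within

# The same F statistic as reported by aov().
f_aov <- summary(aov(values ~ labels))[[1]][["F value"]][1]
all.equal(f_manual, f_aov)
```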