# Statistics Flashcards

1
Q

Descriptive Statistics?

A

Descriptive statistics is what we can say about a sample by observing the sample itself. It is limited in scope and consists mostly of summaries of the data, much like aggregates over a column in a database table.

2
Q

Inferential Statistics

A

Inferential statistics is what we can say about a population based on what we know about a sample. That means that we can infer (deduce or conclude from evidence rather than from explicit statements) about the population based on a smaller sample.

3
Q

In statistics what is ‘Probability’?

A

Probability describes what we can expect of samples drawn from a population whose properties we know.

So if we know that 10 % of the population is left-handed, we can expect about 10 % of a randomly drawn sample to be left-handed.
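The left-handed example can be tried out with a small simulation. This is only a sketch: the population size, the sample size and the fixed seed are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 people, exactly 10 % left-handed.
population = ["left"] * 1_000 + ["right"] * 9_000

# Draw a simple random sample without replacement.
sample = random.sample(population, 1_000)
sample_share = sample.count("left") / len(sample)

print(f"left-handed share in sample: {sample_share:.3f}")
```

The share in the sample is close to 10 % but rarely exactly 10 %, which is why the card says "expect".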

4
Q

In Probability Theory:

What does the experiment yield?

A

One outcome from the sample space.

The sample space for tossing a coin is {head, tail}

5
Q

In Probability Theory

What is a ‘Sample Space S’

A

The set of all possible outcomes of an experiment.

The sample space for tossing a coin is {head, tail}

6
Q

In Probability Theory

What is an ‘Event E’

A

An event is a set of outcomes of an experiment, i.e. a subset of the sample space, e.g. the event head when we toss a coin.

7
Q

In Probability Theory

What is a ‘Probability of Outcome P(s)’

A

The probability of an outcome is always between 0 and 1 (inclusive), and the probabilities of all possible outcomes sum to 1.
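A minimal sketch of the coin-toss terms from the cards above, assuming a fair coin. The names `sample_space`, `P` and `prob` are made up for this illustration.

```python
from fractions import Fraction

# Sample space S for one toss of a fair coin (fairness is an assumption).
sample_space = {"head", "tail"}

# Each outcome s gets a probability P(s); here all outcomes are equally likely.
P = {s: Fraction(1, len(sample_space)) for s in sample_space}

def prob(event):
    """Probability of an event, i.e. of a subset of the sample space."""
    return sum(P[s] for s in event)

# Every P(s) lies between 0 and 1, and together they sum to 1.
assert all(0 <= p <= 1 for p in P.values())
assert sum(P.values()) == 1

print(prob({"head"}))          # 1/2
print(prob({"head", "tail"}))  # 1
```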

8
Q

Descriptive Statistics

In Descriptive Statistics Which are the two different areas

A

Centrality and variability

Centrality: mean, median, mode

Variability: e.g. standard deviation

9
Q

Descriptive Statistics

What is the Mean, or average and what kind of data is it most useful for?

A

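The mean (average) is the sum of the values divided by the number of values.

Most useful with homogeneous numeric data, i.e. variables of one type without extreme outliers. For binary (0/1) data the mean is the proportion of 1s; for categorical data the mode is the usual choice instead.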
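As a quick illustration, the mean can be computed by hand and checked against the standard library. The numbers are made up; the second part shows how one extreme value shifts the result.

```python
from statistics import mean

values = [2, 4, 4, 6, 9]  # made-up example data

manual_mean = sum(values) / len(values)
print(manual_mean)  # 5.0

# The standard library gives the same result.
assert mean(values) == manual_mean

# A single extreme value pulls the mean away from the bulk of the data.
with_outlier = values + [95]
outlier_mean = sum(with_outlier) / len(with_outlier)
print(outlier_mean)  # 20.0
```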

10
Q

In Descriptive Statistics

What is the Median

What is the median in an evenly numbered data set?

A

The middle value of the sorted data set.

If n is even, the median is the mean value of the two middle elements
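Both cases can be sketched in a few lines. The function name `median` is just for this example; Python's `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```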

11
Q

In Descriptive Statistics

What is the Mode

A

The mode is the most frequent element.

Example: in 1, 1, 2, 3, 4 the mode is 1.
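A small sketch of finding the mode by counting. The helper `modes` is a made-up name, and it returns a list because a data set can have more than one most frequent element.

```python
from collections import Counter

def modes(values):
    """Return every element that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([1, 1, 2, 3, 4]))  # [1]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```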

12
Q

Standard Deviation

A

Measure of the amount of variation in a set of values.

Low standard deviation indicates that the values are closer to the mean - the distribution is less wide

A high standard deviation indicates that the values are spread out on a wider range
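To make the contrast concrete, here is a sketch with two made-up data sets that share the same mean but differ in spread. It assumes the population form of the standard deviation (dividing by n); the sample form divides by n - 1 instead.

```python
from math import sqrt

def population_sd(values):
    """Square root of the average squared distance from the mean."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

narrow = [9, 10, 10, 10, 11]  # values close to the mean of 10
wide = [0, 5, 10, 15, 20]     # same mean, spread over a wider range

print(round(population_sd(narrow), 3))  # 0.632
print(round(population_sd(wide), 3))    # 7.071
```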

13
Q

In Descriptive Statistics

Is Standard Deviation describing variability or centrality

A

Variability : Dispersion of the data

Centrality: where the data is centred, described by the mean, median and mode

14
Q

What is Correlation Analysis concerned with

A

Correlation analysis is concerned with relations between variables, e.g. if one goes up, what happens to the other?

15
Q

What is a Correlation Coefficient

A

A correlation coefficient is a statistical measure of the strength and direction of the relationship between two variables X and Y.

16
Q

What range does a correlation coefficient take, and what do the values mean?

A

The correlation coefficient value ranges from -1 to 1, where 1 indicates perfect positive correlation, 0 indicates no (linear) correlation, and -1 indicates perfect negative correlation.
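The three landmark values can be reproduced with a hand-written Pearson coefficient. This is a sketch on made-up data; `pearson_r` is a name chosen for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # y rises with x: r is about 1
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls as x rises: r is about -1
r_none = pearson_r(xs, [2, 1, 3, 1, 2])   # no linear trend: r is about 0

print(round(r_up, 6), round(r_down, 6), round(r_none, 6))
```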

17
Q

Does correlation imply causation?

A

No

18
Q

Inferential Statistics

A

Used to infer about the population based on our knowledge about a sample.

19
Q

Null-Hypothesis

A

In inferential statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured phenomena

An example:

Hypothesis: drinking large amounts of alcohol makes you fall over.

Null-Hypothesis: people will fall over the same amount whether they drink alcohol or not.

20
Q

What is the approach most often taken in regards to Null-Hypothesis and Hypothesis

A

We usually take the approach of rejecting the null hypothesis, i.e. showing that the idea that there is no relationship is unlikely, rather than confirming our hypothesis.

21
Q

What does a 95% confidence interval mean?

A

A confidence interval is a range, computed from a sample, that is likely to contain the true population value (e.g. the population mean).

Given observations x1 to xn and a 95 % confidence level, the procedure that builds the interval captures the true value in about 95 % of repeated samples. It does not mean that 95 % of sample means fall within this particular interval.
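A rough sketch of computing a 95 % confidence interval for a mean from one sample. It assumes the normal approximation (mean plus or minus about 1.96 standard errors); with a sample this small a t-distribution would be the more usual choice. The observations are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

observations = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1, 4.9]  # made-up data

n = len(observations)
x_bar = mean(observations)
standard_error = stdev(observations) / sqrt(n)

# Two-sided 95 % level: 2.5 % in each tail, so use the 97.5th percentile.
z = NormalDist().inv_cdf(0.975)  # about 1.96

lower = x_bar - z * standard_error
upper = x_bar + z * standard_error
print(f"95 % CI for the mean: ({lower:.3f}, {upper:.3f})")
```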

22
Q

Significance

A

A result has statistical significance when it is very unlikely to have occurred if the null hypothesis were true.

23
Q

In hypothesis testing, we can make Type 1 and Type 2 errors

what is a type 1 error

A

Falsely rejecting the null hypothesis (a false positive), e.g. saying “You are pregnant” to someone who is not.
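How often Type 1 errors occur can be illustrated by simulation. The sketch below assumes a two-sided z-test with known standard deviation 1 and a 5 % significance level. The null hypothesis (mean 0) is true in every trial, so each rejection is a false positive. All names and settings are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is repeatable

ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects_null(n=30):
    """Draw a sample for which the null (mean 0, sd 1) is true and test it."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = mean(sample) * sqrt(n)  # z statistic, since sd is known to be 1
    return abs(z) > z_crit

trials = 2000
false_positives = sum(rejects_null() for _ in range(trials))
rate = false_positives / trials
print(f"Type 1 error rate: {rate:.3f}")
```

The rate should land near the chosen significance level, which is the share of false positives the test is designed to tolerate.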

24
Q

In hypothesis testing, we can make Type 1 and Type 2 errors

what is a type 2 error

A

Failing to reject a null hypothesis that is false (a false negative), e.g. saying “You are not pregnant” to someone who is.

25
Q

What is a dependent variable

A

A variable (most often denoted Y) whose value depends on that of another variable. In an experiment it is the variable that we measure rather than manipulate.

26
Q

Independent Variable

A

A variable (often denoted X) whose variation does not depend on that of another variable. In an experiment it is the variable that we are trying to manipulate.

27
Q

In a correlation study which of the two would you apply to parametric data

Pearson’s r

Spearman’s rho

A

Pearson’s r

28
Q

In a correlation study which of the two would you apply to non-parametric data

Pearson’s r

Spearman’s rho

A

Spearman’s rho
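The difference between the two coefficients shows up on data that rises steadily but not in a straight line. In this sketch Spearman's rho is computed as Pearson's r on the ranks. The data is made up, and the simple `ranks` helper assumes there are no tied values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank 1 for the smallest value, 2 for the next, ... (assumes no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # strictly increasing, but not linear

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Spearman's rho only looks at the ordering of the values, so it reaches 1 for any strictly increasing relationship, while Pearson's r stays below 1 unless the relationship is linear.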