Preliminaries and Descriptive Statistics Flashcards

1
Q

What is a variable?

A

A characteristic or quantity that can take different values across people or items

2
Q

What are the three types of variable?

A
• Categorical: non-numerical
• Continuous: numerical but doesn’t have to be a whole number
• Discrete: numerical and a whole number
3
Q

What does a population consist of?

A

people or items that share a particular characteristic (or set of characteristics)

4
Q

What does a sample refer to?

A

A selection of individual people or items from a population

5
Q

In statistics what are we trying to do in regard to populations and samples?

A

draw inferences about a population from a sample

6
Q

What is a population parameter?

A

A quantity that describes some characteristic of a population with respect to a specific variable (it has to be worked out from the entire population)

7
Q

What is a sample statistic?

A

A quantity that describes some characteristic of a sample with respect to a specific variable

8
Q

Why is it usually hard to calculate population parameters?

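A

Because they have to be worked out from the entire population, which is usually too large to measure in full
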
9
Q

Why is it important to summarise data?

A

Data can be very complex and there can be a lot of it

10
Q

What should a measure of central tendency provide?

A

An indication of a ‘typical’ score in the data set

11
Q

What are the three measures of central tendency?

A
• Mean
• Median
• Mode
12
Q

How do you work out the mean?

A

Add all scores together and divide by number of scores (N)
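
The rule above can be sketched in a few lines of Python. The function name `mean` and the example scores are made up for illustration.

```python
def mean(scores):
    """Add all scores together and divide by the number of scores (N)."""
    if not scores:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(scores) / len(scores)


scores = [2, 4, 4, 6, 9]  # made-up example data
print(mean(scores))  # 25 / 5 = 5.0
```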

13
Q

What are the pros and cons of using the mean?

A
• Pro: provides an estimate of the average score of the data-set
• Con: is affected by extreme data points
14
Q

What is the mean?

A

The arithmetic average of the scores

15
Q

What is the median?

A

The value that lies in the middle of the ranked data

16
Q

How do you find the median?

A
• order (rank) the data
• find the score in the middle
• if there is an even number of scores, find the average of the two scores in the middle
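
The steps above can be sketched as follows. This is a minimal illustration; the function name `median` and the example scores are assumptions, not part of the cards.

```python
def median(scores):
    """Rank the data, then take the middle score (or average the middle two)."""
    ranked = sorted(scores)          # step 1: order (rank) the data
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                   # odd number of scores: one middle score
        return ranked[mid]
    # even number of scores: average the two scores in the middle
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([7, 1, 5]))     # ranked: 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))  # ranked: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```
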
17
Q

What are the pros and cons of using the median?

A
• Pro: Insensitive to extreme scores in the data set
• Con: Doesn’t reflect the shape of the scores
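
A small comparison may make the "insensitive to extreme scores" point concrete. It uses Python's standard `statistics` module; both data sets are made up, and the second differs from the first only by one extreme score.

```python
import statistics

scores = [3, 4, 5, 6, 7]          # made-up data
with_outlier = [3, 4, 5, 6, 70]   # same data, but the top score is extreme

mean_clean = statistics.mean(scores)
median_clean = statistics.median(scores)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_clean, median_clean)      # mean 5, median 5
print(mean_outlier, median_outlier)  # mean 17.6, median still 5
```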

18
Q

What is the mode?

A

The most frequently occurring value in the data set

19
Q

How do you find the mode?

A

Find the most frequently occurring value
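
One way to sketch this in Python is to count each value and keep whichever values share the highest count. The helper name `modes` is hypothetical; it returns a list because a data set can have more than one mode.

```python
from collections import Counter


def modes(scores):
    """Return every value tied for the highest frequency."""
    counts = Counter(scores)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3, 3, 3]))  # [3]
print(modes([1, 1, 2, 2, 3]))     # [1, 2] (two modes)
```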

20
Q

What’s the pro and con of using the mode?

A
• Pro: very easy to calculate from a histogram and easy to understand
• Con: data set might have more than 1 mode or no mode at all
21
Q

What is the range and what is the problem with it?

A
• The range is the difference between the maximum and minimum scores in your data
• Problem: distributions with very different shapes can have the same range
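
As a rough illustration, the two made-up data sets below are shaped very differently but share the same range. The function name `value_range` is an assumption chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(scores):
    """Difference between the maximum and minimum scores."""
    return max(scores) - min(scores)


clustered = [1, 5, 5, 5, 5, 5, 9]   # most scores sit in the middle
spread = [1, 2, 4, 5, 6, 8, 9]      # scores spread evenly

print(value_range(clustered))  # 9 - 1 = 8
print(value_range(spread))     # 9 - 1 = 8
```
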
22
Q

What is the deviation?

A

The (signed) distance of a score from the mean

23
Q

How do you calculate the average deviation?

A
1. Calculate the mean
2. Calculate the deviation of each score from the mean
3. Calculate the average deviation (add up all the deviations and divide by the number of deviations)
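
The three steps above can be sketched as follows, using made-up scores and a hypothetical helper name `average_deviation`.

```python
def average_deviation(scores):
    """Average of the signed deviations from the mean."""
    mean = sum(scores) / len(scores)                 # step 1
    deviations = [score - mean for score in scores]  # step 2
    return sum(deviations) / len(deviations)         # step 3


scores = [2, 4, 4, 6, 9]           # mean is 5.0
print(average_deviation(scores))   # (-3 - 1 - 1 + 1 + 4) / 5 = 0.0
```

The signed deviations from the mean always sum to zero (up to floating-point rounding), so the result says nothing about spread.
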
24
Q

Why don’t we usually use the average deviation as a measure of spread?

A

Deviations often cancel each other out

25
Q

How do you calculate the average squared deviation? (We don’t usually use this.)

A
1. Calculate the mean
2. Calculate the deviation of each score from the mean
3. Square each deviation (this removes the sign but doesn’t change which deviations are largest)
4. Calculate the average squared deviation (by dividing by number of deviations)
26
Q

How do you work out the sample variance? Why don’t we usually use this?

A
1. Work out the mean
2. Calculate deviation of each score from mean
3. Square deviation
4. Calculate a slightly adjusted average squared deviation (divide by N – 1, where N is the number of scores)
- When N is big the adjustment won’t make much of a difference
- However, the units are squared (measure^2), which makes the value hard to interpret
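
A minimal sketch of the steps above, with made-up scores. It also prints the plain average squared deviation (divide by N) next to the sample variance (divide by N – 1) so the adjustment is visible. The function names are assumptions; `statistics.variance` from the standard library is used only as a cross-check.

```python
import statistics


def squared_deviations(scores):
    """Steps 1 to 3: mean, deviation of each score, squared deviation."""
    mean = sum(scores) / len(scores)
    return [(score - mean) ** 2 for score in scores]


def average_squared_deviation(scores):
    """Divide the summed squared deviations by N."""
    return sum(squared_deviations(scores)) / len(scores)


def sample_variance(scores):
    """Divide the summed squared deviations by N - 1."""
    if len(scores) < 2:
        raise ValueError("sample variance needs at least two scores")
    return sum(squared_deviations(scores)) / (len(scores) - 1)


scores = [2, 4, 4, 6, 9]                  # squared deviations: 9, 1, 1, 1, 16
print(average_squared_deviation(scores))  # 28 / 5 = 5.6
print(sample_variance(scores))            # 28 / 4 = 7.0
print(sample_variance(scores) == statistics.variance(scores))  # True
```
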
27
Q

How do we calculate the standard deviation?

A
1. Calculate mean
2. Calculate deviation of each score from mean
3. Square deviation
4. Calculate the sample variance (divide by N – 1)
5. Calculate standard deviation by taking square root
- This number is in original units
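
The five steps above can be sketched as one function. The name `sample_sd` and the scores are made up; `statistics.stdev` from the standard library is included only to cross-check the hand-rolled version.

```python
import math
import statistics


def sample_sd(scores):
    """Sample standard deviation: square root of the sample variance."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two scores")
    mean = sum(scores) / n                                  # step 1
    squared = [(score - mean) ** 2 for score in scores]     # steps 2 and 3
    variance = sum(squared) / (n - 1)                       # step 4
    return math.sqrt(variance)                              # step 5


scores = [2, 4, 4, 6, 9]
print(round(sample_sd(scores), 3))                              # sqrt(7) ≈ 2.646
print(math.isclose(sample_sd(scores), statistics.stdev(scores)))  # True
```
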
28
Q

Will more concentrated data have a larger or smaller standard deviation?

A

Smaller; more spread-out data will have a larger SD
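
A quick check with two made-up data sets that share the same mean (10) but differ in how tightly the scores cluster around it:

```python
import statistics

concentrated = [9, 10, 10, 10, 11]   # scores close to the mean
spread_out = [2, 6, 10, 14, 18]      # scores far from the mean

sd_concentrated = statistics.stdev(concentrated)
sd_spread_out = statistics.stdev(spread_out)

print(round(sd_concentrated, 2))        # 0.71
print(round(sd_spread_out, 2))          # 6.32
print(sd_concentrated < sd_spread_out)  # True
```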

29
Q

What are the four different types of data plot?

A
• histogram
• box plot
• scatter plot
• data summary
30
Q

Why is using a histogram useful?

A
• easy to spot mode
• easy to see outlying data
• easy to see range and shape of data
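
A rough text-only histogram shows the same ideas without a plotting library. This sketch assumes whole-number scores; the function name `text_histogram` and the data are made up.

```python
from collections import Counter


def text_histogram(scores):
    """One row per whole-number value, one '#' per occurrence."""
    counts = Counter(scores)
    rows = []
    for value in range(min(scores), max(scores) + 1):
        rows.append(f"{value:>3} | " + "#" * counts[value])
    return "\n".join(rows)


scores = [1, 2, 2, 3, 3, 3, 4, 9]
print(text_histogram(scores))
```

In the printed output the longest row (3) is the mode, the rows run from 1 to 9 (the range), and the lone score of 9 stands apart from the rest as an outlying value.
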
31
Q

Why is using a box plot useful?

A
• easy to identify median
• easy to see lower and upper hinge
• easy to see hinge spread
• easy to see adjacent values (lowest and highest values falling within the inner fences)
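
The quantities a box plot displays can be sketched numerically. This assumes Tukey's usual conventions, which the cards do not spell out: each hinge is the median of one half of the ranked data (the overall median is included in both halves when the number of scores is odd), and the inner fences sit 1.5 hinge spreads beyond the hinges. Function names and data are made up.

```python
def _median_of_ranked(ranked):
    """Median of data that is already sorted."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def box_plot_summary(scores):
    ranked = sorted(scores)
    n = len(ranked)
    half = (n + 1) // 2  # size of each half; includes the median when n is odd
    lower_hinge = _median_of_ranked(ranked[:half])
    upper_hinge = _median_of_ranked(ranked[n - half:])
    hinge_spread = upper_hinge - lower_hinge
    # Assumption: inner fences at 1.5 hinge spreads beyond each hinge
    lower_fence = lower_hinge - 1.5 * hinge_spread
    upper_fence = upper_hinge + 1.5 * hinge_spread
    inside = [s for s in ranked if lower_fence <= s <= upper_fence]
    return {
        "median": _median_of_ranked(ranked),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "hinge_spread": hinge_spread,
        "adjacent_values": (inside[0], inside[-1]),
    }


summary = box_plot_summary([1, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this data the median is 6, the hinges are 4 and 8, the hinge spread is 4, and the adjacent values are 1 and 9; the score of 30 falls outside the upper inner fence (14).
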
32
Q

What is a scatter plot?

A

A plot of paired scores on two variables, one variable on each axis; typically used in a correlational research design

33
Q

Where’s a data summary graph often found?

A

In research where you have manipulated a variable and want to see what effect it has on the dependent variable (DV); the standard deviation is plotted

34
Q

What’s the difference between a numerical and categorical data summary graph?

A
• numerical data summary graph has a line connecting the data points
• categorical usually uses bars