data analysis and descriptive tendencies Flashcards

1
Q

what is a population?

A
  • complete set of objects
  • group containing elements of anything you want to study
2
Q

what is a sample?

A
  • subset of a given population
3
Q

does the sample have to be people?

A
  • no, can be cells, products, SMS messages
4
Q

why do you take a sample?

A
  • cannot test every individual, so take a sample and use it to infer about the population (the inference introduces sampling error)
5
Q

what should the sample represent? what should be considered?

A
  • represents the population
  • careful consideration of sub-categories is required to ensure that the sample reliably represents the population
6
Q

what shouldn’t be done to a sample once it has been determined?

A
  • the sample shouldn’t be modified or subdivided after it has been determined for the sake of deriving a better conclusion (‘cherry picking’)
7
Q

what is a variable?

A
  • set of related events that can take on more than one value
8
Q

can a variable be changed? give examples

A
  • yes, a variable is something that can be changed
  • e.g., a characteristic or value like weight, exam mark, academic degree, hometown
9
Q

what is statistical inference?

A
  • involves figuring out how well a property of one variable can be predicted by that of another variable
10
Q

what is an independent variable?

A
  • value being changed or manipulated
  • controlled or selected to determine its effect on an observed outcome
11
Q

what is a dependent variable?

A
  • observed result of the IV being manipulated
  • it is something that may depend on the IV
12
Q

what does research aim to do with the variables?

A
  • attempts to find evidence that the DV depends on the IV
13
Q

what do independent variables consist of?

A
  • different categories called levels, conditions or treatments
14
Q

how is the number of levels of an independent variable different from the number of independent variables?

A
  • there can be multiple independent variables, but each participant belongs to only one level of a given independent variable
15
Q

what is a control variable?

A
  • a variable kept constant to prevent it from influencing the effect of the IV on the DV
16
Q

what are control variables critical for?

A
  • critical for study design e.g., recruitment criteria for participants
17
Q

what are the different types of data?

A
  • categorical
  • ordered
  • continuous
  • measured
18
Q

what are nominal and ordinal variables?

A
  • qualitative and categorical
19
Q

what are interval and ratio variables?

A
  • quantitative and continuous
20
Q

what is nominal data?

A
  • categorical
  • values cannot be ordered or treated as quantities (only the frequency of each category can be counted)
21
Q

what are examples of nominal data?

A
  • gender
  • country
  • occupation
  • blood type
22
Q

what is ordinal data?

A
  • can be ordered but cannot be added or subtracted
23
Q

what are examples of ordinal data?

A
  • satisfaction rating
  • education level
  • spice level
24
Q

what is interval data?

A
  • can be ordered
  • difference can be measured but cannot compute a ratio between two values
  • no meaningful zero exists
25
Q

what are examples of interval data?

A
  • exam mark
  • date
  • year
26
Q

what is ratio data?

A
  • interval data for which a ratio between two values can also be computed
  • has a meaningful zero
27
Q

what are examples of ratio data?

A
  • distance
  • height
  • annual income
  • number of successes
28
Q

how do you distinguish between interval and ratio?

A
  • can a value be meaningfully doubled? yes = ratio; no = interval
29
Q

what are the four main descriptive tendencies?

A
  • central tendency
  • spread
  • shape
  • outliers
30
Q

what are the three central tendencies?

A
  • mode
  • median
  • mean
31
Q

what is the mode and what variable/data is it used for?

A
  • the most frequent value (the value with the highest frequency)
  • can be used for all types of variables
  • often used for nominal and ordinal variables
32
Q

what is the median and what variable/data is it used for?

A
  • middle value
  • cannot be obtained for nominal variables
  • obtained only on ordered variables e.g., ordinal, interval, ratio
33
Q

what is the mean and what variable/data is it used for?

A
  • average
  • the point where the distances to the data points (1st moment) are balanced
  • only defined for interval and ratio variables
34
Q

which two of the central tendencies are normally similar?

A
  • mean and median are similar
35
Q

how does an outlier affect central tendencies?

A
  • hugely affects the mean value but barely affects the median
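
To make the three central tendencies and the effect of an outlier concrete, here is a minimal sketch using Python's standard `statistics` module. The exam marks and the extreme value of 250 are made-up illustration data, not values from the cards.

```python
import statistics

# Hypothetical exam marks (illustration only).
scores = [52, 55, 55, 58, 60, 61, 63]
# The same data plus one extreme value.
with_outlier = scores + [250]

print("mode:", statistics.mode(scores))      # most frequent value
print("median:", statistics.median(scores))  # middle value of the ordered data
print("mean:", statistics.mean(scores))      # arithmetic average

# How far each measure moves when the outlier is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(scores)
median_shift = statistics.median(with_outlier) - statistics.median(scores)
print("mean shift:", mean_shift)
print("median shift:", median_shift)
```

In this made-up data set the mean moves by more than 20 marks, while the median moves by only 1.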
36
Q

what are the three types of measure used for spread?

A
  • quantile/quartile/percentile
  • variance and standard deviation
  • z-score
37
Q

how do you find the quantiles, quartiles and percentiles?

A
  • divide the data into sections containing the same number of data points and report where the boundaries between sections are located
38
Q

what is a quantile? where do we plot this data?

A
  • a cut point when the sample is divided into equal-sized subgroups
  • N sections are separated by N-1 values
  • plotted onto a scatterplot
39
Q

what is a quartile? what is the median?

A
  • 1st to 3rd
  • when there are four sections in total
  • median = 2nd quartile
40
Q

what are percentiles? what is the median?

A
  • 1st to 99th
  • when there are 100 sections
  • median = 50th percentile
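
The "N sections give N-1 values" idea can be checked with a short sketch based on `statistics.quantiles` from the Python standard library. The data are made up, and the function's default interpolation method is only one of several conventions, so other software may report slightly different cut points.

```python
import statistics

# Hypothetical sample (illustration only).
data = [2, 4, 4, 5, 7, 9, 10, 12]

# 4 sections -> 3 cut points (1st, 2nd and 3rd quartile).
quartiles = statistics.quantiles(data, n=4)
# 100 sections -> 99 cut points (1st to 99th percentile).
percentiles = statistics.quantiles(data, n=100)

median = statistics.median(data)
print("quartiles:", quartiles)
print("median:", median)
print("number of percentile cut points:", len(percentiles))
```

For this sample the 2nd quartile and the 50th percentile both coincide with the median.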
41
Q

how do you calculate the 2nd moment?

A
  • variance = sum of (distance from mean)^2 over all data points / number of data points
42
Q

what is the square root of variance called and what is it?

A
  • called standard deviation
  • standard distance from mean
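
As a worked example of the 2nd moment and its square root, the sketch below computes both by hand on a made-up data set. It divides by the number of data points, as in the card, which is the population variance. Many textbooks divide by N-1 for a sample instead, which is what `statistics.variance` does.

```python
import math
import statistics

# Hypothetical data chosen so the numbers come out round.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# 2nd moment: average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", sd)
# Cross-check against the standard library's population variance.
print("pvariance:", statistics.pvariance(data))
```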
43
Q

what does mean +/- SD provide information on?

A
  • where the centre is
  • how spread the data points are
44
Q

given SD, how can distance be described? what is this called and what does it enable?

A
  • distance can be described as a ratio with respect to SD
  • known as z-score
  • enables fair comparison of deviations
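
A small sketch of why the z-score enables a fair comparison. The helper `z_score` and the two classes' means and SDs are hypothetical, chosen only for illustration.

```python
def z_score(value, mean, sd):
    """Distance from the mean expressed in units of SD."""
    return (value - mean) / sd

# Hypothetical scenario: two exams marked on different scales.
z_class_a = z_score(70, mean=60, sd=5)    # mark of 70 in class A
z_class_b = z_score(80, mean=75, sd=10)   # mark of 80 in class B

print("class A z-score:", z_class_a)
print("class B z-score:", z_class_b)
```

Although 80 is the higher raw mark, the mark of 70 sits further above its own class mean once both are expressed in SD units.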
45
Q

what are the two main measures of shape?

A
  • skewness
  • kurtosis
46
Q

what does skewness measure and correspond to?

A
  • measures degree of asymmetry
  • corresponds to 3rd moment
47
Q

how do you calculate the 3rd moment? what do you divide it by and why?

A
  • 3rd moment = sum of (distance from mean)^3 over all data points / number of data points
  • divide by SD^3 to make it dimensionless
48
Q

what does zero skewness mean?

A
  • data are symmetrically distributed
49
Q

what does high skewness mean?

A
  • distribution is highly asymmetrical
50
Q

what does positive/negative skewness mean?

A
  • indicates which direction data are skewed (positive = longer tail on the right; negative = longer tail on the left)
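
The skewness cards above can be sketched in a few lines of plain Python. The function name `skewness` and the three data sets are made up for illustration. The calculation divides by the number of data points, so it may differ slightly from bias-corrected values reported by some statistics packages.

```python
def skewness(values):
    """3rd moment about the mean divided by SD^3 (dimensionless)."""
    n = len(values)
    mean = sum(values) / n
    second = sum((x - mean) ** 2 for x in values) / n  # 2nd moment (variance)
    third = sum((x - mean) ** 3 for x in values) / n   # 3rd moment
    sd = second ** 0.5
    return third / sd ** 3

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]        # one large value stretches the right tail
left_tail = [-x for x in right_tail]    # mirror image

print("symmetric:", skewness(symmetric))
print("right tail:", skewness(right_tail))
print("left tail:", skewness(left_tail))
```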
51
Q

what does kurtosis measure? what does it correspond to?

A
  • measures the sharpness/thinness
  • corresponds to the 4th moment
52
Q

how do you work out the 4th moment? what do you divide it by and why?

A
  • 4th moment = sum of (distance from mean)^4 over all data points / number of data points
  • divide by SD^4 to make it dimensionless
53
Q

what is kurtosis always by definition? what do we subtract?

A
  • always positive
  • subtract 3 (the kurtosis of a normal distribution)
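
A minimal sketch of the kurtosis calculation described above, including the subtraction of 3. The function name `excess_kurtosis` and both data sets are hypothetical. The result after subtraction can be negative even though the raw ratio is always positive.

```python
def excess_kurtosis(values):
    """4th moment about the mean divided by SD^4, minus 3."""
    n = len(values)
    mean = sum(values) / n
    second = sum((x - mean) ** 2 for x in values) / n  # 2nd moment (variance)
    fourth = sum((x - mean) ** 4 for x in values) / n  # 4th moment
    # SD^4 is the variance squared.
    return fourth / second ** 2 - 3

# Hypothetical data sets.
two_point = [-1, 1]                          # all mass at two values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # sharp centre with two extreme values

print("two-point:", excess_kurtosis(two_point))
print("peaked:", excess_kurtosis(peaked))
```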
54
Q

what are outliers?

A
  • extreme values relative to bulk of values in a data set
55
Q

what are outliers due to?

A
  • inaccuracies in data processing
  • problems with methodology e.g., measures, instruments, participants not following instructions
  • actual extreme value from an unusual participant
56
Q

what are the two ways you can detect outliers?

A
  • based on z-score
  • based on inter-quartile range
57
Q

how does z-score detect outliers?

A
  • outlier if z-score is more than 3 or less than -3
  • i.e., when the distance from the mean is more than 3x the SD
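
The z-score rule can be sketched as below, flagging any value whose distance from the mean exceeds 3 SDs. The function name `z_score_outliers`, the default threshold argument and the reaction-time data are illustrative assumptions. The sketch also assumes the data are not all identical, since that would make the SD zero.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return the values whose |z-score| exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (divides by N)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical data: 20 typical values plus one extreme value.
reaction_times = [48, 49, 50, 51, 52] * 4 + [200]
outliers = z_score_outliers(reaction_times)
print("outliers:", outliers)
```

In a very small sample no value can reach a z-score of 3, because the extreme value itself inflates the SD, so this rule needs a reasonable number of data points.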
58
Q

how does inter-quartile range detect outliers?

A
  • IQR is the width between the 1st and 3rd quartiles
  • outlier if a value is more than 1.5 IQR above the 3rd quartile or more than 1.5 IQR below the 1st quartile
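
A hedged sketch of the inter-quartile range rule, using the conventional fences at 1.5 IQR below the 1st quartile and 1.5 IQR above the 3rd quartile. The function name `iqr_outliers`, the parameter `k` and the data are made up for illustration. Quartiles come from `statistics.quantiles` with its default method, so other tools may place the fences slightly differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the 1st/3rd quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1                 # width between 1st and 3rd quartile
    lower = q1 - k * iqr          # lower fence
    upper = q3 + k * iqr          # upper fence
    return [v for v in values if v < lower or v > upper]

# Hypothetical data with one extreme value.
data = [10, 12, 13, 14, 15, 16, 18, 40]
print("outliers:", iqr_outliers(data))
```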
59
Q

in what kind of samples do outliers distort the data?

A
  • in small samples
60
Q

describe a histogram; what does height represent?

A
  • visualises how data is distributed
  • height represents frequency (how often a value appears in data)
61
Q

describe a box plot

A
  • plot summarising quartile-based stats of a data set, including:
    • location of quartiles
    • range of data excluding outliers
    • outliers detected by quartiles