WEEK 1 Flashcards

(38 cards)

1
Q

What are the 2 types of data?

A

Categorical and Scalar

2
Q

Categorical data has 2 types. What are they?

A

Nominal and Ordinal

3
Q

Scalar data has 2 types. What are they? What does scalar data work well with?

A

Continuous and Discrete

It works well with the median, range and interquartile range (IQR).

It doesn’t work well with the mode unless you group the data, because for continuous data the frequency table would have too many different values.

4
Q

What is Nominal?

A

Data that has no numerical value and can only be placed in a suitable category, such as gender or the answer to a yes/no question. The values are labels, such as College or Breakfast in the example.

5
Q

What is Ordinal?

A

Ordinal data can be arranged in a meaningful order, such as confidence with numbers (agree, disagree, etc.). The data includes the idea of order, but because it is categorical, a bar chart is generally the best chart to use.

We cannot measure the distances between the categories (e.g., between "disagree" and "strongly disagree"), so we cannot assume they are equal; with scalar data such as weight, we can measure the distance.

(Categorical variables with only 2 categories are called dichotomous.)

6
Q

What is continuous data?

A

Measured on a scale such as temperature or weight

7
Q

What is discrete data?

A

Data that takes only whole-number values, usually obtained by counting, e.g., the number of defective items

8
Q

What is the mode? How do you find it in a bar chart?

A

The most frequent score in our data set; in a bar chart, the tallest bar is the mode.
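A minimal Python sketch of finding the mode by counting how often each value appears (a frequency table), assuming the data is a plain list; the function name `modes` and the example answers are hypothetical. It returns every value tied for the highest count, i.e. the tallest bar or bars in a bar chart.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often (there can be ties)."""
    counts = Counter(data)        # frequency table: value -> count
    if not counts:
        raise ValueError("mode of empty data")
    top = max(counts.values())    # height of the tallest bar
    return [value for value, count in counts.items() if count == top]


answers = ["Yes", "No", "Yes", "Yes", "No"]  # hypothetical yes/no answers
print(modes(answers))  # ['Yes']
```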

9
Q

What is spread? How is it shown in a bar chart?

A

For categorical data, spread is how many different categories we have; in a bar chart, this is the number of different bars.

10
Q

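How do you calculate the median?

A

First order the data. The median is the value at position (n + 1) / 2; for example, with 159 observations, (159 + 1) / 2 = 80, so the median is the 80th value.

This gives a whole-number position only when n is odd. If n is even, the position falls halfway between two observations (e.g., the 10.5th); with discrete or continuous data, add those 2 middle observations and divide by 2.

If the data is categorical, you need to be lucky to find it (e.g., both middle observations fall in the same category), because categories cannot be added and divided.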
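A minimal Python sketch of the (n + 1) / 2 rule for discrete or continuous data, assuming the values are numbers; `median` is a hypothetical helper name. Odd n gives a single middle value, even n averages the two middle observations.

```python
def median(data):
    """Median via the (n + 1) / 2 position rule."""
    values = sorted(data)              # step 1: order the data
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        position = (n + 1) // 2        # odd n: a whole-number position
        return values[position - 1]    # convert the 1-based position to a 0-based index
    # even n: (n + 1) / 2 is halfway between positions n/2 and n/2 + 1
    lower = n // 2
    return (values[lower - 1] + values[lower]) / 2


print(median([7, 1, 5]))     # 5
print(median([4, 1, 3, 2]))  # 2.5
```

Python's built-in `statistics.median` follows the same odd/even rule.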

11
Q

What is scalar data?

A

It includes height, weight and the guessing variable; it adds the idea of distance, not just order.

12
Q

What is the Interquartile range?

A

It describes the middle 50% of values when they are ordered from lowest to highest.

13
Q

How do you find the IQR?

A

Find the median (middle value) of the lower half and of the upper half of the data; these are the lower quartile (LQ) and upper quartile (UQ), and IQR = UQ - LQ.

14
Q

What is Range?

A

The highest value (Maximum) - the lowest value (Minimum)

15
Q

IQR CALCULATION

A

The lower quartile (LQ) is the (n + 1)/4 th value and the upper quartile (UQ) is the 3(n + 1)/4 th value (the same formula multiplied by 3); then IQR = UQ - LQ.
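A minimal Python sketch of the position rule, assuming that when (n + 1)/4 or 3(n + 1)/4 is not a whole number you interpolate between the two neighbouring values; the cards do not say how to handle that case, so treat this as one common convention. The helper names and example data are hypothetical.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based position; interpolates if the position is fractional (assumed convention)."""
    n = len(sorted_values)
    position = min(max(position, 1), n)   # keep the position inside the data
    whole = int(position)                 # the position just below (or at) the target
    fraction = position - whole
    if fraction == 0:
        return sorted_values[whole - 1]
    below, above = sorted_values[whole - 1], sorted_values[whole]
    return below + fraction * (above - below)


def quartiles(data):
    """Return (LQ, UQ) using the (n + 1)/4 and 3(n + 1)/4 positions."""
    values = sorted(data)                 # order the data first
    if not values:
        raise ValueError("quartiles of empty data")
    n = len(values)
    lq = value_at_position(values, (n + 1) / 4)
    uq = value_at_position(values, 3 * (n + 1) / 4)
    return lq, uq


def iqr(data):
    lq, uq = quartiles(data)
    return uq - lq


data = [3, 5, 7, 8, 9, 11, 13, 15, 16, 18, 20]  # hypothetical, n = 11
print(quartiles(data))  # (7, 16): the 3rd and 9th values
print(iqr(data))        # 9
```

With n = 11, (n + 1)/4 = 3 and 3(n + 1)/4 = 9, so the LQ and UQ are simply the 3rd and 9th ordered values.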

16
Q

What is the five-number summary?

A

A set of descriptive statistics that summarises a dataset; it is what a box plot displays (if the parts of the box plot are all equal distances apart, the data is not skewed).

1) the sample minimum (smallest observation)
2) the lower quartile or first quartile: the (n + 1)/4 th value
3) the median (the middle value): the (n + 1)/2 th value
4) the upper quartile or third quartile: the 3(n + 1)/4 th value
5) the sample maximum (largest observation)
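A minimal Python sketch, assuming at least two numeric values; `five_number_summary` and the data are hypothetical. It uses the standard library's `statistics.quantiles`, whose default `'exclusive'` method places the quartiles at the (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 positions and interpolates between neighbours when a position is fractional.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, LQ, median, UQ, maximum) for at least 2 values."""
    values = sorted(data)
    # Default 'exclusive' method: quartiles at the (n + 1)/4, 2(n + 1)/4
    # and 3(n + 1)/4 positions, interpolating if needed.
    lq, med, uq = statistics.quantiles(values, n=4)
    return values[0], lq, med, uq, values[-1]


data = [3, 5, 7, 8, 9, 11, 13, 15, 16, 18, 20]  # hypothetical, already ordered
summary = five_number_summary(data)
print(summary)                   # (3, 7.0, 11.0, 16.0, 20)
print(summary[-1] - summary[0])  # range = maximum - minimum = 17
```

The range (maximum minus minimum) can be read straight off the summary.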

17
Q

AVERAGE FOR SCALAR DATA: the mean of the data. How do you calculate it?

A

Add all the values and divide by the total number of observations to find the mean.
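A minimal Python sketch of the calculation; `mean` is a hypothetical helper name (the standard library's `statistics.mean` does the same job).

```python
def mean(data):
    """Add all the values and divide by the number of observations."""
    values = list(data)
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


print(mean([1, 2, 3, 4]))  # 2.5
```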

18
Q

THE IDEA OF DISTANCE: what is it?

A

Standard deviation measures the distance of the data from the mean. It does not matter whether an observation is above or below the mean, because the differences are squared (only the distance matters).

The variance is the standard deviation squared; variance is sometimes more useful than the standard deviation.

19
Q

STANDARD DEVIATION CALCULATION

A

First calculate the mean (μ, "mu") by adding all the values and dividing by N (the number of observations).

x represents each individual height: take the first individual height and subtract the mean, then square the result. Then go to the second individual, subtract the mean from their height and square it, and so on.

Add up all the squared differences (1:40 min in the first lecture).

Divide by N at the end: this gives the variance (the part inside the square root).

Take the square root to get the standard deviation, a measure of spread (σ, the Greek letter sigma).
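The steps above written as formulas, dividing by N as these notes do, where x_i is the i-th observation, μ the mean, σ² the variance and σ the standard deviation:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
```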

20
Q

What is a trimmed mean?

A

When there are some very high or very low values (outliers), e.g., in heights, the trimmed mean cuts off the lowest 5% and the highest 5% of the data before calculating the mean, giving a more accurate mean.
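A minimal Python sketch of a 5% trimmed mean, assuming the number of values cut from each end is rounded down when 5% of n is not a whole number (the notes do not say how to round). The function name and the height data are hypothetical.

```python
def trimmed_mean(data, proportion=0.05):
    """Mean after cutting `proportion` of the values from EACH end of the ordered data."""
    values = sorted(data)
    n = len(values)
    k = int(n * proportion)    # values dropped at each end (rounded down, an assumption)
    kept = values[k:n - k]
    if not kept:
        raise ValueError("nothing left after trimming")
    return sum(kept) / len(kept)


heights = list(range(161, 180)) + [250]  # hypothetical: 19 heights plus one extreme value
print(sum(heights) / len(heights))       # ordinary mean: 174.0
print(trimmed_mean(heights))             # 5% trimmed mean: 170.5
```

With 20 values, 5% is exactly 1 value, so the smallest (161) and the extreme 250 are both removed before averaging.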

21
Q

How do you calculate the standard deviation, step by step?

A

1) Find the mean (μ, "mu") by adding all the numbers and dividing by how many there are.

2) Subtract the mean from each of the given numbers.

3) Square each result (the "²" on the bracket).

4) The Σ part: add all the squared results and divide by n (the number of data points).

5) Take the square root to get the standard deviation.
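A minimal Python sketch following the five steps, dividing by n as in these notes; `population_sd` and the data are hypothetical. The standard library's `statistics.pstdev` computes the same quantity.

```python
import math


def population_sd(data):
    """Standard deviation dividing by n, following the five steps."""
    values = list(data)
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation of empty data")
    mu = sum(values) / n                       # step 1: the mean
    squared = [(x - mu) ** 2 for x in values]  # steps 2-3: subtract the mean, square
    variance = sum(squared) / n                # step 4: add up, divide by n
    return math.sqrt(variance)                 # step 5: square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(population_sd(data))       # 2.0
```

For this data the mean is 5, the squared differences add up to 32, and 32 / 8 = 4 is the variance.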

22
Q

What is the best measure of spread?

A

Variance (the value before taking the square root to get the standard deviation)

23
Q

Standardisation

A

z = (x - mean) / standard deviation

x is the value we want to standardise.

Standardising puts values measured in different units onto the same scale, so they can be compared.
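A minimal Python sketch of the standardisation formula; `z_score` and the example numbers are hypothetical. The results have no units, so a test mark and a height can be compared directly.

```python
def z_score(x, mean, sd):
    """Standardise x: how many standard deviations it lies from the mean."""
    if sd == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / sd


# Hypothetical: a test mark and a height, measured in different units.
print(z_score(70, mean=60, sd=5))     # 2.0 (marks)
print(z_score(180, mean=170, sd=10))  # 1.0 (cm)
```

Here the test mark is 2 standard deviations above its mean while the height is only 1, so the mark is relatively more unusual.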

24
Q

When comparing the spread of 2 or more distributions, what should we do?

A

Compare the coefficient of variation for each, as it takes into account differences in the means.

25
Q

What is the coefficient of variation?

A

CV = standard deviation (σ) divided by the mean. If the dispersion around the mean is large, there is more uncertainty and the data is less accurate. The CV can be positive or negative, because it takes the sign of the mean.

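A minimal Python sketch of the coefficient of variation, assuming the population standard deviation (dividing by n, as in these notes) from the standard library's `statistics.pstdev`; the function name and data are hypothetical. Multiplying every value by 10 multiplies the standard deviation by 10 but leaves the CV unchanged, which is why the CV is a relative measure.

```python
import statistics


def coefficient_of_variation(data):
    """CV = standard deviation / mean (population SD, dividing by n)."""
    mu = statistics.mean(data)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is 0")
    sigma = statistics.pstdev(data)
    return sigma / mu


a = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, SD 2
b = [10 * x for x in a]       # same shape, ten times larger: mean 50, SD 20
print(coefficient_of_variation(a), coefficient_of_variation(b))  # 0.4 0.4
```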
26
Q

What are 2 relative measures?

A

The coefficient of variation and standardising data (z-scores)

27
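Q

What kind of measures are the other statistical measures (e.g., range, IQR, standard deviation)?

A

All of them are absolute measures: they are expressed in the units of the data.
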
28
Q

The IQR uses the middle 50% of the data and is...?

A

...less influenced by extreme values.

29
Q

WHAT ARE THE 3 MEASURES OF AVERAGE (CENTRALITY)?

A

Mean, median and mode

30
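Q

How do you find the median from a frequency table (e.g., sales data)?

A

n = the number of data points = all the frequencies added together. Calculate (n + 1) / 2 to get the median position (e.g., the 24th value). Then count through the frequencies to find which value is in that position: writing the ordered data as 0 x 5, 1 x 16, 2 x 12, etc. (value 0 appears 5 times, value 1 appears 16 times, ...) helps you find the 24th data point.
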
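A minimal Python sketch of counting through a frequency table, assuming the table is a list of (value, frequency) pairs already sorted by value; the function name is hypothetical. The first three rows of the example follow the card; the last two rows are made up so that n = 47 and the median is the 24th value.

```python
def median_from_frequency_table(table):
    """table: (value, frequency) pairs sorted by value."""
    n = sum(freq for _, freq in table)  # n = all the frequencies added together
    if n == 0:
        raise ValueError("empty frequency table")

    def value_at(position):
        """Value at a 1-based position, found by counting through the frequencies."""
        running = 0
        for value, freq in table:
            running += freq
            if position <= running:
                return value
        raise IndexError(position)

    if n % 2 == 1:
        return value_at((n + 1) // 2)                       # a single middle position
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2    # average the two middle values


# Hypothetical sales table (value, frequency); rows 4 and 5 are invented.
sales = [(0, 5), (1, 16), (2, 12), (3, 8), (4, 6)]
print(median_from_frequency_table(sales))  # 2
```

The running totals are 5, 21, 33, ..., so positions 22 to 33 all hold the value 2, including the 24th.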
31
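Q

Five-number summary TIPS

A

LQ: find the data point at the (n + 1)/4 position; its value is the answer. UQ: find the data point at the 3(n + 1)/4 position; its value is the answer. DON'T SUBTRACT THEM (subtracting gives the IQR, which is not part of the five-number summary).
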
32
Q

Interquartile range (IQR): how do you calculate it?

A

Order the data from smallest to largest. Use (n + 1)/4 to find the position of the LQ, and the same formula multiplied by 3, 3(n + 1)/4, to find the position of the UQ. Then find the value at each position (e.g., the 9th and the 3rd values) and subtract: IQR = UQ value - LQ value.

33
Q

If all data points decrease by 7, the IQR decreases by 7. True or false?

A

FALSE, because the IQR is a measure of spread, not centrality, so it doesn't change when the whole dataset moves.

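A minimal Python sketch that checks the card's answer on hypothetical data, using the standard library's `statistics.quantiles` (its default `'exclusive'` method uses the (n + 1)/4 and 3(n + 1)/4 positions). Subtracting 7 from every value leaves the IQR unchanged, while the median, a measure of centrality, moves down by 7.

```python
import statistics


def iqr(data):
    """IQR = UQ - LQ with statistics.quantiles' default 'exclusive' method."""
    lq, _, uq = statistics.quantiles(data, n=4)
    return uq - lq


data = [3, 5, 7, 8, 9, 11, 13, 15, 16, 18, 20]  # hypothetical values
shifted = [x - 7 for x in data]                 # every data point decreases by 7

print(iqr(data), iqr(shifted))                              # 9.0 9.0 (spread unchanged)
print(statistics.median(data), statistics.median(shifted))  # 11 4 (centre moves by 7)
```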
34
Q

Why is the IQR not affected by outliers?

A

Because it measures UQ - LQ, which uses only the middle 50% of the data, so extreme values at either end do not affect it.

35
Q

What are outliers?

A

They are points that are far away from the other data points.

36
Q

What is a box plot?

A

It is basically a plot of the five-number summary, and it shows skew in the data. If all sides are equal, there is no skew, as the data is balanced. With skew, the sides are not equal or the median is not in the middle of the data, and the distribution is more concentrated on the left or right side.

37
Q

What is the MEAN of 1, 2, 3, 4?

A

Sum all the values, then divide by 4 (the number of data points): (1 + 2 + 3 + 4) / 4 = 10 / 4 = 2.5

38
Q

How do you find the mean from a frequency table (values x with their frequencies)?

A

Multiply each x value by its frequency, add all these products, then divide by n, the number of data points (n = all the frequencies added together).

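A minimal Python sketch of the frequency-table mean, assuming the table is a list of (x, frequency) pairs; the function name and the table are hypothetical.

```python
def mean_from_frequency_table(table):
    """table: (x, frequency) pairs. Mean = sum(x * f) / n, where n = sum of frequencies."""
    n = sum(freq for _, freq in table)
    if n == 0:
        raise ValueError("empty frequency table")
    total = sum(x * freq for x, freq in table)
    return total / n


table = [(1, 2), (2, 3), (3, 5)]  # hypothetical: 1 appears twice, 2 three times, 3 five times
print(mean_from_frequency_table(table))  # 23 / 10 = 2.3
```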