WEEK 1 Flashcards by Kristine Zakoyan

2 types of Data

Categorical and Scalar

Categorical data has 2 types, which are?

Nominal and Ordinal

Scalar data has 2 types, which are? What does it work well with?

Continuous and Discrete

It works well with the median, range and interquartile range (IQR)

It doesn't work well with the mode unless you group the data, because otherwise the frequency table would have too many different values for continuous data

What is Nominal?

Data that does not have a numerical value and can only be placed in a suitable category, such as gender or answers to yes/no questions. The categories are just labels, such as College or Breakfast in the example

What is Ordinal?

Ordinal is categorical data that can be arranged in a meaningful order, such as confidence with numbers (agree, disagree, etc.). The data includes the idea of order, and because it is categorical the bar chart is generally the best chart to use.

The order is known but the distance between categories is not: we cannot measure the gap between "disagree" and "strongly disagree", whereas for a variable like weight we can measure the distance

(Categorical variables with 2 different categories are called dichotomous)

What is Continuous?

Measured on a scale such as temperature or weight

What is Discrete?

Data that takes on whole-number values, usually obtained by counting, e.g. the number of defective items

What is the mode? In a bar chart too

The most frequent value in our data set. In a bar chart, the tallest bar is the mode
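
A minimal Python sketch of finding the mode by counting. The function name `find_modes` and the answers are made up for illustration and are not from the lecture. Each count is the height of one bar in a bar chart, so the mode is the label with the largest count.

```python
from collections import Counter

def find_modes(values):
    """Return the most frequent value(s) and the count of every value."""
    counts = Counter(values)              # each count is one bar's height
    tallest = max(counts.values())        # height of the tallest bar
    modes = [v for v, c in counts.items() if c == tallest]
    return modes, counts

# Made-up answers, for illustration only
answers = ["College", "Breakfast", "College", "Lunch", "College", "Breakfast"]
modes, counts = find_modes(answers)
print(modes)                 # ['College']
print(counts["Breakfast"])   # 2
```

If two bars tie for tallest, the list contains both values, so the data has more than one mode.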

What is spread? In a bar chart too

How many different categories we have, i.e. the number of bars in the bar chart

How is the median calculated?

First order the data. Then take (n + 1) divided by 2 to find the median position, e.g. (159 + 1) / 2 = 80, so the median is the 80th ordered value.

This gives an exact position only if there is an odd number of data points. If there is an even number, the position falls between 2 observations; with discrete or continuous data, add those 2 observations and divide by 2.

If the data is categorical, you need to be lucky to find it, because 2 different middle categories cannot be averaged
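
The position rule can be sketched in Python as follows. This is an illustrative sketch for numeric (discrete or continuous) data; the function name `median_by_position` and the sample values are assumptions, not part of the lecture.

```python
import math

def median_by_position(values):
    """Median of numeric data using the (n + 1) / 2 position rule."""
    ordered = sorted(values)                  # always order the data first
    n = len(ordered)
    position = (n + 1) / 2                    # 1-based position of the median
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    # Odd n: lower and upper are the same observation.
    # Even n: they are the 2 middle observations, which get averaged.
    return (lower + upper) / 2

print(median_by_position(range(1, 160)))   # 159 points, 80th value -> 80.0
print(median_by_position([4, 1, 3, 2]))    # position 2.5 -> (2 + 3) / 2 = 2.5
```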

What is Scalar?

Examples are height, weight and the guessing variable. Scalar data adds the idea of distance, not just order.

What is the Interquartile range?

Describes the middle 50% of values when ordered from lowest to highest

How to find IQR?

Find the median (middle value) of the lower half and of the upper half of the data. These are the LQ and the UQ, and IQR = UQ - LQ

What is Range?

The highest value (Maximum) - the lowest value (Minimum)

IQR CALCULATION

The LOWER QUARTILE (LQ) is the (n + 1) / 4 th ordered data point. The UPPER QUARTILE (UQ) is the same but times by 3: the 3 x (n + 1) / 4 th ordered data point. Then IQR = UQ - LQ
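
A minimal Python sketch of this calculation. The card does not say what to do when a position is not a whole number, so this sketch assumes linear interpolation between the two neighbouring data points. The function names and the data are made up for illustration.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data.

    Assumption: a fractional position is interpolated linearly
    between its two neighbouring data points.
    """
    below = int(position)                 # whole part of the position
    fraction = position - below
    if fraction == 0:
        return ordered[below - 1]
    return ordered[below - 1] + fraction * (ordered[below] - ordered[below - 1])

def quartiles_and_iqr(values):
    """Return (LQ, UQ, IQR) using the (n + 1) / 4 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 data points for this rule")
    lq = value_at_position(ordered, (n + 1) / 4)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    return lq, uq, uq - lq

# 11 made-up data points: LQ is the 3rd value, UQ is the 9th value
data = [9, 2, 15, 4, 7, 4, 12, 5, 10, 6, 8]
print(quartiles_and_iqr(data))   # (4, 10, 6)
```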

What is five number summary?

It's a set of descriptive statistics that provides information about a dataset, e.g. in a box plot, where the parts should all be an equal distance apart if there is no skew

1) the sample minimum (smallest observation)
2) the lower quartile or first quartile: the (n + 1) / 4 th data point, where n is the total number of data points
3) the median (the middle value): the (n + 1) / 2 th data point
4) the upper quartile or third quartile: the 3 x (n + 1) / 4 th data point
5) the sample maximum (largest observation)
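
The five numbers can be sketched with Python's standard library. `statistics.quantiles` with `method="exclusive"` places the quartiles at the (n + 1) / 4, (n + 1) / 2 and 3 x (n + 1) / 4 positions, interpolating when a position is not a whole number. The function name `five_number_summary` and the data are made up for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, LQ, median, UQ, maximum)."""
    # n=4 cuts the ordered data into quarters, giving the 3 cut points
    lq, median, uq = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), lq, median, uq, max(values)

# 11 made-up data points
data = [9, 2, 15, 4, 7, 4, 12, 5, 10, 6, 8]
print(five_number_summary(data))   # (2, 4.0, 7.0, 10.0, 15)
```

Note that the summary reports the LQ and UQ values themselves; they are only subtracted when the IQR is wanted.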

AVERAGE FOR SCALAR DATA: the mean of the data. How is it calculated?

Add all values and divide by the total number of observations to find the mean

THE IDEA OF DISTANCE: what is it?

Standard deviation measures the distance of the data from the mean. It does not matter whether an observation is above or below the mean, because the differences are squared (only the distance matters)

The variance is the standard deviation squared. Variance is sometimes more useful than the standard deviation

STANDARD DEVIATION CALCULATION

First calculate the mean (mu) by adding all the values and dividing by n (the number of observations)

x minus mu: x reflects each individual height. We pick the first individual height and subtract the average (the mean)

Then we square the result. Next we go to the second individual: we subtract the mean from the height and square it

Calculate the squared differences for every individual and add them up (1:40 min in the first lecture)

We divide by N at the end, which gives us the variance (the inner part of the formula)

Taking the square root gives us the spread: the standard deviation (sigma, a Greek letter)
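
The steps can be sketched in Python as follows. This divides by N, as on the card, so it is the population standard deviation. The function name `population_std` and the values are made up for illustration.

```python
import math

def population_std(values):
    """Return (mean, variance, standard deviation), dividing by N."""
    n = len(values)
    mean = sum(values) / n                           # mu
    squared_diffs = [(x - mean) ** 2 for x in values]  # (x - mu) squared
    variance = sum(squared_diffs) / n                # add up, divide by N
    return mean, variance, math.sqrt(variance)       # sigma = square root

# Made-up values, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(values))   # (5.0, 4.0, 2.0)
```

Here the squared differences are 9, 1, 1, 1, 0, 0, 4 and 16, which add up to 32; dividing by 8 gives a variance of 4, and the square root gives a standard deviation of 2.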

What is trimmed mean?

There might be very high or very low heights (outliers), so the trimmed mean cuts the lowest 5% of the data and the highest 5% of the data, if there are any, before averaging, to give a more accurate mean
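
A minimal Python sketch of a 5% trimmed mean. It assumes the number of values cut from each end is rounded down to a whole number; the function name `trimmed_mean` and the data are made up for illustration.

```python
def trimmed_mean(values, percent=5):
    """Mean after cutting `percent`% of the values from each end."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100      # values to cut from each end
    kept = ordered[k:len(ordered) - k]     # drop the k lowest and k highest
    return sum(kept) / len(kept)

# 20 made-up values with one extreme high value
data = list(range(1, 20)) + [1000]
print(sum(data) / len(data))   # ordinary mean: 59.5
print(trimmed_mean(data))      # trimmed mean: 10.5
```

With 20 values, 5% is 1 value from each end, so the 1 and the 1000 are cut and the remaining 18 values are averaged.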

Standard deviation?

1 Find the mean (the mu symbol) by adding all the numbers and dividing by how many there are

2 Subtract the mean from each of the numbers given

3 That gives the inside of the bracket, which we then square (the "2" power)

4 The sigma (sum) part: add them all up and divide by n (how many data points)

5 Then take the square root to get the standard deviation

What is the best measure of spread?

Variance (the value we get before taking the square root to find the standard deviation)

Standardisation

z = (x - mean) divided by the standard deviation

x is the number we want to standardise

It puts values on the same scale when they are measured in different units
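
A short Python sketch of standardising two values measured in different units. The function name `standardise` and all the numbers (a height in cm and a weight in kg, with their means and standard deviations) are made up for illustration.

```python
def standardise(x, mean, std):
    """z = (x - mean) / standard deviation."""
    return (x - mean) / std

# Made-up example: a height in cm and a weight in kg
z_height = standardise(180, mean=170, std=10)
z_weight = standardise(60, mean=70, std=5)
print(z_height)   # 1.0  -> 1 standard deviation above the mean
print(z_weight)   # -2.0 -> 2 standard deviations below the mean
```

The two z values have no units, so they can be compared directly even though one started in cm and the other in kg.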

When comparing spread of 2 or more distributions we should?

compare the coefficient of variation for each, as this takes into account differences in the means

coefficient of variations

CV = standard deviation (sigma) divided by the mean. If the dispersion around the mean is large, there is more uncertainty and lower accuracy in the data. The CV can be positive or negative
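
A minimal Python sketch comparing the spread of two distributions with the CV. It uses the population standard deviation (`statistics.pstdev`, which divides by N); the function name and both data sets are made up for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """CV = population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

# Two made-up data sets: the second is the first multiplied by 10
small = [2, 4, 4, 4, 5, 5, 7, 9]
large = [x * 10 for x in small]

print(statistics.pstdev(small), statistics.pstdev(large))   # 2.0 20.0
print(coefficient_of_variation(small))                      # 0.4
print(coefficient_of_variation(large))                      # 0.4
```

The standard deviations differ by a factor of 10, but the CV is the same for both, because it measures spread relative to the mean.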

what are the 2 relative measures?

the coefficient of variation and the idea of standardising data

Different statistical measures

all of them are absolute measures

IQR uses the middle 50% and is

less influenced by extreme values

WHAT ARE 3 MEASURES OF AVERAGE (CENTRALITY)

MEAN MEDIAN AND MODE

median if there is a sales and frequency table

n = the number of data points = all the frequencies added together. Then (n + 1) divided by 2 gives the median position, e.g. the 24th. Then count through the frequencies to find which value is the 24th ordered data point. The data can be written down like 0 x 5, 1 x 16, 2 x 12, etc. to find the 24th data point
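
A minimal Python sketch of counting through the frequencies. The first three rows (0 x 5, 1 x 16, 2 x 12) come from the card; the last row (3 x 14) is made up so that n = 47 and the median position is the 24th. The function names are illustrative.

```python
import math

def value_at(rows, k):
    """Value of the k-th ordered data point (1-based) in a frequency table."""
    running = 0
    for value, freq in rows:
        running += freq              # cumulative frequency so far
        if k <= running:
            return value
    raise IndexError("position is beyond the last data point")

def median_from_frequency_table(table):
    """Return (median position, median) for (value, frequency) rows."""
    rows = sorted(table)                         # order by value
    n = sum(freq for _, freq in rows)            # all frequencies added
    position = (n + 1) / 2
    lower = value_at(rows, math.floor(position))
    upper = value_at(rows, math.ceil(position))  # same point when n is odd
    return position, (lower + upper) / 2

# (sales, frequency); the (3, 14) row is an assumption for illustration
table = [(0, 5), (1, 16), (2, 12), (3, 14)]
print(median_from_frequency_table(table))   # (24.0, 2.0)
```

The cumulative frequencies are 5, 21, 33 and 47, so the 24th data point falls in the row for 2 sales.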

five number summary TIPS

the LQ = we find the data point and that value is the answer. The UQ = we find the data point and that value is the answer. DON'T SUBTRACT THEM

inter quartile range (IQR)

Order the data from small to large. Use the formula (n + 1) divided by 4 to find which data point is the LQ, and times it by 3 for the UQ. Then find the actual value in the data set at each position, e.g. the 9th and the 3rd values. Subtract the actual values (UQ - LQ) and you have the IQR

If all data points decrease by 7, the IQR decreases by 7: true or false?

FALSE, because the IQR is a measure of spread, not centrality, so it doesn't change as the dataset moves
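
This can be checked with a short Python sketch. It uses `statistics.quantiles` with `method="exclusive"`, which matches the (n + 1) / 4 position rule; the function name `iqr` and the data are made up for illustration.

```python
import statistics

def iqr(values):
    """IQR = UQ - LQ, with quartiles at the (n + 1) / 4 positions."""
    lq, _, uq = statistics.quantiles(values, n=4, method="exclusive")
    return uq - lq

# 11 made-up data points, then the same points decreased by 7
data = [9, 2, 15, 4, 7, 4, 12, 5, 10, 6, 8]
shifted = [x - 7 for x in data]

print(iqr(data), iqr(shifted))                                 # 6.0 6.0
print(statistics.median(data), statistics.median(shifted))     # 7 0
```

The median (a measure of centrality) moves down by 7, but the IQR stays at 6, because the LQ and UQ both move by the same amount.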

IQR is not affected by outliers. Why?

Because it measures UQ - LQ, the middle 50% of the data, so extreme data points do not affect it

what are outliers?

they are points that are far away from other data points

What is a boxplot?

It shows skew in the data. If all sides are equal there is no skew, as there is balance. Skew means the sides are not equal or the median is not in the middle of the data, so the distribution is more concentrated on the left or right side. It is basically the five number summary

Mean if there is 1, 2, 3, 4

sum them all, then divide by 4 (the number of data points)

Mean if there is an x and frequency table

Multiply each x value by its frequency, add all of the results together, then divide by n (the number of data points), where n = all of the frequencies added together
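
A minimal Python sketch of the mean from a frequency table. The function name `mean_from_frequency_table` and the table values are made up for illustration.

```python
def mean_from_frequency_table(table):
    """Mean for (x, frequency) rows: sum of x * frequency, divided by n."""
    n = sum(freq for _, freq in table)          # all frequencies added
    total = sum(x * freq for x, freq in table)  # each x times its frequency
    return total / n

# Made-up table: x = 1 occurs 2 times, x = 2 occurs 3 times, x = 3 occurs 5 times
table = [(1, 2), (2, 3), (3, 5)]
print(mean_from_frequency_table(table))   # (2 + 6 + 15) / 10 = 2.3
```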

WEEK 1 Flashcards

(38 cards)