Week 1: Introduction and Descriptive Statistics Flashcards

1
Q

Name three measures of central tendency (averages).

A

Mean
Median
Mode
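
A minimal sketch of the three averages using Python's standard `statistics` module. The `scores` list is made-up data for illustration only.

```python
import statistics

scores = [2, 3, 3, 5, 7]  # made-up sample values

mean_score = statistics.mean(scores)      # sum of values divided by their count
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```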

2
Q

Name two ways to measure the spread of a data set

A

Standard Deviation
Range
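
Both measures can be sketched with the standard library. The `values` list is made-up data; `pstdev` treats it as a whole population, while `stdev` treats it as a sample (dividing by n - 1).

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

data_range = max(values) - min(values)      # largest value minus smallest value
population_sd = statistics.pstdev(values)   # standard deviation of a population
sample_sd = statistics.stdev(values)        # standard deviation of a sample

print(data_range, population_sd, sample_sd)
```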

3
Q

What are categorical variables? Give an example

A

Variables whose values place observations into groups (categories) rather than measuring a quantity.

E.g., gender, age group

4
Q

Which type of central tendency can you use for categorical variables? Give an example

A

The mode

For example, if you have the following eye colours: {brown, brown, blue, green, blue, brown}, the modal class is brown. You can’t (obviously) use mean or median here.
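
The eye-colour example can be checked with a short sketch that counts each category and picks the most frequent one. The variable names are illustrative.

```python
from collections import Counter

eye_colours = ["brown", "brown", "blue", "green", "blue", "brown"]

counts = Counter(eye_colours)  # frequency of each category
modal_colour, modal_count = counts.most_common(1)[0]  # most frequent category

print(modal_colour, modal_count)
```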

5
Q

How would you compare wage inequality between two countries (using descriptive statistics, for now)?

A

Compare the standard deviation of wages in each country.

If two countries have roughly the same average salary, but one has a standard deviation of 15k and the other of 50k, the latter has higher wage inequality.
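
A rough sketch of this comparison. The salary figures for both countries are hypothetical, chosen so that the means match and only the spread differs.

```python
import statistics

# Hypothetical annual salaries, not real data
country_a = [35_000, 45_000, 50_000, 55_000, 65_000]
country_b = [5_000, 20_000, 50_000, 80_000, 95_000]

mean_a = statistics.mean(country_a)
mean_b = statistics.mean(country_b)
sd_a = statistics.stdev(country_a)  # sample standard deviation
sd_b = statistics.stdev(country_b)

# Same average salary, but country_b's wages are far more spread out
more_unequal = "B" if sd_b > sd_a else "A"
print(mean_a, mean_b, round(sd_a), round(sd_b), more_unequal)
```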

6
Q

Name three stages of data analytics

A

1- Descriptive analytics
2- Predictive analytics
3- Prescriptive analytics

7
Q

What is descriptive analytics?

A

Summarising and analysing historical data to describe what has already happened. It relies on descriptive statistics rather than inferential (generalisation) statistics.

8
Q

What is predictive analytics?

A

Building mathematical, computational, and statistical models to make predictions using existing data

9
Q

What is prescriptive analytics?

A

Building data-driven solutions to control or change the outcome of an event

10
Q

What are the 4 types of sampling techniques?

A

1- Simple Random
2- Stratified
3- Clustered
4- Systematic

11
Q

What is a simple random sample?

A

A sample drawn at random from the population, so that every member has an equal chance of being selected.
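
A minimal sketch using the standard `random` module. The population of 100 numbered members, the sample size of 10, and the fixed seed are all assumptions for illustration.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical members numbered 1 to 100

# Draw 10 members without replacement; each member is equally likely to be chosen
sample = rng.sample(population, 10)

print(sample)
```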

12
Q

What is a stratified sample?

A

The population is divided into groups (strata), e.g., by sex or profession, and a sample is drawn from each group. This allows control over how many members of each group are included.
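
One possible sketch of the idea, assuming a fixed number of members is drawn from each group. The helper `stratified_sample`, the record layout, and the professions are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_group, seed=0):
    """Draw up to `per_group` records at random from each group (hypothetical helper)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)  # split the population into strata
    sample = []
    for label in sorted(groups):
        members = groups[label]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

# Made-up population: 10 nurses and 30 teachers
population = [
    {"id": i, "profession": "nurse" if i % 4 == 0 else "teacher"}
    for i in range(40)
]
chosen = stratified_sample(population, key=lambda r: r["profession"], per_group=5)
nurses = [r for r in chosen if r["profession"] == "nurse"]
print(len(chosen), len(nurses))
```

Drawing equal numbers per group is only one allocation choice; sampling each group in proportion to its size is another.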

13
Q

What is a clustered sample?

A

The population is divided into clusters, usually based on geography and proximity, and whole clusters are selected for sampling.

14
Q

What is a systematic sample?

A

Taking every kth member.

E.g., to select 20 members from a list of 836, find k by dividing 836 by 20 to get 41.8.

Rounding gives k = 42.

Randomly select a number from 1 to 42, say 18.

Start at the person numbered 18 and then choose every 42nd member of the list.
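
The worked example can be sketched as follows, assuming 836 is the size of the population list and 20 is the desired sample size. The start is fixed at 18 to match the example; in practice it would be chosen at random.

```python
population_size = 836  # assumed size of the numbered list
sample_size = 20       # assumed desired sample size

k = round(population_size / sample_size)  # 41.8 rounds to 42

start = 18  # fixed to match the example; in practice use random.randint(1, k)

# Start at member 18, then take every k-th member up to the end of the list
selected = list(range(start, population_size + 1, k))

print(k, len(selected), selected[:3], selected[-1])
```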

15
Q

What is the drawback of Simple random sampling?

A

By chance, some groups may be overrepresented and others underrepresented, giving unequal group sizes.

16
Q

What is the drawback of stratified sampling?

A

May give a biased picture of the population: smaller groups can end up overrepresented relative to their actual share, which overstates their importance.

17
Q

What is an example of clustered sampling?

A

Sampling from a local hospital, rather than all hospitals.

18
Q

What is the difference between a parameter and a statistic?

A

Parameters are summaries of population data.

Statistics are summaries of sample data. In many places the two terms are used interchangeably (e.g., in data science).

19
Q

Name two types of qualitative data and describe them

A

Categorical (nominal): The categories have no particular order. E.g., countries, eye colours, sex, etc.

Ordinal: The categories have an order. E.g., countries ranked by population size, products ranked by amount sold, etc.

20
Q

Name two types of quantitative data and describe them

A

Discrete- Whole numbers (integers). E.g., number of emails sent.

Continuous- Real numbers, such as temperature, weight, height, etc.