Descriptive Statistics Fundamentals Flashcards

1
Q

What is Descriptive Statistics?

A

Descriptive statistics refers to a set of methods used to summarize and describe the main features of a dataset, such as its central tendency, variability, and distribution.

These methods provide an overview of the data and help identify patterns and relationships.

2
Q

Do you want to learn the appropriate statistics to perform different tests?

A

yes - do you know them?

3
Q

What are the 2 main ways to classify data?

A
  1. Types of data
  2. Measurement levels
4
Q

What are the 2 ‘Types of Data’ that you can have?

A
  1. Categorical
  2. Numerical
5
Q

What is an example of Categorical Data?

A

A. Car brands such as Audi, BMW, and Mercedes
B. Answers to yes/no questions, for example “Are you currently enrolled in a university?” or “Do you own a car?”

6
Q

What is an example of Numerical Data?

A

Numerical data represents numbers. It has two subsets: discrete and continuous.

7
Q

Is Numerical Data a subset of Types of Data or of Levels of Measurement?

A

Types of Data

8
Q

Types of Data - view

A
  1. Types of Data
    a. Categorical
    b. Numerical
      i. Discrete
      ii. Continuous
  2. Levels of Measurement
9
Q

What are the ‘3 Types of Data’?

A
  1. Categorical
  2. Numerical - Discrete
  3. Numerical - Continuous
10
Q

What are the two subsets of Numerical Data?

A
  1. Discrete
  2. Continuous
11
Q

What is Discrete Data?

A

Data that can be counted in a finite manner. The values are usually integers. It is the opposite of continuous data.
Examples:
“How many children do you want?”
Scores on the SAT
Grades at university
Number of objects
Money as bank notes and coins

12
Q

What is Continuous data?

A

Continuous data is ‘infinite’ and impossible to count. It can take on an infinite number of values.
Examples:
Your weight
Height
Area
Distance
Time

13
Q

A variable represents the weight of a person. What type of data does it represent?

A

numerical, continuous

14
Q

A variable represents the gender of a person. What type of data does it represent?

A

Categorical

15
Q

What are the 2 “Levels of Measurement”?

A
  1. Qualitative
  2. Quantitative - represented by numbers
16
Q

What are the two types of Qualitative Data?

A
  1. Nominal
  2. Ordinal
17
Q

What are examples of Nominal Data?

A

Categorical data like car brands or like the four seasons (winter, spring, summer, fall)
They are not numbers and cannot be ordered
Definition: (of a role or status) existing in name only.

18
Q

What are examples of Ordinal Data?

A

Groups and categories that follow a strict order. Data that can be ordered.
Examples:
Likert Scale
Definition: relating to a thing’s position in a series.
“ordinal position of birth”

19
Q

What are the two groups of Quantitative Data?

A
  1. Interval
  2. Ratio
20
Q

What is unique about Ratio?

A

Ratio variables have a true zero; interval variables don’t.
Most things we observe in the world are ratios.

21
Q

What are examples of Ratio variables?

A

Number of objects, distance, price and time

22
Q

What is the most common Interval variable?

A

Temperature, when measured in Celsius or Fahrenheit. Both are interval scales and have no true zero.

Temperature in kelvins (K) is a ratio variable and has a true zero.
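
A minimal sketch of why the true zero matters: a ratio between two Celsius readings is not meaningful, while a ratio between the same readings in kelvins is. The function name and the sample temperatures are illustrative assumptions, not part of the original cards.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to kelvins (0 K is the true zero)."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0  # hypothetical readings

# On the Celsius (interval) scale the ratio suggests "twice as hot" ...
celsius_ratio = warm_c / cold_c

# ... but on the kelvin (ratio) scale the same readings differ by only a few percent.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

# Differences, by contrast, are meaningful on both scales.
celsius_diff = warm_c - cold_c
kelvin_diff = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

print(celsius_ratio, round(kelvin_ratio, 3), celsius_diff, round(kelvin_diff, 2))
```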

23
Q

A variable represents the gender of a person. What type of data and level of measurement does it represent?

A

Categorical, Qualitative- Nominal
Gender is a nominal variable. The possible categories cannot be put in any order.

24
Q

A variable represents the weight of a person. What type of data and level of measurement does it represent?

A

Numerical (continuous), Quantitative - Ratio
Weight is a ratio variable, which means it is a quantitative measure that has a true zero point, signifying the absence of the attribute being measured. In the case of weight, zero signifies a complete lack of weight.

25
What is the most intuitive way to interpret data?
Visualization
26
What are some useful ways to visualize categorical variables?
a. Frequency distribution tables b. Bar Charts c. Pie Charts d. Pareto diagrams
27
What is a Frequency Distribution Table?
A table with two columns: the category and its corresponding frequency. Frequency is the number of occurrences of each item.
28
What is Relative Frequency?
Relative frequency is the percentage of the total frequency accounted for by each category. Example: the percentage of cars sold by each brand. All relative frequencies add up to 100%. It reveals each category's share of the total, e.g. market share. A good representation is a pie chart.
29
What is a Pareto Diagram?
A Pareto diagram is a special type of bar chart, where categories are shown in descending order of frequency
30
What does Frequency represent?
the number of occurrences of each item
31
What is Cumulative Frequency?
Cumulative frequency is the running sum of relative frequencies. It starts at the relative frequency of the first item, then adds that of the second item, and so on until it reaches 100%.
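
The frequency, relative frequency, and cumulative frequency cards can be tied together in a short sketch. The sales data and the `frequency_table` helper below are hypothetical, chosen only for illustration; categories are listed in descending order of frequency, as in a Pareto diagram.

```python
from collections import Counter

# Hypothetical sample: car brands sold (illustrative data only).
sales = ["Audi"] * 5 + ["BMW"] * 3 + ["Mercedes"] * 2


def frequency_table(items):
    """Return (category, frequency, relative, cumulative) rows, most frequent first."""
    counts = Counter(items)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, freq in counts.most_common():
        running += freq  # running count up to and including this category
        rows.append((category, freq, freq / total, running / total))
    return rows


rows = frequency_table(sales)
for category, freq, rel, cum in rows:
    print(f"{category:<10}{freq:>4}{rel:>8.0%}{cum:>8.0%}")
```

The last cumulative value is always 100%, which is a quick sanity check on the table.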
32
How do you calculate the interval width?
Interval width = (largest number - smallest number) / number of desired intervals
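
A small sketch of the calculation above, followed by counting how many values fall in each interval (the counts a histogram would plot). The helper names and the sample values are assumptions for illustration, and the code assumes the values are not all identical.

```python
def interval_width(values, n_intervals):
    """(largest - smallest) / number of desired intervals."""
    return (max(values) - min(values)) / n_intervals


def bin_counts(values, n_intervals):
    """Count how many values fall in each of n equal-width intervals."""
    low = min(values)
    width = interval_width(values, n_intervals)  # assumes max(values) > min(values)
    counts = [0] * n_intervals
    for v in values:
        # The largest value would land one past the end, so clamp it into the last interval.
        index = min(int((v - low) / width), n_intervals - 1)
        counts[index] += 1
    return counts


values = [1, 5, 9, 12, 18, 21]  # hypothetical data
print(interval_width(values, 4))
print(bin_counts(values, 4))
```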
33
What is the desired number of intervals?
Typically 5 to 20
34
If the frequency of a variable is 20 and its total frequency of all variables is 120, what is its relative frequency?
0.17 (20 / 120 ≈ 0.167, or about 17%)
35
What is the most common graph to represent Numerical Data?
The Histogram
36
Why do the bars in a histogram touch?
To show continuity between the intervals. Each interval ends where the next one starts.
37
True or False - Relative Frequency is made up of percentages?
True
38
Can histograms have intervals of unequal width?
Yes
39
What are two visualization options to represent relationships between two variables?
1. Cross tables 2. Scatter Plots
40
What are cross tables? What do they best represent?
A table that shows how often each combination of categories occurs, with totals calculated for each row and column. Cross tables best represent relationships between two categorical variables. A variation is the side-by-side bar chart.
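
One way to read a cross table is as a count of every combination of two categorical variables, plus row and column totals. The sketch below builds one from hypothetical survey answers; the data and the `cross_table` helper are assumptions for illustration.

```python
from collections import Counter

# Hypothetical survey rows: (owns_car, enrolled_in_university).
responses = [
    ("Yes", "Yes"), ("Yes", "No"), ("No", "Yes"),
    ("Yes", "No"), ("No", "No"), ("Yes", "Yes"),
]


def cross_table(pairs):
    """Return (table, row_totals, col_totals) for two categorical variables."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}
    row_totals = {r: sum(table[r].values()) for r in row_labels}
    col_totals = {c: sum(table[r][c] for r in row_labels) for c in col_labels}
    return table, row_totals, col_totals


table, row_totals, col_totals = cross_table(responses)
print(table)
print(row_totals, col_totals)
```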
41
What is a scatter plot best used for?
A scatter plot is used to represent the relationship between two numerical variables. It is best used to get the main idea of how the data is distributed.
42
What is a definition of an 'Outlier'?
Outliers are data points that go against the logic of the whole dataset
43
What are the 3 measures of Central Tendency?
Mean, Median, Mode
44
What are some uses of Central Tendency?
They give you an idea of how the data in a given dataset is distributed. The mean is the arithmetic average of all numbers. It is very useful because it indicates the average value in the dataset. However, the mean can be misleading because outliers can affect it significantly. The median is the value at the 50th percentile of the distribution. It is largely unaffected by outliers and shows you what is in the middle of the distribution. The mode is the value that is observed most frequently in the distribution. This gives you an idea of the value that recurs most often in the dataset.
45
What is Mean also known as?
The simple average. Denoted mu (µ) for a population and x-bar (x̄) for a sample.
46
What is the Median?
The middle number in the ordered dataset. With an even number of values, it is the average of the two middle numbers.
47
What is the Mode?
The mode is the value that occurs most often. It can be used in both categorical and numerical data
48
When calculating Mode in a dataset what happens when no number is represented more than once?
We say there is no mode.
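
The three measures can be compared with Python's standard `statistics` module. The price list is hypothetical, and `mode_or_none` is an illustrative helper that follows the "no mode" convention from the card above; `statistics.multimode` on its own would return every value when nothing repeats.

```python
from statistics import mean, median, multimode


def mode_or_none(values):
    """Return the list of modes, or None when no value appears more than once."""
    if len(set(values)) == len(values):
        return None  # every value is unique: "no mode"
    return multimode(values)


prices = [1, 2, 3, 3, 5, 6, 8, 10]  # hypothetical data
with_outlier = prices + [66]        # same data plus one extreme value

print(mean(prices), median(prices), mode_or_none(prices))
print(mean(with_outlier), median(with_outlier), mode_or_none(with_outlier))
```

Adding the single outlier moves the mean far more than the median, which is why the measures are best read together.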
49
Which Central Tendency measure is best?
The measures should be used together rather than independently. There is no best, but using only one is definitely the worst.
50
What is Skewness?
Skewness is the most common way to measure asymmetry. Skewness indicates whether the data is concentrated on one side
51
What is a Positive or Right Skew?
When the mean > median. Data points are concentrated on the left side, and the outliers are to the right (there is less data to the right).
52
What kind of Skew happens when the Mean, Median and Mode are equal?
Zero or no skew. The distribution is symmetrical.
53
What is a Negative or Left Skew?
When the mean < median. The highest point is defined by the mode. The outliers are to the left.
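
A rough sketch of the mean-versus-median rule from the skew cards, next to a moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5). The cards do not give a skewness formula, so the coefficient, the helper names, and the sample data are all assumptions for illustration; the mean-versus-median comparison is a rule of thumb rather than a strict definition.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skew by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive (right)"
    if m < med:
        return "negative (left)"
    return "zero"


def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


right = [1, 2, 2, 3, 3, 3, 4, 12]   # hypothetical: outlier on the right
left = [1, 9, 10, 10, 11, 11, 12]   # hypothetical: outlier on the left
symmetric = [1, 2, 3, 4, 5]

for data in (right, left, symmetric):
    print(skew_direction(data), round(moment_skewness(data), 3))
```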
54
Why is Skew important?
Skew tells us where the data is situated. It is the link between central tendency measures and probability theory.
55
What are the 3 main measures of Variability?
1. Variance 2. Standard Deviation 3. Coefficient of Variation
56
Do you use the same formulas when working with Population Data vs Sample Data?
No - different formulas are used
57
What does Variance measure?
Variance measures the dispersion of a set of data points around their mean. The closer a value is to the mean, the smaller its contribution to the variance; the farther away it is, the larger its contribution. Variance can never be negative, because dispersion is about distance and distance cannot be negative. Because the deviations are squared, the result is in squared units and is often large and hard to compare.
58
Which is more meaningful, Std Dev or Variance?
Std dev will be much more meaningful than variance
59
Are there different formulas for Std Deviation?
Yes, one for population data and one for sample data
60
What are the formulas for Standard deviation?
Population: the square root of the population variance. Sample: the square root of the sample variance.
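
The cards do not spell out the two variance formulas. The sketch below assumes the standard ones: the population variance divides the sum of squared deviations by N, and the sample variance divides it by n - 1. The helper names and data are illustrative, and the results are checked against Python's `statistics` module.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev, variance


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [1, 2, 3, 4, 5]  # hypothetical data

# Standard deviation is the square root of the matching variance.
print(population_variance(data), sqrt(population_variance(data)))
print(sample_variance(data), sqrt(sample_variance(data)))

# The standard library provides the same four quantities directly.
print(pvariance(data), pstdev(data), variance(data), stdev(data))
```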
61
What is the formula for Coefficient of Variation (CV)?
standard deviation / mean
62
What is another name for Coefficient of Variation (CV)?
relative standard deviation
63
What is the most common measure of variability for a single dataset?
standard deviation
64
Why do we need the measure of Coefficient of Variation (CV)?
Comparing the standard deviations of two different datasets is meaningless, because they may be in different units or on different scales. Comparing their coefficients of variation is not.
65
Why is Standard Deviation the preferred measure of variability?
Because it is directly interpretable. It is given in original units. Variance is given in squared units.
66
Where is Coefficient of Variation (CV) best used?
When comparing the variability of two datasets
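
A brief sketch of why the coefficient of variation is comparable across datasets when the standard deviation is not. The same hypothetical prices are expressed in two currencies using an assumed conversion factor; the standard deviations differ by that factor, while the coefficients of variation match.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)


# Hypothetical prices of the same items, expressed in two currencies.
prices_a = [1.0, 2.0, 3.0, 3.0, 5.0, 6.0, 8.0, 10.0]
prices_b = [p * 20.0 for p in prices_a]  # assumed conversion factor, for illustration

cv_a = coefficient_of_variation(prices_a)
cv_b = coefficient_of_variation(prices_b)

print(stdev(prices_a), stdev(prices_b))  # differ by the conversion factor
print(cv_a, cv_b)                        # essentially identical
```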
67
What are the 3 univariate measures? (one variable)
1. Central Tendency 2. Asymmetry 3. Variability
68
What are the two methods to explore the relationship between two variables?
1. Covariance 2. Linear correlation coefficient
69
What is the main statistic to measure correlation?
Covariance - it may be positive, negative, or zero
70
What does the direction of covariance tell us?
Covariance > 0: the two variables move together. Covariance < 0: the two variables move in opposite directions. Covariance = 0: the two variables have no linear relationship.
71
What does the correlation coefficient do?
It adjusts the covariance, so that the relationship between the two variables becomes easy and intuitive to interpret.
72
What is the range of the correlation coefficient?
-1 to +1
73
What does Perfect Positive Correlation mean?
The entire variability of one variable is explained by the other. Correlation coefficient = 1.
74
What does a Correlation coefficient of Zero mean?
There is no linear relationship between the two variables. Independent variables have a correlation of zero, but a correlation of zero does not by itself guarantee independence.
75
What does a Negative Correlation Coefficient mean?
The variables move in opposite directions. When one goes up, the other goes down.
76
Is the correlation of (x, y) equal to the correlation of (y, x)?
Yes
77
Causality - Correlation does not imply causation
It is important to understand the direction of causal relationships. In housing, size causes the price and not vice versa. Causality is an asymmetric relation (x causes y is different from y causes x).
78
What is the formula for Correlation Coefficient?
Cov(x, y) / (Stdev(x) * Stdev(y))
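
The formula above can be sketched directly in Python. This version assumes sample statistics (dividing by n - 1); the helper names and the housing figures are hypothetical. Note that the whole product of the two standard deviations sits in the denominator.

```python
from math import sqrt
from statistics import mean


def sample_covariance(xs, ys):
    """Average product of paired deviations from the means, dividing by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_stdev(xs):
    """Square root of the sample variance (the covariance of xs with itself)."""
    return sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    """Cov(x, y) / (Stdev(x) * Stdev(y)); always between -1 and +1."""
    return sample_covariance(xs, ys) / (sample_stdev(xs) * sample_stdev(ys))


sizes = [50, 70, 80, 100, 120]     # hypothetical house sizes
prices = [150, 200, 230, 310, 340]  # hypothetical house prices

print(sample_covariance(sizes, prices))
print(correlation(sizes, prices), correlation(prices, sizes))  # symmetric
print(correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))          # perfect negative
```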
79
What are the types of data and the levels of measurement of the following variables: Cust ID, Mortgage, Year of Sale, Age, Price, Gender, State
Variable | Type of Data | Level of Measurement
Cust ID | Categorical | Qualitative, Nominal
Mortgage | Categorical | Nominal
Year of Sale | Numerical, discrete | Interval
Age | Numerical, discrete (as a whole number) | Quantitative, Ratio
Price | Numerical, continuous | Ratio
Gender | Categorical | Nominal
State | Categorical | Nominal
80
What Excel function is used to calculate Correlation Coefficient?
CORREL()
81
What Excel function is used to calculate Covariance?
COVARIANCE.S() for sample data (COVARIANCE.P() for population data)
82
When should you disregard correlations?
As a rule of thumb, when the absolute value of the correlation coefficient is below 0.2