Quant, Org, visualizing data Flashcards

(132 cards)

1
Q

What are the two types of numerical data?

A

The two types of numerical data are discrete data and continuous data.

2
Q

What is discrete data?

A

Discrete data are values that can be counted, such as the months, days, or hours in a year.

3
Q

What is continuous data?

A

Continuous data can take any fractional value, such as the annual percentage return on an investment.

4
Q

What is another term for categorical data?

A

Categorical data is also known as qualitative data.

5
Q

What are the two types of categorical data?

A

The two types of categorical data are nominal data and ordinal data.

6
Q

What is nominal data?

A

Nominal data are labels that cannot be logically ordered, such as different types of mutual funds.

7
Q

Can numbers be assigned to nominal data?

A

Yes, numbers can be assigned to nominal data, but the numbers are arbitrary and do not have any inherent meaning.

8
Q

What is ordinal data?

A

Ordinal data can be ranked in a logical order, such as ranking stocks based on performance.

9
Q

How are ordinal data represented?

A

Ordinal data are ranked by assigning numbers to different categories based on a specific characteristic.

10
Q

Can mathematical operations be performed on categorical data?

A

No, mathematical operations cannot be performed on categorical data as it does not have inherent numerical meaning.

11
Q

What is the key distinction between numerical and categorical data?

A

The key distinction between numerical and categorical data is that mathematical operations can be performed only on numerical data.

12
Q

What is time series data?

A

Time series data represents data collected over a period of time, such as stock prices over a year.

13
Q

What is cross-sectional data?

A

Cross-sectional data represents data collected at a specific point in time, such as a snapshot of stock prices on a particular day.

14
Q

What is structured data?

A

Structured data is organized and formatted in a specific way, such as data in a spreadsheet.

15
Q

What is unstructured data?

A

Unstructured data does not have a specific format or organization, such as social media posts or customer reviews.

16
Q

What is the difference between discrete data and continuous data?

A

Discrete data are values that can be counted and are distinct, whereas continuous data can take any fractional value and are not distinct.

17
Q

What are some examples of categorical data?

A

Examples of categorical data include gender, marital status, education level, and product categories.

18
Q

What are some examples of numerical data?

A

Examples of numerical data include age, height, weight, temperature, and income.

19
Q

What is the difference between nominal data and ordinal data?

A

Nominal data are labels that cannot be logically ordered, whereas ordinal data can be ranked in a logical order.

20
Q

What is an example of nominal data?

A

An example of nominal data is different types of mutual funds, such as corporate bond funds, municipal bond funds, and international bond funds.

21
Q

What is an example of ordinal data?

A

An example of ordinal data is a ranking of stocks by performance, in which each stock is assigned a number according to its performance rank.

22
Q

Can mathematical operations be performed on ordinal data?

A

No, mathematical operations should not be performed on ordinal data as the numerical difference between categories may not be meaningful.

23
Q

What are some examples of time series data?

A

Examples of time series data include stock prices over a period of time, monthly sales data, and weather data over a year.

24
Q

What are some examples of cross-sectional data?

A

Examples of cross-sectional data include a snapshot of stock prices on a particular day, customer data at a specific point in time, and survey data collected at a single time point.

25
What are some examples of structured data?
Examples of structured data include data in a spreadsheet, database, or organized in a specific format with defined fields and values.
26
What is cross-sectional data?
Answer: Cross-sectional data refers to a set of comparable observations taken at one specific point in time, such as today's closing prices of the 30 stocks in the Dow Jones Industrial Average or fourth-quarter earnings per share for 10 health care companies.
27
What is time series data?
Time series data is a set of observations taken periodically at equal intervals over time, such as daily closing prices of a stock over the past year or quarterly earnings per share of a company over a five-year period.
28
What is panel data?
Answer: Panel data is a combination of time series and cross-sectional data, often presented in tables. An example of panel data is OECD Composite Leading Indicators, Year-on-Year Growth Rate, where each row represents cross-sectional data and each column represents time series data.
29
What are examples of structured data?
Answer: Examples of structured data include time series, cross-sectional, and panel data that are organized in a defined way, such as market data like security prices, fundamental data like accounting values, and analytical data like analysts' earnings forecasts.
30
What is unstructured data?
Answer: Unstructured data refers to information presented in a form with no defined structure, such as management's commentary in company financial statements. Unstructured data can be generated by individuals, business processes, or sensors and usually needs to be transformed into structured data for analysis.
31
How are data organized for quantitative analysis?
Answer: Data are typically organized into arrays for quantitative analysis. Time series data represents a one-dimensional array, and panel data represents a two-dimensional array or a data table with cross-sectional observations for each measurement date.
32
What is the key feature of time series data?
Answer: The key feature of time series data is that new data can be added without affecting the existing data, allowing for identification of trends, cycles, and patterns useful for forecasting.
33
What are some examples of data that can be analyzed using panel data?
Examples of data that can be analyzed using panel data include economic indicators, market data, and financial performance of companies, where each row represents cross-sectional data and each column represents time series data in a data table.
34
What are some examples of unstructured data generated by individuals?
Examples of unstructured data generated by individuals include posts on social media, comments on online forums, and reviews on websites.
35
What are some examples of unstructured data generated by business processes?
Examples of unstructured data generated by business processes include records of deposits, withdrawals, and transfers of cash, customer service interactions, and email communications.
36
What are some examples of unstructured data generated by sensors?
Answer: Examples of unstructured data generated by sensors include satellite images, traffic camera footage, and weather sensor readings.
37
How can unstructured data be transformed into structured data?
Answer: Unstructured data can be transformed into structured data using techniques such as natural language processing, data extraction, and data classification algorithms that can identify patterns, entities, and relationships within the data.
38
What is a confusion matrix?
A confusion matrix is a 2-by-2 array that displays the number of occurrences predicted and the number actually observed for each of two possible outcomes. It is used to assess the performance of a model in predicting outcomes.
39
How is a confusion matrix used in quantitative analysis?
Answer: A confusion matrix is used in quantitative analysis to evaluate the accuracy of a model's predictions by comparing the predicted outcomes with the actual outcomes. It helps in understanding the false positive, false negative, true positive, and true negative predictions made by the model.
40
What does the number of occurrences in a confusion matrix represent?
Answer: The number of occurrences in a confusion matrix represents the count of predicted outcomes for each category (e.g., predicted outcomes occurring or not occurring) and the count of actual outcomes for each category (e.g., actual outcomes occurring or not occurring).
41
What information can be obtained from a confusion matrix?
Answer: A confusion matrix provides information on the accuracy of a model's predictions, including the number of correct and incorrect predictions for each outcome category, and can be used to identify the type and frequency of prediction errors made by the model.
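A minimal Python sketch of how the four cells of a 2-by-2 confusion matrix could be tallied, assuming binary outcomes coded as True (the event occurs) and False (it does not). The function names and the sample predictions are hypothetical, made up for illustration; accuracy here is simply correct predictions divided by all predictions.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary outcomes."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["TP"] += 1   # predicted to occur, and it occurred
        elif p and not a:
            counts["FP"] += 1   # predicted to occur, but it did not
        elif not p and a:
            counts["FN"] += 1   # predicted not to occur, but it occurred
        else:
            counts["TN"] += 1   # predicted not to occur, and it did not
    return counts


def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical actual outcomes vs. model predictions
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
cm = confusion_matrix(actual, predicted)
print(cm)            # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
print(accuracy(cm))  # 4 correct out of 6, about 0.667
```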
42
What are some other types of contingency tables besides a confusion matrix?
Answer: Other types of contingency tables include multi-dimensional contingency tables that can represent more than two outcomes, and stratified contingency tables that allow for comparisons across different categories or subpopulations.
43
What is the purpose of evaluating model performance in quantitative analysis?
Answer: The purpose of evaluating model performance in quantitative analysis is to assess how well a model is performing in making accurate predictions or estimations, and to identify areas for improvement in the model's performance.
44
How can model evaluation contribute to the development of better models?
Answer: Model evaluation can contribute to the development of better models by providing feedback on the strengths and weaknesses of the current model. This feedback can help guide the selection of better model parameters, techniques, or data quality improvements, leading to the development of more accurate and robust models.
45
What is the purpose of selecting a visualization type?
Answer: The purpose of selecting a visualization type is to effectively communicate information by choosing the most appropriate chart that clearly presents the data.
46
What should be considered when choosing a visualization type?
Answer: Factors such as the relationship between variables, the type of comparison needed, and the distribution of the data should be considered when choosing a visualization type.
47
What are some effective chart types for visualizing relationships between variables?
Answer: Scatter plots, scatter plot matrices, and heat maps are effective chart types for visualizing relationships between variables.
48
What are some effective chart types for making comparisons among categories?
Answer: Bar charts, tree maps, and heat maps are effective chart types for making comparisons among categories.
49
What are some effective chart types for comparisons over time?
Answer: Line charts, dual-scale line charts, and bubble line charts are effective chart types for comparisons over time.
50
What are some effective chart types for visualizing distributions of numerical data?
Answer: Histograms, frequency polygons, and cumulative distribution charts are effective chart types for visualizing distributions of numerical data.
51
What are some effective chart types for visualizing distributions of categorical data?
Answer: Bar charts, tree maps, and heat maps are effective chart types for visualizing distributions of categorical data.
52
What are some effective chart types for visualizing distributions of text data?
Answer: A word cloud is an effective chart type for visualizing the distribution of text data.
53
What should an analyst avoid when creating a chart?
Answer: An analyst should avoid misrepresentations, such as selectively showing data that supports their analysis while omitting contradictory data, or manipulating the scale of the axes to exaggerate or obscure variations in the data.
54
Why is it important to choose a chart type that effectively visualizes the underlying data?
Answer: Choosing a chart type that accurately represents the data is important to ensure that the information being presented is not misleading or misrepresented.
55
What is the purpose of using a scatter plot matrix?
Answer: The purpose of using a scatter plot matrix is to visualize relationships between multiple variables in a matrix format, where each variable is plotted against all other variables.
56
When would a tree map be a useful chart type to use?
Answer: A tree map would be a useful chart type to use when making comparisons among categories with hierarchical relationships, such as visualizing market share of different product categories within a larger product portfolio.
57
What is the main purpose of using a histogram?
Answer: The main purpose of using a histogram is to visualize the distribution of numerical data by displaying it in bins or intervals.
58
How can a bar chart be used to visualize distributions of both categorical and numerical data?
Answer: A bar chart can visualize the distribution of categorical data by showing the frequency or proportion of observations in each category. For numerical data, a histogram is a bar chart of the frequency of observations in each bin (interval), so it shows how the data are distributed.
59
What is a frequency polygon?
A frequency polygon is a line graph that plots the frequency of each interval at the interval's midpoint and connects those points with straight lines.
60
What is the symbol for the sample mean? What is the symbol for the population mean?
Sample mean: $\bar{X}$ (X-bar). Population mean: $\mu$ (mu).
61
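What is the formula for the population mean? What is the formula for the sample mean?
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$. Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $N$ is the number of observations in the population and $n$ is the number of observations in the sample.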
62
What is the population mean?
The population mean is the arithmetic mean of all $N$ observations in a population ($\mu = \sum X_i / N$). It is unique: a given population has only one mean.
63
What is a sample mean?
The sample mean is the sum of all the values in a sample of a population divided by the number of observations in the sample ($\bar{X} = \sum X_i / n$). It is used to make inferences about the population mean.
64
What are measures of central tendency?
Measures of central tendency identify the center, or average, of a data set. This central point can then be used to represent the typical, or expected, value in the data set.
65
What is a trimmed mean?
A trimmed mean excludes a stated percentage of the most extreme observations and averages the rest (e.g., a 1% trimmed mean discards the lowest 0.5% and the highest 0.5% of observations).
66
What is a winsorized mean?
A winsorized mean replaces the most extreme observations with less extreme values instead of discarding them. For example, to calculate a 90% winsorized mean, determine the 5th and 95th percentiles of the observations, replace every value below the 5th percentile with the 5th-percentile value and every value above the 95th percentile with the 95th-percentile value, and then calculate the mean.
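A minimal Python sketch of both adjustments, assuming a simple count-based rule: the number of observations trimmed or winsorized in each tail is floor(n × pct / 2). On larger data sets, percentile-based cutoffs can give slightly different results. The function names and the return data are hypothetical.

```python
def trimmed_mean(data, pct):
    """Mean after discarding pct/2 of the observations from each tail."""
    xs = sorted(data)
    k = int(len(xs) * pct / 2)          # observations dropped per tail (assumed rule)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(data, pct):
    """Mean after replacing pct/2 of the observations in each tail with the nearest kept value."""
    xs = sorted(data)
    n = len(xs)
    k = int(n * pct / 2)
    low, high = xs[k], xs[n - 1 - k]    # cutoff values that replace the extremes
    clamped = [min(max(x, low), high) for x in xs]
    return sum(clamped) / n


returns = [-20, 2, 3, 4, 5, 6, 7, 8, 10, 100]   # hypothetical data with two outliers
print(sum(returns) / len(returns))     # plain mean: 12.5
print(trimmed_mean(returns, 0.20))     # drops -20 and 100: 5.625
print(winsorized_mean(returns, 0.20))  # -20 becomes 2, 100 becomes 10: 5.7
```

Both adjusted means sit far closer to the bulk of the data than the plain mean, which is pulled up by the single outlier of 100.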
67
What is the formula for a weighted mean?
$\bar{X}_W = \sum_{i=1}^{n} w_i X_i = w_1X_1 + w_2X_2 + \dots + w_nX_n$, where the weights $w_i$ sum to 1.
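A short Python sketch of the weighted mean, assuming weights expressed as decimals that must sum to 1. The portfolio weights and returns are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * X_i, with weights that sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical portfolio: 50% stocks (10% return), 30% bonds (4%), 20% cash (2%)
print(weighted_mean([10, 4, 2], [0.5, 0.3, 0.2]))  # about 6.6 (percent)
```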
68
What is the median, and why might it be better to use than the mean?
The median is the midpoint of a data set arranged in ascending or descending order (the 50th percentile): half of the observations lie above it and half below. It is often preferable to the mean when the data contain extreme outliers, because outliers pull the mean toward them but have little effect on the median.
69
How do we calculate the median for an odd and an even number of observations?
Odd: first arrange the observations in ascending (or descending) order, then select the middle observation, at position $(n+1)/2$. Even: first arrange the observations in order, then take the average of the two middle observations, at positions $n/2$ and $n/2 + 1$.
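A minimal Python sketch of the odd/even rules, using 0-based list indexes (so the 1-based positions in the card are shifted down by one). The sample data are hypothetical.

```python
def median(data):
    """Median: middle value (odd n) or average of the two middle values (even n)."""
    xs = sorted(data)                     # ascending order; descending gives the same median
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # 1-based position (n + 1) / 2
    return (xs[mid - 1] + xs[mid]) / 2    # 1-based positions n/2 and n/2 + 1


print(median([7, 1, 5]))     # odd: 5
print(median([7, 1, 5, 3]))  # even: (3 + 5) / 2 = 4.0
```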
70
What is the mode?
The mode is the value that occurs most frequently in a data set.
71
What is unimodal, bimodal, or trimodal?
A distribution is unimodal when it has one value that occurs most frequently. When a data set has two or three values that occur most frequently, it is bimodal or trimodal, respectively.
72
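What is the difference between the geometric mean and the geometric mean return?
The geometric mean of $n$ values is the $n$th root of their product: $G = \sqrt[n]{X_1 X_2 \cdots X_n}$. The geometric mean return first adds 1 to each return, takes the geometric mean of those values, and then subtracts 1 to convert the result back to a percentage return: $R_G = \left[(1+R_1)(1+R_2)\cdots(1+R_n)\right]^{1/n} - 1$.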
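A minimal Python sketch of the geometric mean return, assuming returns are entered as decimals (0.10 for 10%). The two-year return series is hypothetical.

```python
import math


def geometric_mean_return(returns):
    """[(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1, with returns as decimals."""
    growth = math.prod(1 + r for r in returns)  # math.prod requires Python 3.8+
    return growth ** (1 / len(returns)) - 1


# Hypothetical two-year example: +10% then -10%
print(geometric_mean_return([0.10, -0.10]))  # about -0.0050, versus an arithmetic mean of 0
```

The result is negative even though the arithmetic mean is zero, because a 10% gain followed by a 10% loss leaves the investor with less than the starting amount.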
73
What is the harmonic mean usually used for?
It is used for certain computations, such as the average cost of shares purchased over time when the same amount of money is invested each period.
74
What is the harmonic mean formula?
$\bar{X}_H = \frac{N}{\sum_{i=1}^{N} (1/X_i)}$
75
Harmonic mean example: An investor purchases 1,000 worth of a mutual fund (MF) each month. Over the last 3 months, the share prices were 8, 9, and 10. What is the average cost per share?
$\frac{3}{(1/8) + (1/9) + (1/10)} = 8.926$ per share
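The same calculation in Python, with a cross-check that the harmonic mean equals total money invested divided by total shares bought. The check holds because the same amount (1,000 per month, as in the card) is invested each period; the variable names are illustrative.

```python
def harmonic_mean(values):
    """N divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)


prices = [8, 9, 10]
monthly_investment = 1_000                            # same amount invested each month
shares = sum(monthly_investment / p for p in prices)  # 125 + 111.11 + 100 shares
cost_per_share = monthly_investment * len(prices) / shares

print(round(harmonic_mean(prices), 3))  # 8.926
print(round(cost_per_share, 3))         # 8.926
```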
76
What are quartiles, quintiles, deciles, and percentiles?
Quartiles: the distribution is divided into quarters. Quintiles: the distribution is divided into fifths. Deciles: the distribution is divided into tenths. Percentiles: the distribution is divided into hundredths (percents).
77
What is the interquartile range?
The interquartile range is the spread between the 75th and 25th percentiles ($Q_3 - Q_1$).
78
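What is the formula to find the position of a given percentile?
$L_y = (n+1)\frac{y}{100}$, where $y$ is the given percentile and $n$ is the number of data points, sorted in ascending order. If $L_y$ is not a whole number, interpolate between the observations on either side of that position.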
79
What is the third quartile for the following distribution of returns? 8%, 10%, 12%, 13%, 15%, 17%, 17%, 18%, 19%, 23%
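$L_{75} = (10+1) \times 75/100 = 8.25$. The 8th observation is 18% and the 9th is 19%, so by interpolation $Q_3 = 18\% + 0.25 \times (19\% - 18\%) = 18.25\%$.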
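A minimal Python sketch of the $L_y$ location rule with linear interpolation, applied to the card's return data. The handling of locations that fall before the first or beyond the last observation (returning the smallest or largest value) is an assumption of this sketch.

```python
def percentile(data, y):
    """Value at the y-th percentile using L_y = (n + 1) * y / 100 and linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * y / 100       # 1-based location L_y
    whole = int(loc)
    frac = loc - whole
    if whole < 1:                 # assumed: location before the first observation
        return xs[0]
    if whole >= n:                # assumed: location at or beyond the last observation
        return xs[-1]
    lower, upper = xs[whole - 1], xs[whole]
    return lower + frac * (upper - lower)


returns = [8, 10, 12, 13, 15, 17, 17, 18, 19, 23]
print(percentile(returns, 75))  # location 8.25: 18 + 0.25 * (19 - 18) = 18.25
print(percentile(returns, 25))  # location 2.75: 10 + 0.75 * (12 - 10) = 11.5
```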
80
What is the range?
The range is the distance between the largest and smallest values in a data set (maximum − minimum).
81
What is the mean absolute deviation (MAD), and what is its formula?
MAD is the average of the absolute values of the deviations of individual observations from the arithmetic mean: $\text{MAD} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$.
82
What is the MAD of the investment returns for the five managers discussed in the preceding example? How is it interpreted? annualized returns: [30%, 12%, 25%, 20%, 23%]
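$\bar{X} = (30 + 12 + 25 + 20 + 23)/5 = 22\%$. $\text{MAD} = (|30-22| + |12-22| + |25-22| + |20-22| + |23-22|)/5 = (8 + 10 + 3 + 2 + 1)/5 = 4.8\%$. Interpretation: on average, each manager's return deviates from the mean return of 22% by 4.8 percentage points.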
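The MAD calculation above as a short Python sketch, using the five managers' annualized returns from the card (entered in percent).

```python
def mean_absolute_deviation(data):
    """Average absolute deviation from the arithmetic mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


returns = [30, 12, 25, 20, 23]  # annualized returns, in percent
print(mean_absolute_deviation(returns))  # 4.8
```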
83
What is relative dispersion, and what is it commonly measured with?
Relative dispersion is the amount of variability in a distribution relative to a reference point or benchmark. It is commonly measured with the coefficient of variation (CV).
84
What is the formula for the coefficient of variation (CV)?
$CV = \frac{s_X}{\bar{X}}$, where $s_X$ is the standard deviation of X and $\bar{X}$ is the average value of X.
85
What is the CV?
It measures the amount of dispersion in a distribution relative to the distribution's mean. It is used to measure risk (variability) per unit of expected return, so a lower CV is better.
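A minimal Python sketch of the CV, assuming the sample standard deviation (Python's `statistics.stdev`) and a nonzero mean. The two return series are hypothetical.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (risk per unit of return)."""
    return statistics.stdev(data) / statistics.mean(data)


# Hypothetical return series, in percent
fund_a = [10, 12, 14]  # mean 12, sample std dev 2
fund_b = [5, 10, 15]   # mean 10, sample std dev 5
print(coefficient_of_variation(fund_a))  # about 0.167
print(coefficient_of_variation(fund_b))  # 0.5
```

Under this measure, fund A carries less risk per unit of return than fund B.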
86
What is target downside deviation (or target semideviation)?
It is calculated like the standard deviation, but we choose a target value B against which to measure each outcome and include only deviations below the target in the calculation (so instead of $X_i - \bar{X}$, we use $X_i - B$): $s_{\text{Target}} = \sqrt{\frac{\sum_{X_i \le B} (X_i - B)^2}{n - 1}}$.
87
What is an important point about selecting observations for target downside deviation?
We include only observations at or below the target ($X_i \le B$) and exclude those above it. However, the denominator still counts all observations: if there are 5 observations and only 3 fall below the target, the numerator sums 3 squared deviations, but the denominator is still $n - 1 = 5 - 1 = 4$, NOT $3 - 1$.
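A minimal Python sketch of target downside deviation showing the point above: only observations at or below the target enter the numerator, while the denominator uses all $n$ observations. The returns and the 5% target are hypothetical.

```python
import math


def target_downside_deviation(data, target):
    """Only X_i <= target enter the numerator; the denominator is n - 1 over ALL observations."""
    squared_shortfalls = [(x - target) ** 2 for x in data if x <= target]
    return math.sqrt(sum(squared_shortfalls) / (len(data) - 1))


returns = [-2, 4, 6, 10, 12]  # hypothetical returns, in percent
# Only -2 and 4 fall below the 5% target: (49 + 1) / (5 - 1) = 12.5
print(target_downside_deviation(returns, 5))  # sqrt(12.5), about 3.54
```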
88
What is skewness in statistics?
Skewness refers to the extent to which a distribution is not symmetrical.
89
What does a symmetrical distribution imply?
A symmetrical distribution is shaped identically on both sides of its mean, and intervals of losses and gains exhibit the same frequency.
90
What is a positively skewed distribution?
A positively skewed distribution has outliers greater than the mean (in the upper region, or right tail) and is characterized by a relatively long upper tail: mean > median > mode.
91
What is a negatively skewed distribution?
A negatively skewed distribution has a disproportionately large amount of outliers less than the mean that fall within its lower (left) tail, and is characterized by a long lower tail: mean < median < mode.
92
How does skewness affect the mean, median, and mode of a distribution?
For a symmetrical distribution, the mean, median, and mode are equal.
93
What happens to the mean in a positively skewed distribution?
In a positively skewed distribution, the mean is greater than the median, which is greater than the mode. The mean is affected by positive outliers, and tends to be pulled upward (to the right) by their presence.
94
What is an example of a positively skewed distribution?
An example of a positively skewed distribution is that of housing prices, where the presence of high-priced outlier homes pulls the mean upward.
95
What happens to the mean in a negatively skewed distribution?
In a negatively skewed distribution, the mean is less than the median, which is less than the mode. The mean is affected by negative outliers, and tends to be pulled downward (to the left) by their presence.
96
What is an example of a negatively skewed distribution?
An example of a negatively skewed distribution is the distribution of grades in a class where a significant number of students scored very low, pulling the mean downward.
97
What are the characteristics of a positively skewed distribution?
A positively skewed distribution has a longer upper (right) tail, and outliers greater than the mean in the upper region.
98
What are the characteristics of a negatively skewed distribution?
A negatively skewed distribution has a longer lower (left) tail, and a disproportionately large amount of outliers less than the mean that fall within its lower tail.
99
How does skewness affect the median and mode in a positively skewed distribution?
In a positively skewed distribution, the mode is less than the median, which is less than the mean.
100
How does skewness affect the median and mode in a negatively skewed distribution?
In a negatively skewed distribution, the mean is less than the median, which is less than the mode.
101
What is the key to remembering how measures of central tendency are affected by skewed data?
The key is to recognize that skewness affects the mean more than the median and mode, and the mean is pulled in the direction of the skew.
102
What does the numerator of sample skewness measure?
The numerator of sample skewness measures the tendency of observations above or below the mean to be farther from the mean on average.
103
What does a positive sample skewness indicate?
A positive sample skewness indicates that the distribution is right skewed, with deviations above the mean being larger on average.
104
What does a negative sample skewness indicate?
A negative sample skewness indicates that the distribution is left skewed, with deviations below the mean being larger on average.
105
What does dividing by standard deviation cubed do to the skewness statistic?
Dividing by standard deviation cubed standardizes the skewness statistic and allows for interpretation of the skewness measure.
106
What magnitude of sample skewness is considered significant?
Sample skewness in excess of 0.5 in absolute value is considered significant and may indicate a skewed distribution.
107
What does a relative skewness of zero indicate?
A relative skewness of zero indicates that the data is not skewed.
108
What is the formula for skewness?
$S_K = \frac{1}{n} \cdot \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$, where $s$ is the sample standard deviation.
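A minimal Python sketch of the skewness formula above, using the sample standard deviation for $s$. Statistical software often applies small-sample adjustments, so its output may differ slightly from this formula; the data sets are hypothetical.

```python
import statistics


def sample_skewness(data):
    """(1/n) * sum((X_i - mean)^3) / s^3, with s the sample standard deviation."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return sum((x - mean) ** 3 for x in data) / n / s ** 3


print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric: 0.0
print(sample_skewness([1, 1, 1, 1, 10]))  # one large upper outlier: positive, above 0.5
```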
109
What is kurtosis?
It is a measure of the degree to which a distribution is more or less peaked than a normal distribution.
110
What is leptokurtic?
It describes a distribution that is more peaked than a normal distribution, with more returns clustered around the mean and more returns with large deviations from the mean (i.e., fatter tails).
111
What is platykurtic?
It describes a distribution that is less peaked, or flatter, than a normal distribution. It has fewer returns clustered around the mean and fewer returns with large deviations from the mean (thinner tails).
112
What is mesokurtic?
It describes a distribution that has the same kurtosis as a normal distribution.
113
What does excess kurtosis greater than 0 indicate?
Excess kurtosis (kurtosis − 3) greater than 0, i.e., kurtosis greater than 3, indicates a leptokurtic distribution (more peaked, fat tails).
114
What does excess kurtosis less than 0 indicate?
Excess kurtosis less than 0 (kurtosis less than 3) indicates a platykurtic distribution (less peaked, thin tails).
115
What does excess kurtosis equal to 0 indicate?
Excess kurtosis equal to 0 (kurtosis equal to 3) indicates a mesokurtic distribution (the same kurtosis as a normal distribution).
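A minimal Python sketch of excess kurtosis and the classification rule, assuming the same form as the skewness formula with fourth powers: $K = \frac{1}{n} \cdot \frac{\sum (X_i - \bar{X})^4}{s^4}$, then subtracting 3. Software packages often use small-sample adjustments; the data sets are hypothetical.

```python
import statistics


def excess_kurtosis(data):
    """(1/n) * sum((X_i - mean)^4) / s^4, minus 3 (the kurtosis of a normal distribution)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    kurtosis = sum((x - mean) ** 4 for x in data) / n / s ** 4
    return kurtosis - 3


def classify(excess):
    """Map excess kurtosis to the three labels on the cards."""
    if excess > 0:
        return "leptokurtic"
    if excess < 0:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                          # evenly spread, no tails
fat_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # most values at the mean, two extremes
print(classify(excess_kurtosis(flat)))          # platykurtic
print(classify(excess_kurtosis(fat_tailed)))    # leptokurtic
```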
116
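What is the formula for covariance?
$s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$, where $\bar{X}$ and $\bar{Y}$ are the means of the variables, $X_i$ and $Y_i$ are observations of the variables, and $n$ is the number of observations (periods).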
117
What does correlation measure?
It measures the strength and direction of a linear relationship between two random variables.
118
What does $\rho_{XY} = +1$ indicate?
It indicates perfect positive correlation.
119
What does $\rho_{XY} = 0$ indicate?
It indicates that there is no linear relationship, so Y cannot be predicted on the basis of X using linear methods.
120
What does $\rho_{XY} = -1$ indicate?
It indicates perfect negative correlation.
121
What is the formula for correlation?
$\rho_{XY} = \frac{s_{XY}}{s_X s_Y}$, where the numerator $s_{XY}$ is the covariance and the denominator is the product of the standard deviations of each variable.
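A minimal Python sketch of the covariance and correlation formulas, using the $n - 1$ denominator from the covariance card. It reuses the covariance of a variable with itself as its variance; the data sets are hypothetical, chosen to show correlations of +1, −1, and 0.

```python
import math


def sample_covariance(x, y):
    """sum((X_i - X_bar)(Y_i - Y_bar)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of X with itself = variance of X
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


x = [1, 2, 3, 4]
print(sample_covariance(x, [2, 4, 6, 8]))  # about 3.333
print(correlation(x, [2, 4, 6, 8]))        # perfect positive: about +1
print(correlation(x, [8, 6, 4, 2]))        # perfect negative: about -1
print(correlation(x, [1, 3, 3, 1]))        # no linear relationship: 0.0
```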
122
What is spurious correlation?
It refers to correlation that is either the result of chance or present due to changes in both variables over time that are caused by their association with a third variable.
123
What is an empirical probability?
It is established by analyzing past data (outcomes).
124
What is an a priori probability?
It is determined using a formal reasoning and inspection process (not data).
125
What is a subjective probability?
It is the least formal method of developing probabilities (personal judgment).
126
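How do we calculate the odds for an event when its probability is given?
Odds for A $= \frac{P(A)}{1 - P(A)}$. For example, if $P(A) = 0.125$, the odds for A are $0.125/0.875 = 1/7$, or 1 to 7.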
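A short Python sketch of the odds conversion, using `fractions.Fraction` to keep the arithmetic exact. The inverse conversion, probability = odds / (1 + odds), is included as a check; the function names are illustrative.

```python
from fractions import Fraction


def odds_for(p):
    """Odds for an event: P(A) / (1 - P(A))."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse conversion: odds / (1 + odds)."""
    return odds / (1 + odds)


p = Fraction(1, 8)                            # P(A) = 0.125, kept exact
print(odds_for(p))                            # 1/7, i.e., odds of 1 to 7
print(probability_from_odds(Fraction(1, 7)))  # back to 1/8
```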
127
What is an unconditional probability?
It is also known as a marginal probability and refers to the probability of an event occurring regardless of the occurrence of other events.
128
What is a conditional probability?
It is the probability that an event occurs GIVEN that another event has occurred: $P(A \mid B)$.
129
What is the addition rule of probability (and its formula)?
It is used to determine the probability that at least one of two events will occur: $P(A \text{ or } B) = P(A) + P(B) - P(AB)$.
130
What is a joint probability?
It is the probability that both events will occur: $P(AB) = P(A \mid B) \times P(B)$.
131
If the events in the addition rule are mutually exclusive, what happens to the formula?
It becomes $P(A \text{ or } B) = P(A) + P(B)$. We remove the $-P(AB)$ term because mutually exclusive events cannot occur together, so $P(AB) = 0$.
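A minimal Python sketch that checks the joint probability, addition rule, and mutually exclusive case by counting outcomes. It uses a hypothetical example, one roll of a fair six-sided die, with `Fraction` for exact probabilities.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))  # one roll of a fair six-sided die


def prob(event):
    """A priori probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))


A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

p_a_given_b = prob(A & B) / prob(B)  # P(A | B) = 2/3
p_ab = p_a_given_b * prob(B)         # joint: P(AB) = P(A | B) * P(B) = 1/3
p_a_or_b = prob(A) + prob(B) - p_ab  # addition rule: 2/3
assert p_a_or_b == prob(A | B)       # matches counting {2, 4, 5, 6} directly

C, D = {1}, {2}                      # mutually exclusive: cannot both occur
assert prob(C & D) == 0
print(prob(C) + prob(D))             # simplified rule: 1/3
```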