Quantitative methods (legacy PREREQ1-LM1) Flashcards

Question

How do we calculate the present value of a series of unequal cash flows using the NPV function on a financial calculator?

Answer 1

1. First clear the calculator by pressing 2nd CF 2nd CE/C in order. Your screen will then say CF0. This is the cash flow at the beginning.
2. In years where there is no value, press 0 and then the down arrow.
3. In years where there is a value, type the amount, press Enter, and press the down arrow twice.
4. At the last cash flow, press the down arrow once and then hit NPV.
5. I will be displayed. This is the discount rate. Type a number, then press Enter and the down arrow.
6. Then press NPV and CPT. Your value will be displayed.

Answer 2

It's almost the same number of keystrokes.
You could simply divide each amount by (1 + the discount rate) raised to the power of the number of years of discounting, e.g.,
10 000 / (1.04) + 20 000 / (1.04)^2 + 30 000 / (1.04)^3 = PV
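A minimal Python sketch of the same discounting, assuming the cash flows arrive at the end of years 1, 2, 3 and so on. The function name is made up for illustration.

```python
def present_value(cash_flows, rate):
    """Discount end-of-year cash flows (year 1, 2, ...) back to today."""
    return sum(cf / (1 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


# Figures from the card: 10 000, 20 000, 30 000 discounted at 4%
pv = present_value([10_000, 20_000, 30_000], 0.04)
print(round(pv, 2))
```

For the card's figures this comes out at roughly 54 776.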

Answer 3

r = (FV/PV)^(1/N) - 1
E.g., let's say future value = 2 000 000, present value = 450 000, N = 20 years:
(2 000 000 / 450 000)^0.05 - 1 = 0.077 = 7.7%
By contrast, if FV = 1 500 000, PV = 550 000, N = 25 years:
(1 500 000 / 550 000)^0.04 - 1 = 0.041 = 4.1%
We can also use this to determine the growth rate per year of a financial metric found on a company's financial statements.
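The same rate formula as a short Python sketch. The function name is illustrative; the inputs are the two examples on the card.

```python
def annual_growth_rate(pv, fv, years):
    """Compound annual rate that grows pv into fv over the given years."""
    return (fv / pv) ** (1 / years) - 1


rate_a = annual_growth_rate(450_000, 2_000_000, 20)
rate_b = annual_growth_rate(550_000, 1_500_000, 25)
print(f"{rate_a:.1%}", f"{rate_b:.1%}")
```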

Answer 4

Solve for N:
FV = PV (1+r)^N
(1+r)^N = FV/PV
N ln(1+r) = ln(FV/PV)
N = ln(FV/PV) / ln(1+r)
In this case:
N = ln(500/100) / ln(1 + 0.1)
N = 16.89
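A quick check of the logarithm step in Python, using the card's figures (PV = 100, FV = 500, r = 10%). The function name is just for illustration.

```python
import math


def periods_to_grow(pv, fv, rate):
    """Number of compounding periods for pv to reach fv at the given rate."""
    return math.log(fv / pv) / math.log(1 + rate)


n_periods = periods_to_grow(100, 500, 0.10)
print(round(n_periods, 2))
```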

Answer 5

You can simply use the annuity formula!
If PV = A [(1 - 1 / (1 + r)^N) / r]
then A = PV / [(1 - 1 / (1 + r)^N) / r]
Make sure to modify the periodicity by dividing the interest rate by 12 and multiplying N by 12:
A = 500 000 / [(1 - 1 / (1 + 0.04/12)^(12*20)) / (0.04/12)]
A = 500 000 / 165.022
A = £3 030 per month
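A minimal sketch of the payment calculation in Python, assuming payments at the end of each month and a simple division of the annual rate by 12, as on the card. The function and parameter names are hypothetical.

```python
def annuity_payment(pv, annual_rate, years, periods_per_year=12):
    """Level payment whose present value equals pv."""
    r = annual_rate / periods_per_year   # periodic rate
    n = years * periods_per_year         # number of payments
    annuity_factor = (1 - (1 + r) ** -n) / r
    return pv / annuity_factor


monthly = annuity_payment(500_000, 0.04, 20)
print(round(monthly))
```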

Answer 6

First let's calculate the value of the initial payments on my 28th birthday:
FV5: N=5, PMT=2000, I/Y=6.25, PV=0
CPT FV = £11,331
Then let's calculate the value of the future retirement income when I hit 53:
PV30: N=30, PMT=40 000, I/Y=6.25, FV=0
CPT PV = 536 173
Third, let's discount the value of the future retirement income back to my 28th birthday:
PV5 = PV30 / (1.0625)^25
PV5 = £117 783
Now let's see how far short I am:
117 783 - 11 331 = 106 452
The PV of my payments from 28 to 53 must therefore equal £106,452
N=25, FV = 0, I/Y = 6.25, PV = 106,452
CPT PMT = 8 526
So I would need to pay in £8 526 per year from 28 to 53 to hit my retirement goal
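The chain of calculator steps can be sketched in Python. This assumes every payment is made at the end of the year (an ordinary annuity, matching the calculator inputs above); the helper names are made up.

```python
def fv_annuity(pmt, rate, n):
    """Future value of n end-of-year payments, valued at the last payment."""
    return pmt * ((1 + rate) ** n - 1) / rate


def pv_annuity(pmt, rate, n):
    """Present value of n end-of-year payments, one period before the first."""
    return pmt * (1 - (1 + rate) ** -n) / rate


rate = 0.0625
saved_at_28 = fv_annuity(2_000, rate, 5)         # value of the initial payments
needed_at_53 = pv_annuity(40_000, rate, 30)      # value of the retirement income
needed_at_28 = needed_at_53 / (1 + rate) ** 25   # discounted back 25 years
shortfall = needed_at_28 - saved_at_28
annual_payment = shortfall / pv_annuity(1, rate, 25)
print(round(annual_payment))
```

The final payment comes out at about 8 526 per year.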

Answer 7

First let's calculate the value of the initial payments on my 33rd birthday:
FV10: N=10, PMT=4000, I/Y=8, PV=0
CPT FV = £57,946
Second, let's calculate the value of the future retirement income when I hit 53:
PV30: N=30, PMT=100 000, I/Y=8, FV=0
CPT PV = £1,125,778
Third, let's find the value of the initial savings when I retire:
FV10 x (1.08)^20 = £270 083
Fourth, let's find the difference:
1,125,778 - 270 083 = 855 695
Fifth, let's calculate what annual payment I would need to make over the last 20 years of working to reach this figure:
N=20, FV=855 695, I/Y=8, PV=0
CPT PMT = £18,699
Thus I would need to save £18,699 (nominal) every year whilst working from 33 to 53 to meet my retirement goals.

Answer 8

A collection of numbers, characters, words or text that represents FACTS or INFORMATION Thus, 1. Data is not knowledge Analysis or interpretation brought to data brings knowledge 2. Data does not have to be numbers

Answer 9

NOIR
Categorical data:
- Values that describe a quality or characteristic
- Mutually exclusive labels or groups (something cannot belong to more than one category)
Numerical data:
- Measured or counted quantities
- Quantitative
Categorical:
N for Nominal (no logical order)
O for Ordinal (has a logical order or rank, with gaps or groups of any size)
Numerical:
I for Integer (Discrete): limited to a countable number of values
R for Ratio (Continuous): can take on any value within a range

Answer 10

Cross-sectional data involves multiple observations of a particular variable at a single point in time, e.g., the stock prices of 60 companies. In this case N=60.
Time series data involves multiple observations of a particular variable for the same observational unit over time, e.g., GM's stock price over the last 60 months.
Panel data is a combination of cross-sectional and time series data. It might involve multiple observations of a particular variable (stock price of 60 companies) across a period of time (60 months). Putting time down the y axis and companies along the x axis creates a data table of panel data.

Answer 11

A particular quality or characteristic we are tracking, like stock price or height

Answer 12

The value of a specific variable. E.g., GM at $53.50 (where the variable is stock price) Tom at 93kg (where the variable is a person's mass)

Answer 13

Structured data is highly organised in a pre-defined manner, e.g., stock prices, returns, earnings per share.
Unstructured data has no organised form, e.g., news, social media posts, company filings, audio/video.
Unstructured data is also sometimes called alternative data. It can be produced by individuals, business processes (credit card transactions), or generated by sensors.
To be useful in data analysis, it must be transformed into structured data. This is what machine learning does: it adds structure to unstructured data and gets progressively better at doing this.

Answer 14

- A 1D array is a column of a spreadsheet showing observations for 1 variable. It could be cross sectional or time series data. - A 2D array is a rectangular array showing two or more variables. It is also known as a data table. It could be cross sectional or panel data.

Answer 15

The number of observations of a specific value or group of a variable. I.e., how many are there in that category. Frequency could also be relative, i.e., number in that category as a % of number in all categories. It is sorted in ascending or descending order

Answer 16

We create bins (aka non-overlapping intervals):
1. Sort data in ascending order
2. Find the range: max minus min
3. Decide on the number of intervals (k)
4. Interval width = range / k (we always round up)
Be careful when choosing k.
Too few intervals leads to too much aggregation and loss of information.
Too many results in insufficient aggregation and too much noise (i.e., only one observation in each interval).
You may have to play around with different k values to choose one that gives you the right amount of information. The ML algorithm is only as good as the data you give it.
Interval no. 1 runs from the min value to the min value + width.
When we specify intervals, a square bracket means the interval includes the value adjacent to it; a round bracket means it does not. E.g., (0,5] will include 5 but not 0.

Answer 17

1. Arrange the data in ascending order
2. Subtract the lowest value from the highest to get the range
3. Let k = a chosen number of bins. Divide the range by k to get the bin size
4. Sort the data into the bins and count the number falling into each.
If the data is too concentrated (i.e., a majority falls into one bin) or too spaced out (i.e., most bins have nothing or only one data point in them), adjust k accordingly.
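The four steps can be sketched in Python. This is a rough illustration under a few assumptions of mine: the width is rounded up to a whole number, each bin includes its lower edge but not its upper edge (except the last, which includes the maximum), and the data are not all identical. The sample data are invented.

```python
import math


def frequency_distribution(data, k):
    """Return (lower, upper, count) for k equal-width bins."""
    values = sorted(data)                      # step 1: ascending order
    low, high = values[0], values[-1]
    width = math.ceil((high - low) / k)        # steps 2-3: range / k, rounded up
    counts = [0] * k
    for v in values:                           # step 4: count per bin
        index = min(int((v - low) // width), k - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(k)]


sample = [1, 3, 4, 7, 8, 9, 12, 15, 18, 20]    # hypothetical observations
table = frequency_distribution(sample, 4)
for lower, upper, count in table:
    print(lower, upper, count)
```

Rerunning with a different k is a cheap way to see whether the bins are too concentrated or too sparse.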

Answer 18

A sequence of partial sums that sum to N or 100% So when you move from one bin to the next you add all those in previous bins (i.e., when going from bottom to top value)

Answer 19

Summarises data for 2 or more categorical variables Helps us visually find patterns A 2 way table will have 2 variables One variation might see 3 bins for small, mid, and large market capitalisation. These labels will be shown across the top Then down the y axis we would see the categories of different sectors (nominal data) like communication services, consumer staples, energy etc. That way we could see where the concentration is within our data set. Along the right hand side and bottom we might see totals to compare Every entry in the table is called a joint frequency You could also express each item as a percentage of row total, column total, or overall total for comparability We can pull a lot of information about a portfolio just by breaking it down and using a contingency table like this

Answer 20

1. Confusion matrix
Used to help assess the precision of a classification model, i.e., in ML in Level 2.
2. Identify potential association between 2 categorical variables
For example, we can use a contingency table to help conduct a "chi square test of independence".
We would develop 2 tables: one where we just input actual (observed) values (i.e., low or high risk across, growth or value stock down), and one where we write down what we would expect to see in each cell if the variables were independent.
Then we would take the sum of [(observed - expected)^2 / expected] over all cells.
The greater the chi-squared value, the stronger the evidence that there is an association between the tested variables.
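A small Python sketch of the chi-square statistic. It assumes the usual expected count under independence (row total x column total / overall total), which the card does not spell out, and the 2x2 table is invented for illustration.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent (assumption)
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = growth / value, columns = low / high risk
chi2 = chi_square_statistic([[20, 30], [30, 20]])
print(chi2)
```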

Answer 21

Used to present distributions of numerical data. Useful when we want to compare to a normal distribution or a lognormal distribution that has well defined properties, and to look for kurtosis and skewness

Answer 22

Created from joining together the tops of the histogram bars, giving you an understanding of the distribution without calculations. Can also be in the form of cumulative frequency. Adds from low to high. Can see where the most observations are

Answer 23

For categorical rather than numerical data. Can be horizontal or vertical, stacked showing decomposition, or grouped when there are 2 variables (including a nominal and a numerical observation)

Answer 24

A set of coloured rectangles used to represent groups. Area = % of that group. We can have nested rectangles to decompose.

Answer 25

Depicts the frequency of unstructured data, i.e., text. Colour can be used to display sentiment or simply to distinguish between words

Answer 26

- Line chart. Used to visualise ordered observations. Typically used for time series data, to show changes and underlying trends. We could add other characteristics by also adding bubbles (i.e. EPS along stock price), in colours to show positive or negative EPS

Answer 27

Used to visualise joint variation in 2 numerical values. There may be no relationship, linear relationship, or non-linear relationship. A scatter plot matrix can be used to assess pairwise association between many variables. Many scatter plots will be laid next to one another so you can spot if there are any trends

Answer 28

A contingency table with colour coded cells
Can be generated in a BB terminal
Can also be used to visualise the degree of correlation among different variables

Answer 29

Decide whether you are looking to explore/present a relationship, distribution, or comparison If a relationship, look toward scatter plots, heat maps If a distribution of numerical data, look at histograms, freq polygons, and cumulative distribution charts if a distr of categorical data, look at bar charts, tree and heat maps If a distr of unstructured data, use a word cloud If a comparison among categories, try a bar chart, tree map, or heat map If a comparison over time, try a line chart Heat maps are particularly versatile!

Answer 30

1. Selecting an improper chart type that hinders accurate interpretation of the data 2. Selecting data that favours a particular partisan conclusion 3. Truncating the range of data (so you don't get the full picture)

Answer 31

A measure that specifies where data are centred. Examples include: - Arithmetic mean - Median - Mode - Weighted mean - Geometric mean - Harmonic mean The geometric mean is used a lot at level 3 and quite a lot at level 2. To really understand the Black-Scholes options pricing model and how Z-values are generated, you need to understand it as well

Answer 32

Population is everything that we might want to look at. When we talk about population we are referring to parameters that describe data. Mean, Standard Deviation (a measure of dispersion) A sample might only be the last two years. Sample statistics include x bar and s (these are descriptive statistics). S is the measure of dispersion for a sample. We use greek letters for populations Later on we'll look at inferential statistics where we can say something about the population based on the sample

Answer 33

The classic mean. Sum of all the values you have divided by the number of values
You could have a cross sectional mean (i.e., average sales of 50 companies)
Or a time-series mean (average sales for GM over 10 years)
When you sum all the deviations of values from the arithmetic mean you should get 0
We can calculate:
- variance by taking the deviations squared
- skew by taking the deviations cubed
- kurtosis by taking the deviations to the power of 4

Answer 34

The disadvantage of the arithmetic mean is that it is highly sensitive to outliers.
The AM of 1, 2, 3, 4, 5, 6, 1000 is 145.86. This is not representative of any value in the set!
Options:
1. Do nothing: the AM may be appropriate if the value is legitimate and correct. It may contain meaningful information.
2. Use a trimmed mean: exclude a small % of the lowest and highest values. E.g., a 5% trimmed mean would delete the top 2.5% and bottom 2.5% of the data.
3. Replace outliers with another value: we can use a winsorized mean. A 95% winsorized mean would be one where the top 2.5% of values are replaced by the 97.5th percentile value (the value below which all the remaining observations lie) and the bottom 2.5% are replaced by the 2.5th percentile value (the value above which all the remaining observations lie).

Answer 35

The median is the middlemost value of a set of observations. If there are 11 values the median is the 6th, if there are 10 then we find the mean of the 5th and 6th. Pros: - not affected by extreme values (outliers). - It is thus useful for describing central tendency for non symmetrical distributions - It can also describe symmetrical distributions. In a perfectly symmetrical one mean = median

Answer 36

The mode is the most frequently occurring value in a distribution.
Unimodal: only 1 value is most frequent
Bimodal: two values share the highest frequency
Trimodal: three values do
And we can keep going.
There will be no mode if there is no value that occurs more frequently than any other. This would be a uniform distribution.
Pros:
- Only measure of central tendency that can be used with nominal (non-numerical, non-ordered) data.
For a symmetrical distribution, mode = median = mean

Answer 37

Weighted mean is often used in calculating the return of a portfolio, or the expected return given a set of asset classes and weightings on those asset classes.
We write it out as x-bar sub-w.
When there is equal weighting, the weighted mean equals the arithmetic mean.
When a weight is greater than 0 in a portfolio context we have a long position.
When a weight is less than 0 in a portfolio context we have a short position.
WM weights can also be probabilities. We can multiply the probabilities of bullish, neutral, and bearish scenarios by the expected return in each scenario to calculate the expected return of the S&P500, for example.

Answer 38

GM is used with rates of change over time or to compute growth rates. You can use GM to find compound growth rates whereas using AM over multiple periods would not actually tell you the compound growth rate. Thus AM is fine for 1 period. But if across multiple periods we use GM. I.e., to calculate CAGR of a company's sales based on Y1 and Y6 sales. GM is calculated by taking the nth root of all the values multiplied together, where n is the number of values. GM is thus always less than or equal to AM. It is only equal to AM if all the values are the same. AM and GM diverge, with GM getting relatively smaller, as variability increases When we calculate returns we add 1+% return so we don't get neg values I.e., +5% becomes 1.05, -6% becomes 0.94. We then subtract 1 at the very end to get a % GM can also be calculated as e^(ln(multiplied values)/n)
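A minimal Python sketch of the geometric mean return, using the card's +5% and -6% example. The function name is illustrative.

```python
import math


def geometric_mean_return(returns):
    """Add 1 to each return, take the nth root of the product, subtract 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


returns = [0.05, -0.06]
gm = geometric_mean_return(returns)
am = sum(returns) / len(returns)
print(f"GM = {gm:.4%}, AM = {am:.4%}")
```

The GM comes out slightly below the AM, as expected when the returns are not all the same.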

Answer 39

GM is also written as x-bar sub-g
AM is also written as x-bar sub-a
GM ~= AM - (sigma^2) / 2
GM is approximately the AM minus the variance of the observations / 2 (variance being SD squared)
We have to multiply all of this by t if over multiple periods

Answer 40

The AM upside down.
Instead of sum of observations / n, we invert to n / sum of observations.
Then, instead of the sum of the observations, we use the sum of the reciprocals (1 / each observation):
x-bar sub-h = n / sum of (1 / x sub-i)
The end result is that this gives much less weight to large outliers.
It is appropriate for averaging ratios when the ratios are repeatedly applied to a fixed quantity to yield a variable number of units, i.e., dollar cost averaging.
You are buying a variable number of units each month for a fixed amount of money. You can calculate the average price paid in a much more concise fashion using HM.
If you buy 1000 worth of shares each month for 2 months, with the share price first at 10 and then at 15:
x-bar sub-h = 2 / ((1/10) + (1/15)) = 12
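A short Python check that the harmonic mean matches the long-hand average price, assuming 1000 is spent in each of the two months. The names are made up for illustration.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


prices = [10, 15]
spend_per_month = 1000

# Long-hand: total money spent divided by total shares bought
shares_bought = sum(spend_per_month / p for p in prices)
average_price = spend_per_month * len(prices) / shares_bought

hm = harmonic_mean(prices)
print(hm, average_price)
```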

Answer 41

AM is used when including all values including outliers (so maybe when we think the outliers are important) GM is used when compounding is involved HM is used to avoid outliers When variance increases, the means spread out, such that: AM > GM > HM However, AM x HM ~= GM^2 This tendency is stronger with lower variance but can be very accurate

Answer 42

First find the location of the percentile by doing (n+1) x (y/100) Where y is the percentile we are looking for And n is the number of values in our data set Our data must be sorted by size. When location ends up as an integer value, we are done When location is a decimal, we use interpolation So for example if location ends up as 6.8, we multiply the difference between location 6 and 7 by 0.8. Then we add this value to location 6. The larger the dataset the more accurate the percentile value is
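The location-and-interpolation rule can be sketched in Python. Clamping to the smallest or largest value when the location falls outside the data is my own assumption, and the data set is invented.

```python
def percentile(data, y):
    """Value at the y-th percentile using location = (n + 1) * y / 100."""
    values = sorted(data)
    n = len(values)
    location = (n + 1) * y / 100
    whole = int(location)              # 1-based position below the location
    fraction = location - whole
    if whole < 1:                      # assumption: clamp at the minimum
        return values[0]
    if whole >= n:                     # assumption: clamp at the maximum
        return values[-1]
    below, above = values[whole - 1], values[whole]
    return below + fraction * (above - below)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # hypothetical sorted observations
print(percentile(data, 50), percentile(data, 68))
```

With nine observations the 68th percentile sits at location 6.8, so the result lies 0.8 of the way between the 6th and 7th values.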

Answer 43

The whiskers show the highest and lowest value The box shows interquartile range There will be a line through it to show median and often an x or dot to show arithmetic average It can be used to - rank performance of portfolios and investment managers in terms of the percentile or quartile in which they fall - Perform investment research looking at the bottom and top return decile

Answer 44

Additional information sometimes included in a boxplot.
We find the upper fence by multiplying the IQR by 1.5 and adding it to Q3.
We find the lower fence by multiplying the IQR by 1.5 and subtracting it from Q1.

Answer 45

Rank performance of portfolios and investment managers in terms of the percentile or quartile in which they fall In investment research we can find the bottom return decile for example and take a short position, and take a long position on the top return decile. This is something that a hedge fund might do. This is also a typical way to isolate a factor as discussed in L3

Answer 46

Captures the variability around the central tendency
Measures of absolute dispersion include range, mean absolute deviation, variance, and standard deviation

Answer 47

1. Range: max value less min value
This uses only two observations and tells us nothing about the shape of the distribution, however, so its simplicity cuts both ways. It could simply be the range between two outliers, or return the same range for wildly different skew, kurtosis, or variance
2. Mean absolute deviation: we take the absolute value of each deviation from the arithmetic mean, sum them, and divide by n
Because we take absolute values rather than positive and negative deviations, they don't cancel out
All the observations have equal weight. Observations further from the mean should arguably have more weight, which is why we turn to...
3. Variance (s^2 or sigma^2) and standard deviation (s or sigma)
We use s when talking about a sample and the Greek letter when talking about a population
Variance is just standard deviation squared
Sample variance is the sum of the squared deviations from the arithmetic mean over (n-1)
The square root of all of this is the s.d.
We use n-1 because if we have n observations we can actually only take n-1 as random: the nth is constrained (we can calculate it using the n-1 observations plus the mean)
Therefore we do not have n independent observations, we have n-1
Thus we actually lose 1 degree of freedom
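A minimal Python sketch of the three dispersion measures, with an invented data set. The function names are illustrative.

```python
import math


def mean_absolute_deviation(data):
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


def sample_variance(data):
    mean = sum(data) / len(data)
    # Divide by n - 1 because one degree of freedom is lost to the mean
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)


def sample_std(data):
    return math.sqrt(sample_variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical observations, mean = 5
print(max(data) - min(data))           # range
print(mean_absolute_deviation(data))
print(sample_variance(data), sample_std(data))
```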

Answer 48

sd is expressed in the same units as a mean variance is more difficult to interpret since it's in the units squared However, variance can still be useful: GM ~= AM - S^2 /2 S^2 x t = geometric variance S sqrt(t) = geometric sample standard deviation This last one is important - you see it in level 2 in the black scholes merton model

Answer 49

It is a measure of dispersion below the target figure As an investor we are not concerned with volatility on the upside but rather only downside risk Thus we can set a target and find variance below this target We calculate S sub-target as: sqrt( (sum of squared deviations of all x-sub i that are less than B) / n-1) We use the n of the whole dataset not just those that are less than B. This means when we change B, only the numerator changes, not the denominator Thus when we change the target the measure of target deviation changes satisfyingly as well This is technically because figures that are above the target are simply entered as a deviation of 0. So technically they are counted in this formula, just as 0.
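A short Python sketch of the target semideviation, with made-up returns and a target B of 0%. Note that the denominator uses n for the whole data set, as described above.

```python
import math


def target_semideviation(data, target):
    """Square root of squared shortfalls below target over (n - 1)."""
    n = len(data)                      # whole data set, not just shortfalls
    squared_shortfalls = sum((x - target) ** 2 for x in data if x < target)
    return math.sqrt(squared_shortfalls / (n - 1))


returns = [-0.05, 0.02, 0.08, -0.01, 0.03]   # hypothetical returns
downside = target_semideviation(returns, 0.0)
print(round(downside, 4))
```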

Answer 50

Coefficient of variation
CV = S / x-bar
Sample standard deviation over the arithmetic mean, where x-bar > 0
For returns, CV measures the risk per unit of return
It allows for direct comparison of dispersion across different datasets (different orders of magnitude)

Answer 51

Empirical probabilities are based on historical observation. The past is assumed to be representative of the future (not necessarily true). The historical period must include occurrences of the event Subjective probabilities involve adjusting an empirical probability based on an intuition or experience. We may do this when there is a lack of empirical observations, or to make a personal assessment A priori probabilities involve arriving at a conclusion based on deductive reasoning. I.e., if a die has 6 sides the probability of rolling 6 is 1/6. This is perhaps the most objective method of estimation

Answer 52

2 ways. If there is a 10% chance:
Odds: 1 to 9 chance (for every 1 occurrence we expect 9 non-occurrences), or 9 to 1 chance of non-occurrence.
Probability: 1 in 10 chance (out of 10 instances we expect 1 occurrence)

Answer 53

Unconditional probability is P(A)
Conditional probability is P(A│B): the probability of A occurring given B
We could also illustrate this as a Venn diagram. Conditional probability is the intersection between the A circle and the B circle, taken as a share of the B circle. Unconditional probability would just be the value of the whole A circle.
The multiplication rule means that P(AB) = P(A│B) x P(B)
Also therefore P(A│B) = P(AB) / P(B)

Answer 54

A and A complement together cover every outcome: the probability of A + the probability of not A
Therefore P(A) + P(A complement) = 1

Answer 55

P (A or B) = P (A v B) = P(A) + P(B) - P(AB) Since we are double counting otherwise

Answer 56

2 events are independent iff: P (A│B) = P(A) or P(B│A) = P(B): Knowing B tells us nothing about A A dependent event is where P(A) is related to P(B) e.g., A = stock Q rises B = SP500 rises A is most likely dependent on B

Answer 57

A combination is the number of ways of selecting r objects from n where order does not matter
nCr = (n choose r) = n! / ((n - r)! x r!)
A permutation is the number of ways of selecting r objects from n where order does matter
nPr = n! / (n - r)!
So we don't divide by r! for permutations, since there are more possibilities
A recombining lattice is like a probability tree that joins up
This is used for permutations
And in FA in asset price moves
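A quick Python check of the two formulas against the standard library's `math.comb` and `math.perm` (Python 3.8+). The values n = 5 and r = 2 are just an example.

```python
import math


def combinations(n, r):
    """n! / ((n - r)! x r!): order does not matter."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))


def permutations(n, r):
    """n! / (n - r)!: order matters."""
    return math.factorial(n) // math.factorial(n - r)


n, r = 5, 2
print(combinations(n, r), math.comb(n, r))
print(permutations(n, r), math.perm(n, r))
```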

Answer 58

Specifies the probabilities associated with the possible outcomes of a random variable

Answer 59

Uniform, binomial, normal, lognormal, Student's (named after a person called Student!), chi-square, or F-distribution Most distributions will look like one of these 7 So when we see a distribution we can say it is an "approximately normal" or "approximately chi square" distribution This is useful because each of these common distributions has well-defined mathematical properties, which we can then use to analyse and interpret our data

Answer 60

A random variable is a quantity whose future outcomes are uncertain. It can be either
- Discrete: takes on at most a countable number of possible values (possibly infinite)
- Continuous: cannot count the possible values
Every random variable is associated with a probability distribution that describes the variable completely

Answer 61

Specifies the probabilities of the values that a random variable can take
For discrete variables we would use p(x)
For continuous variables we would use the probability density function
The probability function has two key properties:
1. 0 =< p(x) =< 1 (any given probability must lie between 0 and 1 inclusive)
2. The sum of p(x) over all values of x equals 1. That is, if you add up all the probabilities given by the probability function they should add to 1

Answer 62

Cumulative distribution function
Gives the probability that a variable X is less than or equal to a particular value x
Can be used for percentile rank, for example
It is a slope that goes from 0 to 1 (0 to 1 being on the y axis)

Answer 63

All outcomes are equally likely The probability distribution is a rectangle Thus length x width = 1 It will look like stairs of equal height and width as a CDF

Answer 64

The same as a discrete uniform distribution but with a continuous random variable Also a rectangular probability distribution An even slope upwards as a cumulative distribution function

Answer 65

One based on the outcome of a trial which produces one of two outcomes (binomial outcomes), interpreted as 1 or 0 p(1) = p p(0) = 1 - p In n trials, we can have 0 to n successes If each trial is a random variable, then the number of successes in n trials is also a random variable, known as a binomial random variable

Answer 66

The number of successes in n Bernoulli trials
Assumptions:
1. p is constant for all trials
2. Trials are independent
A binomial random variable has a distribution completely described by 2 parameters: x ~ B(n, p)
- To find how many ways there are to get x successes in n trials we can use nCr, because the order doesn't matter
- When we ask how probable it is to have one particular sequence with x successes in n trials we can do: p^x (1 - p)^(n - x)
- We multiply nCr by this to get the probability function for a binomial random variable:
p(x) = n! / ((n - x)! x!) * p^x (1 - p)^(n - x)
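The binomial probability function as a minimal Python sketch. The parameters (4 trials, p = 0.5) are chosen only for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.5
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(distribution)
```

Summing the list gives 1, which is the second key property of any probability function.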

Answer 67

If we continue counting up past the mid point of the probability distribution we would misinterpret it
Such that we would deduce that achieving the top figure has a 100% chance
We have to count in the direction from the centre toward the tail

Answer 68

For Bernoulli, mean = p, variance = p (1 - p)
For Binomial, mean = np, variance = np (1 - p)

Answer 69

The sum (or mean) of a large number of independent random variables with finite variance is approximately normally distributed
Let's say we take a whole bunch of samples of random variables that are not related to each other and find their means
The distribution of these means will be approximately normal
The central limit theorem tells us that because of this result a lot of data tends to be normally distributed

Answer 70

A distribution where we have set the mean to 0 and standard deviation to 1 We may want to standardise our values (if they fall into an approximately normal distribution) and turn it into a standard normal distribution to allow data processing (using things like ML) and cross comparison

Answer 71

We use a normal distribution to model continuously compounded asset returns We do not use it to model asset prices because the left tail of a nd goes to negative infinity, whereas asset prices go to 0 Asset returns are approximately normally distributed, so we can use nd to model ("close enough") However asset returns tend to be more kurtotic than normal (longer tails), and options add skew (pos/neg) There is a lot more of this at L3

Answer 72

A normal distribution has these 3 characteristics:
1. Described by 2 parameters, mu and sigma squared (population variance). The notation is X ~ N(mu, sigma squared)
2. Skew = 0 and kurtosis = 3 (K sub-c = 0). Therefore mean = median = mode
3. A linear combination of 2 or more normal random variables is also normally distributed. So R sub-p = w sub-1 R sub-1 + w sub-2 R sub-2 + w sub-3 R sub-3 .... is also nd, although it is multivariate. Each of these terms is a univariate random variable

Answer 73

1. All the mean returns of all the individual securities (n returns) 2. All the securities' variances (n variances) 3. All pairwise correlations. There are (n^2 - n)/2 unique correlations Usually in PM we do this at the asset class level, because if we did this at the level of individual securities it could quickly become unmanageable

Answer 74

95% = 1.96 standard deviations for a normal distribution 99% = 2.58 standard deviations for a normal distribution

Answer 75

We need to set the mean to 0 and the standard deviation to 1
Let's say we have a distribution of n=30
Our mean is 4.7
Our standard deviation is 3
For each observation, we calculate z (the standardised value) as:
z = (x sub-i - x-bar) / sigma
z = (7.2 - 4.7) / 3 = 0.8333

Answer 76

If we have a z-value and Excel it's easy
We can use the function NORM.S.DIST(z, 1)
Where z is the z-value
And 1 means that we use the cumulative distribution function rather than the probability density function
The output will be a probability from 0 to 1
If we want a z-value out, we can use NORM.S.INV(probability).
This can output any z-value from negative infinity to infinity, but most z-values fall between -3 and 3.
z-values are negative for probabilities below 0.5; because the normal distribution is symmetrical, NORM.S.INV(p) = -NORM.S.INV(1 - p)

Answer 77

NORM.S.INV(0.95) will return the z-value for the 95th percentile (about 1.645)
NORM.S.INV(0.05) = -NORM.S.INV(0.95) returns the z-value for the 5th percentile

Answer 78

We can express this as: P(12% =< R sub-p =< 20%) We calculate the z-value as: z = (x sub-i - x-bar) / sd so z = ((20-12)/22) = 0.3636 and z = ((12-12)/22) = 0 Then to find the probabilities we use the NORM.S.DIST function and subtract one from the other: NORM.S.DIST(0.3636, 1) - NORM.S.DIST(0,1)
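The same calculation can be sketched in Python with the standard library's `statistics.NormalDist`, whose `cdf` method plays the role of NORM.S.DIST(z, 1). The mean of 12 and standard deviation of 22 are taken from the z-value working above; the function name is hypothetical.

```python
from statistics import NormalDist


def probability_between(low, high, mean, sd):
    """P(low <= X <= high) for a normally distributed X."""
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    standard_normal = NormalDist()     # mean 0, standard deviation 1
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


prob = probability_between(12, 20, mean=12, sd=22)
print(f"{prob:.1%}")
```

This gives a probability of roughly 14%.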

Answer 79

Student's t distribution has fatter tails than the normal distribution (excess kurtosis / leptokurtic)
Therefore if something is significant in a t-test it will definitely be significant in a z-test

Answer 80

Sample size minus 1 (or n - 1) As degrees of freedom increases the tails of the t distribution are pulled in and added to the head, such that it converges to a normal distribution over n=200 Thus theoretically we would use a t test for small n values (below 200) and a z-test or normal distribution test for values above 200 However in practice we just use t really

Answer 81

z = (x-bar - mu) / (sigma / sqrt(n))
where mu and sigma are population parameters and sigma / sqrt(n) is the standard error. As such, only 1 estimate (x-bar) is used
t = (x-bar - mu) / (s / sqrt(n))
where x-bar and s are sample statistics. As such, 2 estimates are used
T-tests are used for hypothesis testing since they are more conservative, more stringent, and produce wider confidence intervals

Answer 82

A distribution of variance
The interesting thing about variance is you can't have a negative value, because it's deviations squared. Like the lognormal, it is bounded below by 0.
Variance follows a very particular distribution, depending on the number of parameters used to arrive at the distribution.
The distribution of variances is: the sum of the squares (of deviations) of k independent standard normally distributed random variables.
Degrees of freedom is n - 1, same as the t-distribution
Because variance cannot be negative, the distribution flattens out. And with low degrees of freedom (2, 3) the distribution gets pushed up against the y-axis
As such, as degrees of freedom increase, the distribution becomes more symmetrical and bell-shaped (though flattening)

Answer 83

Bounded below by 0 like the chi square distribution
Because it is the ratio of 2 chi square variables, each divided by its degrees of freedom
F = ((chi square sub1) / (n sub1 - 1)) / ((chi square sub2) / (n sub2 - 1))
By convention, the larger figure is used as the numerator on top
The F test is used in regression to test the significance of the whole regression. It is explained variance divided by unexplained variance. The higher the number is, the better the model is at explaining total variance.

Answer 84

CHISQ.DIST(chi squared value, degrees of freedom, 1) Use 1 to specify the cumulative distribution function. Input is a chi square value, output is a probability. CHISQ.INV(p, degrees of freedom) Input is a probability, output is a chi square value.

Answer 85

T.DIST(t-value, degrees of freedom, 1) Use 1 to specify cumulative distribution function. Input is a t-value, output is a probability T.INV(p, degrees of freedom) Input is a probability, output is a t-value

Answer 86

F.DIST(F-value, df for numerator, df for denominator, 1) Input an f value and the degrees of freedom of the two variables, output is a probability F.INV(p, df1, df2) Input a probability and degrees of freedom for the variables, output is an f-value

Answer 87

A formula used to estimate a population parameter (e.g., variance)

Answer 88

1. Unbiasedness: an unbiased estimator is one whose expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate. An unbiased estimator would be one where xbar = sum of xsubi / n. By contrast, xbar = sum of xsubi / (n - 1) would be biased upwards, because dividing by n - 1 instead of n scales every estimate of the mean up by a factor of n / (n - 1).
2. Efficiency: an unbiased estimator is efficient if no other unbiased estimator has a sampling distribution with smaller variance. A more efficient estimator will have a taller head and thinner tails (even though both are unbiased).
3. Consistency: a consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as the sample size increases. For example, if our estimation of standard error was SE = s / sqrt(n), this would be a consistent estimator, because as n increases standard error should decrease.
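The unbiasedness example can be checked by simulation: average each estimator over many samples and compare with the true mean. This is a sketch under made-up assumptions (a normal population with mean 100 and standard deviation 15, samples of 10); all names are hypothetical.

```python
import random

def mean_divide_by_n(xs):
    return sum(xs) / len(xs)

def mean_divide_by_n_minus_1(xs):
    return sum(xs) / (len(xs) - 1)

def average_estimate(estimator, mu=100.0, sigma=15.0, n=10, trials=5000, seed=3):
    """Average the estimator over many samples to approximate its expected value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += estimator(sample)
    return total / trials

print(average_estimate(mean_divide_by_n))          # close to the true mean, 100
print(average_estimate(mean_divide_by_n_minus_1))  # close to 100 * 10 / 9, about 111
```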

Answer 89

A range for which one can assert with a given probability (1 - alpha), called the degree of confidence, that it will contain the parameter it is intended to estimate. I.e., lower limit <- xbar -> upper limit. This is a two-sided confidence interval.

Answer 90

An estimate for what a parameter is

Answer 91

1. Probabilistic: in repeated sampling, 95% (for example) of such CIs will in the long run include or bracket the population mean.
2. Practical: we are 95% confident that a given CI contains the population mean.
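The probabilistic interpretation can be illustrated by building many intervals from repeated samples and counting how often they bracket the true mean. The sketch assumes a normal population with known sigma, so a z-based interval applies; the function names and parameter values are made up.

```python
import math
import random
from statistics import NormalDist

def z_interval(sample, sigma, confidence=0.95):
    """Two-sided interval: x_bar +/- z * sigma / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def coverage(mu, sigma, n, trials, seed=1):
    """Share of repeated-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma)
        hits += lower <= mu <= upper
    return hits / trials

print(coverage(mu=5.0, sigma=2.0, n=25, trials=4000))  # should be close to 0.95
```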

Answer 92

Take the point estimate (xbar). Add or subtract the reliability factor, multiplied by the standard error. The reliability factor can be based on a z value or a t value. The standard error is sigma / sqrt(n), or s / sqrt(n) if you only have sample variance. If you multiply reliability factor x standard error by 2 you get the width of the confidence interval, as it is plus minus.

Answer 93

Reliability factors (z): 90% confidence interval: 1.65; 95%: 1.96; 99%: 2.58
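These reliability factors can be reproduced from the standard normal inverse CDF. A minimal sketch using Python's standard library; the function name is hypothetical.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """Two-sided z reliability factor: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(reliability_factor(confidence), 3))
```

To three decimals the values are 1.645, 1.960 and 2.576, which round to the figures on the card.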

Answer 94

z, because as sample size increases t converges to z, i.e., if n = 400 we would just use z. The reading tends to say that over n = 30 we would stop using t, but over 200 or 300 is where they converge. A "large sample size" is not really 50. You can never be WRONG when using the t value because of the convergence.

Answer 95

=T.INV(probability, degrees of freedom) gives you the t value or the negative t value

Answer 96

- You have to know the population variance - The population has to be either normally distributed OR your sample large

Answer 97

The confidence interval is xbar +/- ( t x s/sqrt(n) ). Let's call the plus-minus term E, so E = t x s/sqrt(n). The width of the confidence interval will be 2E. Thus we can rearrange to: n = [ (t x s) / E]^2. We would not expect the standard deviation of the sample to change as n changes, but we would expect the standard error to change.
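A quick worked version of the rearranged formula. The inputs (reliability factor 1.96, s = 15, E = 2) are invented for illustration, and the result is rounded up because n must be a whole number.

```python
import math

def required_sample_size(reliability_factor, s, margin_of_error):
    """n = [(reliability factor x s) / E]^2, rounded up to a whole observation."""
    return math.ceil((reliability_factor * s / margin_of_error) ** 2)

# (1.96 x 15 / 2)^2 = 14.7^2 = 216.09, so 217 observations are needed.
print(required_sample_size(1.96, 15.0, 2.0))
```

Note that a t-based reliability factor itself depends on n through the degrees of freedom, so with t the calculation may need to be repeated once n is known.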

Answer 98

The bias of searching a data set for statistical patterns or relationships. This is also known as data mining. If alpha = 5%, testing 100 different variables will, on average, produce 5 significant relationships by chance alone. Data snooping is typically not theory-driven, and lacks an economic rationale behind it.
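The "5 out of 100" figure can be illustrated by testing pure noise. In this sketch every simulated variable truly has mean 0, so any significant result is a false positive; the function name, sample size, and seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

def count_false_positives(num_tests=100, n=50, alpha=0.05, seed=7):
    """Run two-tailed z-tests of H0: mu = 0 on noise and count the rejections."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(num_tests):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0, sigma is 1
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        if abs(z) > critical:
            hits += 1
    return hits

print(count_false_positives())  # varies with the seed; about 5 on average
```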

Answer 99

1. To combat data snooping bias we must have a clear, well-formulated hypothesis. It must have an economic rationale and accompanying theory behind it.
2. We split our data set into a training data set, a validation data set, and test data.
- The training data is used to build and fit a model.
- The validation data set is used to validate and tune the model.
- The test data is used as an out-of-sample test to evaluate model fit.
If data snooping is present, the model fit in the out-of-sample test will be insignificant!
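A minimal sketch of the three-way split. The 60/20/20 proportions and the function name are assumptions for illustration, and the rows are kept in their original order on the assumption that the data is a time series (so the test set is the most recent data).

```python
def split_data(rows, train=0.6, validation=0.2):
    """Split rows, in order, into training, validation, and test sets."""
    rows = list(rows)
    n_train = int(len(rows) * train)
    n_validation = int(len(rows) * validation)
    training_set = rows[:n_train]
    validation_set = rows[n_train:n_train + n_validation]
    test_set = rows[n_train + n_validation:]  # held back for the out-of-sample test
    return training_set, validation_set, test_set

training_set, validation_set, test_set = split_data(range(100))
print(len(training_set), len(validation_set), len(test_set))
```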

Answer 100

Excluding some observations or time periods (basically choosing non-random samples). E.g., survivorship bias: historical data may only include data for companies that survived. This would overstate the performance. Another example would be using hedge fund indexes. Since they self-report, only well-performing funds may opt to report.

Answer 101

Using information that was not available on the observation date. I.e., models that use price and accounting data from the historical record, when the accounting data may not have been available on the same date. For example, we can observe the price on Dec 31st, and book value on Dec 31st, but in fact BV may not have been reported until mid February. Linking BV and price on Dec 31st would be look ahead bias.

Answer 102

Results in one time period may be specific to that time period. Time period bias is typical of SHORT time series. However, time series that are too long risk including more than one regime or distribution.

Answer 103

The process of making judgements about a larger group (population) based on a smaller group (sample). E.g., hypothesis testing: test to see whether a sample statistic is likely to come from a population with the hypothesised value of the population parameter, i.e., is xbar consistent with mu = musub0?

Answer 104

A statement about one or more populations that is tested using sample statistics.
Process:
1. State the hypothesis
2. Identify the appropriate test statistic
3. Specify the level of significance
4. State the decision rule
5. Collect data and calculate the test statistic
6. Make a decision

Answer 105

The null (Hsub0) is assumed to be true unless the evidence supports the alternative (Hsuba). Typically we WANT to reject Hsub0. So we may hypothesise that the population mean is greater than some value: Hsuba is mu > musub0. What we do is try to rule out mu =< musub0. If we are successful then we reject Hsub0, thus "proving" our hypothesis.
A two-sided (two-tailed) test is one where we can reject the null on either side. An Hsuba where mu /= 6% would be a two-tailed test. We might reject mu = 6% if the sample mean is 5.5% or less, or 6.5% or greater.
A one-sided (left- or right-tailed) test is one where we can reject the null on one side only. A one-sided test where Hsuba is mu > 6%, for example, would be a right-tailed test. A one-sided test where Hsuba is mu < 6% would be a left-tailed test.
The null will always contain the equality sign. We test at the point of equality.

Answer 106

If population variance is known (sigma squared) then we can use the z test z = [xbar - musub0] / [sigma / sqrt(n)] If population variance is unknown, we will default to a t test t = [xbar - musub0] / [s / sqrt(n)]
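A sketch of the z-test with a decision rule for two-tailed, right-tailed, and left-tailed alternatives. The function name, the `tail` argument, and the example numbers are hypothetical; a t-test would follow the same steps with s in place of sigma and a t critical value, which the standard library does not provide.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu_0, sigma, n, alpha=0.05, tail="two"):
    """Return (z statistic, True if the null is rejected at level alpha)."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "two":      # Ha: mu != mu_0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "right":  # Ha: mu > mu_0
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":   # Ha: mu < mu_0
        reject = z < std_normal.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return z, reject

# Hypothetical numbers: sample mean 6.5%, null value 6%, sigma 2%, n = 64.
print(z_test(6.5, 6.0, 2.0, 64, tail="two"))
print(z_test(6.5, 6.0, 2.0, 64, tail="left"))
```

With these inputs z = 0.5 / 0.25 = 2.0, which exceeds 1.96, so the two-tailed test rejects the null, while the left-tailed test does not.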

Answer 107

The level of significance depends on the SERIOUSNESS of making a mistake. Usually we use 5% or 1%. We might use higher in social sciences and lower in physics.
There are two types of mistake you can make:
Type 1 error (false positive): rejecting a true null. If I determine someone is pregnant when they're not.
Type 2 error (false negative): failing to reject a false null. If I determine someone is NOT pregnant when they in fact are.
We can always decrease the likelihood of a Type 1 error by decreasing alpha (the significance level). However, as alpha decreases, beta (the probability of a Type 2 error) increases. If we reduce the likelihood of Type 1 errors, we increase the chance of Type 2 errors (false negatives).
The ONLY WAY to reduce both is to increase n (sample size). This is because a larger n decreases the standard error, which is the denominator of the test statistic. Thus the t-stat becomes larger in absolute value.
To understand type 1, type 2, reject, do not reject, true hypothesis, false hypothesis, beta, and alpha, just draw out a little grid on your piece of paper. This makes things much easier.
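The trade-off between alpha and beta, and the effect of n, can be made concrete for one simple case. The sketch assumes a right-tailed z-test with known sigma and a specific true mean; the function name and all numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def type_2_error(alpha, n, mu_0, mu_true, sigma):
    """Beta for a right-tailed z-test of H0: mu <= mu_0 when the true mean is mu_true."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu_0, measured in standard errors.
    shift = (mu_true - mu_0) / (sigma / math.sqrt(n))
    return std_normal.cdf(critical - shift)

print(type_2_error(0.05, 25, 6.0, 6.5, 2.0))   # baseline
print(type_2_error(0.01, 25, 6.0, 6.5, 2.0))   # smaller alpha, larger beta
print(type_2_error(0.05, 100, 6.0, 6.5, 2.0))  # larger n, smaller beta
```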

(131 cards)