Midterm Flashcards

Question 1

Q

What are Bollinger Bands?

Answer

A

Bands at +/- 2 sigma (standard deviations) from the rolling mean. An excursion outside these bands may be a buy or sell signal, specifically when the price breaks through a band and then comes back within it: outside to inside, back toward the moving avg.
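
A minimal sketch of computing the bands with pandas. The window length of 20 and the price values are assumptions for illustration; the card only specifies the +/- 2 sigma part.

```python
import pandas as pd

def bollinger_bands(prices: pd.Series, window: int = 20):
    """Return (rolling_mean, upper_band, lower_band) for a price series."""
    rolling_mean = prices.rolling(window=window).mean()
    rolling_std = prices.rolling(window=window).std()
    upper_band = rolling_mean + 2 * rolling_std   # +2 sigma
    lower_band = rolling_mean - 2 * rolling_std   # -2 sigma
    return rolling_mean, upper_band, lower_band

# Made-up prices, for illustration only.
prices = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0, 10.6, 10.9, 11.3])
mean, upper, lower = bollinger_bands(prices, window=3)
```

The first window - 1 rows are NaN because there is not enough history yet.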

Question 2

Q

What are Daily Returns and how do you calculate them?

Answer

A

One of the most important stats.

( price[t] / price[t-1] ) - 1

It’s a percentage return for each day.

Most revealing when comparing daily returns across multiple stocks.
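
A small sketch of the formula in pandas. The tickers AAA and BBB and their prices are hypothetical.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers.
prices = pd.DataFrame({"AAA": [100.0, 102.0, 101.0],
                       "BBB": [50.0, 50.0, 55.0]})

# (price[t] / price[t-1]) - 1; the first row has no previous day, so drop it.
daily_returns = (prices / prices.shift(1) - 1).iloc[1:]
```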

Question 3

Q

What are cumulative returns?

Answer

A

Total % change in stock from the beginning to some point in time.

cum_return[t] = (price[t] / price[0]) - 1

(today’s price / price at the beginning) - 1
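
The same formula as a pandas sketch, with made-up prices:

```python
import pandas as pd

prices = pd.Series([100.0, 105.0, 110.0, 120.0])  # illustrative values

# cum_return[t] = (price[t] / price[0]) - 1
cum_returns = prices / prices.iloc[0] - 1
total_cum_return = cum_returns.iloc[-1]  # return over the whole period
```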

Question 4

Q

What methods are there to fill in missing data?

Answer

A

Fill forward: use the last known value and carry it forward until the next known value. At the beginning of the series, where there is no earlier value, fill backward.
We do this because interpolating would be like peeking into the future.
So the process is always fill forward first, and then fill backward.
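
A short sketch of the forward-then-backward order using pandas (the gaps and prices are made up):

```python
import numpy as np
import pandas as pd

# Made-up series with a gap at the start, in the middle, and at the end.
prices = pd.Series([np.nan, 10.0, np.nan, np.nan, 12.0, np.nan])

# Forward fill first (carry the last known value), then backward fill
# only to cover the leading gap that forward fill cannot reach.
filled = prices.ffill().bfill()
```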

Question 5

Q

What type of distribution do stocks typically show?

Answer

A

Roughly a gaussian distribution (for daily returns), though often with fatter tails than a true gaussian.

Question 6

Q

What is kurtosis and how is it useful?

Answer

A

It’s a comparison of your distribution of data to a gaussian distribution. A positive value indicates that your tails are ‘fat’, meaning there are more excursions at the tails than a gaussian would have. A negative value indicates fewer excursions at the tails.

Summary:

Positive value: fat tails
Negative value: skinny tails
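
A rough illustration with synthetic data. pandas reports kurtosis relative to a gaussian (a gaussian scores about 0). The Laplace and uniform samples are stand-ins chosen for this sketch, not stock data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000

normal = pd.Series(rng.normal(size=n))         # reference: kurtosis near 0
fat = pd.Series(rng.laplace(size=n))           # fatter tails than a gaussian
thin = pd.Series(rng.uniform(-1, 1, size=n))   # no tails at all

for name, s in [("normal", normal), ("laplace", fat), ("uniform", thin)]:
    print(name, round(s.kurtosis(), 2))
```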

Question 7

Q

What is alpha and beta (with respect to a linear line fit of scatterplot data)?

Answer

A

Beta - slope - how reactive is the stock?
Alpha - y-intercept. > 0 means stock on Y axis is doing better than stock on X axis, generally.

This is generally compared to SPY, or ‘the market’, but it can be compared against any other stock.

Question 8

Q

How does the slope (beta) correlate two stocks?

Answer

A

It doesn’t! Slope is just the slope. Correlation is a measure of how tightly the data fits the line (like an R squared value).

Question 9

Q

How do you calculate your portfolio value?

Answer

A

Start with DF with rows = days and columns = tickers
DF = DF / DF.iloc[0] -> normalize each column. First row is now 1.0 for each column
DF = DF * allocs -> multiply each column by its allocation. First row is now the allocations
DF = DF * start_value -> This gives the value of each stock on each day, based on the allocation (position values)
DF.sum(axis=1) -> Gives total portfolio value each day
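
The steps above can be sketched as follows. The tickers, prices, allocations and start value are all made up for illustration.

```python
import pandas as pd

# Hypothetical prices: rows = days, columns = tickers.
prices = pd.DataFrame({"AAA": [100.0, 110.0, 121.0],
                       "BBB": [50.0, 50.0, 45.0]})
allocs = pd.Series({"AAA": 0.6, "BBB": 0.4})  # assumed allocations
start_val = 1000.0                            # assumed starting cash

normed = prices / prices.iloc[0]   # first row becomes 1.0
alloced = normed * allocs          # first row becomes the allocations
pos_vals = alloced * start_val     # value of each position on each day
port_val = pos_vals.sum(axis=1)    # total portfolio value on each day
```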

Question 10

Q

What are the four key stats for a portfolio?

Answer

A

They are based on the daily returns (don’t forget to remove the first row after calc daily returns b/c it’s 0!)

Cumulative return
Avg Daily Return
Std Daily Return
Sharpe Ratio

Question 11

Q

How do you calculate the avg daily return?

Answer

A

daily_returns.mean()

Question 12

Q

How do you calculate the std daily return?

Answer

A

daily_returns.std()

Question 13

Q

What is the Sharpe ratio and how do you calculate it?

Answer

A

A metric that adjusts return for risk (eg, volatility). All else being equal:

lower risk is better
higher return is better

A higher Sharpe ratio is better.

Sharpe ratio also considers the risk free rate of return: the interest rate on your money in a risk free asset like a bank account or short term treasury bonds.

Expected(Portfolio Return - Risk Free Return) / std(Portfolio Return - Risk Free Return)

mean(daily_returns - daily_risk_free) / std(daily_returns - daily_risk_free)

B/c daily_risk_free (the shortcut) is treated as a constant, you can remove it from the std and the final equation becomes:

SR_sampled = mean(daily_returns - daily_risk_free) / std(daily_returns)

… Then you finally add in the annual adjustment factor to get:

SR_annual = sqrt(# samples per year) * SR_sampled
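
A minimal sketch of the annualized calculation, assuming a constant daily risk free rate and the sample standard deviation (ddof=1, which is what pandas’ .std() uses by default). The function name and return values are made up.

```python
import numpy as np

def sharpe_ratio(daily_returns, daily_risk_free=0.0, samples_per_year=252):
    """Annualized Sharpe ratio from sampled returns."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    excess = daily_returns - daily_risk_free
    # Subtracting a constant does not change the std, so use the raw returns.
    sr_sampled = excess.mean() / daily_returns.std(ddof=1)
    return np.sqrt(samples_per_year) * sr_sampled

returns = [0.01, -0.005, 0.002, 0.007, -0.003]  # illustrative daily returns
sr = sharpe_ratio(returns)
```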

Question 14

Q

Where can you get the values of the Risk Free Rate?

Answer

A

LIBOR
3month T-Bill
0% …

Typically the daily risk free rate is calculated as follows:

daily_risk_free = (1 + APY)^(1/252) - 1   (ie, the 252nd root of (1 + APY), minus 1)
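
As a quick sketch, with an assumed 2% annual rate (the function name is hypothetical):

```python
def daily_risk_free_rate(annual_rate, trading_days=252):
    """Daily rate that compounds to annual_rate over trading_days days."""
    return (1.0 + annual_rate) ** (1.0 / trading_days) - 1.0

daily_rf = daily_risk_free_rate(0.02)  # 2% APY is an illustrative value
```

Compounding daily_rf over 252 days gives back the annual rate, which is why the daily value is slightly less than APY / 252.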

Question 15

Q

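How many trading days are there in a year?

Answer

A

252 (the value used in the daily Sharpe ratio and daily risk free rate calculations in these cards)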

Question 16

Q

Does it matter how frequently you calculate the Sharpe ratio?

Answer

A

Yes! It will vary wildly depending on how often you sample. It was envisioned to be calculated annually.

SR_annualized = K*SR
K = sqrt(# samples per year)

So:
daily_k = sqrt(252)
weekly_k = sqrt(52)
monthly_k = sqrt(12)

It doesn’t matter how many data points you have! Only the sample frequency matters:

daily, weekly, monthly

Question 17

Q

What is a basis point?

Answer

A

1/10000

so 10 bps = 0.001

Question 18

Q

What can you do with an optimizer?

Answer

A

Find minimum values of functions
Build parameterized models based on data (eg, find the optimum parameters for a model)
Refine allocations to stocks in portfolios

Question 19

Q

What are convex problems?

Answer

A

If you can draw a line between any two points on the function and some segment of the function lies above that line between the points, it’s non-convex.

It’s only convex if all function values between any two points lie on or below the line between those two points.

For a function to be convex in this sense, it must have only one local minimum (ie, the global minimum). It should also have no flat spots.

An optimizer is not guaranteed to find the global minimum if it’s a non-convex problem.

Question 20

Q

What is instance based vs parametric learning?

Answer

A

Parametric - Finds parameters for a model (linear regression). You don’t need the original data. Training is slow, but prediction is fast.

Instance Based - KNN - you store all the data and consult it for a solution. Training is fast (you just store data), but prediction is slow.

Question 21

Q

What is backtesting?

Answer

A

You roll back time and test your system: provide some training data, then test how your system would trade, knowing what the stock prices actually did. This process repeats over and over.

Usually performance in the real world isn’t as good as what is shown in back testing.

Question 22

Q

What are the problems with regression?

Answer

A

Noisy and uncertain
challenging to estimate confidence
holding time (how long you should hold), allocation

Question 23

Q

What is out of sample testing?

Answer

A

You test on data that you did not train on.

Question 24

Q

How does cross validation work?

Answer

A

Split data into N chunks, then 1 chunk is test and the rest is train. Then swap which chunk is the test chunk. Repeat this until you’ve tested all chunks. Take an average of the results.

This isn’t necessarily a good method for financial data though, because you can be peeking into the future.

Question 25

Q

What is roll forward cross validation?

Answer

A

This is similar to cross validation, except you make sure your test data is always ahead of your training data. If you have 5 chunks, you may do something like the following:

Train 1, Test 2
Train 2, Test 3
Train 3, Test 4
Train 4, Test 5
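
A small sketch of generating those train/test pairs. The helper name and the equal-size chunking are assumptions of this example; leftover rows that don’t fill a chunk are simply dropped.

```python
def roll_forward_splits(data, n_chunks):
    """Yield (train, test) pairs where test is the chunk right after train."""
    size = len(data) // n_chunks
    chunks = [data[i * size:(i + 1) * size] for i in range(n_chunks)]
    for i in range(n_chunks - 1):
        yield chunks[i], chunks[i + 1]

days = list(range(10))                      # stand-in for time-ordered rows
splits = list(roll_forward_splits(days, 5))
```

With 5 chunks this yields 4 pairs, and every test chunk is later in time than its train chunk.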

Question 26

Q

Describe correlation

Answer

A

You’ve got XTest, YTest data. Then you make predictions on this test data. Now you calculate the correlation between YTest and YPredict to see how well correlated they are.

Values range from -1 to +1, where -1 is strong negative (inverse) correlation and +1 is strong positive correlation. Closer to 0 looks like a shotgun blast. (This is NOT the slope of a line fit to the scatter plot of YTest vs YPredict. A tight oval is high correlation and a big circle is poor correlation.)

Can use np.corrcoef or df.corr(method='pearson')
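
A quick sketch with made-up numbers, showing that correlation is not the slope: predictions that are a scaled and shifted copy of the truth still correlate perfectly.

```python
import numpy as np

y_test = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred_scaled = 0.1 * y_test + 7.0   # slope 0.1, but a perfect linear fit
y_pred_flipped = -y_test             # perfectly inverse

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
corr_scaled = np.corrcoef(y_test, y_pred_scaled)[0, 1]
corr_flipped = np.corrcoef(y_test, y_pred_flipped)[0, 1]
```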

Question 27

Q

Describe overfitting

Answer

A

When in sample error is decreasing and you start to see out of sample error increasing.

Question 28

Q

What are ensemble learners?

Answer

A

Train multiple learners on the same data set (may be different types of learners). Feed each learner the X value you want to predict, then average the output (for regression) or do a majority vote for classification.

Question 29

Q

Describe algorithm bias.

Answer

A

What an algorithm is biased toward. For instance, a linear regression learner is biased toward being linear.

Creating an ensemble learner tends to smooth out the individual biases.

Question 30

Q

What is bootstrap aggregating?

Answer

A

AKA Bag learning. With your training data, you create m new models, each with a subset of the original data (sample with replacement).

So if you’ve got 100 data points in your training set, new training set also has 100 data points, but there could be repeats from the training data since you sample with replacement.

Then you query each model, collect the outputs and the mean() is the prediction.
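
A minimal sketch of bagging, assuming a straight-line fit (np.polyfit) as the underlying learner. The learner choice, the helper names and the synthetic data are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: y = 2x + 1 plus noise.
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

def train_bag(x, y, n_models=10):
    """Train n_models line fits, each on a bootstrap sample of the data."""
    models = []
    n = len(x)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # n draws WITH replacement
        models.append(np.polyfit(x[idx], y[idx], deg=1))
    return models

def query_bag(models, x_query):
    """Query every model and average the outputs."""
    preds = [np.polyval(m, x_query) for m in models]
    return np.mean(preds, axis=0)

models = train_bag(x, y)
prediction = query_bag(models, np.array([5.0]))
```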

Question 31

Q

What is Boosting (Ada Boost; Adaptive Boosting)?

Answer

A

Adaptive Boost

Starts like bagging. Sample with replacement to create a new data set and train your first learner.

Test your model against your training set and calculate error.

Create the next new data set, sampling with replacement, but the probability of choosing a sample is now weighted by the error of the previous model’s results. This means you’re more likely to choose data points that have high error.

Now you test the current ensemble against your training set and calculate the error. Then repeat for the next bag.

So it’s like bagging except that for each new bag, you’re testing on more of the data that you don’t perform well on.

Question 32

Q

What is the adjusted close vs close?

Answer

A

Close is literally just the close, adjusted close is adjusted for things like splits and dividend payments, etc.

Question 33

Q

How do you set the index to a date range in pandas?

Answer

A

df = pd.DataFrame(index=pd.date_range(start, end))

where start, end are date strings (YYYY-mm-dd)

Question 34

Q

How do you compute a rolling metric, such as the mean?

Answer

A

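df['col'].rolling(window=N).mean()

df['col'].rolling(window=N).std()

(Older pandas versions used pd.rolling_mean(df['col'], window=N) and pd.rolling_std(df['col'], window=N); those functions have since been removed.)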

Question 35

Q

How do you calculate the daily returns in pandas?

Answer

A

dr = ( (df / df.shift(1) ) - 1).iloc[1:]

You return from the second row onward since the first row will be NaN.

You can also do:
dr = df.copy()
dr.iloc[1:] = (df.iloc[1:] / df.iloc[:-1].values) - 1
dr.iloc[0, :] = 0

Must use .values to make sure it does element wise division and doesn’t try to align on the index!

Question 36

Q

How do you fill missing values with pandas?

Answer

A

df.ffill()   (older form: df.fillna(method='ffill'))
df.bfill()   (older form: df.fillna(method='bfill'))
df.fillna(0)

Question 37

Q

How do you add horizontal and vertical lines to a plot?

Answer

A

plt.axvline(x=value, color=color, linestyle=linestyle)

plt.axhline(y=value, color=color, linestyle=linestyle)

Question 38

Q

How do you plot a histogram in pandas?

Answer

A

df['col'].hist(bins=20)

Question 39

Q

How might you use a scatter plot to compare stocks?

Answer

A

Normalize each stock, then plot one stock on each axis. Then you can look at the correlation between the stocks.

Question 40

Q

How do you do a scatterplot in Pandas?

Answer

A

df.plot(kind='scatter', x='col1', y='col2')

Question 41

Q

How do you create a linear regression in numpy?

Answer

A

np.polyfit(x_s, y_s, deg=1) -> returns (slope, intercept), ie, (beta, alpha)
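
A small sketch of reading beta and alpha off the fit. The return values are made up, and the stock is constructed with beta 1.5 and alpha 0.001 so the fit is easy to check.

```python
import numpy as np

# Made-up daily returns for the market and a hypothetical stock.
market = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
stock = 1.5 * market + 0.001

# A degree-1 fit returns the coefficients highest power first.
beta, alpha = np.polyfit(market, stock, deg=1)
```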

Question 42

Q

How do you use the spo minimizer?

Answer

A

spo.minimize(func_to_min, guess, args=(data,))

args is what gets passed to func_to_min as the constants

first arg to your func should be the things to find

to maximize something simply negate the func_to_min return value.
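
A minimal sketch of a call, assuming spo is scipy.optimize. The toy function and its offset argument are made up for illustration.

```python
import scipy.optimize as spo

def func_to_min(x, offset):
    """Toy parabola with its minimum at x = offset (minimum value 0.5)."""
    # x arrives as a 1-D array, even for a single variable.
    return (x[0] - offset) ** 2 + 0.5

# The initial guess is x = 0; offset = 1.5 is passed through args as a constant.
result = spo.minimize(func_to_min, x0=[0.0], args=(1.5,))
best_x = result.x[0]
```

result.x holds the parameters found and result.fun the function value at that point.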

Question 43

Q

What are the three broad classes of funds?

Answer

A

ETF: Exchange Traded Funds

buy/sell like stocks
represents baskets of stocks
transparent (publish their investments)
very liquid

Mutual Fund:

buy/sell at end of day only
quarterly disclosure of investments
so less transparent
(they have stated goals, eg, track some fund, etc)

Hedge Fund

buy/sell by an agreement only
no disclosure of investments
no transparency

Question 44

Q

What does liquid mean wrt stocks?

Answer

A

How easy it is to buy/sell a given stock. How easy it is to get into and out of it.

Question 45

Q

What are Large Cap stocks?

Answer

A

Market cap = # of outstanding shares * price per share

Large Cap = Large Capitalization, ie, a company with a large market cap.

Question 46

Q

How do you know what type a fund is based on its name/ticker?

Answer

A

ETF: 3-4 letters
Mutual Funds: 5 letters
Hedge Funds: Some fund name

Question 47

Q

What does AUM mean?

Answer

A

Assets Under Management - total value of the portfolio being managed.

Question 48

Q

How are fund managers compensated?

Answer

A

ETFs: Expense Ratio of 0.01 - 1.0 %
Mutual Funds: 0.5 - 3.0%
Hedge Funds: “Two and Twenty” 2% of AUM + 20% of profits

Question 49

Q

What is the expense ratio?

Answer

A

some % * AUM. How managers may be paid.

Question 50

Q

What are some metrics used to assess hedge funds?

Answer

A

Cumulative Returns
Volatility (std(daily_returns))
Risk Adjusted Reward (Sharpe Ratio)

Question 51

Q

What does a stock order look like?

Answer

A

Buy (bid)/Sell (ask), Symbol, # Shares, Limit/Market, Price

Question 52

Q

What is Limit vs Market for a stock order?

Answer

A

Market: Willing to accept the current market price (you don’t specify price for market order)
Limit: You don’t want to do any worse than the specified price.
- buy: no more than X price
- sell: no less than X price

Question 53

Q

What are the ways orders can be exchanged?

Answer

A

Buyer -> broker -> exchange
broker -> broker (doesn’t hit exchange)
broker -> dark pool -> broker (doesn’t hit exchange)

Question 54

Q

How can hedge funds exploit the market?

Answer

A

1) Order Book Exploit: colocated hedge fund systems observe the order book with a faster response time. You, a long way away, see a delayed version of the book. They can buy and sell stocks really fast. (look at the High Frequency Trading vid)
2) Geographic Arbitrage Exploit: colocated servers at different geographical exchanges. Look at the diff in stock value at the two locations. If a diff exists, sell in the high place and buy in the low.

Question 55

Q

What other types of order types exist?

Answer

A

Exchanges:
Buy Sell
Market Limit

Broker: Broker holds order until conditions are met and then fulfills it
Stop Loss: when stock drops to certain price, sell it
Stop Gain: When stock reaches certain price, sell it
Trailing Stop: combination of stop loss w/ dynamic value. Eg, 10 cents behind the price, so sell if it ever drops by > 10 cents, otherwise let it go up.
Selling Short: negative position. Borrow the stock from someone, sell it, wait for the price to drop, then re-buy the stock and give it back to the lender.

Question 56

Q

What are the different ways to evaluate the ‘True’ value of a company?

Answer

A

True Value:

Intrinsic Value
Book Value
Market Cap

Question 57

Q

What is the intrinsic value of a company?

Answer

A

As estimated by future dividends. If we own 1 share of stock, what is the value of all the dividends we will ever get into the future for that share of stock.

Question 58

Q

What is the book value of a company?

Answer

A

Value based on a company’s assets and debts.

Total assets - intangible assets - liabilities.

You subtract intangible assets b/c they are included in total assets but are not counted toward book value.

Question 59

Q

What is the Market Capitalization of a company?

Answer

A

Value of a company based on the price of its stock on the market and how many shares are outstanding. Eg, price * # of outstanding shares (# owned by people).

Question 60

Q

How do you calculate the present value?

Answer

A

Future Value / (1 + interest_rate) ^ i

where i is the number of compounding periods into the future (years, for instance).

Example: If interest rate = 1%, and you want to know what is the value of $1 that will be paid to you in 1 year, what is the present value of that dollar?

1 / (1 + 0.01)^1 = 0.99. So today that $1 is only worth $.99

That interest_rate (really a discount rate when calculating PV), could be the rate of your next best alternative. That would pay for the opportunity cost.
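
The example above as a tiny sketch (the function name is hypothetical):

```python
def present_value(future_value, rate, periods):
    """PV = FV / (1 + rate)^periods."""
    return future_value / (1.0 + rate) ** periods

# $1 paid in 1 year, discounted at 1%.
pv = present_value(1.0, 0.01, 1)
```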

Question 61

Q

What is a discount rate and how does it compare to interest rate?

Answer

A

It’s the interest rate (ir) in PV = FV / (1 + ir)^i. With respect to bonds and intrinsic value, the ir is really the discount rate. It’s higher if you trust the company less, or the company is more risky.

Interest rate is used when given a PV and you want to figure out FV.
Discount rate is used when we have a known or desired FV and want to compute the corresponding PV.

Question 62

Q

How do you calculate the intrinsic value of a company?

Answer

A

Sum the present value of every future dividend: the sum over i = 1..infinity of FV / n^i equals FV / (n - 1), where n = (1 + discount_rate). So PV = FV / discount_rate, ie, dividend / discount rate.
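
A numeric sanity check of that shortcut, assuming a constant $1 dividend and a 5% discount rate (both made up): summing the discounted dividends over many years approaches dividend / discount rate.

```python
def discounted_dividends(dividend, discount_rate, years):
    """Sum of dividend / (1 + rate)^i for i = 1..years."""
    return sum(dividend / (1.0 + discount_rate) ** i
               for i in range(1, years + 1))

dividend, discount_rate = 1.0, 0.05
closed_form = dividend / discount_rate                         # the shortcut
long_sum = discounted_dividends(dividend, discount_rate, 1000)  # ~forever
```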

Question 63

Q

What kind of news can affect stock prices?

Answer

A

Company specific news
Sector News: affects a specific sector
Market wide news

This kind of news frequently reduces the view of potential dividends and thus the intrinsic value.

Question 64

Q

How do you define a portfolio and calculate its returns?

Answer

A

The absolute values of the portfolio weights must sum to one, sum(abs(w_i)) = 1, so it still holds if you short a stock (negative weight)
return = sum(w_i * r_i(t))

Answer 64

A

The weight of an individual stock within a market is:

market_cap_i / sum(market_caps)

Answer 65

A

r_i(t) = beta_i * r_m(t) + alpha_i(t)

return of stock i at time t = beta of i * return of market at time t + alpha of i at t

beta_i * r_m(t) = market influence
alpha_i(t) = residual

When multiplied out, you get:

sum(w_i * beta_i) * r_m(t) + sum(w_i * alpha_i(t))

Answer 66

A

Expectation of alpha => 0

Answer 67

A

- passive: buy an index portfolio and hold
- active: pick stocks

Answer 68

A

You multiply each CAPM stock equation by its weight and add them all up.

Alternatively you can calculate a portfolio beta:
B_p = sum(w_i * beta_i)
Then use it as:

r_p(t) = B_p * r_m(t) + alpha_p(t), but CAPM says alpha => 0
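
A quick sketch of the portfolio beta shortcut with made-up weights and betas, taking alpha as 0 per CAPM:

```python
# Illustrative weights and betas for a hypothetical 3-stock portfolio.
weights = [0.5, 0.3, 0.2]
betas = [1.2, 0.8, 1.5]

# B_p = sum(w_i * beta_i)
portfolio_beta = sum(w * b for w, b in zip(weights, betas))

# With alpha taken as 0, the expected portfolio return is B_p * r_m.
market_return = 0.01
expected_portfolio_return = portfolio_beta * market_return
```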

Answer 69

A

- passive agrees with CAPM that alpha => 0 (expectation)
- active says alpha_p = sum(w_i * alpha_i(t))

Answer 70

A

You want beta to follow the market. So in upward markets you want large beta and in downward markets you want smaller beta.

Answer 71

A

Split up the CAPM beta into the various sectors instead of just having a ‘market’ beta.

So there can be a different beta for each sector.
beta_tech * return_tech + beta_energy * return_energy, etc.

(Not going to use this in this class)

Answer 72

A

Remove the market impact by making your sum(w_i * beta_i) = 0 so r_m doesn’t matter.

May not always work b/c beta is always changing and may not be a perfect estimate. But in theory, it wouldn’t matter which way the market goes. You’d still make yours.
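
A toy two-stock sketch of this idea. The betas, alphas and weights are made up, and this example uses weights whose absolute values sum to one. Going long the stock with beta 1.0 and short the stock with beta 2.0 makes sum(w_i * beta_i) = 0, so the market return drops out.

```python
betas = [1.0, 2.0]
alphas = [0.01, -0.01]       # assumed: stock A outperforms, stock B lags
weights = [2 / 3, -1 / 3]    # long A, short B; abs values sum to 1

def portfolio_return(market_return):
    """sum(w_i * (beta_i * r_m + alpha_i)) over the two stocks."""
    return sum(w * (b * market_return + a)
               for w, b, a in zip(weights, betas, alphas))

up = portfolio_return(0.10)     # market up 10%
down = portfolio_return(-0.10)  # market down 10%
```

Both cases give the same return, which comes only from the alpha terms.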

Answer 73

A

Looks at price and volume ONLY!!

You can compute indicators from these which are heuristics.

Answer 74

A

Looks at earnings, dividends, book value, cash flow, etc.

Answer 75

A

Individual indicators are weak
Combinations of many indicators are stronger
You look for contrasts (stocks that have different indicators)
Generally work over shorter time periods than longer time periods

Answer 76

A

Momentum - over some number of days, how has the price changed? (up or down)
Simple Moving Average (SMA) - just what it sounds like
Bollinger bands - explained elsewhere

Answer 77

A

(price[t] / price[t-n:t].mean()) - 1

price of t divided by mean of last n days.

Answer 78

A

( price[t] - SMA[t] ) / ( 2 * std[t] )

Answer 79

A

SMA: -0.5 : 0.5
Momentum: -0.5 : 0.5
BB: -1 : 1

You would normalize them because you don’t want any one field to ‘outweigh’ the others.

Answer 80

A

DT is just a single tree
DF is many trees. You query all of them to get an average return.

TODO: Lookup in code about this as well

Answer 81

A

Roll back time and then compare to known data.

Train on a set of data
Enter positions based on trained data
train on next set of data
Enter positions based on trained data
etc…

Answer 82

A

noisy and uncertain
challenging to estimate confidence
holding time and allocation (how long should you hold, and how should you allocate)

Answer 83

A

Weight the contributions of each datapoint based on their distance from the point of interest.

Answer 84

A

- In sample - testing on your training data
- Out of sample - testing on your non-training data

Answer 85

A

sqrt( sum( (y_test - y_predict)^2 ) / N )

Tends to emphasize larger error a bit more
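
The formula as a small sketch with made-up values (the function name is hypothetical):

```python
import math

def rmse(y_test, y_predict):
    """Root mean squared error: sqrt(sum((y_test - y_predict)^2) / N)."""
    n = len(y_test)
    squared_errors = ((t - p) ** 2 for t, p in zip(y_test, y_predict))
    return math.sqrt(sum(squared_errors) / n)

# Two perfect predictions and one that is off by 2.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because errors are squared before averaging, the single miss of 2 dominates the result.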

Answer 86

A

You slice it up into rolling chunks.

Train on 80%, test on 20%. Then shift by a chunk, train on 80% and then test on 20%.

Answer 87

A

You can’t peek into the future for time series, so you would only ever train on a single chunk then test on the next adjacent chunk. Then you shift until your last test chunk is at the end.

Answer 88

A

When the in sample error is decreasing while the out of sample error is increasing.

Answer 89

A

You use many different learners, have each one do a prediction, then average the total prediction.

Answer 90

A

Lower error and less overfitting. Tends to reduce the overall biases.

Answer 91

A

(AKA bootstrap aggregating)

You have many of the same learner type. You create a new data set from the training data for each learner with replacement and train that learner on the dataset. Then when you predict you have each one predict the value using their model and then average the output.

Answer 92

A

Like bagging, but each successive learner you train focuses on the data the previous model performed poorly on.
Next bag is random like bagging, but higher weights are placed on samples that you performed poorly on.
After each bag is created you have to test it on the training data to see which samples it performed poorly on.
Ada boost may be more prone to overfitting because it focuses on datapoints that aren’t predicted well in the training data.

Answer 93

A

Look for when you’ve got strong momentum and the SMA crosses over the price. This is especially true over a longer SMA window.

Answer 94

A

( price[t] / price[t-n] ) - 1

Where n is the number of days to look over.

Typical values are within the range: -0.5 to +0.5

Answer 95

A

You can compute the number of years it takes to double something via:

N years = 72 / interest rate (in percent)
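
A quick comparison of this rule of thumb against the exact doubling time, assuming the interest rate is given in percent and compounds once per year (the 8% rate is illustrative):

```python
import math

def years_to_double_rule_of_72(rate_percent):
    """Rule-of-thumb doubling time."""
    return 72.0 / rate_percent

def years_to_double_exact(rate_percent):
    """Exact doubling time with annual compounding: ln(2) / ln(1 + r)."""
    return math.log(2.0) / math.log(1.0 + rate_percent / 100.0)

approx = years_to_double_rule_of_72(8.0)
exact = years_to_double_exact(8.0)
```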

Answer 96

A

The difference between the buy and sell prices.

High trading volumes tend to indicate there is a very small spread and the market is very liquid.

Large spreads may be an indicator that the market is freezing up (low liquidity). Low volumes in general tend to produce high spreads.

Answer 97

A

When a broker issues trades in advance of those of a client, knowing the price movements that will probably occur when the client’s orders are executed.

It’s not legal to do to your own clients, but you can do it in advance of another broker’s client.

Answer 98

A

Market Cap / Book value

also: share price / book value per share

Answer 99

A

The sum of the discounted value of your money over a period of time. Example:

dividend = $1
interest rate you could get on your money: 8% (0.08)
number of years: 3

1 / (1 + 0.08) + 1 / (1 + 0.08)^2 + 1 / (1 + 0.08)^3 = $2.58

Each successive year’s dividend is worth less than the previous.

Answer 100

A

Asset based is book value. This is typically a lower bound, but during ‘bad’ times (liquidation, bad market), the actual value could be lower. It ignores future earnings.

Cash flow based relates to dividends paid into the future.

Adding the two together can show the most complete view.

Answer 101

A

book value / market cap
