Midterm Flashcards

1
Q

What are CSV files?

A

CSV - Comma Separated Values

Header Row, separated by commas

Data Rows, separated by commas
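
As a small illustration of the header row and data rows, here is a sketch that parses CSV text with pandas. The two data rows are made-up values, and `csv_text` is a hypothetical stand-in for a real file such as data/AAPL.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a real stock data file.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2010-01-04,10.0,10.5,9.8,10.2,1000,10.1
2010-01-05,10.2,10.8,10.1,10.6,1200,10.5
"""

# read_csv treats the first line as the header row by default.
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # the header fields
print(len(df))              # the number of data rows
```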

2
Q

Which field would you expect to see in a CSV file of stock data?

  • # of employees
  • Date/Time
  • Company Name
  • Price of the stock
  • Company’s hometown
A
  • Date/time
  • Price of the stock
3
Q

What does real stock data look like?

A

Header: Date, Open, High, Low, Close, Volume, Adjusted Close

Close - closing price reported at exchange

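Volume - number of shares traded that day

Adjusted Close - closing price adjusted by the data provider to account for stock splits and dividend payments. Looking back in time, the rate of return computed from Adjusted Close should be larger than the one computed from Close, because it includes dividends.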

4
Q

What is a data frame?

A

Columns represent the stock symbols. (Separate data frames can hold different kinds of data: Adj Close, Volume, Close, etc.)

Rows represent time

5
Q

What pandas code would allow you to print the first or last 5 rows of the DataFrame df?

df = pd.read_csv("data/AAPL.csv")

A

First 5 rows:

print(df.head())

Last 5 rows:

print(df.tail())

Last n rows:

print(df.tail(n))

6
Q

How do you view specific rows of a data frame between arbitrary positions, for example rows 10 to 20?

df = pd.read_csv("data/AAPL.csv")

A

print(df[10:21])

Note that the second number is excluded from the range, so 21 is needed to include row 20.
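
A minimal sketch of head, tail, and row slicing together. The 30-row frame is synthetic and only stands in for the AAPL data, so the row counts are easy to check.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data: 30 rows with the default integer index.
df = pd.DataFrame({"Close": np.arange(30, dtype=float)})

first5 = df.head()          # first 5 rows
last3 = df.tail(3)          # last 3 rows
rows_10_to_20 = df[10:21]   # positional slice; the stop value 21 is excluded

print(len(rows_10_to_20))
```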

7
Q

How do you compute the max closing price for a stock using pandas?

df = pd.read_csv("data/{}.csv".format(symbol))

A

max_value = df['Close'].max()

8
Q

How do you compute the mean volume for a symbol?

df = pd.read_csv("data/{}.csv".format(symbol))

A

mean = df['Volume'].mean()

9
Q

How would you plot the adjusted close of the following data?

df = pd.read_csv("data/AAPL.csv")

print(df['Adj Close'])

A

import matplotlib.pyplot as plt

df['Adj Close'].plot()

plt.show()

10
Q

Select the ‘High’ column from the dataframe and then plot it.

df = pd.read_csv("data/XXX.csv")

A

print(df['High'])

df['High'].plot()

plt.show()

11
Q

How do you plot two columns, such as 'Close' and 'Adj Close'?

df = pd.read_csv("data/AAPL.csv")

A

df[['Close', 'Adj Close']].plot()

plt.show()

12
Q

How many days were US stocks traded at NYSE in 2014?

A

252

13
Q

What is S&P 500 and what is SPY?

A

S&P 500 - Stock Market Index based on 500 large American companies listed on the NYSE or NASDAQ. Essentially a weighted mean of the stock prices of the companies

SPY - SPDR S&P 500 - An ETF (Exchange-Traded Fund) that tracks the S&P 500 index

14
Q

How do you create an empty data frame (df1) with a given datetime range?

start_date = '2010-01-22'

end_date = '2010-01-26'

A

dates = pd.date_range(start_date, end_date)

df1 = pd.DataFrame(index=dates)

15
Q

Using an empty data frame (df1) with a specified date range, how do you join df1 to a data frame for SPY (dfSPY)?

df1 = pd.DataFrame(index=dates)

A

First ensure that the SPY dataframe is indexed by the Date column, not by the default integer index. Additionally, ensure that NaN values are interpreted as "not a number" and not as strings.

dfSPY = pd.read_csv("data/SPY.csv", index_col="Date", parse_dates=True, na_values=['nan'])

Join the two dataframes using DataFrame.join()

df1 = df1.join(dfSPY)

16
Q

How do you drop NaN values on a data frame (df1)?

A

df1 = df1.dropna()

17
Q

How do you drop NaN values when combining two dataframes (i.e. df1 and dfSPY)?

A

df1 = df1.join(dfSPY, how='inner')

18
Q

What is the default operation for the “how” parameter in the dataframe.join function?

A

The default option is 'left', which means the calling dataframe's index is used. Every date in the calling dataframe is preserved, so any date missing from the other dataframe yields NaN values.
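
The difference between the default left join and an inner join can be sketched as follows. The SPY prices are made-up values; the date range is the one used on the earlier cards and happens to include a weekend.

```python
import pandas as pd

# Five calendar days: Friday 2010-01-22 through Tuesday 2010-01-26.
dates = pd.date_range("2010-01-22", "2010-01-26")
df1 = pd.DataFrame(index=dates)

# Made-up SPY prices for the three trading days in that range.
dfSPY = pd.DataFrame(
    {"SPY": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2010-01-22", "2010-01-25", "2010-01-26"]),
)

left = df1.join(dfSPY)                 # default how="left": keeps all 5 dates
inner = df1.join(dfSPY, how="inner")   # keeps only dates present in both

print(left)
print(inner)
```

With the left join, the weekend rows appear with NaN; calling dropna() on that result gives the same rows as the inner join.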

19
Q

How can you read multiple stocks into one dataframe even though their files contain the same column names?

A

symbols = ['GOOG', 'IBM', 'GLD']

for symbol in symbols:

    df_temp = pd.read_csv("data/{}.csv".format(symbol), index_col='Date', parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['nan'])

    # Rename the column to the symbol so the joined columns do not clash

    df_temp = df_temp.rename(columns={'Adj Close': symbol})

    df1 = df1.join(df_temp)

20
Q

In a dataframe (df) containing multiple symbols,

how would you drop dates in which SPY did not trade?

A

if symbol == 'SPY':    # inside the loop that joins each symbol

    df = df.dropna(subset=['SPY'])

21
Q

How do you select the slice of data covering 2010-02-13 to 2010-02-15 and only GOOG and GLD?

A

df = df.loc['2010-02-13':'2010-02-15', ['GOOG', 'GLD']]

(Older code uses df.ix for this; .ix was removed in pandas 1.0, so use .loc for label-based selection.)

22
Q

What is the best way to normalize price data so that all prices start at 1.0?

A

df1 = df1 / df1.iloc[0]

OR, in older pandas only (.ix was removed in pandas 1.0):

df1 = df1 / df1.ix[0]

23
Q

Slice and plot SPY and IBM over the date range '2010-03-01' to '2010-04-01'

A

import matplotlib.pyplot as plt

def plot_data(df, title="title"):

    ax = df.plot(title=title, fontsize=2)

    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    plt.show()

start_index = '2010-03-01'

end_index = '2010-04-01'

columns = ['SPY', 'IBM']

plot_data(df.loc[start_index:end_index, columns], title="title")

24
Q

How do you normalize a dataframe df?

A

df = df / df.iloc[0, :]
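
A minimal sketch of normalization on made-up prices. It uses `.iloc[0]` to take the first row by position; the symbols and values are illustrative only.

```python
import pandas as pd

# Made-up prices for two symbols over three days.
df = pd.DataFrame(
    {"SPY": [100.0, 110.0, 105.0], "GLD": [50.0, 55.0, 60.0]},
    index=pd.date_range("2010-01-04", periods=3),
)

# Divide every row by the first row, column by column,
# so each column starts at 1.0.
normed = df / df.iloc[0]
print(normed)
```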

25
26
How do you return the number of rows in an array, a?
a.shape[0]
27
How do you return the number of columns in an array, a?
a.shape[1]
28
How do you get the number of items in an array, a?
a.size
29
How do you get the sum of elements of an array, a?
a.sum()
30
How do you get the sum of each column of an array, a?
a.sum(axis = 0)
31
How do you get the sum of each row of an array, a?
a.sum(axis =1)
32
How do you get the location of the maximum value of an array, a?
a.argmax()
33
In an array a, how would you select all rows of every other column, stopping before column index 3?
a[:, 0:3:2]
  • : selects every row
  • 0 starts at the first column
  • 3 stops before column index 3 (the fourth column)
  • 2 takes every second column
34
How do you index an array, a, with another array, b?
a = np.random.randint(10, size=5)    # for example, a = [7, 6, 8, 5, 9]
indices = np.array([1, 1, 2, 3])
a[indices]    # returns [6, 6, 8, 5] for the example values
35
How would I access all elements in this array that are > 5? a = np.array([1, 6, 5, 3, 8])
a[a > 5]
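
The three NumPy indexing styles on the cards above (strided slicing, indexing with an array of positions, and boolean masking) can be sketched together. The 3x4 array `a2d` is an assumed example; the 1-D arrays reuse the values from the cards.

```python
import numpy as np

# Strided slice: all rows, columns 0 and 2 (stop index 3 is excluded).
a2d = np.arange(12).reshape(3, 4)
every_other = a2d[:, 0:3:2]

# Index array: pick the elements at positions 1, 1, 2, 3.
a = np.array([7, 6, 8, 5, 9])
indices = np.array([1, 1, 2, 3])
picked = a[indices]

# Boolean mask: keep only the values greater than 5.
b = np.array([1, 6, 5, 3, 8])
above_five = b[b > 5]

print(every_other, picked, above_five)
```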
36
How do you compute the daily returns of a dataframe, df?
Daily return is the change in price relative to the previous day: (price today / price yesterday) - 1.
daily_returns = df.copy()
daily_returns[1:] = (df[1:] / df[:-1].values) - 1
daily_returns.iloc[0, :] = 0    # the first row has no previous day
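
An equivalent way to compute daily returns is to divide by a shifted copy of the frame. This is a sketch on made-up prices, not necessarily the formulation the course uses.

```python
import pandas as pd

# Made-up prices: up 2% on day 2, down 2% on day 3.
df = pd.DataFrame(
    {"SPY": [100.0, 102.0, 99.96]},
    index=pd.date_range("2010-01-04", periods=3),
)

# shift(1) moves each price down one row, so each row is divided
# by the previous day's price.
daily_returns = df / df.shift(1) - 1
daily_returns.iloc[0, :] = 0   # the first day has no previous day

print(daily_returns)
```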
37
What is a Bollinger band?
A way of quantifying how far a stock price has deviated from some norm.
38
Where are the Bollinger bands?
2 standard deviations above and below the rolling mean. When the price crosses below the lower band, this could indicate a buy signal. When the price crosses above the upper band, this could indicate a sell signal.
39
How do you calculate Bollinger bands?
upper band = rolling mean + 2 * rolling std
lower band = rolling mean - 2 * rolling std
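
A sketch of the band calculation with pandas rolling windows. The price series is synthetic, and the 5-day window is an assumption, since the cards do not specify a window length.

```python
import numpy as np
import pandas as pd

# Synthetic prices rising from 100 to 119 over 20 days.
prices = pd.Series(
    np.linspace(100.0, 119.0, 20),
    index=pd.date_range("2010-01-04", periods=20),
)

window = 5  # assumed window length
rolling_mean = prices.rolling(window=window).mean()
rolling_std = prices.rolling(window=window).std()

upper_band = rolling_mean + 2 * rolling_std
lower_band = rolling_mean - 2 * rolling_std

print(upper_band.tail(3))
print(lower_band.tail(3))
```

The first window - 1 values of each band are NaN because a full window is not yet available.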
40
What is an ETF?
An ETF or Exchange-Traded Fund is a basket of equities allocated in such a way that the overall portfolio tracks the performance of a stock exchange index. ETFs can be bought and sold on the market like shares.
41
How do you fill in missing data in a dataframe?
df = df.fillna(method='ffill')
df = df.fillna(method='bfill')
Fill forward first, then fill backward. df.ffill() and df.bfill() are equivalent shorthands.
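
A small sketch of why the order matters: the forward fill carries the last known value into later gaps, and the backward fill then handles only the gap at the very start. The column name FAKE and its values are made up.

```python
import numpy as np
import pandas as pd

# Made-up series with a leading gap, a middle gap, and a trailing gap.
df = pd.DataFrame({"FAKE": [np.nan, 10.0, np.nan, np.nan, 12.0, np.nan]})

# Forward fill first, then back fill what is still missing at the start.
filled = df.ffill().bfill()
print(filled["FAKE"].tolist())
```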
42
What is kurtosis?
It describes the tails of a distribution: how much they differ from the tails of a Gaussian distribution. Positive kurtosis means more occurrences in the tails than a Gaussian would predict (fat tails).
43
How would you draw a scatter plot of 'SPY' and 'GLD' data?
df.plot( kind = 'scatter' , x = 'SPY' , y = 'GLD' )
44
How do we fit a polynomial of degree 1 to a graph?
beta, alpha = np.polyfit(dailyret['SPY'], dailyret['XOM'], 1)
plt.plot(dailyret['SPY'], beta * dailyret['SPY'] + alpha)
plt.show()
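
To see what np.polyfit returns for degree 1, here is a sketch on synthetic returns. The stock series is constructed as an exact line (slope 1.5, intercept 0.001), an assumption made so the fitted coefficients can be checked; real returns would be noisy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for market (e.g. SPY) daily returns.
market = rng.normal(0.0, 0.01, size=250)

# Stock returns built as an exact line so the fit is recoverable.
stock = 1.5 * market + 0.001

# For degree 1, polyfit returns [slope, intercept].
beta, alpha = np.polyfit(market, stock, 1)
print(beta, alpha)
```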
45
How do you find the correlation on a dataframe?
df.corr(method='pearson')
46
How do you calculate the daily portfolio value?
1) normalize prices: normed = prices / prices.iloc[0]
2) apply allocations: alloced = normed * allocs
3) determine position values: pos_vals = alloced * start_val
4) determine portfolio values: port_val = pos_vals.sum(axis = 1)
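
The four steps can be sketched on made-up data. The prices, the 60/40 allocation, and the starting value are all assumptions chosen so the result is easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Made-up prices: SPY rises 10%, GLD falls 10%.
prices = pd.DataFrame(
    {"SPY": [100.0, 110.0], "GLD": [50.0, 45.0]},
    index=pd.date_range("2010-01-04", periods=2),
)
allocs = np.array([0.6, 0.4])   # assumed allocation, one weight per column
start_val = 1_000_000           # assumed starting portfolio value

normed = prices / prices.iloc[0]   # 1) each column starts at 1.0
alloced = normed * allocs          # 2) scale each column by its allocation
pos_vals = alloced * start_val     # 3) value of each position per day
port_val = pos_vals.sum(axis=1)    # 4) total portfolio value per day

print(port_val)
```

On day two the portfolio is worth 0.6 * 1.1 + 0.4 * 0.9 = 1.02 times the starting value.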
47
What is the Sharpe ratio?
Risk-adjusted return. All else being equal, lower risk is better and higher return is better. SR also considers the risk-free rate of return.
48
What is the formula for the Sharpe ratio?
SR = (Rp - Rf) / StdDev
  • Rp - portfolio return
  • Rf - risk-free rate of return
  • StdDev - standard deviation of portfolio return
As expected values: SR = E[Rp - Rf] / std[Rp - Rf]
With daily data: SR = mean[daily_rets - daily_rf] / std[daily_rets - daily_rf]
Using the shortcut of treating daily_rf as a constant: SR = mean[daily_rets - daily_rf] / std[daily_rets]
49
How do you convert the annual risk-free rate into a daily amount?
daily_rf = (1.0 + annual risk-free rate) ^ (1/252) - 1, i.e. the 252nd root of (1.0 + risk-free rate), minus 1
50
What do you do if the SR varies with how often you sample?
Consider SR an annual measure: SR_annualized = K * SR, where K = sqrt(#samples per year).
For daily samples: SR = sqrt(252) * mean(daily_rets - daily_rf) / std(daily_rets)
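
A sketch of the annualized Sharpe ratio from daily returns. The five daily returns and the 2% annual risk-free rate are made-up inputs. Note that np.std uses the population standard deviation by default, whereas pandas .std() uses the sample version, so the two give slightly different results.

```python
import numpy as np

# Made-up daily returns and an assumed 2% annual risk-free rate.
daily_rets = np.array([0.001, 0.002, -0.001, 0.003, 0.0005])
annual_rf = 0.02

# Daily rate that compounds to the annual rate over 252 trading days.
daily_rf = (1.0 + annual_rf) ** (1.0 / 252) - 1

# Annualize with K = sqrt(252) for daily samples,
# treating daily_rf as a constant in the denominator.
sharpe = np.sqrt(252) * np.mean(daily_rets - daily_rf) / np.std(daily_rets)
print(daily_rf, sharpe)
```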
51
How do you limit an optimizer to useful data?
Ranges - limits on X
Constraints - properties that must be true
52
What is an optimizer?
An algorithm that can:
  • Find minimum values of functions
  • Build parameterized models based on data
  • Refine allocations to stocks in portfolios
53
How do you use an optimizer?
1) Provide a function to minimize
2) Provide an initial guess
3) Call the optimizer
54
What is the python library to optimize a function?
scipy.optimize
import scipy.optimize as spo
min_result = spo.minimize(func, Xguess, method="SLSQP", options={'disp': True})
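
A runnable sketch of the three steps (function, initial guess, call). The objective `f` is a toy parabola chosen for illustration, with its minimum at x = 1.5, and `x_guess` is an arbitrary starting point.

```python
import numpy as np
import scipy.optimize as spo


def f(x):
    """Toy objective with its minimum at x = 1.5, where f = 0.5."""
    x = np.atleast_1d(x)[0]   # the optimizer passes x as a 1-element array
    return (x - 1.5) ** 2 + 0.5


x_guess = 2.0  # arbitrary initial guess
min_result = spo.minimize(f, x_guess, method="SLSQP", options={"disp": False})

print(min_result.x, min_result.fun)
```

The result object exposes the minimizing input as `min_result.x` and the function value there as `min_result.fun`.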
55
What is a convex function?
A real-valued function f(x) defined on an interval is called convex if the line segment between any two points on the graph of the function lies on or above the graph.
56
How do you build a parameterized model?
Figure out what you are minimizing. Minimize the error
57
What are the types of funds?
ETFs - buy/sell like stocks, baskets of stocks, transparent
Mutual funds - buy/sell at end of day, quarterly disclosure, less transparent
Hedge funds - buy/sell by agreement, no disclosure, not transparent
58
What is liquid?
The ease with which one can buy or sell shares in a holding. ETFs are liquid.
59
What is large cap?
How much the company is worth in terms of #shares x price; "large cap" means that value is large. The price of the stock alone only tells you what one share is selling at.
60
How can you tell what type a fund is?
ETFs - ticker of 3 or 4 letters
Mutual funds - ticker of 5 letters
Hedge funds - identified by name
61
How are the managers of these funds compensated? ETFs, Mutual Funds, Hedge Funds
ETFs - expense ratio as a percentage of AUM (0.01% to 1%), tied to an index
Mutual Funds - expense ratio (0.5% to 3%)
Hedge Funds - "two and twenty" (2% of AUM and 20% of profits)
AUM - Assets Under Management, the total amount of money being managed by the fund.
62
What types of investors use hedge funds?
Individuals
Institutions - retirement funds, university foundations
Funds of funds - group together funds of individuals or institutions
63
What are hedge fund goals and metrics?
1) Beat a benchmark
2) Absolute returns
64
What is an order?
  • Buy or sell
  • Symbol
  • Number of shares
  • Limit or market (market means accept the current market price; limit means no worse than a specified price)
  • Price (needed for a limit order)
65
How do orders get to the exchange?
Directly: you -> broker -> exchange
Within one broker: you -> broker <- joe (the broker matches two of its own clients)
Through a dark pool: you -> broker -> dark pool <- broker2 <- lisa
66
What are broker order types?
Stop loss
Stop gain
Trailing stop
Selling short
67
What is the value of a future dollar?
PV = FV / (1 + IR)^i
PV - present value
FV - future value
IR - interest rate
i - number of periods (e.g. years) into the future
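
A worked instance of the present value formula. The function name `present_value` and the inputs (100 dollars, 5% rate, 1 and 2 years) are illustrative assumptions.

```python
def present_value(fv, ir, i):
    """Present value of fv received i periods from now at interest rate ir."""
    return fv / (1 + ir) ** i


# 100 dollars received one year from now, at a 5% interest rate.
pv_one_year = present_value(100.0, 0.05, 1)

# The same payment two years out is worth less today.
pv_two_years = present_value(100.0, 0.05, 2)

print(round(pv_one_year, 2), round(pv_two_years, 2))
```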
68
What is the difference between the interest rate and discount rate?
Interest rate is used with a given present value to figure out what the future value would be. Discount rate is used when we have a known or desired future value and want to compute the corresponding present value.
69
What is the intrinsic value of a company?
FV / DR
Future Value / Discount Rate, where FV is the payment the company makes each period (for example, its dividend).
70
What's the value? Dividend = d, Discount Rate = dr
d / dr
71
What is book value?
Total assets minus intangible assets and liabilities
72
What is market capitalization?
# of shares * price
73
What is a portfolio?
A weighted set of assets. Wi is the portion of funds in asset i; the sum of the absolute values of the weights is 1.0.
74
What is the equation for the return on a portfolio?
The sum, over all assets i, of weight * return: sum of (Wi * Ri)
75
What is the market portfolio?
An index that covers a large portion of stocks (US: S&P 500). An index can be thought of as the "ocean": when it moves, for example during a malaise, most stocks move with it. Indexes are cap weighted: the weight of each stock is its market cap / sum of all market caps.
76
What is the CAPM equation?
The return for a stock on day t is equal to beta times the return on the market on day t, plus alpha on that day.
Ri(t) = Bi * Rm(t) + Ai(t)
Beta component - market (the slope)
Alpha component - residual (the y-intercept)
CAPM says that alpha is expected to be 0.
77
What is CAPM vs Active Management?
Passive - buy index and hold
Active - pick stocks (over/under weight stocks)
78
What is the difference between CAPM and Active investors?
CAPM says that alpha is random and E[alpha] = 0. Active managers believe they can predict alpha, at least better than a coin flip.
79
What are the implications of CAPM?
The only way to beat the market is by choosing beta.
The expected value of alpha is 0.
The Efficient Markets Hypothesis says you can't predict the market.
80
What is Arbitrage Pricing Theory (APT)?
We ought to consider multiple betas. Beta for different sectors
81
Why do stocks split?
The price is too high
82
What are the problems with regression based forecasting?
1) Noisy and uncertain forecasts
2) Challenging to estimate confidence
3) Holding time, allocation
83
Pros and cons of parametric vs non-parametric learners
Parametric: training is slow, querying is fast.
Non-parametric: training is fast, querying is slow; can capture complex patterns with no underlying model.
84
What is cross validation?
Splitting data into many chunks to create different train/test splits. It does not work well with financial data because the data is ordered in time: training on chunks that come after the test chunk lets the model peek into the future.
85
What is overfitting?
When in-sample error is decreasing and out-of-sample error is increasing