Bot Code Knowledge Flashcards Preview


Flashcards in Bot Code Knowledge Deck (95):
1

What is getopt? 

What is its syntax?

It is a "command line option parser".

It parses an argument sequence such as sys.argv[1:] and returns two things: a list of (option, value) pairs and a list of the remaining non-option arguments.

Syntax: opts, args = getopt.getopt(['-a', '-bval', '-c', 'val'], 'ab:c')

Because it returns two values, you assign the result to two names.
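A minimal sketch of the call above, using the same argument list. The names opts and args are only a common convention, not part of the API.

```python
import getopt

# 'ab:c' declares -a and -c as plain flags and -b as an option that takes a value.
opts, args = getopt.getopt(['-a', '-bval', '-c', 'val'], 'ab:c')

print(opts)  # list of (option, value) pairs
print(args)  # leftover non-option arguments
```

Since -c takes no value in 'ab:c', the trailing 'val' ends up among the non-option arguments.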

2

What is numpy? 

Numpy is the core library for scientific computing. 

Numpy provides a high-performance multidimensional array object. Alongside this, it gives tools to work with these arrays. 

It allows you to get the same sort of functionality as Matlab. 

3

What is pip? 

It is the preferred installer program for modules in Python.

Since Python 3.4, pip has been included by default with Python.

Almost all packages that you hear of will be available with pip install.

 

"PIP is a package manager for Python packages, or modules if you like."

"A package contains all the files you need for a module."

"Modules are Python code libraries you can include in your project."

4

How do you create your own modules in python? 

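Modules are simply Python scripts that are imported into another script.

First, write up your function and save it in a file ending in .py. Then, from another script in the same folder, import it by the file name without the .py extension.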
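A rough sketch of the idea, assuming a hypothetical module called my_helpers with a made-up function double. The file is written to a temporary folder here only so the example is self-contained; normally you would save the .py file next to your script by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, chosen only for this sketch.
module_source = "def double(x):\n    return 2 * x\n"

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "my_helpers.py").write_text(module_source)
    sys.path.insert(0, folder)   # make the folder importable
    my_helpers = importlib.import_module("my_helpers")  # same as: import my_helpers
    sys.path.remove(folder)

result = my_helpers.double(21)
```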

5

How do you write to a file? 

First of all you need to have 'w' as the mode in your file open line.

Then use .write() to add your text. *Note: opening a file with 'w' clears anything that was in it beforehand; to add to a file you need to use append, i.e. 'a', in the open line.*

Within the brackets of .write() you place the text you want to write.
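The difference between 'w' and 'a' can be sketched as follows. The file name notes.txt and the temporary folder are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical file inside a throwaway folder.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:   # 'w' empties the file, then writes
    f.write("first line\n")

with open(path, "a") as f:   # 'a' keeps the existing content and appends
    f.write("second line\n")

with open(path) as f:
    contents = f.read()
```

Using with closes the file automatically, so the text is flushed to disk before it is read back.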

6

What does CSV stand for? 

Comma-separated values

The delimiter determines what separates the values. It doesn't have to be a comma; it can be pretty much any character.

7

What is pandas? 

"Pandas is a python software library for data manipulation and analysis. "

"Pandas is a python package providing fast, flexible, and expressive data structures designed to make working with 'relational' and 'labeled' data both easy and intuitive."

8

What is matplotlib? 

It is a plotting library/package.

9

What is BeautifulSoup?

It is a Python library for pulling data out of HTML and XML files.

10

How do you permanently set the index for a data frame?  

To set the index you use .set_index("Desired Index")

To make it permanent you need to add another parameter in the brackets, inplace, and set it to True. Alternatively, assign the result back to the dataframe name.

11

How would you access a single column from a dataframe? 

In the same way you would get the values in a dictionary. 

dataframe_name['Desired_Column_Name']

Or 

dataframe_name.Desired_Column_Name (attribute access, so no brackets at the end)

12

How do you convert a dataframe column into a list? 

What do you need to remember?

dataframe_name.column_name.tolist()

 

Only works on one column at a time.

13

If you want to print two columns of a dataframe what should you do?

You can extract the columns by referencing both columns with respect to the dataframe, within double square brackets. 

Alternatively, you could convert the dataframe into an array using np.array(dataframe[['Column1', 'Column2']]). 

14

When reading in a csv file through .read_csv() how can we define the index column? 

Within the brackets, set index_col to 0 (or to whichever column you want as the index).

15

What does .to_csv() do ? 

It writes a dataframe in Python to a csv file.

 

Before the full stop you place the dataframe you want to convert into a csv file, and within the brackets you give the file name to write to.

dataframe_name.to_csv('file_name.csv')

16

How do you block comment on a Mac?

cmd + 1

 

cmd + 4 comments the block but also puts two lines of hyphens above and below the block.

17

How do you get data from Quandl? 

using quandl.get()

Within .get() you place the 'TickerID', your authtoken (i.e. your API key) and optionally a start_date and/or end_date.

18

What does pandas_datareader do ? 

Up to date remote data access for pandas, works for multiple versions of pandas.

19

How do you read a CSV file and convert it to a dataframe? 

What do you need to add to convert a date column to the index? 

pd.read_csv("file_name.csv")

 

Plug in the parameters (parse_dates = True, index_col = 0) into the brackets. 

20

If you make any changes to the IPython console in preferences what else do you need to remember to do?

Reset the IPython console kernel.

(Small cog wheel in the top right)

21

If the index isn't sorted in the right direction how can you sort it correctly?

.sort_index(axis = 0, inplace=True, ascending = True)

22

When parsing out paragraphs from a webpage using BeautifulSoup, what do you need to remove the tags so that you're only left with strings? 

What is the most simple way to extract the strings? 

You can use .string or .text. The difference between the two is that .string returns None for a tag that contains more than one child, whereas .text joins together all the text inside the tag.

In most cases you probably want .text

 

That being said, the simplest way is to take your BeautifulSoup object, say soup, and call the method .get_text(), i.e. soup.get_text(). Note that .string and .get_text() are not exactly the same.

23

What is pickling? 

Why would you use it? 

How would you use it? 

Pickling is the serializing of Python objects to a byte stream. Unpickling is the opposite: de-serializing a byte stream back into Python objects.

Pickling is used to store python objects. This means things like lists, dictionaries, class objects, and more. 

Pickling will be most useful for data analysis, when you are performing routine tasks on data, such as pre-processing. Also used when working with python specific data types, such as dictionaries.

If you load a large dataset into memory every time you run the program, it makes sense to pickle the data once and load the pickle instead. It may be 50-100x faster.

How to pickle? 

After having imported the pickle module you need to open the file that is pickled.

pickle_in = open("dict_pickle", 'rb') # rb stands for read binary

example_dict = pickle.load(pickle_in)

24

What method is the building block for scraping web pages with BeautifulSoup?

After obtaining the desired content by applying .find() on soup (the webpage parsed by bs4), it is still wrapped in HTML code. To narrow it down you use .find_all(); within the brackets you state which tag you want to pull out. It returns a list of the matching elements, and applying .text to an element strips the remaining tags.

 

 

25

What is a quick way to extract table data from webpages?

use

pd.read_html("url")

It returns a list of dataframes, one for each table found on the page.

26

How do you add pickle data to a pickle file? 

pickle.dump(x, y)

x = the object you want to add

y = the pickle file you want to 'dump' it in, given as a file object opened with 'wb' (write binary).
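A small round trip, dumping an object and loading it back, might look like this. The dictionary contents and the file name dict.pickle are made up for the sketch.

```python
import os
import pickle
import tempfile

example_dict = {"AAPL": 1, "MSFT": 2}   # made-up data
path = os.path.join(tempfile.mkdtemp(), "dict.pickle")

with open(path, "wb") as pickle_out:    # 'wb' = write binary
    pickle.dump(example_dict, pickle_out)

with open(path, "rb") as pickle_in:     # 'rb' = read binary
    restored = pickle.load(pickle_in)
```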

27

After having extracted table data from a webpage through beautiful soup,  you want to iterate through table elements one by one, how would you do that? 

# assumes tickers = [] and table (the result of soup.find('table', ...)) were defined earlier
for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text
    tickers.append(ticker)

 

# 1st line: iterate through each table row, except for the top row as these are the column labels.

# 2nd line: find all table data cells for this row (hence td), pick the first cell with [0] and convert it into text. findAll returns a list, so you need to index it before using .text; use a different index if you want content from another column.

 

28

If you want to open an already existing pickle file, what do you need? 

pickle.load(x)

x = the pickle file you want to access, given as a file object opened with 'rb' (read binary)

 

This can be assigned to a variable to save it.

29

How would you make a new directory? 

os.makedirs('x')

x = define directory name
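As a hedged sketch, the call below creates a nested folder inside a temporary location. The folder names data and prices are assumptions; exist_ok=True is an optional argument that avoids an error when the folder is already there.

```python
import os
import tempfile

base = tempfile.mkdtemp()                      # throwaway parent folder
target = os.path.join(base, "data", "prices")  # hypothetical nested path

os.makedirs(target, exist_ok=True)  # creates 'data' and 'prices' in one go
os.makedirs(target, exist_ok=True)  # no error on the second call

created = os.path.isdir(target)
```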

30

If you consistently use one module of a library, what would you do to make your code writing more efficient?

Import the module from the library and give it a shortened name using 'as'.

For example, since the pyplot module of matplotlib is heavily used, it is common to see it imported under the name plt.

i.e. import matplotlib.pyplot as plt

31

When using matplotlib, what do you need to do to make plt.legend() work? 

You need to label your plots: after adding the x and y variables, add a third parameter, label, passed by keyword.

 

32

How do you read a csv file in python? 

You read a csv using csv.reader(csv_file, delimiter = ','), where csv_file is a file object returned by open(), not the file name itself.

The delimiter is what the values will be separated by. In the case above they are separated by a comma. 
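A minimal sketch of reading rows with csv.reader. An in-memory io.StringIO object with made-up contents stands in for a real file opened with open().

```python
import csv
import io

# io.StringIO stands in for a file object such as open("prices.csv", newline="").
csv_file = io.StringIO("date,close\n2020-01-02,10.5\n2020-01-03,11.0\n")

reader = csv.reader(csv_file, delimiter=",")
rows = [row for row in reader]   # each row is a list of strings
```

Every value comes back as a string, so numbers need converting with float() or int() before calculations.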

33

How can you use numpy to load data from files ?  

import numpy as np 

 

np.loadtxt("File_name.type", delimiter = ',' , unpack = True)

 

*Note* The file does not have to be a .txt, it can be a .csv, it can be any file with text in it. 

It's also important to remember to add unpack = True if you have two variables to unpack.

34

What does .split() do ? 

When applied to a string it returns a list of all the words in the string. By default it splits on whitespace; you can pass a different separator inside the brackets.

35

How do you open URLs using the urllib library ? 

urllib.request.urlopen()

Inside the brackets you paste the URL within quotation marks.

36

Using the os library how do you return the current working directory from a python script? 

os.getcwd()

 

It's easier to remember the function if you look at what cwd abbreviates: current working directory.

37

What is sys.argv ?  

sys.argv allows you to pass a list of command line arguments from the terminal.

It is a list in python which contains the command-line arguments passed to the script. 

 

38

What library would you use to search in a body of text? 

How would you find all the numbers in a text? 

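You should use regular expressions, available in Python as the re module.

 

re.findall(r'\d+', x)

x = text variable

Note that r'\d' on its own matches one digit at a time; the + makes each match cover a whole run of digits.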
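To illustrate how the pattern changes what counts as a match, here is a short sketch on a made-up sentence.

```python
import re

text = "Bought 12 shares at 305 dollars"   # made-up sample text

digits = re.findall(r"\d", text)     # one match per single digit
numbers = re.findall(r"\d+", text)   # one match per run of digits
```

Both calls return lists of strings, so convert with int() if you need to do arithmetic on the results.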

39

If you want to POST to a URL what are the necessary steps that you need to take?

What changes when you want to do GET request? 

  1. You first need to define the variables that you intend to post in a dictionary, referred to as values.
  2. For the URL to understand the values they need to be encoded using data = urllib.parse.urlencode(values). There's another encoding step after that, encoding to utf-8 bytes, i.e. data = data.encode('utf-8').
  3. Once the data is encoded, the next step is to code a request to the URL to post your values: req = urllib.request.Request(url, data).
  4. The following step is to open the URL with the request added on. Opening the URL with the request returns a response, which is assigned to the variable resp, i.e. resp = urllib.request.urlopen(req).
  5. Finally, to see the response, .read() needs to be applied to resp.

 

A GET request is pretty similar to step 4 above. It uses the same base code urllib.request.urlopen(), but with the URL passed in directly and no data to encode; instead the response needs to be decoded. The code should look like this: urllib.request.urlopen(website_url).read().decode()
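The POST steps can be sketched without touching the network by building the request and stopping before it is sent. The URL and form fields below are placeholders, not a real endpoint.

```python
import urllib.parse
import urllib.request

url = "https://example.com/search"       # placeholder URL
values = {"q": "python", "page": "1"}    # placeholder form fields

data = urllib.parse.urlencode(values)    # 'q=python&page=1'
data = data.encode("utf-8")              # bytes, which Request expects

req = urllib.request.Request(url, data)  # supplying data makes this a POST
method = req.get_method()

# Sending it would then be:
#   resp = urllib.request.urlopen(req)
#   body = resp.read()
```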

40

If you want to combine multiple plots on the same grid, what function in plt do you need to use?

 

If you want to graph two plots on the same grid of 6 row pieces and 1 column piece, with ax1 taking up 5 rows across the 1 column and ax2 taking the rest, what would the code be? 

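plt.subplot2grid(x, y)

x is a tuple stating the number of rows and columns in the grid, y is a tuple specifying the (row, column) origin of the plot, counted from zero.

 

 

ax1 = plt.subplot2grid((6,1), (0,0), rowspan = 5, colspan = 1)

ax2 = plt.subplot2grid((6,1), (5,0), rowspan = 1, colspan = 1)

*note* you need to remember to adjust the start point: ax2 starts at row 5, column 0, because rows 0 to 4 are taken by ax1 and the grid has only one column.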

41

If you have defined a subplot called ax1, how do you access the labels to change them (not to change the name, to rotate, etc.)? 

ax1.xaxis.get_ticklabels()

 

If you want to access the y axis, just change xaxis to yaxis.

42

If you want to plot OHLC candles in python what do you need to import? 

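You need to import candlestick_ohlc from matplotlib.finance. (matplotlib.finance was removed in matplotlib 2.2; on newer versions the same function is available from the separate mpl_finance package.)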

 

this is written as:

from matplotlib.finance import candlestick_ohlc

43

How do you add text to a graph ax1 based on matplotlib?

Two options:

ax1.annotate()

ax1.text()

 

For ax1.annotate, the first parameter is what you want to annotate with; it needs to be a string, so ints and floats need to be converted to strings. The second parameter is where you want to annotate; if you're using candlesticks you can pick a specific candle and choose where on the candle (open, high, low or close) to place the annotation.

44

With deep learning, how should you approach testing? 

The price data is split up into the training set and a test set. 

The model is built on the training set and then applied to the unseen test set to see if similar results are obtained. 

45

How is .loc used?

It is a label-based indexer applied to a dataframe, say df, with square brackets (df.loc[...]) to access a group of rows or columns using their labels.

Note that placing one label in loc returns the values in that row (or column) as a series. 

If there is more than one label, then a dataframe is returned. 

46

How do you parse webpage content using Beautiful Soup? 

How is this applied to tables?

With import bs4 as bs.

To parse the content we first need to convert the page source into a Beautiful Soup object. The object is created by calling the BeautifulSoup() class from the bs4 library (imported here as bs) on the page text. By convention the object is assigned to the variable soup, i.e. soup = bs.BeautifulSoup(text, 'lxml').

 

With the content now as a Beautiful Soup object, other methods in the library can be applied to parse it.

One of the most common methods is find_all(); it is used on 'soup' and it allows you to filter specific content based on HTML tags.

For example, if you want to extract all the links (the 'a' tags) in the webpage you can write soup.find_all('a').

 

To find a whole table you can call soup.find('table', {'class': 'wikitable sortable'}), where the class value depends on the page you are scraping. From there you can use find_all() to filter through the table rows ('tr') and within table rows you can access the table data ('td').

 

*Note: Beautiful Soup does not acquire web page content, this needs to be done using urllib or requests.*

47

What are the arguments for using requests over urllib?

The requests package allows you to do what urllib does, but more succinctly.

It only takes one line to get content from a URL

resp = requests.get('url')

Similarly, posting information to a URL is a lot shorter. To post, requests.post() simply takes a dictionary as the data argument.

search_data = {"search": "Hello World"}

resp = requests.post('url', data=search_data)

48

How does .join() work for pandas? 

It joins columns with other data frames either on the index or on a key column. 

There are optional parameters to customize the joining; an important one is 'how', which controls which index labels are kept:

left (the default): use the calling frame's index.

right: use the other frame's index.

outer: form the union of the calling frame's index with the other's and sort it lexicographically.

inner: form the intersection of the two indexes.
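A small sketch of how the 'how' parameter changes the result, using two made-up dataframes whose indexes only partly overlap.

```python
import pandas as pd

# Made-up frames: labels 'b' and 'c' appear in both indexes.
left = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"volume": [10, 20, 30]}, index=["b", "c", "d"])

left_join = left.join(right)                # how='left' is the default
inner_join = left.join(right, how="inner")  # labels found in both indexes
outer_join = left.join(right, how="outer")  # labels found in either index
```

Rows without a match on the other side are filled with NaN, for example the volume for label 'a' in left_join.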

49

What happens when you apply .values to a pandas dataframe?

A numpy representation of the dataframe is returned. 

50

If you access .shape on a numpy array (it is an attribute, so no brackets), what is returned?

A tuple with the numpy array dimensions. 

51

What does numpy.arange(x) do ? 

Returns evenly spaced values within a given interval. With a single argument x, the values run from 0 up to, but not including, x in steps of 1.

52

What does ax.xaxis.tick_top() do? 

Move ticks and ticklabel (if present) to the top of the axes.

53

What does pandas.DataFrame.columns do ?

Returns the column labels of the dataframe. 

54

When you need to remove a column from a dataframe using .drop() what does the axis parameter need to be set to? 

To remove a column you need to set axis equal to 1. 

55

When using .drop() on a dataframe and you keep the inplace parameter on the default False, what will happen?

Leaving inplace as False does not change the dataframe itself; .drop() returns a modified copy instead. To change the underlying data of the dataframe you need to set inplace to True.

One way to view it is that you want your changes to stay in place, which is why you set it to True.

 

The default value of False for inplace is useful as it allows you to test the changes before making permanent changes. 

56

What are two functions you can use to create heatmaps with matplotlib?

imshow()

&

pcolormesh()

57

*args and **kwargs

What are they used for? 

They are mostly used in function definitions. 

*args and **kwargs allow you to pass a variable number of arguments to a function. In other words, the number of arguments is dependent on the user.

*args is received as a tuple (it behaves much like a list, but isn't exactly the same).

**kwargs is received as a dictionary, as you need to pass keyworded arguments, i.e. name="potato" where name is the keyword and potato the value.

A good way to remember what **kwargs do is to remember that 'kw' stands for keyword, so essentially it's **keywordargs.
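A minimal sketch of what a function receives through *args and **kwargs. The function name describe is made up for the example.

```python
def describe(*args, **kwargs):
    # args arrives as a tuple of positional values,
    # kwargs as a dict of keyword/value pairs.
    return args, kwargs

positional, keyword = describe(1, 2, 3, name="potato")
```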

58

How do you add columns to a pandas dataframe? 

It is essentially the same as dictionaries, you apply index brackets to the dataframe to assign the column name, this is then equated to what values you want in the column.

59

For machine learning, explain features and labels. 

Simply put, a feature is an input; the label is an output. 

 

A feature is a single column of data in your input set. For example if you're trying to predict what sort of degree someone might choose your input features might be gender, region, family income, etc. The label is the final choice. 

After having trained the model give a new set of inputs for the features and it should return a predicted label. 

60

What does Counter do?

Counter (from the collections module) counts the occurrences of each item in a list and returns a dictionary-like object mapping the items to their occurrences.
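For illustration, counting a made-up list of ticker strings with collections.Counter could look like this.

```python
from collections import Counter

tickers = ["AAPL", "MSFT", "AAPL", "GOOG", "AAPL"]   # made-up list
counts = Counter(tickers)

top = counts.most_common(1)   # the most frequent item with its count
missing = counts["TSLA"]      # items never seen count as 0 instead of raising KeyError
```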

61

Quantopian: What does the initialize function do?

The initialize function runs once when the script starts. 

It takes in one parameter which is context

Context is a Python dictionary that stores data on your strategy (your portfolio, your performance, leverage, other info about you, etc.).

 

When using quantopian the initialize function needs to be defined but it does not need to be called in the script.

62

Quantopian: what is the history() method and what are its input parameters? 

The history() method returns the price (or volume, etc.) for the specified asset over a lookback window whose length depends on the bar_count and frequency chosen.

Note that the method returns its data as a pandas object, such as a dataframe.

 

Input parameters: asset (e.g. the stock), field (the type of data, e.g. price or volume), bar_count (how many bars you want), frequency (the time period of each bar).

63

Quantopian: How can you pull price data? 

you can get price data using data.history()

64

Finance: What does alpha represent? 

Alpha represents the performance of a portfolio relative to a benchmark.

In other words, alpha is a measure of the return on investment that is not a result of general movement in the market. 

65

Finance: What is beta? 

Beta is a measurement of the volatility of an asset's returns relative to the market as a whole.

It is used as a measurement of risk. 

A higher beta means greater risk, but also greater expected returns. 

 

β = 1, exactly as volatile as the market.

β > 1,  more volatile than the market.

0 < β < 1, less volatile than the market.

β = 0, uncorrelated to the market.

β < 0, negatively correlated to the market.

66

Quantopian: How do you run your own function in quantopian? 

Using schedule_function(), called inside the initialize function.

You need to place your function as a parameter within schedule_function(). 

You can also define how often it runs, hourly, weekly, monthly, etc. 

Also, when it runs relative to the market open. For example, you can make it offset so that it only runs 1 hour after the market opens.

67

What does the blaze ecosystem allow for? 

It provides python users high-level access to efficient computation on inconveniently large data. 

68

Quantopian: what does blaze.compute() do?

It returns a pandas dataframe from a blaze expression.

69

What is rolling.apply() and how is it used? 

rolling().apply() allows you to apply a function to each rolling window of values in a data set. It is typically used to apply a custom function to the windows of a dataframe column.

 

rolling().apply() applied to a dataframe column:

pandas_dataframe_name['column_name'].rolling(window).apply(function_name)

70

With sys.argv how do you make sure you only access the command line arguments? 

You need to slice it to remove the program name from the beginning. 

arguments = sys.argv[1:]

71

What does sys.exit() do?

How is it different to break?

sys.exit() can be used anywhere and it causes the entire program to end. 

 

Break is only used in loops; it causes the loop to end, but if there is code after the loop the program continues.
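The contrast can be sketched as below. sys.exit() works by raising SystemExit; it is caught here only so the example can keep running, which you would not normally do.

```python
import sys

seen = []
for n in range(10):
    if n == 3:
        break                  # leaves the loop only
    seen.append(n)
after_loop = "still running"   # reached, because break does not end the program

try:
    sys.exit(1)                # raises SystemExit; uncaught, it ends the program
except SystemExit as exc:      # caught only for the purposes of this sketch
    exit_code = exc.code
```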

72

How do you convert unix time to readable datetime? 

df['Date'] = pd.to_datetime(df['Date'],unit='s')

73

If you get the error 'int' object has no attribute 'toordinal' when using df['Date'].apply(mdates.date2num), what has happened? 

The dataframe's 'Date' column holds plain integers (for example Unix timestamps) rather than datetime64 values, so it needs to be converted first using: df['Date'] = pd.to_datetime(df['Date'], unit='s').

74

If you have a dataframe with dates of type string how do you convert it into a datetime object? 

 

using pd.to_datetime(x)

x = the date column

 

There are also a lot of optional parameters that can be added.

75

What do you need to remember about using pd.to_datetime()? 

to_datetime() is capable of converting string dates to datetime objects as long as the dates follow a consistent format, in other words a consistent pattern.

If there are anomalies, it throws the function off.

76

If you need to convert a string date to a mdate how would you do it ? 

df['Column_name'].apply(mdates.datestr2num)

converts a string date to num date (which is mdate)

77

What is the easiest function to use to plot candlesticks in python?

What do you need to remember about it?

candlestick2_ohlc

 

It does not take dates.

78

When using matplotlib if the dates are overlapping in a plot how do you fix it? 

Rotate the dates using:

plt.xticks(rotation=x). 

45 or 60 degrees would work well.

79

If the x-axis is showing mdates, how do you convert it to a regular datetime representation?

ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

 

Where ax is referring to what plot you want this to apply to. 

80

When using pd.to_datetime on a dataframe date column and you get inconsistent date values returned (i.e. random dates that are a lot larger or smaller than the dates preceding and following it), how do you solve this issue?

What is happening is that to_datetime is incorrectly identifying the date format, so you need to define the format. 

One of the optional parameters of to_datetime is 'format'. 

Example, format = '%d/%m/%y'. 

*Note* if the date column has years that are truncated (i.e. 15 for 2015), then you need to use a lowercase %y to define the year. 
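A short sketch of passing an explicit format, assuming a made-up column of day/month/two-digit-year strings.

```python
import pandas as pd

# Made-up dates written as day/month/2-digit year.
df = pd.DataFrame({"Date": ["03/01/15", "04/01/15", "05/01/15"]})

# Without format, '03/01/15' could be read as 1 March; the format removes the ambiguity.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

first = df["Date"].iloc[0]
```

With four-digit years the format string would use an uppercase %Y instead.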

81

If you want to pull a certain column (or columns) to create a new dataframe, how would you do that?

index the dataframe you want to pull the column from with 2 sets of brackets, i.e. df[['Close']]. 

If you only use one set you get a pandas series.

82

What is the easiest way to add new columns to a dataframe? 

On the LHS use index brackets on the dataframe with the name of the column inside, and equate this to the values you want in the column.

83

How do you add NaN values to a list? 

use:

None

84

If you're going to use a list in multiple variables, what do you need to do and why? 

You're going to need to copy the list. 

If you don't, the variables will all refer to the same list, and a change made through one of them will show up in the others.
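The difference between sharing a list and copying it can be sketched like this, with made-up values.

```python
original = [1, 2, 3]

alias = original           # second name for the very same list
copied = original.copy()   # independent (shallow) copy; list(original) also works

alias.append(4)            # also visible through 'original'
copied.append(99)          # leaves 'original' untouched
```

For lists that contain other lists, copy.deepcopy() is needed to make the inner lists independent as well.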

85

What does zip() allow you to do? 

It allows you to iterate through more than one tuple, list or dictionary at once.

For example, this allows you to pull values from two different lists at once and use them for calculations or aggregate them.

You need to bear in mind that zip() returns an iterator of tuples, one tuple per position in the iterables.
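As an illustration with two made-up lists, zip() can pair values up or feed a calculation directly.

```python
prices = [10.0, 20.0, 30.0]   # made-up data
volumes = [100, 200, 300]

pairs = list(zip(prices, volumes))                   # (price, volume) tuples
turnover = [p * v for p, v in zip(prices, volumes)]  # one product per position
```

zip() stops at the end of the shortest input, so lists of unequal length are silently truncated.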

86

Remind yourself, what are the slicing rules again? 

The start value is inclusive, the end value is exclusive. 

So remember, that start value is included, end isn't.
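A one-line sketch of the rule on a made-up list.

```python
letters = ["a", "b", "c", "d", "e"]

middle = letters[1:4]   # indexes 1, 2 and 3; index 4 is left out
```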

87

What is the relation between figure and subplots?

A figure can have multiple subplots. 

88

Say you download a csv file, the dates are strings and they are formatted weirdly, what do you need to do to convert them into datetime objects?

First of all, remove any characters that interfere with the identification of time values, for example AMs and PMs. Do this with strip().

Next, the date column needs to be passed through pd.to_datetime(). The first parameter is the column, the next parameter is the format of the date. Make sure the format matches the string dates exactly, including any spaces, hyphens or slashes between the values.

 

 

89

What do you need to remember about ylabels, xlabels, xticks, etc, when you have multiple subplot2grid calls in one figure? 

Any plt settings specific to one plot, like ylabels, xlabels, xticks, etc., need to be defined before the next subplot2grid call.

For example, say you're creating a plot with volume, rsi and price data. If you want the date for the Volume plot to be rotated you need to define plt.xticks(rotation=45) right after you write ax = plt.subplot2grid(). 

90

If the datetime ticks of one plot are unnecessary, how do you remove them?

plt.setp(ax2.get_xticklabels(), visible = False)

ax2 just refers to the plot that this applies to. 

91

How to create an empty dataframe? 

pd.DataFrame()

92

How do you sort the index of a dataframe from smallest to largest? 

use df.sort_index()

the parameters you need to add inside the brackets are ascending = True and inplace = True.

Ascending set to true specifies that you want the index to be sorted from smallest to largest. 

Inplace set to true means that you want this change to be permanent. 

93

When applying methods that you have never applied before, what do you always need to check to save time?

Check if the method has an inplace parameter defined.

If it does then you need to set inplace to True if you want your changes to last.

94

How can you create arrows without using the arrow() function?

Using annotate()

plt.annotate('text', xy=(x,y), xytext=(x,y), arrowprops=dict( arrowstyle="-|>", color='r', lw=1.5))

95

How do you reference a pandas dataframe column by index? 

df.iloc[:, n]

n is the column number of the column that you want to use, counted from zero.

The first entry inside the square brackets selects the row or rows that you want from the column; : selects every row, i.e. the whole column.
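A minimal sketch of positional column access on a made-up dataframe.

```python
import pandas as pd

df = pd.DataFrame({"Open": [1, 2, 3], "Close": [4, 5, 6]})  # made-up data

close = df.iloc[:, 1]        # every row of the second column, as a series
first_two = df.iloc[:2, 1]   # only the first two rows of that column
```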