Week 3 (TensorFlow) Flashcards

TensorFlow, Python, Pandas

1
Q

What is TensorFlow?

A

A graph-based computational framework for building machine learning models.

2
Q

What is the purpose of the Estimator part of the toolkit (tf.estimator)?

A

Higher-level APIs to specify predefined architectures, such as linear regressors or neural networks.

3
Q

Describe pandas DataFrame.

A

A DataFrame in Pandas can be described as a relational data table, with rows and named columns. A DataFrame contains one or more Series and a name for each Series.

4
Q

Describe pandas Series.

A

A Series in Pandas is a single column.

5
Q

How to start using Pandas in Python?

A

import pandas as pd

6
Q

How to create a Series object in Python?

A

Use the pandas library call Series() with the data as a list: values in square brackets, separated by commas. This constructs a Series object.

city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])
7
Q

How to create a DataFrame object in Python?

A

Use the pandas library call DataFrame() with a dictionary (dict) that maps each column name (a string) to its Series, e.g. {'column name': Series}.

cities_df = pd.DataFrame({'City name': city_names, 'Population': population})

8
Q

How to read data from a CSV file into a DataFrame object in Python?

A

Use the pandas library call read_csv() with the path or URL of the file. You can optionally specify a separator.

california_housing_dataframe = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv", sep=",")
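
To try read_csv() without a network connection, a minimal sketch can read from an in-memory string instead of a URL. The two data rows and the name sample_df are made up for illustration; only the column names are borrowed from the housing dataset.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for a real CSV file.
csv_text = (
    "longitude,latitude,median_house_value\n"
    "-114.31,34.19,66900\n"
    "-114.47,34.40,80100\n"
)

# read_csv accepts a path, a URL, or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text), sep=",")
print(sample_df)
```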

9
Q

How can you get quick stats about the data in a DataFrame?

A

Use the describe() function of a DataFrame object.

california_housing_dataframe.describe()
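
As a small illustration of what describe() returns, the sketch below runs it on a three-row DataFrame built from the population figures used in these cards. The variable names are chosen here for illustration.

```python
import pandas as pd

cities_df = pd.DataFrame({"Population": [852469, 1015785, 485199]})

# describe() returns a new DataFrame with one row per statistic.
stats = cities_df.describe()
print(stats)

# Individual statistics can be looked up by row label and column name.
largest = stats.loc["max", "Population"]
```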

10
Q

How can you see the first few rows in a DataFrame?

A

Use the head() function of the DataFrame object. By default it returns the first five rows.

california_housing_dataframe.head()
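
A quick sketch of head() on a made-up ten-row DataFrame (numbers_df is a hypothetical name), showing the default and an explicit row count:

```python
import pandas as pd

numbers_df = pd.DataFrame({"value": range(10)})

first_five = numbers_df.head()    # default: first 5 rows
first_three = numbers_df.head(3)  # explicit row count
print(first_three)
```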

11
Q

How can you quickly see the distribution of values in a column in a DataFrame?

A

Use the hist() function of the DataFrame object.

california_housing_dataframe.hist('housing_median_age')

12
Q

Describe 3 easy ways to see/access data in a DataFrame.

A

Specify the column name in square brackets to see the column as a Series (e.g. cities_df['City name'])

Add the element's index (starting with 0) to see a single value (e.g. cities_df['City name'][2])

Specify a slice (starting with 0, end excluded) to see all columns for the rows in that range (e.g. cities_df[0:2])
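
The three access patterns can be tried together in one self-contained sketch. It assumes the default integer index (0, 1, 2), so the label 2 refers to the third row.

```python
import pandas as pd

cities_df = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

names = cities_df["City name"]          # one column, returned as a Series
third_name = cities_df["City name"][2]  # single value at index label 2
first_two = cities_df[0:2]              # rows 0 and 1, all columns
print(third_name)
```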

13
Q

Can you manipulate all data in a Series at once?

A

Yes. For example:

population / 1000

divides every element of the Series population by 1000 and returns the result as a new Series.
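
A runnable version of this example, which also checks that the original Series is left unchanged (the name population_thousands is chosen here for illustration):

```python
import pandas as pd

population = pd.Series([852469, 1015785, 485199])

# Arithmetic on a Series is applied element by element.
population_thousands = population / 1000
print(population_thousands)
```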

14
Q

What is NumPy?

A

NumPy is a popular toolkit for scientific computing.
You import it by:

import numpy as np

Pandas Series can be used as arguments to most NumPy functions.
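
For instance, passing a Series to np.log applies the function to every element and gives back a Series. This is a minimal sketch; np.log is just one convenient choice of NumPy function.

```python
import numpy as np
import pandas as pd

population = pd.Series([852469, 1015785, 485199])

# NumPy functions operate element-wise and preserve the Series index.
log_population = np.log(population)
print(log_population)
```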

15
Q

What can you use the pandas Series.apply function for?

A

For more complex single-column transformations, you can use Series.apply. Like the Python map function, Series.apply accepts as an argument a lambda function, which is applied to each value.

population.apply(lambda val: val > 1000000)
(returns a new Series, the same size as population, with boolean values that indicate whether each population value is above 1,000,000)
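
The apply call can be run on the three-city population Series as follows. For a comparison this simple, the plain expression population > 1000000 gives the same result; apply becomes useful when the per-value logic is more involved.

```python
import pandas as pd

population = pd.Series([852469, 1015785, 485199])

# The lambda runs once per value and the results form a new Series.
over_one_million = population.apply(lambda val: val > 1000000)
print(over_one_million)
```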

16
Q

How can you add a column to a DataFrame?

A

Assign a Series to a new column name on the DataFrame (e.g. 'Area square miles' and 'Population density' are the names of the new columns):

cities['Area square miles'] = pd.Series([46.87, 176.53, 97.92])
cities['Population density'] = cities['Population'] / cities['Area square miles']
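
Put together with the DataFrame construction, the two assignments look like this as a self-contained sketch:

```python
import pandas as pd

cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# Assigning to a column name that does not exist yet creates the column.
cities["Area square miles"] = pd.Series([46.87, 176.53, 97.92])
# A new column can be computed from existing columns, row by row.
cities["Population density"] = cities["Population"] / cities["Area square miles"]
print(cities)
```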

17
Q

Can you use data in two Series to calculate a new Series?

A

Yes. E.g.

cities['Population density'] = cities['Population'] / cities['Area square miles']

18
Q

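Write Python: Find all rows in a DataFrame (cities_df) that have a city name ('City name') starting with 'San'.

A

With lambda functions (apply builds a boolean mask, which then selects the rows):
cities_df[cities_df['City name'].apply(lambda name: name.startswith('San'))]

With loops (test each row in turn and collect the matching index labels):
matching = [i for i in cities_df.index if cities_df['City name'][i].startswith('San')]
cities_df.loc[matching]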
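
As a third option, not part of the original answer, pandas string methods can build the same boolean mask without a lambda. The sketch below uses Series.str.startswith and then selects the matching rows.

```python
import pandas as pd

cities_df = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# Boolean mask: True where the city name starts with "San".
starts_with_san = cities_df["City name"].str.startswith("San")

# Indexing with the mask keeps only the rows marked True.
san_cities = cities_df[starts_with_san]
print(san_cities)
```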

19
Q

What is an index in Python (Pandas)?

A

An index is a property of a DataFrame or Series that identifies each row in a DataFrame or each item in a Series. By default, pandas creates it automatically when the object is created, and the default labels (0, 1, 2, ...) are unique.
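
A short sketch showing the automatically created index on both object types:

```python
import pandas as pd

city_names = pd.Series(["San Francisco", "San Jose", "Sacramento"])
cities = pd.DataFrame({"City name": city_names})

# Both objects get a default integer index when none is supplied.
series_labels = list(city_names.index)
frame_labels = list(cities.index)
print(cities.index)
```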

20
Q

What does this code do?
(pd pandas library; np numpy library; cities a DataFrame object)

cities.reindex(np.random.permutation(cities.index))

A

Reindexing is a great way to shuffle (randomize) a DataFrame. In the example, we take the index, which is array-like, and pass it to NumPy's random.permutation function, which returns a shuffled copy of its values. Calling reindex with this shuffled array returns a new DataFrame whose rows are shuffled in the same way.
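
The shuffle can be tried on the small cities DataFrame. The row order differs from run to run, but each row keeps its own index label and values, and the original DataFrame is not modified.

```python
import numpy as np
import pandas as pd

cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# permutation returns a shuffled copy of the index labels.
shuffled_labels = np.random.permutation(cities.index)

# reindex returns a new DataFrame with rows in the shuffled order.
shuffled = cities.reindex(shuffled_labels)
print(shuffled)
```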

21
Q

What does this code do?
(cities index currently 0, 1, 2)

cities.reindex(['A','B','C'])

A

Create a new index and reindex the dataframe.
By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

None of the labels 'A', 'B', 'C' exist in the current index, so all values in the returned DataFrame are NaN.

22
Q

What does this code do?
(cities index currently 0, 1, 2)

cities.reindex([2,1,0,3])

A

Create a new index and reindex the dataframe.
By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

The values for the new index 3 in the DataFrame are set to NaN.
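
Both behaviours, reordering existing labels and filling unknown labels with NaN, can be checked in a short sketch:

```python
import pandas as pd

cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# Labels 2, 1, 0 exist and are reordered; label 3 is new, so its row is NaN.
reindexed = cities.reindex([2, 1, 0, 3])

# None of these labels exist, so every value is NaN.
relabeled = cities.reindex(["A", "B", "C"])
print(reindexed)
```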

23
Q

What is categorical data?

A

Typically features given as text values, such as descriptions (e.g. 'modern', 'male', 'child').

Features having a discrete set of possible values. For example, consider a categorical feature named house style, which has a discrete set of three possible values: Tudor, ranch, colonial. By representing house style as categorical data, the model can learn the separate impacts of Tudor, ranch, and colonial on house price.

24
Q

What is numerical data?

A

Features represented as integers or real-valued numbers. For example, in a real estate model, you would probably represent the size of a house (in square feet or square meters) as numerical data. Representing a feature as numerical data indicates that the feature’s values have a mathematical relationship to each other and possibly to the label. For example, representing the size of a house as numerical data indicates that a 200 square-meter house is twice as large as a 100 square-meter house. Furthermore, the number of square meters in a house probably has some mathematical relationship to the price of the house.

25
Q

Does a feature column in TensorFlow store data?

A

No.
In TensorFlow, we indicate a feature’s data type using a construct called a feature column. Feature columns store only a description of the feature data; they do not contain the feature data itself.

26
Q

Describe the steps to build a linear regression model with TensorFlow.

A
Step 1.1: Define the input feature (a one-column DataFrame)
# Define the input feature: total_rooms.
my_feature = california_housing_dataframe[["total_rooms"]]
Step 1.2: Configure a feature column in TensorFlow (tf) with its data type
# Configure a numeric feature column for total_rooms.
feature_columns = [tf.feature_column.numeric_column("total_rooms")]

Step 2: Define the targets (a Series) (e.g. housing value)
targets = california_housing_dataframe['median_house_value']

Step 3: Configure a linear regression model using the LinearRegressor (TensorFlow 1.x API).
# Use gradient descent as the optimizer, with gradient clipping for stability.
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0000001)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
# Create the model from the feature columns and the optimizer.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)

Step 4: Define the input function.
To import our data into our LinearRegressor, we need to define an input function, which instructs TensorFlow how to preprocess the data, as well as how to batch, shuffle, and repeat it during model training. Our input function constructs an iterator for the dataset and returns the next batch of data to the LinearRegressor.

Step 5: Train the model
We can now call train() on our linear_regressor to train the model.
_ = linear_regressor.train(
    input_fn=lambda: my_input_fn(my_feature, targets),
    steps=100
)

Step 6: Evaluate the model
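
The batching, shuffling, and repeating that Step 4 describes can be pictured with a plain NumPy generator. This is only an illustration of the idea, not the TensorFlow input function itself: batch_iterator and its parameters are hypothetical names, and real code would build the pipeline with TensorFlow's dataset API.

```python
import numpy as np

def batch_iterator(features, targets, batch_size=1, shuffle=True,
                   num_epochs=1, seed=0):
    """Yield (feature_batch, target_batch) pairs from aligned arrays."""
    features = np.asarray(features)
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    for _ in range(num_epochs):                  # repeat: one pass per epoch
        order = np.arange(len(features))
        if shuffle:
            rng.shuffle(order)                   # shuffle: new order each epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield features[idx], targets[idx]    # batch: aligned slices

# Toy data: each target is ten times its feature.
batches = list(batch_iterator([1.0, 2.0, 3.0, 4.0], [10, 20, 30, 40],
                              batch_size=2, num_epochs=2))
print(len(batches))
```

Features and targets are indexed with the same shuffled positions, so each example stays paired with its label.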

27
Q

Is There a Standard Heuristic for Model Tuning?

A

The short answer is that the effects of different hyperparameters are data dependent. So there are no hard-and-fast rules; you’ll need to test on your data.

That said, here are a few rules of thumb that may help guide you:

Training error should steadily decrease, steeply at first, and should eventually plateau as training converges.

If the training has not converged, try running it for longer.

If the training error decreases too slowly, increasing the learning rate may help it decrease faster.
But sometimes the exact opposite may happen if the learning rate is too high.

If the training error varies wildly, try decreasing the learning rate.
Lower learning rate plus larger number of steps or larger batch size is often a good combination.

Very small batch sizes can also cause instability. First try larger values like 100 or 1000, and decrease until you see degradation.

Again, never go strictly by these rules of thumb, because the effects are data dependent. Always experiment and verify.
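
The learning-rate rules of thumb can be seen on a toy problem. The sketch below minimizes the loss w**2 with plain gradient descent. The function, the starting point, and the three learning rates are chosen purely for illustration and say nothing about good values for a real model.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize loss(w) = w**2 and return the loss after each step."""
    w = start
    losses = []
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w -= learning_rate * grad    # gradient descent update
        losses.append(w ** 2)
    return losses

slow = gradient_descent(0.01)       # too low: loss falls, but slowly
good = gradient_descent(0.3)        # reasonable: loss falls quickly
diverging = gradient_descent(1.1)   # too high: loss grows every step

print(slow[-1], good[-1], diverging[-1])
```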