ML Code Flashcards
What usually makes up a large fraction of the code needed to perform machine learning?
Data handling.
Before we can train a model, we need to get the data into a form that is compatible with training and testing.
Why do we train and test in batches?
Processing the full dataset all at once would take too long and may not fit in memory.
What sample is used at each iteration of training?
A different random sample (batch) of the training data.
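A minimal sketch of the idea, using only the Python standard library: shuffle the sample indices, then hand them out in fixed-size batches. The function name iterate_batches and the toy sizes are illustrative assumptions, not part of any library.

```python
import random

def iterate_batches(n_samples, batch_size, seed=None):
    """Yield lists of sample indices in shuffled order, one batch at a time."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)  # with seed=None the order differs on every call
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

# One pass over 10 samples in batches of 4 (seed fixed only for repeatability).
batches = list(iterate_batches(n_samples=10, batch_size=4, seed=0))
batch_sizes = [len(b) for b in batches]  # the last batch may be smaller
```

Every sample appears exactly once per pass, but which samples share a batch changes from pass to pass.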
What is the main aim of building a model?
Make predictions on unseen data.
Classification or regression.
What is generalisation?
The model’s ability to make predictions on new, unseen data.
How do we read in a dataset?
df = pd.read_csv("file.csv")
For image-based data, how do you create the training data?
train_data = torch.tensor(train_df.iloc[:, 1:].to_numpy(dtype="float32") / 255).reshape(-1, 1, 28, 28)
For image-based data, how do you create the training labels?
train_labels = torch.tensor(train_df["label"].values)
What is true of the training data and training labels shape?
The size of the first dimension is the same for both: there is one label per image.
How do we create the dataset in the form of tensors?
train_dataset = TensorDataset(train_data, train_labels)
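The steps above can be tried end to end without a real file. This sketch builds a small synthetic DataFrame as a stand-in for the CSV; the column names, the 5-row size and the label values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import TensorDataset

# Hypothetical stand-in for the CSV: a 'label' column followed by 784 pixel columns.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5, 784))
train_df = pd.DataFrame(pixels, columns=[f"pixel{i}" for i in range(784)])
train_df.insert(0, "label", [3, 1, 4, 1, 5])

# Scale pixel values to [0, 1] and reshape to (N, channels, height, width).
train_data = torch.tensor(
    train_df.iloc[:, 1:].to_numpy(dtype="float32") / 255
).reshape(-1, 1, 28, 28)
train_labels = torch.tensor(train_df["label"].values)

# Pair each image with its label; both tensors must agree in their first dimension.
train_dataset = TensorDataset(train_data, train_labels)
```

Checking train_data.shape and train_labels.shape at this point is a quick way to confirm that there is one label per image.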
What is the basic process for conducting data handling and training?
[See flashcard]
How do you create a training data loader?
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
How do you create the testing data loader?
test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False)
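A short sketch of both loaders on synthetic data, to show how batch_size determines the number of batches. The dataset sizes (200 training and 100 test images) are assumptions chosen to keep the example small.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the real datasets (sizes are assumed).
train_dataset = TensorDataset(torch.rand(200, 1, 28, 28), torch.randint(0, 10, (200,)))
test_dataset = TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))

# Shuffle the training data so each epoch sees batches in a new random order.
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
# Keep the test data in a fixed order.
test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False)

n_train_batches = len(train_loader)  # ceil(200 / 64) = 4; the last batch holds 8 samples
n_test_batches = len(test_loader)    # 100 / 100 = 1
```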
How do we generate a subsample?
Enumerate.
examples = enumerate(train_loader)
Data loaders are iterable, so we can go through the data one batch at a time.
What does enumerate do?
enumerate is a built-in Python function that pairs each item of an iterable with a running index, so you can keep track of the iteration count in a loop.
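A small illustration with a plain list in place of a data loader; the batch names are made-up placeholders.

```python
# Hypothetical stand-ins for real batches of data.
batches = ["batch A", "batch B", "batch C"]

# enumerate yields (index, item) pairs, counting from 0.
pairs = list(enumerate(batches))

# It can also be consumed one step at a time with next().
examples = enumerate(batches)
first = next(examples)
second = next(examples)
```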
What can we do with the generated subsamples?
Look at one to investigate.
batch_idx, (example_data, example_targets) = next(examples)
Can look at shape and type of each. Keep asking for the next batch of data.
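A sketch of inspecting one batch, using a synthetic dataset of 128 random images (an assumed size) in place of the real training data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the training set: 128 images of shape 1x28x28.
dataset = TensorDataset(torch.rand(128, 1, 28, 28), torch.randint(0, 10, (128,)))
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

examples = enumerate(train_loader)
batch_idx, (example_data, example_targets) = next(examples)

data_shape = tuple(example_data.shape)       # (batch, channels, height, width)
target_shape = tuple(example_targets.shape)  # one label per image in the batch
data_dtype = example_data.dtype
```

Calling next(examples) again returns the following batch, with batch_idx increased by one.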
How do we plot images from the data?
Take a batch of images and targets from the data loader, then create a grid of subplots that displays each image with its target label as the title.
Using ax.imshow(), a method of Matplotlib Axes objects (the axes here are created with plt.subplots from pyplot).
import matplotlib.pyplot as plt
fig, axs = plt.subplots(nrows=2, ncols=4, figsize=(10, 5))
axs = axs.flatten()
for ax, image, label in zip(axs, example_data, example_targets):
    ax.set_axis_off()
    ax.imshow(image[0], cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)
What do we do after we have investigated the initial tensor data?
Flatten the images, so that each is a 1D tensor.
What is torch?
The main Python package of PyTorch.
What does data type = torch.float32 mean?
The data are torch tensors whose values are stored as 32-bit floating point numbers.
Why is it important that the test and training sets come from the same source?
If there are systematic differences, it would be hard to teach a model to deal with that. The model needs to have seen similar examples in order to learn how to interpret the images.
What is the torch.nn library?
The neural network library.
For the basic, fully connected neural network, what should you do?
Flatten all of the images, to turn them into 1D data.
To investigate the size:
image_size = train_data.shape[1:]
input_layer_size = np.prod(image_size)
Each image (28x28 pixels) will then be represented by 784 numbers.
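The arithmetic can be checked on a synthetic tensor. The 5-image batch is an assumption for illustration, and reshape is used here as one simple way to flatten.

```python
import numpy as np
import torch

# Synthetic stand-in for the training images: 5 images of shape 1x28x28.
train_data = torch.rand(5, 1, 28, 28)

image_size = train_data.shape[1:]            # everything except the batch dimension
input_layer_size = int(np.prod(image_size))  # 1 * 28 * 28 = 784

# Flatten each image into a single row of 784 values.
flat_data = train_data.reshape(-1, input_layer_size)
flat_shape = tuple(flat_data.shape)
```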
How do we define a model which is trained to convert image pixel values to labels?
[See flash card]