Lecture 3 Flashcards
(48 cards)
What is invariance in machine learning?
When a transformation is applied to a data point that does not change the target output, the network's output should also remain unchanged.
Invariance in machine learning is a property of a model that ensures its output remains the same when the input is transformed in a way that doesn’t change the task the model was designed for.
In what contexts is it important to consider invariance?
- Image recognition (classification: does the image contain a red square? It shouldn't matter where in the image the square is)
- Text recognition (scaling / enlarging the text shouldn't change the answer)
- Speech (invariance under speed: talking at a different rate shouldn't change the meaning of the sentence)
- Computational chemistry (asking whether a drug can be used to treat a disease should give the same answer regardless of the molecule's position in space)
What types of invariance are there?
- Translational
- Rotational
- Reflection
- Scaling
- Swapping (permutation)
What ways can we ensure that a model has the necessary invariances?
- Augment the training set with transformed examples [not clever, but easy]
- Extract features that are also invariant
- Build the invariance into the network structure (this leads to CNNs) [clever, but hard]
Give an example of augmenting the training set with transformed examples.
Rotating/tilting images
For invariance, if we apply some transformation T to the inputs xi, what happens to the output ti?
The output ti is unchanged.
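In symbols (writing the network as a function f): if ti = f(xi), then invariance under the transformation T means f(T(xi)) = f(xi) = ti.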
How can the network learn this invariance?
By augmenting the dataset: D -> D'.
The augmented dataset D' contains the original input-output pairs plus the transformed inputs paired with the same (unchanged) outputs. In this way we have doubled the size of the dataset, giving more data to train the model on.
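A minimal sketch of this augmentation step (assuming numpy arrays of images and invariance under horizontal flips; the function name and shapes are illustrative, not from the lecture):

```python
import numpy as np

def augment_with_flips(X, t):
    """Build the augmented dataset D' from D = (X, t).

    X: images with shape (N, H, W); t: targets with shape (N,).
    Each image is added again flipped left-right; the targets are copied
    unchanged, because the task is assumed invariant under the flip.
    """
    X_flipped = X[:, :, ::-1]                       # transformed inputs T(x_i)
    X_aug = np.concatenate([X, X_flipped], axis=0)  # D' now has 2N inputs
    t_aug = np.concatenate([t, t], axis=0)          # same outputs t_i
    return X_aug, t_aug
```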
What are the advantages and disadvantages of augmenting the dataset?
Advantages
- Very straightforward to implement
- Can produce a score ???
Disadvantages
- Makes the training set much bigger
- The invariance is only learned approximately (unless you have infinite data)
How can we enforce invariance exactly?
By making the features that are input into the network invariant to the transformation.
This is harder, but more robust (it guarantees the symmetry exactly).
What is an example of invariant features that can be used?
We explored the example of the energy of two atoms. Each atom has a position in space represented by three coordinates, and the energy of the pair is the output.
We could naively use the coordinates as features. However, these are not invariant to the relevant operations: the energy is invariant to translation, rotation and permutation of the atoms (i.e. swapping atom 1 for atom 2).
We find that the energy can only depend on the norm of the relative vector between the two atoms (the interatomic distance). This also reduces the number of features from 6 to 1. A single feature now contains all of the information we need and is invariant to all of the required transformations.
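A small sketch of this feature construction (numpy for illustration; the numerical values are made up just to check the invariances):

```python
import numpy as np

def invariant_feature(r1, r2):
    """Single invariant feature for a pair of atoms: the interatomic distance.

    r1, r2: 3D position vectors. The distance |r1 - r2| is unchanged by
    translating both atoms together, rotating them together, or swapping them,
    so the 6 raw coordinates reduce to 1 invariant feature.
    """
    return np.linalg.norm(r1 - r2)

# Quick checks of the invariances (illustrative values):
r1, r2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 2.0])
shift = np.array([5.0, -3.0, 1.0])
assert np.isclose(invariant_feature(r1, r2), 3.0)                  # |(1, 2, 2)| = 3
assert np.isclose(invariant_feature(r1 + shift, r2 + shift), 3.0)  # translation
assert np.isclose(invariant_feature(r2, r1), 3.0)                  # permutation
```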
What does symmetry refer to?
In machine learning, “symmetry” refers to a transformation, such as rotation, translation, or reflection, under which an object remains unchanged. Models can learn more efficiently by leveraging these symmetries in the data.
Describe the difference between symmetry and invariance.
In machine learning, “symmetry” refers to a property of the data itself: certain transformations can be applied without changing its underlying structure. “Invariance” describes a model’s ability to produce the same output regardless of those transformations being applied to the input data. Essentially, symmetry is a characteristic of the data, while invariance is a desired property of a machine learning model that leverages that symmetry.
What kind of neural network architecture can we build that gives invariant predictions?
Convolutional neural networks (CNNs) account for translational invariance in the inputs.
In a deep NN, the outputs from one layer can be thought of as the features input into the next layer - they have this modularity.
How are images processed in the brain?
Hierarchically
We start off by collecting low-level information on small scales and build our way up, passing information up the hierarchy. It is transformed into higher-level (and larger-scale) information.
This is similar to a deep neural network.
What is the problem when using a fully-connected deep neural network for processing images?
It would require a lot of weights.
E.g. camera pictures can have 2M pixels, each with a corresponding RGB value, giving 6M input values. With 2000 nodes in the first hidden layer, there would be around 12 billion weights (6M x 2000) in that layer alone.
High-quality image recognition with a fully-connected deep neural network is therefore unfeasible. We can help matters by noting that these networks do not take spatial correlation into account.
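A quick back-of-the-envelope check of these counts (the convolutional-kernel numbers in the comparison are illustrative assumptions, not from the lecture):

```python
inputs = 2_000_000 * 3   # 2M pixels, each with an RGB value: 6M input values
hidden = 2_000           # nodes in the first hidden layer
print(inputs * hidden)   # 12_000_000_000 weights in the first layer alone

# For comparison, a convolutional layer reuses one small kernel across the image
# (64 filters of size 5x5 over 3 colour channels are illustrative numbers):
print(64 * 5 * 5 * 3)    # 4_800 weights, independent of the image size
```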
Why should spatial correlation be considered in image processing?
Parts of an image that are closer together are more likely to be similar to each other than those that are far apart. We want to combine information from input features that are correlated with each other spatially.
What do convolutional neural networks (CNNs) do and how do they achieve this?
CNNs mimic the processing of images in the human brain.
They consist of a feature-learning stage and a classification stage. Feature learning involves two new kinds of layers: convolutional and pooling layers. From these we get a much smaller set of features representing the image, and these are fed into standard fully connected layers for classification.
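A minimal sketch of this structure (assuming PyTorch and a 28x28 grayscale input; the layer sizes are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn

# Feature learning (convolution + pooling, repeated), then classification
# with standard fully connected layers, as described above.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: scan for visual elements
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer: combine nearby pixels -> smaller image
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                                # much smaller set of features representing the image
    nn.Linear(32 * 7 * 7, 64),                   # fully connected classification layers
    nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(1, 1, 28, 28)  # one dummy 28x28 grayscale image
print(model(x).shape)          # torch.Size([1, 10])
```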
What are the two different layers of a CNN and what do they do?
- Convolutional layers - search for visual elements in groups of spatially correlated pixels
- Pooling layers - combine information from nearby pixels to create a smaller image
They usually appear in this order.
What does a convolutional unit / kernel do?
It looks at small groups of input features (pixels). It has a receptive field, which refers to the specific area of the input image that a single neuron in a convolutional layer can “see” and use to calculate its output.
The kernel scans over the whole image, combining the inputs in its receptive field at each position.
What is the output of a convolutional unit?
A smaller image
What is the size of a convolutional unit?
(n x m)
The number of input units it is connected to, i.e. the size of its receptive field.
What is the stride s?
The step size: how many pixels the convolutional unit moves between successive positions as it scans over the image.
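A small numpy sketch of one (n x m) kernel W scanning an image with stride s (illustrative only; no deep-learning library assumed):

```python
import numpy as np

def convolve2d(image, W, s=1):
    """Slide an (n x m) kernel W over a 2D image with stride s.

    At each position the kernel combines the pixels in its receptive field
    (here by a weighted sum), producing a smaller output image.
    """
    H, Wd = image.shape
    n, m = W.shape
    out = np.zeros(((H - n) // s + 1, (Wd - m) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i * s:i * s + n, j * s:j * s + m]  # receptive field
            out[i, j] = np.sum(patch * W)
    return out

image = np.random.rand(8, 8)
W = np.ones((3, 3)) / 9.0                 # a simple averaging kernel
print(convolve2d(image, W, s=2).shape)    # (3, 3): a smaller output image
```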
How many convolutional units in a layer can you have?
Many, each one scans for “something different” and produces a separate output image.
What letter is used to represent the convolution kernel?
W - weights of the convolutional kernel