LLMs Flashcards
What is a corpus?
A corpus (plural: corpora) is the body of text, or collection of texts, the AI was trained on.
What is context?
The section of the prompt the AI uses in its prediction of the next word.
Explain the Markov assumption
The future evolution of a process is independent of its history and depends only on its current state (the last step).
Describe the process of a unigram n-gram model?
Counts how often each word appears in the corpus, assigns a weighting to each word based on its count, and suggests words according to these weightings.
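A minimal Python sketch of this idea, assuming a toy corpus string (the example text and names are illustrative, not from the cards):

```python
import random
from collections import Counter

corpus = "the cat sat on the mat and the dog sat on the rug"  # toy corpus (assumption)
words = corpus.split()

counts = Counter(words)                                # count every word in the corpus
total = sum(counts.values())
weights = {w: c / total for w, c in counts.items()}    # weighting = relative frequency

# Suggest a word by sampling according to the weightings
print(random.choices(list(weights), weights=list(weights.values()))[0])
```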
Describe the process of a bigram n-gram model?
Given the last word of the context, predicts what is likely to be the next word. It counts each word pair in the corpus and assigns weightings based on the counts. It then checks the last word of the context and suggests a word based on the probabilities of the word pairs whose first word is that last word of the prompt (see the sketch after the trigram card below).
Describe the process of a trigram n-gram model?
Given the last two words of the context, predicts what is likely to be the next word. It counts the occurrence of each three-word sequence in the corpus and assigns weightings based on the counts. It then checks the last two words of the context and suggests a word based on the weightings of the three-word sequences whose first two words are those last two words of the prompt.
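A minimal sketch covering both the bigram and trigram case, assuming a toy corpus; the prefix length n - 1 is 1 for a bigram model and 2 for a trigram model (all names here are illustrative):

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat lay on the rug"  # toy corpus (assumption)
words = corpus.split()
n = 2  # 2 = bigram; set to 3 for a trigram model

# Count every n-word sequence, keyed by its (n-1)-word prefix
table = defaultdict(Counter)
for i in range(len(words) - n + 1):
    prefix = tuple(words[i:i + n - 1])
    table[prefix][words[i + n - 1]] += 1

def suggest(context):
    """Suggest the next word from the last n-1 words of the context."""
    prefix = tuple(context.split()[-(n - 1):])
    candidates = table.get(prefix)
    if not candidates:                 # prefix never seen in the corpus
        return None
    next_words, counts = zip(*candidates.items())
    return random.choices(next_words, weights=counts)[0]

print(suggest("the cat"))  # e.g. 'sat' or 'lay'
```

The None branch is the weakness described in the next card: if the exact prefix never appears in the corpus, the model has nothing to suggest.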
Why do n-gram models using a higher n number fall short?
A sequence of words in the context needs to appear in exactly the same form in the corpus for the model to recognise it, look up a weighting, and suggest the next word. If the exact context does not appear in the corpus, the model cannot suggest a next word.
What are the drawbacks of n-gram models?
They cannot link information from different sections of a text.
What is a deterministic model?
A model that always gives the same, predictable result for the same input, such as an LLM run with a temperature of zero.
What is an LLM?
Large Language Models are computational neural networks notable for their ability to achieve general-purpose language generation and other natural language processing tasks such as classification.
What is a limitation of LLMs and how is this circumvented?
LLMs can only suggest words that are included in the corpus. To circumvent this, we use very large training data sets, such as social media posts or text from the Internet.
What is the temperature of an AI model and what does a low and high temperature give? What are the use cases?
Temperature - how random an LLM is.
A temperature of zero gives zero randomness and a temperature of 2 gives a high degree of randomness. At a low temperature, only the most likely word is selected, whereas at a high temperature, all words become closer to equally likely to be selected.
Different temperatures have different uses. A low temperature is good for predictable text, such as a cover letter, whereas a higher temperature suits more creative text, such as poetry.
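A rough sketch of how temperature could reshape the weightings over candidate next words before sampling; the scores below are made-up illustrative numbers, not from any real model:

```python
import math
import random

# Made-up model scores (logits) for candidate next words
logits = {"park": 2.0, "garden": 1.2, "moon": -1.0}

def sample(logits, temperature):
    if temperature == 0:
        # Deterministic: always pick the most likely word
        return max(logits, key=logits.get)
    scaled = {w: s / temperature for w, s in logits.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {w: math.exp(s) / z for w, s in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

print(sample(logits, 0))    # always 'park'
print(sample(logits, 2.0))  # distribution flattens, rarer words become more likely
```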
What is an epoch?
An epoch is a training run, or a pass through the corpus.
What is the training progression of an LLM?
Before training starts, and even after the first few epochs, the predictions stored and given by the model are random; any coherence is coincidental.
Going through more training runs increases coherence and relevancy, but a lot of training runs (thousands to tens of thousands) are needed.
What is the best method of correction and why?
A human can correct the LLM, but given the large number of epochs and the large amount of training data, it is much more efficient for the model to correct itself by comparing its predictions against the original text.
What is the measure of loss and what is the target figure? What is the word describing an LLM that is overtrained?
During each epoch, the neural network compares its prediction with the original data. The model’s prediction is likely off by some amount. The difference between the predicted and actual values is called the loss.
Each epoch reduces the loss, though how much we can reduce the loss and make the AI more accurate decreases over time.
Although reducing the loss does make an LLM more coherent and its output more useful, a loss of zero would mean the output text is exactly the same as the original text, defeating the purpose of the LLM: it could not output novel text. This is called overfitting.
A model is overfit when it replicates patterns in the training data so well that it cannot account for new data or generate new patterns.
To prevent overfitting, we need to monitor training closely once the model begins performing well.
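The cards don’t name a specific loss function; as one common example, a cross-entropy style loss for a single next-word prediction might look like this (the probability is an assumed value):

```python
import math

# Probability the model assigned to the actual next word in the corpus (assumed value)
predicted_prob_of_true_word = 0.25

# Cross-entropy loss for this prediction: zero only if the model was 100% certain
loss = -math.log(predicted_prob_of_true_word)
print(round(loss, 3))  # ~1.386; training nudges the weights to reduce this

# A loss of exactly 0 across the whole corpus would mean the model simply
# reproduces the training text: overfitting.
```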
What is preprocessing and why is it needed?
We need to turn raw data (full of mistakes and inconsistencies) into a clean data set via two processes - tokenisation and preprocessing.
Preprocessing makes all letters lowercase and removes punctuation. Since a computer sees M and m as different characters, making all characters lowercase increases the counted frequency of words that would otherwise have separate lowercase and uppercase variants.
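A minimal preprocessing sketch along these lines, using an assumed example string:

```python
import string

raw = "My cat, Mittens, sat on the MAT!"  # example input (assumption)

clean = raw.lower()  # M and m now count as the same character
clean = clean.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
print(clean)  # 'my cat mittens sat on the mat'
```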
What considerations should we make with regards to punctuation and capitalisation during preprocessing?
Sometimes, such as in spam filtering, it’s important to take capitalisation into account and not weed it out in preprocessing.
One also needs to account for punctuation, but a punctuation mark can mean multiple things. It’s not reliable to assume a capital letter and a full stop mark a new sentence. Instead, we can add a new special character to mark sentence boundaries explicitly.
What is tokenisation?
Tokenization is breaking a corpus into units that the model can train on. Words and punctuation are split into separate tokens.
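A small sketch of this kind of tokenisation using a regular expression; the pattern is one simple illustrative choice, not a standard:

```python
import re

text = "The cat sat. The dog barked!"
# Words become one kind of token, each punctuation mark its own token
tokens = re.findall(r"\w+|[^\w\s]", text)
print(tokens)  # ['The', 'cat', 'sat', '.', 'The', 'dog', 'barked', '!']
```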
What is white-space tokenisation?
Whitespace tokenisation is splitting words by spaces.
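In Python this is essentially a single call; note how punctuation stays attached to words, which motivates the other approaches below:

```python
text = "The cat sat on the mat."
tokens = text.split()  # split on runs of whitespace
print(tokens)  # ['The', 'cat', 'sat', 'on', 'the', 'mat.'] -- the '.' stays attached to 'mat'
```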
What are the drawbacks of using syllables as tokens and what is done to overcome this?
We can also split plural and tense forms to isolate the stem (e.g. in started, starts, and starting, start is the stem). This requires knowledge of the grammar of the language.
Instead, one can build an LLM that uses characters as tokens.
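A tiny illustration of character-level tokens, which need no grammar knowledge at all:

```python
word = "starting"
tokens = list(word)  # every character becomes its own token
print(tokens)  # ['s', 't', 'a', 'r', 't', 'i', 'n', 'g']
```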
What is Byte-Pair Encoding and what is its process?
Byte-Pair Encoding is a tokenization algorithm that builds tokens from characters.
A further optimization is to search for the most common pair of adjacent tokens, merge that pair into a single token, and replace all instances of the pair in the corpus. If a single character no longer appears on its own because it is always accounted for within a merged token, we can remove it from the vocabulary of the LLM.
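A minimal sketch of the merging step on a toy corpus (spaces replaced by '_' so word boundaries survive); real BPE implementations differ in details such as how word boundaries and the number of merges are handled:

```python
from collections import Counter

# Toy corpus already split into character tokens (assumption)
tokens = list("low lower lowest".replace(" ", "_"))

def merge_most_common_pair(tokens):
    # Count every pair of adjacent tokens and pick the most common one
    pairs = Counter(zip(tokens, tokens[1:]))
    best = max(pairs, key=pairs.get)
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
            merged.append(tokens[i] + tokens[i + 1])  # replace the pair with one token
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = merge_most_common_pair(tokens)  # first merge, e.g. 'l' + 'o' -> 'lo'
tokens = merge_most_common_pair(tokens)  # repeat until the vocabulary is big enough
print(tokens)  # ['low', '_', 'low', 'e', 'r', '_', 'low', 'e', 's', 't']
```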
What is a neural network?
A neural network is a linked collection of nodes split into layers. It has an input layer, an output layer, and any number of hidden layers in between. Each layer consists of nodes, and the nodes are connected between layers. Each connection is assigned a random weight (or zero).
During a forward pass, the input nodes pass information to the nodes of the hidden layer. The strength of each signal is adjusted according to the weights and passed on to all nodes of the next layer; each signal is a value between 0 and 1.
Each node is looking for a specific feature. Generally speaking, the further along the NN, the more complex the feature the node is looking for is.
Eventually, an output is given. In supervised learning, the human analyses the accuracy of the output. If accurate, positive feedback is given and the weights of the contributing nodes are increased so the result is more likely to be given again. If it is wrong, the weights are decreased so the result is less likely to be given.
The architecture is the arrangement of neurons and hidden layers, whereas the weights determine the calculation performed.
If the variables are plotted onto a graph, the line separating the classes of output predicted by the network is the decision boundary.
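A toy forward pass through one hidden layer with randomly initialised weights, just to show the shape of the calculation; the layer sizes and the sigmoid squashing to a value between 0 and 1 are illustrative choices:

```python
import math
import random

random.seed(0)

def layer(inputs, n_out):
    """One fully connected layer with random weights and a sigmoid squashing to (0, 1)."""
    weights = [[random.uniform(-1, 1) for _ in inputs] for _ in range(n_out)]
    outputs = []
    for w_row in weights:
        signal = sum(w * x for w, x in zip(w_row, inputs))  # weighted sum of incoming signals
        outputs.append(1 / (1 + math.exp(-signal)))         # squash to a value between 0 and 1
    return outputs

x = [0.2, 0.7, 0.1]        # numerical input (e.g. encoded tokens)
hidden = layer(x, 4)       # hidden layer of 4 nodes
output = layer(hidden, 2)  # output layer of 2 nodes
print(output)
```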
How is information inputted into a neural network?
To be inputted into a neural network, data needs to be in numerical form.
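One simple way to do this is to give each distinct token an integer id via a vocabulary lookup (a minimal sketch; real systems typically go further, e.g. to embeddings):

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Build a vocabulary: each distinct token gets an integer id
vocab = {}
for t in tokens:
    vocab.setdefault(t, len(vocab))

ids = [vocab[t] for t in tokens]
print(ids)  # [0, 1, 2, 3, 0, 4]
```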