NER Flashcards
(38 cards)
What does NER stand for?
Named Entity Recognition
What is a Named Entity?
It is anything with a proper name
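Examples: people, locations, organisations and events. In practice the term is often extended to dates, times and amounts of money, even though these are not strictly proper names.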
What is Named Entity Recognition?
It is the task of finding the spans of text that make up named entities and labelling each span with its entity type
What are some popular NER tags?
People (PER)
Organisations (ORG)
Locations (LOC)
Geo-Political Entity (GPE)
What are NER approaches based on?
They are based on tagsets
What is a popular NER tagset? How many types does it define?
The Automatic Content Extraction (ACE) tagset is a very popular tagset that defines 7 types.
What are some challenges of NER?
Segmentation: NER works with spans, so we have to work out where each NE begins and ends, i.e. how much of a phrase should be labelled
Type ambiguity: the same NE can have different types depending on the context
What is the key method used for NER?
We treat it as a sequence labelling problem so we use BIO tagging
What is BIO tagging?
It is a common approach for sequence labelling requiring span-recognition
What does BIO stand for?
Begin, Inside, Outside
What is the idea of BIO tagging?
We assign a tag to each word in our sequence. Each tag marks whether the word begins a span (B), is inside a span (I), or is outside any span (O)
Explain what the image shows
It shows three tagging schemes: IO, BIO and BIOES. IO tagging is ambiguous because, when two NEs of the same type are adjacent, it cannot show where one ends and the next begins. BIO tagging fixes this by adding begin labels, which mark where each NE starts. BIOES takes this further by adding End and Single labels.
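The difference between the three schemes can be sketched in a few lines of Python. This is only an illustration: the function name `spans_to_tags`, the `(start, end, type)` span format and the example tokens are assumptions, not something taken from the cards.

```python
def spans_to_tags(n_tokens, spans, scheme="BIO"):
    """Turn entity spans into one tag per token.

    spans is a list of (start, end, type) with end exclusive
    (a hypothetical format chosen for this sketch).
    """
    tags = ["O"] * n_tokens  # every token starts outside any entity
    for start, end, etype in spans:
        for i in range(start, end):
            if scheme == "IO":
                prefix = "I"
            elif scheme == "BIO":
                prefix = "B" if i == start else "I"
            elif scheme == "BIOES":
                if end - start == 1:
                    prefix = "S"      # single-token entity
                elif i == start:
                    prefix = "B"
                elif i == end - 1:
                    prefix = "E"      # last token of the entity
                else:
                    prefix = "I"
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            tags[i] = f"{prefix}-{etype}"
    return tags


# Two adjacent PER entities: "Alice" and "Bob Smith".
tokens = ["Alice", "Bob", "Smith"]
spans = [(0, 1, "PER"), (1, 3, "PER")]

io_tags = spans_to_tags(len(tokens), spans, "IO")
bio_tags = spans_to_tags(len(tokens), spans, "BIO")
bioes_tags = spans_to_tags(len(tokens), spans, "BIOES")

print(io_tags)     # ['I-PER', 'I-PER', 'I-PER']
print(bio_tags)    # ['B-PER', 'B-PER', 'I-PER']
print(bioes_tags)  # ['S-PER', 'B-PER', 'E-PER']
```

With IO the two people collapse into what looks like one three-word entity, while BIO and BIOES keep the boundary between them.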
What model is used to learn text according to the BIO scheme to identify NEs?
Conditional Random Fields (CRFs)
What type of features are useful for NER?
Non-word features such as capitalisation.
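As a rough sketch of what such features might look like in code, the helper below derives a few capitalisation and word-shape features for one token. The feature names and the functions `word_shape` and `token_features` are hypothetical, chosen only for this example.

```python
def word_shape(word):
    """Map a word to a coarse shape: X = upper, x = lower, d = digit."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # keep punctuation as it is
    return "".join(shape)


def token_features(tokens, i):
    """Build a feature dictionary for the token at index i."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_capitalised": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "shape": word_shape(word),
        # the previous word gives a little context; "<s>" marks the sentence start
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
    }


features = token_features(["in", "Chicago"], 1)
print(features["shape"])       # Xxxxxxx
print(word_shape("DC10-30"))   # XXdd-dd
```

Dictionaries like this are a common input format for feature-based sequence labellers, although the exact features used in practice vary.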
Why is the Hidden Markov Model (HMM) not a good model for NER?
As HMMs are generative, it is hard to add arbitrary features such as capitalisation patterns
What type of model is CRF?
It is a discriminative sequence model based on a log-linear model. It is widely used for this type of sequence labelling problem.
What is the input and output for a CRF model, and their lengths?
The input is a sequence of words, the output is a sequence of BIO tags. The length of the input will always be the same as the length of the output
What does CRF want to find?
It wants to find the most probable tag sequence out of all possible tag sequences for the input words.
What is the equation for CRF?
Y hat is the most probable sequence, which is found using argmax. We compute the probability for each possible sequence we have given the input words and let argmax take the most probable
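The equation itself is not reproduced on the card. A sketch of the usual linear-chain CRF formulation is given below. The notation is an assumption: X is the input word sequence, Y a candidate tag sequence, \mathcal{Y} the set of all possible tag sequences, w_k the weight of feature k and F_k the k-th global feature.

```latex
\hat{Y} = \operatorname*{arg\,max}_{Y \in \mathcal{Y}} P(Y \mid X),
\qquad
P(Y \mid X) =
  \frac{\exp\left( \sum_{k=1}^{K} w_k F_k(X, Y) \right)}
       {\sum_{Y' \in \mathcal{Y}} \exp\left( \sum_{k=1}^{K} w_k F_k(X, Y') \right)}
```

The denominator sums over every possible tag sequence, so the probabilities of all candidates add up to 1.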
What does CRF define?
A function F, which takes a sequence of words and a sequence of BIO tags as input and maps them to a feature vector.
What needs to be created in order to perform NER?
A set of K features
What does each feature have in a CRF model?
Each feature has a corresponding weight
What is the global feature vector in a CRF model?
It is the sum of the local feature vectors over every position in the sequence
What are the local features in a CRF model?
Local features are features at a particular word index in the sentence. For each index position in the sentence, we use the local feature function to compute the features for that particular position. These local feature vectors can be summed over all positions to give a global feature vector for the entire sequence.
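To make the link between local features, the global feature vector, the weights and the argmax concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the four features, the hand-picked weights (a real CRF learns them from data), the tiny tagset and the brute-force search over all tag sequences, where practical implementations use the Viterbi algorithm instead.

```python
import math
from itertools import product

TAGS = ["B-PER", "I-PER", "O"]
# One weight per feature (K = 4). Hand-picked for this sketch, not learned.
WEIGHTS = [2.0, 1.0, -5.0, 2.0]


def local_features(prev_tag, tag, words, i):
    """Local feature vector at position i of the sentence."""
    capitalised = words[i][:1].isupper()
    in_entity = prev_tag in ("B-PER", "I-PER")
    return [
        1.0 if capitalised and tag != "O" else 0.0,        # capitalised word inside an entity
        1.0 if in_entity and tag == "I-PER" else 0.0,      # I continues an entity
        1.0 if not in_entity and tag == "I-PER" else 0.0,  # I with no entity before it
        1.0 if not capitalised and tag == "O" else 0.0,    # lowercase word outside entities
    ]


def global_features(words, tags):
    """Sum the local feature vectors over every position."""
    total = [0.0] * len(WEIGHTS)
    prev_tag = "<s>"  # start-of-sentence marker
    for i, tag in enumerate(tags):
        for k, value in enumerate(local_features(prev_tag, tag, words, i)):
            total[k] += value
        prev_tag = tag
    return total


def score(words, tags):
    """Weighted sum of the global features (the log-linear score)."""
    return sum(w * f for w, f in zip(WEIGHTS, global_features(words, tags)))


def decode(words):
    """Brute-force argmax over all tag sequences, with its probability."""
    candidates = list(product(TAGS, repeat=len(words)))
    scores = [score(words, tags) for tags in candidates]
    z = sum(math.exp(s) for s in scores)  # normalisation constant
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], math.exp(scores[best]) / z


words = ["Jane", "Smith", "laughed"]
best_tags, best_prob = decode(words)
print(best_tags)                          # ('B-PER', 'I-PER', 'O')
print(global_features(words, best_tags))  # [2.0, 1.0, 0.0, 1.0]
```

The brute-force search is only workable here because the sentence has three words and there are three tags, which gives 27 candidate sequences.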