Hugging Face ecosystem | HF NLP course | 7. Main NLP Tasks | Priority Flashcards

1
Q

[q] How would you inspect the class names of a token classification dataset?

A

label_names = raw_datasets["train"].features["ner_tags"].feature.names

2
Q

[q] How do you map each token back to its corresponding word?

A

inputs.word_ids()

3
Q

[q] What are the basic steps (3) for converting texts to token IDs and aligning the labels with the resulting tokens?

A

– Write a function to align labels with tokens for one example.
– Write a function combining tokenization and label alignment for the examples from one split.
– Apply that function to each split of the dataset with map().
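The innermost step can be sketched as follows. This is a minimal illustration, assuming the usual CoNLL-style label scheme where each B-XXX label has an odd id and the matching I-XXX id is one greater:

```python
def align_labels_with_tokens(labels, word_ids):
    """Expand word-level labels to token-level labels using word_ids."""
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens ([CLS], [SEP], padding) get -100 so the loss ignores them
            new_labels.append(-100)
        elif word_id != current_word:
            # First token of a new word keeps the word's label
            current_word = word_id
            new_labels.append(labels[word_id])
        else:
            # Later tokens of the same word: turn B-XXX (odd id) into I-XXX (id + 1)
            label = labels[word_id]
            new_labels.append(label + 1 if label % 2 == 1 else label)
    return new_labels
```

For example, word_ids [None, 0, 0, 1, None] with labels [3, 0] would yield [-100, 3, 4, 0, -100].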

4
Q

[q] How do you pad the labels the same way as the inputs so that they stay the same size?

A

from transformers import DataCollatorForTokenClassification
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
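For intuition, the label-padding behavior can be sketched in plain Python. This is a hedged illustration of what the collator does to the labels, not the library's implementation:

```python
def pad_labels(batch_labels, pad_id=-100):
    """Pad every label list to the longest sequence in the batch with -100,
    mirroring how the collator pads labels alongside the input IDs so the
    padded positions are ignored by the loss."""
    max_len = max(len(labels) for labels in batch_labels)
    return [labels + [pad_id] * (max_len - len(labels)) for labels in batch_labels]
```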

5
Q

[q] How is a metric loaded for token classification?

A

!pip install seqeval
import evaluate
metric = evaluate.load("seqeval")

6
Q

[q] What are the basic steps in a compute_metrics() function that takes the arrays of predictions and labels, and returns a dictionary with the metric names and values?

A

* Take the argmax of the logits to get predictions.
* Convert integer indices to label strings, ignoring special tokens (-100).
* Call metric.compute() on the predictions and labels.

import numpy as np

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # Remove ignored index (special tokens) and convert to labels
    true_labels = [[label_names[l] for l in label if l != -100] for label in labels]
    true_predictions = [
        [label_names[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": all_metrics["overall_precision"],
        "recall": all_metrics["overall_recall"],
        "f1": all_metrics["overall_f1"],
        "accuracy": all_metrics["overall_accuracy"],
    }
7
Q

[q] How do you set up an Accelerator with a model to train?

A

from accelerate import Accelerator
accelerator = Accelerator()
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

8
Q

[q] What does a postprocess() function need to do during a token classification model’s training?

A

It takes the predictions and labels, and converts them into lists of label strings, in the format the metric object expects.
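A minimal sketch, assuming a hypothetical label_names list for illustration (the real one comes from the dataset features) and that positions labeled -100 are special tokens to drop:

```python
# Hypothetical label list for illustration only
label_names = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]

def postprocess(predictions, labels):
    """Convert id arrays to lists of label strings, skipping -100 positions.
    In the training loop, predictions and labels would be gathered tensors
    converted to nested lists of ints."""
    true_labels = [[label_names[l] for l in label if l != -100] for label in labels]
    true_predictions = [
        [label_names[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    return true_predictions, true_labels
```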
