FB_InferSent Flashcards


Flashcards in FB_InferSent Deck (11):


Paper goals

1. use universal sentence representations as features on a wide range of transfer learning tasks
2. use Natural Language Inference (NLI) to train a sentence encoder for universal embeddings
3. investigate which network architecture to use for the sentence encoder


SNLI dataset

Stanford Natural Language Inference (SNLI) dataset:
- 570k human-generated English sentence pairs
- manually labeled as 1 of 3 categories: entailment, contradiction, neutral


Model training methods

1. sentence-based encoding model that encodes each sentence (premise, hypothesis) separately
2. model that uses the encodings of both sentences jointly, with cross-features and attention between them
Method 1 is selected in this paper, since only separate encoding yields reusable sentence embeddings


Vector representation

Uses 3 matching methods for the 2 sentence encoding vectors u, v:
1. concatenation (u, v)
2. element-wise product u * v
3. absolute value of element-wise difference | u - v |
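The three matching methods above can be sketched as a single feature-construction step; this is a minimal NumPy illustration with toy 4-dimensional embeddings (real InferSent vectors are much larger):

```python
import numpy as np

def match_features(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Combine two sentence embeddings u, v via the three matching
    methods: concatenation, element-wise product, and absolute
    element-wise difference."""
    return np.concatenate([u, v, u * v, np.abs(u - v)])

# toy 4-dimensional sentence embeddings (illustrative values only)
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([4.0, 3.0, 2.0, 1.0])
features = match_features(u, v)
print(features.shape)  # (16,) -- four blocks of the embedding dimension
```

The concatenated feature vector is what the classifier below consumes, so its size is 4x the sentence-embedding dimension.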


Model layout

The model that consumes the combined vector representation is
- a 3-class classifier: multiple fully connected layers followed by a softmax output layer
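A minimal forward-pass sketch of such a classifier, with random stand-in weights and toy dimensions (a real model would learn the weights; the ReLU nonlinearity is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden, n_classes = 16, 512, 3   # toy feature dim; 3 NLI classes

# One-hidden-layer fully connected network with a softmax output.
W1 = rng.standard_normal((feat_dim, hidden)) * 0.01
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, n_classes)) * 0.01
b2 = np.zeros(n_classes)

def classify(features: np.ndarray) -> np.ndarray:
    h = np.maximum(0.0, features @ W1 + b1)   # fully connected + ReLU
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()                    # probabilities over 3 classes

probs = classify(rng.standard_normal(feat_dim))
print(probs.sum())  # sums to 1.0
```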


Models - network architectures

- standard LSTM, GRU
- BiLSTM with mean/max-pooling
- self-attentive BiLSTM
- hierarchical ConvNet


BiLSTM with mean/max-pooling vector

- over T time-steps, a sentence is represented by the hidden states of the forward and backward readings; at each time-step the two hidden states are concatenated, and each dimension of the final sentence vector is either the max (max-pooling) or the mean (mean-pooling) of that dimension over all T concatenated hidden states
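The pooling step can be sketched in NumPy; here the BiLSTM outputs are replaced by a random stand-in matrix of shape (T, hidden), where hidden is the concatenated forward + backward size:

```python
import numpy as np

# Hidden states over T time-steps: each row is the concatenation of the
# forward and backward hidden states at that step (toy shapes).
T, hidden = 5, 8                      # 8 = 2 * 4 (forward + backward)
rng = np.random.default_rng(0)
H = rng.standard_normal((T, hidden))  # stand-in for BiLSTM outputs

# Max-pooling: for each dimension, take the max over time-steps.
u_max = H.max(axis=0)
# Mean-pooling: average each dimension over time-steps.
u_mean = H.mean(axis=0)
print(u_max.shape, u_mean.shape)  # (8,) (8,)
```

Either pooled vector has the same dimensionality as one concatenated hidden state, independent of sentence length T.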


Training process

- SGD with learning rate 0.1, weight decay 0.99, mini-batches of size 64
- when dev accuracy decreases over an epoch, the learning rate is divided by 5
- classifier: multi-layer perceptron with 1 hidden layer of 512 hidden units
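The learning-rate schedule above can be sketched as a small update rule; this assumes (as in the paper's released training code) that the 0.99 "weight decay" acts as a per-epoch multiplicative shrink on the learning rate, and all names here are illustrative:

```python
def next_lr(lr: float, dev_acc: float, prev_dev_acc: float,
            shrink: float = 0.99, drop: float = 5.0) -> float:
    """One epoch of the schedule: shrink the learning rate by `shrink`,
    and additionally divide it by `drop` if dev accuracy decreased."""
    lr *= shrink
    if dev_acc < prev_dev_acc:
        lr /= drop
    return lr

lr = 0.1
lr = next_lr(lr, dev_acc=0.80, prev_dev_acc=0.78)  # improved: only shrink
lr = next_lr(lr, dev_acc=0.79, prev_dev_acc=0.80)  # worse: also divide by 5
print(lr)
```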


Evaluation for transfer learning

- used the sentence embeddings in the evaluation of 12 transfer tasks: binary and multiclass classification, entailment and semantic relatedness, paraphrase detection, caption-image retrieval


Selected sentence encoder model

- BiLSTM with max-pooling and embedding size 4096


NLI task suitability hypothesis

- NLI is a task that requires a high-level understanding of the semantic relationship between pairs of sentences