Lectures Flashcards

1
Q

different types of computational modelling have radically different…

A

assumptions about the nature of cognition

2
Q

most forms of computational modelling…

A

involve some form of simulating a cognitive process

ie. input -> “model” -> behavioural output

3
Q

models differ in their level of analysis

A

Marr’s levels:
- implementational (neural)
- algorithmic
- computational

4
Q

how does computational modelling aid in understanding human behaviour?

A

by establishing a concrete definition of a cognitive process

5
Q

origins of modelling

A

computer simulations have been popular since the early years of psychology

the importance of computation was recognized early on, e.g. Turing (1950)

Wiener (1948) and Shannon (1949) developed early mathematical theories of information and communication

Society for Computation in Psychology

6
Q

Wiener and Shannon

A

Wiener (1948) and Shannon (1949)

developed early mathematical theories of information and communication

7
Q

Society for Computation in Psychology

A

formed in 1971

one of the early subgroups of cognitive psychology

prof is a member

8
Q

2 types of analytical models

A
  1. recognition memory experiment
  2. signal detection theory
9
Q

recognition memory experiment

A

presented with a list of words

presented with pictures of those words

tested for old or new words

sometimes falsely accept things that didn’t occur

10
Q

signal detection theory

A

measurement of the difference between two distinct patterns

first pattern is the one you’re supposed to pay attention to

second pattern involves the random noise that distracts a person/machine’s ability to collect and process info

essentially looks at how easy/difficult it is for someone to process info and respond to it when they’re also being exposed to background noise/distractions

11
Q

the primary model type we’ll look at in this course…

A

simulation models

output of model isn’t deterministic

underlying randomness in the model (typically implemented with random number generators)

12
Q

mind as computer

A

Pylyshyn 1984

mind takes in information from senses

integrates them and creates perceptual experience and behaviour

13
Q

knowledge acquisition: Plato vs Chomsky

A

Plato: knowledge must be gained via experience

Chomsky: we are born with innate knowledge and learning mechanisms

14
Q

poverty of the stimulus

A

we cannot have heard every sentence we are able to produce

we produce more language than we experience

and all possible language is even greater than the language we produce

15
Q

the difference between ‘language experienced’ and ‘language produced’ is accounted for through…

A

innate knowledge

16
Q

possible solution: Simon (1969)

A

discussing the path taken by an ant on a beach, Simon noted that the ant’s path is “irregular, complex, hard to describe. But its complexity is really a complexity in the surface of the beach, not a complexity in the ant.”

17
Q

big data and natural language processing

A

collection of large text sources has changed how we think about studying language

possible to propose learning mechanism and train on realistic data

a model can be “born” into a realistic language environment

we then gain insights into cognition and language performance by examining how the model learns and functions

also is a powerful natural language processing tool

18
Q

T/F: virtual environments are approaching real world complexity levels

A

true

19
Q

language learning: bi-directional benefit

A

we benefit from using large, realistic text sources because we can train models on them

the models give us insight into cognition/language performance/learning

also become powerful natural language processing tools

20
Q

corpus-driven modelling

A

identifies strong tendencies for words/grammatical constructions to pattern together in particular ways

while other theoretically possible combos rarely occur

21
Q

corpus-driven modelling allows for…

A

connections between lexical experience and lexical behaviour

22
Q

first corpus ever

A

Brown Corpus of Kučera and Francis

1967

consisted of about 1 million words, sampled from different genres

23
Q

examples of text-based resources now available for use for corpus-driven modelling

A

Grade 1-12 textbooks

Scientific journal articles

Newspaper articles

Wikipedia

TV and movie subtitles

Books

Urban dictionary

Reddit

24
Q

distributional models of semantics

A

usage-based model of meaning

based on assumption that statistical distribution of linguistic items in context plays key role in characterizing their semantic behaviour

distributional models build semantic representations by extracting co-occurrences from corpora

25
Q

internal versus external theories of cognition

A

internal: involves attending internally to thoughts, memories and mental imagery

external: involves attending to stimuli in the external environment

brain, body, environment

26
Q

organization of long term memory

A

long term memory

splits into:
explicit/declarative (conscious) and implicit (unconscious)

explicit/declarative splits into:
semantic memory (facts, concepts) and episodic memory (events, experiences)

implicit splits into:
priming and procedural memory (skills, tasks)

27
Q

explicit/declarative memory splits into…

A
  1. semantic memory (facts, concepts)
  2. episodic memory (events, experiences)
28
Q

implicit memory splits into…

A
  1. priming
  2. procedural memory (skills, tasks)
29
Q

semantic memory

A

refers to what you know

facts, concepts

30
Q

how is semantic memory tied to language?

A

not necessarily tied to language, but intimately connected

language is a general organizing principle of memory

31
Q

lexical semantic memory

A

memory of word meanings

32
Q

study of semantic memory examines…

A

storage and retrieval

33
Q

modern theories of semantics

A

based in experience

environment serves as model/constraints

34
Q

2 branches of “based in experience” theories of semantics

A
  1. grounded/embodied theories
    - our perceptual world (and our brains, which are embodied) is used as our main info source to understand the world around us
  2. text-based machine learning
35
Q

frontal lobe

A

language processing

emotional regulation

executive functioning

planning

organizing

memory

impulse control

problem solving

selective focus

decision making

behavioural control

36
Q

temporal lobe

A

episodic memory
(involved in comprehension, storage and retrieval of memory)

hearing ability
- first area that processes speech info, turns it into a linguistic code

memory acquisition

some visual perceptions

categorization of objects

comprehension

memory retrieval

37
Q

perisylvian region

A

area of brain responsible for language

composed of:
- primary auditory cortex
- Wernicke’s area
- angular gyrus
- arcuate fasciculus
- primary motor cortex
- Broca’s area

38
Q

Wernicke’s area

A

constructs rep of meaning for linguistic info

damage from stroke to this area = fluent/receptive aphasia
- loss of ability to understand and create meaningful language
- grammatically correct but incorrect meaning

39
Q

Broca’s area

A

responsible for linguistic production

damage from stroke to this area = non-fluent/productive aphasia
- loss of ability to produce fluent language
- but can still understand language

40
Q

Wernicke’s location

A

posterior temporal lobe

many connections to primary auditory cortex

heavily connected to Broca’s area

41
Q

Wernicke’s = important for…

A

storage and retrieval of word representations, meanings, grammar

42
Q

Broca’s location

A

posterior inferior frontal region

next to primary motor cortex (responsible for muscles used to produce speech)

sometimes called the motor speech area

43
Q

arcuate fasciculus

A

connection between Wernicke’s and Broca’s area

important for BOTH phonological and lexical-semantic processing

44
Q

early theory of semantic memory - devised by Collins & Quillian

A

hierarchical networks

45
Q

hierarchical networks

A

Collins & Quillian

suggest our info in memory is organized hierarchically - can be repped by a tree

  • superordinate at the top
  • as you continue down the network, get more subordinate info
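the hierarchical-network idea can be sketched in code (a hypothetical toy, not Collins & Quillian’s actual implementation — the concepts and properties here are illustrative):

```python
# A minimal hierarchical network: each node stores its own properties plus
# a link to its superordinate. Verifying a property means walking up the tree.

network = {
    "living thing": {"isa": None,     "props": {"can grow", "is living"}},
    "animal":       {"isa": "living thing", "props": {"has skin", "can move"}},
    "bird":         {"isa": "animal", "props": {"has wings", "can fly"}},
    "canary":       {"isa": "bird",   "props": {"can sing", "is yellow"}},
}

def hops_to_verify(concept, prop):
    """Count how many links up the hierarchy we must travel to find `prop`;
    return None if it is never found."""
    hops = 0
    node = concept
    while node is not None:
        if prop in network[node]["props"]:
            return hops
        node = network[node]["isa"]
        hops += 1
    return None

# Directly stored properties are "closer" than inherited ones:
print(hops_to_verify("canary", "can sing"))  # 0 hops (stored at 'canary')
print(hops_to_verify("canary", "can fly"))   # 1 hop  (stored at 'bird')
print(hops_to_verify("canary", "has skin"))  # 2 hops (stored at 'animal')
```

the hop counts mirror the prediction tested in the sentence-verification experiments: properties stored higher up should take longer to validate.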
46
Q

what kind of info is at the bottom of the tree in hierarchical networks?

A

actual instances of a category

47
Q

if information is stored in the brain in the way suggested by hierarchical networks, then there should be a corresponding connection between…

A

the distance between concepts in the network and the amount of time it takes to verify a connection between them

direct connections will be verified faster

think of it like walking from point to point

48
Q

living thing: example of hierarchical network

A

living thing - connects to propositions “is” and “can” and then to “grow” and “living”

living thing: connects to propositions “is a” and then to either
1. plant
2. animal

plant - connects to “is a”
1. tree
2. flower

these eventually link into specific examples
- pine, oak, rose, daisy

49
Q

how did Collins and Quillian test whether the network’s distances (closeness of associations) actually predict human processing times?

A

gave people a sentence that was true or false

had them say whether it was true or false

ie. ‘a canary can sing’, ‘can walk’, ‘has skin’
- looking at properties progressively higher up in the network

turns out that increasingly high properties take longer to validate

50
Q

are Collins & Quillian’s findings supported in all categories?

A

no, not validated in all categories

a good first step, but not exhaustive

51
Q

2 pieces of theoretical refinement: Smith, Shoben & Rips

A
  1. proposed that items can be repped as a SET OF FEATURES
    - each concept is described by a set of features that define it
  2. meaning can be described as a position in a geometric space
    - vectors
52
Q

vectors

A

look at how similar and different certain vectors are

use trigonometry to calculate the angles between different vectors

once you have the numerical similarity between the vectors, you can plot how they are distributed in space

53
Q

vector cosine

A

calculated using trigonometry that examines angles between different vectors

will come up with value between 1 and -1

1 = the same (very similar)
-1 = opposite
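
the cosine computation can be sketched directly (a minimal version using only the standard library; the vectors are made up for illustration):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors: 1 = same direction,
    -1 = opposite; values near 0 mean the vectors are unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine([1, 2, 3], [1, 2, 3]))  # ~1.0  (identical)
print(cosine([1, 0], [0, 1]))        # 0.0   (orthogonal/unrelated)
print(cosine([1, 2], [-1, -2]))      # ~-1.0 (opposite)
```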

54
Q

multidimensional scaling

A

uses the vector cosines to place words in a 2D space

visually shows their similarity

more similar items will be closer to each other within the space

helps visualize how we connect things in our minds

55
Q

what are features?

A

classical approaches propose that they are properties of categories

ie. features of cars: “has wheels”, “used for transportation”, “has doors”, “has an engine”

56
Q

uninterpretable features: multidimensional scaling models

A

multidimensional scaling models don’t carry interpretable features

the locations of things in space don’t map onto features like “has wheels” or “has a door”

can’t say that location x in a matrix means that word y has a door

57
Q

how do machine learning models construct features?

A

from text

not typically based in perceptual environment

some are interpretable, others are not

58
Q

in neural networks, all info is distributed across…

A

the WIDTH of the network

if you damage the network, all information decays together (not like you just lose a chunk of it)

59
Q

topic models

A

probabilistically assigns words to topics based on whether a word has a feature or not

e.g. a probability value that a certain word is a living thing, or is red, or can move, etc.

good for information organization, can categorize info well

60
Q

topic models are good at ______ but aren’t really used as a _______

A

good at information organization/categorization

but aren’t really used as a theory of cognition

61
Q

Rogers and McClelland worked on what kind of model

A

neural network

62
Q

basic idea behind Rogers and McClelland’s neural network

A

based on an interest in how children acquire language

take propositions (sentences)

give model a sentence, derived from a representation network model

give the model a word (canary) and proposition (can)

then have an output layer with all sorts of possible options

want model to produce certain options, and not produce others

e.g. want it to produce ‘grow’, ‘fly’, ‘sing’ but not ‘swim’

if the model gets something wrong, it uses back-propagation to adjust the weights so that next time it’s less likely to make the same mistake

can do this because it’s a supervised network (we know what we want the network to produce, so we know when it’s wrong)

by end of training cycle, model produces the correct output

63
Q

models of Collins & Quillian versus models of Rogers & McClelland

A

Collins & Quillian:
- hierarchical networks

Rogers & McClelland:
- neural networks

64
Q

supervised networks

A

we know what we want the network to produce

so we know when it is wrong

allows for back-propagation/error-driven learning

e.g. the neural networks discussed here are supervised

65
Q

back-propagation

A

error-driven learning

possible in supervised networks

when we know the output that we want the model to produce

at first, the network will produce “noise” (the wrong things)

but since we know what we want it to produce, we can CHANGE THE CONNECTIONS OF THE WEIGHTS

so that next time it’s incrementally more likely to produce the correct activations

do this hundreds of thousands, millions of times

eventually the network will produce the right activation
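the error-driven idea can be illustrated with a deliberately tiny sketch — a single linear weight, no hidden layers, so not full back-propagation, but the same incremental weight-adjustment loop (the data and learning rate are made up):

```python
# Toy error-driven learning: one weight, nudged slightly on each trial.

def train(inputs, targets, lr=0.05, epochs=200):
    w = 0.0  # start uninformative: the model initially produces "noise"
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            y = w * x            # forward pass: the model's prediction
            error = t - y        # compare prediction to the known target
            w += lr * error * x  # adjust the weight a small amount
    return w

# Learn the mapping y = 2x from examples; because the learning rate is low,
# many passes over the same data are required.
w = train([1, 2, 3], [2, 4, 6])
print(round(w, 3))  # converges toward 2.0
```

the low learning rate is why training takes many trials: each example moves the weight only incrementally, exactly as described above.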

66
Q

error-driven learning is really just…

A

reinforcement learning

67
Q

each arrow in a network…

A

reps a diff weight/numerical value

which is adjusted depending on how incorrect the network is

68
Q

do we want a high or low learning rate?

A

low learning rate

so that small changes are made to each input

means that a lot of learning trials are required

generally must be trained multiple times on same corpus

69
Q

other term for backpropagation

A

the backward pass

70
Q

what comes out as the output is essentially just the…

A

most activated node in the output layer (fed by activation flowing through the hidden layers)

71
Q

2 main approaches to neural networks

A
  1. localist network:
    - each node reps only one entity
    - people tend to think these are neurologically implausible
  2. distributed representation:
    - info is spread across the nodes
    - instead of being confined to one node
    - preferred, because more similar to brain’s function
72
Q

issue with the whole ‘input = output from many many hidden layers’ thing

A

results in a kind of black box model

what exactly is happening in the hidden layers is unclear

can’t “get into the head of the model” - can’t map it onto what humans do in experimental tasks

led to Bayesian models (back-propagation networks that feed into other back-propagation networks…train each layer separately, don’t have to go all the way back to the first layer)

73
Q

three names associated with back-propagation

A

Rumelhart, Hinton & Williams (1986)

74
Q

the trajectory of learning followed by Rogers and McClelland’s model maps onto…

A

learning trajectories of children as they acquire language

in the beginning, model produces noise (outputs are all equally likely, close together in 2D space)

but with training, they begin to split apart and are weighted differently (just like how kids begin to learn words)

75
Q

closed versus open models

A

closed models:
- restricts the model to working with the training materials
- assumes all of the knowledge about the world = contained in the training materials
- allows for clarity in resulting explanation

open models:
- uses millions of samples
- noise is eventually reduced through greater levels of experience
- better than closed

76
Q

Rogers and McClelland models = based on what assumption? open or closed networks?

A

based on the SIMPLIFICATION ASSUMPTION

they are closed networks

“the more detail we incorporate, the harder the model is to understand”
- think of the growing complexity and non-interpretability of ChatGPT

77
Q

simplification assumption

A

linked to closed models

suggests when you’re training a model you should give it simple training data

because complicated materials make it unclear as to whether the model is succeeding/failing because of the quality of the data

simple data provides researchers with clarity regarding how good the model was

78
Q

closed models and ecological validity

A

closed models have low ecological validity

not reflective of tasks that humans actually perform

language is very noisy, lots of info all the time

so using simple training materials doesn’t reflect the task that humans face when they’re learning

79
Q

open models require _____ information

A

more

80
Q

BEAGLE model on 300 versus 300 000 propositions

A

300 propositions = closed model
- only takes 300 trials to learn propositions
- can cluster info right away
- not error-driven
- presents sentences as more structured than they are in reality

300 000 propositions = open model
- derived from a large corpus of language
- takes much longer to train, about 300 000 trials

81
Q

why does it take the larger BEAGLE model longer to learn?

A

because the learning corpus and the actual corpus are different (open model)

the actual corpus has more noise and nuance

therefore takes longer to settle and to produce the correct output

because open models learn from actual sentences, it takes more examples of info to come up with the correct structure

82
Q

Current NLP Machine Learning Wars

A

people keep building bigger models, competing against each other

BERT, RoBERTa, GPT-2, T5, Turing NLG, GPT-3

GPT-3 is winning

83
Q

NLP

A

natural language processing

84
Q

is ChatGPT a good model for the brain?

A

not really

it contains way more info than the human brain does

not really an applicable model with which to assess human cognition

85
Q

LLM

A

large language model

ChatGPT, and models from Facebook and Google

86
Q

perceptual symbol systems

A

proposed by Barsalou as a general theory of cognition

classic view: amodal symbols in cognition

amodal systems have NO CONNECTION to perceptual environment

87
Q

amodal systems have no connection to…

A

perceptual environment

amodal symbol system transduces a partial perceptual experience into a completely new representation language that is INHERENTLY NON-PERCEPTUAL

88
Q

3 problems with amodal approach

A
  1. neurological evidence:
    - findings show that damage to sensory-motor cortex impairs processing of certain modality-based categories (ie. birds)
  2. failure of transduction:
    - no system can elegantly go from perception to symbols
  3. symbol grounding problem: how does the system know what it’s computing?
89
Q

an alternative to amodal systems

A

neural representations

90
Q

neural representations

A

not a physical copy of the perceptual experience

instead a RECORD OF THE NEURAL ACTIVATION that arises during perception

similar to representations of imagery

likely stored in CONVERGENCE ZONES: integrate info in sensory-motor maps to represent knowledge

never completely transduced; perceptual traces are reconstructed

91
Q

8 examples of semantic memory tasks

A

many diff behaviours are studied

  1. word similarity
  2. false memory
  3. free association
  4. semantic priming
  5. verbal fluency
  6. sentence comprehension
  7. discourse comprehension
  8. feature judgments
92
Q

semantic memory models: word similarity

A

most common type of data used for these models

used in model development and model evaluation

give people two words and get them to RATE HOW SIMILAR THEY ARE on a scale

collect ratings from people and average them

compare this number to computational model that’s also learning these words

93
Q

semantic memory models: verbal fluency

A

used in more applied situations

ie. diagnosing conditions like alzheimer’s or schizophrenia

give people a category and ask them to generate as many things as possible from that category

compare the model’s output to output of humans - see if the person fits the model made for a schizophrenic, for example

94
Q

models and dementia

A

models can examine how language use changes prior to diagnosis

because they’re based on data from people in the years leading up to their diagnosis

can quantitatively see how their memory systems are changing

models = a tool to understand how the mem systems of people with dementia change over time

95
Q

representation types: network models

A

words are connected within a semantic network

(ie. ‘release’ connects to ‘capture’ connects to ‘pirate’ connects to ‘sailor’ connects to ‘anchor’)

generate representation of each item based on the nodes they’re connected to

96
Q

how are network models typically derived?

A

from free association data

give people a word (like ‘car’) and get them to generate features associated with these items

this is how they generate the semantic networks/network models

97
Q

Turk problems

A

issue with network models

explains human behaviours using other human behaviours

Turk problems arise when the representational input is derived directly from human behavioural data

COMPLEXITY OF THE MODEL = HIDDEN WITHIN THE REPRESENTATION

98
Q

who coined the Turk problems?

A

Jones, Hills, Todd

99
Q

are back-propagation models feature models?

A

yes!

features are the activation values of the hidden layer

activation of hidden layer can be used as featural rep of a word

100
Q

important changes occurring in 1990’s-2000’s that helped progress Big Data and Natural Language Processing

A

pre-1990’s - didn’t have large enough language corpora to train models on

but with internet, larger texts were gathered

2000’s - further movement to digitize existing/old texts

large corpora of text brought in a diff domain of modelling

COLLECTION OF LARGE TEXT HAS CHANGED HOW WE THINK ABOUT STUDYING LANGUAGE

101
Q

large corpora has changed how we think about studying language…

A

now possible to PROPOSE LEARNING MECHANISMS and to TRAIN ON REALISTIC DATA

model can be “born” into a realistic language environment

we gain insights into cognition and language performance by examining how it learns/functions

102
Q

T/F: virtual environments are approaching real world complexity levels

A

true

103
Q

NLPs not only help us understand cognition and language performance, but also…

A

are powerful natural language processing tools

104
Q

quantification of the natural language environment: Herbert Simon’s take

A

Herbert Simon said “the apparent complexity of our behaviour over time is largely a reflection of the complexity of the environment in which we find ourselves”

behaviour is adaptive: we shape our cognition to the requirements of our environment
- cognitive system is built such that we can change our behaviours to match the needs of our environment

105
Q

classic goal in the cognitive sciences

A

quantification of the natural language environment

106
Q

quantification of the natural language environment: William Estes’ take

A

William Estes stated that theories of behaviour should shift “the burden of explanation from hypothesized processes in the organism to statistical properties of environmental events”

saying we should look at how people are learning from the environment/responding to it

he was particularly interested in mathematical properties

107
Q

distributional models

A

these types of models learn the meanings of words from the distribution of how they’re used in language

aka embedding models

learn meaning of words from co-occurrence statistics

108
Q

first major distributional model

A

Landauer & Dumais (1997)

Latent Semantic Analysis model

Landauer and Dumais wanted to move beyond existing retrieval algorithms, which were simply cued with specific words and returned the documents with the most literal overlap

they wanted a more MEANING-BASED approach
- get rid of the polysemy effect
- introduce recognition of synonymous meanings

109
Q

LSA works by…

A
  1. examining a large corpus of text
  2. extracting information about how words are used
  3. information is based on frequency usage for particular words
  4. build a vector that reps the meaning of the word in terms of its similarity to other words
  5. decompose the matrix into smaller number of features
110
Q

is LSA error-free?

A

yes, there’s no error signal in the model’s learning (unlike neural networks)

simply accumulate information in memory and use that to drive the model

not using predictive process to hone the model’s learning - it treats each lexical experience equally

111
Q

LSA: supervised or unsupervised?

A

unsupervised

just learns the structure of the dataset

112
Q

4 things we need for distributional models

A
  1. input
    - corpus for model to learn
  2. processing
    - learning algorithm
    - by which info is gleaned from input, extracted and stored in memory
  3. memory
    - feature space
    - representation of where we keep info about the word’s meaning
  4. output
    - task problem
113
Q

distributional models: processing/learning mechanism details

A

neural embedding models take a sentence

they sequentially activate each word on its own

want the model to predict the words that surround that word in that context

predictions = in the output layer

see if the predictions are correct

back-propagate to increase accuracy
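the prediction targets this describes can be sketched as code (a word2vec-style illustration; the window size and example sentence are arbitrary choices, not the lecture’s actual model):

```python
# For each word in a sentence, the surrounding words within a window are
# what a neural embedding model is trained to predict.

def context_pairs(sentence, window=2):
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:  # a word is not its own context
                pairs.append((target, words[j]))
    return pairs

pairs = context_pairs("solvent insoluble mixture decanted")
print(pairs[:3])  # the (target, context) training pairs for 'solvent'
```

each (target, context) pair is one prediction trial; wrong predictions are what back-propagation corrects.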

114
Q

problem with chatGPT

A

it’s too complex

too many layers - we don’t really know what’s happening

it’s a “black box”

115
Q

Firth quote about word co-occurrences

A

“you shall know a word by the company it keeps”

116
Q

context (source of text) for distributional models

A

many different possibilities

paragraphs, documents, books, authors etc.

117
Q

when processing a sentence, distributional models pre-process. how?

A

pre-processing modifies the sentences/inputs to improve processing

  1. stop list
  2. subsampling
118
Q

stop list

A

stop list of high frequency function words

any word included on the stop list is removed from the sentence

119
Q

subsampling

A

first a frequency distribution is run (custom to the corpus in question)

creates a probability distribution - words with super super high frequencies are skipped
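
one common subsampling scheme (the word2vec-style formula; the exact scheme used in lecture may differ) can be sketched like this — the counts below are invented for illustration:

```python
import math

def keep_probability(word_count, total_count, t=1e-3):
    """Probability of keeping a word: high-frequency words get low
    probabilities, so they are usually skipped; rare words are always kept."""
    f = word_count / total_count  # relative frequency, custom to this corpus
    return min(1.0, math.sqrt(t / f))

# 'the' might make up 5% of a 1M-word corpus; 'solvent' might occur 10 times.
print(keep_probability(50_000, 1_000_000))  # low: usually skipped
print(keep_probability(10, 1_000_000))      # 1.0: always kept
```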

120
Q

if you don’t use stop lists or subsampling to get rid of certain words, then…

A

the model is quickly overwhelmed

every single word will be understood to be similar to “the”

121
Q

are there parallel processes to stop lists/subsampling in real people?

A

yes

eye tracking studies show that when people read a page, they generally skip function words

122
Q

which is better? stop list or subsampling?

A

subsampling

gives you more control over what the model is processing

and it’s controlled by parameters

more training flexibility

123
Q

example of sentence before and after stop list/subsampling

A

if the solvent is insoluble the mixture can be decanted

solvent insoluble mixture decanted
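
the example above can be reproduced with a tiny stop list (real stop lists contain a few hundred high-frequency function words; this one contains just enough for the sentence):

```python
# Any word on the stop list is removed from the sentence before training.
STOP = {"if", "the", "is", "can", "be", "a", "an", "of", "and", "to"}

def apply_stop_list(sentence):
    return " ".join(w for w in sentence.lower().split() if w not in STOP)

print(apply_stop_list("if the solvent is insoluble the mixture can be decanted"))
# -> "solvent insoluble mixture decanted"
```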

124
Q

after pre-processing…

A

the remaining words are examined

specifically their occurrences with each other word in the corpus

each pair that is found modifies the count in the matrix (strength increases with each pair found)

done word by word: find all the pairs for one word first, then move onto the next word…
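
the counting step can be sketched as follows. For simplicity this version counts pairs sentence by sentence rather than word by word, but the resulting counts are the same; the two-sentence corpus is made up:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Each pair of words occurring in the same sentence bumps a cell in
    the (sparse) co-occurrence matrix, stored here as a Counter."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        for w1, w2 in combinations(words, 2):
            counts[(w1, w2)] += 1
    return counts

corpus = [
    "solvent insoluble mixture decanted",
    "solvent mixture heated",
]
counts = cooccurrence_counts(corpus)
print(counts[("mixture", "solvent")])  # 2: they co-occur in both sentences
```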

125
Q

fundamental component of the processing of these distribution models….

A

similarity between words

126
Q

typical similarity metric in distributional semantics

A

cosine

use a vector cosine: gives a value between 1 (very similar) and -1 (opposite); values near 0 indicate no similarity

value represents placement of the vectors in a 2D space

highly aligned in terms of featural reps = high similarity value

127
Q

to determine if our model actually captures any semantic info…

A

we examine its performance with a word similarity task:
- get people to rate how similar a pair of words are on a scale
- get a set of values pertaining to the relation of words
- TAKE COSINE SIMILARITY OF EACH WORD PAIR (between 1 and -1)

TAKE CORRELATION between the cosine value the model has produced and the similarity value that people are producing

use these values to see how similar the model’s and people’s results are

ideally you want a positive correlation
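
the evaluation step can be sketched as a correlation between the model’s cosines and the averaged human ratings (the ratings and cosines below are invented for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical averaged human ratings (1-7 scale) and model cosines
# for four word pairs: a good model tracks the human ordering.
human_ratings = [6.8, 5.9, 2.1, 1.3]
model_cosines = [0.71, 0.63, 0.20, 0.12]
print(round(pearson(human_ratings, model_cosines), 3))  # strongly positive
```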