The Language of Vision - Cavanagh Flashcards
(22 cards)
Chomsky’s Theory and reinterpretation
Chomsky believed that language is innate, or in other words, we are born with a capacity for language.
Reinterpretation: deep structure, originally thought to underlie spoken language, may actually have evolved to resolve ambiguities in visual perception, implying that vision and language share a fundamental cognitive architecture.
What is a “visual language”?
A structured system that the visual system uses to encode information and communicate it to other parts of the brain.
Why is it unlikely that visual information is passed to other brain areas as pictures or movies?
Because receiving brain areas would require their own visual systems to interpret such formats, which they do not have. Thus, visual information must be translated into a more abstract and accessible format.
What is the purpose of exporting visual descriptions to other brain modules?
To enable integrated cognitive functions such as speaking about visual experiences, planning movements (e.g., catching a ball), remembering faces, and reading.
Global workspace theory
Proposed by Bernard Baars.
The theory describes a “blackboard” in the brain, where different brain areas “post and read messages” in a form that the other areas can understand. Only selected, important information gets posted.
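The blackboard idea can be made concrete with a small sketch. This is only an illustration of the "post and read" pattern, not a model from Baars or Cavanagh: the class names, the `salience` score, and the `threshold` are all hypothetical stand-ins for "selected, important information".

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str      # posting module (hypothetical name, e.g. "vision")
    content: dict    # abstract description, not a picture
    salience: float  # assumed stand-in for "importance"


@dataclass
class Blackboard:
    threshold: float = 0.5                      # assumed cutoff for posting
    messages: list = field(default_factory=list)

    def post(self, message: Message) -> bool:
        """Post the message only if it is salient enough; report whether it was posted."""
        if message.salience >= self.threshold:
            self.messages.append(message)
            return True
        return False

    def read(self) -> list:
        """Every module reads the same shared list of posted messages."""
        return list(self.messages)


board = Blackboard(threshold=0.5)
board.post(Message("vision", {"object": "ball", "action": "approaching"}, salience=0.9))
board.post(Message("vision", {"object": "wall", "action": "static"}, salience=0.1))

# Only the salient description reaches the shared workspace.
posted = [m.content["object"] for m in board.read()]
```

The content is a labeled description rather than an image, which matches the point that receiving areas cannot interpret pictures.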
What elements might a visual language include, and how can these be structured?
A visual language may include elements such as objects (like nouns), actions (like verbs), and spatiotemporal relations (like prepositions).
What are four properties required for a communication system to qualify as a language?
- Compositionality/productivity: building complex messages from simpler, reusable parts (in vision: actions, like verbs)
- Arbitrariness: no inherent link between symbol and meaning (in vision: objects, like nouns)
- Displacement: the ability to refer to things not currently present (in vision: prepositions such as “behind”)
- Recursion: the ability to embed structures within structures
How does visual object recognition support the idea of arbitrariness in visual language?
The pattern of brain activity that represents an object is an arbitrary label, not a little picture of the object. Vision therefore has arbitrariness.
How are actions represented in visual language, and how do they exhibit compositionality?
Actions function like verbs: they are represented by familiar motion patterns that can flexibly take various subjects (objects). Because the same action pattern is reusable across subjects, visual descriptions are compositional, mirroring how verbs work in spoken language.
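A toy sketch of this reusability, assuming descriptions are simple subject-plus-action pairs. The word lists and the `describe` helper are hypothetical and only illustrate how a few reusable parts combine into many descriptions.

```python
from itertools import product

# Hypothetical vocabularies: objects act like nouns, actions like verbs.
objects = ["dog", "ball"]
actions = ["rolls", "falls"]


def describe(subject: str, action: str) -> str:
    """Combine one object with one action into a simple description."""
    return f"{subject} {action}"


# Every action can take every object as its subject,
# so 2 objects and 2 actions give 4 distinct descriptions.
descriptions = [describe(s, a) for s, a in product(objects, actions)]
```

Adding one new action to the list would yield a new description for every known object, which is the sense in which the system is productive.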
How does vision represent time (past, present, future) in static scenes?
The visual system can imply temporal information even in static images—e.g., a crushed soda can implies a past event, and a poised baseball bat may suggest an imminent action.
To what extent does the visual system, as opposed to cognitive processes, make predictions about the future?
Some studies suggest that only situations with immediate consequences may trigger predictions about future events within the visual system.
A delay in understanding what will happen next, even for familiar events, suggests cognitive deduction rather than visual prediction.
Why is the visual preposition “behind” especially important?
When we see a dog behind a gate, we complete the missing parts of the dog. Just as languages can refer to things that are not present in front of us, vision can represent what is not directly visible: vision has displacement.
What is the main difference between spatial relations and objects and actions?
Unlike objects and actions, the relations between elements take time and cognitive effort to notice.
Is the perception of causality a cognitive or visual process?
Causality concerns who did what to whom.
Some causal judgments require more cognitive processing, but the brain can detect some causal interactions (such as launching or colliding objects) directly from visual input, without high-level reasoning.
What is recursion, and how does it appear in vision?
Recursion is the embedding of a structure within another structure of the same type (e.g., a sentence within a sentence).
In vision, recursion appears in pictures within pictures, paintings within paintings, and objects that carry their own history (like deformations), suggesting that visual perception can be recursive just like language.
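The "picture within a picture" case can be sketched as a recursive data structure. This is an illustrative assumption, not a claim about how the brain stores scenes: the `Scene` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    label: str
    objects: list = field(default_factory=list)   # plain object labels
    pictures: list = field(default_factory=list)  # embedded Scene instances


def depth(scene: Scene) -> int:
    """Nesting depth: 1 for a scene that contains no embedded pictures."""
    return 1 + max((depth(p) for p in scene.pictures), default=0)


# A room containing a painting of a gallery, which itself contains a painting.
inner = Scene("painting of a vase", objects=["vase"])
middle = Scene("painting of a gallery", objects=["frame"], pictures=[inner])
room = Scene("living room", objects=["sofa"], pictures=[middle])

nesting = depth(room)
```

A `Scene` can hold other `Scene` values of the same type, which is the same structure-within-structure property as a sentence embedded in a sentence.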
What is meant by ‘visual grammar’?
Visual grammar refers to the rules and structures that guide how visual components are combined and identified, much like grammatical rules in spoken language determine sentence structure. It underlies how we recognize what something is and what it’s doing in a visual scene.
What is meant by “ungrammatical vision”?
Visual scenes that violate the implicit rules of visual grammar, for example impossible events or illogical structures. They cause confusion or hesitation in perception, much like a grammatically incorrect sentence in language.
Why is distinguishing visual vs. cognitive errors in (visual) grammar important?
(Chomsky quote)
Because not all perceptual errors are caught by the visual system; some are detected only later by cognitive processes. A sentence can be grammatically correct but semantically nonsensical, and the same distinction applies to vision (e.g., optical illusions).
Chomsky: “Colorless green ideas sleep furiously.”
- The rules of grammar are independent of high-level meaning. Similarly, we can perceive a nonsensical AI-generated photo even though it violates high-level “semantics”.
How is visual grammar thought to be acquired?
Through exposure to regularities in visual input—like recurring patterns in objects, actions, and spatial relations—without needing a specialized understanding of physics. These patterns help form internal rules, such as actions requiring agents.
Is visual language the origin for all spoken language?
While vision may have offered a template, spoken and visual grammars are structurally distinct. However, they might share common algorithms for processing and rule extraction, making visual grammar a precursor, not a direct ancestor, of spoken language.
Reception, production and action in language and vision
The conventional view is that vision is receptive and language is productive.
The language of vision changes vision from purely receptive to both receptive and productive.
Actions are guided by vision but not produced by it.
Function of awareness in the language of vision
- Selects items to send to other parts of the brain
- Constructs the information in a “language” those parts will understand