High level perception - THEORIES Flashcards
(37 cards)
Theorists in structural description models of object recognition
Biederman (1987)
Marr (1982)
Biederman’s theory of object recognition - geons
He calls his theory of object recognition: ‘recognition by components’
We recognise objects by combining simple 3D shapes called “geons.” We can combine these geons in different ways to create an abstract form of an object
These geons are identified using a few basic edge features that are easy for our brains to detect, no matter how we view the object or how clear the image is. This makes it possible for us to recognize objects even from different angles or when the image is unclear.
In Biederman’s model, what are 3D objects represented using?
3D objects are represented using basic volumetric parts (primitives) known as ‘geons’
How many geons did Biederman say were enough to represent most objects?
36
The role of long-term memory in Biederman’s theory
We hold these geons in our long-term memory (LTM) and can combine them in different ways to form a very abstract representation of an object
Torch example: Biederman’s theory (1987)
In order for us to recognise something as a torch, we would recognise its three-dimensional constituent parts - the shapes that make up the torch (geons)
Geons
Constituent three-dimensional parts
You can think of geons as basic shapes like cylinders, bricks, pyramids, etc. that you could use to ‘build’ more complex objects
Marr (1982)
Cylinder
Structural description theory of high-level vision
Came before Biederman’s theory, but the two are similar and share the same core concept
The difference in Marr’s theory is that, instead of a set of geons (Biederman’s theory proposes about 36), there is only one kind of building block
Marr argues we need only this single primitive, which he calls a generalised cylinder
He argues we can build abstract objects from this generalised cylinder
Key points to know about these theories of object recognition
- Biederman’s and Marr’s theories are both structural description models of object recognition
- They both argue that object recognition is accomplished by having an abstract representation in memory, whether that’s in the form of geons or in the form of a generalised cylinder
Similarities of Biederman and Marr + Nishihara’s theories of object recognition
Both theories aim to explain how humans recognise objects based on their 3D shapes
Both propose a hierarchical approach to object recognition, starting with basic features and progressing to more complex representations
Both suggest that the recognition process is unaffected by viewpoint.
Difference between Biederman and Marr + Nishihara’s theories of object recognition - building blocks
Recognition by components (RBC) uses a limited set of 36 “geons” (e.g., cylinders, cones) as fundamental building blocks for object recognition. Marr and Nishihara, on the other hand, propose a more general approach using a single primitive, the ‘generalised cylinder’
Limitations with the object recognition theories
Both focus on static object representations and ignore the importance of motion in object recognition. For example, we use patterns of movement associated with people and objects to aid recognition (such as recognising somebody by their walk)
Specifically, both theories struggle to account for the influence of viewpoint and the dynamic nature of object recognition in real-world scenarios where movement and context play a crucial role.
These models only explain recognition of basic classes of objects, but to identify and distinguish between different faces, breeds of animal or types of pen will require a more complete explanation.
Marr’s theory of object recognition
1982
He argued object recognition involves various processing stages and is much more complex than had previously been thought.
Explains how we understand what we see by breaking the process down into 3 main stages:
- Primal sketch
- 2.5D sketch
- 3-D model representation
- Primal sketch
First, we detect basic features like edges, light, and dark areas—this is like an outline of the image
- 2.5-D sketch
Next, we build a rough idea of the shapes and how they’re positioned in space, based on cues such as lighting, depth and binocular disparity, but only from our point of view. It resembles the primal sketch in being viewer-centred or viewpoint-dependent (i.e., it is influenced by the angle from which the observer sees objects or the environment).
- 3-D model representation
This describes objects’ shapes and their relative positions three-dimensionally; it is independent of the observer’s viewpoint and so is viewpoint-invariant.
We create a full, detailed 3D representation that helps us recognize objects no matter the angle we see them from.
Viewpoint invariant
Object recognition is independent of the observer’s viewpoint
Why was Marr’s theory so influential?
He successfully combined ideas from neurophysiology, anatomy and computer vision
He was among the first to recognise the enormous complexity of object recognition
His distinction between viewpoint-dependent and viewpoint-invariant representations triggered much subsequent research
Limitations of Marr’s theory - bottom-up processing
He focused excessively on bottom-up processes, admitting himself that “Top-down processing is sometimes used and necessary.”
However, he de-emphasised the major role expectations and knowledge play in object recognition
Limitations of Marr’s theory - vision
Marr assumed that “Vision tells the truth about what is out there” - he assumed that our visual system always gives us an accurate and truthful picture of the world.
But there are numerous exceptions - Our perception can be distorted by things like distance or visual illusions.
e.g. When you look down from a tall building, people look tiny—even though you know they’re not.
In the vertical-horizontal illusion, a vertical line can look longer than an identical horizontal one, even though they’re the same length.
These examples show that vision doesn’t always “tell the truth,” which goes against Marr’s idea. So, a key limitation is that his theory doesn’t fully account for the ways our perception can be misleading or influenced by context.
Limitations of Marr’s theory - complexity
“The computations required to produce view-independent 3-D object models are now thought by many researchers to be too complex.”
Researchers now think that the brain likely doesn’t do all of these complicated calculations because it would take too much time and effort. Instead, we may rely on simpler, faster methods to recognize objects.
So, the limitation is that Marr’s theory might overestimate how much detailed processing the brain actually does in everyday vision. It may not be realistic to assume we always create full 3D models just to recognize things.
Biederman’s stages of object recognition - geons
- The geons of an object are determined
- When this information is available, it is matched with stored object representations or structural models consisting of information about the nature of the relevant geons, their orientations, sizes and so on.
- Whichever stored representation fits best with the geon-based information obtained from the visual object determines which object is identified by observers.
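The matching step described above can be sketched in code. This is only an illustrative toy, not Biederman’s actual model: the `GeonPart` fields, the example stored models and the use of set overlap (Jaccard similarity) as the measure of “best fit” are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeonPart:
    """One geon in a structural description (hypothetical fields)."""
    shape: str        # e.g. "cylinder", "cone"
    orientation: str  # e.g. "horizontal", "vertical"
    size: str         # e.g. "large", "small"


def overlap(observed: frozenset, stored: frozenset) -> float:
    """Jaccard similarity: shared parts divided by all distinct parts.

    Assumption: the theory only says the stored representation that
    'fits best' wins; set overlap is one simple way to score fit.
    """
    union = observed | stored
    if not union:
        return 0.0
    return len(observed & stored) / len(union)


def recognise(observed: frozenset, models: dict) -> str:
    """Return the name of the stored model that best fits the observed geons."""
    return max(models, key=lambda name: overlap(observed, models[name]))


# Hypothetical structural models held in long-term memory.
STORED_MODELS = {
    "torch": frozenset({
        GeonPart("cylinder", "horizontal", "large"),
        GeonPart("cone", "horizontal", "small"),
    }),
    "mug": frozenset({
        GeonPart("cylinder", "vertical", "large"),
        GeonPart("curved_handle", "vertical", "small"),
    }),
}

# Geons extracted from the visual input (stage 1 is assumed to be done).
seen = frozenset({
    GeonPart("cylinder", "horizontal", "large"),
    GeonPart("cone", "horizontal", "small"),
})
best_match = recognise(seen, STORED_MODELS)
```

Here `best_match` is the stored model whose geons overlap most with what was seen. A partial view, such as only the vertical cylinder of the mug, still picks the closest stored model, which loosely mirrors the claim that recognition can succeed from incomplete input.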
Biederman’s stages of object recognition - first stage
- The first step in recognizing an object is detecting its edges, based on things like brightness, texture, and color. This creates a basic outline, like a line drawing. Then, the brain figures out how to break the object into its basic parts, called geons.
Biederman’s stages of object recognition - stage 2
Which edges should we focus on?
Biederman (1987) said that we pay most attention to non-accidental properties—features that stay the same no matter the viewing angle. Examples include whether a line is straight or curved, and whether a shape bends inward (concave) or outward (convex). Concave parts are especially important. Using these stable features, the brain builds the object’s geons.