C4 recognition Flashcards
Recognition in the wider context of cognition/definition
Recognition is the process through which a set of basic sensory descriptions (2½D) of an object are turned into a 3D description that matches what has been seen before (“re-cognizance” = “re-knowing”), irrespective of the angle it’s seen from.
This process involves:
Converting the sensory stimuli to an internal representation and storing this;
Comparing what is sensed (seen/heard/etc.) to what has been experienced before;
Identifying what is perceived, irrespective of its orientation (object-centred description).
Humphreys and Bruce proposed a 5-step model of recognition
Humphreys and Bruce 5-step model of recognition
- Early visual processing
similar to Marr's raw and full primal sketches
- Viewpoint-dependent object descriptions
similar to Marr's 2½D sketch
- Perceptual classification
"what kind of thing is it?" e.g. it's a book
- Semantic classification/categorisation
"what particular thing is it?" e.g. it's a heavy blue paperback
- Naming
"which thing is it?" e.g. it's the DD303 course book
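The five stages can be sketched as an ordered pipeline. The stage names come from the model; the function bodies and the toy "book" example are invented placeholders:

```python
# Hypothetical sketch of the Humphreys and Bruce 5-step model as a pipeline.
# Each stage enriches the evolving description of the stimulus.
def early_visual(d):      d["sketch"] = "edges+blobs"; return d       # cf. primal sketches
def view_dependent(d):    d["view"] = "2.5D description"; return d    # viewpoint-dependent
def perceptual_class(d):  d["kind"] = "book"; return d                # "what kind of thing?"
def semantic_class(d):    d["category"] = "heavy blue paperback"; return d
def naming(d):            d["name"] = "DD303 course book"; return d   # "which thing?"

STAGES = [early_visual, view_dependent, perceptual_class, semantic_class, naming]

def recognise(stimulus):
    for stage in STAGES:
        stimulus = stage(stimulus)
    return stimulus

result = recognise({})  # run a (hard-coded) stimulus through all five stages
```

The point of the ordering is that naming only happens after the earlier classifications have succeeded.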
Types of recognition - Object and face recognition
Two different types of recognition:
• Between-category: "What" something is, e.g. it's a person, not a vehicle
• Within-category: "What its name is", e.g. it's Sigmund Freud
Object recognition and face recognition are considered separately:
• Within-category is more often used with people, objects less so (“It’s an orange, it (probably) doesn’t have a name”);
• An individual face can change due to time, emotion etc.
In studying face recognition a distinction is made between familiar faces and unfamiliar faces:
• Pike et al. found that:
• People can often identify poor representations of famous faces;
• Our recognition accuracy for unfamiliar faces (e.g. in an identity parade) is poor.
Recognising someone’s face appears to use different cognitive processes to recognising the emotion they’re displaying (Young et al.)
Types of recognition - Active processing - recognizing objects by touch
Recognition is an active process (Gibson): even when recognising visually we actively engage with environmental stimuli (e.g. visually scanning a Penrose triangle).
This is even more obvious in the process of recognising objects by touch:
• The brain and touch receptors in the skin form a feedback system where the pressure we apply is regulated by the brain based on the sensory information generated from tactile exploration of an object;
• Information from stretch receptors in muscles enables the location of limbs to be calculated (kinesthesis);
• The sense of proprioception enables the relative location of body parts to be calculated ("how far is my finger from my nose with my eyes closed?");
• All this haptic information can be used to generate a mental image of an object;
Lederman and Klatzky (1990) showed that humans use consistent exploratory procedures when examining objects, such as enclosing them, stroking the texture, pressing to gauge hardness etc.
While haptic perception is useful for recognising objects by weight, hardness etc., visual perception can operate at greater distances.
Types of recognition - Recognizing two-dimensional objects
Recognising 2D images may use different cognitive processes to 3D-object recognition.
Different types of theory have been proposed to explain 2D recognition:
Types of recognition - Recognizing two-dimensional objects - Pattern matching theories
The sensed image is compared to a range of templates in memory until a match is found
This seems an unlikely explanation of 2D recognition as it would require either very generic templates or a large number of templates to handle the enormous variety of similar patterns (e.g. of the letter “R”)
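The brittleness of template matching can be shown with a toy example. The 3×3 binary "letter" grids below are invented for illustration; the point is that a template only matches an exactly identical image:

```python
# Toy template matching: the sensed image (a 3x3 binary grid) must equal a
# stored template exactly. The grids here are invented for illustration.
TEMPLATES = {
    "R": ((1, 1, 0), (1, 1, 0), (1, 0, 1)),
    "P": ((1, 1, 0), (1, 1, 0), (1, 0, 0)),
}

def match_template(image):
    for letter, template in TEMPLATES.items():
        if image == template:
            return letter
    return None  # no stored template fits

exact = match_template(((1, 1, 0), (1, 1, 0), (1, 0, 1)))    # identical "R": matches
shifted = match_template(((0, 1, 1), (0, 1, 1), (0, 1, 0)))  # same "R" shifted: fails
```

Even a one-pixel shift defeats the match, which is why the theory needs either very generic templates or an enormous number of them.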
Types of recognition - Recognizing two-dimensional objects - Feature recognition theories
The key features of the image are extracted and compared to internal representations until a match is found
This is more generic than pattern matching, so a better explanation;
However ambiguity is problematic - e.g. “a line and a curve” could describe a D, G, P or Q.
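The ambiguity problem can be illustrated in a few lines. The feature inventories below are invented; the point is that reducing letters to a bag of features leaves several candidates:

```python
# Toy feature recognition: each letter is reduced to a set of features.
# "A line and a curve" describes several letters, so features alone
# cannot disambiguate them.
FEATURES = {
    "D": {"line", "curve"},
    "G": {"line", "curve"},
    "P": {"line", "curve"},
    "Q": {"line", "curve"},
    "L": {"vertical", "horizontal"},
}

def candidates(extracted):
    # All letters whose stored feature set matches the extracted one.
    return sorted(letter for letter, fs in FEATURES.items() if fs == extracted)

ambiguous = candidates({"line", "curve"})  # four letters fit the description
```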
Types of recognition - Recognizing two-dimensional objects - Structural description theories
Structural descriptions comprising the key features and how they are organised in relation to each other are compared to internal representations until a match is found
Appears to cope with variety and ambiguity;
Can be described in human and computer language;
Also works with recognition of 3D versions of 2D objects, e.g. a 3D letter “L” has the same structural descriptions irrespective of orientation.
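One way to see why structural descriptions cope with orientation: if the description stores only how the parts relate to each other (here, the relative angle between strokes, an invented simplification), rotating the whole letter leaves it unchanged:

```python
import math

# Toy structural description of a letter built from straight strokes.
# A stroke is (start, end); the description keeps only the relative angles
# between pairs of strokes, so it is invariant under rotation of the letter.
def angle(stroke):
    (x1, y1), (x2, y2) = stroke
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180

def describe(strokes):
    angs = [angle(s) for s in strokes]
    return sorted(round(abs(a - b) % 180)
                  for i, a in enumerate(angs) for b in angs[i + 1:])

def rotate(strokes, deg):
    r = math.radians(deg)
    def rot(p):
        x, y = p
        return (x * math.cos(r) - y * math.sin(r),
                x * math.sin(r) + y * math.cos(r))
    return [(rot(a), rot(b)) for a, b in strokes]

L_shape = [((0, 0), (0, 2)), ((0, 0), (1, 0))]  # an "L": two perpendicular strokes
same = describe(L_shape) == describe(rotate(L_shape, 37))  # description survives rotation
```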
Types of recognition - Object-centred vs viewer-centred descriptions
Because 3D objects can be rotated or viewed from many different angles, 3D recognition cannot be explained by theories such as pattern matching or feature recognition that don’t consider the relative location of object parts.
The description that is tested against prior knowledge (and the internal representations in which that knowledge is stored) must be object-centred - i.e. describe the object generally, rather than viewer-centred, otherwise the object could only be recognised from one angle of view.
Recognizing three-dimensional objects - Marr and Nishihara’s theory
Marr and Nishihara suggested three-dimensional objects are recognised by breaking them up into generalised cones.
(A generalised cone is any round-sided 3D solid with the same cross-sectional shape (not necessarily size) throughout its length. For example cutting across a cone or vase shape produces circular cross-sections)
This analysis allows any object to be described within a canonical coordinate frame, i.e. the process can be used to describe all objects in the same standard way.
They proposed a multi-step process:
Recognizing three-dimensional objects - Marr and Nishihara’s theory - step one
Step 1: Derive the object shape
Identify the central axis of the object using information from the 2½D sketch;
Work out what shape would result if the silhouette or contour generator of the object was rotated around the central axis (e.g. a rectangular silhouette rotated around its central axis would produce a cylinder, a triangle would make a cone);
For this mental process to yield an accurate description of the 3D object, three assumptions must hold:
• each point on the silhouette matches only one point on the 3D object;
• points near each other in the 2D image are near each other in the 3D object;
• points on the silhouette all lie in the same plane.
If any of these assumptions do not hold the object may be incorrectly recognised:
Example: a hexagonal prism viewed end-on and a cube viewed edge-on and tilted forward have the same silhouette:
The cube violates assumption 3: the points marked "a" on the hexagonal prism are coplanar, while the corresponding points (a, b, c) on the cube's outline lie in three different planes; consequently a cube may be mis-recognised as a hexagonal prism.
Recognizing three-dimensional objects - Marr and Nishihara’s theory - step two
Step 2: Locate the object's component axis/axes and derive a 3D description
Work out the areas of concavity (where the silhouette ‘bends in’);
Divide the object into component parts (primitives) by joining the areas of concavity;
Find an axis for each primitive;
Link all the primitives into a 3D description by working out how each of their axes relates to the principal (main) axis of the object.
Simple object outlines (e.g. a circle) don’t have any areas of concavity so the axis of symmetry is used instead.
Representing complex objects as a hierarchy of primitives allows for general recognition ("It's a person") as well as capturing detail ("Four limbs hang off the body, each ending in fingers/toes, and a head sticks out of the top")
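The hierarchy of primitives can be sketched as a nested structure. The layout below is an invented illustration, not Marr and Nishihara's own notation:

```python
# Each level of the hierarchy refines the one above with its own
# primitives and axes; coarse recognition ("it's a person") only needs
# the top level, while finer levels capture the detail.
person = {
    "primitive": "cylinder",  # the whole body as one generalised cone
    "axis": "vertical",
    "parts": {
        "head": {"primitive": "sphere", "axis": "vertical", "parts": {}},
        "arm":  {"primitive": "cylinder", "axis": "oblique",
                 "parts": {"hand": {"primitive": "cylinder",
                                    "axis": "oblique", "parts": {}}}},
    },
}

def depth(model):
    # How many levels of detail the hierarchy captures.
    if not model["parts"]:
        return 1
    return 1 + max(depth(p) for p in model["parts"].values())

levels = depth(person)  # body -> arm -> hand gives three levels here
```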
Recognizing three-dimensional objects - Marr and Nishihara’s theory - step three
Step 3: Compare the 3D description to a mental catalogue of objects to find a match
The 3D description is compared against a mental catalogue of 3D models of all previously seen objects;
This catalogue is hierarchical, with more detail at each level;
If a match is found, the process stops and the object is recognised.
Recognition does not depend on viewing angle as the description and model entries are all 3D representations.
Recognizing three-dimensional objects - Marr and Nishihara’s theory - evidence for
Marr and Nishihara's claim that locating the central axis is critical to recognition is supported by evidence:
Lawson and Humphreys (1996) showed that recognition was adversely affected in line drawings that were rotated so that their major axis was foreshortened (i.e. rotated towards the observer), perhaps because this made the axis difficult to locate;
Warrington and Taylor (1978) found patients with right-hemisphere focal lesions had difficulty recognising objects presented from an unusual viewpoint, or confirming that two photos were of the same item if one showed an unusual view;
• They may have been unable to convert the 2D image to a 3D object-centred representation;
• Features that were important to identification may have been obscured by the rotation.
Humphreys and Riddoch (1984) compared foreshortened images with images of the same objects in which features were hidden - the foreshortened ones were recognised less often, suggesting that major-axis identification is important to forming the 3D model.
Explains misinterpretation if the contour generator is misidentified (Step 1)
Recognizing three-dimensional objects - Marr and Nishihara’s theory - evidence against
Within-category discrimination is hard to explain because the conversion of an object to generalised cones should map all exemplars of the category to the same representation. This would mean that we can’t tell the difference between one instance of a thing and another (e.g. all border collies would be recognised as the same thing)
Recognizing three-dimensional objects - Biederman’s theory
Agrees with Marr and Nishihara’s theory:
• Assumed complex objects are represented as hierarchies of simpler shapes;
• Proposed that approx. 36 geons (primitive shapes including rectangular prisms as well as generalised cones) are used to represent objects;
• Also assumed that concavity is used to sub-divide objects.
Offers a different explanation to Marr and Nishihara of how the 3D model is created from the 2D image:
Every geon has five non-accidental properties (key invariant features):
• Curvilinearity: a curve in the 2D image maps to a curve in the 3D model;
• Parallelism: parallel lines in the image map to parallel lines in the model;
• Cotermination: edges that meet in the image will meet at the same point in the model;
• Symmetry: the same axes of symmetry exist in the image and the model;
• Collinearity: straight lines in the image map to straight lines in the model.
Each sub-component of the image is analysed in terms of these and matched to the geon with the same properties;
The matching geons are assembled into a 3D representation;
Recognition is completed by comparing this model against previous exemplars stored in memory.
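The matching of a sub-component to a geon can be sketched as a lookup on non-accidental properties. The property sets assigned to each geon below are invented for illustration; Biederman's actual geons are defined by contrasts such as straight vs curved edges and parallel vs tapering sides:

```python
# Toy geon matching by non-accidental properties (NAPs): the properties
# extracted from an image region select the geon with the same signature.
GEONS = {
    "brick":    {"collinearity", "parallelism", "cotermination", "symmetry"},
    "cylinder": {"curvilinearity", "parallelism", "symmetry"},
    "cone":     {"curvilinearity", "cotermination", "symmetry"},
}

def match_geon(observed_properties):
    # Return the geon(s) whose NAP signature matches the observed set.
    return sorted(g for g, props in GEONS.items() if props == observed_properties)

hit = match_geon({"curvilinearity", "parallelism", "symmetry"})
```

A misreading of the properties (e.g. a wheel edge-on yielding collinearity rather than curvilinearity) selects the wrong geon, which is how the theory explains misinterpretation.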
Recognizing three-dimensional objects - Biederman’s theory - evidence for
Explains misinterpretation as being due to incorrect geon matching:
• Example: a wheel viewed edge-on may be interpreted as a rectangle because the collinearity property will match a rectangular geon rather than a cylindrical one due to the viewpoint.
Explains why objects were harder to recognise when parts of the image with greater concavity were removed than when less concave parts were deleted (Biederman);
Object-priming studies (Biederman and Gerhardstein) show that creating the object-centred model is critical to recognition:
• showing an image of an object from one viewpoint improved recognition of a second image of the object if the viewpoints were less than 135° apart - priming would not occur if an object-centred model was not formed;
• recognition performance declined if one or more geons was hidden between views, even if the viewpoints were less than 135° apart - suggesting geons are indeed used in forming the 3D model.
Recognizing three-dimensional objects - Biederman’s theory - evidence against
Bülthoff and Edelman found participants were unable to recognise objects presented from a novel viewpoint, even when the views already seen should have allowed an object-centred 3D model to be formed.
Tarr: recognition may not rely completely on forming an object-centred model, some factors may depend on viewpoint.
Within-category discrimination is hard to explain for the same reason as Marr & Nishihara: conversion of an object to geons should map all exemplars of the category to the same representation. This would mean that we can't tell the difference between one instance of a thing and another (e.g. all border collies would be recognised as the same thing)
Face recognition
Face recognition is a different skill to other forms of recognition: all faces are similar at a basic level and share the same generic features.
Face recognition differs from object recognition:
• within-category recognition
• more specific recognition ("it's a face, but which face is it (whose face)? Is it familiar? Is it male/female? What ethnicity? etc.")
Tanaka showed this is an expert-level skill, similar to experts who can distinguish types of bird, for example.
Familiar and unfamiliar face recognition also appear to be different processes:
• Familiar faces can be recognised even after long gaps:
• Bahrick et al. (1975) showed participants recognised names and faces of schoolmates after 35 years.
• However while teachers recognised 69% of recently-taught pupils, they only recognised 26% of those taught 8 years previously (Bahrick)
• Yin (1964) found that 93% of unfamiliar faces were remembered after a short period but fewer were recognised if the viewpoint or expression was changed. Bruce (1982) suggested this showed that ability to remember facial features was being tested, not facial recognition.
Kemp et al. studied participants' ability to match faces to photographs:
• Cashiers accepted a credit card bearing a photograph of a different but similar-looking person 34% of the time, even if only the sex and ethnicity matched.
Bruce et al. found that participants shown a picture of an unfamiliar male picked his face out of an array containing nine others correctly only 80% of the time.
• Performance was worse if they were not told whether the face was present in the test stimuli, or if the pose was changed.
Kilgour and Lederman showed that unfamiliar face recognition by vision and touch combined was no more accurate than by touch alone.
Modelling in face recognition - a connectionist model
Recognising faces uses information from a set of processes:
• Sensory information analysis -> perception (“it’s a face”) -> recognition (“it’s familiar”) -> retrieval of other information and name
Errors in any step can lead to recognition mistakes
• Young et al. (1985) used a diary study to research recognition mistakes:
• 22 participants recorded their mistakes over 8 weeks
• 5 main types of error in 3 groups were identified:
• Mistaking someone familiar as unfamiliar and vice-versa (~48%)
• Being unable to ‘place’ someone - maybe in an unusual context or due to poor viewing conditions (~48%)
• Recognising someone but rejecting it (“It can’t be them because they’re abroad on holiday”) (~5%)
They concluded that:
• different types of error show that different types of information are recalled by different processes
• information about people is always retrieved before their name, and
• names are never recalled without other information.
On this basis they (and others such as Bruce and Young) suggested a cognitive model of face recognition:
• we encode people’s faces when we first meet them - this is compared to data in FRUs (face recognition units)
• if the data in a FRU matches the encoded data, PINs (Person Identity Nodes) are activated
• PINs store information about people (occupation etc.), and when activated their data and the person’s name (if known) are available.
Modelling in face recognition - Bruce and Young connectionist IAC model
Building on these error patterns and the cognitive model of face recognition, Bruce and Young proposed a connectionist IAC (Interactive Activation and Competition) model
The model proposes pools of the following unit types:
• FRU: one face recognition unit exists for each familiar person, activated when that person is seen from any angle. Each stores a representation of what the person looks like;
• PIN: one person identity node exists for each familiar person, containing information about them.
• SIU: semantic information about people (e.g. names of occupations) is stored in semantic information units.
• Lexical output: units representing words, names etc.
• WRU: word recognition units are like FRUs, but are activated when a word is perceived.
• NRU: people’s names are stored in name recognition units, linked to the PINs representing those people. Words in WRUs that are names are linked to the corresponding NRU.
Three key pathways exist between pools (note pools are connected bi-directionally):
• Triggered by seeing a face: FRU ↔ PIN ↔ SIU ↔ Lexical output
• Triggered by a name: WRU ↔ NRU ↔ PIN ↔ SIU ↔ Lexical output
• Triggered by other information: WRU ↔ SIU ↔ Lexical output
A strength of this model is that it can be used to explain a number of phenomena
Priming: activated SIUs for the familiar person will also activate PINs for any other people sharing that information, e.g. if recognising "Michael Jackson" activates the SIU for "singer", PINs for other singers are excited in turn, so other singers will be recognised more quickly.
Modelling in face recognition - Bruce and Young connectionist IAC model - explaining phenomena - face recognition
Process A. The step-wise process of face recognition:
- Step 1: When a familiar person is seen, their FRU is activated;
- Step 2: This in turn activates/excites the PIN for that person strongly enough for conscious recognition, confirming their familiarity, and inhibits all other PINs;
- Step 3: The SIUs for that person (name/nationality/occupation etc.) are activated making information about them available, all other SIUs are inhibited;
- Step 4: Lexical output units linked to the activated SIUs are activated - the person (and the other information known about them) can be named. Other lexical output units are inhibited.
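Process A can be sketched as a toy interactive activation and competition network. The unit names, weights and update rule below are simplified illustrations, not the published model parameters:

```python
# Minimal IAC sketch: three pools (FRU, PIN, SIU) with bidirectional
# excitatory links between linked units and inhibition within each pool.
POOLS = {
    "FRU": ["fru_jackson", "fru_pollock"],
    "PIN": ["pin_jackson", "pin_pollock"],
    "SIU": ["siu_singer", "siu_painter"],
}
LINKS = {("fru_jackson", "pin_jackson"), ("fru_pollock", "pin_pollock"),
         ("pin_jackson", "siu_singer"), ("pin_pollock", "siu_painter")}

def step(act, external, excite=0.4, inhibit=0.2):
    new = {}
    for pool, units in POOLS.items():
        for u in units:
            net = external.get(u, 0.0)
            for v, a in act.items():
                if (u, v) in LINKS or (v, u) in LINKS:
                    net += excite * a        # between-pool excitation
                elif v != u and v in units:
                    net -= inhibit * a       # within-pool competition
            new[u] = max(0.0, min(1.0, act[u] + net))
    return new

# Seeing Michael Jackson's face injects input into his FRU (Step 1);
# activation then spreads FRU -> PIN -> SIU while competitors are inhibited.
act = {u: 0.0 for units in POOLS.values() for u in units}
for _ in range(5):
    act = step(act, {"fru_jackson": 0.3})
```

After a few update cycles his PIN out-competes other PINs (Step 2) and his SIUs become active (Step 3), mirroring the step-wise account above.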
Modelling in face recognition - Bruce and Young connectionist IAC model - explaining phenomena - by seeing/hearing people's names
Process B. Recognition by seeing/hearing people’s names:
- Step 1: When a familiar name is seen/heard, the WRU for that name is activated (e.g. "Jackson"); other WRUs are inhibited;
- Step 2: The NRUs for people with that name (e.g. “Jackson Pollock”, “Michael Jackson”) are activated/excited, all other NRUs are inhibited;
- The process continues from Process A, Step 2
Modelling in face recognition - Bruce and Young connectionist IAC model - explaining phenomena - other personal information
Process C. Recognition of other personal information (occupation, nationality etc.)
- Step 1: The WRU for the word sensed is activated (e.g. “Teacher”);
- Step 2: This in turn activates/excites the corresponding SIU, and inhibits all other SIUs;
- The process continues from Process A, Step 2