Flashcards in How vision guides action Deck (76)
How would you build a robot for making apple pie? What kind of ‘computational brain’ would it need?
· Robots used to
o Test predictions
- Describe processes: produce a description that is then used as a qualitative model
Pancakes made by dual-arm robots from an internet recipe (Beetz et al., 2011)
specific tasks that need to be done
Pancakes made by dual-arm robots from an internet recipe (Beetz et al., 2011) research
Tenorth and Beetz (2013)
Tenorth and Beetz (2017)
Holz et al. (2014)
Tenorth and Beetz (2013)
Autonomous service robots will have to understand vaguely described tasks, such as “set the table” or “clean up”. Performing such tasks as intended requires robots to fully, precisely, and appropriately parameterize their low-level control programs. We propose knowledge processing as a computational resource for enabling robots to bridge the gap between vague task descriptions and the detailed information needed to actually perform those tasks in the intended way. In this article, we introduce the KnowRob knowledge processing system that is specifically designed to provide autonomous robots with the knowledge needed for performing everyday manipulation tasks. The system allows the realization of “virtual knowledge bases”: collections of knowledge pieces that are not explicitly represented but computed on demand from the robot’s internal data structures, its perception system, or external sources of information. This article gives an overview of the different kinds of knowledge, the different inference mechanisms, and interfaces for acquiring knowledge from external sources, such as the robot’s perception system, observations of human activities, Web sites on the Internet, as well as Web-based knowledge bases for information exchange between robots. We evaluate the system’s scalability and present different integrated experiments that show its versatility and comprehensiveness.
Tenorth and Beetz (2017)
In order to robustly perform tasks based on abstract instructions, robots need sophisticated knowledge processing methods. These methods have to supply the difference between the (often shallow and symbolic) information in the instructions and the (detailed, grounded and often real-valued) information needed for execution. For filling these information gaps, a robot first has to identify them in the instructions, reason about suitable information sources, and combine pieces of information from different sources and of different structure into a coherent knowledge base. To this end we propose the KnowRob knowledge processing system for robots. In this article, we discuss why the requirements of a robot knowledge processing system differ from what is commonly investigated in AI research, and propose to re-consider a KR system as a semantically annotated view on information and algorithms that are often already available as part of the robot's control system. We then introduce representational structures and a common vocabulary for representing knowledge about robot actions, events, objects, environments, and the robot's hardware as well as inference procedures that operate on this common representation. The KnowRob system has been released as open-source software and is being used on several robots performing complex object manipulation tasks. We evaluate it through prototypical queries that demonstrate the expressive power and its impact on the robot's performance.
Holz et al. (2014)
Grasping individual objects from an unordered pile in a box has been investigated in stationary scenarios so far. In this work, we present a complete system including active object perception and grasp planning for bin picking with a mobile robot. At the core of our approach is an efficient representation of objects as compounds of simple shape and contour primitives. This representation is used for both robust object perception and efficient grasp planning. For being able to manipulate previously unknown objects, we learn object models from single scans in an offline phase. During operation, objects are detected in the scene using a particularly robust probabilistic graph matching. To cope with severe occlusions we employ active perception considering not only previously unseen volume but also outcomes of primitive and object detection. The combination of shape and contour primitives makes our object perception approach particularly robust even in the presence of noise, occlusions, and missing information. For grasp planning, we efficiently pre-compute possible grasps directly on the learned object models. During operation, grasps and arm motions are planned in an efficient local multiresolution height map. All components are integrated and evaluated in a bin picking and part delivery task
From perception-guided pancake making by robots to robotic prosthetics: example of bioinspired engineering
• Learning concepts from nature and applying them to the design of artificial systems
◦ "taking autonomous robot control from pick and place tasks to everyday object manipulation is a big step that requires robots to understand much better what they are doing, much more capable perception capabilities, as well as sophisticated force-adaptive control mechanisms (manipulations with a number of fingers) that even involve the operation of tools such as the spatula" (Beetz et al., 2011)
◦ "it's hard. You have done a hundred or thousand of hours of minimal exercises, just trying to do the different grips, bends, rotations ... it takes a lot of time" (Johnny Matheny, 2016, first patient with modular and mind-controlled prosthetic limb implanted directly into skeleton)
- shows how much the brain does that we are not aware of
· Move in specific ways that then bring more information to you
- Memory, sensory and action systems don’t work independently
perception-action cycle research
Escobar et al. (2020)
Perri et al. (2015)
Escobar et al. (2020)
This work presents the HSS-Cognitive project, which is a Healthcare Smart System that can be applied in measuring the efficiency of any therapy where neuronal interaction gives a trace whether the therapy is efficient or not, using mathematical tools. The artificial intelligence of the project underlies in the understanding of brain signals or Electroencephalogram (EEG) by means of the determination of the Power Spectral Density (PSD) over all the EEG bands in order to estimate how efficient was a therapy. Our project HSS-Cognitive was applied, recording the EEG signals from two patients treated for 8 min in a dolphin tank, measuring their activity in five experiments and for 6 min measuring their activity in a pool without dolphin in four experiments. After applying our TEA (Therapeutic Efficiency Assessment) metric for patient 1, we found that this patient had gone from having relaxation states regardless of the dolphin to attention states when the dolphin was presented. For patient 2, we found that he had maintained attention states regardless of the dolphin, that is, the DAT (Dolphin Assisted Therapy) did not have a significant effect in this patient, perhaps because he had a surgery last year in order to remove a tumor, having impact on the DAT effectiveness. However, patient 2 presented the best efficiency when doing physical therapy led by a therapist in a pool without dolphins around him. According to our findings, we concluded that our Brain-Inspired Healthcare Smart System can be considered a reliable tool for measuring the efficiency of a dolphin-assisted therapy and not only for therapist or medical doctors but also for researchers in neurosciences.
Perri et al. (2015)
The event-related potential (ERP) literature described two error-related brain activities: the error-related negativity (Ne/ERN) and the error positivity (Pe), peaking immediately after the erroneous response. ERP studies on error processing adopted a response-locked approach, thus, the question about the activities preceding the error is still open. In the present study, we tested the hypothesis that the activities preceding the false alarms (FA) are different from those occurring in the correct (responded or inhibited) trials. To this aim, we studied a sample of 36 Go/No-go performers, adopting a stimulus-locked segmentation also including the pre-motor brain activities. Present results showed that neither pre-stimulus nor perceptual activities explain why we commit FA. In contrast, we observed condition-related differences in two pre-response components: the fronto-central N2 and the prefrontal positivity (pP), respectively peaking at 250 ms and 310 ms after the stimulus onset. The N2 amplitude of FA was identical to that recorded in No-go trials, and larger than Hits. Because the new findings challenge the previous interpretations on the N2, a new perspective is discussed. On the other hand, the pP in the FA trials was larger than No-go and smaller than Go, suggesting an erroneous processing at the stimulus-response mapping level: because this stage triggers the response execution, we concluded that the neural processes underlying the pP were mainly responsible for the subsequent error commission. Finally, sLORETA source analyses of the post-error potentials extended previous findings indicating, for the first time in the ERP literature, the right anterior insula as Pe generator
Tea-making: searching, locating, monitoring and grasping objects (Land et al., 1999)
· Looked at body and eye movements and manipulations by the hand
- The eyes don’t always follow the movement, and vice versa: movement can be independent of vision
eye fixations during tea-making
· Vision is an active process: seeing and looking (gaze fixation), sampling of information in time and space
· In primates and humans, eyes are highly mobile
o eye movements can be slow (compensating shifts of gaze, foveal tracking) or fast (saccades: relocating gaze, fixating a new target)
· Land et al. (1999): 1/3 of fixations were linked to subsequent actions (first fixations to new objects); 2/3 of fixations came after an action
o fixations for locating, directing, guiding and checking
o eye movements are often predictive
· Some fixations for locating were not followed by immediate actions (look ahead fixations)
o suggests that some form of transsaccadic memory exists, as information is not lost when another saccade is made
- eyes remember where they have been and can move back to a particular location
eye fixations during tea-making research
Johansson and Flanagan (2009)
Hessels et al. (2018)
Johansson and Flanagan (2009)
o Object manipulation tasks comprise sequentially organized action phases that are generally delineated by distinct mechanical contact events representing task subgoals. To achieve these subgoals, the brain selects and implements action-phase controllers that use sensory predictions and afferent signals to tailor motor output in anticipation of requirements imposed by objects' physical properties.
o Crucial control operations are centred on events that mark transitions between action phases. At these events, the CNS both receives and makes predictions about sensory information from multiple sources. Mismatches between predicted and actual sensory outcomes can be used to quickly and flexibly launch corrective actions as required.
o Signals from tactile afferents provide rich information about both the timing and the physical nature of contact events. In addition, they encode information related to object properties, including the shape and texture of contacted surfaces and the frictional conditions between these surfaces and the skin.
o A central question is how tactile afferent information is encoded and processed by the brain for the rapid detection and analysis of contact events. Recent evidence suggests that the relative timing of spikes in ensembles of tactile afferents provides such information fast enough to account for the speed with which tactile signals are used in object manipulation tasks.
o Contact events in manipulation can also be represented in the visual and auditory modalities and this enables the brain to simultaneously evaluate sensory predictions in different modalities. Multimodal representations of subgoal events also provide an opportunity for the brain to learn and uphold sensorimotor correlations that can be exploited by action-phase controllers.
o A current challenge is to learn how the brain implements the control operations that support object manipulations, such as processes involved in detecting sensory mismatches, triggering corrective actions, and creating, recruiting and linking different action-phase controllers during task progression. The signal processing in somatosensory pathways for dynamic context-specific decoding of tactile afferent messages needs to be better understood, as does the role of the descending control of these pathways.
Hessels et al. (2018)
Eye movements have been extensively studied in a wide range of research fields. While new methods such as mobile eye tracking and eye tracking in virtual/augmented realities are emerging quickly, the eye-movement terminology has scarcely been revised. We assert that this may cause confusion about two of the main concepts: fixations and saccades. In this study, we assessed the definitions of fixations and saccades held in the eye-movement field, by surveying 124 eye-movement researchers. These eye-movement researchers held a variety of definitions of fixations and saccades, of which the breadth seems even wider than what is reported in the literature. Moreover, these definitions did not seem to be related to researcher background or experience. We urge researchers to make their definitions more explicit by specifying all the relevant components of the eye movement under investigation: (i) the oculomotor component: e.g. whether the eye moves slow or fast; (ii) the functional component: what purposes does the eye movement (or lack thereof) serve; (iii) the coordinate system used: relative to what does the eye move; (iv) the computational definition: how is the event represented in the eye-tracker signal. This should enable eye-movement researchers from different fields to have a discussion without misunderstandings.
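Component (iv) above, the computational definition, can be made concrete with a small example. The sketch below is one common style of definition (a simple velocity threshold), not the definition proposed by Hessels et al.; the function name, the 30 deg/s threshold and the sample data are all illustrative assumptions.

```python
from typing import List, Tuple


def classify_samples(gaze_deg: List[Tuple[float, float]],
                     sample_rate_hz: float,
                     velocity_threshold_deg_s: float = 30.0) -> List[str]:
    """Label each interval between consecutive gaze samples by angular speed.

    gaze_deg holds (x, y) gaze positions in degrees of visual angle.
    The threshold value is an assumption chosen for illustration only.
    """
    dt = 1.0 / sample_rate_hz  # time between samples in seconds
    labels = []
    for (x0, y0), (x1, y1) in zip(gaze_deg, gaze_deg[1:]):
        # Angular speed over this interval, in degrees per second
        speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt
        labels.append("saccade" if speed > velocity_threshold_deg_s else "fixation")
    return labels


# Made-up data: the gaze is nearly still, jumps to a new location, then is still again
samples = [(0.0, 0.0), (0.01, 0.0), (0.02, 0.01), (2.0, 1.0), (4.0, 2.0), (4.01, 2.0)]
labels = classify_samples(samples, sample_rate_hz=250.0)
print(labels)
```

Changing the threshold or the coordinate system (head-centred versus world-centred) changes which samples count as a fixation, which is why the authors ask for these choices to be stated explicitly.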
The recent study of overt attention during complex scene viewing has emphasized explaining gaze behavior in terms of image properties and image salience independently of the viewer's intentions and understanding of the scene. In this Opinion article, I outline an alternative approach proposing that gaze control in natural scenes can be characterized as the result of knowledge-driven prediction. This view provides a theoretical context for integrating and unifying many of the disparate phenomena observed in active scene viewing, offers the potential for integrating the behavioral study of gaze with the neurobiological study of eye movements, and provides a theoretical framework for bridging gaze control and other related areas of perception and cognition at both computational and neurobiological levels of analysis.
Playing cricket: how batsmen hit the ball (Land and McLeod, 2000)
· Eyes don’t track the ball all the time
- Batsmen move their gaze to where they predict the ball will bounce
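To give a feel for what "predicting the bounce point" involves, here is a minimal sketch using drag-free projectile motion. It is only a physics illustration, not a model from Land and McLeod (2000) and not a claim about how the brain computes the prediction; the function name and all numbers are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def predict_bounce(release_height_m: float,
                   horizontal_speed_m_s: float,
                   vertical_speed_m_s: float = 0.0):
    """Return (time to bounce in s, horizontal distance to bounce in m).

    Assumes no air resistance or spin. vertical_speed_m_s is positive upwards.
    Solves h + v*t - 0.5*G*t^2 = 0 for the positive root t.
    """
    t = (vertical_speed_m_s
         + math.sqrt(vertical_speed_m_s ** 2 + 2.0 * G * release_height_m)) / G
    return t, horizontal_speed_m_s * t


# Illustrative values: ball released 2 m above the ground at 30 m/s, travelling horizontally
time_s, distance_m = predict_bounce(2.0, 30.0)
print(f"bounce after {time_s:.2f} s, {distance_m:.1f} m from release")
```

With well under a second between release and bounce, there is little time to follow the ball continuously, which fits the note that the eyes jump ahead to the predicted bounce point.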
Playing cricket: how batsmen hit the ball (Land and McLeod, 2000) research
Hayhoe and Ballard (2005)
Hasanzadeh et al. (2018)
Hayhoe and Ballard (2005)
The classic experiments of Yarbus over 50 years ago revealed that saccadic eye movements reflect cognitive processes. But it is only recently that three separate advances have greatly expanded our understanding of the intricate role of eye movements in cognitive function. The first is the demonstration of the pervasive role of the task in guiding where and when to fixate. The second has been the recognition of the role of internal reward in guiding eye and body movements, revealed especially in neurophysiological studies. The third important advance has been the theoretical developments in the fields of reinforcement learning and graphic simulation. All of these advances are proving crucial for understanding how behavioral programs control the selection of visual information.
Hasanzadeh et al. (2018)
The risk of major occupational accidents involving tripping hazards is commonly underestimated with a large number of studies having been conducted to better understand variables that affect situation awareness: the ability to detect, perceive, and comprehend constantly evolving surroundings. An important property that affects situation awareness is the limited capacity of the attentional system. To maintain situation awareness while exposed to tripping hazards, a worker needs to obtain feedforward information about hazards, detect immediate tripping hazards, and visually scan surroundings for any potential environmental hazards. Despite the importance of situation awareness, its relationship with attention remains unknown in the construction industry. To fill this theoretical knowledge gap, this study examines differences in attentional allocation between workers with low and high situation awareness levels while exposed to tripping hazards in a real construction site. Participants were exposed to tripping hazards on a real jobsite while walking along a path in the presence of other workers. Situation awareness was measured using the situation awareness rating technique, and subjects’ eye movements were tracked as direct measures of attention via a wearable mobile eye tracker. Investigating the attentional distribution of subjects by examining fixation-count heat maps and scan paths revealed that as workers with higher situation awareness walked, they periodically looked down and scanned ahead to remain fully aware of the environment and its associated hazards. Furthermore, this study quantitatively compared the differences between the eye-tracking metrics of worker with different situation awareness levels (low versus high) using permutation simulation. 
The results of the statistical analysis indicate that subjects did not allocate their attention equally to all hazardous areas of interest, and these differences in attentional distribution were modulated by the workers’ level of situation awareness. This study advances theory by presenting one of the first attempts to use mobile eye-tracking technology to examine the role of cognitive processes (i.e., attention) in human error (i.e., failure to identify a hazard) and occupational accidents.
Events involved in an object related action as the building block of task- or goal-directed action sequences (Land, 2009)
Not only does the visual system locate and recognise objects, it also continuously guides actions in order to produce adaptive behavioural responses
Events involved in an object related action as the building block of task- or goal-directed action sequences (Land, 2009) research
Foulsham et al. (2011)
Lavoie et al. (2018)
Foulsham et al. (2011)
o How do people distribute their visual attention in the natural environment? We and our colleagues have usually addressed this question by showing pictures, photographs or videos of natural scenes under controlled conditions and recording participants’ eye movements as they view them. In the present study, we investigated whether people distribute their gaze in the same way when they are immersed and moving in the world compared to when they view video clips taken from the perspective of a walker. Participants wore a mobile eye tracker while walking to buy a coffee, a trip that required a short walk outdoors through the university campus. They subsequently watched first-person videos of the walk in the lab. Our results focused on where people directed their eyes and their head, what objects were gazed at and when attention-grabbing items were selected. Eye movements were more centralised in the real world, and locations around the horizon were selected with head movements. Other pedestrians, the path, and objects in the distance were looked at often in both the lab and the real world. However, there were some subtle differences in how and when these items were selected. For example, pedestrians close to the walker were fixated more often when viewed on video than in the real world. These results provide a crucial test of the relationship between real behaviour and eye movements measured in the lab.
o Gaze of walkers in real environment compared to people watching scene on video.
o Fixations biased to centre; walkers often engage in head-centred looking.
o Walkers look more often at the path than observers in the lab.
o People in the scene are fixated early, and rarely close-up in real walking.
Lavoie et al. (2018)
This study explores the role that vision plays in sequential object interactions. We used a head-mounted eye tracker and upper-limb motion capture to quantify visual behavior while participants performed two standardized functional tasks. By simultaneously recording eye and motion tracking, we precisely segmented participants' visual data using the movement data, yielding a consistent and highly functionally resolved data set of real-world object-interaction tasks. Our results show that participants spend nearly the full duration of a trial fixating on objects relevant to the task, little time fixating on their own hand when reaching toward an object, and slightly more time—although still very little—fixating on the object in their hand when transporting it. A consistent spatial and temporal pattern of fixations was found across participants. In brief, participants fixate an object to be picked up at least half a second before their hand arrives at the object and stay fixated on the object until they begin to transport it, at which point they shift their fixation directly to the drop-off location of the object, where they stay fixated until the object is successfully released. This pattern provides additional evidence of a common system for the integration of vision and object interaction in humans, and is consistent with theoretical frameworks hypothesizing the distribution of attention to future action targets as part of eye and hand-movement preparation. Our results thus aid the understanding of visual attention allocation during planning of object interactions both inside and outside the field of view.
prey catching behaviour in toads: Simplest hypothesis: a sensorimotor pathway for each action (Ewert, 1987; Carew, 2000)
• Each behavioural segment is mediated by a separate releasing mechanism (RM)
• Motivation can modulate each RM – lowers or raises threshold
- Toads have 4 distinct actions
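The releasing-mechanism hypothesis above can be sketched as a set of independent threshold units, one per action, with motivation shifting each threshold. This is a toy illustration only: the class, the linear threshold rule, the gain and the placeholder segment names are assumptions, not Ewert's model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReleasingMechanism:
    action: str            # placeholder name for one behavioural segment
    base_threshold: float  # stimulus strength needed at neutral motivation
    gain: float = 0.3      # assumed strength of the motivational effect

    def is_released(self, stimulus_strength: float, motivation: float) -> bool:
        """motivation in [-1, 1]: positive lowers the threshold, negative raises it."""
        threshold = self.base_threshold - self.gain * motivation
        return stimulus_strength >= threshold


# Four separate pathways, one per action (names are hypothetical placeholders)
mechanisms = [
    ReleasingMechanism("segment_1", 0.2),
    ReleasingMechanism("segment_2", 0.4),
    ReleasingMechanism("segment_3", 0.6),
    ReleasingMechanism("segment_4", 0.8),
]


def released_actions(stimulus_strength: float, motivation: float) -> List[str]:
    """List the actions whose releasing mechanism fires for this stimulus."""
    return [m.action for m in mechanisms
            if m.is_released(stimulus_strength, motivation)]


neutral = released_actions(0.5, motivation=0.0)
motivated = released_actions(0.5, motivation=0.5)
unmotivated = released_actions(0.5, motivation=-0.5)
print(neutral, motivated, unmotivated)
```

The same stimulus releases more actions in a motivated animal and fewer in an unmotivated one, while each pathway still works independently of the others.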
prey catching behaviour in toads: Simplest hypothesis: a sensorimotor pathway for each action (Ewert, 1987; Carew, 2000) research
Giese and Poggio (2003)
Pessoa et al. (2019)
Manzano et al. (2017)
Giese and Poggio (2003)
o Humans can recognize biological movements, such as walking, accurately and robustly. This review uses a neurophysiologically plausible and quantitative model as a tool for organizing and making sense of the available experimental data, despite its growing size and complexity.
o Most experimental results can be accounted for by simple neural mechanisms, under the two key assumptions that recognition is based on a hierarchical feedforward cortical architecture and learned prototypical patterns. Such prototypes might be stored in specific neurons in the visual system.
o The model shows that recognition of biological movements can be achieved with plausible neural mechanisms in a way that is quantitatively consistent with the experimental data on pattern selectivity, view dependence and robustness of recognition.
o The model comprises two parallel pathways, one corresponding to the dorsal pathway (specialized for the analysis of motion information) and one to the ventral pathway (specialized for the analysis of form information). In each pathway, neural feature detectors extract form or optic-flow features with increasing complexity along the hierarchy. The position and size invariance of the feature detectors also increases along the hierarchy. Experimental data and quantitative simulations indicate that the ventral and dorsal pathways are both needed for the recognition of normal biological movement stimuli, whereas the recognition of point-light stimuli seems mainly to depend on the dorsal pathway.
o The proposed architecture predicts the existence of neurons that can learn to respond selectively to new biological movement patterns. It also predicts that arbitrary complex movement patterns should be learnable, as long as they provide suitable stimulation of the mid- and low-level feature detectors of the two pathways.
o The model predicts the existence of neurons in the dorsal pathway that become selectively activated by complex optic-flow patterns that arise for biological movement patterns.
o It is demonstrated that attention and top–down influences are not required to solve the basic tasks of motion recognition. These factors may be necessary for more sophisticated motion recognition tasks. The model cannot account for such influences of attention and of different tasks.
o A number of open questions and predictions of the model are considered. The use of a quantitative model allows us to generate specific predictions and to show that a neurophysiologically consistent, learning-based, feedforward model can reproduce many key experimental results. Open questions include how information from the two pathways is integrated, and which neural mechanisms underlie sequence selectivity in both pathways.
Pessoa et al. (2019)
o Integration is a basic feature of the vertebrate brain needed to adapt to a changing world.
o This property is not restricted to few isolated brain centers, but resides in neuronal networks working together in a context-dependent manner.
o In different vertebrates, we identify shared large-scale connectional systems.
o There is a high degree of crosstalk and association between these systems at different levels, giving support to the notion that cognition cannot be separated from emotion and motivation.
Cognition is considered a hallmark of the primate brain that requires a high degree of signal integration, such as achieved in the prefrontal cortex. Moreover, it is often assumed that cognitive capabilities imply “superior” computational mechanisms compared to those involved in emotion or motivation. In contrast to these ideas, we review data on the neural architecture across vertebrates that support the concept that association and integration are basic features of the vertebrate brain, which are needed to successfully adapt to a changing world. This property is not restricted to a few isolated brain centers, but rather resides in neuronal networks working collectively in a context-dependent manner. In different vertebrates, we identify shared large-scale connectional systems involving the midbrain, hypothalamus, thalamus, basal ganglia, and amygdala. The high degree of crosstalk and association between these systems at different levels supports the notion that cognition, emotion, and motivation cannot be separated – all of them involve a high degree of signal integration.