Victor Ng-Thow-Hing received the MSc and PhD degrees in computer science in 1994 and 2001, respectively, from the Uni...
ACM SIGGRAPH / Eurographics, Symposium on Computer Animation. July 26 - 27, 2003. San Diego, California. In cooperation with ACM SIGGRAPH and Eurographics. ...
Robots that perform complex manipulation tasks must be able to generate strategies that make and break contact with the object. This requires reasoning in a motion space with a particular multi-modal structure, in which the state contains both a discrete mode (the contact state) and a continuous configuration (the robot and object poses). In this paper we address multi-modal motion planning in the common setting where the state is high-dimensional, and there are a continuous infinity of modes. We present a highly general algorithm, Random-MMP, that repeatedly attempts mode switches sampled at random. A major theoretical result is that Random-MMP is formally reliable and scalable, and its running time depends on certain properties of the multi-modal structure of the problem that are not explicitly dependent on dimensionality. We apply the planner to a manipulation task on the Honda humanoid robot, where the robot is asked to push an object to a desired location on a cluttered table, ...
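The abstract describes Random-MMP as repeatedly attempting randomly sampled mode switches. The following is a minimal sketch of that tree-growing loop, assuming placeholder callables (`sample_mode_switch`, `single_mode_plan`, `is_goal`) that a real planner would supply; it is not the paper's implementation.

```python
import random

class Node:
    """A tree node holding a discrete mode (contact state) and a continuous configuration."""
    def __init__(self, mode, config, parent=None):
        self.mode, self.config, self.parent = mode, config, parent

def random_mmp(start, is_goal, sample_mode_switch, single_mode_plan, max_iters=1000):
    """Skeleton of the Random-MMP loop: repeatedly pick a node in the tree,
    sample a mode switch at random, and try to reach it with a single-mode
    (fixed contact state) motion plan."""
    tree = [Node(*start)]
    for _ in range(max_iters):
        node = random.choice(tree)                                   # pick an expansion node
        new_mode, switch_config = sample_mode_switch(node.mode, node.config)
        path = single_mode_plan(node.mode, node.config, switch_config)
        if path is None:                                             # mode switch not reachable
            continue
        child = Node(new_mode, switch_config, parent=node)
        tree.append(child)
        if is_goal(child.mode, child.config):
            return child                                             # trace parents to recover the plan
    return None
```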
Studies of different robot features and behaviors are leading to the development of models for more effective peer-based interaction between robots and children.
[Figure 1: The main components of DANCE.]
...a joint development project by the authors to build a common environment where multiple controllers and object models could co-exist and interact with each other. The long-term goal is to reduce the startup overhead of future research projects in this area so that investigators can proceed directly to controller design and object model construction. The resulting system, named DANCE, ...
We present a human-robot interactive scenario consisting of a memory card game between Honda's humanoid robot ASIMO and a human player. The game features perception exclusively through ASIMO's on-board cameras and both reactive and proactive behaviors specific to different situational contexts in the memory game. ASIMO is able to build a dynamic environmental map of relevant objects in the game such as the table and card layout as well as understand activities from the player such as pointing at cards, flipping cards and removing them from the table. Our system architecture, called the Cognitive Map, treats the memory game as a multi-agent system, with modules acting independently and communicating with each other via messages through a shared blackboard system. The game behavior module can model game state and contextual information to make decisions based on different pattern recognition modules. Behavior is then sent through high-level command interfaces to be resolved into actual physical actions by the robot via a multi-modal communication module. The experience gained in modeling this interactive scenario will allow us to reuse the architecture to create new scenarios and explore new research directions in learning how to respond to new interactive situations.
The role of context in recognizing a person's affect is being increasingly studied. In particular, context arising from the presence of multi-modal information such as faces, speech and head pose has been used in recent studies to recognize facial expressions. In most approaches, the modalities are independently considered and the effect of one modality on the other, which we call inter-modal influence (e.g. speech or head pose modifying the facial appearance), is not modeled. In this paper, we describe a system that utilizes context from the presence of such inter-modal influences to recognize facial expressions. To do so, we use 2-D contextual masks which are activated within the facial expression recognition pipeline depending on the prevailing context. We also describe a framework called the Context Engine. The Context Engine offers a scalable mechanism for extending the current system to address additional modes of context that may arise during human-machine interactions. Results on standard data sets demonstrate the utility of modeling inter-modal contextual effects in recognizing facial expressions.
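As an illustration of the 2-D contextual mask idea, here is a toy sketch in which masks are weight maps over a normalized face image that are switched on when an inter-modal context is detected. The region coordinates, weights and context names are assumptions for illustration, not the paper's actual masks.

```python
import numpy as np

# Hypothetical 2-D contextual masks over a normalized 64x64 face image.
# When an inter-modal context (e.g. the person is speaking) is active, its mask
# down-weights facial regions whose appearance that context perturbs.
H, W = 64, 64
masks = {"speaking": np.ones((H, W)), "head_turned": np.ones((H, W))}
masks["speaking"][40:, :] = 0.2        # suppress the mouth region while talking
masks["head_turned"][:, :16] = 0.3     # suppress the partly occluded cheek

def apply_context(feature_map, active_contexts):
    """Gate a per-pixel expression feature map by every active contextual mask."""
    out = feature_map.copy()
    for ctx in active_contexts:
        out *= masks[ctx]
    return out

face_features = np.random.rand(H, W)              # stand-in for extracted appearance features
gated = apply_context(face_features, ["speaking"])
```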
This chapter seeks to promote a richer understanding of design choices in the development of effective human-robot interactions. Social robots often have human-like qualities, including appearance, behavior, and intelligence. These features elicit a social response from humans that provides distinctive ways to examine how humans interact with robots. Social robots consist of strong social components that enable humans to share knowledge and ideas and to develop "peer-like" relationships. The approach described in this chapter is user centered and employs social schemas: mental models that humans have of the world.
The goal of this chapter is to identify features and conditions that may provide the reader with useful sets of information that can be incorporated into the design of future human-robot interactions. To that end, the chapter presents a series of studies that reveal how specific design choices (i.e., speaking and listening features, social schema, and realism) can influence human-robot interactions. The chapter then reflects on other factors that are independent of robotic features (e.g., developmental differences, children's prior knowledge) but may still have an impact on behavioral outcomes.
Augmented reality (AR) in automobiles has the potential to significantly alter the driver's user experience. Prototypes developed in academia and industry demonstrate a range of applications from advanced driver assist systems to location-based information services. A user-centered process for creating and evaluating designs for AR displays in automobiles helps to explore what collaborative role AR should serve between the technologies of the automobile and the driver. In particular, we consider the nature of this role along three important perspectives: understanding human perception, understanding distraction and understanding human behavior. We argue that AR applications should focus solely on tasks that involve the immediate local driving environment and not secondary task spaces to minimize driver distraction. Consistent depth cues should be supported by the technology to aid proper distance judgement. Driving aids supporting situation awareness should be designed with knowledge of current and future states of road users, while focusing on specific problems. Designs must also take into account behavioral phenomena such as risk compensation, inattentional blindness and an over-reliance on augmented technology in driving decisions.
With the capability of fast, wireless communication, combined with cloud and location-based services, modern drivers can potentially access a wide variety of information about their automobile's environment. This paper presents a system for information query by the driver by using a simple pointing mechanism, combined with visual feedback in the form of a 3-D Head-up Display (3D-HUD). Because of its 3-D properties, the HUD can also be used for Augmented Reality (AR) as it allows physical elements in the driver's field of view to be annotated with computer graphics. The combination of simple natural user input tailored for the constraints of the driver with a see-thru 3D-HUD allows drivers to query information while minimizing visual and manual distraction.
Making left turns across oncoming traffic without a protected left-turn signal is a significant safety concern at intersections. In a left turn situation, the driver typically does not have the right of way and must determine when to initiate the turn maneuver safely. It has been reported that a driver's inability to correctly judge the velocity and time gap of the oncoming vehicles is a major cause of left turn crashes. Although the position and velocity of surrounding vehicles is available using camera and laser based vehicle detection and tracking, methods on how to effectively communicate such information to help the driver have been relatively underexplored. In this paper, we describe a left turn aid that displays a 3-second projected path of the oncoming vehicle in the driver's environment with a 3D Head-Up Display (3D-HUD). Utilizing the abilities of our 3D-HUD to show the projected path in Augmented Reality (AR) could help increase driver intuition and alleviate visual distraction as compared to other possible non-AR solutions. Through an iterative process utilizing early user feedback, the design of the left turn aid was refined to interfere less with the driver view and be more effective. A pilot study has been designed for a driving simulation environment and can be used to evaluate the potential of the proposed AR left turn aid in helping the driver be more cautious or efficient when turning left.
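The core quantity in the aid above is the 3-second projected path of the oncoming vehicle. A minimal worked sketch follows, assuming a constant-velocity prediction from tracked position and velocity (the paper's actual motion model and coordinate conventions are not specified here); the resulting points are what a HUD renderer could draw in the scene.

```python
import numpy as np

def projected_path(position, velocity, horizon_s=3.0, steps=30):
    """Constant-velocity prediction of an oncoming vehicle over `horizon_s`
    seconds, returned as a sequence of 2-D points in road coordinates.
    A real system would substitute the tracker's motion model."""
    t = np.linspace(0.0, horizon_s, steps)[:, None]           # (steps, 1)
    return np.asarray(position) + t * np.asarray(velocity)    # (steps, 2)

# Illustrative numbers: oncoming car 60 m ahead in the opposite lane, closing at 15 m/s.
path = projected_path(position=[60.0, 3.5], velocity=[-15.0, 0.0])
print(path[-1])   # predicted position 3 s ahead, 15 m from the ego vehicle
```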
Do people prefer gestures that are similar to their own? There is evidence that in conversation, people will tend to adopt the postures, gestures and mannerisms of their interaction partners [1]. This mirroring, sometimes called the “chameleon effect”, is associated with affiliation, rapport and liking. It may be that a useful way to build rapport in human-agent/robot interaction is to have the agent/robot perform gestures similar to the human. As a step towards that, this study explores if people prefer gestures similar to their own over gestures similar to those of other people. Participants were asked to evaluate a series of agent motions, some of which mimic their own gestures, and rate their preference. A second study first showed participants videos of their own gesturing to see if self-awareness would impact their preference. Different scenarios for soliciting gesture behavior were also explored. Evidence suggests people do have some preference for motions similar to their own, but self-awareness has no effect.
In-vehicle contextual augmented reality (I-CAR) has the potential to provide novel visual feedback to drivers for an enhanced driving experience. To enable I-CAR, we present a parametrized road trench model (RTM) for dynamically extracting display surfaces from a driver's point of view that is adaptable to constantly changing road curvature and intersections. We use computer vision algorithms to analyze and extract road features that are used to estimate the parameters of the RTM. GPS coordinates are used to quickly compute lighting parameters for shading and shadows. Novel driver-based applications that use the RTM are presented.
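To make the "parametrized road trench" idea concrete, here is an illustrative sketch under assumed parameters (lane width, constant curvature, virtual wall height); the paper's actual parametrization and its vision-based estimation are not reproduced. The sketch returns the corners of a wall-mounted display surface at a given distance along the road, which a renderer would then project into the HUD.

```python
from dataclasses import dataclass
import math

@dataclass
class RoadTrenchModel:
    """Illustrative parametrization: a 'trench' whose floor is the road and whose
    virtual walls flank the lane and serve as display surfaces."""
    lane_width: float      # metres
    curvature: float       # 1/metres, signed; 0 means a straight road
    wall_height: float     # metres

    def centerline(self, s):
        """Point (x forward, y left) at arc length s along a constant-curvature road."""
        if abs(self.curvature) < 1e-9:
            return (s, 0.0)
        r = 1.0 / self.curvature
        return (r * math.sin(s / r), r * (1.0 - math.cos(s / r)))

    def wall_quad(self, s, length=5.0, side=1):
        """Corners of a display surface on the left (+1) or right (-1) wall starting
        at arc length s. The lateral offset is applied on the y axis for simplicity;
        a full model would offset along the local road normal."""
        half = side * self.lane_width / 2.0
        (x0, y0), (x1, y1) = self.centerline(s), self.centerline(s + length)
        return [(x0, y0 + half, 0.0), (x1, y1 + half, 0.0),
                (x1, y1 + half, self.wall_height), (x0, y0 + half, self.wall_height)]

rtm = RoadTrenchModel(lane_width=3.7, curvature=0.002, wall_height=2.0)
print(rtm.wall_quad(s=20.0))
```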
This proxemics study examines whether the physical distance between robots and humans differs based on the following factors: 1) age: children vs. adults, 2) who initiates the approach: humans approaching the robot vs. robot approaching humans, 3) prompting: verbal invitation vs. non-verbal gesture (e.g., beckoning), and 4) informing: announcement vs. permission vs. nothing. Results showed that both verbal and non-verbal prompting had a significant influence on physical distance. Physiological data is also used to detect the appropriate timing of approach for a more natural and comfortable interaction.
Two studies examined different features of humanoid robots and their influence on children's affective behavior. The first study looked at interaction styles and general features of robots. The second study looked at how the robot's attention influences children's behavior and engagement. Through activities familiar to young children (e.g., table setting, storytelling), the first study found that a cooperative interaction style elicited more oculesic behavior and social engagement. The second study found that quality of attention, type of attention, and length of interaction influence affective behavior and engagement. For quality of attention, Wizard-of-Oz (WoZ) control elicited the most affective behavior, but automatic attention worked as well as WoZ when the interaction was short. For type of attention, moving from nonverbal to verbal attention increased children's oculesic behavior, utterances, and physiological response. Affective interactions did not seem to depend on a single mechanism, but on a well-chosen confluence of technical features.
The Learning with Kids (LwK) project is an ongoing collaboration with the long-term goal of building a humanoid robot that is capable of being a peer-based learning partner for children. Experiments for the first three years have recently been completed, and complete analysis of all data is currently in progress. We review the studies we have performed so far, the tools we developed to conduct these studies, and the adjustments we made as we learned along the way.
Extended human-robot interactions possess unique aspects which are not exhibited in short-term interactions spanning a few minutes or extremely long-term interactions spanning days. In order to comprehensively monitor such interactions, we need special recording mechanisms which ensure the interaction is captured at multiple spatio-temporal scales, viewpoints and modalities (audio, video, physiological). To minimize cognitive burden, we need tools which can automate the process of annotating and analyzing the resulting data. In addition, we also require these tools to be able to provide a unified, multi-scale view of the data and help discover patterns in the interaction process. In this paper, we describe recording and analysis tools which are helping us analyze extended human-robot interactions with children as subjects. We also provide some experimental results which highlight the utility of such tools.
Humanoid robots consist of biologically inspired features, human-like appearance, and intelligent behavior that naturally elicit social responses. Complex interactions are now possible, where children interact with and learn from robots. A pilot study attempted to determine which features in robots led to changes in learning and behavior. Three common learning styles, lecture, cooperative, and self-directed, were implemented on ASIMO to see if children can learn from robots. General features such as a monotone robot-like voice and a human-like voice were compared. Thirty-seven children between the ages of 4 and 10 years participated in the study. Each child engaged in a table-setting task with ASIMO that exhibited different learning styles and general features. Children answered questions in relation to the table-setting task with a learning measure. Promising evidence shows that learning styles and general features matter, especially for younger children.
We present a model that is capable of synchronizing expressive gestures with speech. The model, implemented on a Honda humanoid robot, can generate a full range of gesture types, such as emblems, iconic and metaphoric gestures, deictic pointing and beat gestures. Arbitrary input text is analyzed with a part-of-speech tagger and a text-to-speech engine for timing information of spoken words. In addition, style tags can optionally be added to specify the level of excitement or topic changes. The text, combined with any tags, is then processed by several grammars, one for each gesture type, to produce several candidate gestures for each word of the text. The model then selects probabilistically amongst the gesture types based on the desired degree of expressivity. Once a gesture type is selected, it is mapped to a particular gesture template, consisting of trajectory curves that define the gesture. Speech timing patterns and style parameters are used to modulate the shape of the curve before it is sent to the whole-body control system on the robot. An evaluation of the model's parameters was performed, demonstrating the ability of observers to differentiate varying levels of expressiveness, excitement and speech synchronization. Modification of gesture speed for trajectory tracking found that positive associations like happiness and excitement accompanied faster speeds, while negative associations like sadness or tiredness occurred at slower speeds.
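The probabilistic selection step can be sketched as weighted sampling over the candidate gesture types produced by the per-type grammars. The weights below are illustrative assumptions (the abstract does not give the actual distribution), chosen only to show how expressivity can shift the selection.

```python
import random

# Illustrative selection weights: how strongly each gesture type is favoured
# as the requested expressivity (0..1) rises. These numbers are placeholders.
BASE_WEIGHT = {"beat": 1.0, "deictic": 0.8, "iconic": 0.6, "metaphoric": 0.5, "emblem": 0.4}
EXPRESSIVITY_GAIN = {"beat": 0.2, "deictic": 0.6, "iconic": 1.0, "metaphoric": 1.2, "emblem": 1.0}

def select_gesture(candidates, expressivity):
    """Pick one candidate gesture type for a word, probabilistically, with more
    expressive types favoured as `expressivity` grows."""
    weights = [BASE_WEIGHT[c] + expressivity * EXPRESSIVITY_GAIN[c] for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# Candidates produced by the per-type grammars for one word of the input text.
print(select_gesture(["beat", "iconic", "deictic"], expressivity=0.9))
```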
The panoramic attention model has an idle mode driven by an idle-motion policy which creates a superficial impression that the humanoid is idly looking about its surroundings. Internally, in the course of these idle motions, the humanoid's cameras span the entire panorama and register incidental observations, such as the identities of people it comes across or objects present in the gaze directions it visits. Thus, the idle-motion behavior imparts the humanoid with a human-like liveliness while it concurrently notes details of its surroundings. The associated information may later be accessed when needed for a future task involving references to such entities, i.e., the humanoid can immediately attend to the task, bypassing the preparatory search for them. The active mode of the panoramic attention model is triggered by top-level tasks and triggers; in this mode, it responds in a task-specific manner (e.g., tracking a known person). Another significant contribution of the model is the notion of cognitive panoramic habituation. Entities registered in the panorama do not enjoy a permanent existence. Instead, their lifetimes are regulated by entity-specific persistence models (e.g., isolated objects tend to be more persistent than people, who are likely to move about). This habituation mechanism enables the memories of entities in the panorama to fade away, thereby creating a human-like attentional effect. The memories associated with a panoramically registered entity are refreshed when the entity is referenced by top-down commands. With the panoramic attention model, out-of-scene speakers can also be handled. The humanoid robot employed (Honda, 2000) uses a 2-microphone array which records audio from the environment. The audio signals are processed to perform localization, thus determining which direction speech is coming from, as well as source-specific attributes such as pitch and amplitude. In particular, the localization information is mapped onto the panoramic framework. Subsequent sections describe how audio information is utilized.
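A minimal sketch of the habituation idea follows: each registered entity carries an entity-specific persistence model, here an exponential decay whose half-life depends on whether it is an object or a person, refreshed when the entity is re-observed or referenced. The half-life values and class structure are assumptions for illustration; the chapter only states that objects tend to persist longer than people.

```python
import time

# Illustrative persistence half-lives in seconds (placeholder values).
HALF_LIFE = {"object": 300.0, "person": 60.0}

class PanoramicEntity:
    """An entity registered in the panorama whose memory fades unless refreshed."""
    def __init__(self, kind, pan_angle):
        self.kind, self.pan_angle = kind, pan_angle
        self.last_seen = time.time()

    def refresh(self):
        """Called when the entity is re-observed or referenced by a top-down command."""
        self.last_seen = time.time()

    def activation(self, now=None):
        """Exponentially decaying memory strength; near 0 means effectively forgotten."""
        now = time.time() if now is None else now
        return 0.5 ** ((now - self.last_seen) / HALF_LIFE[self.kind])

cup = PanoramicEntity("object", pan_angle=30.0)
visitor = PanoramicEntity("person", pan_angle=-45.0)
# Entities whose activation drops below a threshold would be dropped from the panorama.
print(cup.activation(), visitor.activation())
```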
In this paper, we present a novel three-layer model of panoramic attention for our humanoid robot. In contrast to similar architectures employing coarse discretizations of the panoramic field, saliencies are maintained only for cognitively prominent entities (e.g., faces). In the absence of attention triggers, an idle policy makes the humanoid span the visual field of the panorama, imparting a human-like idle gaze while simultaneously registering attention-worthy entities. We also describe a model of cognitive panoramic habituation which maintains entity-specific persistence models, thus imparting lifetimes to entities registered across the panorama. This mechanism enables the memories of entities in the panorama to fade away, creating a human-like attentional effect. We describe scenarios demonstrating the aforementioned aspects. In addition, we present experimental results which demonstrate how the cognitive filtering aspect of our model reduces processing time and false-positive rates for standard entity-related modules such as face detection and recognition.
The Cognitive Map robot architecture is used to build multi-agent systems where components can communicate with each other using a publish-subscribe mechanism of message passing. Messages can be sent as discrete events or via continuous data streams. Our approach of isolating the component interface within a single API layer allows easy conversion of legacy code into components within our system. Our components can be divided into four main roles: perception, knowledge/state representation, decision-making and expression. Interaction, Task Matrix and Multi-Modal Communication are special components described for facilitating human-robot interaction with Honda's humanoid robot ASIMO. By focusing on the design of these components and the abstraction layers between them, our architecture can be dynamically reconfigured to handle different applications with minimal changes to the entire system.
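A minimal publish-subscribe blackboard in the spirit of the architecture described above can be sketched as follows; class and message names are assumptions for illustration, not Honda's actual implementation or API.

```python
from collections import defaultdict

class CognitiveMapBus:
    """Toy publish-subscribe blackboard: modules subscribe to message types and
    receive every message published under that type, while the blackboard keeps
    the last value so other modules can poll shared state."""
    def __init__(self):
        self.subscribers = defaultdict(list)
        self.blackboard = {}

    def subscribe(self, msg_type, callback):
        self.subscribers[msg_type].append(callback)

    def publish(self, msg_type, payload):
        self.blackboard[msg_type] = payload
        for cb in self.subscribers[msg_type]:
            cb(payload)

bus = CognitiveMapBus()
bus.subscribe("object.detected", lambda msg: print("decision module sees:", msg))
bus.publish("object.detected", {"label": "card", "pose": (0.4, 0.1, 0.0)})
```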
Designing distributed architectures for implementing tasks on humanoid robots is a challenge, both in theory and practice. Although important functionality resides within the component modules of the system, the performance of the middleware, the software for mediating information between modules, is critical to overall system performance. We have designed an architecture serving various functional roles and information exchange within a distributed system, using three different communication subsystems: the Cognitive Map (CogMap), Distributed Operation via Discrete Events (DiODE), and Multimodal Communication (MC). The CogMap is implemented in Psyclone, a framework for constructing large AI systems, and allows sharing and transformation of information streams dynamically between modules. DiODE provides a direct connection between two modules, while MC implements a multi-modal server that streams raw sensory data to requesting external (off-board) perceptual modules. These have been implemented and tested on the Honda Motor Corporation's ASIMO humanoid robot. To identify trade-offs and understand performance limitations in robots with distributed system architectures, we performed a variety of tests on these subsystems under different network conditions, operating systems and computational loads. The results indicate that delays due to our middleware are negligible compared to the computational costs associated with actual processing within the modules, provided the network has high enough bandwidth. The Cognitive Map appears to scale to an increasing number of connected modules with negligible degradation of message delays.
Capturing pose from observation can be an intuitive interface for humanoid robots. In this paper, a method is presented for estimating human pose from a sequence of images taken by a single camera. The method is based on a machine learning technique that partitions the human body into a number of clusters. Body parts are tracked over the image sequence while satisfying body constraints. Active sensing hardware is used to capture a stream of depth images at video rates, which are subsequently analyzed for pose extraction. Experimental results are shown to validate our approach, and its characteristics are discussed.
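To illustrate the idea of partitioning body measurements into part clusters, here is a toy k-means sketch over synthetic 3-D depth points; it is only a stand-in for the learning technique used in the paper, and the synthetic point clouds are invented for the example.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: partition 3-D points into k clusters (toy stand-in for the
    body-part clustering described above)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None, :] - centers) ** 2).sum(-1), axis=1)
        new_centers = []
        for i in range(k):
            members = points[labels == i]
            new_centers.append(members.mean(axis=0) if len(members) else centers[i])
        centers = np.array(new_centers)
    return labels, centers

# Synthetic depth points roughly forming a torso and two arms.
rng = np.random.default_rng(1)
torso = rng.normal([0.0, 0.0, 2.0], 0.10, (200, 3))
left  = rng.normal([-0.4, 0.1, 2.0], 0.05, (80, 3))
right = rng.normal([0.4, 0.1, 2.0], 0.05, (80, 3))
labels, centers = kmeans(np.vstack([torso, left, right]), k=3)
print(centers)
```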
This paper presents a motion planner that enables a humanoid robot to push an object on a flat surface. The robot's motion is divided into distinct walking, reaching, and pushing modes. A discrete change of mode can be achieved with a continuous single-mode motion that satisfies mode-specific constraints (e.g., dynamics, kinematic limits, obstacle avoidance). Existing techniques can plan well in single modes, but choosing the right mode transitions is difficult. Search-based methods are vastly inefficient due to over-exploration of similar modes. Our new method, Random-MMP, randomly samples mode transitions to distribute a sparse number of modes across configuration space. Results are presented in simulation and on the Honda ASIMO robot.
Researchers and engineers have used primitive actions to facilitate programming of tasks since the days of Shakey [1]. Task-level programming, which requires the user to specify only subgoals of a task to be accomplished, depends on such a set of primitive task programs to perform these subgoals. Past research in this area has used the commands from robot programming languages as the vocabulary of primitive tasks for robotic manipulators. We propose drawing from work measurement systems to construct the vocabulary of primitive task programs. We describe one such work measurement system, present several primitive task programs for humanoid robots inspired by this system, and show how these primitive programs can be used to construct complex behaviors.
Index Terms— robot programming, task-level programming, humanoid robots
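As a sketch of how primitive task programs can be composed into a complex behavior, consider the following; the primitive names (reach, grasp, move, release) echo typical work-measurement elements but are assumptions here, and on a real robot each primitive would call the motion system rather than print.

```python
# Hypothetical primitive task programs; placeholders for calls into the robot's motion system.
def reach(obj):
    print(f"reach toward {obj}")

def grasp(obj):
    print(f"grasp {obj}")

def move(obj, to):
    print(f"move {obj} to {to}")

def release(obj):
    print(f"release {obj}")

def pick_and_place(obj, destination):
    """A complex behavior expressed as a sequence of primitive task programs."""
    reach(obj)
    grasp(obj)
    move(obj, destination)
    release(obj)

pick_and_place("cup", "tray")
```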
Many humanoid robots like ASIMO are built to potentially perform more than one type of task. However, the need to maintain a consistent physical appearance of the robot restricts the installation of additional sensors or appendages that would alter its visual identity. Limited battery power for free-moving locomotive robots places temporal and spatial complexity limits on the algorithms we can deploy on the robot. With these conditions in mind, we have developed a distributed robot architecture that combines onboard functionality with external system modules to perform tasks involving interaction with the environment. An information model called the Cognitive Map organizes output produced by multiple perceptual modules and presents a common abstraction interface for other modules to access the information. For the planning and generation of motion on the robot, the Task Matrix embodies a task abstraction model that maps a high-level task description into its primitive motions realizable on the robot. Our architecture supports different control paradigms and information models that can be tailored for specific tasks. We demonstrate environmental tasks we implemented with our system, such as pointing at moving objects and pushing an object around a table, in simulation and on the actual ASIMO robot.
The successful acquisition and organization of a large number of skills for humanoid robots can be facilitated with a collection of performable tasks organized in a task matrix. Tasks in the matrix can utilize particular preconditions and inconditions to enable execution, motion trajectories to specify desired movement, and references to other tasks to perform subtasks. Interaction between the matrix and external modules such as goal planners is achieved via a high-level interface that categorizes a task using its semantics and execution parameters, allowing queries on the matrix to be performed using different selection criteria. Performable tasks are stored in an XML-based file format that can be readily edited and processed by other applications. In its current implementation, the matrix is populated with sets of primitive tasks (e.g., reaching, grasping, arm-waving) and macro tasks that reference multiple primitive tasks (Pick-and-place and Facing-and-waving).
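A small sketch of a task-matrix-like structure follows, assuming an illustrative schema (the actual XML format and field names are not given in the abstract): preconditions gate execution, inconditions are meant to hold throughout execution, subtasks reference other entries by name, and queries select tasks by semantic category.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Task:
    """One entry in an illustrative task collection."""
    name: str
    category: str
    preconditions: List[Callable[[dict], bool]] = field(default_factory=list)
    inconditions: List[Callable[[dict], bool]] = field(default_factory=list)
    subtasks: List[str] = field(default_factory=list)

class TaskMatrix:
    def __init__(self):
        self.tasks: Dict[str, Task] = {}

    def add(self, task: Task):
        self.tasks[task.name] = task

    def query(self, category: str) -> List[str]:
        """Select tasks by semantic category, as an external goal planner might."""
        return [t.name for t in self.tasks.values() if t.category == category]

    def executable(self, name: str, world_state: dict) -> bool:
        return all(p(world_state) for p in self.tasks[name].preconditions)

matrix = TaskMatrix()
matrix.add(Task("reach", "primitive", preconditions=[lambda s: s["target_visible"]]))
matrix.add(Task("grasp", "primitive"))
matrix.add(Task("pick-and-place", "macro", subtasks=["reach", "grasp"]))
print(matrix.query("primitive"), matrix.executable("reach", {"target_visible": True}))
```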
Despite several successful humanoid robot projects from both industry and academia, generic motion interfaces for higher-level applications are still absent. Direct robot driver access proves to be either very difficult due to the complexity of humanoid robots, very unstable due to constant robot hardware upgrade and re-design, or inaccessible due to proprietary software and hardware. Motion interfaces do exist, but these are either hardware-specific designs, or generic interfaces that support very simple robots (non-humanoids). Thus, this paper introduces RoboTalk, a new motion interface for controlling robots. From the ground up our design model considers three factors: mechanism-independence to abstract the hardware from higher-level applications, a versatile network support mechanism to enable both remote and local motion control, and an easy-to-manage driver interface to facilitate the incorporation of features by hardware developers. The interface is based on a motion specification that supports a wide range of robotic mechanisms, from mobile bases such as a Pioneer 2 to humanoid robots. The specification allows us to construct interfaces from basic blocks, such as wheeled bases, robot arms and legs. We have tested and implemented our approach on the Honda ASIMO robot and a Pioneer 2 mobile robot.
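The "interfaces from basic blocks" idea can be illustrated with the following sketch, in which a robot interface is assembled from independent blocks (arms, legs, wheeled bases) that expose a uniform motion call. The block and method names are assumptions for illustration, not the RoboTalk wire protocol or API.

```python
class MotionBlock:
    """A basic building block (arm, leg, wheeled base) exposing a uniform motion call."""
    def __init__(self, name, dof):
        self.name, self.dof = name, dof

    def move(self, targets):
        assert len(targets) == self.dof, f"{self.name} expects {self.dof} values"
        print(f"{self.name}: commanding joints to {targets}")

class RobotInterface:
    """A mechanism-independent interface assembled from blocks, so higher-level
    applications address 'left_arm' or 'base' without knowing the hardware."""
    def __init__(self, **blocks):
        self.blocks = blocks

    def move(self, block_name, targets):
        self.blocks[block_name].move(targets)

# A humanoid and a wheeled robot built from the same block vocabulary.
humanoid = RobotInterface(left_arm=MotionBlock("left_arm", 5), right_arm=MotionBlock("right_arm", 5))
pioneer  = RobotInterface(base=MotionBlock("base", 2))
humanoid.move("left_arm", [0.1, 0.2, 0.0, -0.3, 0.5])
pioneer.move("base", [0.4, 0.0])
```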
We developed a method based on interactive B-spline solids for estimating and visualizing biomechanically important parameters for animal body segments. Although the method is most useful for assessing the importance of unknowns in extinct animals, such as body contours, muscle bulk, or inertial parameters, it is also useful for non-invasive measurement of segmental dimensions in extant animals. Points measured directly from bodies or skeletons are digitized and visualized on a computer, and then a B-spline solid is fitted to enclose these points, allowing quantification of segment dimensions. The method is computationally fast enough so that software implementations can interactively deform the shape of body segments (by warping the solid) or adjust the shape quantitatively (e.g., expanding the solid boundary by some percentage or a specific distance beyond measured skeletal coordinates). As the shape changes, the resulting changes in segment mass, center of mass (CM), and moments of inertia can be recomputed immediately. Volumes of reduced or increased density can be embedded to represent lungs, bones, or other structures within the body. The method was validated by reconstructing an ostrich body from a fleshed and defleshed carcass and comparing the estimated dimensions to empirically measured values from the original carcass. We then used the method to calculate the segmental masses, centers of mass, and moments of inertia for an adult Tyrannosaurus rex, with measurements taken directly from a complete skeleton. We compare these results to other estimates, using the model to compute the sensitivities of unknown parameter values based upon 30 different combinations of trunk, lung and air sac, and hindlimb dimensions. The conclusion that T. rex was not an exceptionally fast runner remains strongly supported by our models—the main area of ambiguity for estimating running ability seems to be estimating fascicle lengths, not body dimensions. Additionally, the craniad position of the CM in all of our models reinforces the notion that T. rex did not stand or move with extremely columnar, elephantine limbs. It required some flexion in the limbs to stand still, but how much flexion depends directly on where its CM is assumed to lie. Finally we used our model to test an unsolved problem in dinosaur biomechanics: how fast a huge biped like T. rex could turn. Depending on the assumptions, our whole body model integrated with a musculoskeletal model estimates that turning 45° on one leg could be achieved slowly, in about 1–2 s.
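The step of recomputing mass, centre of mass and moments of inertia as the shape changes can be illustrated with a Monte-Carlo sketch: sample points inside a bounding box, keep those inside the solid, and accumulate mass properties. The ellipsoid membership test, density and dimensions below are placeholders standing in for a fitted B-spline solid; this is not the paper's actual computation.

```python
import numpy as np

def mass_properties(inside, bounds, density=1000.0, samples=200_000, seed=0):
    """Monte-Carlo estimate of mass, centre of mass and inertia tensor for a solid
    described only by an `inside(points) -> bool mask` membership test."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    pts = rng.uniform(lo, hi, size=(samples, 3))
    occupied = pts[inside(pts)]
    volume = np.prod(hi - lo) * len(occupied) / samples
    mass = density * volume
    com = occupied.mean(axis=0)
    r = occupied - com
    dm = mass / len(occupied)                      # mass carried by each sample point
    Ixx = dm * np.sum(r[:, 1]**2 + r[:, 2]**2)
    Iyy = dm * np.sum(r[:, 0]**2 + r[:, 2]**2)
    Izz = dm * np.sum(r[:, 0]**2 + r[:, 1]**2)
    Ixy = -dm * np.sum(r[:, 0] * r[:, 1])
    Ixz = -dm * np.sum(r[:, 0] * r[:, 2])
    Iyz = -dm * np.sum(r[:, 1] * r[:, 2])
    inertia = np.array([[Ixx, Ixy, Ixz], [Ixy, Iyy, Iyz], [Ixz, Iyz, Izz]])
    return mass, com, inertia

def ellipsoid(p):
    """Placeholder body segment: ellipsoid with 1.0 x 0.5 x 0.5 m semi-axes."""
    return (p[:, 0] / 1.0)**2 + (p[:, 1] / 0.5)**2 + (p[:, 2] / 0.5)**2 <= 1.0

mass, com, inertia = mass_properties(ellipsoid, bounds=([-1, -0.5, -0.5], [1, 0.5, 0.5]))
print(round(mass, 1), com.round(3))
```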
We introduce the Dynamic Animation and Control Environment (DANCE) as a publicly available simulation platform for research and teaching. DANCE is an open and extensible simulation framework and rapid prototyping environment for computer animation. The main focus of the DANCE platform is the development of physically-based controllers for articulated figures. In this paper we (a) present the architecture and potential applications of DANCE as a research tool, and (b) discuss lessons learned in developing a large framework for animation.
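A minimal sketch of the controller-plug-in idea, assuming an invented interface (the real DANCE API is C++ and differs): controllers registered with a simulation loop each contribute actuation every step, which is what lets multiple controllers and object models co-exist.

```python
class Controller:
    """Illustrative plug-in interface: compute actuation torques from state."""
    def compute_torques(self, state, dt):
        raise NotImplementedError

class PDBalanceController(Controller):
    def __init__(self, target, kp=200.0, kd=20.0):
        self.target, self.kp, self.kd = target, kp, kd

    def compute_torques(self, state, dt):
        q, qd = state["q"], state["qd"]
        return [self.kp * (t - x) - self.kd * v for t, x, v in zip(self.target, q, qd)]

def simulate(controllers, state, steps=3, dt=0.01):
    """Toy loop: every registered controller contributes torques each step;
    a physics engine would integrate the dynamics here."""
    for _ in range(steps):
        torques = [0.0] * len(state["q"])
        for c in controllers:
            torques = [a + b for a, b in zip(torques, c.compute_torques(state, dt))]
        print(torques)

simulate([PDBalanceController(target=[0.0, 0.1])], {"q": [0.05, 0.0], "qd": [0.0, 0.0]})
```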
The characters in modern video games continue to be updated with improving surface geometry and real-time rendering effects with each incarnation of graphics hardware. In contrast, the standard joint hierarchy used to animate these same characters has changed little since it first started appearing in graphics applications in the 1980s. Although simple to implement, it compromises a great deal of realism in joint articulation for human characters. Several biomechanical enhancements are presented that can capture more realistic behavior of joints in articulated figures. These new joints are capable of handling non-orthogonal, non-intersecting axes of rotation and changing joint centers that are often found in the kinematics of real anatomical joints. Coordinated movement and dependencies among several joints are realized. Although the joint behaviour may appear complex, the simplicity of controls for the animator is retained by providing a small set of intuitive handles. The animator is also prevented from putting the skeleton into an infeasible pose. We illustrate these concepts with detailed and realistic human spine and shoulder models exhibiting real-time performance and simple controls for the animator.
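A worked example of the basic operation behind non-intersecting axes and moving joint centres is a rotation about an arbitrary axis that does not pass through the joint origin, built as translate-rotate-translate. The axis direction, offset point and angle below are illustrative values, not measurements from the paper's spine or shoulder models.

```python
import numpy as np

def rotation_about_offset_axis(axis, point, angle):
    """4x4 transform rotating by `angle` (radians) about a unit `axis` that passes
    through `point` rather than the joint origin (Rodrigues rotation, then the
    translate-rotate-translate terms folded into the last column)."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    x, y, z = axis
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c + x*x*(1 - c),     x*y*(1 - c) - z*s,  x*z*(1 - c) + y*s],
                  [y*x*(1 - c) + z*s,   c + y*y*(1 - c),    y*z*(1 - c) - x*s],
                  [z*x*(1 - c) - y*s,   z*y*(1 - c) + x*s,  c + z*z*(1 - c)]])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(point) - R @ np.asarray(point)   # p' = R(p - point) + point
    return T

# A clavicle-like axis tilted off the coordinate axes, passing 3 cm from the origin.
M = rotation_about_offset_axis(axis=[0.2, 0.9, 0.4], point=[0.03, 0.0, 0.0], angle=np.radians(20))
print(M @ np.array([0.1, 0.0, 0.0, 1.0]))   # a point on the bone after rotation
```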


The challenge of introducing augmented reality to head-up displays for automobiles requires balancing the visual, immersive richness this medium provides with the need for the driver to stay focused on the primary task of driving. This session explores how to solve these problems by combining design methodologies with technological research. Before field testing ideas in actual cars, high-fidelity prototypes with driving simulators are utilized with an actual windshield head-up display to visualize the augmented graphics. UI Composer is leveraged with proprietary software to engage designers in the prototyping process.
The presentation will first introduce the challenges of building systems capable of human-robot and human-machine interaction and motivate the need for modular, parallel systems that can communicate and interact with each other. I will describe my journey designing the MOVE-IT system and how this design evolved from earlier system frameworks I worked on to become increasingly decentralized in architecture. Not only does this allow rapid prototyping of ideas, but computational performance is very close to that of the final production systems required. We will show the main innovative features of our architecture by going through how they come into play in various use cases. The system is interesting to Qt users because we leverage the power of the signal and slot mechanism to create multi-modal interaction, as well as the ability to interconnect modules dynamically at runtime with scripting. We also use the model-view-controller design pattern to create seamless dashboard interfaces which remove much of the clutter associated with window elements in a traditional GUI. OpenGL is used to create 3-D interfaces mixed with 2-D elements to create novel applications such as a robot capable of interacting with kids and interactive head-up displays for augmented reality.
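For readers unfamiliar with the signal-and-slot mechanism the talk credits with enabling multi-modal interaction and runtime interconnection, here is a minimal Python analogue; it is illustrative only and is not the Qt API or the MOVE-IT code, and the signal name is invented.

```python
class Signal:
    """Tiny signal/slot analogue: slots (callables) connect at runtime and every
    emit fans the arguments out to them, which is what allows modules to be
    interconnected dynamically."""
    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def disconnect(self, slot):
        self._slots.remove(slot)

    def emit(self, *args, **kwargs):
        for slot in list(self._slots):
            slot(*args, **kwargs)

# Wiring a speech-recognition module to two consumers at runtime.
speech_recognized = Signal()
speech_recognized.connect(lambda text: print("dialogue manager got:", text))
speech_recognized.connect(lambda text: print("logger got:", text))
speech_recognized.emit("turn left at the next intersection")
```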
The role of context in recognizing a person's affect is being increasingly studied. In particular, context arising from the presence of multi-modal information such as faces, speech and head pose has been used in recent studies to recognize facial expressions. In most approaches, the modalities are independently considered and the effect of one modality on the other, which we call inter-modal influence (e.g. speech or head pose modifying the facial appearance) is not modeled. In this paper, we describe a system that utilizes context from the presence of such inter-modal influences to recognize facial expressions. To do so, we use 2-D contextual masks which are activated within the facial expression recognition pipeline depending on the prevailing context. We also describe a framework called the Context Engine. The Context Engine offers a scalable mechanism for extending the current system to address additional modes of context that may arise during human-machine interactions. Results on standard data sets demonstrate the utility of modeling inter-modal contextual effects in recognizing facial expressions.
More Info: Sarvadevabhatla, Ravi Kiran; Benovoy, Mitchel; Musallam, Sam; and Ng-Thow-Hing, Victor. Adaptive facial expression recognition using inter-modal top-down context.
We present a general joint component framework model that is capable of exhibiting complex behavior of joints in articulated figures. The joints are capable of handling non-orthogonal, non-intersecting axes of rotation and changing joint centers that are often found in the kinematics of real anatomical joints. The adjustment of joint articulation is done with a relatively small set of intuitive parameters compared to the number of articulations in the motions they parametrize. This is done by making various linear and nonlinear joint dependencies implicit within our framework. An animator is restricted from putting the skeleton in an infeasible pose. We have used our joint framework model to successfully model highly-articulated complex joints such as the human spine and shoulder.
The relationship between the design elements of form and function is fundamental in producing new insights and understanding of the objects we encounter every day. Of particular interest is the study of human and animal anatomy, and how the elegance of shape and form we take for granted must simultaneously serve a practical role to locomote the body and perform other essential tasks for survival. We demonstrate how an integrated mathematical model, the B-spline solid, can be used to successfully capture geometric aspects of musculotendons as well as their physical characteristics. The B-spline solid is flexible enough to specify large insertion and origin attachment areas for musculotendons to the skeleton. Furthermore, it is scalable to facilitate detailed three-dimensional fibre reconstruction of internal muscle architecture. From these geometric reconstructions, we can embed different physical models to simulate phenomena such as volume preservation, active muscle contraction, and contact collisions. A software framework is developed to allow these musculotendon models to co-exist with other anatomic tissues, such as bones, ligaments, fat, and skin. The construction of these musculotendon models and the existence of a software environment for their utilization in different scenarios promises to enable detailed studies of the interdependencies in the body design of humans and other animals.
We argue that B-spline solids are effective primitives for the animation of physically-based deformable objects. After reviewing the mathematical formulation of B-spline solids, we describe how to quickly display and modulate their shapes. We apply our ideas to muscle modelling and provide techniques for initial shape definition and subsequent shape deformation. Data-fitting techniques are developed to build muscles from profile curves or from contour data taken from medical images. By applying a spring-mass model to the resulting B-spline solid, we have transformed a static model to a deformable one. The 3-D parameterization of the solid allows us to model microstructures within the solids such as fibre bundles in a muscle. B-spline solids are powerful and versatile deformable shape primitives that can be used in practical settings, such as the building-blocks of a muscle-based modeller and animation system for anatomical design.
Physically-based animation (PBA) is increasingly used to generate realistic motions in articulated figures (AFs). Three major strategies for motion control in PBA are local optimization, global optimization and tailored controller techniques. In local optimization, spacetime constraints are compatible with keyframing systems, but controller synthesis methods offer better robustness and reuse of motions. Global optimization strategies, such as generate-and-test and genetic programming, explore various search spaces to provide several different motion control solutions. Tailored controllers focus on specialized motions like walking, but offer general design principles that can be applied to many motions. The animator's involvement in motion specification, the scalability of the method, robustness and reuse of controllers, and the quality of motion generated are criteria used to compare the three above methods. We examine PBA's potential as an animation tool and how it compares to alternative methods. Finally, future research directions such as biomechanical modelling in animation and decompositional control strategies for animating very complex articulated figures are discussed.