Abstract
We present a framework for identifying, communicating, and addressing risk in shared-autonomy mobile manipulator applications. This framework is centered on the capacity of the mobile manipulator to sense its environment, interpret complex and cluttered scenes, and estimate the probability of actions and configuration states that may result in task failures, such as collision (i.e., identifying “risk”). If the threshold for acceptable risk is exceeded, a remote operator is notified and presented with timely, actionable information with which the person can quickly assess the situation and provide guidance to the robot. This framework is demonstrated with a use case in which a mobile manipulator performs machine tending and material handling tasks.
1 Introduction
Mobile manipulation systems have become very popular due to their versatility, precision, and ability to complete tasks safely. By combining one or more robotic manipulators with a mobile platform, mobile manipulation systems extend the working volume compared to fixed-base manipulators and offer novel navigation-based capabilities. As a result, these systems can perform a wide range of operations in a variety of applications, from industrial deployments [1,2] to medical, domestic, and military domains.
The unique benefits of a mobile manipulation system do not come without technical challenges, however, and the suitable level of autonomy for a system depends mainly on the application scenario. Fully autonomous operation of mobile manipulation systems has been studied in many works [3–5], producing solutions that require humans to only specify high-level task goals while the robot does the planning and execution. However, the inherent sensing uncertainty in semi-structured, unstructured, or cluttered environments can make fully autonomous operation risky. For example, errors in obstacle modeling can lead to collisions, damaging the robot, its surroundings, and critical objects of interest. One method to mitigate these risks is for a human to operate or supervise the mobile manipulator [6,7]. Humans can observe a robot and its workspace remotely, monitor progress, define conditions and requirements for safe operations, detect and predict potential issues, intervene or correct robot plans, and make context-dependent decisions that affect performance and efficiency. It is likely, however, that the human’s attention may be divided across multiple systems, necessitating clear, timely, and actionable information to enable efficient and effective contextual awareness.
We posit that humans play an important role in some mobile manipulation applications, and systems need to be designed in a flexible fashion so that human input can be leveraged when appropriate to improve, correct, or override autonomous operation. Properly designed mobile manipulation systems can then strike the critical balance of autonomous and manual control that maximizes the benefits of autonomous agents performing metric and computationally challenging tasks, while humans perform higher-level decision-making and risk-mitigating tasks. In prior work [2,7], an example of a semi-autonomous mobile manipulation system was presented that offers flexibility by providing both autonomous and manual control modes to a remotely located human operator.
An efficient mobile manipulation system can offer the human different tools for autonomous operations that reduce the human’s workload and ensure safety and quality in task performance. Automated alert generation is an effective means for soliciting human input, overcoming difficult situations, and managing risk. A sufficiently intelligent, autonomous agent can plan trajectories for navigation, manipulation, and grasping as well as quantify uncertainty in its own actions. Given some set of human-defined alert conditions, an autonomous agent should also be able to accurately analyze the current conditions using its state estimation and uncertainty quantification to rapidly and proactively alert the human if there is excessive risk to safety or performance in its planned actions. These alerts provide the human an opportunity to adjust risky plans, discard plans altogether, or suggest alternative actions. Alert generation also provides supplemental safety assurance in addition to what is already integrated in many state-of-the-art motion and trajectory planners.
Research Challenges and Contributions: This article investigates the effective incorporation of human assistance for risk mitigation in semi-autonomous mobile manipulation systems, specifically for the application of object transportation. We have identified three research challenges within this scope, proposed solutions to address each challenge, and developed a system architecture to unify these techniques, which we have implemented on hardware (Sec. 3). For the purposes of this work, we define the risk metric to be the probability of task failure and the task plan to be the manipulator trajectory plan, including opening and closing of the gripper on the end-effector to complete manipulation tasks.
The first research challenge this paper explores is how to design flexibility and extensibility in mobile manipulation systems, i.e., humans should be able to change or update existing alert conditions and add new ones seamlessly. We propose using probabilistic temporal logic, specifically metric temporal logic (MTL), to mathematically encode the human-provided, alert-triggering conditions and failure modes to a machine-interpretable form that enables time- or task stage-dependent computational analysis (Sec. 4).
The second research challenge addressed in this article is how to rapidly and accurately generate alerts to ensure decision-making relevancy. Precise mobile manipulation in real-world environments can require redundant sensing and dense three-dimensional (3D) model building of the workspace for the robot to plan and act. We present computationally efficient methods to estimate uncertainty in our 3D workspace models (Sec. 5), estimate the probability of collision between the robot and objects (Sec. 7), and estimate task failures using these models to generate alerts (Sec. 6), all on the order of seconds.
Finally, visualization features are investigated to improve human understanding of the mobile manipulation system and the automated alerts. The human–robot interface must be interactive to enable bidirectional information flow and must be comprehensible to facilitate the human’s decision-making without being overwhelming. Information regarding the potential failure risk of each task plan under consideration helps the human evaluate risk and authorize safer task execution. A range of visualization tools is proposed to help the human supervisor view and assess risk (Secs. 8 and 9).
2 Related Works
In human–robot collaboration, risk is an ever-present factor that must be addressed. As such, a considerable portion of the existing literature is dedicated to the identification of, awareness of, and planning to compensate for risks, largely stemming from collisions or other task-related hazards. While the research presented in this article is focused on the estimation and communication of these risks with remote human operators for mobile manipulation, the literature is full of relevant research that warrants highlighting. Given that the focus of this article is on risk-based planning, related works on the topics of risk prediction, risk awareness, and risk-based planning are discussed in this section. A review of works relevant to human-assisted operation and teleoperation of robots for task execution is provided in Ref. [7] and is not detailed here.
2.1 Risk Assessment.
Generalized risk assessment methodologies for robotic applications [8–10] walk through the process of identifying different sources and severities of hazards to humans and can be based on risks posed by the equipment, the environment, or the task itself. The Association for Advancing Automation (A3) outlines a standardized risk assessment methodology in Ref. [8] for industrial robots, and articles such as [9,10] highlight more research-oriented approaches for risk identification and mitigation. Such approaches are first-pass efforts to addressing risks before operations begin and are generally leveraged to integrate engineering and administrative controls during the commissioning stage of a robotic application. Approaches for dynamically allocating tasks or responsibilities in collaborative operations (e.g., Refs. [11,12]) take advantage of risk assessments and prior experience to provide estimates of hazards to humans and determine what actions should be taken over entirely by robots to reduce the risks to humans.
Such approaches, however, are overwhelmingly based on a priori knowledge of the task, environment, and equipment. Even the so-called black swan events—though based on extreme outliers on the probability spectrum—require anticipating hazards before they are encountered [13]. Indeed, even in the context of this article, the focus on risk awareness is exclusively with regards to the robot colliding with objects in its work volume in tool tending and part manipulation tasks. To this end, the literature is filled with many different approaches for predicting, measuring, and avoiding collisions in applications ranging from robotics to graphics and from naval navigation to missile guidance. See the review by Marvel and Bostelman [14] for an overview of some of the most prevalent means of assessing collisions across the myriad of domains.
In terms of risk measurements, most collision-aware robot systems focus on collisions with humans (e.g., Ref. [15]). However, in the context of this article, the focus is on assessing risk based on the likelihood of the robot colliding with its environment while planning and executing grasping operations. To this end, the literature is rich with different approaches to measuring and estimating risks. For example, in the work by Kozai and Hashimoto, a “picking risk” metric is introduced that defines interobject relationships to provide estimates for grasping parameters to minimize the likelihood of colliding with objects during picking operations [16]. Other approaches assess reachability in cluttered environments (e.g., Refs. [17,18]), collisions during manipulation of grasped objects (e.g., Refs. [19,20]), and collision-based challenges of grasping moving objects (e.g., Ref. [21]). Once the risks have been identified and, presumably, quantified, motion planners can either generate and execute paths that minimize the risks (e.g., Ref. [22]) or, as is the case in this report, propose alternatives to the operator such that the human assumes responsibility for the risks.
2.2 Assessing Risk Awareness.
Once the hazards or risks have been identified, they must be brought to the attention of the human operators. Raising awareness of hazards is fairly simple and straightforward, but knowing that an issue exists is only part of the equation. The operator must also be made aware of the nature of the risk, the risk severity, and the means by which the risks can be resolved. As such, providing information in a way that is clearly understandable, actionable, and timely is a significant issue [23].
While many robotic systems attempt to raise awareness of risks or systematic issues, many of the approaches in the literature (as was demonstrated in Sec. 2.1) are focused on the identification of the sources and severity of risks. In many cases, researchers focus on directly measuring the situation awareness (SA) of operators in human–robot teams. Supporting and measuring real-time situation awareness is generally difficult, and, in the literature, many attempts to measure situation awareness rely on physiological or directly observable measures (e.g., Refs. [24,25]). Otherwise, researchers are limited to post hoc questionnaires regarding situation awareness (e.g., the work by Gervits et al. in which human–robot team performance is leveraged against subjective surveys [26]) or operator confidence in the interface and information quality (e.g., via the International Organization for Standardization (ISO) standard 25010 for software quality [27]).
Still, other approaches focus on the evaluation of the system itself in terms of its ability to maintain situation awareness, for example, the Risk SituatiOn Awareness Provision (RiskSOAP, [28]) methodology for assessing the performance of the dissemination of meaningful information regarding situation awareness of risks. In this case, RiskSOAP provides objective measures to evaluate how information is shared among interested parties and supports communication networks to enable and maintain distributed situation awareness of a robotic system’s risks. Similarly, the research presented by Schaefer et al. evaluates user displays in their capacity to share autonomous systems’ task-relevant information to a variety of different stakeholders [29]. Schaefer’s work focused on high-level assessments of SA for trust by leveraging the situation awareness-based agent transparency model by Chen et al. [30] and is meaningful in that it provides an approach for measuring SA from the perspective of multiple operator roles.
3 System Architecture
A typical mobile manipulator system [31] has one or more robotic manipulators mounted on a mobile base, along with robot controllers, a planner, and a perception system with some sensors. In most applications, the robot is not fully autonomous, and some level of human supervision is present. In that case, the mobile manipulator system also includes a human–robot interaction (HRI) interface, which typically supports basic tasking or teleoperation by humans [6,32]. An existing system architecture with mobile manipulator hardware and controller, a perception system, a planner, and a basic HRI unit is not sufficient to support risk-aware decision-making by remotely located human operators. In real-world applications, there is uncertainty and the possibility of hazards during operation. Therefore, more assistance and guidance for human operators are required to enable human intervention toward better and safer task performance.
In the presence of uncertainty, humans need to know the potential risk associated with any task plan before executing it. Being in a remote location, the person might not be able to perceive the possibility of certain hazards just from the perception data coming from the robot sensors. Therefore, the system is required to perform risk assessment and provide meaningful interaction with the human operator to enable risk-aware decision-making. This calls for a novel system architecture with an independent module for risk assessment and communication with humans, which is agnostic of the rest of the system. The new module should be easily integrable with any mobile manipulator system with robot hardware and controller, any perception system, and any planner. Therefore, we propose the system architecture in Fig. 1. Our main contribution lies in the alert generation module and its interaction with the HRI unit to facilitate superior operation. Our goal is to assess the risk associated with any system-generated or operator-provided task plan (e.g., a manipulator trajectory plan), issue an alert to the human operator when there is a risk, and efficiently convey the risk information to the human. To achieve this, four process blocks are included in our alert generation module, which serve the following purposes: mathematical modeling of alert-triggering conditions; estimating uncertainty in the 3D model of the environment; computing the failure or risk probability for any given plan and generating alerts when the computed risk is high; and providing visualization for humans to assess the risk themselves and make decisions.

The architecture of our mobile manipulator system. The right-most three boxes indicate hardware components, and the remaining boxes are software modules. The white rectangular boxes within the light gray modules refer to process blocks in our software suite, and the dark gray rounded-edged boxes represent data or information. Our main contribution lies in the four process blocks within the alert generation module. The human–robot interface and the planning module use existing state-of-the-art technologies for interacting with the operator and for generating robot motion plans.

Humans provide necessary specifications related to failure risks, which are used to mathematically encode the alert conditions for plan risk as described in Sec. 4. Probability is required to represent the level of failure risk, and the failure conditions might depend heavily upon time. The framework should allow humans to add or remove alerts as needed without programming. We choose probabilistic MTL formulae to encode alert conditions: the MTL portion captures the failure condition over time, and the probability portion indicates a high (greater than a human-set value) probability of such a task failure. Alert modeling is illustrated using the different subtasks for transporting an object from one place to another.
To generate an alert by estimating the probability of failure, the uncertainty in our measurements needs to be quantified. Toward this, the point cloud data coming from the perception system are processed to generate an uncertainty model of the 3D workspace. We use the data coming from the red-green-blue plus depth (RGB-D) camera mounted on the manipulator end-effector as the manipulator moves and scans the environment. The reason for using the camera on the robot arm, instead of a stationary camera, is to get a better field of view: the camera can be moved around to view the workspace from many different locations and angles and generate a more complete model of the environment, whereas cameras at fixed locations can only view very limited regions of the workspace. However, the moving camera and the uncertainty in the manipulator joint angles may add uncertainty to the 3D model of the environment on top of the generic error in the camera captures. Multiple measurements are therefore used to build the uncertainty model of the 3D environment: uncertainty estimates are generated by performing multiple scans of the environment from many different camera poses, the data are fused together, and the spatial discrepancy in the fused data is used to estimate the uncertainty in the model built by the system. This process is described in Sec. 5.
The system should rapidly issue an alert to gain human attention in case of potential high risk. Whenever the planning system generates a plan for performing a task, or a human handcrafts a task plan, our alert generation module computes the risk associated with that plan before plan execution and provides alerts when the risk is higher than the human-specified level. For any given plan, the alert generator uses the uncertainty model to compute task failure probabilities and checks for alerts, as depicted in Sec. 6. The estimation accuracy should be sufficiently high, and the computation time for alert generation should be relatively small to provide useful, real-time assistance. Finally, humans need to visualize and assess the plan risk that is auto-generated by the system and make their own judgments before executing or abandoning a plan. Our plan risk visualizer provides a human-interpretable and interactive way to do this in the user interface, as described in detail in Sec. 8. It shows potential failure modes and relates them to the sources of uncertainty.
The interaction between the alert generation module and the HRI unit needs to be intuitive and adjustable based on the operator’s preference for complementing the human’s operations. The operator needs to be able to receive alerts, change the alert condition settings on-the-fly, and visualize plan risk. This visualization should have a variety of options so that the operator can customize the settings based on their preference and the situation for better perception. Based on the risk assessment provided by the system, we believe the operator should be able to provide suggestions to the planner or edit the plan directly to generate safer plans. Therefore, in addition to the usual features like perception data from sensors, task goals, and robot commands, there are additional incoming and outgoing data blocks in our HRI module in Fig. 1.
We implement our framework and conduct experiments using our mobile manipulator system, ADAMMS [2,7], which is designed to find a balance between reducing the workload on remotely located human operators and ensuring safe task execution by the mobile manipulator in the presence of high uncertainty. The hardware system (Fig. 2) of our robot consists of a differential-drive mobile robot with a Universal Robots UR5 robotic manipulator mounted on the chassis. A Robotiq two-finger gripper is attached to the tool flange of the manipulator. A multisensor suite is used to monitor the work volume and consists of multiple depth cameras attached to both the mobile base and the manipulator, multiple color cameras (forward, backward, left, and right), a LiDAR (light detection and ranging) area scanner, a 9 degrees-of-freedom inertial measurement unit (IMU), and encoders on the mobile base’s wheels. For localization of the mobile base, RTAB-Map [33] is used in the robot operating system (ROS) to take advantage of the fused sensor data from the Kinect and two-dimensional (2D) LiDAR for mapping and localization. Improved odometry estimates are also provided as inputs to RTAB-Map by incorporating data from the wheel encoders and the onboard IMU using an extended Kalman filter. An Intel RealSense camera on the end-effector is used to generate a 3D point cloud of the scene or workspace.
Our planning modules are designed to leverage existing, state-of-the-art motion planning techniques to execute tasks as instructed by the human operator. The operator tasks the system by providing task goals, such as scanning the environment to generate a 3D model of the workspace, manipulating an object, or navigating to a different place. Motion planning for mobile manipulators can be done in two different ways: the mobile base and the manipulator can move concurrently ([34–38]), or the mobile base can be held stationary at a specific location while the manipulator plans and moves for grasping and accompanying tasks. We do the latter in this work, using the dynamic window approach [39] in ROS for mobile base motion. Once the mobile base arrives at the appropriate position, the manipulator moves to a target configuration. Several motion planners have been developed in the literature for manipulators based on random sampling [40–42] and optimization [43,44]. This work uses the RRT-Connect [45] motion planner available in the Open Motion Planning Library (OMPL) [46]. For collision avoidance, the Flexible Collision Library [47] is used with mesh-to-mesh collision detection. Since the planner randomly samples configurations, there is a risk of failing to find a motion plan, and it may be necessary to re-plan several times before a successful motion plan is found. The uncertainty in the environment model also adds to the difficulty of generating a safe plan. Therefore, a risk assessment tool is useful, regardless of the type of planner used, to make sure that plans are achievable.
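To illustrate how the planning module and the alert generation module interact in this design, the following minimal Python sketch shows one way a plan/assess/alert loop could be organized. The callables (planner, risk_estimator, notify_operator, executor) and the default threshold are hypothetical placeholders for the motion planner (e.g., RRT-Connect via OMPL), the alert generation module, the HRI unit, and the robot controller; this is a sketch of the idea, not the deployed ADAMMS code.

```python
def plan_and_execute(goal_pose, workspace_model, planner, risk_estimator,
                     notify_operator, executor, max_replans=5, risk_threshold=0.7):
    """Plan, risk-check, and (optionally) execute one manipulator motion.

    All callables are hypothetical stand-ins: `planner` wraps a sampling-based
    motion planner and returns a plan or None, `risk_estimator` returns a task
    failure probability in [0, 1], `notify_operator` raises an alert and returns
    the operator's decision, and `executor` sends an approved plan to the robot.
    """
    for _ in range(max_replans):
        plan = planner(goal_pose)          # sampling-based, so planning may fail
        if plan is None:
            continue                       # re-plan on planner failure
        risk = risk_estimator(plan, workspace_model)
        if risk > risk_threshold:
            # Estimated failure probability exceeds the operator-set threshold:
            # alert the human, who may approve, edit, or reject the plan.
            if notify_operator(plan, risk) != "approve":
                continue
        return executor(plan)
    return False                           # no acceptable plan was found
```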
4 Modeling Alert Conditions
Complex alert conditions need to be modeled mathematically to offer flexibility and ease of use to the human operators. Applications with mobile manipulators involve the robot and/or other objects moving with time, so a language with temporal properties is necessary to capture time-sensitive hazards. In many operations, certain failure modes can be sensitive to a specific time window. Therefore, MTL is a good option, as an MTL formula can be bound to particular time intervals. It offers the expressiveness and flexibility that our system requires for modeling different kinds of task failure modes. In the presence of uncertainty, probability is important for decision-making as it provides a measure of assessed risk. Therefore, we choose a probabilistic MTL framework to model alert conditions [48].
Human operators can set up alerts so that they are issued if there is a high probability of task failure given the existing level of uncertainty and a specific task execution plan. Humans provide the parameter values or settings required for the alerts. For example, humans provide the probability threshold value that defines a high probability of task failure (e.g., probability greater than 0.7 or 0.8), which triggers an alert. Moreover, humans need to use system parameters to devise MTL formulae that define task failures caused by different situations, e.g., colliding with obstacles while moving or missing an object during grasping.
Consider the complex operation of transporting an object from one place to another using a mobile manipulator as an illustrative example of how alert conditions can be modeled using our probabilistic MTL framework. This entire transportation task is broken down into seven subtasks as depicted in Fig. 3. After the robot has reached the object to pick, the first subtask is achieving a pregrasp pose, where the manipulator extends into the workspace and positions the end-effector in preparation for object grasping. The second subtask is grasping by closing the gripper. The next subtask is retrieval of the object, where the arm retracts while holding the object, and the object is released from the surface on which it was resting before it was picked up. Then, the robot transports the object near its goal location for placement. The fifth subtask, placement of the object, is to move the manipulator such that the grasped object finally touches the surface where it is meant to be placed. The sixth subtask is to open the gripper and release the object, and finally the arm needs to be retracted as the seventh and final subtask.

Subtasks in transporting objects from one place to another by a mobile manipulator. The labeling shows the 3D object/space representations used in expressing alert conditions: (a) to pre-grasp pose, (b) grasping, (c) retrieval, (d) transportation, (e) placement, (f) release, and (g) retraction.
Each of these subtasks has different requirements to be successfully completed and different reasons or conditions for failure; therefore, we encode the failure condition for each subtask with a unique MTL formula, Φi, where i is the subtask index. Each of these formulae maps the operation state to a Boolean, such that it takes the value True when the corresponding subtask fails and False otherwise. An alert-triggering condition is expressed as a probabilistic temporal logic formula, P>pth(Φ), where 0 ≤ pth ≤ 1 and Φ is an MTL formula referring to one subtask failure or a combination of several failure modes. Thus, P>pth(Φ) indicates that the probability of Φ being True, i.e., the probability of failure, is greater than pth. Human operators set this pth value for each alert condition according to their preferences.
Success or failure of each subtask in the transportation application depends on the relative pose (location and orientation) and 3D geometry of the different objects (e.g., robot, workpiece, obstacles) and how they change with time. We consider four major geometrical objects separately, as each of them has a separate role in this operation: the robot, the gripper, the workpiece to be transported or manipulated, and the obstacles, which include everything else in the workspace. The gripper is considered separately from the robot since the gripper actively manipulates the workpiece. Let R, G, O, and W indicate the Cartesian space occupied by the mobile manipulator robot, gripper, obstacles, and workpiece, respectively. These may change with time as the configurations change. Let us assume Os is the surface from which W is to be picked or on which it is to be placed. Gv is the 3D space within the fingers of the gripper, which is denoted as the grasping volume. If we have a multifinger gripper, each finger sweeps through a maximum volume of space, which can be denoted with Gvk for finger k. In our experiments, a two-finger parallel plate gripper is used, where each finger sweeps through a rectangular cuboid space, which we indicate with Gv1 and Gv2 as shown in Fig. 3. All these items can be used to craft different failure conditions during the transportation operation to be performed by the robot.
The 3D models for the robot and the gripper are available beforehand as provided by the manufacturers. These models, together with the robot’s pose and configuration, provide R, G, and the grasping volumes at any given time during task execution. However, 3D models for the workpiece and workspace obstacles are typically not given; they need to be captured by the robot before task planning. The system then needs to segment the captured model using the human operator’s inputs (e.g., selection of the workpiece located in the workspace) to identify W, O, and Os as needed for risk assessment. Once these models are registered, the system can generate robot motion plans and estimate plan risk to generate alerts. The human operator can specify alert-triggering conditions using these parameters and symbols beforehand and can guide the system in identifying the workspace-related items (W, O, Os) when the workspace model is generated from the robot perception unit.
The following enumerated items show how MTL can be used to model failure conditions during different steps of the operation. Here, A ∪ B indicates the total 3D space occupied by two items A and B, and A ∩ B refers to the common space or surface occupied by both items; it is nonempty only when the objects penetrate or touch each other. In addition, let us assume that lA and qA are the location and orientation of object A, and that A can be written as A(t) when a formula requires it as a function of time. The following logic statements employ the operators finally in (F_I, meaning “at some time within the interval the specified condition holds true”) and globally in (G_I, meaning “the specified condition holds true at all times within the interval”). Illustrative encodings of two of these failure modes are sketched after the list.
- To pre-grasp pose: In this stage, the robot or gripper must not collide or come in contact with the workspace obstacles or the workpiece to be manipulated. Failure occurs in the case of collision.
- Grasping: For successful grasping, the gripper needs to enclose part of the object within its fingers. Before closing the gripper, we need to check whether some part of the workpiece is within the grasping volumes, so that the gripper does not simply grasp air. Grasping air would be a failure condition that requires an alert. If toc is the time of closing the gripper, this failure mode can be expressed as an MTL formula evaluated at toc (an illustrative encoding is sketched after this list).
- Retrieval: A successful retrieval requires that the workpiece is released from the workspace surface (Os) as the robot grasps and picks it up; otherwise, it is a failure. If the failure mode Φ3a does not occur, failure can still result from collision after release from the surface at t = tor (time of release). In addition, the workpiece may need to be held and kept within a certain range of orientation (≈ q0) at all times.
- Transportation: As the robot navigates to the goal location for transporting the workpiece, collision must be avoided, and the orientation constraint should be met at all times.
- Placement: The collision and orientation conditions should continue to hold until the object is placed at the destination site.
- Release: The object needs to be released upon placement by opening the gripper. As the gripper opens, it should avoid collision with the surrounding obstacles.
- Retraction: As the mobile manipulator retracts its arm, collision must be avoided.
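As an illustration of how such conditions can be written, and using the notation defined above, two of the failure modes might be encoded as follows. These are sketches under our assumptions (t1 is assumed to denote the end of the pre-grasp motion), not necessarily the exact formulae used in the system.

```latex
% Illustrative sketches of two failure-mode encodings (assumed, not definitive).
% Pre-grasp failure: at some time in [0, t_1] the robot or gripper
% intersects an obstacle or the workpiece.
\Phi_1 = F_{[0,\,t_1]}\Big( \big(R(t) \cup G(t)\big) \cap \big(O \cup W\big) \neq \emptyset \Big)

% Grasping failure ("grasping air"): at the gripper-closing time t_{oc},
% no part of the workpiece lies inside either finger's grasping volume.
\Phi_2 = F_{[t_{oc},\,t_{oc}]}\Big( W \cap (G_{v1} \cup G_{v2}) = \emptyset \Big)

% Alert-triggering condition: the probability of \Phi holding exceeds p_{th}.
P_{>p_{th}}(\Phi), \qquad 0 \le p_{th} \le 1
```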
This example with the transportation task illustrates how alert conditions can be modeled with probabilistic MTL formulae by dividing a complex task into subtasks, defining the required items, and using them to craft failure/risk conditions. The process can be extended to other applications involving mobile manipulators. Our example application includes commonly used operations such as robot motion and object handling. We have expressed one of the most frequent failure modes, collision, as well as some failure modes related to object grasping and transportation. This demonstrates the usability and potential of our approach for a wide variety of applications, making it both flexible and extensible.
5 Uncertainty Estimation in Environment Model
Uncertainty in robot perception of the environment is one of the major challenges in any application, especially robotic manipulation. We estimate the uncertainty in the environment model coming from the perception sensor of the robot, which is later used to estimate the failure probability for alert generation. There have been many works on quantifying uncertainty in data from depth sensors such as LiDAR, Kinect, etc. [49,50]. However, a modular framework is needed that can support a wide variety of depth sensors integrated with any mobile manipulator. Our previous works [2,7] highlight how stationary sensors at fixed locations cause a lack of visibility and thus unnecessarily shrink the free space, making motion planning more challenging and risky. We propose attaching a depth sensor near the end-effector of the mobile manipulator so that the robot can capture depth images from many different locations and viewing angles by moving its arm, which increases visibility. For such a sensor setup, this article describes a sensor-agnostic method for estimating an uncertainty model of the 3D environment, which makes use of multiple measurements [51] and data comparison and fusion.
The RGB-D camera mounted near the manipulator end-effector captures the workspace area of the robot while the manipulator moves through a sequence of locations and orientations. Figure 4(a) shows an image taken during our physical experiments as the manipulator performed a scanning motion from the front of the workspace to generate the 3D model. Multiple measurements of the 3D model of the environment are analyzed to estimate the uncertainty. The 3D workspace environment is represented using a nominal point cloud and an uncertainty map that stores the location uncertainty of each point in the nominal point cloud. Only the location uncertainty along the surface normal direction at each point is considered. This assumption is made because the uncertainty along the normal direction can inflate or deflate the object as a whole, which offers enough possibilities to check for our failure modes such as collision. In our case, it is not required to distinguish between different points on the surface of an object; rather, only the overall 3D space occupied by it is required. Therefore, assuming uncertainty only along the surface normal direction is reasonable.

(a) Robot scanning an environment to generate the 3D uncertainty model using the RGB-D camera (RealSense D415) mounted at the end-effector and (b) representation of the uncertainty model generated for the scene during our experiments. The point cloud shows the nominal point locations, and the green-to-red color scheme shows the uncertainty level of each point along its normal direction. Greener points indicate lower uncertainty, and redder points indicate higher uncertainty. (Color version online.)

The uncertainty in the 3D point cloud model of the environment arises from the inherent inaccuracy present in most depth sensors and also from the errors propagated through the robot links. As the robot executes its trajectory plan for scanning the workspace, point clouds are captured continuously at 30 frames per second. The incoming point clouds are filtered using the camera viewing angle and distance. We use these filtered clouds and the manipulator trajectory information to transform and fuse the point clouds and generate the nominal point cloud. Due to the high data rate of the incoming clouds, our approach uses specialized data structures to iteratively calculate the uncertainty. The large number of point clouds collected over time reduces the error in the nominal point cloud and the uncertainty model.
The uncertainty in the depth measurements of an RGB-D camera increases with the distance between the object and the sensor. RGB-D cameras also have minimum working distances due to their reliance on the disparity between features, which negatively impacts their effectiveness at identifying and measuring close objects. Moreover, the data from a depth sensor have asymmetric error distributions, with primary directions along the sensor lines of sight. The sweet spot of a depth sensor is defined as a cylindrical frustum created using these working-distance limits, and an incoming 3D point is said to lie in the sweet spot when it lies within this cylindrical frustum. Our method filters out those points from the incoming point clouds that lie outside this frustum. Each filtered point cloud is then transformed based on the corresponding manipulator pose, which brings the point cloud into the robot base coordinate frame. The filtered and transformed point clouds are then fused to generate the nominal point cloud as well as the associated uncertainty model.
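A simplified numpy sketch of this filtering and transformation step is given below. It reduces the sweet-spot test to a depth-range check (the full method also bounds the lateral extent of the frustum), and z_min/z_max are placeholder working-distance limits rather than the camera's actual specification.

```python
import numpy as np

def filter_and_transform(cloud_cam, T_base_cam, z_min=0.4, z_max=1.5):
    """Keep points inside an assumed depth 'sweet spot' and express them in the
    robot base frame.

    cloud_cam: (N, 3) points in the camera frame.
    T_base_cam: 4x4 homogeneous transform from the camera frame to the base frame,
                obtained from the manipulator pose at capture time.
    """
    z = cloud_cam[:, 2]                        # depth along the camera optical axis
    keep = (z > z_min) & (z < z_max)           # discard too-near / too-far returns
    pts = cloud_cam[keep]
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coordinates
    return (T_base_cam @ pts_h.T).T[:, :3]             # points in the base frame
```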
The camera uncertainty is present along the line connecting the camera origin and the point being viewed [52]. This is because the depth values are obtained by projecting an infrared pattern on the surface and measuring its distortion; the uncertainty in determining the reflected ray results in range uncertainty along the line of sight. In addition, since the point clouds of the environment are created by fusing measurements taken from various locations facing the workspace, the errors in the robotic links are also propagated. We are interested in quantifying the uncertainty of points on a surface along the surface normal directions. Since we require uncertainty along one particular direction, a narrow cylindrical geometry (cylinder axis aligned with the normal direction) around a nominal point is suitable for capturing its uncertainty along that line. All the points from the filtered and transformed point clouds that lie within the cylinder are considered when computing this uncertainty.
Computational time is a major challenge in handling a large number of point clouds and generating an uncertainty model. The workspace is discretized into an occupancy grid, which makes it computationally easy to keep track of the point clouds being captured, filtered, and transformed. To get the normal direction for a voxel Vi, we first estimate the plane parameters at the voxel’s midpoint Ci. We perform principal component analysis (PCA) on the points in the local neighborhood and use the eigenvector associated with the smallest eigenvalue as the normal vector or normal direction. Since we are doing this in an occupancy grid, the computation is sufficiently fast. We assume a hypothetical cylinder with this vector as its principal axis, centered around Ci. The radius and height (h) of this cylinder are chosen based on the desired resolution and the camera’s standard deviation along the depth direction. All the points that lie within the cylinder are then projected onto the principal axis and averaged to get the centroid of the points in voxel Vi. The distances of these projected points from the newly computed centroid are then used to calculate the uncertainty. We assume a zero-mean Gaussian distribution for the uncertainty, so our approach is simply to compute the standard deviation of those distances as the σ parameter of the Gaussian distribution. This process is depicted in Figs. 5 and 6.
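The per-voxel computation can be summarized in a short numpy sketch, shown below under simplifying assumptions: a standard PCA plane fit is used (the surface normal is taken as the eigenvector of the neighborhood covariance with the smallest eigenvalue), and the cylinder radius and height are illustrative values rather than tuned parameters.

```python
import numpy as np

def voxel_uncertainty(neighborhood_points, voxel_center, radius=0.01, height=0.05):
    """Estimate the nominal point and the 1-D uncertainty (sigma) for one voxel.

    neighborhood_points: (N, 3) fused measurements around the voxel.
    Assumes a zero-mean Gaussian error along the local surface normal.
    """
    # Local plane fit via PCA: the normal is the smallest-eigenvalue eigenvector
    # of the neighborhood covariance matrix.
    centered = neighborhood_points - neighborhood_points.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood_points)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    normal = eigvecs[:, 0]

    # Keep only points inside a thin cylinder centered at the voxel midpoint,
    # with its axis along the estimated normal direction.
    rel = neighborhood_points - voxel_center
    axial = rel @ normal                                        # signed axial offset
    radial = np.linalg.norm(rel - np.outer(axial, normal), axis=1)
    inside = (np.abs(axial) <= height / 2.0) & (radial <= radius)
    if not np.any(inside):
        return None, None, normal

    # Project the retained points onto the axis; their mean gives the nominal
    # point, and their spread gives sigma of the assumed Gaussian.
    a = axial[inside]
    nominal = voxel_center + a.mean() * normal
    sigma = a.std()
    return nominal, sigma, normal
```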

Use of voxel-based representation: (a) occupancy grid of the environment; green voxels are occupied, and white voxels are empty; (b) to calculate the normal vector for a voxel, we find its plane parameters using PCA on neighboring points; and (c) the eigenvector corresponding to the smallest eigenvalue is used as the principal axis (red line) for the cylinder. Here, the cylinder geometry is being calculated for the blue voxel. We hash both the empty and occupied voxels that lie inside the cylinder. (Color version online.)


Estimation of each nominal point location along with its uncertainty level, using points inside the hypothetical cylinder (with principal axis along the normal direction of the voxel): (a) all points lying inside the cylinder are used to estimate the uncertainty model, (b) the points are projected onto the normal vector, and their average location is taken as the nominal point, indicated by a green ⊗ symbol, and (c) the standard deviation of the lengths of the blue dashed lines provides σ of the zero-mean Gaussian distribution for the uncertainty of the nominal point along the normal direction (red line). This is the uncertainty metric for the nominal point (green). (Color version online.)

To do this process in real time without incurring a heavy memory requirement, we avoided using a dense or hierarchical data structure and instead chose a hashmap to store the occupied voxels. This efficiently exploits the fact that most of the space in a regular voxel grid is either unobserved space or free space. Each voxel Vi stores the centroid of the points along the surface normal, the uncertainty measurement, a list of voxels whose uncertainty measurement depends on Vi, and a buffer to store unprocessed points in the voxel if we do not have sufficient neighborhood voxels to calculate the normal.
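A minimal sketch of such a sparse voxel store is shown below; the field names, voxel size, and hashing scheme are illustrative, not the exact implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Voxel:
    nominal: tuple = None       # centroid of projected points along the normal
    sigma: float = None         # standard deviation along the normal
    dependents: list = field(default_factory=list)  # voxels whose estimate uses this one
    pending: list = field(default_factory=list)     # points waiting for enough neighbors

VOXEL_SIZE = 0.01  # m (illustrative)

def voxel_key(point):
    """Map a 3-D point to the integer grid index used as the hash-map key."""
    return tuple(int(c // VOXEL_SIZE) for c in point)

grid = {}  # a plain dict serves as the hash map of occupied voxels
# Example: grid.setdefault(voxel_key((0.12, 0.03, 0.54)), Voxel()).pending.append((0.12, 0.03, 0.54))
```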
The updates are stopped when the scanning motion of the manipulator stops and all the captured point clouds have been used. Finally, a nominal point cloud with metadata for the uncertainty estimates is generated using the data for all the occupied voxels.
We have verified the applicability and feasibility of our method by conducting physical experiments. A lawn-mower pattern was used for scanning the environment from a vertical plane in front of the workspace. The data captured at 50 points along the path were used to generate the 3D point cloud uncertainty model of the environment. The point cloud fusion, comparison, and uncertainty model generation were completed within a few minutes.
Figure 4(b) shows the uncertainty model generated by scanning the environment in Fig. 4(a). This figure shows the nominal point cloud, and the color of each point shows its uncertainty level along the normal direction. The uncertainty parameter is σ, the standard deviation of the Gaussian distribution, so the actual location of each point is expected to lie within the −3σ to +3σ range along the surface normal direction with respect to the nominal location. Thus, the spread of a point is 6σ. In our experiment, we found the maximum σ to be 0.33 cm, which gives a spread of nearly 2 cm. The average spread across the scene was found to be around 1.5 cm.
Figure 4(b) also shows how the uncertainty varied across the scene. We found that nearer points have lower uncertainty than points further from the robot. This is expected according to the camera specification, since the nearer points are at the ideal distance from the camera, and increasing distance results in additional error. We also observed a change in uncertainty with respect to the surface normal direction of the points. The depth information (hence, location) of a point is expected to be more accurate when the surface normal is parallel to the camera’s viewing direction. As Fig. 4(a) shows, our scanning motion mostly used a forward-facing camera orientation, tilted slightly downward. Therefore, the surfaces that are vertical with respect to the ground and facing the robot had the minimum uncertainty. Conversely, surfaces parallel to the ground (like the table’s top surface) have a higher level of uncertainty.
The approach described in this article for our uncertainty model generator module is sensor-agnostic and modular, as required by our system architecture. Our physical experiments demonstrate feasibility in terms of application and computational time, and intuitive correctness in estimating the uncertainty model of the environment. Our alert generator module uses this uncertainty model to assess the risk for executing any task plan under the uncertainty and generates alerts as needed.
6 Estimating Task Failure and Alert Generation
The alert generation module needs to estimate the probability of task failure as expressed by the MTL formulae in our framework and then issue alerts to human operators when the probability is higher than the level specified in an alert condition. The failure modes specified in the illustrative example in Sec. 4 for alert generation are based on the 3D spaces occupied by different items (i.e., the robot, the gripper, the workpiece, and any obstacles) and the temporal changes in their poses. A task plan provides a sequence of robot poses (together with the grasped object after a successful grasp) with respect to time. Our system needs to compute the probability of failure for a particular plan in order to check whether the alert conditions are true. The uncertainty model of the 3D workspace (Sec. 5), generated by our system, is used in this probability computation. In general, operators prefer to execute plans whose risk level (or failure probability) is minimal despite some unavoidable uncertainty in the workspace model and gripper pose. Based on the operator-specified alert-triggering risk level, an alert is generated for a plan if the estimated risk exceeds the threshold.
This article specifies three kinds of alert conditions for the failure modes associated with the subtasks of the object transport operation. Let A and B indicate the 3D space occupied by any two objects, with associated orientations qA and qB, and let W and Gv represent the 3D space of the object to be grasped and the grasping volume, respectively. The three failure modes are: (1) collision (A ∩ B ≠ ∅), (2) missed grasping (W ∩ Gv = ∅), and (3) object orientation not maintained (qA deviating beyond a set tolerance from the required orientation q0).
6.1 Collision Probability for a Task Plan.
Collision is the most common reason for task failure in almost any operation with a mobile manipulator. In our example, we have specified a collision-based failure mode for which the probability of collision between two objects needs to be estimated when one object is moving. We assume that both 3D models have uncertainty, which can contribute to a nonzero probability of collision during execution of the motion plan even when the nominal 3D models never collide.
1: Am: the moving object, As: the stationary object
2: Π: motion plan for Am, where Π(t) is the pose of Am at time t, t ∈ {0, 1, …, T}
3: PCm, PCs: nominal point clouds of Am and As, with associated uncertainty models
4: dm: maximum possible Cartesian displacement of Am per unit time
5: dsafe: minimum distance between the nominal point clouds which ensures zero probability of collision despite the uncertainty in the 3D models
6:
7: function PlanCollisionProbability(PCm, PCs, Π, dm)
8:   pmax ← 0
9:   t ← 0
10:  while t ≤ T do
11:    PCm(t) ← transform(PCm, Π(t))
12:    PCsROI ← portion of PCs within the region of interest around PCm(t)
13:    d ← minimum distance between PCm(t) and PCsROI
14:    if d ≤ dsafe then
15:      p ← PointCloudCollisionProbability(PCm(t), PCsROI)
16:      if p > pmax then
17:        pmax ← p
18:      t ← t + 1
19:    else
20:      t ← t + max(1, ⌊(d − dsafe)/dm⌋)
21:  return pmax
A task plan provides a series of configurations of the moving object(s). The collision probability for a plan is the maximum collision probability found among all discrete instances. However, if collision checks are performed for every instance, a large number of collision checks would be required, which would make the process infeasible in terms of computational time. Using adaptive time instances for the configurations at which collision checks are performed reduces the number of calculations. The process is described in the pseudocode for plan collision probability in Algorithm 1. The adaptivity is based on the minimum distance between the two objects and the maximum possible rate of displacement (dm) of the moving object(s). If the moving object is the gripper, with or without a grasped object, dm can be estimated based on the joint velocity limits of the manipulator. In a collision checking stage, if the two objects are found to be geospatially distant (meaning collision is not possible despite the uncertainty in the 3D models), our method skips a few instances and explores later-stage configurations. Conversely, when the two objects are close to each other, the collision probabilities are calculated for consecutive configuration states in the plan until the distance between the objects becomes large.
Estimating the probability of collision between two static point clouds with uncertainty (line 15 in Algorithm 1) is one of the major process blocks within the alert generator module. Careful consideration is required in developing the method to maintain high accuracy with low computational time, two conflicting requirements. Our methodology, along with its computational efficiency and estimation accuracy, is described in Sec. 7.
6.2 Probability of Missed Grasping.
Estimating the probability of missed grasping right before closing the gripper can be beneficial in ensuring successful grasping of the workpiece. When there is uncertainty in the environment model, including the workpiece to be grasped, making a grasping decision solely based on the nominal point cloud might not be wise, as Fig. 7 illustrates. This figure shows the nominal workpiece model and the uncertainty on its front surface using red arrows. Without considering the uncertainty, one may overlook a nonzero probability of missed grasping. Therefore, an alert based on the risk of missed grasping can be beneficial.

Missed grasping is possible under uncertainty. The red two-way arrows indicate uncertainty of point locations on the front surface of the workpiece (green). (Color version online.)
We need a computationally fast approach to ensure rapid alert generation. One method is to create point clouds that enclose the grasping volumes and use an estimated uncertainty model based on the uncertainty in the robot joint angles, the attachment mechanism of the gripper, etc., as used in our collision probability estimation method when the gripper is considered. The workpiece has uncertainty according to the scanning of the workspace, which is computed using the approach in Sec. 5. If the gripper closes while the manipulator is stationary, we can call the point cloud collision probability function once (line 15 in Algorithm 1, method described in Sec. 7) using the workpiece point cloud and the generated point cloud(s) for the grasping volume(s). The probability of collision computed by the algorithm is equivalent to the probability of not grasping air, from which the probability of grasping air can be easily estimated. Similarly, if the manipulator moves while the gripper is closing, we need to consider the gripper motion as well as the gripper-closing motion. Then, we can use the entire plan collision probability function in Algorithm 1, which is applicable to a motion plan. Here, the motion comes from the gripper closing, so the grasping volume point cloud(s) keep changing throughout the process. Instead of a 3D model moving through space as in robot motion, we have a 3D model changing its geometry with time (with or without motion) in this case. Thus, the transform function in line 11 works differently than for a typical moving object. Also, the adaptive time instances might not be applicable to the gripper closure process.
There is another, computationally less expensive way, in which the gripper model is assumed to be deterministic. The uncertainty model is considered only for the workpiece, which is part of the scene. Typically, a good hardware and software system would ensure a sufficiently low level of uncertainty in the gripper’s pose, whereas the environment model can have higher uncertainty. Therefore, this assumption can be reasonable in some scenarios. Figure 8 illustrates this second method of computing the probability of grasping air. For conceptual clarity, only three points (blue) from the workpiece are shown in this figure.

Computing the probability of missed grasping under uncertainty: (a) nominal points in the workpiece object with their uncertainty spread along the surface normal; (b) points are moved to the +3σ (outward from the surface) position, and a point is evaluated if the moved point location is inside the grasping volume; and (c) the segment of each point’s 6σ line (going inward) that lies inside the grasping volume is computed. This line segment provides the probability of each point being within the grasping volume.

First, our method takes all the points in the workpiece point cloud and extends them outward from the surface by their respective 3σ values. Next, the points that fall within the grasping volumes are identified; let P be the set of all such points. We extend points outward from the surface since the robot approaches the grasp from outside the surface, so any extended point that lies within the grasping volumes has a nonzero probability of being grasped. For each of the identified points, we consider its 6σ inward line (along the surface normal direction), which indicates the range of probable locations of that point. The portion of that line that lies within the grasping volumes is determined and used with the analytical formula for the Gaussian distribution to compute the probability of that point actually being within the grasping volume. Let Pg be the set of these probabilities of a point being grasped. Clearly, max(Pg) is the maximum probability of being grasped. Therefore, the probability of grasping air is computed as 1 − max(Pg).
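A numerical sketch of this second method is given below, under additional simplifying assumptions: the grasping volume is approximated by a single axis-aligned box, and the Gaussian mass of the in-volume portion of each ±3σ line is accumulated over small sampled segments rather than evaluated with a closed-form expression.

```python
import math
import numpy as np

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def prob_grasping_air(points, normals, sigmas, box_min, box_max, n_segments=50):
    """Probability of 'grasping air', assuming a deterministic gripper.

    points, normals: (N, 3) nominal workpiece points and their surface normals.
    sigmas: (N,) per-point standard deviations along the normal.
    box_min, box_max: corners of an axis-aligned box approximating the grasping volume.
    """
    box_min, box_max = np.asarray(box_min), np.asarray(box_max)
    p_max = 0.0
    for p, n, s in zip(points, normals, sigmas):
        if s <= 0:
            continue
        # Walk the -3*sigma..+3*sigma segment of the normal line and add up the
        # Gaussian probability mass of the portions that fall inside the box.
        offsets = np.linspace(-3.0 * s, 3.0 * s, n_segments + 1)
        prob_inside = 0.0
        for a, b in zip(offsets[:-1], offsets[1:]):
            midpoint = p + 0.5 * (a + b) * n
            if np.all(midpoint >= box_min) and np.all(midpoint <= box_max):
                prob_inside += gaussian_cdf(b, s) - gaussian_cdf(a, s)
        p_max = max(p_max, prob_inside)
    return 1.0 - p_max
```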
6.3 Orientation Maintenance.
In this framework, it is assumed that changes in the pose of any object can be caused only by the robot (e.g., gripper movement, grasping and lifting of the workpiece, or colliding with the environment). It is also assumed that firm grasping occurs if the grasping condition is met.
As such, the orientation maintenance requirement is used when the orientation of the workpiece needs to be maintained throughout the entire part handling process. Such a requirement might be needed for specific workpieces, such as an open container with liquid. The assumption of firm grasping ensures no slipping or sliding of the workpiece after grasping. Therefore, the object orientation depends solely on the gripper orientation, and hence on the robot end-effector orientation, which has relatively little uncertainty. The orientation condition checks whether the Euler angles of rotation, with respect to the original (or required) orientation, stay within limits set by the operator. For a particular pose, this check is considered deterministic. When checking this condition for a path plan, all way-points along the path are evaluated. If at least one way-point pose violates the condition, the probability of this failure mode occurring for that plan is 1; otherwise, the probability of orientation violation is zero.
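Because this check is deterministic under the assumptions above, it reduces to comparing the relative Euler angles at each way-point against the operator limits. A minimal sketch follows, assuming way-point orientations are available as quaternions and using SciPy's rotation utilities; names and conventions are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def orientation_violation_probability(waypoint_quats, required_quat, limits_rad):
    """Deterministic orientation-maintenance check over a path plan.

    waypoint_quats : iterable of (x, y, z, w) end-effector orientations along the plan
    required_quat  : (x, y, z, w) original/required orientation
    limits_rad     : (3,) operator-set limits on the relative Euler angles (rad)
    """
    required = R.from_quat(required_quat)
    limits = np.asarray(limits_rad)
    for q in waypoint_quats:
        # Rotation of the way-point orientation relative to the required orientation.
        relative = R.from_quat(q) * required.inv()
        euler = np.abs(relative.as_euler("xyz"))
        if np.any(euler > limits):
            return 1.0   # at least one way-point violates: failure probability is 1
    return 0.0           # otherwise zero probability of orientation violation
```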
6.4 Alert Generation.
Using the aforementioned methodologies, the probability of a relevant task failure, as specified in the alert-triggering conditions, is estimated for any given task plan. An alert is issued to the human operator if the estimated probability is higher than the threshold value set by the human in the probabilistic MTL formula. The alert can be in the form of a message pop-up, a sound, or other means according to user preference. In addition to the brief alert, the operator should have access to additional details, and the risk information needs to be presented in an easily interpretable way.
7 Collision Probability Estimation
Collision is one of the most common failure modes, applicable to all mobile manipulator applications. Estimating the collision probability between moving objects is an integral process within the alert generation module. There are many research works on collision probability estimation between stationary as well as moving objects, and many motion planning frameworks have built-in collision probability estimation. However, these are often unsuitable because they lack flexibility and are not accessible to the robot user. Moreover, a planner-agnostic way of estimating collision probability is required, one that is computationally fast and easily integrated with any mobile manipulator system. It should be able to use a task plan generated from any source, whether a custom plan from a human operator or a plan from a motion planning module. The heavy computation for collision probability estimation must be performed quickly and with good accuracy to ensure rapid, high-quality alert generation.
The probability of collision between two 3D objects executing a task plan, where at least one object moves with time, must be estimated to check for an alert condition related to most failure modes in our illustrative example in Sec. 4. The moving object goes through a series of configurations over time, and the collision probability changes with each configuration. Because both 3D models have associated uncertainties, the probability of collision between the robot and every object needs to be computed. The approach for estimating collision probability for a task plan is described in Algorithm 1 in Sec. 6.1. In the following sections, we describe a few building blocks within that algorithm and analyze the computational efficiency and estimation accuracy of our method.
7.1 Collision Probability Between Point Clouds With Uncertainty
Algorithm 2: Collision probability estimation between two point clouds with uncertainty models
1: function Point Cloud Collision Probability(A, B)
2:   P ← pairs of points (a, b), a ∈ A, b ∈ B, within distance γ = 3σA + 3σB ▷ (k-NN search)
3:   if P = ∅ then
4:     return 0
5:   pmax ← 0
6:   for each pair (a, b) ∈ P do
7:     p ← P2P Collision Probability(a, b)
8:     if p > pmax then
9:       pmax ← p
10:  return pmax
The major computational block in computing the plan collision probability is estimating the collision probability between objects in fixed configurations (line 15 in Algorithm 1). The pseudocode for Point Cloud Collision Probability in Algorithm 2 illustrates this computational process. Given two point clouds representing separate objects A and B, each consists of a collection of points with nominal locations on the surface of the respective object. Each point has an associated uncertainty along the normal direction, assumed to follow a zero-mean Gaussian distribution about the nominal location, with the standard deviation indicating the uncertainty level. Assume that the maximum uncertainty levels found in the two objects are σA and σB, respectively. For Gaussian distributions, it can be assumed that each point lies within its [−3σ, 3σ] range with near certainty. Therefore, when the minimum distance between the nominal point clouds is greater than (3σA + 3σB), the probability of collision is taken to be zero. Otherwise, a collision is possible, and its probability must be computed. In that case, a k-nearest neighbors (k-NN) search is performed to find pairs of points from the two point clouds that are within γ = (3σA + 3σB) of each other (line 2 in Algorithm 2), and the probability of collision between each point pair is estimated. A point-to-point (P2P) collision probability for each pair of points is calculated (line 7 in Algorithm 2), and the highest probability found among all the point pairs is the collision probability for the two objects at their respective configurations.
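A minimal Python sketch of this per-configuration computation is shown below, assuming SciPy's cKDTree for the radius search and a caller-supplied p2p_prob callable standing in for the P2P computation of Sec. 7.2; the structure mirrors Algorithm 2, but the names and data layout are illustrative rather than the authors' MATLAB implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_cloud_collision_probability(pts_a, sig_a, pts_b, sig_b, p2p_prob):
    """Per-configuration collision check between two point clouds with
    per-point uncertainty (structure of Algorithm 2).

    pts_a, pts_b : (Na, 3), (Nb, 3) nominal surface points
    sig_a, sig_b : (Na,), (Nb,) std. dev. of each point along its normal
    p2p_prob     : callable(i, j) -> P2P collision probability for pair (i, j)
    """
    gamma = 3.0 * sig_a.max() + 3.0 * sig_b.max()

    # Line 2: radius search for all pairs of points within gamma of each other.
    tree_a, tree_b = cKDTree(pts_a), cKDTree(pts_b)
    pairs = [(i, j)
             for i, neighbors in enumerate(tree_a.query_ball_tree(tree_b, gamma))
             for j in neighbors]
    if not pairs:
        return 0.0   # objects farther apart than 3*sigma_A + 3*sigma_B

    # Lines 6-10: the object-level probability is the maximum pairwise probability.
    return max(p2p_prob(i, j) for i, j in pairs)
```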
7.2 Point-to-Point Collision Probability in the Presence of Uncertainty.
When P2P collision checking (P2P Collision Probability in Algorithm 2) is performed between pairs of points from the two object point clouds, it is assumed that each of these points has an associated location uncertainty. If there were no uncertainty, the collision checking result would be binary: collision or no collision. One means of collision checking between a pair of points is to use a penetration cylinder for each point: if one point is found inside the other's cylinder, a P2P collision occurs. Consider two points a and b from the two point clouds, with surface normal directions na and nb, respectively. Choosing an appropriately small-diameter penetration cylinder (i.e., a cylinder extending from a in the direction opposite the surface normal na) for point a with penetration depth x, we check whether b lies inside that cylinder. The same test is then performed with b's penetration cylinder. If either point of the pair is found inside the penetration cylinder of the other, a and b are in a collision state; otherwise, there is no collision.
Extending the collision cylinder methodology to P2P collision checking under uncertainty computes the probability of collisions between two points, given their nominal locations, surface normal directions, and uncertainty in position along the normal direction. Since each point would lie within [− 3σ, 3σ] from its nominal location, the penetration cylinder is extended by 3σ in both directions as is shown for point a in Fig. 9. If another point is inside the extended cylinder, there is a nonzero probability of collision. However, the other point b also has position uncertainty: it, too, can lie anywhere between [− 3σb, + 3σb] from the nominal location along the line nb. Therefore, if the ± 3σb line segment on nb penetrates into the extended cylinder of a, there is a nonzero probability of collision. If that line segment for b is completely outside of the cylinder, the same test is performed using the extended cylinder of b and line segment of a. If no penetration is found again, there is no probability of collision. Otherwise, the probability of collision between a and b needs to be computed. Figure 9 illustrates this concept.
In the event of a nonzero P2P collision probability, the focus shifts to one or both combination(s) of cylinder and line segment of the two points where the line segment penetrates the extended cylinder. By sampling a sufficiently large number of points, N, on the line segment according to the Gaussian distribution, the probability of collision between each sample and the point owning the cylinder can be computed using the analytical formula for a Gaussian variable. Figure 10 illustrates two cases with the extended cylinder of a and two sampled points, bi, for b along nb. In both cases, the point bi will be inside the penetration cylinder (penetration depth x) of a only if a ∈ [α1, α2] (within ±3σa of a). Thus, the probability of collision between each sample bi and a can be calculated as pi = Pr(a ∈ [α1, α2]). The value of pi is calculated for all samples bi, i = 1, …, N. Finally, the mean probability of collision between points a and b is calculated as pab = (1/N) Σi pi. Similarly, the value of pba is calculated if the line segment of a penetrates the extended cylinder of b. The maximum of pab and pba represents the collision probability between points a and b.
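The following sketch illustrates this P2P computation under stated assumptions: the penetration depth and cylinder radius are arbitrary illustrative values, the samples of b are drawn from its Gaussian (clipped to ±3σb), and the small Gaussian mass outside ±3σa is neglected. It is a conceptual rendering of the method, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def p2p_collision_probability(a, na, sig_a, b, nb, sig_b,
                              depth=0.002, radius=0.001, n_samples=300, seed=0):
    """Point-to-point collision probability under positional uncertainty.

    a, b          : (3,) nominal point locations from the two clouds
    na, nb        : (3,) outward unit surface normals
    sig_a, sig_b  : std. dev. of each point's position along its normal
    depth, radius : penetration-cylinder dimensions (illustrative values)
    """
    rng = np.random.default_rng(seed)

    def one_direction(a, na, sig_a, b, nb, sig_b):
        # Sample candidate locations b_i of b along nb (Gaussian, clipped to +/-3 sigma).
        s = np.clip(rng.normal(0.0, sig_b, n_samples), -3.0 * sig_b, 3.0 * sig_b)
        probs = np.zeros(n_samples)
        for k, offset in enumerate(s):
            bi = b + offset * nb
            d = bi - a
            axial = d @ na                            # coordinate of b_i along na
            radial = np.linalg.norm(d - axial * na)   # distance from the cylinder axis
            if radial > radius:
                continue                              # b_i misses the cylinder entirely
            # a sits at a + t*na with t ~ N(0, sig_a^2).  b_i falls inside a's
            # penetration cylinder (depth along -na) iff t lies in
            # [axial, axial + depth]; this is the interval [alpha_1, alpha_2].
            probs[k] = norm.cdf((axial + depth) / sig_a) - norm.cdf(axial / sig_a)
        return probs.mean()                           # p_ab (or p_ba)

    return max(one_direction(a, na, sig_a, b, nb, sig_b),
               one_direction(b, nb, sig_b, a, na, sig_a))
```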
7.3 Efficiency in Computational Time and Estimation Accuracy.
It is of utmost importance to check for alert conditions and generate alerts extremely fast. The entire computation for collision probability has a high level of complexity: it deals with a sequence of many configurations for at least one of the two objects, each with its own uncertainty model. Therefore, it can easily become computationally infeasible, with the time required for estimating the collision probability of a task plan exceeding a reasonable limit. To ensure feasibility, we go through each of the major recurring computations in our method and provide insights on minimizing computational time. At the same time, accuracy in estimating the probability of failure is crucial to ensure high-quality, meaningful alerts; if the risk associated with a plan is underestimated or overestimated, the human operator will not be able to make a good assessment of the situation. Naturally, lowering computational time and increasing estimation accuracy are conflicting objectives: maintaining high estimation accuracy is easy if one can afford more computation, i.e., higher computational time. Since good performance is required on both fronts, we analyze our method and check its feasibility and performance.
First, let us look at the major computational blocks in our approach. As the plan collision probability computation in Algorithm 1 outlines, the collision computation is performed only for adaptive time instances in the trajectory plan of the moving object. This requires fewer computations than checking every step. If the plan being tested is reasonable (i.e., only minor collisions, no major collisions), the number of way-points in the trajectory plan being checked is relatively low. The computations for each tested way-point are in lines 11, 12, 13, and 15. The transformation of the gripper point cloud, cropping of the scene point cloud, and computation of the minimum distance between the two fixed point clouds are relatively fast. The main computational bottleneck is in line 15, Point Cloud Collision Probability, whose pseudocode is provided in Algorithm 2. Its two major computations are in line 2 and line 7: the initial k-NN search to find pairs of nearby points is done only once, while the P2P collision probability computation is done for every pair of points found by the k-NN search.
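The structure of this per-plan loop can be sketched as follows, with transform, crop_scene, and cloud_collision_probability acting as placeholders for the operations referenced as lines 11, 12, and 15 of Algorithm 1; the adaptive selection of time instances is assumed to have already produced the waypoints sequence.

```python
import numpy as np
from scipy.spatial import cKDTree

def min_distance(a, b):
    """Minimum distance between the nominal points of two clouds (line 13 screen)."""
    d, _ = cKDTree(b).query(a, k=1)
    return d.min()

def plan_collision_probability(waypoints, gripper_pts, gripper_sig,
                               scene_pts, scene_sig,
                               transform, crop_scene,
                               cloud_collision_probability, gamma):
    """Sketch of the per-plan loop (cf. Algorithm 1); helper callables are placeholders."""
    p_max = 0.0
    for pose in waypoints:                                   # adaptive time instances
        g = transform(gripper_pts, pose)                     # line 11: move gripper cloud
        s, s_sig = crop_scene(scene_pts, scene_sig, pose)    # line 12: region of interest
        if s.shape[0] == 0 or min_distance(g, s) > gamma:    # line 13: cheap distance screen
            continue
        # line 15: full point-cloud collision probability at this configuration
        p_max = max(p_max, cloud_collision_probability(g, gripper_sig, s, s_sig))
    return p_max
```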
The computational time for every computation mentioned above, except the P2P collision probability computation, depends heavily on the size of the point clouds. Downsampling the point clouds, or filtering out point cloud data that falls outside a defined region of interest at each step, can improve this performance. In our case, we only consider the gripper's model while grasping, since this is the part of the robot that enters the cluttered workspace, and only the cropped portion of the scene (or nonrobot stationary object) that lies within a specific region of interest based on the gripper pose. However, care is taken not to downsample too much; otherwise the points become too sparse and the collision probability is underestimated.
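As an illustration of such filtering, the snippet below crops a scene cloud to an axis-aligned region of interest around the gripper and applies a simple voxel-grid downsampling; keeping the most uncertain point per voxel is our own conservative choice for this sketch and is not prescribed by the method.

```python
import numpy as np

def crop_and_downsample(points, sigmas, center, half_extent, voxel=0.004):
    """Illustrative region-of-interest crop and voxel-grid downsampling.

    points      : (N, 3) scene points
    sigmas      : (N,) per-point uncertainty carried along with the points
    center      : (3,) region-of-interest center (e.g., derived from the gripper pose)
    half_extent : (3,) half-size of the axis-aligned region of interest
    voxel       : grid size in meters (3-5 mm spacing keeps the clouds tractable here)
    """
    # Crop: keep only points inside the axis-aligned region of interest.
    keep = np.all(np.abs(points - center) <= half_extent, axis=1)
    pts, sig = points[keep], sigmas[keep]
    if pts.shape[0] == 0:
        return pts, sig

    # Downsample: keep one representative point per occupied voxel
    # (the most uncertain one, to stay conservative).
    cells = np.floor(pts / voxel).astype(np.int64)
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    keep_idx = np.array([members[np.argmax(sig[members])]
                         for cell_id in np.unique(inverse)
                         for members in [np.where(inverse == cell_id)[0]]])
    return pts[keep_idx], sig[keep_idx]
```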
To test the feasibility of our method in terms of computational time, we tested a few cases in MATLAB. The moving and stationary objects were the gripper and the scene, respectively, and we had suitable uncertainty models for the point clouds of both objects. Uncertainty in the gripper may arise from errors in the manipulator joint angles, issues with the gripper attachment, or jerking of the mobile base or manipulator base. Downsampled point clouds were used in which the distance between points was in the range of 3–5 mm. Every time a particular way-point of the gripper trajectory is tested, our method crops out the scene points that lie within the region of interest based on the gripper pose. In our experiments, the numbers of points in the downsampled point clouds of the gripper and the scene were 1,500–2,500 and 30,000–40,000, respectively. Cropping the scene point cloud (line 12 in Algorithm 1) reduced its size by at least 82%, bringing the number of points down to the range of 0–7,000. For cases with moderate collision probability, the number of points in the cropped scene point cloud was around 2,000–4,000, and even fewer (often negligible) for lower-collision-risk cases. If a state-of-the-art manipulator planner is used to generate a plan, the plan is not expected to have a very high probability of collision, so the point cloud sizes from our experiments should be reasonable for testing the computational time of our approach.
In our experiments using MATLAB on an Intel Xeon Quad-Core E5-1620 3.6 GHz processor with an NVIDIA Quadro 600 GPU and 16 GB RAM, we found that line 11 of Algorithm 1 takes only a couple of milliseconds, and lines 12 and 13 complete in tens of milliseconds. The main computational expense is incurred in the Point Cloud Collision Probability function call. The k-NN search (line 2 in Algorithm 2) is done once and takes on the order of tens of milliseconds. The main challenge is performing P2P Collision Probability for every pair of points found. Since all points might not have the same uncertainty level, computing the probability only for the closest point pair does not suffice; it must be computed for all pairs.
Our method for P2P collision probability computation requires sampling one point and using the analytical formula for the Gaussian distribution for the other point. To simplify the process, one could instead sample both points, perform deterministic collision checks, and compile the results into a probability of collision. Let us denote our method as method-1 and the alternative as method-2, and assume N samples are drawn from each point's location uncertainty. Our method requires N computations, whereas the alternative requires N × N = N² computations. Let tp be the time required for each per-sample probabilistic computation and td the time for each deterministic collision check. The total time required for P2P collision probability computation is then t1 = 2N tp for method-1 and t2 = 2N²td + δ for method-2, where δ is the time required for combining the deterministic results and the factor of 2 in both formulas comes from interchanging the points when placing the penetration cylinder.
Clearly, the deterministic computation is extremely simple and fast, so td < tp. However, as soon as N td exceeds tp, we have t1 < t2, and method-1 becomes faster for P2P collision probability computation. We found from our experiments that tp ≈ 40 td, so for any N > 40 our method is computationally faster. When sampling from a probability distribution, there is a risk of inaccurate estimates: a larger number of samples provides higher confidence in the computed result but increases computational time. Therefore, the choice of N is crucial to estimation quality. Since our method samples only one point, instead of two as in method-2, it provides at least the same level of confidence as method-2 for the same value of N.
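For clarity, the crossover point follows directly from the two cost expressions. Ignoring the small combination overhead δ,

$$ t_1 < t_2 \iff 2N t_p < 2N^2 t_d \iff t_p < N t_d \iff N > \frac{t_p}{t_d} \approx 40, $$

so method-1 is cheaper for any sample count larger than the measured ratio tp/td ≈ 40.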
To gain insight into choosing an appropriate value of N, experiments were performed for method-2 on a large number of randomly generated point-pair cases. For each case, we generated the locations of the points in a pair and their normal directions; some cases have a high probability of collision, and some have low or zero probability. The findings from 1000 cases are accumulated in each row of Tables 1 and 2. We assume that N = 3000 provides a fully converged estimate of the probability of collision; these serve as the base values, whose ranges are listed in the first column of both tables. When smaller values of N (N = 30, 50, 100, 300) are used for the same cases, the computed probability of collision deviates from the base value found using N = 3000, and one would like to stay as close as possible to the base values to ensure high accuracy in estimation.
Maximum error in estimating P2P collision probability using N² Monte-Carlo samples for different values of N, based on 1000 randomly generated cases
| Collision probability range | Max error (N = 300) | Max error (N = 100) | Max error (N = 50) | Max error (N = 30) | Max % error (N = 300) | Max % error (N = 100) | Max % error (N = 50) | Max % error (N = 30) |
|---|---|---|---|---|---|---|---|---|
| 0.01–0.05 | 0.026 | 0.062 | 0.060 | 0.094 | 137.2% | 313.8% | 349.1% | 406.5% |
| 0.05–0.15 | 0.049 | 0.092 | 0.138 | 0.196 | 50.8% | 101.7% | 255.9% | 212.0% |
| 0.15–0.30 | 0.081 | 0.138 | 0.184 | 0.198 | 35.0% | 60.3% | 77.5% | 107.5% |
| 0.30–0.50 | 0.117 | 0.128 | 0.255 | 0.263 | 27.4% | 35.7% | 55.8% | 72.0% |
| >0.50 | 0.082 | 0.157 | 0.212 | 0.260 | 14.1% | 22.54% | 38.4% | 49.6% |
For 1000 randomly generated low-P2P-collision-probability cases, percentage of cases in which a limited number of samples (N² Monte-Carlo samples for different values of N) fails to detect any possibility of collision, i.e., estimates the collision probability to be zero
| Collision probability range | N = 300 | N = 100 | N = 50 | N = 30 |
|---|---|---|---|---|
| 0.00–0.01 | 23.2% | 44.0% | 60.7% | 73.5% |
| 0.01–0.02 | 0.1% | 4.8% | 10.6% | 23.2% |
| 0.02–0.03 | 0.0% | 0.9% | 5.4% | 12.5% |
The allowable error or inaccuracy in probability estimation can differ between high-probability and low-probability cases; therefore, the cases are separated by probability range in the rows of Table 1. The maximum error found among the 1000 random cases is presented, as well as the maximum percent error with respect to the base probability. It is clear that N ≤ 100 produces high maximum errors and is therefore unacceptable. N = 300 still incurs some inaccuracy, but the level may be acceptable depending on the situation: Table 1 shows that even N = 300 can have an error of up to 0.08 when estimating a probability with a base value higher than 0.50. Clearly, N ≥ 300 should be used to obtain good-quality probability estimates.
Since collision is the primary failure mode in our application, it is especially important to compute a nonzero probability of collision even for cases with very low probability. In many cases, humans may want to be alerted for any plan with a nonzero probability of collision, or for plans with a collision probability greater than, say, 0.01 or 0.05. Table 2 presents our findings regarding collision detection for low-collision-probability (≤0.03) cases. The first row shows that for cases with a probability of collision less than 0.01, method-2 with N = 30 fails to detect any possibility of collision in 73.5% of the cases. Even N = 100 fails 44% of the time, while N = 300 performs well. For cases with slightly higher collision probability (0.01–0.03), the failure to detect any collision decreases for all values of N. Note that this table does not represent the accuracy of the probability estimates. It does show, however, that N ≤ 100 cannot be used, since it could misrepresent the risk level to the human and result in unsafe robot operations.
Since our method is computationally faster for any N > 40, and here N ≥ 100 (preferably N ≈ 300) is needed for reliable results, we conclude that our method outperforms the N² method, providing superior estimation in less time. In our MATLAB experiments, we computed the probability of collision for risky task plans in under 10 s every time. If C++ or another more computationally efficient platform were used, this could certainly be performed with even higher accuracy and less computational time. Thus, our method meets the requirement of generating alerts within seconds.
Our collision probability estimation method is feasible in terms of computational time and quality of estimation. By using this method, collision probability for a task plan can be estimated, which aids in generating alerts for one of the most common failure modes in mobile manipulator applications.
8 Visualizing Plan Risk
Typically, a human–robot interaction interface for a mobile manipulator provides visualization of the environment model (without uncertainty), some task goal or plan selection capability, and animation of a motion plan. However, we require more sophisticated visualization to support risk-aware decision-making by human operators, so that the data from our risk assessment and alert generation module can facilitate better performance. We need to graphically show potential failure modes and relate them to the sources of uncertainty. We have added several features and settings to our system that support the operator's individual preferences as well as situation-specific needs, to improve human perception of plan risk.
A plan visualizer is provided to show animations of the robot executing a particular plan selected by the operator before executing it. People can use any state-of-the-art planner to generate plans for a given goal. The operator can select a particular plan, for which our system calculates the associated risk, generates alerts, and lets the human see what the plan execution will look like. We use rviz, the 3D graphical interface of ROS, for this visualization. If the plan has some risk associated with it, we need to graphically represent (1) the failure mode and risk level (probability), (2) the uncertainty in the data involved in the failure, and (3) the role of the uncertainty in triggering the failure. All of these must be covered to ensure a good understanding of the risk by human operators.
Each task plan includes the robot motion for performing the task (or subtask). The manipulator motion is typically the most challenging, since the robot needs to avoid colliding with obstacles in the workspace while physically interacting with certain objects, such as grasping and retrieving an object or placing it. When the robot operator gives a particular goal, the manipulator planner generates one or more plans to execute the task. The person can select a particular plan and simulate and visualize the motion. Figure 11 shows an instance of the animation of a manipulator motion to grasp an object.
An instance in visualizing manipulator plan for grasping an object, where the green arm indicates the goal pose [7]. (Color version online.)
Each subtask, as defined in Sec. 4, has a different condition for task failure. Whenever the operator visualizes a plan, our alert generation module computes the probabilistic risk and issues an alert when needed. In the animation of a plan with a triggered alert, a yellow bubble appears around the region of failure when the failure occurs, as shown in Fig. 12. The failure type and risk level are also provided to the human operator. As collision is perhaps the most common failure mode, we illustrate different features that ensure effective graphical representation of collision-based failures.

Yellow bubble and red points indicate collision between the gripper and the wall of the 3D printer. An alert message pops up to indicate high risk (probability) of failure for a given task plan. An alert sound also gets played along with the pop-up message to attract human attention. (Color version online.)
8.1 Collision Visualization Settings.
The collision probability is computed based on the robot motion plan and the uncertainty models of the scene and the gripper. If the estimated collision probability is greater than the threshold value specified in the alert condition, an alert must be issued. While, or even before, simulating a given plan, a pop-up warning message (Fig. 12) can be useful to attract the operator's attention. Ideally, the message should be simple and easily interpretable, and an alert sound along with the alert box can be even more effective.
The alert message pop-up indicates the failure mode. However, the operator also needs to know the estimated risk level (here, collision probability) as well as when, how, and why the failure might occur. Because collision can happen due to uncertainty in the environment and gripper models, there may be a nonzero probability of collision even when the nominal point clouds of the scene and gripper are not colliding. Alerts triggered in such a situation can be confusing to the operator: a plan that looks safe when visualized might still have a small likelihood of collision depending on the uncertainty level. Therefore, it is extremely important to provide different features to visualize the collision state and assess the situation better.
From the visualization of a plan execution, the operator can pause at any instance and adjust visualization settings for collision. We provide different features that can be enabled or disabled while visualizing a static collision state, which are described and illustrated in this section.
Yellow bubble: A yellow bubble or sphere (see Fig. 12) enclosing the collision region of both objects is useful to draw attention of the operator to the problematic region, both during the plan animation and static inspection.
Color gradient on points in collision: Coloring the points in collision (i.e., with nonzero collision probability) with a bright color, like red, can highlight the exact region of collision. If the colliding region on an object is relatively large, the operator may also want to differentiate between high-probability and low-probability regions, where a color gradient can help. In Fig. 13(a), brighter red indicates a higher probability of collision, whereas darker red indicates a lower probability. A minimal sketch of such a probability-to-color mapping is provided after this list.
Different colors in two objects: The operator may prefer two different colors for the two objects in collision, as shown in Fig. 13(b), to differentiate the two items, especially when the objects are very close to each other. The two left images in Fig. 14 show that it is very difficult to inspect the scene if the same color is used when the objects are nearly touching, whereas the right-most image shows a clear distinction.
Using nondominant colors: If the color used to highlight the points in collision is already dominant on one of the objects, the operator will not be able to distinguish the region of collision properly (Fig. 14(a)). Using colors that are not dominant on the objects is important to ensure clarity, as illustrated in Fig. 14.
Hiding noncollision objects: Sometimes it is useful to hide all other objects in the scene that are not in collision, since objects in the background may hamper inspection.
3σ arrows from region of collision: When the regions of the nominal point clouds that are in collision are visualized, it may be unclear how far these colored points can extend outward from the nominal surface. Since each point can lie 3σ inward or outward, a sparse set of outward arrows of length 3σ can show the maximum extent. The operator can enable or disable the arrows of each object individually, as enabling both can look confusing, as illustrated in Fig. 15. This feature is particularly useful when there is a gap between the nominal point clouds, to show the extent of the uncertainty in the collision regions. If the gap is too small, the arrows will extend into the other object and will not be visually helpful.
Inflating point clouds: Another way to show the extent of the uncertainty of the points is to inflate the objects by σ, 2σ, or 3σ. If there is a nonzero probability of collision, inflation by 3σ will prominently show touching or intrusion of one object into the other. Figure 16 shows an example of how inflation by σ makes the objects look closer to each other. Inflation helps especially when there is a visible gap between the nominal point clouds and the human cannot perceive how a collision could occur; it then improves understanding of where the uncertainty might cause a collision. It also allows the operator to view the uncertainty step by step (σ, 2σ, 3σ) and see how the objects get closer as the inflation level increases.
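As an example of the color-gradient feature, the following hypothetical helper maps per-point collision probabilities to a dark-red-to-bright-red gradient suitable for coloring a point cloud marker; the actual rviz-based implementation may differ.

```python
import numpy as np

def collision_color_gradient(probabilities, low=(0.3, 0.0, 0.0), high=(1.0, 0.0, 0.0)):
    """Map per-point collision probabilities to a dark-red-to-bright-red gradient.

    probabilities : (N,) values in [0, 1] for the points flagged as colliding
    Returns an (N, 3) RGB array, one color per point.
    """
    p = np.clip(np.asarray(probabilities, dtype=float), 0.0, 1.0)
    if p.size and p.max() > 0:
        p = p / p.max()          # normalize so the riskiest point is brightest
    low, high = np.asarray(low), np.asarray(high)
    return low + p[:, None] * (high - low)
```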

Color gradients: (a) brighter red points have a higher probability of collision, and darker red points have lower risk; (b) two different colors, green and blue, visualize the collision region on the gripper and the obstacles, respectively. (Color version online.)

Choosing the right color scheme to visualize the collision region is important: (a) if one color is used that is already dominant on one of the objects, clarity is lost; (b) if one nondominant color is used but the colliding objects are too close, it becomes hard to distinguish between the two objects; and (c) using two different, nondominant colors on the objects provides better clarity. (Color version online.)

3σ arrows, along the surface normal direction, extending from the collision region help show the extent of the location uncertainty of the points: (a) arrows from both objects, (b) arrows only from the gripper, and (c) arrows only from the obstacle
These features ensure that all three visualization metrics are supported in our system in the case of collision: showing the failure mode and risk level, drawing attention to the problematic region, and highlighting the extent and the role of uncertainty in triggering the failure.
Occlusions in the generated 3D model of the workspace can often create major challenges in manipulation tasks and risk visualization. We have tried to reduce occlusion by moving the camera location and viewing angle with the scanning motion of the manipulator; however, some occlusion is always present in the generated model. We consider the clutter level of the workspace to be such that a manipulator motion plan can be generated that completely avoids the occluded or unknown region. We assume the robot stays within the known free space while executing a plan, but there may still be a nonzero probability of collision due to the model uncertainty on which we focus. We also consider only manipulation motion plans in which the gripper approaches the workpiece from the side from which the 3D scan was performed to generate the workspace model. For example, the robot grabbing an object from behind is not considered, since our scanning may not have captured the back region properly and occluded regions would remain. Under these considerations, the surfaces with which the robot might collide are captured during the scanning process, and our risk estimation and visualization scheme can offer useful information.
9 Case Study
We have conducted some physical experiments with our mobile manipulator to test different modules of our alert generation framework. We mainly performed grasping and retrieval of objects during these experiments. Two different cluttered environments were used, as shown in Fig. 17. Our robot picked up the hand sanitizer bottle from scene 1 and a 3D-printed red object from scene 2.

The scenes we used in our experiments for grasping and retrieving an object. The objects to be grasped are marked with the arrows: (a) Scene 1 and (b) Scene 2
For each scene, the manipulator first performed a scanning motion from the front of the workspace to generate the 3D model of the environment (Fig. 4(a)). The RealSense RGB-D camera mounted on the manipulator end-effector captured 50 point clouds from different locations and orientations, which were filtered and then fused together. Our uncertainty model generator analyzed the data to produce a nominal point cloud and uncertainty model for each scene. Figure 4(b) illustrates the uncertainty model computed for scene 1 with a green-to-red color gradient; each point has a different level of uncertainty, which is indicated by its color in the figure.
The generated uncertainty model for each scene was used to compute plan risk and generate alerts for different task plans. We experimented with multiple task plans for executing subtasks 1–3, i.e., going to the pregrasp pose, grasping the object, and then retrieving the object. We took three manipulator motion plans for each scene: one very safe plan, one plan with an extremely small probability of collision, and one very risky plan. The safe plans for both scenes resulted in zero probability of failure in our risk calculation; we executed them on the actual robot, and the robot grasped and retrieved the object perfectly (Fig. 18). For the second category of task plan for each scene, our alert generation system calculated the probability of collision to be below 0.012. Although the probability of collision was greater than zero, it was very small, so we inspected the plans' risk in our visualizer and then executed the plans on the robot; the gripper went very close to an obstacle but did not collide. Finally, we had one very bad plan for each scene, for which our alert generation module computed probabilities of collision of 0.60 and 0.43, respectively. In this case, we only observed the task execution animation and the static collision states in our visualizer.
10 Conclusions and Future Work
A model-based risk identification and management framework is presented in which different hazards, such as collisions, in semi-autonomous mobile manipulator tasks are addressed. A modular architecture has been presented that allows integration of our alert generation module with any mobile manipulator hardware and software suite. We have shown how human-specified, risk-based, alert-triggering conditions can be expressed in a probabilistic MTL framework. This article demonstrates a sensor-agnostic way of generating an uncertainty model of the environment. Using that model, a computationally efficient method is described for estimating the probability of collision (and other failure modes) for specific task plans, and alerts are generated accordingly. Finally, various risk visualization features are illustrated to present the risk-related information graphically and effectively to human operators. We have verified the feasibility of our approach for alert generation through physical experimentation.
We expect that further experimentation will aid in verification and demonstrate applicability to other kinds of tasks with a mobile manipulator. This article demonstrates only a few failure modes, but these can be expanded to other failure modes from different applications. The collision probability estimation for a plan has been tested only with a limited set of objects (e.g., only between the gripper and the manipulated part); if the size of the 3D models increases, the time to complete the collision hazard analysis will also increase. Additional testing is necessary to characterize the limits of the system and to effect further improvements. Another important direction is to investigate how to reduce and handle occlusion in the 3D model of the workspace so that more challenging manipulation tasks can be carried out safely.
The interaction between the alert generation module and the HRI unit is another important aspect of this work. Additional system development and user studies are necessary to identify the most effective ways of graphically visualizing risk factors. It would be interesting to capture and synthesize individual preferences for data presentation. For example, the visualization scheme used in this article relies on color differentiation, whereas alternative accessibility options could leverage other mechanisms (e.g., haptic feedback). Other planned investigations include assessing timing as a performance factor and potential stressor, and the impacts of different alert mechanisms on task performance.
Footnote
Commercial equipment and materials are identified to adequately specify certain procedures. In no case does such identification imply recommendation or endorsement by the University of Southern California, the National Institute of Standards and Technology, or DEVCOM Army Research Laboratory, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
Acknowledgment
This work was supported by the DEVCOM Army Research Laboratory and the Center for Advanced Manufacturing at the Viterbi School of Engineering, University of Southern California. Any opinions or conclusions expressed in this article are those of the authors and do not necessarily reflect the views of the sponsor.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.