As envisioned by Industry 4.0, the next generation of smart factories and warehouses will depend heavily on collaboration between humans and artificial intelligence (AI). This symbiotic partnership can augment human capabilities by providing suggestions, assistance, and explanations as needed, or it can use direct or indirect human feedback in a human-in-the-loop learning framework to enhance AI learning capabilities. This Special Section gathers the latest efforts in fundamental methodologies, as well as their applications in human–AI partnership, for next-generation factories, spanning the design process through manufacturing, production, and inspection.

“Seeking Human Help to Manage Plan Failure Risks in Semi-Autonomous Mobile Manipulation” by Al-Hussaini et al. presents a framework to identify the risk associated with a task failure in a shared-autonomy mobile manipulation task and, if needed, communicate the potential risk (e.g., collision) to a remote operator to seek guidance. To this end, first, a probabilistic metric temporal logic approach is proposed to encode human-provided alert-triggering conditions. Second, a sensor-agnostic way of generating an uncertainty model of the environment is proposed to estimate uncertainty in 3D workspace models and the probabilities of collision and task failure for generating alerts. Third, to enhance the situation awareness and decision making of human operators, a wide range of visualization tools are integrated into the human–robot interface. Finally, the framework is demonstrated with a use case in which a mobile manipulator performs machine tending and material handling tasks.
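As a rough illustration of the alert-triggering idea (not the authors' probabilistic metric temporal logic formulation; the voxel occupancy model, independence assumption, and 0.2 threshold here are assumptions made for the sketch), a collision-risk check over an uncertain workspace model might look like:

```python
import numpy as np

def collision_probability(occupancy_probs, swept_voxels):
    """P(at least one voxel swept by the planned motion is occupied),
    assuming independent per-voxel occupancy estimates."""
    p = occupancy_probs[swept_voxels]
    return 1.0 - np.prod(1.0 - p)

def should_alert(occupancy_probs, swept_voxels, threshold=0.2):
    """Trigger a request for human guidance when the estimated
    collision risk exceeds an operator-defined bound."""
    return collision_probability(occupancy_probs, swept_voxels) > threshold
```

In the paper itself, alerts are derived from richer, human-provided temporal-logic conditions rather than a single scalar threshold.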

The paper by Li et al., entitled “The Effect of Different Occupational Background Noises on Voice Recognition Accuracy,” demonstrates the importance of developing voice recognition algorithms designed for specific occupational settings. A customized automatic speech recognition (ASR) model with a denoising module is proposed to investigate the effect of the unique background noise and type of communication associated with different settings. The performance of this system is compared to a regular convolutional neural network (CNN)-based voice recognition algorithm under several background noise conditions. The ASR model customized for a specific occupational setting outperformed the CNN-based model, with an overall performance increase between 14% and 35% across all background noises.
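As a sketch of what a classical denoising front end for such a pipeline could look like (this is a generic spectral-subtraction example with an illustrative frame size, not the module proposed in the paper), one might write:

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame=256):
    """Subtract an estimated noise magnitude spectrum frame by frame,
    keeping the noisy signal's phase and flooring magnitudes at zero."""
    noise_mag = np.abs(np.fft.rfft(noise_est[:frame]))
    out = np.zeros_like(noisy, dtype=float)
    for start in range(0, len(noisy) - frame + 1, frame):
        seg = noisy[start:start + frame]
        spec = np.fft.rfft(seg)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # floor at zero
        out[start:start + frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), n=frame)
    return out
```

A learned denoising module, as used in the paper, would replace this fixed spectral rule with parameters trained on the occupational noise of interest.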

“User-Requirements Analysis on Augmented Reality-Based Maintenance in Manufacturing” by Runji et al. is a systematic literature review on augmented reality-based maintenance in manufacturing entities. Reviewing the relevant literature from 2017 to 2021, the specific user needs are categorized as ergonomics, communication, situational awareness, intelligence sources, feedback, safety, motivation, and performance assessment. These predominant user needs are cross-tabulated with contributing factors (e.g., geographical location), and the results are presented using trend analysis to identify gaps and suggest possible future directions.

The paper by He et al., entitled “A Convolutional Neural Network-Based Recognition Method of Gear Performance Degradation Mode,” presents a CNN-based stacking incremental deformable residual block network (SIDRBnet) model to identify the gear performance degradation mode. This method converts the vibration signals measured via four different accelerometers into grayscale images. Compared to a regular CNN, an average pooling layer replaces the fully connected layer, and the large-size convolution kernel is replaced with a small-size convolution kernel. The paper experimentally evaluates the performance of both single-channel and multichannel SIDRBnet and demonstrates the superiority of the multichannel recognition model.
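The signal-to-image conversion step can be sketched as follows; the 64×64 image size and min–max normalization are assumptions made for illustration, not the paper's exact preprocessing:

```python
import numpy as np

def signal_to_gray(signal, size=64):
    """Reshape a 1-D vibration signal into a size x size grayscale image.
    Amplitudes are min-max normalized to the 0-255 pixel range."""
    seg = np.asarray(signal[: size * size], dtype=float)
    lo, hi = seg.min(), seg.max()
    img = np.round(255 * (seg - lo) / (hi - lo + 1e-12)).astype(np.uint8)
    return img.reshape(size, size)
```

One such image per accelerometer channel then serves as a CNN input, which is what makes the single-channel versus multichannel comparison possible.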

Manjunatha et al. focused on physiological data analysis in physical human–robot interaction for optimal role allocation and load sharing. They designed a collaborative task in which the human subject guides an admittance-controlled robot. The motor difficulty was changed by varying the admittance parameter of the robot and constraining the motion to gross and fine movements. Muscle activity of the subjects was monitored via wearable sensors and used to train a CNN model to predict the motor control difficulty. A transfer learning approach is introduced to personalize this model for new individuals using the information gained from other subjects and very little data from the new subject. The presented CNN-based transfer learning outperforms the Riemannian geometry-based Procrustes analysis. The authors also demonstrated that the subjects' skill level in the pretrained model has no significant effect on the transfer learning performance for new users. This finding is useful for adapting the robot control strategy to new users involved in a workforce training program.
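The transfer learning idea (reuse what was learned from other subjects and retrain only a small part of the model on a few samples from the new subject) can be sketched in heavily simplified form. Here a fixed random projection stands in for the pretrained CNN backbone, a logistic-regression head stands in for the retrained layers, and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for conv layers pretrained on other subjects: a frozen projection
W_frozen = rng.standard_normal((16, 8))

def features(x):
    """Frozen 'backbone': project raw sensor windows into a feature space."""
    return np.tanh(x @ W_frozen)

def fit_head(X, y, lr=0.5, epochs=200):
    """Retrain only the classification head on the new subject's few samples."""
    F = features(X)
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(F @ w + b)))  # sigmoid
        g = p - y                           # logistic-loss gradient
        w -= lr * F.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def predict(X, w, b):
    """Binary difficulty label from the personalized head."""
    return (features(X) @ w + b > 0).astype(int)
```

Because only the small head is refit, a handful of labeled samples from the new subject suffices, which is the practical appeal of the approach for workforce training.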

Bomström et al. examined emerging requirements and challenges for human digital twins (HDTs) in three use cases of industry–academia collaboration on complex systems. They examined HDTs from the viewpoints of field maintenance experts for large industrial machines, knowledge workers using dashboards as a part of their jobs, and factory workers in an intelligent IoT factory context. They proposed to consider the HDT as an abstraction of workers that encapsulates human aspects or involvement in a digital-twin-compatible format, which may cover representations of roles, specific types of characters, or human-related activities for simulation purposes. Four design objectives were formulated for the human aspects of digital twins based on these cases. The worker representation in these objectives is in line with contemporary worker model concepts that target a broad selection of personal information and capabilities of workers, such as the current workload, work experience, skills, ergonomics, and even personalities. By modeling aspects of human knowledge and knowledge-based behavior, HDTs may be key to simulating processes or systems that include inherent asynchrony and stochasticity due to human activities.

Gilles and Bevacqua presented a systematic review of virtual assistants' characteristics. By reviewing the most relevant recent literature in the field, they identified the agent characteristics required to design a virtual assistant that promotes efficient human–machine cooperation. Their review considers three main factors: the agent's representation (embodied or not) and its effects on human factors such as stress and trust; the characteristics of the communication modality with the human operator (speech, gesture); and whether the agent should display personality traits. The design, acceptance, and future directions of virtual assistant design are also discussed in the transportation domain.

“Online Detection of Turning Tool Wear Based on Machine Vision” by Dong and Li is a technical brief describing a practical machine vision approach to identify tool wear in real time, using k-means clustering for computationally efficient image segmentation and the Hough transform for detecting the wear edges. The experimental evaluation demonstrated micron-level accuracy in detecting flank face wear on turning tools.

Through worldwide dissemination of the Special Section's call for papers, we received a large number of submissions. After a minimum of two rounds of reviews, a total of eight papers (two review papers, five research papers, and one technical brief) were accepted for publication in this Special Section. The guest editors would like to thank all contributing authors for their excellent work. We are very grateful to the reviewers for their time and effort in completing the review process and for offering constructive comments to the authors. In particular, we would like to express our sincere appreciation to Prof. S. K. Gupta, the Editor-in-Chief of JCISE, for providing this opportunity and for his support of this Special Section. We also thank Ms. Amy Suski for her editorial assistance. Without all of you, this Special Section would not have been possible.