## Abstract

Distributed multi-agent systems are becoming increasingly crucial for diverse applications in robotics because of their scalability, efficiency, robustness, resilience, and ability to accomplish complex tasks. Controlling these large-scale swarms using only local information is very challenging. Although centralized methods are generally efficient or optimal, they face scalability issues and are often impractical. Given the challenge of finding an efficient decentralized controller that uses only local information to accomplish a global task, we propose a learning-based approach to decentralized control using supervised learning. Our approach trains controllers to imitate a centralized controller's behavior while using only local information to make decisions. The controller is parameterized by aggregation graph neural networks (GNNs) that integrate information from remote neighbors. The problems of segregation and aggregation of a swarm of heterogeneous agents are explored in 2D and 3D point mass systems as two use cases to illustrate the effectiveness of the proposed framework. The decentralized controller is trained using data from a centralized (expert) controller derived from the concept of artificial differential potential. Our learned models successfully transfer to actual robot dynamics: Turtlebot3 robot swarms in physics-based Gazebo/ROS2 simulations and hardware implementation, and Crazyflie quadrotor swarms in Pybullet simulations. Our experiments show that our controller performs comparably to the centralized controller and outperforms a local controller. Additionally, we show that the controller is scalable by analyzing larger teams and diverse groups with up to 100 robots.

## 1 Introduction

Robots have become prevalent in applications such as mapping, surveillance, delivery of goods, transportation, and emergency response. As real-world problems grow more complex, the need for multi-agent systems (MAS) instead of single-agent systems becomes paramount. In many robotics applications, distributed multi-agent systems are becoming crucial due to their potential for enhancing efficiency, fault tolerance, scalability, robustness, and resilience. These applications span diverse domains such as space exploration [1], cooperative localization [2], collaborative robots in production factories [3], search and rescue [4], and traffic control systems [5]. Hence, swarm robotics has been a focus of research for several years. Swarm robotics employs large numbers of simple robots that collaborate to accomplish a collective objective, drawing inspiration from natural behaviors. Coordination and control of multiple robots have garnered significant interest from researchers and have been explored for various applications, including collective exploration, collective fault detection, collective transport, task allocation, and search and rescue. In nature, various systems form groups, where agents unite or separate based on intended purpose or unique characteristics [6]. Control of several agents to achieve a desired pattern or shape is critical for tasks that rely on coordinated action by multi-agent systems, such as pattern formation, collective exploration, object clustering, and assembling. Most swarm research concentrates on robots that are identical or have similar hardware and software frameworks, known as homogeneous systems [7–9]. Nonetheless, many applications of multi-agent systems necessitate the collaboration of diverse teams of agents with distinct characteristics to accomplish a specific task.

A key benefit of employing heterogeneous swarms lies in their capacity to tackle a diverse range of tasks, enabling the assignment of certain functions to a subset of the swarm [10]. In applications like encircling areas with hazardous waste or chemicals, boundary protection, and surveillance, which require the collaboration of multiple robots with diverse sensors or actuation capabilities [11], the robots need to synchronize their movements in a specific pattern. A viable approach would involve the robots organizing themselves into distinct groups for the subtasks. This behavior is known as *Segregation*. A similar behavior called *Aggregation* occurs when the different robot types are intermixed homogeneously. In such circumstances, incorporating all the necessary actuation and sensory functionalities within a single robot can be impractical; thus, a heterogeneous team with the appropriate blend of components becomes necessary.

Controlling large-scale swarms can be very challenging. Centralized methods are generally optimal and a good option for small teams of robots: all the information is received by a central agent responsible for calculating the actions of each individual agent. In other formulations of centralized controllers, actions are computed for each agent locally but require global information. This is a bottleneck; it is neither robust nor scalable and is often impractical. With an increasing number of agents, decentralized control becomes essential: each agent is responsible for determining its own action based on local information from its neighbors. However, finding an optimal decentralized controller that achieves a global task using local information is a complex problem. In particular, the effect of individual actions on global behavior is hard to predict and is often governed by emergent behavior. This makes it very hard to solve the inverse problem of finding local agent actions that lead to a desired global behavior.

This paper focuses on developing solutions to the challenges of decentralized systems—local information, control, and scalability—by exploiting the power of deep learning and Graph Neural Networks. We developed a decentralized controller for segregating and aggregating a heterogeneous robotic swarm based on robot type. The proposed method combines an imitation learning framework with graph neural network parameterization to learn a scalable control policy for holonomic mobile agents moving in 2D and 3D Euclidean space. The imitation learning framework aims to replicate a global controller's actions and behaviors. The policy is learned using the DAgger (Dataset Aggregation) algorithm. Our experiments show that the learned controller, which uses only local information, delivers performance close to that of the expert controller, which uses global information for decision-making, and outperforms a classical local controller. The number of robots was also scaled up to 100 robots and 10 groups, and the controller was transferred to nonholonomic mobile robot and flying robot swarms.

## 2 Related Works

### 2.1 Segregation and Aggregation Behaviors.

Reynolds [12] proposed one of the pioneering research works to simulate swarming behaviors, including herding and the flocking of birds. The flocking algorithm incorporates three fundamental driving rules: *alignment*, guiding agents toward their average heading; *separation*, to avoid collisions; and *cohesion*, steering agents toward their average position. Expanding upon this foundation, numerous researchers have delved into the development of both centralized and decentralized control methodologies, including artificial potential [13], hydrodynamic control [14], leader–follower [7], and more [8,9].

Segregation and Aggregation are behaviors seen in several biological systems and are widely studied phenomena. *Segregation* is a sorting mechanism in various natural processes observed in numerous organisms—cells and animals [15]. Segregation seen in nature includes cell division in embryogenesis [15], strain odor recognition causing aggregation and segregation of cockroaches [16], and tentacle morphogenesis in hydra [17].

Aggregation is an extensively studied behavior found in living organisms such as fish, bacteria, cockroaches, mammals, and arthropods [18,19]. Jeanson et al. [20] developed a model for aggregation in cockroach larvae and reported that cockroaches leave their clusters and rejoin them with probabilities correlating to the cluster sizes. Garnier et al. [21] demonstrated aggregation behavior of twenty homogeneous Alice robots by modeling this behavior found in German cockroaches. Similarly, Ref. [22] analyzed a comparable model, showing the combination of locomotion speed and sensing radius required for aggregation using probabilistic aggregation rules. The aggregation problem is quite challenging because the swarms can form clusters instead of being aggregated [23]. Examining methods that display this behavior can assist in designing techniques for distributed systems [24].

It has been previously shown that variations in intercellular adhesiveness result in sorting or aggregation in specific cellular interactions [25,26]. Steinberg [26] developed the *Differential Adhesion Hypothesis*, which states that differences in the work of cohesion between similar and dissimilar cells can produce cell segregation or cell aggregation. As such, when a cell population experiences stronger cohesive forces from cells of the same type than from dissimilar types, an imbalance emerges, which causes segregation. The reverse causes aggregation.

### 2.2 Decentralized Control of Robot Swarms Using Graph Neural Networks.

Previous research in segregation and aggregation involves classical methods, including convex optimization [11], evolutionary methods [27], particle swarm optimization [28], probabilistic aggregation algorithms [29,30], differential potential [24,25,31], model predictive control [32], etc. Probabilistic aggregation algorithms encounter difficulties with unstable aggregation, as robots are constantly entering and adapting [33]. In Ref. [34], a genetic algorithm was used to train a neural network for static and dynamic aggregation, but it faced issues with scalability and unstable aggregation and required large onboard computational resources. In Refs. [25] and [35], a differential potential concept was proposed, though it uses global information.

However, large-scale systems frequently encounter challenges related to scalability, communication bottlenecks, and robustness. While centralized solutions, where a central agent determines the actions of the entire team, may be viable in small-scale scenarios, the demand for decentralized solutions becomes paramount as the system size grows. In decentralized systems, communication among agents is limited: each agent must determine its actions using only local information to accomplish a global task. Some works have explored segregation behavior with varying degrees of decentralization [36–38]. For example, Edson Filho and Pimenta [36] proposed the use of abstractions and artificial potential functions to segregate the groups, but the proposed method was not completely decentralized. The work in Ref. [38] proposed a distributed mechanism that combines flocking behaviors, hierarchical abstractions, and collision avoidance based on a concept called the virtual group velocity obstacle (VGVO). However, the robots start in an already segregated state, so that paper focuses on navigation while maintaining segregation. The main limitations of these previous approaches are that some require global information, analytical solutions, high computational resources, or careful tuning of control parameters. Additionally, most of the work focuses on the segregation problem in 2D spaces; the aggregation problem and segregation in 3D spaces have not been fully explored in the literature. Moreover, while much of the literature has utilized analytical and control-theoretic methods, data-driven and learning-based controls have not been explored for the problems of segregation and aggregation.

As highlighted in the literature, deriving distributed multi-agent controllers has proven to be a complex task [39]. As a result, the challenge of finding these controllers motivates the adoption of a deep-learning approach. We propose a decentralized controller whose robust behavioral policies are trained in an imitation learning framework. These policies learn to imitate a centralized controller's behavior but use only local information to make decisions. Moreover, dimensionality issues arise as the number of robots increases. Both challenges are effectively addressed by harnessing the capabilities of aggregation graph neural networks (Aggregation GNNs), which are particularly well suited for distributed systems control due to their inherently local structure [40].

Graph neural networks (GNNs) are neural networks designed to work with data structured as graphs. They act as function approximators that can explicitly model the interactions among the entities within the graph. GNNs function in a fully localized manner, with communication occurring solely between nearby neighbors; thus, they are well suited for developing decentralized controllers for robot swarms. They are invariant to changes in the order or labeling of the agents within the team, which is particularly important for decentralized systems. GNNs can also adapt to systems beyond the ones they were initially trained on, making them scalable to larger or smaller sets of robots. Graph Neural Networks are promising architectures for parameterization in imitation learning [41,42] and RL algorithms [43,44]. In Ref. [45], aggregation GNNs with multi-hop communication coupled with imitation learning were proposed for flocking, 1-leader flocking, and flocking of quadrotors in Airsim experiments. In Ref. [46], Graph Filters, Graph CNNs, and Graph RNNs were used in imitation learning for flocking and 2D grid path planning. In Ref. [42], the effectiveness of a graph CNN coupled with policy gradient learning was shown for formation flying experiments, compared against Vanilla Graph Policy Gradient and PPO with a fully connected network. In Ref. [47], linear and nonlinear graph CNNs coupled with imitation learning and PPO were proposed for coverage and exploration experiments. Blumenkamp et al. [48] presented results showing the application of GNN policies to five robots in ROS2 for navigation through a passage in real environments. We build on these methods for a different class of swarming behaviors—segregation and aggregation—which are challenging because of the instability that can occur with inaccurate grouping.

## 3 Main Contributions

As seen from the literature review in Sec. 2.2, most approaches utilize mathematical equations as decentralized control laws that must be analytically derived to obtain a global behavior, such as the aggregation and segregation behaviors this paper focuses on. Obtaining such individual control laws for robots to produce a desired global behavior is a hard inverse problem and requires trials of many candidate control laws. The neural network-based techniques proposed in this paper provide a data-driven approach that overcomes the challenge of deriving mathematical control laws. As noted in Sec. 2.2, for the segregation and aggregation behaviors studied in this paper, there is no data-driven, learning-based controller available in the literature.

Hence, in contrast to previous approaches to segregative and aggregative behavior in robot swarms, we present an approach that demonstrates these behaviors using decentralized learning-based control. This approach aims to design local controllers guiding a diverse group of robots to exhibit both Segregation behavior (forming distinct groups) and Aggregation behavior (forming homogeneous mixtures). The approach utilizes Graph Neural Networks to parameterize the controller and trains them using an imitation learning algorithm. The proposed method was first presented in our earlier work [49], where the technique was applied to the segregation and aggregation problem for only 2D point mass swarms with up to 50 robots. This paper extends our prior work by: (i) improving the learned controller by including more training features, namely robot velocity and a distance parameter, (ii) scaling the 2D point mass simulation experiments to 100 robots and 10 groups, (iii) applying the problem in 3D for point mass systems, (iv) extending the application from point mass dynamics to nonholonomic systems, (v) extending the prior work to 3D Crazyflie quadrotor systems in the Pybullet simulation environment, and (vi) implementing the controller on the Turtlebot3 Burger both in simulations (Gazebo/ROS2) and in real-world experiments. To the best of our knowledge, this is the first research that employs GNNs with multihop communication, trained through imitation learning, to address segregation and aggregation tasks for both 2D and 3D holonomic point mass robots and actual robot systems—nonholonomic autonomous ground robots and autonomous aerial robots.

The primary contributions of this paper are:

We employ Aggregation Graph Neural Networks for time-varying systems, trained using imitation learning, to segregate and aggregate heterogeneous robotic swarms in 2D and 3D Euclidean space. By aggregating information from remote neighbors, the learned controller achieves performance comparable to that of the expert (centralized) controller and outperforms a local controller.

We illustrated the scalability and generalization of the model by training it on small teams and groups for segregation and testing its performance while progressively increasing the team size and number of groups, reaching up to 100 robots and 10 groups. The proposed model also generalizes to the aggregation problem without further training.

A transfer framework that transitioned from point mass systems to real robot systems within physics-based simulations was developed. The decentralized controller was implemented on mobile robots in Gazebo/ROS2 and flying robot swarms in Pybullet.

Zero-shot transfer of the learned policies to real-world systems—Turtlebot3 robot swarms—demonstrating the efficacy of the policies.

The rest of the paper is structured as follows: Sec. 4 presents the segregation and aggregation problem, along with classical centralized control for point mass systems. Section 5 describes the optimal decentralized control paradigm and the proposed controller using GNNs and imitation learning. Section 6 gives details of the actual mobile and flying robot kinematics used for the experiments. Experimental results and discussion are detailed in Sec. 7 for the holonomic point mass systems and Sec. 8 for swarms of mobile (Turtlebot3) and flying (Crazyflie 2) robots, including the hardware experiments with the Turtlebots. Section 9 presents the conclusions and future directions.

## 4 Problem Formulation

This section describes the equations that govern robot swarms' segregative and aggregative behavior, shown in Fig. 1. In addition, we define the expert controller used to generate the data for imitation learning in both 2D and 3D Euclidean space.

### 4.1 Point Mass Kinematics Model.

Consider *N* fully actuated holonomic agents $V=\{1,\dots,N\}$ navigating within a 2D or 3D Euclidean environment. Each agent is defined by its position $r_i(t)\in\mathbb{R}^2$ or $\mathbb{R}^3$, its velocity $v_i(t)\in\mathbb{R}^2$ or $\mathbb{R}^3$, and its acceleration $u_i(t)\in\mathbb{R}^2$ or $\mathbb{R}^3$ for time steps $t=0,1,2,3,\dots$, where the discrete time-index *t* denotes the sequential time instances occurring at the sampling time $T_s$. It is assumed that the acceleration remains constant during the time interval $[tT_s,(t+1)T_s]$. The system's dynamics is expressed as follows:

$$r_i(t+1)=r_i(t)+T_s\,v_i(t)+\tfrac{T_s^2}{2}\,u_i(t),\qquad v_i(t+1)=v_i(t)+T_s\,u_i(t)$$

for *i* = {1, $\dots,N$}.
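As a concrete illustration (our own minimal sketch, not code from the paper), the zero-order-hold double-integrator update for a team of point mass agents can be written as:

```python
import numpy as np

def point_mass_step(r, v, u, Ts=0.01):
    """One zero-order-hold step of the double-integrator dynamics.

    r, v, u: (N, D) arrays of positions, velocities, and accelerations
    (D = 2 or 3); the acceleration u is held constant over [t*Ts, (t+1)*Ts].
    """
    r_next = r + Ts * v + 0.5 * Ts**2 * u
    v_next = v + Ts * u
    return r_next, v_next

# Example: 3 agents in 2D, starting at rest, constant unit acceleration in x
r = np.zeros((3, 2))
v = np.zeros((3, 2))
u = np.tile([1.0, 0.0], (3, 1))
r, v = point_mass_step(r, v, u, Ts=0.1)
```

The same step applies unchanged in 3D; only the second array dimension changes.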

### 4.2 Decentralized Segregation and Aggregation.

In the segregation and aggregation problem, we assign each robot to a group $N_k$, $k=\{1,\dots,W\}$, where *W* is the number of groups. Therefore, the heterogeneous robot swarm is composed of robots within the set of these partitions $\{N_1,N_2,\dots,N_W\}$, $N_W\subseteq N$. Robots belonging to the same group are classified as the same type. The neighbors of each robot can be either robots of the same type or of a different type.

For segregation, our objective is to develop a controller capable of sorting diverse types of robots into *W* distinct groups. This aims to create groups that exclusively consist of agents of the same type. The team is considered segregated when the average distance between agents of the same type is less than that between agents of different types, as defined by Kumar et al. [24]. The controller that solves this problem exhibits *segregative behavior*. Conversely, when the average distance between agents of the same type is greater than that between agents of different types, the team is said to be aggregated; this is referred to as the aggregation problem. Here, the aim is to learn a controller that ensures the swarm forms a homogeneous mixture of robots of different types while flocking together.
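This segregation criterion can be checked directly from a configuration. The sketch below is ours (function and variable names are illustrative, not from the paper) and compares the average same-type and different-type pairwise distances:

```python
import numpy as np

def is_segregated(positions, groups):
    """True if the mean pairwise distance between same-type agents is
    smaller than the mean distance between different-type agents."""
    positions = np.asarray(positions, dtype=float)
    groups = np.asarray(groups)
    N = len(groups)
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    iu = np.triu_indices(N, k=1)              # each unordered pair once
    same = groups[iu[0]] == groups[iu[1]]
    return dists[iu][same].mean() < dists[iu][~same].mean()

# Two tight clusters of different types far apart -> segregated
pos = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], dtype=float)
print(is_segregated(pos, [0, 0, 1, 1]))   # True
print(is_segregated(pos, [0, 1, 0, 1]))   # False: types intermixed (aggregated)
```

The aggregated case simply reverses the inequality, so the same helper covers both behaviors.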

### 4.3 Classical Centralized Control.

*Segregative* and *Aggregative* behavior can be achieved using a centralized controller defined in Ref. [31]:

$$u_i^*(t)=-\sum_{j\neq i}\nabla_{r_i}U_{ij}(|r_i-r_j|)-\sum_{j\neq i}\big(v_i(t)-v_j(t)\big) \quad (3)$$

where $u_i^*(t)$ is agent *i*'s control input. $\nabla_{r_i}U_{ij}(|r_i-r_j|)$ represents the gradient of the artificial potential function governing the interaction between agents *i* and *j*. This gradient is taken with respect to the position vector $r_i$ and is evaluated at the positions $r_i(t)$ and $r_j(t)$ at time *t*. The second term in Eq. (3) accounts for damping, encouraging robots to synchronize their velocities with one another, as described by Santos et al. [31]. In the potential, *α* represents the scalar controller gain, $d_{ij}$ is the segregation or aggregation parameter, and $r_{ij}$ denotes $r_i-r_j$. Segregation or aggregation can be achieved based on the local groups $N_k$: $d_{AA}$ and $d_{AB}$ control the interactions between robots of the same and different types, respectively. Hence, the swarm demonstrates segregative behavior when $d_{AA}<d_{AB}$ and aggregative behavior when $d_{AA}>d_{AB}$. The evaluation metrics are described in the Appendix.
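To make the structure of this controller concrete, here is a minimal sketch. The specific potential $U_{ij}=\alpha(\ln\|r_{ij}\|+d_{ij}/\|r_{ij}\|)$ is an assumption on our part (a common differential-potential choice; the exact form is defined in Ref. [31]), and all function names are illustrative:

```python
import numpy as np

def centralized_control(r, v, groups, d_AA=3.0, d_AB=5.0, alpha=3.0):
    """Expert controller sketch: differential-potential gradient plus
    velocity-consensus damping, summed over ALL other agents.

    Assumes U_ij = alpha * (ln||r_ij|| + d_ij / ||r_ij||), whose minimum
    is at ||r_ij|| = d_ij; d_ij = d_AA for same-type pairs, d_AB otherwise.
    """
    N = len(groups)
    u = np.zeros_like(r)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            rij = r[i] - r[j]
            dist = np.linalg.norm(rij)
            d = d_AA if groups[i] == groups[j] else d_AB
            # dU/d||r|| = alpha * (1/||r|| - d/||r||^2), along the unit vector
            u[i] -= alpha * (1.0 / dist - d / dist**2) * rij / dist
            u[i] -= v[i] - v[j]          # damping / velocity alignment
    return u
```

With this potential, two same-type agents at exactly $d_{AA}$ apart feel no force, agents closer than $d_{ij}$ repel, and agents farther apart attract.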

## 5 Decentralized Control

Equation (3) represents a centralized/global controller that requires access to the positions and velocities of all agents. However, acquiring such global information is often challenging in practical situations. Agents usually have access only to local information, a limitation primarily attributed to the agents' sensing range. Each agent can only communicate with other agents within its sensing range or communication radius *R*; that is, $\|r_{ij}\|\le R$. The aim of this paper is to design a decentralized controller that relies solely on local information. Figure 2 describes the flow of the decentralized controller.

### 5.1 Communication Graph.

Here, we describe the agents' communication network. At time *t*, agents *i* and *j* can establish communication if $\|r_{ij}\|\le R$, where *R* denotes the communication radius of the agents. As a result, we construct a communication network graph $G=\{V,E(t)\}$, where $V$ represents the set of agents and $E(t)$ is the set of edges, defined such that $(i,j)\in E(t)$ if and only if $\|r_{ij}\|\le R$. Consequently, *j* can transmit data to *i* at time *t*, making *j* a neighbor of *i*. We denote $N_i(t)=\{j\in V:(j,i)\in E(t)\}$ as the set of all agents that agent *i* can communicate with at time *t*.
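Building the neighbor sets $N_i(t)$ from positions and the communication radius is straightforward (our own sketch; names are illustrative):

```python
import numpy as np

def neighbor_sets(r, R):
    """Build N_i(t) from positions: j is a neighbor of i iff ||r_i - r_j|| <= R."""
    N = r.shape[0]
    dists = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    adj = (dists <= R) & ~np.eye(N, dtype=bool)   # edge set E(t); no self-loops
    return [set(np.flatnonzero(adj[i])) for i in range(N)]

r = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
print(neighbor_sets(r, R=2.0))   # [{1}, {0}, set()]
```

Because positions change over time, this graph is recomputed at every time step, which is what makes $E(t)$ time-varying.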

### 5.2 Local Controller.

The local controller

$$u_i(t)=-\sum_{j\in N_i(t)}\nabla_{r_i}U_{ij}(|r_i-r_j|)-\sum_{j\in N_i(t)}\big(v_i(t)-v_j(t)\big) \quad (9)$$

involves a summation over only the neighbors of agent *i*, i.e., all agents $j\in N_i$. This is different from the centralized controller, which sums over all the agents in the team. While the centralized and local controllers have identical stationary points, Eq. (9) typically requires more time for segregation, provided that the graph remains connected [42]. The following sections introduce a novel learning-based approach to the segregation and aggregation problem. This method relies on an imitation learning algorithm known as Dataset Aggregation (DAgger) and utilizes an Aggregation Graph Neural Network to parameterize the agents' policy. This approach imitates the centralized controller in Eq. (3). We will demonstrate that the GNN-based controller performs similarly to the centralized controller and surpasses the performance of the local controller in Eq. (9).

### 5.3 Delayed Aggregation Graph Neural Network.

In the context of graph theory and signal processing on graphs, a graph signal, denoted $x:V\to\mathbb{R}$, is a function that assigns a scalar value to each node in a graph. Each node is represented by a feature vector $x_i(t)\in\mathbb{R}^F$, where $i\in V=\{1,\dots,N\}$ and *N* is the number of nodes in the graph. In our application, each agent is a node. Hence, the set of all agent states is denoted by $X(t)\in\mathbb{R}^{N\times F}$, where each agent is described by an *F*-dimensional feature vector $x_i(t)\in\mathbb{R}^F$, i.e., the rows of $X(t)$.

To process graph signals in a distributed fashion, we use a graph shift operator $S(t)\in\mathbb{R}^{N\times N}$ that respects the sparsity of the graph, i.e., $[S(t)]_{ij}\neq 0$ only if $j\in N_i(t)$ or $j=i$ (e.g., the adjacency matrix of $G$). Consider the linear operation $S(t)X(t)$, whose (*i*, *f*)-th entry is expressed as

$$[S(t)X(t)]_{if}=\sum_{j\in N_i(t)\cup\{i\}}[S(t)]_{ij}[X(t)]_{jf} \quad (10)$$

Equation (10) indicates that $S(t)X(t)$ functions as a distributed and local operator. This is evident from the fact that each node undergoes updates based solely on local interactions with its neighboring nodes. The reliance on local interactions is a fundamental feature in the development of controllers for decentralized systems, and it is frequently harnessed in graph-based approaches for information processing and control.

Aggregation GNNs rely on repeated exchanges with one-hop neighbors at each time step *t*, which is the exchange clock. These exchanges introduce a unit time delay, thus creating a delayed information structure [50]:

$$X_i(t)=\big\{x_j(t-k)\;:\;j\in N_i^k(t),\;k=0,1,\dots\big\} \quad (11)$$

where $N_i^k(t)$ is the set of nodes *k*-hops away from node *i*, defined recursively as $N_i^k(t)=\{j'\in N_j^{k-1}(t-1),\,j\in N_i(t)\}$ with $N_i^1(t)=N_i(t)$ and $N_i^0=\{i\}$. We denote $X(t)=\{X_i(t)\}_{i=1,\dots,N}$ as the set of delayed information histories $X_i(t)$ of all nodes. This structure shows that the information available to each node at a given time *t* is past and delayed information from neighbors that are *k*-hops away.
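The delayed structure can be computed recursively: hop-*k* information available at time *t* is *k* steps old. The sketch below is ours and assumes the recursion $Z_k(t)=S(t)\,Z_{k-1}(t-1)$, $Z_0(t)=X(t)$, commonly used by delayed aggregation GNNs:

```python
import numpy as np

def delayed_aggregation(S_hist, X_hist, K):
    """Delayed aggregation sequence sketch: Z_0(t) = X(t) and
    Z_k(t) = S(t) @ Z_{k-1}(t-1), so hop-k information is k steps old.

    S_hist, X_hist: lists of shift operators S(t) (N x N) and signals X(t)
    (N x F) for t = 0..T; returns [Z_0(T), ..., Z_{K-1}(T)].
    """
    T = len(X_hist) - 1
    # Seed at t = 0: only the local signal is available, deeper hops are empty
    Z = [[X_hist[0]] + [np.zeros_like(X_hist[0]) for _ in range(K - 1)]]
    for t in range(1, T + 1):
        Zt = [X_hist[t]]
        for k in range(1, K):
            Zt.append(S_hist[t] @ Z[t - 1][k - 1])   # one-hop exchange per step
        Z.append(Zt)
    return Z[T]
```

Note that each step only multiplies by the current $S(t)$, i.e., only one-hop communication happens per time step; deeper hops accumulate across time, which is exactly the delayed information structure of Eq. (11).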

The challenge in decentralized control learning is to devise a control policy that accommodates the delayed local information structure as outlined in Eq. (11). Consequently, a decentralized controller must effectively handle historical information. It is well-established in the literature that achieving optimal decentralized control is very challenging, even in scenarios involving linear quadratic regulators, which have relatively straightforward centralized solutions [39,50].

In contrast to centralized controllers, the intricacies associated with finding effective decentralized controllers underscore the importance of leveraging learning techniques. This paper hinges on the utilization of graph convolutional neural networks (GCNNs) in conjunction with imitation learning. The choice of GCNNs is justified by their alignment with the local information structure inherent in decentralized control. Imitation Learning is chosen for its relative simplicity in developing decentralized controllers by replicating the behavior observed using a centralized controller.

### 5.4 Graph Convolutional Neural Network.

We build the aggregation sequence $Z(t)=[Z_0(t),Z_1(t),\dots,Z_{K-1}(t)]$, with $Z_0(t)=X(t)$ and $Z_k(t)=S(t)Z_{k-1}(t-1)$, where each *N* × *F* block $Z_k(t)$ in the sequence is the delayed state information aggregated from *k*-hop neighbors. We represent $z_i(t)\in\mathbb{R}^{FK}$, row *i* of matrix $Z(t)$, as the state at node *i*, obtained locally through $(K-1)$ exchanges with neighbors. An essential characteristic of the aggregation sequence is its regular temporal structure, which consists of nested aggregation neighborhoods. Subsequently, we can apply a standard convolutional neural network (CNN) with a depth of *L* to $z_i(t)$, effectively mapping the local information to an action. Thus, each layer $l=1,\dots,L$ is computed as

$$z^{(l)}=\sigma^{(l)}\big(\Theta^{(l)}z^{(l-1)}\big),\qquad z^{(0)}=z_i(t)$$

where $\sigma^{(l)}$ is an activation function and $\Theta^{(l)}$ comprises a set of support filters with learnable parameters. The output of the final layer corresponds to the decentralized control action at node *i*, at time *t*.
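A minimal PyTorch sketch of the node-wise network (ours; it uses the single hidden layer of 64 Tanh units reported in Sec. 7.1, and assumes F = 6 features, K = 3 exchanges, and 2D actions purely for illustration):

```python
import torch
import torch.nn as nn

class NodePolicy(nn.Module):
    """Maps the locally aggregated state z_i(t) (dimension F*K) to a control
    action. The same parameters Theta are shared by every node, so the policy
    is independent of team size."""
    def __init__(self, F, K, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(F * K, hidden),
            nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, z):            # z: (N, F*K), one row per node
        return self.net(z)

policy = NodePolicy(F=6, K=3)
actions = policy(torch.randn(21, 18))   # 21 robots -> 21 actions
```

Because the network acts row-wise, the very same weights evaluate a team of 21 or 100 robots, which is the parameter-sharing property exploited for scalability.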

### 5.5 Imitation Learning Framework.

Given trajectories generated by the expert (centralized) controller of Eq. (3), the parameters $\Theta$ are learned by minimizing the discrepancy between the expert action $u_i^*(t)$ and the action produced by the learned policy from the delayed local information at each node *i*:

$$\min_{\Theta}\;\sum_t\sum_{i=1}^{N}\big\Vert u_i^*(t)-\hat{u}_i\big(z_i(t);\Theta\big)\big\Vert^2$$

It is important to emphasize that $\Theta$ is uniform across all nodes; $\Theta$ is not node or time dependent. As a result, the learned policy is independent of the size and structure of the network, which facilitates modularity, scalability to any number of agents, and transfer learning.

### 5.6 State Representation.

Each agent's feature vector $x_i(t)$ comprises local observations, including the agent's velocity, the distance parameter, and the relative position of a common goal location, normalized by the communication radius *R*. This is needed to stabilize the training and ensure the robots segregate or aggregate around a particular location.

## 6 Actual Robot Kinematics

The holonomic ideal point mass model aids in testing different scenarios and parameters for the task and provides a benchmark we can build on. However, we are interested in achieving the tasks on real-world robotic systems under the constraints of delayed observations and slower control rates. In this section, we design a framework to transfer the GNN-based policies for the point mass model to physics-based nonholonomic mobile robots—the Turtlebot3 Burger platform in ROS2, both in simulation and in real-world experiments—and to a swarm of quadrotors in a physics-based simulator (Pybullet) using the Crazyflie 2 model, without further training.

### 6.1 Decentralized Control for Nonholonomic Robot Swarms.

The point mass system is holonomic. Nonetheless, the GNN controller can be used for a nonholonomic system such as the 2D differential drive model. We follow the feedback linearization approach designed for expressing double integrator dynamics in differential drive robots described in Ref. [52].

#### 6.1.1 Kinematic Modeling and Feedback Linearization Approach.

Consider *N* differential drive mobile agents navigating within a 2D Euclidean space. For brevity, we omit the robot and group indexes. Each agent's kinematics is defined as

$$\dot{x}=v\cos\theta,\qquad \dot{y}=v\sin\theta,\qquad \dot{\theta}=\omega \quad (20)$$

where *x*, *y*, $\theta$, and $\omega$ are the x, y position, heading, and heading rate of the robot, respectively, and *v* is the linear (forward) speed of the center of the robot. The point mass actuations are accelerations; therefore, we differentiate Eq. (20), resulting in

$$\ddot{x}=a\cos\theta-v\omega\sin\theta,\qquad \ddot{y}=a\sin\theta+v\omega\cos\theta \quad (21)$$

where $a=\dot{v}$ is the linear acceleration of the center of the robot. Since the robot cannot accelerate perpendicular to its heading, we perform feedback linearization about a reference point located at a distance $d_a$ from the center of the robot, as seen in Fig. 3. The resulting $u_v$ is the control input for the differential drive model; it is obtained from the point mass acceleration together with the defined parameter $d_a$.

_{a}### 6.2 Decentralized Control for Quadrotors.

In this section, we present the transfer of the point mass-trained GNN to a swarm of quadrotors in a Pybullet simulation. Figure 4 shows the framework for transferring the trained GNN to control a swarm of quadrotors. The position and velocity of the swarm are passed from the Pybullet environment into the point mass gym environment, where the local features are calculated and sent to the GNN controller to compute the actions and predict the next state. The current and next states of the quadrotor swarm are then passed into the gym's PID controller, which drives the swarm to the desired state. The current state is then passed back into the point mass environment, and this loop continues until the task is achieved. The dynamics and PID control equations are described in Refs. [53–55].

## 7 Point Mass Results and Discussion

We ran a series of experiments to study the performance and scalability of our approach. The experimental results are presented and evaluated using the intersection area of convex hulls metric $M(r,N)$ in Eq. (A1), the number of clusters formed for segregation, and average distances for same and different groups for aggregation (see Appendix for details). We illustrate the scalability of the controller by increasing the swarm size and discuss the performance comparison between traditional controllers (centralized and local) and the GNN controller.

### 7.1 Experiments.

For the 2D and 3D segregation tasks, the GNN controller was trained on 21 robots and 3 groups. We tested the learned controller on {(10, 2); (20, 5); (21, 7); (30, 5); (50, 5); (100, 5)} in 2D and on {(20, 5); (21, 7); (30, 5)} in 3D (written in the format {(Robots, Groups)}), running 40 experiments with random initial locations. Without further training, we transferred the segregative GNN controller to a different swarming behavior—aggregation. All velocities were set to zero at the initial state, and positions were uniformly distributed independently of the robot's group. For training, we set $d_{AA}$ and $d_{AB}$ to 3 and 5, respectively. The communication radius $R$ and the number of exchanges $K$ were set to 6 and 3 for the state vector. However, we ran the test scenarios using $d_{AA}=5$, $d_{AB}=10$, $R=12$, and $K=3$. For all the experiments, the goal was randomized between $[-1,1]$ and the maximum acceleration was set to 1. We compared the performance of all the controllers—centralized (Sec. 4.3), local (Sec. 5.2), and learned (Sec. 5.3)—from the same initial configuration. We collected 400 trajectories, each of 500 steps, using the centralized controller with $\alpha=3$ (Sec. 4.3) for training. The GNN network is structured as a fully connected feed-forward neural network with a single hidden layer of 64 neurons and a Tanh activation function. Implemented in the PyTorch framework, the network was trained using the Adam optimizer, a mean squared error (MSE) cost function, and a learning rate of $5\times10^{-5}$. To address challenges in behavior cloning for imitation, we implemented the Dataset Aggregation (DAgger) algorithm [56]. The algorithm selects the GNN policy with probability $1-\beta$, while the probability $\beta$ of following the expert policy decays by a factor of 0.993 to a minimum of 0.5 after each trajectory.
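The DAgger schedule described above can be sketched as a simple data-collection loop. Here `env`, `expert`, and `policy` are hypothetical callables (not the paper's implementation), and the initial value of $\beta$ is an assumed default; only the $\beta$-mixing and its decay mirror the text.

```python
import numpy as np

def dagger_collect(env, expert, policy, n_traj=400, traj_len=500,
                   beta0=1.0, decay=0.993, beta_min=0.5, seed=0):
    """Collect (state, expert_action) pairs, executing the expert with
    probability beta and the learner otherwise; beta decays per trajectory."""
    rng = np.random.default_rng(seed)
    beta = beta0
    dataset = []                                 # supervised training pairs
    for _ in range(n_traj):
        state = env.reset()
        for _ in range(traj_len):
            a_exp = expert(state)
            # Execute the expert with probability beta, the GNN policy otherwise.
            action = a_exp if rng.random() < beta else policy(state)
            dataset.append((state, a_exp))       # always label with the expert
            state = env.step(action)
        beta = max(beta * decay, beta_min)       # decay beta after each trajectory
    return dataset
```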

### 7.2 Discussion.

**Successful Imitation** We studied the effect of the nature of information sharing in the swarms of robots. In Fig. 5, a specific instance of the temporal progression of the segregation task is depicted, using three controllers for 100 robots distributed across 5 groups in 2D. Additionally, Fig. 6 shows the same for 30 robots and 5 groups in 3D. In both scenarios, it is evident that the local controller struggles to segregate the robots. This challenge is particularly notable when the robots begin in a configuration where not all are interconnected, but each has at least one connection. In such cases, distinct clusters form, each containing robots of the same type. This outcome is expected, considering that the robots are not attracted to their respective groups due to a lack of awareness of their existence. In contrast, the learned controller demonstrates the capability to effectively segregate the robots into groups of varying sizes. These results support the rationale behind utilizing a graph neural network framework, as it enables the dissemination of information throughout the network, ultimately facilitating the successful completion of the task.

**Comparison with Classical Approaches** We compared the GNN controller with the traditional controllers. Figures 7–10 show the mean and confidence interval of the segregation task metrics over 40 trials for each controller in the 2D and 3D experiments. As time progresses in all scenarios with the learned controller, the mean and standard deviation of the intersection area approach zero, while the number of clusters converges to the actual number of groups; the learned controller is able to fully segregate the system. Likewise, experiments with fewer robot groups converge faster than those with more groups. The local controller, however, becomes trapped in a local minimum and fails to achieve segregation.

We show the comparative results between the centralized controller in Fig. 8 and our controller in Fig. 10; in all the test cases, both exhibit comparable performance, indicating that our controller is efficient. It can also be seen from the experiments that the 3D cases segregate more easily and quickly than their 2D counterparts, because the robots have more degrees of freedom to maneuver in new directions. The GNN uses only local information yet achieves performance comparable to the centralized controller, which relies on global information. This highlights its capability for learning and generalization.

**Scalability Study** We examined a scenario where the number of robots was fixed at 50 and varied the number of groups. Figure 8 shows that as the number of groups increases, the agents take longer to segregate. More specifically, with 10 groups it took more than 3500 steps to segregate, while with 2 and 5 groups it took fewer steps. All the experiments show that the policies, though trained on 21 robots, can scale up to 100 robots in 2D and handle group counts they have not seen before. We observe the same behavior in the 3D case.

**Generalization to Aggregation Task** Another interesting aspect of our GNN controller is that it can generate other behaviors besides segregation. We extended the GNN trained for segregation to the aggregation task by changing the parameters to $d_{AA}=10$ and $d_{AB}=5$. Without any further training, our segregation controller was able to aggregate the swarm with just this change of parameters, which supports the theory of intercellular adhesivity. A particular instance of the time series progression of the aggregative behavior of the system under the centralized and GNN controllers is shown for 100 robots and 5 groups in 2D and for 30 robots and 5 groups in 3D in Figs. 11 and 12, respectively.

Figure 13 shows the plot of the average distances between agents of the same type and agents of different types. Although the trajectory evolution does not look the same for the centralized and GNN controllers, the average distances plot shows that the swarm is aggregated. For example, in the 2D case of 100 robots and 5 groups, the average distances $r_{avg}^{AA}$ and $r_{avg}^{AB}$ (see Appendix A2 for the definition of these quantities) were found to be 4.29 and 4.13, respectively. For the 3D case with 30 robots and 5 groups, $r_{avg}^{AA}$ and $r_{avg}^{AB}$ were found to be 5.81 and 5.11, respectively. Both cases clearly show that the swarm was aggregated based on the condition in Sec. A2. The difference in the final trajectory shape depends on the communication radius.

**Effect of Communication Radius** We analyze the effect of the communication radius $R$ on the segregation and aggregation behavior by varying the sensing radius of each robot. We chose the values $R = \{4, 6, 8, 10, 12\}$ m and fixed the swarm at 20 robots and 5 groups, keeping the other parameters constant with $d_{AA}=5$ and $d_{AB}=10$. From Fig. 14, even in the case of limited communication, the GNN learns efficient control strategies that enable the swarm to segregate down to a communication radius of $R=8$ m. This shows the GNN controller's ability to propagate information through the network even under limited communication. However, as the communication radius drops to $R=6$ m or below, the network struggles to fully segregate (shown by a larger number of clusters than groups and a larger mean intersection area). This is because the system is expected to segregate at a separation distance of $d_{AA}=5$ m between agents of the same type; hence, if the communication radius is 6 m or less, there is a good chance of having few, or even no, agents within the communication radius. As a result, the figure shows that the system started from about 10 clusters and converged only to about 7 clusters for $R=6$ m. Indeed, the ability to segregate completely depends on parameters such as the number of hops, $R$, $d_{AA}$, and $d_{AB}$.

As seen in Table 1 for the aggregation behavior, we started at a segregated state with $r_{avg}^{AA}=2.17$, which is lower than $r_{avg}^{AB}=9.44$. The goal for aggregation is to ensure that $r_{avg}^{AA}$ becomes greater than $r_{avg}^{AB}$. Even with limited communication, our GNN controller completes the task, demonstrating its performance under restricted communication. Notably, the system aggregates successfully even with low communication radii (in contrast to the segregation behavior, which fails when the radius drops below 6 m). This is because the robots first move closer to each other, reducing their separation distance and improving their connectivity. We also note that at a communication radius of $R=8$ m or higher, the system converges to the same average distances.
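The communication radius discussed above determines the graph over which the GNN aggregates information: robots $i$ and $j$ are neighbors iff their separation is at most $R$, and $K$-hop aggregation then propagates information up to roughly $K \cdot R$ away. A minimal sketch of such a radius-based adjacency (an illustration, not the paper's code):

```python
import numpy as np

def communication_graph(positions, R):
    """positions: (N, d) array of robot positions; returns a boolean
    (N, N) adjacency matrix with an edge iff separation <= R."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Connect pairs within the sensing radius; exclude self-loops.
    adj = (dist <= R) & ~np.eye(len(positions), dtype=bool)
    return adj
```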

Table 1: Effect of the communication radius on the 2D GNN aggregation controller

| | $r_{avg}^{AA}$ (initial) | $r_{avg}^{AB}$ (initial) |
|---|---|---|
| Values | 2.17 | 9.44 |

| Communication radius, $R$ (m) | $r_{avg}^{AA}$ (final) | $r_{avg}^{AB}$ (final) |
|---|---|---|
| 4 | 4.69 | 4.05 |
| 6 | 4.67 | 3.93 |
| 8 | 4.33 | 3.63 |
| 10 | 4.33 | 3.63 |
| 12 | 4.33 | 3.63 |


## 8 Actual Robots Kinematics Results

This section presents the simulation and hardware experiments for the Turtlebot3 and Crazyflie2 robots. Table 2 lists the parameters used for the experiments.

### 8.1 Gazebo Simulations With ROS2.

To evaluate the feasibility of transferring the point mass GNN to a physics-based differential drive robot, we tested the trained policy with Turtlebot3 burger robots—10, 20, and 50 robots with 2, 5, and 5 groups, respectively. As seen in Fig. 15, we designed an OpenAI Gym environment, `gym_gazeboswarm`, which carries out all communication with Gazebo using ROS2 and interfaces with the GNN policy to control the swarm. We created a multi-Turtlebot3 Gazebo environment using ROS2 in which each robot has its own node. The gym environment receives the position of each robot, computes the local features, and sends them to the GNN to obtain acceleration commands, which are then converted into the actions $v, \omega$ for each robot using the method described in Sec. 5.1.1. These actions are published on each robot's `cmd_vel` topic to drive the swarm.
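One common way to map a holonomic point-mass command onto differential-drive inputs $(v, \omega)$ is feedback linearization about a point a small distance $l$ ahead of the wheel axle. This is a sketch of that standard technique, not necessarily the paper's Sec. 5.1.1 method; the offset `l` is an assumed tuning parameter.

```python
import numpy as np

def holonomic_to_diffdrive(vx, vy, theta, l=0.1):
    """Map a desired planar velocity (vx, vy) to Turtlebot commands (v, omega).

    theta is the robot heading; l > 0 is the offset of the controlled
    point in front of the axle (assumed tuning parameter).
    """
    c, s = np.cos(theta), np.sin(theta)
    v = c * vx + s * vy                 # forward component along the heading
    omega = (-s * vx + c * vy) / l      # turn rate from the lateral component
    return v, omega
```

The resulting $(v, \omega)$ pair is what would be published on the robot's `cmd_vel` topic.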

### 8.2 Pybullet Physics-Based Simulation Experiments.

We used an OpenAI Gym environment based on PyBullet^{2} introduced by Panerati et al. [57] to evaluate the performance of our GNN controller in a realistic 3D environment. The environment is parallelizable and can be run with a GUI or in headless mode, with or without a GPU. We chose this simulator because it supports realistic aerodynamic effects such as drag, ground effect, and downwash, and provides a suite of control algorithms. As a result, it gives us a testbed close to the real-world system in which to test our algorithm. The dynamics of the quadrotors are modeled on the Crazyflie 2 quadrotor^{3}.

We aim to use the GNN controller to control the quadrotor swarms in the simulator. The trained GNN is robust to any value of $d_{AA}$ and $d_{AB}$; hence, we adapt these parameters to suit the Pybullet simulator. We initialized the swarm with varying yaw values, with roll and pitch set to 0. We consider two types of experiments—fixed height simulation using the trained 2D point mass GNN and varying height simulation using the trained 3D point mass GNN.

### 8.3 Fixed Height Results.

The fixed height simulations come from the fact that the 2D point mass model does not have a z-component. Hence, we set z to 1 m and ran the 2D GNN to predict the next states in the x- and y-directions.

### 8.4 Varying Height Results.

Here, we implemented the 3D-trained GNN to the Crazyflie simulation. Using the predicted next state from the point mass 3D gym environment for segregation and aggregation task, we were able to achieve the same results as in the point mass model case for a swarm of Crazyflie quadrotors.

### 8.5 Real Robots.

We demonstrated a zero-shot transfer of the policies learned in the Gazebo simulation to real Turtlebot3 burger robots—8 robots and 2 groups, and 9 robots and 3 groups, with $d_{AA}=2$ and $d_{AB}=4$. We use a Qualisys Motion Capture System with 18 cameras that provides position updates at 100 Hz. With a multi-agent extended Kalman filter implemented in ROS to reduce noise, we obtained the position and velocity of each agent at 100 Hz. These updates are then used in the GNN controller environment to calculate the control for each robot, as seen in Fig. 16.

### 8.6 Discussion.

We reported the mean intersection area and number of clusters metrics for the Turtlebot3 and Crazyflie 2 (fixed and varying heights) experiments.

**• Successful Imitation and Transfer**: Figures 17, 18, and 19 show a particular instance of the Turtlebot3 and Crazyflie 2 time series evolution of the swarm trajectory for the segregation and aggregation tasks, respectively. For the Turtlebot3 burger robots, we also report the mean intersection area, number of clusters, and average distances between agents in the same and different groups in Fig. 20. For the Crazyflie robots, we report the mean intersection area and number of clusters metrics for the fixed and varying height segregation experiments in Figs. 21 and 22, respectively.

The results show our GNN controller can successfully transfer to a physics-based quadrotor swarm with different numbers of robots and groups, communication radii, $d_{AA}$, and $d_{AB}$.

**• Zero-Shot Transfer to Hardware**: Figures 23 and 24 show the initial and final configurations of the robot trajectories and the metrics for the segregation and aggregation tasks. Even in the presence of noise and uneven terrain, the robots were still able to perform the tasks with the GNN controller in a decentralized fashion. This shows the efficacy of our controller in transferring successfully to real-world applications.

## 9 Conclusions

Controlling large-scale dynamical systems in distributed settings poses a significant challenge in the search for optimal and efficient decentralized controllers. This paper uses learned heuristics to address this challenge for agents in 2D and 3D Euclidean space. Our approach involves the design of these controllers parameterized by Aggregation Graph Neural Networks, incorporating information from remote neighbors within an imitation learning framework. These controllers learn to imitate the behavior of an efficient artificial differential potential-based centralized controller, utilizing only local information to make decisions.

Our results demonstrate that large-scale point mass systems, mobile robots, and quadrotors can perform segregation tasks from initial configurations where the swarm is not fully connected, across varied limited communication radii and separation distances. Our policies, trained with 21 robots using a point mass model, generalize to larger swarms of up to 100 robots and to the aggregation task without further training.

Furthermore, we compared our controller with the centralized controller and with a local controller that only utilizes information from its immediate neighbors. With the local controller, the system did not converge to a segregated state; instead, multiple clusters of robots of the same type persisted. Our controller resolved this issue, demonstrating superior efficacy over the local controller and comparable performance to the centralized controller. This affirms the significance of multihop information in enhancing overall performance. The GNN-based controller is therefore better suited to distributed systems than the centralized controller, given its scalability in practical scenarios where only local information is accessible.

In addition, we showed that the GNN-based policies trained for the holonomic point mass model can be transferred to physics-based robot swarms in 2D with nonholonomic constraints and to quadrotors in 3D. We presented results demonstrating successful swarm coordination and control in simulation (Gazebo/ROS2 for Turtlebot3 robots and Pybullet for Crazyflie quadrotors) and demonstrated zero-shot transfer of the GNN policies to real Turtlebot3 robots. Potential future work includes implementation on the Crazyflie hardware platform, environments with static and dynamic obstacles [58], exploring other methods such as deep reinforcement learning, and extending the approach to other swarm behaviors.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

### Appendix: Evaluation Metrics

##### A1 Segregation Tasks.

In order to evaluate segregation in the swarm, we employed the metrics proposed in Ref. [31]—the pairwise intersection area of the swarm positions' convex hulls, $M(r,N)$, and the number of clusters. Segregation happens when $M(r,N)$ approaches zero, signifying the absence of overlap among clusters:

$$M(r,N) = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} A\big(CH(Q_i) \cap CH(Q_j)\big) \tag{A1}$$

where $CH(Q)$ and $A(Q)$ represent the convex hull and the area of the set $Q$, respectively, and $Q_i$ denotes the set of positions of the robots in group $i$.
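The pairwise convex-hull intersection metric can be sketched as below. The use of `shapely` is an assumption for illustration—the paper does not name its implementation; each group's 2D positions form a convex hull, and the metric sums the overlap areas over all group pairs, reaching zero when the groups are fully segregated.

```python
from itertools import combinations

import numpy as np
from shapely.geometry import MultiPoint  # assumed dependency for hull geometry

def intersection_area_metric(groups):
    """groups: list of (n_i, 2) position arrays, one per group.
    Returns the summed pairwise intersection area of the group convex hulls."""
    hulls = [MultiPoint([tuple(p) for p in g]).convex_hull for g in groups]
    total = 0.0
    for h1, h2 in combinations(hulls, 2):
        total += h1.intersection(h2).area   # overlap area of this pair of hulls
    return total
```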

##### A2 Aggregation Tasks.

We employed the average distance between agents of the same type ($r_{avg}^{AA}$) and the average distance between agents of different types ($r_{avg}^{AB}$) to evaluate aggregation. The system is said to aggregate when $r_{avg}^{AA}$ is greater than $r_{avg}^{AB}$. These are the intragroup (same group) and intergroup (different groups) distances, respectively.
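The two averages described above can be computed directly from positions and group labels; this is an illustrative sketch, with `positions` an (N, d) array and `labels` a hypothetical array of per-robot group ids.

```python
import numpy as np

def aggregation_metrics(positions, labels):
    """Return (r_avg_AA, r_avg_AB): mean pairwise distance over unique
    same-group pairs and unique different-group pairs, respectively."""
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(positions), k=1)   # each unique pair counted once
    same_pairs = same[iu]
    r_aa = dist[iu][same_pairs].mean()          # average over N_A same-group pairs
    r_ab = dist[iu][~same_pairs].mean()         # average over N_B cross-group pairs
    return r_aa, r_ab
```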

$$r_{avg}^{AA} = \frac{1}{N_A} \sum_{\substack{i<j \\ g_i = g_j}} \lVert \mathbf{p}_i - \mathbf{p}_j \rVert, \qquad r_{avg}^{AB} = \frac{1}{N_B} \sum_{\substack{i<j \\ g_i \neq g_j}} \lVert \mathbf{p}_i - \mathbf{p}_j \rVert$$

where $N_A$ represents the number of unique pairs of robots in the same group, $N_B$ represents the number of unique pairs of robots in different groups, $\mathbf{p}_i$ is the position of robot $i$, and $g_i$ is its group.

## Footnotes

## References
