Abstract

Self-organizing systems (SOS) can adapt to perform complex tasks in unforeseen situations. Previous work has introduced field-based approaches and rule-based social structuring that allow individual agents not only to comprehend task situations but also to exploit rule-based social relations among agents to accomplish tasks without a centralized controller. Although task fields and social rules can be predefined for relatively simple task situations, as task complexity increases and the task environment changes, obtaining a priori knowledge of these fields and rules may not be feasible. In this paper, a multiagent reinforcement learning (RL) based model is proposed as a design approach to solving the rule generation problem for complex SOS tasks. A deep multiagent RL algorithm was devised as a mechanism to train SOS agents to acquire knowledge of the task field and social rules. The learning stability, functional differentiation, and robustness of this learning approach were investigated with respect to changing team sizes and task variations. Computer simulation studies of a box-pushing problem show that there is an optimal range of team sizes within which learning remains stable; that agents in a team learn to differentiate from one another as team sizes and box dimensions change; and that the learned knowledge is more robust to external noise than to changes in task constraints.
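
The abstract outlines a deep multiagent RL training mechanism without specifying its architecture here. As a rough illustration of the general approach only, the sketch below trains independent deep Q-learning agents on a toy cooperative box-pushing task. The environment dynamics, observation encoding, reward shaping, and every name in the code (`QNet`, `observe`, `step`, `N_AGENTS`, and so on) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: independent deep Q-learning agents pushing a shared box.
# All task details below are assumptions for illustration, not the paper's setup.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS = 4        # assumed team size
OBS_DIM = 4         # assumed local observation: agent-to-box and box-to-goal offsets
N_ACTIONS = 4       # assumed push directions
MOVES = [(0.0, 1.0), (0.0, -1.0), (-1.0, 0.0), (1.0, 0.0)]

class QNet(nn.Module):
    """Per-agent Q-network mapping a local observation to action values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, obs):
        return self.net(obs)

def observe(agent_pos, box_pos, goal_pos):
    # Local observation: offsets from agent to box and from box to goal.
    return torch.tensor([box_pos[0] - agent_pos[0], box_pos[1] - agent_pos[1],
                         goal_pos[0] - box_pos[0], goal_pos[1] - box_pos[1]],
                        dtype=torch.float32)

def step(box_pos, actions):
    # Toy dynamics: the box moves by the mean of the agents' push vectors.
    dx = sum(MOVES[a][0] for a in actions) / len(actions)
    dy = sum(MOVES[a][1] for a in actions) / len(actions)
    return (box_pos[0] + dx, box_pos[1] + dy)

qnets = [QNet() for _ in range(N_AGENTS)]
optimizers = [torch.optim.Adam(q.parameters(), lr=1e-3) for q in qnets]
gamma, eps, goal = 0.95, 0.2, (10.0, 10.0)

for episode in range(500):
    box = (0.0, 0.0)
    # Agents hold fixed positions in this toy; only the box moves.
    agents = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(N_AGENTS)]
    for t in range(50):
        obs = [observe(p, box, goal) for p in agents]
        # Epsilon-greedy action selection, chosen independently by each agent.
        acts = [random.randrange(N_ACTIONS) if random.random() < eps
                else int(qnets[i](obs[i]).argmax()) for i in range(N_AGENTS)]
        new_box = step(box, acts)
        # Shared shaped reward: squared-distance progress of the box toward the goal.
        d_old = (goal[0] - box[0]) ** 2 + (goal[1] - box[1]) ** 2
        d_new = (goal[0] - new_box[0]) ** 2 + (goal[1] - new_box[1]) ** 2
        reward = d_old - d_new
        next_obs = [observe(p, new_box, goal) for p in agents]
        # One-step temporal-difference update of each agent's own Q-network.
        for i in range(N_AGENTS):
            with torch.no_grad():
                target = reward + gamma * qnets[i](next_obs[i]).max()
            loss = F.mse_loss(qnets[i](obs[i])[acts[i]], target)
            optimizers[i].zero_grad()
            loss.backward()
            optimizers[i].step()
        box = new_box
```

Giving each agent its own Q-network rather than one shared policy is one simple way the functional differentiation described in the abstract could emerge: agents trained on the same shared reward can still settle into distinct pushing roles.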
