Abstract

Balancing exploration and exploitation in reinforcement learning is a common dilemma and a time-consuming task. In this paper, a novel exploration policy for Q-learning, called the memory-greedy policy, is proposed to accelerate learning. Through memory storage and playback, the probability of randomly selecting an action is effectively reduced, which speeds up learning. The principle of this policy is analyzed in a maze scenario, and its theoretical convergence is established via dynamic programming.
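The abstract does not spell out the mechanism of the memory-greedy policy, so the following is only a minimal illustrative sketch: it assumes tabular Q-learning on a small deterministic maze and reads "memory storage and playback" as recording which actions have been tried in each state and consulting that record before falling back to a blind random choice. The maze layout, function names, and hyperparameters are all hypothetical, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical 4x4 maze: states are (row, col), goal in the corner.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
GOAL = (3, 3)

def step(state, action):
    """Deterministic maze dynamics: move, clip to the grid, reward at goal."""
    nxt = (min(max(state[0] + action[0], 0), 3),
           min(max(state[1] + action[1], 0), 3))
    reward = 1.0 if nxt == GOAL else -0.01
    return nxt, reward, nxt == GOAL

Q = defaultdict(float)       # Q[(state, action)] -> value estimate
memory = defaultdict(list)   # memory[state] -> actions already tried (assumed)

def memory_greedy(state, epsilon=0.2):
    """Greedy on Q; on exploration steps, replay the memory first and
    prefer actions not yet tried in this state, so fewer choices are
    purely random (one plausible reading of 'memory playback')."""
    if random.random() > epsilon:
        return max(ACTIONS, key=lambda a: Q[(state, a)])
    untried = [a for a in ACTIONS if a not in memory[state]]
    return random.choice(untried if untried else ACTIONS)

alpha, gamma = 0.5, 0.9  # illustrative learning rate and discount factor
for episode in range(200):
    state, done = (0, 0), False
    while not done:
        action = memory_greedy(state)
        memory[state].append(action)               # memory storage
        nxt, reward, done = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])  # Q-learning update
        state = nxt
```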
