Abstract
Balancing exploration and exploitation in reinforcement learning is a common and time-consuming dilemma. In this paper, a novel exploration policy for Q-learning, called the memory-greedy policy, is proposed to accelerate learning. Through memory storage and playback, the probability of selecting actions at random is effectively reduced, which speeds up learning. The principle of this policy is analyzed in a maze scenario, and its theoretical convergence is established via dynamic programming.
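The abstract does not spell out the mechanism, but the following minimal Python sketch shows one plausible reading: a memory-greedy selection rule layered on standard epsilon-greedy Q-learning, where rewarding state-action pairs are memorized and replayed in place of random exploration. The class and attribute names (MemoryGreedyAgent, memory, and so on) are illustrative assumptions, not the authors' implementation.

import random
from collections import defaultdict

class MemoryGreedyAgent:
    """Hypothetical sketch of a memory-greedy Q-learning agent."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions              # list of available actions
        self.q = defaultdict(float)         # Q[(state, action)] -> estimated value
        self.memory = {}                    # state -> memorized rewarding action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_action(self, state):
        # Memory-greedy step: replay a memorized action for this state when
        # one exists, bypassing random exploration entirely.
        if state in self.memory:
            return self.memory[state]
        # Otherwise fall back to standard epsilon-greedy selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(state, action)]
        )
        # Assumed memory rule: store the action when it earned a positive
        # reward, so future visits to this state skip random exploration.
        if reward > 0:
            self.memory[state] = action

Under this reading, once a state has a memorized rewarding action the agent replays it deterministically, so the effective probability of random action selection shrinks as the memory fills, consistent with the speed-up the abstract claims.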