Federated reinforcement learning: techniques, applications, and open challenges
<!doctype html> <html data-n-head-ssr lang="en" data-n-head="%7B%22lang%22:%7B%22ssr%22:%22en%22%7D%7D"> <head > <meta data-n-head="ssr" charset="utf-8"><meta data-n-head="ssr" name="viewport" content="width=device-width,initial-scale=1.0,maximum-scale=1.0, user-scalable=0"><meta data-n-head="ssr" http-equiv="Content-Security-Policy" content="default-src * data:; child-src * 'self' blob: http:;img-src * 'self' data: http:; script-src 'self' 'unsafe-inline' 'unsafe-eval' *;style-src 'self' 'unsafe-inline' *"><meta data-n-head="ssr" name="keywords" content="Federated Learning,Reinforcement Learning,Federated Reinforcement Learning"><meta data-n-head="ssr" name="description" content="This paper presents a comprehensive survey of federated reinforcement learning (FRL), an emerging and promising field in reinforcement learning (RL). Starting with a tutorial on federated learning (FL) and RL, we then introduce FRL as a new method with great potential, one that leverages the basic idea of FL to improve the performance of RL while preserving data privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e., horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL). We provide detailed definitions of each category with formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. 
Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL."><meta data-n-head="ssr" name="dc.title" content="Federated reinforcement learning: techniques, applications, and open challenges"><meta data-n-head="ssr" name="journal_id" content="ir.2021.02"><meta data-n-head="ssr" name="dc.date" content="2021-10-12"><meta data-n-head="ssr" name="dc.identifier" content="doi:10.20517/ir.2021.02"><meta data-n-head="ssr" name="dc.publisher" content="OAE Publishing Inc."><meta data-n-head="ssr" name="dc.type" content="Review"><meta data-n-head="ssr" name="dc.source" content=" Intell Robot 2021;1(1):18-57."><meta data-n-head="ssr" name="dc.citation.spage" content="18"><meta data-n-head="ssr" name="dc.citation.epage" content="57"><meta data-n-head="ssr" name="dc.creator" content="Jiaju Qi"><meta data-n-head="ssr" name="dc.creator" content="Qihao Zhou"><meta data-n-head="ssr" name="dc.creator" content="Lei Lei"><meta data-n-head="ssr" name="dc.creator" content="Kan Zheng"><meta data-n-head="ssr" name="dc.subject" content="Federated Learning"><meta data-n-head="ssr" name="dc.subject" content="Reinforcement Learning"><meta data-n-head="ssr" name="dc.subject" content="Federated Reinforcement Learning"><meta data-n-head="ssr" name="citation_reference" content="citation_title=Nair A, Srinivasan P, Blackwell S, et al. Massively parallel methods for deep reinforcement learning. CoRR 2015;abs/1507.04296. Available from: http://arxiv.org/abs/1507.04296.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Grounds&nbsp;M, Kudenko&nbsp;D. Parallel reinforcement learning with linear function approximation. In: Tuyls&nbsp;K, Nowe&nbsp;A, Guessoum&nbsp;Z, Kudenko&nbsp;D, editors. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. pp. 
60-74."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Clemente AV, Martínez HNC, Chandra A. Efficient parallel methods for deep reinforcement learning. CoRR 2017;abs/1705.04862. Available from: http://arxiv.org/abs/1705.04862.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lim&nbsp;WYB, Luong&nbsp;NC, Hoang&nbsp;DT, et al. Federated learning in mobile edge networks: a comprehensive survey. IEEE Communications Surveys Tutorials 2020;22:2031-63."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Nguyen&nbsp;DC, Ding&nbsp;M, Pathirana&nbsp;PN, et al. Federated learning for internet of things: a comprehensive survey. IEEE Communications Surveys Tutorials 2021;23:1622-58."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Khan&nbsp;LU, Saad&nbsp;W, Han&nbsp;Z, Hossain&nbsp;E, Hong&nbsp;CS. Federated learning for internet of things: recent advances, taxonomy, and open challenges. IEEE Communications Surveys Tutorials 2021;23:1759-99."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Yang&nbsp;Q, Liu&nbsp;Y, Cheng&nbsp;Y, et al. 1st ed. Morgan &amp; Claypool; 2019."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Yang&nbsp;Q, Liu&nbsp;Y, Chen&nbsp;T, Tong&nbsp;Y. Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 2019;10:1-19."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Qinbin L, Zeyi W, Bingsheng H. Federated learning systems: vision, hype and reality for data privacy and protection. CoRR 2019;abs/1907.09693. Available from: http://arxiv.org/abs/1907.09693.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Li&nbsp;T, Sahu&nbsp;AK, Talwalkar&nbsp;A, Smith&nbsp;V. Federated learning: challenges, methods, and future directions. 
IEEE Signal Processing Magazine 2020;37:50-60."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Wang&nbsp;S, Tuor&nbsp;T, Salonidis&nbsp;T, Leung&nbsp;KK, Makaya&nbsp;C, et al. Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications 2019;37:1205-21."><meta data-n-head="ssr" name="citation_reference" content="citation_title=McMahan HB, Moore E, Ramage D, y Arcas BA. Communication-efficient learning of deep networks from decentralized data. CoRR 2016;abs/1602.05629. Available from: http://arxiv.org/abs/1602.05629.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Phong&nbsp;LT, Aono&nbsp;Y, Hayashi&nbsp;T, Wang&nbsp;L, Moriai&nbsp;S. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security 2018;13:1333-45."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhu&nbsp;H, Jin&nbsp;Y. Multi-objective evolutionary federated learning. IEEE Transactions on Neural Networks and Learning Systems 2020;31:1310-22."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Kairouz P, McMahan HB, Avent B, et al. Advances and open problems in federated learning. CoRR 2019;abs/1912.04977. Available from: http://arxiv.org/abs/1912.04977.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Pan&nbsp;SJ, Yang&nbsp;Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 2010;22:1345-59."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Li Y. Deep reinforcement learning: an overview. CoRR 2017;abs/1701.07274. Available from: http://arxiv.org/abs/1701.07274.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Xu&nbsp;Z, Tang&nbsp;J, Meng&nbsp;J, et al. Experience-driven networking: a deep reinforcement learning based approach. 
In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE; 2018. pp. 1871-79."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mohammadi&nbsp;M, Al-Fuqaha&nbsp;A, Guizani&nbsp;M, Oh&nbsp;JS. Semisupervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet of Things Journal 2018;5:624-35."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Bu F, Wang X. A smart agriculture IoT system based on deep reinforcement learning. Future Generation Computer Systems 2019;99:500–507. Available from: https://www.sciencedirect.com/science/article/pii/S0167739X19307277.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Xiong&nbsp;X, Zheng&nbsp;K, Lei&nbsp;L, Hou&nbsp;L. Resource allocation based on deep reinforcement learning in IoT edge computing. IEEE Journal on Selected Areas in Communications 2020;38:1133-46."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lei&nbsp;L, Qi&nbsp;J, Zheng&nbsp;K. Patent analytics based on feature vector space model: a case of IoT. IEEE Access 2019;7:45705-15."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Shalev-Shwartz S, Shammah S, Shashua A. Safe, multi-agent, reinforcement learning for autonomous driving. CoRR 2016;abs/1610.03295. Available from: http://arxiv.org/abs/1610.03295.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Sallab&nbsp;AE, Abdou&nbsp;M, Perot&nbsp;E, Yogamani&nbsp;S. Deep reinforcement learning framework for autonomous driving. Electronic Imaging 2017;2017:70-76."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Taylor ME. Teaching reinforcement learning with mario: an argument and case study. In: Second AAAI Symposium on Educational Advances in Artificial Intelligence; 2011. 
Available from: https://www.aaai.org/ocs/index.php/EAAI/EAAI11/paper/viewPaper/3515.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Holcomb&nbsp;SD, Porter&nbsp;WK, Ault&nbsp;SV, Mao&nbsp;G, Wang&nbsp;J. Overview on deepmind and its alphago zero ai. In: Proceedings of the 2018 international conference on big data and education 2018. pp. 67-71."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Watkins CJ, Dayan P. Q-learning. Machine learning 1992;8:279–92. Available from: https://link.springer.com/content/pdf/10.1007/BF00992698.pdf.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Thorpe TL. Vehicle traffic light control using sarsa. In: Online]. Available: citeseer. ist. psu. edu/thorpe97vehicle. html. Citeseer; 1997. Available from: https://citeseer.ist.psu.edu/thorpe97vehicle.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Xing EP, Jebara T, editors. Proceedings of the 31st International Conference on Machine Learning. vol. 32 of Proceedings of Machine Learning Research. Bejing, China: PMLR; 2014. pp. 387–95. Available from: https://proceedings.mlr.press/v32/silver14.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Williams&nbsp;RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 1992;8:229-56."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Konda VR, Tsitsiklis JN. Actor-critic algorithms. In: Advances in neural information processing systems; 2000. pp. 1008–14. Available from: https://proceedings.neurips.cc/paper/1786-actor-critic-algorithms.pdf."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Henderson P, Islam R, Bachman P, et al. Deep reinforcement learning that matters. 
In: Proceedings of the AAAI conference on artificial intelligence. vol. 32; 2018. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/11694.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lei&nbsp;L, Tan&nbsp;Y, Dahlenburg&nbsp;G, Xiang&nbsp;W, Zheng&nbsp;K. Dynamic energy dispatch based on Deep Reinforcement Learning in IoT-Driven Smart Isolated Microgrids. IEEE Internet of Things Journal 2021;8:7938-53."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lei&nbsp;L, Xu&nbsp;H, Xiong&nbsp;X, Zheng&nbsp;K, Xiang&nbsp;W, et al. Multiuser resource control with deep reinforcement learning in IoT edge computing. IEEE Internet of Things Journal 2019;6:10119-33."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Ohnishi&nbsp;S, Uchibe&nbsp;E, Yamaguchi&nbsp;Y, Nakanishi&nbsp;K, Yasui&nbsp;Y, et al. Constrained deep q-learning gradually approaching ordinary q-learning. Frontiers in neurorobotics 2019;13:103."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Peng&nbsp;J, Williams&nbsp;RJ. Incremental multi-step Q-learning. In: machine learning proceedings 1994. Elsevier; 1994. pp. 226-32."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mnih&nbsp;V, Kavukcuoglu&nbsp;K, Silver&nbsp;D, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529-33."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lei&nbsp;L, Tan&nbsp;Y, Zheng&nbsp;K, et al. Deep reinforcement learning for autonomous internet of things: model, applications and challenges. IEEE Communications Surveys Tutorials 2020;22:1722-60."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double q-learning. In: proceedings of the AAAI conference on artificial intelligence. vol. 30; 2016. 
Available from: https://ojs.aaai.org/index.php/AAAI/article/view/10295.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. arXiv preprint arXiv:151105952 2015. Available from: https://arxiv.org/abs/1511.05952.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Gu S, Lillicrap TP, Ghahramani Z, Turner RE, Levine S. Q-Prop: sample-efficient policy gradient with an off-policy critic. CoRR 2016;abs/1611.02247. Available from: http://arxiv.org/abs/1611.02247.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1861–70. Available from: https://proceedings.mlr.press/v80/haarnoja18b.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928–37. Available from: https://proceedings.mlr.press/v48/mniha16.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv: 150902971 2015. Available from: https://arxiv.org/abs/1509.02971.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Barth-Maron G, Hoffman MW, Budden D, et al. Distributed distributional deterministic policy gradients. CoRR 2018;abs/1804.08617. 
Available from: http://arxiv.org/abs/1804.08617.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1587–96. Available from: https://proceedings.mlr.press/v80/fujimoto18a.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning. vol. 37 of Proceedings of Machine Learning Research. Lille, France: PMLR; 2015. pp. 1889–97. Available from: https://proceedings.mlr.press/v37/schulman15.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347. Available from: http://arxiv.org/abs/1707.06347.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhu P, Li X, Poupart P. On improving deep reinforcement learning for POMDPs. CoRR 2017;abs/1704.07978. Available from: http://arxiv.org/abs/1704.07978.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Hausknecht M, Stone P. Deep recurrent q-learning for partially observable mdps. In: 2015 aaai fall symposium series; 2015. Available from: https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Heess N, Hunt JJ, Lillicrap TP, Silver D. Memory-based control with recurrent neural networks. CoRR 2015;abs/1512.04455. 
Available from: http://arxiv.org/abs/1512.04455.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Foerster J, Nardelli N, Farquhar G, et al. Stabilising experience replay for deep multi-agent reinforcement learning. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 1146–55. Available from: https://proceedings.mlr.press/v70/foerster17b.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Van der Pol E, Oliehoek FA. Coordinated deep reinforcement learners for traffic light control. Proceedings of learning, inference and control of multi-agent systems (at NIPS 2016) 2016. Available from: https://www.elisevanderpol.nl/papers/vanderpolNIPSMALIC2016.pdf.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Foerster J, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 32; 2018. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/11794.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lowe R, Wu Y, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments. CoRR 2017;abs/1706.02275. Available from: http://arxiv.org/abs/1706.02275.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Nadiger&nbsp;C, Kumar&nbsp;A, Abdelhak&nbsp;S. Federated Reinforcement Learning for Fast Personalization. In: 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE) 2019. pp. 123-27."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Liu B, Wang L, Liu M, Xu C. Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems. CoRR 2019;abs/1901.06455. 
Available from: http://arxiv.org/abs/1901.06455.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Ren&nbsp;J, Wang&nbsp;H, Hou&nbsp;T, Zheng&nbsp;S, Tang&nbsp;C. Federated learning-based computation offloading optimization in edge computing-supported internet of things. IEEE Access 2019;7:69194-201."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Wang&nbsp;X, Wang&nbsp;C, Li&nbsp;X, Leung&nbsp;VCM, Taleb&nbsp;T. Federated deep reinforcement learning for internet of things with decentralized cooperative edge caching. IEEE Internet of Things Journal 2020;7:9441-55."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Chen J, Monga R, Bengio S, Józefowicz R. Revisiting distributed synchronous SGD. CoRR 2016;abs/1604.00981. Available from: http://arxiv.org/abs/1604.00981.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928–37. Available from: https://proceedings.mlr.press/v48/mniha16.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Espeholt L, Soyer H, Munos R, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor- learner architectures. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1407–16. Available from: http://proceedings.mlr.press/v80/espeholt18a.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Horgan D, Quan J, Budden D, et al. Distributed prioritized experience replay. CoRR 2018;abs/1803.00933. 
Available from: http://arxiv.org/abs/1803.00933.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Liu&nbsp;T, Tian&nbsp;B, Ai&nbsp;Y, et al. Parallel reinforcement learning: a framework and case study. IEEE/CAA Journal of Automatica Sinica 2018;5:827-35."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhuo HH, Feng W, Xu Q, Yang Q, Lin Y. Federated reinforcement learning. CoRR 2019;abs/1901.08277. Available from: http://arxiv.org/abs/1901.08277.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Canese L, Cardarilli GC, Di Nunzio L, et al. Multi-agent reinforcement learning: a review of challenges and applications. Applied Sciences 2021;11:4948. Available from: https://doi.org/10.3390/app11114948.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Busoniu&nbsp;L, Babuska&nbsp;R, De Schutter&nbsp;B. A Comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2008;38:156-72."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhang&nbsp;K, Yang&nbsp;Z, Başar&nbsp;T. Multi-agent reinforcement learning: a selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control 2021:321-84."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Stone&nbsp;P, Veloso&nbsp;M. Multiagent systems: a survey from a machine learning perspective. Autonomous Robots 2000;8:345-83."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Szepesvári&nbsp;C, Littman&nbsp;ML. A unified analysis of value-function-based reinforcement-learning algorithms. Neural computation 1999;11:2017-60."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Littman&nbsp;ML. Value-function reinforcement learning in markov games. 
Cognitive systems research 2001;2:55-66."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Tan&nbsp;M. Multi-agent reinforcement learning: independent vs. cooperative agents. In: proceedings of the tenth international conference on machine learning 1993. pp. 330-37."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lauer M, Riedmiller M. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the Seventeenth International Conference on Machine Learning. Citeseer; 2000. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Monahan&nbsp;GE. State of the art—a survey of partially observable Markov decision processes: theory, models, and algorithms. Management science 1982;28:1-16."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Oroojlooyjadid A, Hajinezhad D. A review of cooperative multi-agent deep reinforcement learning. CoRR 2019;abs/1908.03963. Available from: http://arxiv.org/abs/1908.03963.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Bernstein&nbsp;DS, Givan&nbsp;R, Immerman&nbsp;N, Zilberstein&nbsp;S. The complexity of decentralized control of Markov decision processes. Mathematics of operations research 2002;27:819-40."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Omidshafiei S, Pazis J, Amato C, How JP, Vian J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 2681–90. Available from: https://proceedings.mlr.press/v70/omidshafiei17a.html.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Han&nbsp;Y, Gmytrasiewicz&nbsp;P. 
Ipomdp-net: A deep neural network for partially observable multi-agent planning using interactive pomdps. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33 2019. pp. 6062-69."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Karkus P, Hsu D, Lee WS. QMDP-Net: Deep learning for planning under partial observability; 2017. Available from: https://arxiv.org/abs/1703.06692.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mao&nbsp;W, Zhang&nbsp;K, Miehling&nbsp;E, Başar&nbsp;T. Information state embedding in partially observable cooperative multi-agent reinforcement learning. In: 2020 59th IEEE Conference on Decision and Control (CDC) 2020. pp. 6124-31."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mao H, Zhang Z, Xiao Z, Gong Z. Modelling the dynamic joint policy of teammates with attention multi-agent DDPG. CoRR 2018;abs/1811.07029. Available from: http://arxiv.org/abs/1811.07029.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lee&nbsp;HR, Lee&nbsp;T. Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response. European Journal of Operational Research 2021;291:296-308."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Sukhbaatar S, szlam a, Fergus R. Learning multiagent communication with backpropagation. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 29. Curran Associates, Inc.; 2016. Available from: https://proceedings.neurips.cc/paper/2016/file/55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Foerster JN, Assael YM, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. CoRR 2016;abs/1605.06676. 
Available from: http://arxiv.org/abs/1605.06676.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Buşoniu&nbsp;L, Babuška&nbsp;R, De Schutter&nbsp;B. Multi-agent reinforcement learning: an overview. Innovations in multiagent systems and applications 1 2010:183-221."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Hu&nbsp;Y, Hua&nbsp;Y, Liu&nbsp;W, Zhu&nbsp;J. Reward shaping based federated reinforcement learning. IEEE Access 2021;9:67259-67."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Anwar A, Raychowdhury A. Multi-task federated reinforcement learning with adversaries. CoRR 2021;abs/2103.06473. Available from: https://arxiv.org/abs/2103.06473.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Wang&nbsp;X, Han&nbsp;Y, Wang&nbsp;C, et al. In-edge AI: intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Network 2019;33:156-65."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Wang&nbsp;X, Li&nbsp;R, Wang&nbsp;C, et al. Attention-weighted federated deep reinforcement learning for device-to-device assisted heterogeneous collaborative edge caching. IEEE Journal on Selected Areas in Communications 2021;39:154-69."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhang&nbsp;M, Jiang&nbsp;Y, Zheng&nbsp;FC, Bennis&nbsp;M, You&nbsp;X. Cooperative edge caching via federated deep reinforcement learning in Fog-RANs. In: 2021 IEEE International Conference on Communications Workshops (ICC Workshops) 2021. pp. 1-6."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Majidi&nbsp;F, Khayyambashi&nbsp;MR, Barekatain&nbsp;B. HFDRL: an intelligent dynamic cooperate cashing method based on hierarchical federated deep reinforcement learning in edge-enabled IoT. 
IEEE Internet of Things Journal 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhao&nbsp;L, Ran&nbsp;Y, Wang&nbsp;H, Wang&nbsp;J, Luo&nbsp;J. Towards cooperative caching for vehicular networks with multi-level federated reinforcement learning. In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhu&nbsp;Z, Wan&nbsp;S, Fan&nbsp;P, Letaief&nbsp;KB. Federated multi-agent actor-critic learning for age sensitive mobile edge computing. IEEE Internet of Things Journal 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Yu S, Chen X, Zhou Z, Gong X, Wu D. When deep reinforcement learning meets federated learning: intelligent multi-timescale resource management for multi-access edge computing in 5G ultra dense network. arXiv:200910601 [cs] 2020 Sep. ArXiv: 2009.10601. Available from: http://arxiv.org/abs/2009.10601.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Tianqing&nbsp;Z, Zhou&nbsp;W, Ye&nbsp;D, Cheng&nbsp;Z, Li&nbsp;J. Resource allocation in IoT edge computing via concurrent federated reinforcement learning. IEEE Internet of Things Journal 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Huang&nbsp;H, Zeng&nbsp;C, Zhao&nbsp;Y, et al. Scalable orchestration of service function chains in NFV-enabled networks: a federated reinforcement learning approach. IEEE Journal on Selected Areas in Communications 2021;39:2558-71."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Liu&nbsp;YJ, Feng&nbsp;G, Sun&nbsp;Y, Qin&nbsp;S, Liang&nbsp;YC. Device association for RAN slicing based on hybrid federated deep reinforcement learning. IEEE Transactions on Vehicular Technology 2020;69:15731-45."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Wang&nbsp;G, Dang&nbsp;CX, Zhou&nbsp;Z. 
Measure Contribution of participants in federated learning. In: 2019 IEEE International Conference on Big Data (Big Data) 2019. pp. 2597-604."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Cao&nbsp;Y, Lien&nbsp;SY, Liang&nbsp;YC, Chen&nbsp;KC. Federated deep reinforcement learning for user access control in open radio access networks. In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhang&nbsp;L, Yin&nbsp;H, Zhou&nbsp;Z, Roy&nbsp;S, Sun&nbsp;Y. Enhancing WiFi multiple access performance with federated deep reinforcement learning. In: 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall) 2020. pp. 1-6."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Xu&nbsp;M, Peng&nbsp;J, Gupta&nbsp;BB, et al. Multi-agent federated reinforcement learning for secure incentive mechanism in intelligent cyber-physical systems. IEEE Internet of Things Journal 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhang&nbsp;X, Peng&nbsp;M, Yan&nbsp;S, Sun&nbsp;Y. Deep-reinforcement-learning-based mode selection and resource allocation for cellular V2X communications. IEEE Internet of Things Journal 2020;7:6380-91."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Kwon&nbsp;D, Jeon&nbsp;J, Park&nbsp;S, Kim&nbsp;J, Cho&nbsp;S. Multiagent DDPG-based deep learning for smart ocean federated learning IoT networks. IEEE Internet of Things Journal 2020;7:9895-903."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Liang X, Liu Y, Chen T, Liu M, Yang Q. Federated transfer reinforcement learning for autonomous driving. arXiv:191006001 [cs] 2019 Oct. ArXiv: 1910.06001. Available from: http://arxiv.org/abs/1910.06001.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lim HK, Kim JB, Heo JS, Han YH. 
Federated reinforcement learning for training control policies on multiple IoT devices. Sensors 2020 Mar;20:1359. Available from: https://www.mdpi.com/1424-8220/20/5/1359.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lim&nbsp;HK, Kim&nbsp;JB, Ullah&nbsp;I, Heo&nbsp;JS, Han&nbsp;YH. Federated reinforcement learning acceleration method for precise control of multiple devices. IEEE Access 2021;9:76296-306."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mowla&nbsp;NI, Tran&nbsp;NH, Doh&nbsp;I, Chae&nbsp;K. AFRL: Adaptive federated reinforcement learning for intelligent jamming defense in FANET. Journal of Communications and Networks 2020;22:244-58."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Nguyen&nbsp;TG, Phan&nbsp;TV, Hoang&nbsp;DT, Nguyen&nbsp;TN, So-In&nbsp;C. Federated deep reinforcement learning for traffic monitoring in SDN-Based IoT networks. IEEE Transactions on Cognitive Communications and Networking 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Wang&nbsp;X, Garg&nbsp;S, Lin&nbsp;H, et al. Towards accurate anomaly detection in industrial internet-of-things using hierarchical federated learning. IEEE Internet of Things Journal 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lee&nbsp;S, Choi&nbsp;DH. Federated reinforcement learning for energy management of multiple smart homes with distributed energy resources. IEEE Transactions on Industrial Informatics 2020:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Samet H. The quadtree and related hierarchical data structures. ACM Comput Surv 1984;16:187–260. Available from: https://doi.org/10.1145/356924.356930.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Abdel-Aziz&nbsp;MK, Samarakoon&nbsp;S, Perfecto&nbsp;C, Bennis&nbsp;M. 
Cooperative perception in vehicular networks using multi-agent reinforcement learning. In: 2020 54th Asilomar Conference on Signals, Systems, and Computers 2020. pp. 408-12."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Wang H, Kaplan Z, Niu D, Li B. Optimizing federated learning on Non-IID data with reinforcement learning. In: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. Toronto, ON, Canada: IEEE; 2020. pp. 1698–707. Available from: https://ieeexplore.ieee.org/document/9155494/.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhang&nbsp;P, Gan&nbsp;P, Aujla&nbsp;GS, Batth&nbsp;RS. Reinforcement learning for edge device selection using social attribute perception in industry 4.0. IEEE Internet of Things Journal 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhan&nbsp;Y, Li&nbsp;P, Leijie&nbsp;W, Guo&nbsp;S. L4L: experience-driven computational resource control in federated learning. IEEE Transactions on Computers 2021:1-1."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Dong&nbsp;Y, Gan&nbsp;P, Aujla&nbsp;GS, Zhang&nbsp;P. RA-RL: reputation-aware edge device selection method based on reinforcement learning. In: 2021 IEEE 22nd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM) 2021. pp. 348-53."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Sahu AK, Li T, Sanjabi M, et al. On the convergence of federated optimization in heterogeneous networks. CoRR 2018;abs/1812.06127. Available from: http://arxiv.org/abs/1812.06127.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Chen&nbsp;M, Poor&nbsp;HV, Saad&nbsp;W, Cui&nbsp;S. Convergence time optimization for federated learning over wireless networks. 
IEEE Transactions on Wireless Communications 2021;20:2457-71."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Li X, Huang K, Yang W, Wang S, Zhang Z. On the convergence of fedAvg on Non-IID data; 2020. Available from: https://arxiv.org/abs/1907.02189?context=stat.ML.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Bonawitz KA, Eichner H, Grieskamp W, et al. Towards federated learning at scale: system design. CoRR 2019;abs/1902.01046. Available from: http://arxiv.org/abs/1902.01046.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529–33. Available from: https://doi.org/10.1038/nature14236.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning; 2019. Available from: https://arxiv.org/abs/1509.02971.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Lyu L, Yu H, Yang Q. Threats to federated learning: a survey. CoRR 2020;abs/2003.02133. Available from: https://arxiv.org/abs/2003.02133.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Fung C, Yoon CJM, Beschastnikh I. Mitigating sybils in federated learning poisoning. CoRR 2018;abs/1808.04866. Available from: http://arxiv.org/abs/1808.04866.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Anwar&nbsp;A, Raychowdhury&nbsp;A. Multi-task federated reinforcement learning with adversaries 2021."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Zhu L, Liu Z, Han S. Deep leakage from gradients. CoRR 2019;abs/1906.08935. Available from: http://arxiv.org/abs/1906.08935.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Nishio&nbsp;T, Yonetani&nbsp;R. 
Client Selection for federated learning with heterogeneous resources in mobile edge. In: ICC 2019-2019 IEEE International Conference on Communications (ICC) 2019. pp. 1-7."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Yang T, Andrew G, Eichner H, et al. Applied federated learning: improving google keyboard query suggestions. CoRR 2018;abs/1812.02903. Available from: http://arxiv.org/abs/1812.02903.."><meta data-n-head="ssr" name="citation_reference" content="citation_title=Yu H, Liu Z, Liu Y, et al. A fairness-aware incentive scheme for federated learning. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. AIES ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 393–399. Available from: https://doi.org/10.1145/3375627.3375840.."><meta data-n-head="ssr" name="citation_journal_title" content="Intelligence & Robotics"><meta data-n-head="ssr" name="citation_publisher" content="OAE Publishing Inc."><meta data-n-head="ssr" name="citation_title" content="Federated reinforcement learning: techniques, applications, and open challenges"><meta data-n-head="ssr" name="citation_publication_date" content="2021/10/12"><meta data-n-head="ssr" name="citation_online_date" content="2021/10/12"><meta data-n-head="ssr" name="citation_doi" content="10.20517/ir.2021.02"><meta data-n-head="ssr" name="citation_volume" content="1"><meta data-n-head="ssr" name="citation_issue" content="1"><meta data-n-head="ssr" name="citation_firstpage" content="18"><meta data-n-head="ssr" name="citation_lastpage" content="57"><meta data-n-head="ssr" name="citation_author" content="Jiaju Qi"><meta data-n-head="ssr" name="citation_author" content="Qihao Zhou"><meta data-n-head="ssr" name="citation_author" content="Lei Lei"><meta data-n-head="ssr" name="citation_author" content="Kan Zheng"><meta data-n-head="ssr" name="prism.issn" content="ISSN 2770-3541 (Online)"><meta data-n-head="ssr" name="prism.publicationName" content="OAE Publishing 
Inc."><meta data-n-head="ssr" name="prism.publicationDate" content="2021-10-12"><meta data-n-head="ssr" name="prism.volume" content="1"><meta data-n-head="ssr" name="prism.section" content="Review"><meta data-n-head="ssr" name="prism.startingPag" content="18"><meta data-n-head="ssr" name="prism.url" content="https://www.oaepublish.com/articles/ir.2021.02"><meta data-n-head="ssr" name="prism.doi" content="doi:10.20517/ir.2021.02"><meta data-n-head="ssr" name="citation_journal_abbrev" content="ir"><meta data-n-head="ssr" name="citation_article_type" content="Review"><meta data-n-head="ssr" name="citation_language" content="en"><meta data-n-head="ssr" name="citation_doi" content="10.20517/ir.2021.02"><meta data-n-head="ssr" name="citation_id" content="ir.2021.02"><meta data-n-head="ssr" name="citation_issn" content="ISSN 2770-3541 (Online)"><meta data-n-head="ssr" name="citation_publication_date" content="2021-10-12"><meta data-n-head="ssr" name="citation_author_institution" content="Correspondence Address: Dr. Lei Lei, School of Engineering, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada. E-mail: leil@uoguelph.ca"><meta data-n-head="ssr" name="citation_pdf_url" content="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.pdf"><meta data-n-head="ssr" name="citation_fulltext_html_url" content="https://www.oaepublish.com/articles/ir.2021.02"><meta data-n-head="ssr" name="fulltext_pdf" content="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.pdf"><meta data-n-head="ssr" name="twitter:type" content="article"><meta data-n-head="ssr" name="twitter:title" content="Federated reinforcement learning: techniques, applications, and open challenges"><meta data-n-head="ssr" name="twitter:description" content="This paper presents a comprehensive survey of federated reinforcement learning (FRL), an emerging and promising field in reinforcement learning (RL). 
Starting with a tutorial on federated learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e., horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL). We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL."><meta data-n-head="ssr" name="og:url" content="https://www.oaepublish.com/articles/ir.2021.02"><meta data-n-head="ssr" name="og:type" content="article"><meta data-n-head="ssr" name="og:site_name" content="Intelligence & Robotics"><meta data-n-head="ssr" name="og:title" content="Federated reinforcement learning: techniques, applications, and open challenges"><meta data-n-head="ssr" name="og:description" content="This paper presents a comprehensive survey of federated reinforcement learning (FRL), an emerging and promising field in reinforcement learning (RL). Starting with a tutorial on federated learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e., horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL). 
We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL."><title>Federated reinforcement learning: techniques, applications, and open challenges</title><link data-n-head="ssr" rel="icon" type="image/x-icon" href="/favicon.ico"><link data-n-head="ssr" rel="canonical" href="https://www.oaepublish.com/articles/ir.2021.02"><script data-n-head="ssr" src="https://accounts.google.com/gsi/client" async></script><script data-n-head="ssr" src="https://g.oaes.cc/oae/dist/relijs.js" async></script><script data-n-head="ssr" src="https://www.googletagmanager.com/gtag/js?id=G-FM6KBJGRBV" async></script><link rel="preload" href="https://g.oaes.cc/oae/nuxt/b06ddfb.js" as="script"><link rel="preload" href="https://g.oaes.cc/oae/nuxt/0a3b980.js" as="script"><link rel="preload" href="https://g.oaes.cc/oae/nuxt/css/8176b15.css" as="style"><link rel="preload" href="https://g.oaes.cc/oae/nuxt/3e8004d.js" as="script"><link rel="preload" href="https://g.oaes.cc/oae/nuxt/css/f3a19d3.css" as="style"><link rel="preload" href="https://g.oaes.cc/oae/nuxt/b19d7ea.js" as="script"><link rel="preload" href="https://g.oaes.cc/oae/nuxt/css/52ac674.css" as="style"><link rel="preload" href="https://g.oaes.cc/oae/nuxt/d1923ac.js" as="script"><link rel="stylesheet" href="https://g.oaes.cc/oae/nuxt/css/8176b15.css"><link rel="stylesheet" href="https://g.oaes.cc/oae/nuxt/css/f3a19d3.css"><link rel="stylesheet" href="https://g.oaes.cc/oae/nuxt/css/52ac674.css"> </head> <body > <div data-server-rendered="true" id="__nuxt"><!----><div id="__layout"><div data-fetch-key="data-v-0baa1603:0" 
data-v-0baa1603><div class="PcComment" data-v-43aad25a data-v-0baa1603><div class="ipad_bg" style="display:none;" data-v-43aad25a></div> <div class="head_top" data-v-43aad25a><div class="wrapper head_box" data-v-43aad25a><span class="qk_jx" data-v-43aad25a><img src="https://i.oaes.cc/upload/journal_logo/ir.png" alt data-v-43aad25a></span> <a href="/ir" class="qk_a_name" data-v-43aad25a><span class="title font20" data-v-43aad25a>Intelligence & Robotics</span></a> <i class="el-icon-caret-right sjbtn" style="color:rgb(0,71,187);" data-v-43aad25a></i> <div class="top_img" data-v-43aad25a><a href="https://www.scopus.com/sourceid/21101199351" target="_blank" data-v-43aad25a><img src="https://i.oaes.cc/uploads/20240813/49390c7e86ab40a58ee862e8c1af65ba.png" alt data-v-43aad25a></a><a href="" target="_blank" data-v-43aad25a><img src="https://i.oaes.cc/uploads/20240506/ea3d9071c35b4bf3982ffe25f1083620.png" alt data-v-43aad25a></a></div> <div class="oae_menu_box" data-v-43aad25a><a href="/alljournals" data-v-43aad25a><span data-v-43aad25a>All Journals</span></a></div> <span class="search" data-v-43aad25a><i class="icon-search icon_right font24" data-v-43aad25a></i> <span data-v-43aad25a>Search</span></span> <span class="go_oae" data-v-43aad25a><a href="https://oaemesas.com/login?JournalId=ir" target="_blank" data-v-43aad25a><i class="icon-login-line icon_right font24" data-v-43aad25a></i> <span data-v-43aad25a>Log In</span></a></span></div></div> <div class="cg" style="height: 41px" data-v-43aad25a></div> <!----> <div class="head_text" style="border-bottom:3px solid rgb(0,71,187);" data-v-43aad25a><div class="head_search wrapper" style="display:none;" data-v-43aad25a><div class="box_btn" data-v-43aad25a><div class="qk_miss" data-v-43aad25a><img src="https://i.oaes.cc/uploads/20231121/59802903b17e4eebae240e004311d193.jpg" alt class="qk_fm" data-v-43aad25a> <div class="miss_right" data-v-43aad25a><div class="miss_btn" data-v-43aad25a><span data-v-43aad25a><span class="font_b" 
data-v-43aad25a>Editor-in-Chief:</span> Simon X. Yang</span></div> <div class="miss_btn" data-v-43aad25a><div class="text_index" data-v-43aad25a><span class="font_b" data-v-43aad25a>Indexing: </span> <span data-v-43aad25a><a href="https://www.oaepublish.com/news/ir.852" target="_blank" data-v-43aad25a>ESCI</a><span class="xing_d" data-v-43aad25a>, </span></span><span data-v-43aad25a><a href="https://www.scopus.com/sourceid/21101199351" target="_blank" data-v-43aad25a>Scopus</a><span class="xing_d" data-v-43aad25a>, </span></span><span data-v-43aad25a><a href="https://scholar.google.com.hk/citations?view_op=list_works&hl=zh-CN&hl=zh-CN&user=-Hx5OVYAAAAJ" target="_blank" data-v-43aad25a>Google Scholar</a><span class="xing_d" data-v-43aad25a>, </span></span><span data-v-43aad25a><a href="https://app.dimensions.ai/discover/publication?and_facet_source_title=jour.1423782" target="_blank" data-v-43aad25a>Dimensions</a><span class="xing_d" data-v-43aad25a>, </span></span><span data-v-43aad25a><a href="https://www.lens.org/lens/search/scholar/list?p=0&n=10&s=date_published&d=%2B&f=false&e=false&l=en&authorField=author&dateFilterField=publishedYear&orderBy=%2Bdate_published&presentation=false&preview=true&stemmed=true&useAuthorId=false&publicationType.must=journal%20article&sourceTitle.must=Intelligence%20%26%20Robotics&publisher.must=OAE%20Publishing%20Inc." 
target="_blank" data-v-43aad25a>Lens</a><span class="xing_d" data-v-43aad25a>, </span></span></div> <div class="text_jour" data-v-43aad25a><!----><!----><span class="font_b" data-v-43aad25a>Submission to first decision: </span><span data-v-43aad25a>40 days</span></div></div> <div class="btn_box_t" data-v-43aad25a><button type="button" class="el-button el-button--text " data-v-43aad25a><!----><!----><span><a href="https://f.oaes.cc/index_ad/flyer/IR-flyer.pdf" target="_blank" data-v-43aad25a><i class="icon-download font20" data-v-43aad25a></i> Journal Flyer</a></span></button><!----></div></div></div> <div class="grid-content bg-purple search_box" data-v-43aad25a><span data-v-43aad25a><div role="tooltip" id="el-popover-178" aria-hidden="true" class="el-popover el-popper" style="width:undefinedpx;display:none;"><!----> <!----> <div class="search_hot" data-v-43aad25a><div class="title" data-v-43aad25a><span class="text" data-v-43aad25a>Hot Keywords</span></div> <div class="hot_list" data-v-43aad25a><span data-v-43aad25a>Intelligence</span><span data-v-43aad25a>Robotics</span><span data-v-43aad25a>Reinforcement Learning</span><span data-v-43aad25a>Machine Learning</span><span data-v-43aad25a>Unmanned Vehicles</span><span data-v-43aad25a>UAV</span></div></div></div><span class="el-popover__reference-wrapper"><div class="el-input el-input--suffix" data-v-43aad25a><!----><input type="text" autocomplete="off" placeholder="Keywords/Title/Author Name/DOI" class="el-input__inner"><!----><span class="el-input__suffix"><span class="el-input__suffix-inner"><i class="icon-search font24 el-input__icon" data-v-43aad25a></i><!----><!----><!----><!----></span><!----></span><!----><!----></div></span></span></div></div></div> <div class="head_menu" data-v-43aad25a><div class="wrapper" data-v-43aad25a><div class="menu_box" data-v-43aad25a><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" tabindex="-1" 
exact="" class="el-menu-item" style="color:;border-bottom-color:transparent;background-color:;" data-v-43aad25a><a href="/ir" data-v-43aad25a>Home</a></li></ul><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" aria-haspopup="true" exact="" class="el-submenu" data-v-43aad25a><div class="el-submenu__title" style="border-bottom-color:transparent;color:;background-color:;">About<i class="el-submenu__icon-arrow el-icon-arrow-down"></i></div><div class="el-menu--horizontal" style="display:none;"><ul role="menu" class="el-menu el-menu--popup el-menu--popup-" style="background-color:;"> <div class="menuItem" data-v-a0c70e7e data-v-43aad25a><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/about_the_journal" data-v-a0c70e7e>About the Journal</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/aims_and_scope" data-v-a0c70e7e>Aims and Scope</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/editorial_policies" data-v-a0c70e7e>Editorial Policies</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/editor" data-v-a0c70e7e>Editorial Board</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/junior_editorial_board" data-v-a0c70e7e>Junior Editorial Board</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/awards" data-v-a0c70e7e>Journal Awards</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/news" data-v-a0c70e7e>News</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" 
style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/partners" data-v-a0c70e7e>Partners</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/advertise" data-v-a0c70e7e>Advertise</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/contact_us" data-v-a0c70e7e>Contact Us</a></li></div></ul></div></li></ul><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" aria-haspopup="true" exact="" class="el-submenu" data-v-43aad25a><div class="el-submenu__title" style="border-bottom-color:transparent;color:;background-color:;">Publish with us<i class="el-submenu__icon-arrow el-icon-arrow-down"></i></div><div class="el-menu--horizontal" style="display:none;"><ul role="menu" class="el-menu el-menu--popup el-menu--popup-" style="background-color:;"> <div class="menuItem" data-v-a0c70e7e data-v-43aad25a><li role="menuitem" aria-haspopup="true" class="el-submenu" data-v-a0c70e7e><div class="el-submenu__title" style="border-bottom-color:transparent;color:;background-color:;">For Authors<i class="el-submenu__icon-arrow el-icon-arrow-right"></i></div><div class="el-menu--horizontal" style="display:none;"><ul role="menu" class="el-menu el-menu--popup el-menu--popup-" style="background-color:;"> <div class="menuItem ts_item" data-v-a0c70e7e data-v-a0c70e7e><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/author_instructions" data-v-a0c70e7e>Author Instructions</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/article_processing_charges" data-v-a0c70e7e>Article Processing Charges</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a 
href="/ir/editorial_process" data-v-a0c70e7e>Editorial Process</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/manuscript_templates" data-v-a0c70e7e>Manuscript Templates</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="https://oaemesas.com/login?JournalId=ir" target="_blank" data-v-a0c70e7e>Submit a Manuscript</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/video_abstract_guidelines" data-v-a0c70e7e>Video Abstract Guidelines</a></li></div></ul></div></li><li role="menuitem" aria-haspopup="true" class="el-submenu" data-v-a0c70e7e><div class="el-submenu__title" style="border-bottom-color:transparent;color:;background-color:;">For Reviewers<i class="el-submenu__icon-arrow el-icon-arrow-right"></i></div><div class="el-menu--horizontal" style="display:none;"><ul role="menu" class="el-menu el-menu--popup el-menu--popup-" style="background-color:;"> <div class="menuItem ts_item" data-v-a0c70e7e data-v-a0c70e7e><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/peer_review_guidelines" data-v-a0c70e7e>Peer Review Guidelines</a></li></div></ul></div></li></div></ul></div></li></ul><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" aria-haspopup="true" exact="" class="el-submenu" data-v-43aad25a><div class="el-submenu__title" style="border-bottom-color:transparent;color:;background-color:;">Articles<i class="el-submenu__icon-arrow el-icon-arrow-down"></i></div><div class="el-menu--horizontal" style="display:none;"><ul role="menu" class="el-menu el-menu--popup el-menu--popup-" style="background-color:;"> <div class="menuItem" data-v-a0c70e7e data-v-43aad25a><li role="menuitem" tabindex="-1" 
class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/articles" data-v-a0c70e7e>All Articles</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/articles_videos" data-v-a0c70e7e>Articles With Video Abstracts</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/video_abstract_guidelines" data-v-a0c70e7e>Video Abstract Guidelines</a></li></div></ul></div></li></ul><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" aria-haspopup="true" exact="" class="el-submenu" data-v-43aad25a><div class="el-submenu__title" style="border-bottom-color:transparent;color:;background-color:;">Special Issues<i class="el-submenu__icon-arrow el-icon-arrow-down"></i></div><div class="el-menu--horizontal" style="display:none;"><ul role="menu" class="el-menu el-menu--popup el-menu--popup-" style="background-color:;"> <div class="menuItem" data-v-a0c70e7e data-v-43aad25a><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/special_issues" data-v-a0c70e7e>All Special Issues</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/ongoing_special_issues" data-v-a0c70e7e>Ongoing Special Issues</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/completed_special_issues" data-v-a0c70e7e>Completed Special Issues</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/closed_special_issues" data-v-a0c70e7e>Closed Special Issue</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/special_issues_ebooks" 
data-v-a0c70e7e>Special Issue Ebooks</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/special_issue_guidelines" data-v-a0c70e7e>Special Issue Guidelines</a></li></div></ul></div></li></ul><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" tabindex="-1" exact="" class="el-menu-item" style="color:;border-bottom-color:transparent;background-color:;" data-v-43aad25a><a href="/ir/volumes" data-v-43aad25a>Volumes</a></li></ul><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" tabindex="-1" exact="" class="el-menu-item" style="color:;border-bottom-color:transparent;background-color:;" data-v-43aad25a><a href="/ir/pre_onlines" data-v-43aad25a>Pre-onlines</a></li></ul><ul role="menubar" class="el-menu-demo el-menu--horizontal el-menu" style="background-color:;" data-v-43aad25a><li role="menuitem" aria-haspopup="true" exact="" class="el-submenu" data-v-43aad25a><div class="el-submenu__title" style="border-bottom-color:transparent;color:;background-color:;">Features<i class="el-submenu__icon-arrow el-icon-arrow-down"></i></div><div class="el-menu--horizontal" style="display:none;"><ul role="menu" class="el-menu el-menu--popup el-menu--popup-" style="background-color:;"> <div class="menuItem" data-v-a0c70e7e data-v-43aad25a><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/webinars" data-v-a0c70e7e>Webinars</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/academic_talks" data-v-a0c70e7e>Academic Talks</a></li><li role="menuitem" tabindex="-1" class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/videos" data-v-a0c70e7e>Videos</a></li><li role="menuitem" tabindex="-1" 
class="el-menu-item" style="color:;background-color:;" data-v-a0c70e7e><a href="/ir/interviews" data-v-a0c70e7e>Interviews</a></li></div></ul></div></li></ul></div></div> <div class="wrapper menu_ipad" data-v-43aad25a><div class="nav_box" data-v-43aad25a><div class="nav_list colorH_ir" data-v-43aad25a><a href="/ir" class="tab_item" data-v-43aad25a><span class="tab_span" data-v-43aad25a>Home</span></a> <a href="/ir/articles" class="tab_item nuxt-link-active" data-v-43aad25a><span class="tab_span" data-v-43aad25a>Articles</span></a> <a href="/ir/special_issues" class="tab_item" data-v-43aad25a><span class="tab_span" data-v-43aad25a>Special Issues</span></a> <a href="/ir/volumes" class="tab_item" data-v-43aad25a><span class="tab_span" data-v-43aad25a>Volumes</span></a> <a href="/ir/webinars" class="tab_item" data-v-43aad25a><span class="tab_span" data-v-43aad25a>Webinars</span></a> <a href="/ir/videos" class="tab_item" data-v-43aad25a><span class="tab_span" data-v-43aad25a>Videos</span></a></div></div> <button type="button" class="el-button el-button--text" data-v-43aad25a><!----><i class="icon-nav-line"></i><span>Menu</span></button> <!----></div></div></div></div> <div class="MoComment" data-v-55208cba data-v-0baa1603><div class="head_top" data-v-55208cba><a href="/" class="nuxt-link-active" data-v-55208cba><span class="qk_jx" data-v-55208cba><img src="https://i.oaes.cc/upload/journal_logo/ir.png" alt="" data-v-55208cba></span></a> <div class="head_left" data-v-55208cba><a href="/" class="tab_item nuxt-link-active" data-v-55208cba><span class="title font18" data-v-55208cba>Intelligence & Robotics</span></a> <i class="el-icon-caret-right sjbtn" style="color:rgb(0,71,187);" data-v-55208cba></i></div> <div class="head_right" data-v-55208cba><a href="/ir/search" class="search" data-v-55208cba><span data-v-55208cba>Search</span></a> <span class="go_oae" style="background:rgb(0,71,187);" data-v-55208cba><a href="https://oaemesas.com/login?JournalId=ir" target="_blank" 
data-v-55208cba><span data-v-55208cba>Submit</span></a></span></div></div> <div class="cg" style="height: 50px" data-v-55208cba></div> <div class="fix_box" style="display:none;" data-v-55208cba><div class="miss_right" data-v-55208cba><div class="flex_tit" data-v-55208cba><div class="top_img" data-v-55208cba><a href="https://www.scopus.com/sourceid/21101199351" target="_blank" data-v-55208cba><img src="https://i.oaes.cc/uploads/20240813/49390c7e86ab40a58ee862e8c1af65ba.png" alt data-v-55208cba></a><a href="" target="_blank" data-v-55208cba><img src="https://i.oaes.cc/uploads/20240506/ea3d9071c35b4bf3982ffe25f1083620.png" alt data-v-55208cba></a></div></div> <div class="miss_btn wid70" data-v-55208cba><div data-v-55208cba><div data-v-55208cba><span class="font_b" data-v-55208cba>Editor-in-Chief:</span> Simon X. Yang</div></div></div> <div class="miss_btn" style="width:calc(100% - 74px);" data-v-55208cba><!----></div> <div data-v-55208cba><!----> <div data-v-55208cba><span class="font_b" data-v-55208cba>Submission to first decision: </span>40 days</div></div></div></div> <div class="fix_box" data-v-55208cba><div class="navigation colorH_ir" style="border-bottom:2px solid rgb(0,71,187);" data-v-55208cba><div class="nav_box" data-v-55208cba><div class="nav_list" data-v-55208cba><a href="/ir" class="tab_item" data-v-55208cba><span class="tab_span" data-v-55208cba>Home</span></a> <a href="/ir/articles" class="tab_item nuxt-link-active" data-v-55208cba><span class="tab_span" data-v-55208cba>Articles</span></a> <a href="/ir/special_issues" class="tab_item" data-v-55208cba><span class="tab_span" data-v-55208cba>Special Issues</span></a> <a href="/ir/volumes" class="tab_item" data-v-55208cba><span class="tab_span" data-v-55208cba>Volumes</span></a> <a href="/ir/webinars" class="tab_item" data-v-55208cba><span class="tab_span" data-v-55208cba>Webinars</span></a> <a href="/ir/videos" class="tab_item" data-v-55208cba><span class="tab_span" 
data-v-55208cba>Videos</span></a></div></div> <button type="button" class="el-button el-button--text" data-v-55208cba><!----><!----><span><i class="icon-nav-line" data-v-55208cba></i>Menu</span></button></div></div> <!----> <!----></div> <main data-v-0baa1603><div class="article_cont" data-v-6dffe839 data-v-0baa1603><!----><!----><!----> <div id="ipad_bg" class="ipad_bg" style="display:none;" data-v-6dffe839></div> <div class="art_bread wrapper" data-v-6dffe839><div aria-label="Breadcrumb" role="navigation" class="el-breadcrumb" data-v-6dffe839><span class="el-breadcrumb__item" data-v-6dffe839><span role="link" class="el-breadcrumb__inner is-link">Home</span><span role="presentation" class="el-breadcrumb__separator"></span></span> <span class="el-breadcrumb__item" data-v-6dffe839><span role="link" class="el-breadcrumb__inner is-link">Articles</span><span role="presentation" class="el-breadcrumb__separator"></span></span> <span class="el-breadcrumb__item" data-v-6dffe839><span role="link" class="el-breadcrumb__inner">Article</span><span role="presentation" class="el-breadcrumb__separator"></span></span></div></div> <div class="fixd_top" style="display:none;" data-v-6dffe839><div class="left_art" data-v-6dffe839><!----></div> <div class="content_b" data-v-6dffe839><span class="PcComment" data-v-6dffe839>Federated reinforcement learning: techniques, applications, and open challenges</span> <a href="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325_down.pdf?v=42" data-v-6dffe839><span class="down_pdf" data-v-6dffe839><span data-v-6dffe839>Download PDF</span> <i class="el-icon-download" data-v-6dffe839></i></span></a></div> <div class="right_art" data-v-6dffe839><!----></div></div> <div class="wrapper pos_res" data-v-6dffe839><button id="mathjaxRady" data-v-6dffe839></button> <div class="line_list" data-v-6dffe839></div> <div id="art_left_b" class="art_content" data-v-6dffe839><div class="el-row" style="margin-left:-10px;margin-right:-10px;" 
data-v-6dffe839><div class="el-col el-col-24 el-col-sm-24 el-col-md-18" style="padding-left:10px;padding-right:10px;" data-v-6dffe839><div class="art_left" data-v-6dffe839><a href="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325_down.pdf?v=42" class="MoComment" data-v-6dffe839><span class="down_pdf_a" data-v-6dffe839><span data-v-6dffe839>Download PDF</span> <i class="el-icon-download" data-v-6dffe839></i></span></a> <!----> <div class="ContentJournal" data-v-6dffe839><div id="Article-content-left" class="Article-content view5287" data-v-6dffe839><div class="article_block" data-v-6dffe839><span class="font-999" data-v-6dffe839>Review</span> <span data-v-6dffe839> | </span> <span class="block-f17452" data-v-6dffe839>Open Access</span> <span data-v-6dffe839> | </span> <span class="font-999" data-v-6dffe839>11 Oct 2021</span></div> <div class="tit_box mgt30" data-v-6dffe839><h1 id="art_title" class="art_title2" data-v-6dffe839><span data-v-6dffe839>Federated reinforcement learning: techniques, applications, and open challenges</span><!----></h1> <div class="art_seltte" data-v-6dffe839><i class="iconfont icon-yuyan" data-v-6dffe839></i> <div class="el-select selete_language" data-v-6dffe839><!----><div class="el-input el-input--suffix"><!----><input type="text" readonly="readonly" autocomplete="off" placeholder="" class="el-input__inner"><!----><span class="el-input__suffix"><span class="el-input__suffix-inner"><i class="el-select__caret el-input__icon el-icon-arrow-up"></i><!----><!----><!----><!----><!----></span><!----></span><!----><!----></div><div class="el-select-dropdown el-popper" style="min-width:;display:none;"><div class="el-scrollbar" style="display:none;"><div class="el-select-dropdown__wrap el-scrollbar__wrap el-scrollbar__wrap--hidden-default"><ul class="el-scrollbar__view el-select-dropdown__list"><!----><li class="el-select-dropdown__item" data-v-6dffe839><span>English</span></li><li class="el-select-dropdown__item" 
data-v-6dffe839><span>中文</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>Deutsch</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>Français</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>日本語</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>Русский язык</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>한국어</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>Italiano</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>Español</span></li><li class="el-select-dropdown__item" data-v-6dffe839><span>Português</span></li></ul></div><div class="el-scrollbar__bar is-horizontal"><div class="el-scrollbar__thumb" style="width:0;transform:translateX(0%);ms-transform:translateX(0%);webkit-transform:translateX(0%);"></div></div><div class="el-scrollbar__bar is-vertical"><div class="el-scrollbar__thumb" style="height:0;transform:translateY(0%);ms-transform:translateY(0%);webkit-transform:translateY(0%);"></div></div></div><p class="el-select-dropdown__empty"> No data </p></div></div></div></div> <div class="viewd_top font13" data-v-6dffe839><span class="f1" data-v-6dffe839><span style="color:#aa0c2f;" data-v-6dffe839>Views:</span> <span id="articleViewCountLeft" data-v-6dffe839>7412</span> | </span> <span class="f1" data-v-6dffe839><span style="color:#aa0c2f;" data-v-6dffe839>Downloads:</span> <span id="pdfDownloadCountLeft" data-v-6dffe839>1789</span><span data-v-6dffe839> | </span></span> <span class="f1" data-v-6dffe839><span style="color:#aa0c2f;" data-v-6dffe839>Cited:</span> <img alt="" src="" class="Crossref" data-v-6dffe839> <a href="/articles//citation/" target="_blank" style="color:#4475e1;margin-left:1px;" data-v-6dffe839>74</a> <!----></span></div> <div id=" authorString" class="article-authors" data-v-6dffe839><span class="authors_item" data-v-6dffe839><div affNumList="" data-v-dc220f24 data-v-6dffe839><span class="pos_re" 
data-v-dc220f24><div role="tooltip" id="el-popover-184" aria-hidden="true" class="el-popover el-popper" style="width:300px;display:none;"><!----><h3 class="font16 no_sup" style="color:#333;margin-bottom:20px;" data-v-dc220f24>Jiaju Qi<sup>1</sup></h3> <div class="Aff_current font14 no_sup" data-v-dc220f24><div data-v-dc220f24><div class="author_cont" data-v-dc220f24><sup>1</sup><institution>School of Engineering, University of Guelph</institution>, <addr-line>Guelph, ON N1G 2W1, Canada</addr-line>.</div></div><div data-v-dc220f24><div class="author_cont" data-v-dc220f24><sup>2</sup><institution>Intelligent Computing and Communications (IC<sup>2</sup>) Lab, Beijing University of Posts and Telecommunications</institution>, <addr-line>Beijing 100876, China</addr-line>.</div></div></div> <i class="close_btn el-icon-close" data-v-dc220f24></i> <a href="https://scholar.google.com/scholar?q=Jiaju Qi" target="_blank" data-v-dc220f24><button type="button" class="el-button el-button--primary el-button--mini" data-v-dc220f24><!----><!----><span>Google Scholar</span></button></a></div><span class="el-popover__reference-wrapper"><span class="author_name" data-v-dc220f24>Jiaju Qi<sup>1</sup></span></span></span></div> <!----> <!----> <!----> <!----> <i data-v-6dffe839> , </i></span><span class="authors_item" data-v-6dffe839><div affNumList="" data-v-dc220f24 data-v-6dffe839><span class="pos_re" data-v-dc220f24><div role="tooltip" id="el-popover-8658" aria-hidden="true" class="el-popover el-popper" style="width:300px;display:none;"><!----><h3 class="font16 no_sup" style="color:#333;margin-bottom:20px;" data-v-dc220f24>Qihao Zhou<sup>2</sup></h3> <div class="Aff_current font14 no_sup" data-v-dc220f24><div data-v-dc220f24><div class="author_cont" data-v-dc220f24><sup>1</sup><institution>School of Engineering, University of Guelph</institution>, <addr-line>Guelph, ON N1G 2W1, Canada</addr-line>.</div></div><div data-v-dc220f24><div class="author_cont" 
data-v-dc220f24><sup>2</sup><institution>Intelligent Computing and Communications (IC<sup>2</sup>) Lab, Beijing University of Posts and Telecommunications</institution>, <addr-line>Beijing 100876, China</addr-line>.</div></div></div> <i class="close_btn el-icon-close" data-v-dc220f24></i> <a href="https://scholar.google.com/scholar?q=Qihao Zhou" target="_blank" data-v-dc220f24><button type="button" class="el-button el-button--primary el-button--mini" data-v-dc220f24><!----><!----><span>Google Scholar</span></button></a></div><span class="el-popover__reference-wrapper"><span class="author_name" data-v-dc220f24>Qihao Zhou<sup>2</sup></span></span></span></div> <a href="https://orcid.org/0000-0002-1839-1439" target="_blank" data-v-6dffe839><img alt="" src="https://g.oaes.cc/oae/nuxt/img/orcid.a3b6f80.png" class="id_img" data-v-6dffe839></a> <!----> <!----> <!----> <i data-v-6dffe839> , ... </i></span><span class="authors_item" data-v-6dffe839><!----></span><span class="authors_item" data-v-6dffe839><div affNumList="" data-v-dc220f24 data-v-6dffe839><span class="pos_re" data-v-dc220f24><div role="tooltip" id="el-popover-4173" aria-hidden="true" class="el-popover el-popper" style="width:300px;display:none;"><!----><h3 class="font16 no_sup" style="color:#333;margin-bottom:20px;" data-v-dc220f24>Kan Zheng<sup>2</sup></h3> <div class="Aff_current font14 no_sup" data-v-dc220f24><div data-v-dc220f24><div class="author_cont" data-v-dc220f24><sup>1</sup><institution>School of Engineering, University of Guelph</institution>, <addr-line>Guelph, ON N1G 2W1, Canada</addr-line>.</div></div><div data-v-dc220f24><div class="author_cont" data-v-dc220f24><sup>2</sup><institution>Intelligent Computing and Communications (IC<sup>2</sup>) Lab, Beijing University of Posts and Telecommunications</institution>, <addr-line>Beijing 100876, China</addr-line>.</div></div></div> <i class="close_btn el-icon-close" data-v-dc220f24></i> <a href="https://scholar.google.com/scholar?q=Kan Zheng" 
target="_blank" data-v-dc220f24><button type="button" class="el-button el-button--primary el-button--mini" data-v-dc220f24><!----><!----><span>Google Scholar</span></button></a></div><span class="el-popover__reference-wrapper"><span class="author_name" data-v-dc220f24>Kan Zheng<sup>2</sup></span></span></span></div> <!----> <!----> <!----> <!----> <!----></span> <button type="button" class="el-button el-button--primary el-button--mini" data-v-6dffe839><!----><i class="el-icon-plus"></i><span>Show Authors</span></button></div> <div class="article-header-info" data-v-6dffe839><div data-v-6dffe839> <i>Intell Robot</i> 2021;1(1):18-57.</div> <div class="mgt5" data-v-6dffe839><a href="https://doi.org/10.20517/ir.2021.02" target="_blank" data-v-6dffe839>10.20517/ir.2021.02</a> | <span class="btn_link" data-v-6dffe839>© The Author(s) 2021.</span></div></div> <div class="top_btn_box" data-v-6dffe839><div class="btn_item" data-v-6dffe839><i class="el-icon-caret-right" data-v-6dffe839></i><span data-v-6dffe839>Author Information</span></div> <div class="btn_item" data-v-6dffe839><i class="el-icon-caret-right" data-v-6dffe839></i><span data-v-6dffe839>Article Notes</span></div> <div class="btn_item" data-v-6dffe839><i class="el-icon-caret-right" data-v-6dffe839></i><span data-v-6dffe839>Cite This Article</span></div></div> <div class="author_box" style="display:none;" data-v-6dffe839><div data-v-6dffe839><div data-v-6dffe839><sup>1</sup><institution>School of Engineering, University of Guelph</institution>, <addr-line>Guelph, ON N1G 2W1, Canada</addr-line>.</div></div><div data-v-6dffe839><div data-v-6dffe839><sup>2</sup><institution>Intelligent Computing and Communications (IC<sup>2</sup>) Lab, Beijing University of Posts and Telecommunications</institution>, <addr-line>Beijing 100876, China</addr-line>.</div></div> <div class="CorrsPlus" data-v-6dffe839><div data-v-6dffe839><span id="cirrsMail" data-v-6dffe839>Correspondence Address: Dr. 
Lei Lei, School of Engineering, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada. E-mail: <email>leil@uoguelph.ca</email></span></div></div></div> <div class="notes_box" style="display:none;" data-v-6dffe839><div class="articleDate mag_top10" data-v-6dffe839><span><b>Received:</b> 24 Aug 2021 | </span><span><b>First Decision:</b> 14 Sep 2021 | </span><span><b>Revised:</b> 21 Sep 2021 | </span><span><b>Accepted:</b> 22 Sep 2021 | </span><span><b>Published:</b> 12 Oct 2021</span></div> <div class="articleDate" data-v-6dffe839><span><b>Academic Editor:</b> Simon X. Yang | </span><span><b>Copy Editor:</b> Xi-Jun Chen | </span><span><b>Production Editor:</b> Xi-Jun Chen</span></div></div> <div class="article_bg" data-v-6dffe839><h2 id="art_Abstract" data-v-6dffe839>Abstract<!----></h2> <div id="seo_des" class="article_Abstract mag_btn10" data-v-6dffe839><p>This paper presents a comprehensive survey of federated reinforcement learning (FRL), an emerging and promising field in reinforcement learning (RL). Starting with a tutorial of federated learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data-privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, <i><i>i.e.</i></i>, Horizontal Federated Reinforcement Learning and vertical federated reinforcement learning (VFRL). We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. 
Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL.</p></div> <!----> <h2 id="art_Keywords" data-v-6dffe839>Keywords<!----></h2> <div class="article_Abstract" data-v-6dffe839><span data-v-6dffe839><span data-v-6dffe839>Federated Learning</span><i data-v-6dffe839>, </i></span><span data-v-6dffe839><span data-v-6dffe839>Reinforcement Learning</span><i data-v-6dffe839>, </i></span><span data-v-6dffe839><span data-v-6dffe839>Federated Reinforcement Learning</span><!----></span></div></div> <div class="MoComment" data-v-6dffe839><div class="top_banner" data-v-6dffe839><div class="oae_header" data-v-6dffe839>Author's Talk</div> <div class="line" data-v-6dffe839></div> <div class="img_box" data-v-6dffe839><img src="https://i.oaes.cc/uploads/20240205/1103dab1ab644cf2818a1cab0dd2b8ff.jpg" alt="" data-itemid="4325" data-itemhref="https://v1.oaepublish.com/files/talkvideo/4325.mp4" data-itemimg="https://i.oaes.cc/uploads/20240205/1103dab1ab644cf2818a1cab0dd2b8ff.jpg" data-v-6dffe839> <i data-itemid="4325" data-itemhref="https://v1.oaepublish.com/files/talkvideo/4325.mp4" data-itemimg="https://i.oaes.cc/uploads/20240205/1103dab1ab644cf2818a1cab0dd2b8ff.jpg" class="bo_icon" data-v-6dffe839></i></div></div> <!----> <div class="article_link" data-v-6dffe839><span data-v-6dffe839><a href="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325_down.pdf?v=42" data-v-6dffe839><b data-v-6dffe839><i class="icon-download icon_right4" data-v-6dffe839></i> Download PDF</b></a></span> <span data-v-6dffe839><i class="comment-l icon-commentl iconfont icon_right4" data-v-6dffe839></i> <!----><b data-v-6dffe839>0</b></span> <span data-v-6dffe839><span data-v-6dffe839><div role="tooltip" id="el-popover-4131" aria-hidden="true" class="el-popover el-popper" style="width:170px;display:none;"><!----><div class="icon_share" style="text-align:right;margin:0;" data-v-6dffe839><a 
href="http://pinterest.com/pin/create/button/?url=&media=&description=https://www.oaepublish.com/articles/" target="_blank" class="pinterest-sign" data-v-6dffe839><i class="iconfont icon-pinterest" data-v-6dffe839></i></a> <a href="https://www.facebook.com/sharer/sharer.php?u=https://www.oaepublish.com/articles/" target="_blank" class="facebook-sign" data-v-6dffe839><i aria-hidden="true" class="iconfont icon-facebook" data-v-6dffe839></i></a> <a href="https://twitter.com/intent/tweet?url=https://www.oaepublish.com/articles/" target="_blank" class="twitter-sign" data-v-6dffe839><i class="iconfont icon-tuite1" data-v-6dffe839></i></a> <a href="https://www.linkedin.com/shareArticle?url=https://www.oaepublish.com/articles/" target="_blank" class="linkedin-sign" data-v-6dffe839><i class="iconfont icon-linkedin" data-v-6dffe839></i></a></div> </div><span class="el-popover__reference-wrapper"><button type="button" class="el-button colorddd el-button--text el-button--mini" data-v-6dffe839><!----><!----><span><i class="icon-fenxiang iconfont icon_right4" data-v-6dffe839></i> <!----><b data-v-6dffe839>3</b></span></button></span></span></span> <span data-v-6dffe839><span class="no_zan" data-v-6dffe839><i class="icon-like-line icon_right4" data-v-6dffe839></i> <!----><i class="num_n" data-v-6dffe839><b data-v-6dffe839>13</b></i></span></span></div></div> <div id="artDivBox" class="art_cont" data-v-6dffe839><div id="sec1-1" class="article-Section"><h2 >1. Introduction</h2><p class="">As machine learning (ML) develops, it becomes capable of solving increasingly complex problems, such as image recognition, speech recognition, and semantic understanding. Despite the effectiveness of traditional ML algorithms in several areas, researchers have found that scenarios involving many parties remain difficult to handle, especially when privacy is a concern. In these cases, federated learning (FL) has attracted increasing interest among ML researchers. 
Technically, FL is a decentralized collaborative approach that allows multiple partners to train on their own data separately and build a shared model while maintaining privacy. With its innovative learning architecture and concepts, FL provides safer experience exchange services and enhances the capabilities of ML in distributed scenarios.</p><p class="">In ML, reinforcement learning (RL) is one of the branches that focuses on how individuals, <i>i.e.</i>, agents, interact with their environment and maximize some notion of cumulative reward. The process allows agents to learn to improve their behavior in a trial-and-error manner. Following a set of policies, they take actions to explore the environment and expect to be rewarded. Research on RL has been very active in recent years, and it has shown great potential in various applications, including games, robotics, and communication.</p><p class="">However, there are still many problems in the implementation of RL in practical scenarios. For example, when the action space and state space are large, the performance of agents is sensitive to the collected samples, since it is nearly impossible to explore the entire sampling space. In addition, many RL algorithms learn slowly because of low sample efficiency. Information exchange between agents can therefore greatly accelerate learning. Although distributed RL and parallel RL algorithms<sup>[<a href="#B1" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B1">1</a>-<a href="#B3" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B3">3</a>]</sup> can be used to address the above problems, they usually need to collect all the data, parameters, or gradients from each agent in a central server for model training. However, some tasks need to prevent agent information leakage and protect agent privacy during the application of RL. 
Agents’ distrust of the central server and the risk of eavesdropping on the transmission of raw data have become a major bottleneck for such RL applications. FL can not only complete information exchange while avoiding privacy disclosure, but also adapt various agents to their different environments. Another problem of RL is how to bridge the simulation-reality gap. Many RL algorithms require pre-training in simulated environments as a prerequisite for application deployment, but the simulated environments cannot accurately reflect the environments of the real world. FL can aggregate information from both environments and thus bridge the gap between them. Finally, in some cases, only partial features can be observed by each agent in RL. These features, whether observations or rewards, do not provide sufficient information to make decisions. In this case, FL makes it possible to integrate this information through aggregation.</p><p class="">Thus, the above challenges give rise to the idea of federated reinforcement learning (FRL). As FRL can be considered as an integration of FL and RL under privacy protection, several elements of RL can be presented in FL frameworks to deal with sequential decision-making tasks. For example, the three dimensions of sample, feature, and label in FL can be replaced by environment, state, and action, respectively, in FRL. 
Since FL can be divided into several categories according to the distribution characteristics of data, including horizontal federated learning (HFL) and vertical federated learning (VFL), we can similarly categorize FRL algorithms into horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL).</p><p class="">Though a few survey papers on FL<sup>[<a href="#B4" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B4">4</a>-<a href="#B6" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B6">6</a>]</sup> have been published, to the best of our knowledge, there are currently no survey papers focused on FRL. Because FRL is a relatively new technique, many researchers may be unfamiliar with it. We hope that this survey identifies the achievements of current studies and serves as a stepping stone to further research. In summary, this paper sheds light on the following aspects.</p><ul class="tipsDecimal"><li><p><i>Systematic tutorial on FRL methodology.</i> As a review focusing on FRL, this paper tries to explain FRL to researchers systematically and in detail. The definition and categories of FRL are introduced first, including the system model, algorithm process, <i>etc.</i> In order to clearly explain the frameworks of HFRL and VFRL and the difference between them, two specific cases are introduced, <i>i.e.</i>, autonomous driving and smart grid. Moreover, we comprehensively introduce the existing research on FRL’s algorithm design.</p></li><li><p><i>Comprehensive summary for FRL applications.</i> This paper collects a large number of references in the field of FRL, and provides a comprehensive and detailed investigation of the FRL applications in various areas, including edge computing, communications, control optimization, attack detection, and some other applications. 
For each reference, we discuss the authors’ research ideas and methods, and summarize how the researchers combine the FRL algorithm with the specific practical problems.</p></li><li><p><i>Open issues for future research.</i> This paper identifies several open issues for FRL as a guide for further research. The scope covers communication, privacy and security, join and exit mechanisms design, learning convergence and some other issues. We hope that they can broaden the thinking of interested researchers and provide help for further research on FRL.</p></li></ul><p class="">The organization of this paper is as follows. To quickly gain a comprehensive understanding of FRL, the paper starts with FL and RL in <a href="#sec1-2" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="sec1-2">Section 2</a> and <a href="#sec1-3" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="sec1-3">Section 3</a>, respectively, and extends the discussion further to FRL in <a href="#sec1-4" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="sec1-4">Section 4</a>. The existing applications of FRL are summarized in <a href="#sec1-5" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="sec1-5">Section 5</a>. In addition, a few open issues and future research directions for FRL are highlighted in <a href="#sec1-6" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="sec1-6">Section 6</a>. Finally, the conclusion is given in <a href="#sec1-7" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="sec1-7">Section 7</a>.</p></div><div id="sec1-2" class="article-Section"><h2 >2. Federated learning</h2><div id="sec2-1" class="article-Section"><h3 >2.1. 
Federated learning definition and basics</h3><p class="">In general, FL is an ML algorithmic framework that allows multiple parties to perform ML under the requirements of privacy protection, data security, and regulations<sup>[<a href="#B7" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B7">7</a>]</sup>. In the FL architecture, model construction includes two processes: model training and model inference. During training, the parties can exchange information about the model, but not the data itself, so that data privacy is not compromised. An individual party or multiple parties can possess and maintain the trained model. In the process of model aggregation, more data instances collected from various parties contribute to updating the model. As the last step, a fair value-distribution mechanism should be used to share the profits obtained by the collaborative model<sup>[<a href="#B8" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B8">8</a>]</sup>. A well-designed mechanism makes the federation sustainable. Aiming to build a joint ML model without sharing local data, FL involves technologies from different research fields such as distributed systems, information communication, ML and cryptography<sup>[<a href="#B9" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B9">9</a>]</sup>. As a result of these techniques, FL has the following characteristics:</p><ul class="tipsDisc"><li><p>Distribution. There are two or more parties that hope to jointly build a model to tackle similar tasks. Each party holds independent data and would like to use it for model training.</p></li><li><p>Data protection. The data held by each party does not need to be sent to the others during the training of the model. The learned profits or experiences are conveyed through model parameters that do not expose private data.</p></li><li><p>Secure communication. 
The model can be transmitted between parties with the support of an encryption scheme, so the original data cannot be inferred even if the transmission is eavesdropped.</p></li><li><p>Generality. It is possible to apply FL to different data structures and institutions without regard to domains or algorithms.</p></li><li><p>Guaranteed performance. The performance of the resulting model is very close to that of the ideal model established with all data transferred to one centralized party.</p></li><li><p>Status equality. To ensure the fairness of cooperation, all participating parties are on an equal footing. The shared model can be used by each party to improve its local models when needed.</p></li></ul><p class="">A formal definition of FL is presented as follows. Consider that there are <i>N</i> parties <inline-formula><tex-math id="M1">$$ \left\{\mathcal{F}_i\right\}_{i=1}^N $$</tex-math></inline-formula> interested in establishing and training a cooperative ML model. Each party holds its own dataset <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula>. Traditional ML approaches collect all data <inline-formula><tex-math id="M1">$$ \left\{\mathcal{D}_i\right\}_{i=1}^N $$</tex-math></inline-formula> together to form a centralized dataset <inline-formula><tex-math id="M1">$$ \mathbb{R} $$</tex-math></inline-formula> at one data server. The expected model <inline-formula><tex-math id="M1">$$ \mathcal{M}_{S U M} $$</tex-math></inline-formula> is trained by using the dataset <inline-formula><tex-math id="M1">$$ \mathbb{R} $$</tex-math></inline-formula>. 
FL, on the other hand, reforms the ML process so that the participants <inline-formula><tex-math id="M1">$$ \mathcal{F}_i $$</tex-math></inline-formula> with data <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula> jointly train a target model <inline-formula><tex-math id="M1">$$ \mathcal{M}_{F E D} $$</tex-math></inline-formula> without aggregating their data. Each dataset <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula> is stored by its owner <inline-formula><tex-math id="M1">$$ \mathcal{F}_i $$</tex-math></inline-formula> and not exposed to others. In addition, the performance measure of the federated model <inline-formula><tex-math id="M1">$$ \mathcal{M}_{F E D} $$</tex-math></inline-formula>, such as accuracy, recall, or F1-score, is denoted as <inline-formula><tex-math id="M1">$$ \mathcal{V}_{F E D} $$</tex-math></inline-formula>. It should be a good approximation of the performance of the expected model <inline-formula><tex-math id="M1">$$ \mathcal{M}_{S U M} $$</tex-math></inline-formula>, <i>i.e.</i>, <inline-formula><tex-math id="M1">$$ \mathcal{V}_{S U M} $$</tex-math></inline-formula>. In order to quantify differences in performance, let <i>δ</i> be a non-negative real number. We say that the federated learning model <inline-formula><tex-math id="M1">$$ \mathcal{M}_{F E D} $$</tex-math></inline-formula> has <i>δ</i> performance loss if</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \left |\mathcal{V}_{SUM} - \mathcal{V}_{FED}\right |< \delta. 
$$ </tex-math></div></p><p class="">Specifically, the FL model held by each party is basically the same as an ordinary ML model, and it also includes a set of parameters <i>w<sub>i</sub></i> which is learned based on the respective training dataset <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula><sup>[<a href="#B10" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B10">10</a>]</sup>. A training sample <i>j</i> typically contains both the input of the FL model and the expected output. For example, in the case of image recognition, the input is the pixels of the image, and the expected output is the correct label. The learning process is facilitated by defining a loss function on the parameter vector <i>w</i> for every data sample <i>j</i>. The loss function represents the error of the model in relation to the training data. For each dataset <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula> at party <inline-formula><tex-math id="M1">$$ \mathcal{F}_i $$</tex-math></inline-formula>, the loss function on the collection of training samples can be defined as follows<sup>[<a href="#B11" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B11">11</a>]</sup>,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ F_{i}(w)=\frac{1}{\left|\mathcal{D}_{i}\right|} \sum_{j \in \mathcal{D}_{i}} f_{j}(w), $$ </tex-math></div></p><p class="">where <i>f<sub>j</sub></i>(<i>w</i>) denotes the loss function of the sample <i>j</i> with the given model parameter vector <i>w</i> and |·| represents the size of the set. In FL, it is important to define the global loss function since multiple parties are jointly training a global statistical model without sharing a dataset. 
The common global loss function on all the distributed datasets is given by</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ F_g(w)=\sum_{i=1}^{N} \eta _{i}F_{i}(w), $$ </tex-math></div></p><p class="">where <i>η<sub>i</sub></i> indicates the relative impact of each party on the global model. In addition, <i>η<sub>i</sub></i> &gt; 0 and <inline-formula><tex-math id="M1">$$ \sum_{i=1}^{N} \eta_{i}=1 $$</tex-math></inline-formula>. The weights <i>η<sub>i</sub></i> can be flexibly defined to improve training efficiency. The natural setting is averaging between parties, <i>i.e.</i>, <i>η<sub>i</sub></i> = 1/<i>N</i>. The goal of the learning problem is to find the optimal parameter that minimizes the global loss function <i>F<sub>g</sub></i>(<i>w</i>), <i>i.e.</i>,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ w^{*}=\underset{w}{\arg \min } F_{g}(w). $$ </tex-math></div></p><p class="">Because FL is designed to adapt to various scenarios, the appropriate objective function may depend on the application. However, a closed-form solution is almost impossible to find with most FL models due to their inherent complexity. The federated averaging algorithm (FedAvg), a canonical method based on gradient descent presented by McMahan <i>et al.</i><sup>[<a href="#B12" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B12">12</a>]</sup>, is widely used in FL systems. In general, the coordinator has the initial FL model and is responsible for aggregation. Distributed participants know the optimizer settings and can upload information that does not affect privacy. The specific architecture of FL will be discussed in the next subsection. 
Each participant uses their local data to perform one step (or multiple steps) of gradient descent on the current model parameter <i>w̄</i>(<i>t</i>) according to the following formula,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \forall i,\; w_i(t+1)=\bar{w} (t)-\gamma \nabla F_i(\bar{w}(t)), $$ </tex-math></div></p><p class="">where <i>γ</i> denotes a fixed learning rate for each gradient descent step. After receiving the local parameters from participants, the central coordinator updates the global model using a weighted average, <i>i.e.</i>,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \bar{w}_g(t+1)=\sum_{i=1}^{N} \frac{n_i}{n} w_i(t+1), $$ </tex-math></div></p><p class="">where <i>n<sub>i</sub></i> indicates the number of training data samples that the <i>i</i>-th participant has and <i>n</i> denotes the total number of samples contained in all the datasets. Finally, the coordinator sends the aggregated model weights <i>w̄<sub>g</sub></i> (<i>t</i> + 1) back to the participants. The aggregation process is performed at a predetermined interval or iteration round. Additionally, FL leverages privacy-preserving techniques to prevent the leakage of gradients or model weights. For example, existing encryption algorithms can be added on top of the original FedAvg to provide secure FL<sup>[<a href="#B13" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B13">13</a>,<a href="#B14" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B14">14</a>]</sup>.</p></div><div id="sec2-2" class="article-Section"><h3 >2.2. 
Architecture of federated learning</h3><p class="">According to the application characteristics, the architecture of FL can be divided into two types<sup>[<a href="#B7" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B7">7</a>]</sup>, <i>i.e.</i>, the client-server model and the peer-to-peer model.</p><p class="">As shown in <a href="#fig1" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig1">Figure 1</a>, there are two major components in the client-server model, <i>i.e.</i>, participants and coordinators. The participants are the data owners and can perform local model training and updates. In different scenarios, the participants are different kinds of devices, such as the vehicles in the internet of vehicles (IoV) or the smart devices in the IoT. In addition, participants usually possess at least two characteristics. Firstly, each participant has a certain level of hardware performance, including computation power, communication and storage. The hardware capabilities ensure that the FL algorithm operates normally. Secondly, participants are independent of one another and located in a wide geographic area. In the client-server model, the coordinator can be considered as a central aggregation server, which can initialize a model and aggregate model updates from participants<sup>[<a href="#B12" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B12">12</a>]</sup>. Because participants train concurrently on their local datasets and share their experience through the coordinator with the model aggregation mechanism, both the efficiency of training and the performance of the model are greatly enhanced. However, since participants cannot communicate directly, the coordinator must perform well to train the global model and maintain communication with all participants. Therefore, the client-server model has security challenges such as a single point of failure. 
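</p><p class="">The division of work between participants and coordinator can be sketched with the FedAvg update of Section 2.1: each participant takes one gradient step from the current global parameters, and the coordinator averages the results with weights <i>n<sub>i</sub>/n</i>. The sketch below is an illustration under stated assumptions, not the implementation of McMahan <i>et al.</i>: the squared-error loss of a linear model, the function names, the toy datasets and the hyperparameters are all hypothetical.</p>

```python
def local_gradient(w, samples):
    """Gradient of F_i(w) for the assumed squared-error loss of a linear model."""
    grad = [0.0] * len(w)
    for x, y in samples:
        error = sum(wk * xk for wk, xk in zip(w, x)) - y
        for k, xk in enumerate(x):
            grad[k] += 2.0 * error * xk / len(samples)
    return grad


def local_update(w_bar, samples, gamma):
    """Participant side: w_i(t+1) = w_bar(t) - gamma * grad F_i(w_bar(t))."""
    grad = local_gradient(w_bar, samples)
    return [wk - gamma * gk for wk, gk in zip(w_bar, grad)]


def aggregate(local_weights, sizes):
    """Coordinator side: w_bar(t+1) = sum_i (n_i / n) * w_i(t+1)."""
    n = sum(sizes)
    dim = len(local_weights[0])
    return [sum(n_i / n * w_i[k] for w_i, n_i in zip(local_weights, sizes))
            for k in range(dim)]


def fedavg(datasets, dim, rounds=200, gamma=0.1):
    """Repeat local training and aggregation for a fixed number of rounds."""
    w_bar = [0.0] * dim                      # initial model created by the coordinator
    sizes = [len(d) for d in datasets]
    for _ in range(rounds):
        # Only model parameters travel to the coordinator; raw samples stay local.
        local_weights = [local_update(w_bar, d, gamma) for d in datasets]
        w_bar = aggregate(local_weights, sizes)
    return w_bar


# Hypothetical data: three participants hold samples (x, y) of y = 2*x1 - x2.
datasets = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)],
    [((1.0, 1.0), 1.0)],
    [((2.0, 1.0), 3.0), ((1.0, 2.0), 0.0), ((0.0, 0.0), 0.0)],
]
print(fedavg(datasets, dim=2))
```

<p class="">In this toy setting the aggregated parameters approach the values that generated the data, although no participant ever sends its samples to the coordinator.</p><p class="">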
If the coordinator fails to complete the model aggregation task, the local model of participant has difficulty meeting target performance. The basic workflow of the client-server model can be summarized in the following five steps. The process continues to repeat the steps from 2 to 5 until the model converges, or until the maximum number of iterations is reached.</p><div class="Figure-block" id="fig1"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig1" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.1.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 1. An example of federated learning architecture: Client-Server Model.</p></div></div><ul class="tipsDisc"><li><p>Step 1: In the process of setting up a client-server-based learning system, the coordinator creates an initial model and sends it to each participant. Those participants who join later can access the latest global model.</p></li><li><p>Step 2: Each participant trains a local model based on their respective dataset.</p></li><li><p>Step 3: Updates of model parameters are sent to the central coordinator.</p></li><li><p>Step 4: The coordinator combines the model updates using specific aggregation algorithms.</p></li><li><p>Step 5: The combined model is sent back to the corresponding participant.</p></li></ul><p class="">The peer-to-peer based FL architecture does not require a coordinator as illustrated in <a href="#fig2" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig2">Figure 2</a>. Participants can directly communicate with each other without relying on a third party. Therefore, each participant in the architecture is equal and can initiate a model exchange request with anyone else. 
As there is no central server, participants must agree in advance on the order in which model should be sent and received. Common transfer modes are cyclic transfer and random transfer. The peer-to-peer model is suitable and important for specific scenarios. For example, multiple banks jointly develop an ML-based attack detection model. With FL, there is no need to establish a central authority between banks to manage and store all attack patterns. The attack record is only held at the server of the attacked bank, but the detection experience can be shared with other participants through model parameters. The FL procedure of the peer-to-peer model is simpler than that of the client-server model.</p><div class="Figure-block" id="fig2"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig2" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.2.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 2. An example of federated learning architecture: Peer-to-Peer Model.</p></div></div><ul class="tipsDisc"><li><p>Step 1: Each participant initializes their local model depending on its needs.</p></li><li><p>Step 2: Train the local model based on the respective dataset.</p></li><li><p>Step 3: Create a model exchange request to other participants and send local model parameters.</p></li><li><p>Step 4: Aggregate the model received from other participants into the local model.</p></li></ul><p class="">The termination conditions of the process can be designed by participants according to their needs. This architecture further guarantees security since there is no centralized server. 
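</p><p class="">The four peer-to-peer steps listed above can be sketched with the cyclic transfer mode, in which participant <i>i</i> always sends its model to participant <i>i</i> + 1. This is a rough illustration only: the squared-error loss of a linear model, merging by plain averaging, the function names, the toy datasets and the fixed number of rounds are assumptions, not a method prescribed by the surveyed works.</p>

```python
def train_locally(w, samples, gamma=0.1):
    """Step 2: one gradient step on the assumed squared-error loss of a linear model."""
    grad = [0.0] * len(w)
    for x, y in samples:
        error = sum(wk * xk for wk, xk in zip(w, x)) - y
        for k, xk in enumerate(x):
            grad[k] += 2.0 * error * xk / len(samples)
    return [wk - gamma * gk for wk, gk in zip(w, grad)]


def merge(own, received):
    """Step 4: fold a received model into the local one (plain averaging is an assumption)."""
    return [(a + b) / 2.0 for a, b in zip(own, received)]


def cyclic_round(models, datasets):
    """Steps 2 to 4 for every participant, using a fixed ring as the agreed transfer order."""
    trained = [train_locally(w, d) for w, d in zip(models, datasets)]
    count = len(trained)
    # Step 3: participant i receives the model sent by its predecessor i - 1 in the ring.
    return [merge(trained[i], trained[(i - 1) % count]) for i in range(count)]


# Hypothetical data: three participants hold samples (x, y) of y = 2*x1 - x2.
datasets = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)],
    [((1.0, 1.0), 1.0)],
    [((2.0, 1.0), 3.0), ((1.0, 2.0), 0.0), ((0.0, 0.0), 0.0)],
]
models = [[0.0, 0.0] for _ in datasets]      # Step 1: each participant initializes locally
for _ in range(400):                         # termination rule chosen by the participants
    models = cyclic_round(models, datasets)
print(models)
```

<p class="">No coordinator appears anywhere in the loop, and every participant ends up holding its own copy of the model under the peer-to-peer architecture.</p><p class="">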
However, the peer-to-peer architecture requires more communication resources and potentially more computation to process the additional messages.</p></div><div id="sec2-3" class="article-Section"><h3 >2.3. Categories of federated learning</h3><p class="">Based on the way data is partitioned across the feature and sample spaces, FL may be classified as HFL, VFL, or federated transfer learning (FTL)<sup>[<a href="#B8" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B8">8</a>]</sup>. In <a href="#fig3" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig3">Figure 3</a>, <a href="#fig4" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig4">Figure 4</a>, and <a href="#fig5" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig5">Figure 5</a>, these three federated learning categories for a two-party scenario are illustrated. In order to define each category more clearly, some parameters are formalized. We suppose that the <i>i</i>-th participant has its own dataset <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula>. The dataset includes three types of data, <i>i.e.</i>, the feature space <inline-formula><tex-math id="M1">$$ \mathcal{X}_i $$</tex-math></inline-formula>, the label space <inline-formula><tex-math id="M1">$$ \mathcal{Y}_i $$</tex-math></inline-formula> and the sample ID space <inline-formula><tex-math id="M1">$$ \mathcal{I}_i $$</tex-math></inline-formula>. In particular, the feature space <inline-formula><tex-math id="M1">$$ \mathcal{X}_i $$</tex-math></inline-formula> is a high-dimensional abstraction of the variables within each pattern sample. Various features are used to characterize data held by the participant. All categories of association between input and task target are collected in the label space <inline-formula><tex-math id="M1">$$ \mathcal{Y}_i $$</tex-math></inline-formula>. 
The sample ID space <inline-formula><tex-math id="M1">$$ \mathcal{I}_i $$</tex-math></inline-formula> is added in consideration of actual application requirements. The identification can facilitate the discovery of possible connections among different features of the same individual.</p><div class="Figure-block" id="fig3"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig3" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.3.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 3. Illustration of horizontal federated learning.</p></div></div><div class="Figure-block" id="fig4"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig4" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.4.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 4. Illustration of vertical federated learning.</p></div></div><div class="Figure-block" id="fig5"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig5" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.5.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 5. 
Illustration of federated transfer learning.</p></div></div><p class="">HFL indicates the case in which the datasets of the participants have only a small sample overlap, while most of the data features are aligned. The word “horizontal” is derived from the term “horizontal partition”. This is similar to the situation where data is horizontally partitioned inside the traditional tabular view of a database. As shown in <a href="#fig3" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig3">Figure 3</a>, the training data of two participants with the aligned features is horizontally partitioned for HFL. A cuboid with a red border represents the training data required in learning. In particular, a row corresponds to complete data features collected from a sampling ID. Columns correspond to different sampling IDs. The overlapping part means there can be more than one participant sampling the same ID. In addition, HFL is also known as feature-aligned FL, sample-partitioned FL, or example-partitioned FL. Formally, the conditions for HFL can be summarized as</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \mathcal{X}_i=\mathcal{X}_j, \mathcal{Y}_i=\mathcal{Y}_j, \mathcal{I}_i\not=\mathcal{I}_j, \forall\mathcal{D}_i,\mathcal{D}_j,i\not=j, $$ </tex-math></div></p><p class="">where <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{D}_j $$</tex-math></inline-formula> denote the datasets of participant <i>i</i> and participant <i>j</i> respectively. In both datasets, the feature space <inline-formula><tex-math id="M1">$$ \mathcal{X} $$</tex-math></inline-formula> and label space <inline-formula><tex-math id="M1">$$ \mathcal{Y} $$</tex-math></inline-formula> are assumed to be the same, but the sampling ID space <inline-formula><tex-math id="M1">$$ \mathcal{I} $$</tex-math></inline-formula> is assumed to be different. 
The objective of HFL is to increase the amount of data with similar features, while keeping the original data from being transmitted, thus improving the performance of the training model. Participants can still perform feature extraction and classification if new samples appear. HFL can be applied in various fields because it benefits from privacy protection and experience sharing<sup>[<a href="#B15" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B15">15</a>]</sup>. For example, regional hospitals may receive different patients, and the clinical manifestations of patients with the same disease are similar. It is imperative to protect the patient’s privacy, so data about patients cannot be shared. HFL provides a good way to jointly build a ML model for identifying diseases between hospitals.</p><p class="">VFL refers to the case where different participants with various targets usually have datasets that have different feature spaces, but those participants may serve a large number of common users. The heterogeneous feature spaces of distributed datasets can be used to build more general and accurate models without releasing the private data. The word “vertical” derives from the term “vertical partition”, which is also widely used in reference to the traditional tabular view. Different from HFL, the training data of each participant are divided vertically. <a href="#fig4" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig4">Figure 4</a> shows an example of VFL in a two-party scenario. The important step in VFL is to align samples, <i>i.e.</i>, determine which samples are common to the participants. Although the features of the data are different, the sampled identity can be verified with the same ID. Therefore, VFL is also called sample-aligned FL or feature-partitioned FL. Multiple features are vertically divided into one or more columns. The common samples exposed to different participants can be marked by different labels. 
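</p><p class="">The sample alignment step of VFL can be illustrated with a plain intersection of sample IDs. The two parties, their feature names and the records below are hypothetical. The plain set intersection is also an assumption made for readability: it reveals the ID lists to whoever computes it, so a real deployment would rely on a privacy-preserving alignment protocol instead.</p>

```python
# Hypothetical party A: financial features keyed by sample ID.
bank = {
    "u1": {"credit_limit": 5000, "late_payments": 0},
    "u2": {"credit_limit": 1200, "late_payments": 3},
    "u4": {"credit_limit": 800, "late_payments": 1},
}
# Hypothetical party B: different features, partly for the same users.
shop = {
    "u2": {"orders": 14, "returns": 2},
    "u3": {"orders": 3, "returns": 0},
    "u4": {"orders": 27, "returns": 5},
}


def align_samples(party_a, party_b):
    """Return the sorted sample IDs that both parties hold."""
    return sorted(set(party_a).intersection(party_b))


def joined_view(party_a, party_b):
    """Conceptual feature-partitioned table: one row per common ID.

    Shown only to picture the vertical partition; in VFL each block of
    columns stays with its owner and is never merged in one place.
    """
    return {sid: {**party_a[sid], **party_b[sid]}
            for sid in align_samples(party_a, party_b)}


print(align_samples(bank, shop))
print(joined_view(bank, shop))
```

<p class="">Only the aligned IDs take part in training; samples held by a single party, such as <i>u1</i> and <i>u3</i> in this toy data, are left out.</p><p class="">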
The formal definition of VFL’s applicable scenario is given.</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \mathcal{X}_i\not=\mathcal{X}_j, \mathcal{Y}_i\not=\mathcal{Y}_j, \mathcal{I}_i=\mathcal{I}_j, \forall\mathcal{D}_i,\mathcal{D}_j,i\not=j, $$ </tex-math></div></p><p class="">where <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{D}_j $$</tex-math></inline-formula> represent the dataset held by different participants, and the data feature space pair <inline-formula><tex-math id="M1">$$ (\mathcal{X}_i,\mathcal{X}_j) $$</tex-math></inline-formula> and label space pair <inline-formula><tex-math id="M1">$$ (\mathcal{Y}_i,\mathcal{Y}_j) $$</tex-math></inline-formula> are assumed to be different. The sample ID space <inline-formula><tex-math id="M1">$$ \mathcal{I}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{I}_j $$</tex-math></inline-formula> are assumed to be the same. It is the objective of VFL to collaborate in building a shared ML model by exploiting all features collected by each participant. The fusion and analysis of existing features can even infer new features. An example of the application of VFL is the evaluation of trust. Banks and e-commerce companies can create a ML model for trust evaluation for users. The credit card record held at the bank and the purchasing history held at the e-commerce company for the set of same users can be used as training data to improve the evaluation model.</p><p class="">FTL applies to a more general case where the datasets of participants are not aligned with each other in terms of samples or features. FTL involves finding the invariant between a resource-rich source domain and a resource-scarce target domain, and exploiting that invariant to transfer knowledge. 
In comparison with traditional transfer learning<sup>[<a href="#B6" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B6">6</a>]</sup>, FTL focuses on privacy-preserving issues and addresses distributed challenges. An example of FTL is shown in <a href="#fig5" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig5">Figure 5</a>. The training data required by FTL may include all data owned by multiple parties for comprehensive information extraction. In order to predict labels for unlabeled new samples, a prediction model is built using additional feature representations for mixed samples from participants A and B. More formally, FTL is applicable for the following scenarios:</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \mathcal{X}_i\not=\mathcal{X}_j, \mathcal{Y}_i\not=\mathcal{Y}_j, \mathcal{I}_i\not=\mathcal{I}_j, \forall\mathcal{D}_i,\mathcal{D}_j,i\not=j, $$ </tex-math></div></p><p class="">In datasets <inline-formula><tex-math id="M1">$$ \mathcal{D}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{D}_j $$</tex-math></inline-formula>, the feature, label and sample ID spaces all differ. The objective of FTL is to generate as accurate a label prediction as possible for newly incoming samples or unlabeled samples already present. Another benefit of FTL is that it is capable of overcoming the absence of data or labels. For example, a bank and an e-commerce company in two different countries want to build a shared ML model for user risk assessment. In light of geographical restrictions, the user groups of these two organizations have limited overlap. Because their businesses differ, only a small number of data features are the same. 
It is important in this case to introduce FTL to solve the problem of small unilateral data and fewer sample labels, and improve the model performance.</p></div></div><div id="sec1-3" class="article-Section"><h2 >3. Reinforcement learning</h2><div id="sec2-4" class="article-Section"><h3 >3.1. Reinforcement learning definition and basics</h3><p class="">Generally, the field of ML includes supervised learning, unsupervised learning, RL, <i>etc</i><sup>[<a href="#B17" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B17">17</a>]</sup>. While supervised and unsupervised learning attempt to make the agent copy the data set, <i>i.e.</i>, learning from the pre-provided samples, RL is to make the agent gradually stronger in the interaction with the environment, <i>i.e.</i>, generating samples to learn by itself<sup>[<a href="#B18" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B18">18</a>]</sup>. RL is a very hot research direction in the field of ML in recent years, which has made great progress in many applications, such as IoT<sup>[<a href="#B19" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B19">19</a>-<a href="#B22" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B22">22</a>]</sup>, autonomous driving <sup>[<a href="#B23" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B23">23</a>,<a href="#B24" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B24">24</a>]</sup>, and game design<sup>[<a href="#B25" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B25">25</a>]</sup>. For example, the AlphaGo program developed by DeepMind is a good example to reflect the thinking of RL<sup>[<a href="#B26" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B26">26</a>]</sup>. 
The agent gradually accumulates the intelligent judgment on the sub-environment of each move by playing game by game with different opponents, so as to continuously improve its level.</p><p class="">The RL problem can be defined as a model of the agent-environment interaction, which is represented in <a href="#fig6" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig6">Figure 6</a>. The basic model of RL contains several important concepts, <i>i.e.</i>,</p><div class="Figure-block" id="fig6"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig6" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.6.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 6. The agent-environment interaction of the basic reinforcement learning model.</p></div></div><ul class="tipsDisc"><li><p>Environment and agent: Agents are a part of a RL model that exists in an external environment, such as the player in the environment of chess. Agents can improve their behavior by interacting with the environment. Specifically, they take a series of actions to the environment through a set of policies and expect to get a high payoff or achieve a certain goal.</p></li><li><p>Time step: The whole process of RL can be discretized into different time steps. At every time step, the environment and the agent interact accordingly.</p></li><li><p>State: The state reflects agents’ observations of the environment. When agents take action, the state will also change. In other words, the environment will move to the next state.</p></li><li><p>Actions: Agents can assess the environment, make decisions and finally take certain actions. 
These actions are imposed on the environment.</p></li><li><p>Reward: After receiving the action of the agent, the environment will give the agent the state of the current environment and the reward due to the previous action. Reward represents an assessment of the action taken by agents.</p></li></ul><p class="">More formally, we assume that there is a series of time steps <i>t =</i> 0,1,2,<i>…</i> in a basic RL model. At a certain time step <i>t</i>, the agent will receive a state signal <i>S<sub>t</sub></i> of the environment. In each step, the agent will select an action <i>A<sub>t</sub></i> from the actions allowed in that state. After the environment receives the action signal <i>A<sub>t</sub></i>, the environment will feed back to the agent the corresponding state signal <i>S<sub>t+</sub></i><sub>1</sub> at the next step <i>t</i> + 1 and the immediate reward <i>R<sub>t+</sub></i><sub>1</sub>. The set of all possible states, <i>i.e.</i>, the state space, is denoted as <inline-formula><tex-math id="M1">$$ \mathcal{S} $$</tex-math></inline-formula>. Similarly, the action space is denoted as <inline-formula><tex-math id="M1">$$ \mathcal{A} $$</tex-math></inline-formula>. Since our goal is to maximize the total reward, we can quantify this total reward, usually referred to as the return, with</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ G_t=R_{t+1}+R_{t+2}+\dots +R_T, $$ </tex-math></div></p><p class="">where <i>T</i> is the last step, <i>i.e.</i>, <i>S<sub>T</sub></i> is the terminal state. An episode is completed when the agent reaches the terminal state.</p><p class="">In addition to this type of episodic task, there is another type of task that does not have a terminal state, in other words, it can in principle run forever. This type of task is called a continuing task. For continuing tasks, since there is no terminal state, the above definition of return may diverge. 
Thus, another way to calculate return is introduced, which is called discounted return, <i>i.e.</i>,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ G_t = R_{t+1}+\gamma R_{t+2}+\gamma ^{2} R_{t+3}+\dots=\sum_{k=0}^{\infty}\gamma ^{k}R_{t+k+1}, $$ </tex-math></div></p><p class="">where the discount factor <i>γ</i> satisfies 0 ≤ <i>γ ≤</i> 1. When <i>γ</i> = 1, the agent can obtain the full value of all future steps, while when <i>γ</i> = 0, the agent can only see the current reward. As <i>γ</i> changes from 0 to 1, the agent will gradually become forward-looking, looking not only at current interests, but also for its own future.</p><p class="">The value function is the agent’s prediction of future rewards, which is used to evaluate the quality of the state and select actions. The difference between the value function and rewards is that the latter is defined as evaluating an immediate sense for interaction while the former is defined as the average return of actions over a long period of time. In other words, the value function of the current state <i>S<sub>t</sub> = s</i> is its long-term expected return. There are two significant value functions in the field of RL, <i>i.e.</i>, state value function <i>V<sub>π</sub></i> (<i>s</i>) and action value function <i>Q<sub>π</sub></i> (<i>s, a</i>). The function <i>V<sub>π</sub></i> (<i>s</i>) represents the expected return obtained if the agent continues to follow strategy <i>π</i> all the time after reaching a certain state <i>S<sub>t</sub></i>, while the function <i>Q<sub>π</sub></i> (<i>s, a</i>) represents the expected return obtained if action <i>A<sub>t</sub> = a</i> is taken after reaching the current state <i>S<sub>t</sub> = s</i> and the following actions are taken according to the strategy <i>π</i>. 
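</p><p class="">The effect of the discount factor can be checked numerically. The sketch below evaluates the discounted return for a finite list of rewards, which is an assumption made so that the sum can be computed; the function name and the reward values are hypothetical.</p>

```python
def discounted_return(rewards, gamma):
    """G_t = sum_k gamma**k * R_{t+k+1} for a finite list R_{t+1}, R_{t+2}, ..."""
    g = 0.0
    for reward in reversed(rewards):
        g = reward + gamma * g       # recursion G_t = R_{t+1} + gamma * G_{t+1}
    return g


rewards = [1.0, 1.0, 1.0, 1.0]       # hypothetical rewards R_{t+1}, ..., R_{t+4}
for gamma in (0.0, 0.5, 1.0):
    print(gamma, discounted_return(rewards, gamma))
# prints:
# 0.0 1.0
# 0.5 1.875
# 1.0 4.0
```

<p class="">With <i>γ</i> = 0 only the immediate reward counts, while <i>γ</i> = 1 recovers the undiscounted sum. Both value functions, <i>V<sub>π</sub></i> (<i>s</i>) and <i>Q<sub>π</sub></i> (<i>s, a</i>), are expectations of this quantity.</p><p class="">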
The two functions are specifically defined as follows, <i>i.e.</i>,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ V_\pi (s)=\mathbb{E}_\pi[G_t|S_t=s],\forall s\in \mathcal{S} $$ </tex-math></div></p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ Q_\pi(s,a)=\mathbb{E}_\pi[G_t|S_t=s,A_t=a],\forall s\in \mathcal{S},a\in\mathcal{A} $$ </tex-math></div></p><p class="">The result of RL is a rule for making action decisions, called the policy. The policy gives agents the action <i>a</i> that should be taken for each state <i>s.</i> It is denoted as <i>π</i> (<i>a|s</i>) <i>= Pr</i> {<i>A<sub>t</sub> = a|S<sub>t</sub> = s</i>}, which represents the probability of taking action <i>A<sub>t</sub> = a</i> in state <i>S<sub>t</sub> = s.</i> The goal of RL is to learn the optimal policy that can maximize the value function by interacting with the environment. Our purpose is not to get the maximum reward after a single action in the short term, but to get more reward in the long term. Therefore, the optimal policy can be expressed as,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \pi^*=\underset{\pi}{\arg \max }V_\pi(s),\forall s\in\mathcal{S}. $$ </tex-math></div></p></div><div id="sec2-5" class="article-Section"><h3 >3.2. Categories of reinforcement learning</h3><p class="">In RL, there are two main categories of algorithms. One is value-based and the other is policy-based. 
In addition, there is also an actor-critic algorithm that can be obtained by combining the two, as shown in <a href="#fig7" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig7">Figure 7</a>.</p><div class="Figure-block" id="fig7"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig7" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.7.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 7. The categories and representative algorithms of reinforcement learning.</p></div></div><div id="sec3-1" class="article-Section"><h4 >3.2.1. Value-based methods</h4><p class="">By recursively expanding the definition of the action value function, the corresponding Bellman equation is obtained, which describes the recursive relationship between the value function of the current state and that of the subsequent state. The recursive expansion formula of the action value function <i>Q<sub>π</sub></i> (<i>s, a</i>) is</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ Q_{\pi}(s, a)=\sum_{s^{\prime}, r} p\left(s^{\prime}, r \mid s, a\right)\left[r+\gamma \sum_{a^{\prime}} \pi\left(a^{\prime} \mid s^{\prime}\right) Q_{\pi}\left(s^{\prime}, a^{\prime}\right)\right], $$ </tex-math></div></p><p class="">where the function <i>p</i> (<i>s</i>ʹ<i>,r|s,a</i>) <i>= Pr</i> {<i>S<sub>t</sub> = s</i>ʹ, <i>R<sub>t</sub> = r|S<sub>t-</sub></i><sub>1</sub><i>= s, A<sub>t-</sub></i><sub>1</sub><i>= a</i>} defines the transition probability that characterizes the environment’s dynamics. 
<i>R<sub>t</sub> = r</i> indicates the reward obtained by the agent taking action <i>A<sub>t-</sub></i><sub>1</sub> = <i>a</i> in state <i>S<sub>t-</sub></i><sub>1</sub> = <i>s.</i> Besides, <i>S<sub>t</sub> = s</i>ʹ and <i>A<sub>t</sub> = a</i>ʹ respectively represent the state and the action taken by the agent at the next moment <i>t.</i></p><p class="">In the value-based algorithms, the above value function <i>Q<sub>π</sub></i> (<i>s, a</i>) is calculated iteratively, and the strategy is then improved based on this value function. If the value of every action in a given state is known, the agent can select the action with the highest value. In this way, if the optimal <i>Q<sub>π</sub></i> (<i>s, a = a*</i>) can be figured out, the best action <i>a</i>* will be found. There are many classical value-based algorithms, including Q-learning<sup>[<a href="#B27" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B27">27</a>]</sup>, state-action-reward-state-action (SARSA)<sup>[<a href="#B28" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B28">28</a>]</sup>, <i>etc.</i></p><p class="">Q-learning is a typical and widely used value-based RL algorithm. It is also a model-free algorithm, which means that it does not need to know the model of the environment but directly estimates the Q value of each executed action in each encountered state through interacting with the environment<sup>[<a href="#B27" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B27">27</a>]</sup>. Then, the optimal strategy is formulated by selecting the action with the highest Q value in each state. This strategy maximizes the expected return for all subsequent actions from the current state. The most important part of Q-learning is the update of the Q value. It uses a table, <i>i.e.</i>, the Q-table, to store all Q values. The Q-table uses states as rows and actions as columns. 
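</p><p class="">The Q-table layout described above, together with the standard Q-learning update that the text states next, can be sketched on a toy problem. The five-state chain environment, the epsilon-greedy exploration rule, the function names and all hyperparameters are assumptions chosen only to keep the example small.</p>

```python
import random

N_STATES = 5                 # hypothetical chain of states 0..4
ACTIONS = (0, 1)             # action 0 moves left, action 1 moves right
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment: reward 1 on reaching the right end, 0 otherwise."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialized to zero.
    q_table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if epsilon > rng.random():
                action = rng.choice(ACTIONS)                 # explore
            else:
                best = max(q_table[state])                   # exploit, random tie-break
                action = rng.choice([a for a in ACTIONS if q_table[state][a] == best])
            next_state, reward, done = step(state, action)
            # The terminal row is never updated, so its maximum stays 0.
            target = reward + gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
    return q_table


q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy_policy)         # highest-valued action in each non-terminal state
```

<p class="">Reading the greedy action off each row of the table is exactly the strategy formulation step described above.</p><p class="">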
Each (<i>s, a</i>) pair corresponds to a Q value, <i>i.e.</i>, <i>Q</i>(<i>s, a</i>), in the Q-table, which is updated as follows,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ Q(s,a)\longleftarrow Q(s,a)+\alpha [r+\gamma\underset{a'}{\max} Q(s',a')-Q(s,a)] $$ </tex-math></div></p><p class="">where <i>r</i> is the reward given by taking action <i>a</i> under state <i>s</i> at the current time step. <i>s</i>ʹ is the state at the next time step, and <i>a</i>ʹ ranges over the actions available in <i>s</i>ʹ. <i>α</i> is the learning rate, which determines how much of the error is applied in each update, and <i>γ</i> is the discount factor applied to future rewards. If the agent continues to visit all state-action pairs, the Q-learning algorithm will converge to the optimal Q function. Q-learning is suitable for simple problems, <i>i.e.</i>, a small state space or a small number of actions. It has high data utilization and stable convergence.</p></div><div id="sec3-2" class="article-Section"><h4 >3.2.2. Policy-based methods</h4><p class="">The above value-based method is an indirect approach to policy selection, and has trouble handling an infinite number of actions. Therefore, we want to be able to model the policy directly. Different from the value-based method, the policy-based algorithm does not need to estimate the value function, but directly fits the policy function, updates the policy parameters through training, and directly generates the best policy. In policy-based methods, we input a state and output the corresponding action directly, rather than the value <i>V</i> (<i>s</i>) or Q value <i>Q</i> (<i>s, a</i>) of the state. 
One of the most representative algorithms is policy gradient, which is also the most basic policy-based algorithm.</p><p class="">Policy gradient optimizes the policy directly and updates the parameters of the policy network by calculating the gradient of the expected reward<sup>[<a href="#B29" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B29">29</a>]</sup>. Therefore, its objective function <i>J</i> (<i>θ</i>) is directly designed as the expected cumulative reward, <i>i.e.</i>,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ J(\theta )=\mathbb{E}_{\tau \sim \pi_\theta(\tau)}[r(\tau)]=\int r(\tau)\pi_\theta(\tau)d\tau. $$ </tex-math></div></p><p class="">By taking the derivative of <i>J</i> (<i>θ</i>), we get</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \nabla _\theta J(\theta)=\mathbb{E}_{\tau \sim \pi_\theta(\tau)}\left[\sum_{t=1}^{T}\nabla _\theta\log \pi_\theta(A_t|S_t)\sum_{t=1}^{T}r(S_t,A_t)\right]. $$ </tex-math></div></p><p class="">The above formula consists of two parts. One is <inline-formula><tex-math id="M1">$$ \sum_{t=1}^{T} \nabla _\theta \log \pi_\theta(A_t|S_t) $$</tex-math></inline-formula> which denotes the gradient of the log-probability of the current trajectory. The other is <inline-formula><tex-math id="M1">$$ \sum_{t=1}^{T}r(S_t,A_t) $$</tex-math></inline-formula> which represents the return of the current trajectory. Since the return is the total reward and can only be obtained after one episode, the policy gradient algorithm can only update once per episode, not at each time step.</p><p class="">The expected value can be expressed in a variety of ways, corresponding to different ways of calculating the loss function. The advantage of the policy gradient algorithm is that it can be applied in the continuous action space. 
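</p><p class="">The step from <i>J</i> (<i>θ</i>) to its gradient rests on the log-derivative identity. As a brief sketch, assume that the trajectory distribution factorizes into an initial-state distribution <i>p</i>(<i>S</i><sub>1</sub>), the policy, and environment dynamics <i>p</i>(<i>S</i><sub><i>t</i>+1</sub>|<i>S<sub>t</sub></i>, <i>A<sub>t</sub></i>) that do not depend on <i>θ</i> (the symbol <i>p</i> is notation introduced here for illustration). Then</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \nabla_\theta J(\theta)=\int r(\tau)\nabla_\theta\pi_\theta(\tau)d\tau=\int r(\tau)\pi_\theta(\tau)\nabla_\theta\log\pi_\theta(\tau)d\tau=\mathbb{E}_{\tau\sim\pi_\theta(\tau)}\left[r(\tau)\nabla_\theta\log\pi_\theta(\tau)\right], $$ </tex-math></div></p><p class="">and, since</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \log\pi_\theta(\tau)=\log p(S_1)+\sum_{t=1}^{T}\left[\log\pi_\theta(A_t|S_t)+\log p(S_{t+1}|S_t,A_t)\right], $$ </tex-math></div></p><p class="">only the policy terms survive differentiation with respect to <i>θ</i>. Writing the trajectory reward as <inline-formula><tex-math id="M1">$$ r(\tau)=\sum_{t=1}^{T}r(S_t,A_t) $$</tex-math></inline-formula> then gives the two-part product form of the gradient.</p><p class="">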
In addition, the change of the action probability is smoother, and the convergence is better guaranteed.</p><p class="">The REINFORCE algorithm is a classic policy gradient algorithm<sup>[<a href="#B30" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B30">30</a>]</sup>. Since the expected value of the cumulative reward cannot be calculated directly, the Monte Carlo method is applied to approximate it by the average over multiple samples. REINFORCE thus obtains an unbiased estimate of the gradient by using Monte Carlo sampling. Each sampling run generates one trajectory, and sampling is repeated iteratively. After obtaining a large number of trajectories, the cumulative reward can be calculated by using certain transformations and approximations as the loss function for gradient update. However, the variance of this algorithm is large since it needs to interact with the environment until the terminal state. The reward for each interaction is a random variable, so the variances of the individual rewards add up in the return. In particular, the REINFORCE algorithm has three steps:</p><ul class="tipsDisc"><li><p>Step 1: sample <i>τ<sub>i</sub></i> from <i>π<sub>θ</sub></i> (<i>A<sub>t</sub>|S<sub>t</sub></i>)</p></li><li><p>Step 2: <inline-formula><tex-math id="M1">$$ \nabla _\theta J(\theta)\approx \sum_{i}\left[\sum_{t=1}^{T}\nabla _\theta\log \pi_\theta(A_{t}^{i}|S_{t}^{i})\sum_{t=1}^{T}r(S_{t}^{i},A_{t}^{i})\right] $$</tex-math></inline-formula></p></li><li><p>Step 3: <i>θ</i> ← <i>θ</i> + <i>α</i>∇<i><sub>θ</sub>J</i> (<i>θ</i>)</p></li></ul><p class="">The two families of algorithms, value-based and policy-based methods, each have their own characteristics and disadvantages. Firstly, the disadvantages of the value-based methods are that the output of the action cannot be obtained directly, and it is difficult to extend to the continuous action space. 
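</p><p class="">The three REINFORCE steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation: the policy is a stateless softmax over two actions, the reward function is a toy one, and the gradient is averaged over the sampled trajectories rather than summed (a common normalization that only rescales the step size). All function names and hyperparameters are hypothetical.</p>
<pre>
```python
import math
import random


def softmax(theta):
    """Turn policy parameters into action probabilities."""
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(theta, action):
    """Gradient of log pi_theta(action) for a softmax policy: one_hot(action) - pi."""
    probs = softmax(theta)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def reinforce_step(theta, reward_fn, rng, episodes=16, horizon=5, alpha=0.1):
    """One REINFORCE update from `episodes` sampled trajectories of length `horizon`."""
    grad = [0.0] * len(theta)
    for _ in range(episodes):
        # Step 1: sample one trajectory from the current policy.
        probs = softmax(theta)
        actions = rng.choices(range(len(theta)), weights=probs, k=horizon)
        # Step 2: (sum of grad log pi) times (sum of rewards) for this trajectory.
        score = [0.0] * len(theta)
        for a in actions:
            score = [s + g for s, g in zip(score, grad_log_pi(theta, a))]
        ret = sum(reward_fn(a) for a in actions)
        grad = [gr + s * ret / episodes for gr, s in zip(grad, score)]
    # Step 3: gradient ascent on the policy parameters.
    return [t + alpha * g for t, g in zip(theta, grad)]


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(200):
    # Toy reward (assumption): action 1 pays 1.0, action 0 pays nothing.
    theta = reinforce_step(theta, lambda a: 1.0 if a == 1 else 0.0, rng)
probs = softmax(theta)
```
</pre>
<p class="">After training, <code>probs</code> should place most of its mass on the rewarded action. The whole return multiplies every log-probability gradient in a trajectory, which is the source of the high variance discussed in this section.</p><p class="">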
The value-based methods can also lead to the problem of high bias, <i>i.e.</i>, it is difficult to eliminate the error between the estimated value function and the actual value function. For the policy-based methods, a large number of trajectories must be sampled, and the difference between trajectories may be huge. As a result, high variance and large gradient noise are introduced. This leads to unstable training and makes it difficult for the policy to converge.</p></div><div id="sec3-3" class="article-Section"><h4 >3.2.3. Actor-critic methods</h4><p class="">The actor-critic architecture combines the characteristics of the value-based and policy-based algorithms, and to a certain extent solves their respective weaknesses, as well as the contradictions between high variance and high bias. The constructed agent can not only directly output policies, but also evaluate the performance of the current policies through the value function. Specifically, the actor-critic architecture consists of an actor which is responsible for generating the policy and a critic to evaluate this policy. While the actor is acting, the critic evaluates its performance, and both are constantly being updated<sup>[<a href="#B31" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B31">31</a>]</sup>. This complementary training is generally more effective than a policy-based method or value-based method.</p><p class="">Specifically, the input of the actor is state <i>S<sub>t</sub></i>, and the output is action <i>A<sub>t</sub>.</i> The role of the actor is to approximate the policy model <i>π</i><sub><i>θ</i></sub> (<i>A<sub>t</sub>|S<sub>t</sub></i>). The critic uses the value function <i>Q</i> as the output to evaluate the value of the policy, and this Q value <i>Q</i> (<i>S<sub>t</sub>, A<sub>t</sub></i>) can be directly applied to calculate the loss function of the actor. 
The gradient of the expected return function <i>J</i> (<i>θ</i>) in the actor-critic framework is developed from the basic policy gradient algorithm. The policy gradient algorithm can only update once per episode, so it is difficult to feed back the reward accurately. Therefore, it has poor training efficiency. Instead, the actor-critic algorithm replaces <inline-formula><tex-math id="M1">$$ \sum_{t=1}^{T}r(S_{t}^{i},A_{t}^{i}) $$</tex-math></inline-formula> with <i>Q</i> (<i>S<sub>t</sub>,A<sub>t</sub></i>) to evaluate the expected return of the state-action tuple {<i>S<sub>t</sub>,A<sub>t</sub></i>} at the current time step <i>t.</i> The gradient of <i>J</i> (<i>θ</i>) can be expressed as</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \nabla _\theta J(\theta)=\mathbb{E}_{\tau \sim \pi_\theta(\tau)}\left[\sum_{t=1}^{T}\nabla _\theta\log \pi_\theta(A_{t}|S_{t})Q(S_{t},A_{t})\right]. $$ </tex-math></div></p></div></div><div id="sec2-6" class="article-Section"><h3 >3.3. Deep reinforcement learning</h3><p class="">As the applications of deep learning have expanded, it has also been adopted in the RL field, resulting in deep reinforcement learning (DRL), <i>i.e.</i>, using a multi-layer deep neural network to approximate the value function or policy function in the RL algorithm <sup>[<a href="#B32" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B32">32</a>,<a href="#B33" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B33">33</a>]</sup>. 
DRL mainly solves the curse-of-dimensionality problem in real-world RL applications with large or continuous state and/or action space, where the traditional tabular RL algorithms cannot store and extract a large amount of feature information <sup>[<a href="#B17" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B17">17</a>,<a href="#B34" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B34">34</a>]</sup>.</p><p class="">Q-learning, as a very classical algorithm in RL, is a good example to understand the purpose of DRL. The main limitation of Q-learning lies in its tabular representation: when the state and action spaces are very large, it is impractical to build a Q-table large enough to store all Q values<sup>[<a href="#B35" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B35">35</a>]</sup>. In addition, it estimates and iterates Q values only from states it has already visited. Therefore, on the one hand, the applicable state and action space of Q-learning is very small. On the other hand, if a state has never appeared, Q-learning cannot deal with it<sup>[<a href="#B36" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B36">36</a>]</sup>. In other words, Q-learning has no prediction ability and generalization ability at this point.</p><p class="">To give Q-learning this prediction ability, and considering that neural networks can extract feature information well, the deep Q-network (DQN) was proposed, which applies a deep neural network to approximate the Q value function. 
Specifically, DQN extends the Q-learning algorithm to continuous or large state spaces by replacing the Q-table with a neural network that approximates the Q value function<sup>[<a href="#B37" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B37">37</a>]</sup>.</p><p class="">In addition to the value-based DRL algorithm such as DQN, we summarize a variety of classical DRL algorithms according to algorithm types by referring to some DRL related surveys<sup>[<a href="#B38" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B38">38</a>]</sup> in <a href="#t1" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="t1">Table 1</a>, including not only the policy-based and actor-critic DRL algorithms, but also the advanced DRL algorithms for the partially observable Markov decision process (POMDP) and multi-agent settings.</p><div id="t1" class="Figure-block"><div class="table-note"><span class="">Table 1</span><p class="">Taxonomy of representative algorithms for DRL</p></div><div class="table-responsive article-table"><table class="a-table"><thead><tr><th align="center" valign="middle" colspan="2">Types</th><th align="left" valign="top">Representative algorithms</th></tr></thead><tbody><tr><td align="center" valign="middle" colspan="2">Value-based</td><td align="left" valign="top">Deep Q-Network (DQN)<sup>[<a href="#B37" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B37">37</a>]</sup>, Double Deep Q-Network (DDQN)<sup>[<a href="#B39" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B39">39</a>]</sup>, <br />DDQN with proportional prioritization<sup>[<a href="#B40" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B40">40</a>]</sup></td></tr><tr><td align="center" valign="middle" colspan="2">Policy-based</td><td align="left" valign="top">REINFORCE<sup>[<a href="#B30" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B30">30</a>]</sup>, Q-prop<sup>[<a href="#B41" class="Link_style" data-jats-ref-type="bibr" 
data-jats-rid="B41">41</a>]</sup></td></tr><tr><td align="center" valign="middle" colspan="2">Actor-critic</td><td align="left" valign="top">Soft Actor-Critic (SAC)<sup>[<a href="#B42" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B42">42</a>]</sup>, Asynchronous Advantage Actor Critic (A3C)<sup>[<a href="#B43" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B43">43</a>]</sup>, <br />Deep Deterministic Policy Gradient (DDPG)<sup>[<a href="#B44" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B44">44</a>]</sup>, <br />Distributed Distributional Deep Deterministic Policy Gradients (D4PG)<sup>[<a href="#B45" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B45">45</a>]</sup>, <br />Twin Delayed Deep Deterministic Policy Gradient (TD3)<sup>[<a href="#B46" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B46">46</a>]</sup>, <br />Trust Region Policy Optimization (TRPO)<sup>[<a href="#B47" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B47">47</a>]</sup>, <br />Proximal Policy Optimization (PPO)<sup>[<a href="#B48" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B48">48</a>]</sup></td></tr><tr><td align="center" valign="middle" rowspan="2">Advanced</td><td align="center" valign="middle">POMDP</td><td align="left" valign="top">Deep Belief Q-Network (DBQN)<sup>[<a href="#B49" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B49">49</a>]</sup>, <br />Deep Recurrent Q-Network (DRQN)<sup>[<a href="#B50" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B50">50</a>]</sup>, <br />Recurrent Deterministic Policy Gradients (RDPG)<sup>[<a href="#B51" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B51">51</a>]</sup></td></tr><tr><td align="center" valign="middle">Multi-agent</td><td align="left" valign="top">Multi-Agent Importance Sampling (MAIS)<sup>[<a href="#B52" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B52">52</a>]</sup>, <br />Coordinated Multi-agent 
DQN<sup>[<a href="#B53" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B53">53</a>]</sup>, <br />Multi-agent Fingerprints (MAF)<sup>[<a href="#B52" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B52">52</a>]</sup>, <br />Counterfactual Multiagent Policy Gradient (COMAPG)<sup>[<a href="#B54" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B54">54</a>]</sup>, <br />Multi-Agent DDPG (MADDPG)<sup>[<a href="#B55" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B55">55</a>]</sup></td></tr></tbody></table></div><div class="table_footer"></div></div></div></div><div id="sec1-4" class="article-Section"><h2 >4. Federated reinforcement learning</h2><p class="">In this section, the detailed background and categories of FRL will be discussed.</p><div id="sec2-7" class="article-Section"><h3 >4.1. Federated reinforcement learning background</h3><p class="">Despite the excellent performance that RL and DRL have achieved in many areas, they still face several important technical and non-technical challenges in solving real-world problems. The successful application of FL in supervised learning tasks arouses interest in exploiting similar ideas in RL, <i>i.e.</i>, FRL. On the other hand, although FL is useful in some specific situations, it fails to deal with cooperative control and optimal decision-making in dynamic environments<sup>[<a href="#B10" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B10">10</a>]</sup>. FRL not only provides the experience for agents to learn to make good decisions in an unknown environment, but also ensures that the privately collected data during the agent’s exploration does not have to be shared with others. A forward-looking and interesting research direction is how to conduct RL under the premise of protecting privacy. 
Therefore, it is proposed to use FL framework to enhance the security of RL and define FRL as a security-enhanced distributed RL framework to accelerate the learning process, protect agent privacy and handle not independent and identically distributed (Non-IID) data<sup>[<a href="#B8" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B8">8</a>]</sup>. Apart from improving the security and privacy of RL, we believe that FRL has a wider and larger potential in helping RL to achieve better performance in various aspects, which will be elaborated in the following subsections.</p><p class="">In order to facilitate understanding and maintain consistency with FL, FRL is divided into two categories depending on environment partition<sup>[<a href="#B7" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B7">7</a>]</sup>, <i>i.e.</i>, HFRL and VFRL. <a href="#fig8" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig8">Figure 8</a> gives the comparison between HFRL and VFRL. In HFRL, the environment that each agent interacts with is independent of the others, while the state space and action space of different agents are aligned to solve similar problems. The action of each agent only affects its own environment and results in corresponding rewards. As an agent can hardly explore all states of its environment, multiple agents interacting with their own copy of the environment can accelerate training and improve model performance by sharing experience. Therefore, horizontal agents use server-client model or peer-to-peer model to transmit and exchange the gradients or parameters of their policy models (actors) and/or value function models (critics). In VFRL, multiple agents interact with the same global environment, but each can only observe limited state information in the scope of its view. Agents can perform different actions depending on the observed environment and receive local reward or even no reward. 
Based on the actual scenario, there may be some observation overlap between agents. In addition, all agents’ actions affect the global environment dynamics and total rewards. As opposed to the horizontal arrangement of independent environments in HFRL, the vertical arrangement of observations in VFRL poses a more complex problem and is less studied in the existing literature.</p><div class="Figure-block" id="fig8"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig8" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.8.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 8. Comparison of horizontal federated reinforcement learning and vertical federated reinforcement learning.</p></div></div></div><div id="sec2-8" class="article-Section"><h3 >4.2. Horizontal federated reinforcement learning</h3><p class="">HFRL can be applied in scenarios in which the agents may be distributed geographically, but they face similar decision-making tasks and have very little interaction with each other in the observed environments. Each participating agent independently executes decision-making actions based on the current state of environment and obtains positive or negative rewards for evaluation. Since the environment explored by one agent is limited and each agent is unwilling to share the collected data, multiple agents try to train the policy and/or value model together to improve model performance and increase learning efficiency. 
The purpose of HFRL is to alleviate the sample-efficiency problem in RL, and help each agent quickly obtain the optimal policy which can maximize the expected cumulative reward for specific tasks, while considering privacy protection.</p><p class="">In the HFRL problem, the environment, state space, and action space can replace the data set, feature space, and label space of basic FL. More formally, we assume that <i>N</i> agents <inline-formula><tex-math id="M1">$$ \{\mathcal{F}_i\}_{i=1}^{N} $$</tex-math></inline-formula> can observe the environments <inline-formula><tex-math id="M1">$$ \{\mathcal{E}_i\}_{i=1}^{N} $$</tex-math></inline-formula> within their field of vision, where <inline-formula><tex-math id="M1">$$ \mathcal{G} $$</tex-math></inline-formula> denotes the collection of all environments. The environment <inline-formula><tex-math id="M1">$$ \mathcal{E}_i $$</tex-math></inline-formula> where the <i>i</i>-th agent is located has a model, <i>i.e.</i>, state transition probability and reward function, similar to those of the other environments. Note that the environment <inline-formula><tex-math id="M1">$$ \mathcal{E}_i $$</tex-math></inline-formula> is independent of the other environments, in that the state transition and reward model of <inline-formula><tex-math id="M1">$$ \mathcal{E}_i $$</tex-math></inline-formula> do not depend on the states and actions of the other environments. Each agent <inline-formula><tex-math id="M1">$$ \mathcal{F}_i $$</tex-math></inline-formula> interacts with its own environment <inline-formula><tex-math id="M1">$$ \mathcal{E}_i $$</tex-math></inline-formula> to learn an optimal policy. 
Therefore, the conditions for HFRL are presented as follows, <i>i.e.</i>,</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \mathcal{S}_{i}=\mathcal{S}_{j}, \mathcal{A}_{i}=\mathcal{A}_{j}, \mathcal{E}_{i} \neq \mathcal{E}_{j}, \forall i, j \in\{1,2, \ldots, N\}, \mathcal{E}_{i}, \mathcal{E}_{j} \in \mathcal{G}, i \neq j, $$ </tex-math></div></p><p class="">where <inline-formula><tex-math id="M1">$$ \mathcal{S}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{S}_j $$</tex-math></inline-formula> denote the similar state spaces encountered by the <i>i</i>-th agent and <i>j</i>-th agent, respectively. <inline-formula><tex-math id="M1">$$ \mathcal{A}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{A}_j $$</tex-math></inline-formula> denote the similar action spaces of the <i>i</i>-th agent and <i>j</i>-th agent, respectively. The observed environments <inline-formula><tex-math id="M1">$$ \mathcal{E}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{E}_j $$</tex-math></inline-formula> are two different environments that are assumed to be independent and ideally identically distributed.</p><p class=""><a href="#fig9" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig9">Figure 9</a> shows the HFRL in graphic form. Each agent is represented by a cuboid. The axes of the cuboid denote three dimensions of information, <i>i.e.</i>, the environment, state space, and action space. We can intuitively see that all environments are arranged horizontally, and multiple agents have aligned state and action spaces. In other words, each agent explores independently in its respective environment, and needs to obtain optimal strategies for similar tasks. 
In HFRL, agents share their experiences by exchanging masked models to enhance sample efficiency and accelerate the learning process.</p><div class="Figure-block" id="fig9"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig9" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.9.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 9. Illustration of horizontal federated reinforcement learning.</p></div></div><p class="">A typical example of HFRL is the autonomous driving system in IoV. As vehicles drive on roads throughout the city and country, they can collect various environmental information and train the autonomous driving models locally. Due to driving regulations, weather conditions, driving routes, and other factors, one vehicle cannot be exposed to every possible situation in the environment. Moreover, the vehicles have basically the same operations, including braking, acceleration, steering, <i>etc.</i> Therefore, vehicles driving on different roads, different cities, or even different countries could share their learned experience with each other by FRL without revealing their driving data according to the premise of privacy protection. In this case, even if other vehicles have never encountered a situation, they can still perform the best action by using the shared model. The exploration of multiple vehicles together also creates an increased chance of learning rare cases to ensure the reliability of the model.</p><p class="">For a better understanding of HFRL, <a href="#fig10" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig10">Figure 10</a> shows an example of HFRL architecture using the server-client model. 
The coordinator is responsible for establishing encrypted communication with agents and implementing aggregation of shared models. The multiple parallel agents may be composed of heterogeneous equipment (<i>e.g.</i>, IoT devices, smart phone and computers, <i>etc.</i>) and distributed geographically. It is worth noting that there is no specific requirement for the number of agents, and agents are free to choose to join or leave. The basic procedure for conducting HFRL can be summarized as follows.</p><div class="Figure-block" id="fig10"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig10" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.10.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 10. An example of horizontal federated reinforcement learning architecture.</p></div></div><ul class="tipsDisc"><li><p>Step 1: The initialization/join process can be divided into two cases, one is when the agent has no model locally, and the other is when the agent has a model locally. For the first case, the agent can directly download the shared global model from a coordinator. For the second case, the agent needs to confirm the model type and parameters with the central coordinator.</p></li><li><p>Step 2: Each agent independently observes the state of the environment and determines the private strategy based on the local model. The selected action is evaluated by the next state and received reward. All agents train respective models in state-action-reward-state (SARS) cycles.</p></li><li><p>Step 3: Local model parameters are encrypted and transmitted to the coordinator. 
Agents may submit local models at any time as long as the trigger conditions are met.</p></li><li><p>Step 4: The coordinator conducts the specific aggregation algorithm to evolve the global federated model. Actually, there is no need to wait for submissions from all agents, and appropriate aggregation conditions can be formulated depending on communication resources.</p></li><li><p>Step 5: The coordinator sends back the aggregated model to the agents.</p></li><li><p>Step 6: The agents improve their respective models by fusing the federated model.</p></li></ul><p class="">Following the above architecture and process, applications suitable for HFRL should meet the following characteristics. First, agents have similar tasks to make decisions under dynamic environments. Different from the FL setting, the goal of the HFRL-based application is to find the optimal strategy to maximize reward in the future. For the agent to accomplish the task requirements, the optimal strategy directs them to perform certain actions, such as control, scheduling, navigation, <i>etc.</i> Second, distributed agents maintain independent observations. Each agent can only observe the environment within its field of view, but does not ensure that the collected data follows the same distribution. Third, it is important to protect the data that each agent collects and explores. Agents are presumed to be honest but curious, <i>i.e.</i>, they honestly follow the learning mechanism but are curious about private information held by other agents. Due to this, the data used for training is only stored at the owner and is not transferred to the coordinator. HFRL provides an implementation method for sharing experiences under the constraints of privacy protection. Additionally, various reasons limit the agent’s ability to explore the environment in a balanced manner. Participating agents may include heterogeneous devices. 
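</p><p class="">Returning to the six-step procedure listed above, one round of server-client HFRL can be sketched as follows. This is a minimal illustration under stated assumptions: models are plain lists of parameters, the aggregation rule is a sample-weighted average in the style of federated averaging, the fusion rule is a fixed linear blend, and encryption of the transmitted parameters is omitted. The agent names, sample counts, and the <code>mix</code> coefficient are hypothetical.</p>
<pre>
```python
def aggregate(submissions):
    """Coordinator side (Step 4): sample-weighted average of submitted models.

    submissions is a list of (params, num_samples) pairs.
    """
    total = sum(n for _, n in submissions)
    dim = len(submissions[0][0])
    return [sum(p[k] * n for p, n in submissions) / total for k in range(dim)]


def fuse(local, global_model, mix=0.5):
    """Agent side (Step 6): blend the federated model into the local one."""
    return [(1.0 - mix) * l + mix * g for l, g in zip(local, global_model)]


# Local models after independent training in each agent's own environment (Step 2).
local_models = {
    "agent_a": [1.0, 2.0],
    "agent_b": [3.0, 4.0],
    "agent_c": [10.0, 10.0],
}
sample_counts = {"agent_a": 100, "agent_b": 300, "agent_c": 50}

# Step 3: only agents whose trigger condition is met submit their parameters.
ready = ["agent_a", "agent_b"]

# Step 4: the coordinator aggregates without waiting for agent_c.
global_model = aggregate([(local_models[a], sample_counts[a]) for a in ready])

# Steps 5 and 6: every agent receives the global model and fuses it locally.
local_models = {a: fuse(m, global_model) for a, m in local_models.items()}
```
</pre>
<p class="">Only parameters leave the agents in this sketch. The raw transitions collected during exploration stay local, which is the property the privacy requirement relies on.</p><p class="">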
The amount of data collected by each agent is unbalanced due to mobility, observation, energy and other factors. However, all participants have sufficient computing, storage, and communication capabilities. These capabilities assist the agent in completing model training, merging, and other basic processes. Finally, the environment observed by an agent may change dynamically, causing differences in data distribution. The participating agents need to update the model in time to quickly adapt to environmental changes and construct a personalized local model.</p><p class="">In existing RL studies, some applications that meet the above characteristics can be classified as HFRL. Nadiger <i>et al.</i><sup>[<a href="#B56" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B56">56</a>]</sup> present a typical HFRL architecture, which includes the grouping policy, the learning policy, and the federation policy. In this work, RL is used to show the applicability of granular personalization and FL is used to reduce training time. To demonstrate the effectiveness of the proposed architecture, a non-player character in the Atari game Pong is implemented and evaluated. In the study from Liu <i>et al.</i><sup>[<a href="#B57" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B57">57</a>]</sup>, the authors propose lifelong federated reinforcement learning (LFRL) for navigation in cloud robotic systems. It enables the robot to learn efficiently in a new environment and use prior knowledge to quickly adapt to the changes in the environment. Each robot trains a local model according to its own navigation task, and the centralized cloud server implements a knowledge fusion algorithm for upgrading a shared model. Considering that the local model and the shared model might have different network structures, this paper proposes to apply transfer learning to improve the performance and efficiency of the shared model. 
Further, researchers also focus on HFRL-based applications in the IoT due to the high demand for privacy protection. Ren <i>et al.</i><sup>[<a href="#B58" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B58">58</a>]</sup> suggest deploying the FL architecture between edge nodes and IoT devices for computation offloading tasks. IoT devices can download the RL model from edge nodes and train the local model using their own data, including the remaining energy resources and the workload of the IoT device, <i>etc.</i> The edge node aggregates the updated private models into the shared model. Although this method considers privacy protection issues, it requires further evaluation regarding the cost of communication resources incurred by the model exchange. In addition, the work<sup>[<a href="#B59" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B59">59</a>]</sup> proposes a federated deep-reinforcement-learning-based framework (FADE) for edge caching. Edge devices, including base stations (BSs), can cooperatively learn a predictive model using the first round of training parameters for local learning, and then upload the tuned local parameters for the next round of global training. By keeping the training on local devices, FADE can enable fast training and decouple the learning process between the cloud and data owner in a distributed-centralized manner. More HFRL-based applications will be classified and summarized in the next section.</p><p class="">Prior to HFRL, a variety of distributed RL algorithms have been extensively investigated, which are closely related to HFRL. In general, distributed RL algorithms can be divided into two types: synchronous and asynchronous. 
In synchronous RL algorithms, such as synchronous stochastic optimization (Sync-Opt) <sup>[<a href="#B60" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B60">60</a>]</sup> and parallel advantage actor critic (PAAC)<sup>[<a href="#B3" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B3">3</a>]</sup>, the agents explore their own environments separately, and after a number of samples are collected, the global parameters are updated synchronously. In contrast, in asynchronous RL algorithms the coordinator updates the global model immediately after receiving the gradient from an arbitrary agent, rather than waiting for the other agents. Several asynchronous RL algorithms have been presented, including A3C<sup>[<a href="#B61" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B61">61</a>]</sup>, Impala<sup>[<a href="#B62" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B62">62</a>]</sup>, Ape-X<sup>[<a href="#B63" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B63">63</a>]</sup> and general reinforcement learning architecture (Gorila)<sup>[<a href="#B1" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B1">1</a>]</sup>. From the perspective of technology development, HFRL can also be considered security-enhanced parallel RL. In parallel RL, multiple agents interact with a stochastic environment to seek the optimal policy for the same task<sup>[<a href="#B1" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B1">1</a>,<a href="#B2" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B2">2</a>]</sup>. By building a closed loop of data and knowledge in parallel systems, parallel RL helps determine the next course of action for each agent. The state and action representations are fed into a designed neural network to approximate the action value function<sup>[<a href="#B64" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B64">64</a>]</sup>. 
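</p><p class="">The difference between the synchronous and asynchronous schemes described in this paragraph can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the global model is a list of parameters, agents submit precomputed gradients, and the coordinator applies plain gradient descent with a hypothetical learning rate. In a real asynchronous system each gradient would be computed against a possibly stale copy of the parameters, which this sketch does not model.</p>
<pre>
```python
def sync_update(global_params, gradients, lr=0.1):
    """Synchronous: wait for every agent's gradient, then apply their average once."""
    n = len(gradients)
    avg = [sum(g[k] for g in gradients) / n for k in range(len(global_params))]
    return [p - lr * a for p, a in zip(global_params, avg)]


def async_update(global_params, gradient, lr=0.1):
    """Asynchronous: apply a single agent's gradient as soon as it arrives."""
    return [p - lr * g for p, g in zip(global_params, gradient)]


# Gradients reported by two agents (illustrative values).
grads = [[1.0, 0.0], [3.0, 2.0]]

# Synchronous coordinator: one update from the averaged gradient.
params_sync = sync_update([0.0, 0.0], grads)

# Asynchronous coordinator: one update per arriving gradient, in arrival order.
params_async = [0.0, 0.0]
for g in grads:
    params_async = async_update(params_async, g)
```
</pre>
<p class="">With the same inputs, the synchronous coordinator takes one averaged step while the asynchronous one takes a separate step per agent, so the two schemes generally end at different parameters.</p><p class="">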
However, parallel RL typically transfers the experience of agents without considering privacy protection issues<sup>[<a href="#B7" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B7">7</a>]</sup>. Implementations of HFRL impose further restrictions on privacy protection and communication consumption to suit special scenarios, such as IoT applications<sup>[<a href="#B59" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B59">59</a>]</sup>. Another point to consider is Non-IID data. To ensure convergence of the RL model, it is generally assumed in parallel RL that the state transitions in the environment follow the same distribution, <i>i.e.</i>, the environments of different agents are IID. In actual scenarios, however, the situations faced by agents may differ slightly, so that the environment models of different agents are not identically distributed. Therefore, HFRL needs to improve the generalization ability of the model compared with parallel RL to meet the challenges posed by Non-IID data.</p><p class="">Based on the potential issues faced by current RL technology, the advantages of HFRL can be summarized as follows.</p><ul class="tipsDisc"><li><p>Enhancing training speed. In the case of a similar target task, multiple agents sharing training experiences gained from different environments can expedite the learning process. The local model rapidly evolves through aggregation and update algorithms to assess the unexplored environment. Moreover, the data obtained by different agents are independent, reducing correlations between the observed data. This also helps to solve the issue of unbalanced data caused by various restrictions.</p></li><li><p>Improving the reliability of the model. When the dimensions of the state of the environment are enormous or even uncountable, it is difficult for a single agent to train an optimal strategy for situations with extremely low occurrence probabilities. 
Horizontal agents explore independently while building a cooperative model to improve the local model’s performance on rare states.</p></li><li><p>Mitigating the problems of device heterogeneity. Different devices deploying RL agents in the HFRL architecture may have different computational and communication capabilities. Some devices may not meet the basic requirements for training but still need strategies to guide their actions. HFRL makes it possible for all agents to obtain the shared model equally for the target task.</p></li><li><p>Addressing the issue of non-identical environments. Because the environment dynamics differ between agents, the assumption of IID data may be broken. Under the HFRL architecture, agents whose environment models are not identically distributed can still cooperate to learn a federated model. To address the difference in environment dynamics, a personalized update algorithm for the local model could be designed to minimize the impact of this issue.</p></li><li><p>Increasing the flexibility of the system. An agent can decide at any time whether to participate in the cooperative system, because HFRL allows asynchronous requests and aggregation of shared models. In existing HFRL-based applications, new agents can also apply for membership and benefit from downloading the shared model.</p></li></ul></div><div id="sec2-9" class="article-Section"><h3 >4.3. Vertical federated reinforcement learning</h3><p class="">In VFL, samples of multiple data sets have different feature spaces, but these samples may belong to the same groups or common users. The training data of each participant are divided vertically according to their features. More general and accurate models can be generated by building heterogeneous feature spaces without releasing private information. 
VFRL applies the methodology of VFL to RL and is suitable for POMDP scenarios where different RL agents are in the same environment but have different interactions with the environment. Specifically, different agents could have different observations that are only part of the global state. They could take actions from different action spaces and observe different rewards, and some agents may even take no actions or observe no rewards. Since the observation range of a single agent is limited, multiple agents cooperate to collect enough knowledge for decision making. The role of FL in VFRL is to aggregate the partial features observed by the various agents. Especially for agents without rewards, the aggregation effect of FL greatly enhances the value of such agents in their interactions with the environment, and ultimately helps with strategy optimization. It is worth noting that in VFRL the issue of privacy protection needs to be considered, <i>i.e.</i>, private data collected by some agents do not have to be shared with others. Instead, agents can transmit encrypted model parameters, gradients, or mid-products directly to each other. In short, the goal of VFRL is for agents interacting with the same environment to improve the performance of their policies, and the effectiveness in learning them, by sharing experiences without compromising privacy.</p><p class="">More formally, we denote <inline-formula><tex-math id="M1">$$ \{\mathcal{F}_i\}_{i=1}^{N} $$</tex-math></inline-formula> as <i>N</i> agents in VFRL, which interact with a global environment <inline-formula><tex-math id="M1">$$ \mathcal{E} $$</tex-math></inline-formula>. 
The <i>i</i>-th agent <inline-formula><tex-math id="M1">$$ \mathcal{F}_i $$</tex-math></inline-formula> is located in the environment <inline-formula><tex-math id="M1">$$ \mathcal{E}_i=\mathcal{E} $$</tex-math></inline-formula>, obtains the local partial observation <inline-formula><tex-math id="M1">$$ \mathcal{O}_i $$</tex-math></inline-formula>, and can perform the set of actions <inline-formula><tex-math id="M1">$$ \mathcal{A}_i $$</tex-math></inline-formula>. Different from HFRL, the state/observation and action spaces of two agents <inline-formula><tex-math id="M1">$$ \mathcal{F}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{F}_j $$</tex-math></inline-formula> may not be identical, but the aggregation of the state/observation spaces and action spaces of all the agents constitutes the global state and action spaces of the global environment <inline-formula><tex-math id="M1">$$ \mathcal{E} $$</tex-math></inline-formula>. The conditions for VFRL can be defined as</p><p class=""><div class="disp-formula"><label></label><tex-math id="E1"> $$ \mathcal{O}_i\not=\mathcal{O}_j,\mathcal{A}_i\not=\mathcal{A}_j,\mathcal{E}_i=\mathcal{E}_j=\mathcal{E},\bigcup_{i=1}^{N}\mathcal{O}_i=\mathcal{S},\bigcup_{i=1}^{N}\mathcal{A}_i=\mathcal{A},\forall i,j\in\{1,2,\dots,N\},i\not=j, $$ </tex-math></div></p><p class="">where <inline-formula><tex-math id="M1">$$ \mathcal{S} $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{A} $$</tex-math></inline-formula> denote the global state space and action space of all participant agents, respectively. It can be seen that all the observations of the <i>N</i> agents together constitute the global state space <inline-formula><tex-math id="M1">$$ \mathcal{S} $$</tex-math></inline-formula> of the environment <inline-formula><tex-math id="M1">$$ \mathcal{E} $$</tex-math></inline-formula>. 
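</p><p class="">As a rough illustration of the set conditions above, the following sketch checks them for a toy three-agent system. It is only a hypothetical example: the function name, the agent roles, and the feature labels (loosely modeled on the microgrid example discussed later in this section) are assumptions made for illustration and do not come from any cited work. The condition that all agents share one environment is taken for granted rather than checked.</p>
<pre>
```python
# Hypothetical toy check of the VFRL set conditions; all names are illustrative.
def satisfies_vfrl_conditions(observations, actions, global_states, global_actions):
    """Return True if the local spaces satisfy the VFRL conditions.

    observations, actions: lists of sets, one entry per agent (O_i and A_i).
    global_states, global_actions: the sets S and A of the shared environment.
    """
    n = len(observations)
    # Pairwise distinctness: O_i != O_j and A_i != A_j for every pair i != j.
    for i in range(n):
        for j in range(i + 1, n):
            if observations[i] == observations[j]:
                return False
            if actions[i] == actions[j]:
                return False
    # Coverage: the unions of the local spaces equal the global spaces.
    covers_states = set().union(*observations) == set(global_states)
    covers_actions = set().union(*actions) == set(global_actions)
    return covers_states and covers_actions


# Assumed agents: household, power company, PV company (which takes no actions).
observations = [{"soc", "own_load"}, {"all_load"}, {"pv_generation"}]
actions = [{"demand_response"}, {"dg_dispatch"}, set()]
global_states = {"soc", "own_load", "all_load", "pv_generation"}
global_actions = {"demand_response", "dg_dispatch"}

print(satisfies_vfrl_conditions(observations, actions, global_states, global_actions))
```
</pre>
<p class="">Note that, read literally, the pairwise condition on action sets would fail if two agents both had empty action sets, so this sketch admits at most one agent without actions.</p><p class="">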
Besides, the environments <inline-formula><tex-math id="M1">$$ \mathcal{E}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{E}_j $$</tex-math></inline-formula> are the same environment <inline-formula><tex-math id="M1">$$ \mathcal{E} $$</tex-math></inline-formula>. In most cases, there is a great difference between the observations of two agents <inline-formula><tex-math id="M1">$$ \mathcal{F}_i $$</tex-math></inline-formula> and <inline-formula><tex-math id="M1">$$ \mathcal{F}_j $$</tex-math></inline-formula>.</p><p class=""><a href="#fig11" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig11">Figure 11</a> shows the architecture of VFRL. The dataset and feature space in VFL are converted to the environment and state space respectively. VFL divides the dataset vertically according to the features of samples, and VFRL divides agents based on the state spaces observed from the global environment. Generally speaking, every agent has its local state which can be different from that of the other agents and the aggregation of these local partial states corresponds to the entire environment state<sup>[<a href="#B65" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B65">65</a>]</sup>. In addition, after interacting with the environment, agents may generate their local actions which correspond to the labels in VFL.</p><div class="Figure-block" id="fig11"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig11" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.11.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 11. 
Illustration of vertical federated reinforcement learning.</p></div></div><p class="">Two types of agents can be defined for VFRL, <i>i.e.</i>, decision-oriented agents and support-oriented agents. Decision-oriented agents <inline-formula><tex-math id="M1">$$ \{\mathcal{F}_i\}_{i=1}^K $$</tex-math></inline-formula> can interact with the environment <inline-formula><tex-math id="M1">$$ \mathcal{E} $$</tex-math></inline-formula> based on their local state <inline-formula><tex-math id="M1">$$ \{\mathcal{S}_i\}_{i=1}^K $$</tex-math></inline-formula> and action <inline-formula><tex-math id="M1">$$ \{\mathcal{A}_i\}_{i=1}^K $$</tex-math></inline-formula>. Meanwhile, support-oriented agents <inline-formula><tex-math id="M1">$$ \{\mathcal{F}_i\}_{i=K+1}^N $$</tex-math></inline-formula> take no actions and receive no rewards but only the observations of the environment, <i>i.e.</i>, their local states <inline-formula><tex-math id="M1">$$ \{\mathcal{S}_i\}_{i=K+1}^N $$</tex-math></inline-formula>. In general, the following six steps, as shown in <a href="#fig12" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="fig12">Figure 12</a>, are the basic procedure for VFRL, <i>i.e.</i>,</p><div class="Figure-block" id="fig12"><div xmlns="http://www.w3.org/1999/xhtml" class="article-figure-image"><a href="/articles/ir.2021.02/image/fig12" class="Article-img" alt="" target="_blank"><img alt="Federated reinforcement learning: techniques, applications, and open challenges" src="https://image.oaes.cc/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.fig.12.jpg" class="" title="" alt="" /></a></div><div class="article-figure-note"><p class="figure-note"></p><p class="figure-note">Figure 12. An example of vertical federated reinforcement learning architecture.</p></div></div><ul class="tipsDisc"><li><p>Step 1: Initialization is performed for all agent models.</p></li><li><p>Step 2: Agents obtain states from the environment. 
For decision-oriented agents, actions are obtained based on the local models, and feedback is obtained through interactions with the environment, <i>i.e.</i>, the states of the next time step and rewards. The data tuple of state-action-reward-state (SARS) is used to train the local models.</p></li><li><p>Step 3: All agents calculate the mid-products of the local models and then transmit the encrypted mid-products to the federated model.</p></li><li><p>Step 4: The federated model performs the aggregation calculation for mid-products and trains the federated model based on the aggregation results.</p></li><li><p>Step 5: The federated model encrypts model parameters such as weights and gradients and passes them back to the agents.</p></li><li><p>Step 6: All agents update their local models based on the received encrypted parameters.</p></li></ul><p class="">As an example of VFRL, consider a microgrid (MG) system including household users, the power company and the photovoltaic (PV) management company as the agents. All the agents observe the same MG environment while their local state spaces are quite different. The global states of the MG system generally consist of several dimensions/features, such as the state-of-charge (SOC) of the batteries, the load consumption of the household users, and the power generation from PV. The household agents can obtain the SOC of their own batteries and their own load consumption, the power company knows the load consumption of all the users, and the PV management company knows the power generation of PV. As to the actions, the power company needs to make decisions on the power dispatch of the diesel generators (DG), and the household users can make decisions to manage their electrical utilities with demand response. 
Finally, the power company can observe rewards such as the cost of DG power generation and the balance between power generation and consumption, and the household users can observe rewards such as their electricity bill, which is related to their power consumption. In order to learn the optimal policies, these agents need to communicate with each other to share their observations. However, PV managers do not want to expose their data to other companies, and household users also want to keep their consumption data private. VFRL is therefore well suited to this setting and can help improve policy decisions without exposing specific data.</p><p class="">Compared with HFRL, there are currently few works on VFRL. Zhuo <i>et al.</i><sup>[<a href="#B65" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B65">65</a>]</sup> present the federated deep reinforcement learning (FedRL) framework. This work addresses the case where the feature space of states is small and the training data are limited. Transfer learning approaches in DRL are also solutions for this case. However, in privacy-aware applications, directly transferring data or models should be avoided. Hence, FedRL combines the advantages of FL with RL, which is suitable for the case when agents need to consider their privacy. The FedRL framework assumes that agents cannot share their partial observations of the environment and that some agents are unable to receive rewards. It builds a shared value network, <i>i.e.</i>, a multilayer perceptron (MLP), which takes an agent's own Q-network output and the encrypted value as input to calculate a global Q-network output. Based on the output of the global Q-network, the shared value network and the agent's own Q-network are updated. Two agents are used in the FedRL algorithm, <i>i.e.</i>, agent <i>α</i> and <i>β</i>, which interact with the same environment. However, agent <i>β</i> cannot build its own policies and rewards. 
Finally, FedRL is applied in two different games, <i>i.e.</i>, Grid-World and Text2Action, and achieves better results than the other baselines. Although the VFRL model in this paper only contains two agents, and the structure of the aggregated neural network model is relatively simple, we believe that it is a valuable first attempt to implement VFRL and verify its effectiveness.</p><p class="">Multi-agent RL (MARL) is very closely related to VFRL. As the name implies, MARL takes into account the existence of multiple agents in the RL system. However, empirical evaluation shows that applying simple single-agent RL algorithms directly to scenarios with multiple agents cannot converge to the optimal solution, since the environment is no longer static from the perspective of each agent<sup>[<a href="#B66" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B66">66</a>]</sup>. Specifically, the action of each agent will affect the next state, thus affecting all agents in future time steps<sup>[<a href="#B67" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B67">67</a>]</sup>. Besides, the actions performed by a given agent will yield different rewards depending on the actions taken by other agents. This means that agents in MARL are correlated with each other, rather than being independent. This challenge, known as the non-stationarity of the environment, is the main problem to be solved in the development of an efficient MARL algorithm<sup>[<a href="#B68" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B68">68</a>]</sup>.</p><p class="">MARL and VFRL both study the problem of multiple agents learning concurrently how to solve a task by interacting with the same environment<sup>[<a href="#B69" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B69">69</a>]</sup>. 
Since MARL and VFRL have a large range of similarities, a review of MARL’s related works is a very useful guide to help researchers summarize the research focus and better understand VFRL. There is abundant literature related to MARL. However, most MARL research<sup>[<a href="#B70" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B70">70</a>-<a href="#B73" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B73">73</a>]</sup> is based on a fully observed Markov decision process (MDP), where each agent is assumed to have the global observation of the system state<sup>[<a href="#B68" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B68">68</a>]</sup>. These MARL algorithms are not applicable to the case of POMDP, where the observations of individual agents are often only a part of the overall environment<sup>[<a href="#B74" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B74">74</a>]</sup>. Partial observability is a crucial consideration for the development of algorithms that can be applied to real-world problems<sup>[<a href="#B75" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B75">75</a>]</sup>. Since VFRL is mainly oriented towards POMDP scenarios, it is more important to analyze the related works of MARL based on POMDP as guidance for VFRL.</p><p class="">Agents in the above scenarios partially observe the system state and make decisions at each step to maximize the overall rewards for all agents, which can be formalized as a decentralized partially observable Markov decision process (Dec-POMDP)<sup>[<a href="#B76" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B76">76</a>]</sup>. Optimally solving a Dec-POMDP model is well known to be a very challenging problem. 
Among the early works, Omidshafiei <i>et al.</i><sup>[<a href="#B77" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B77">77</a>]</sup> propose a two-phase MT-MARL approach that includes cautiously-optimistic learners for action-value approximation and concurrent experience replay trajectories (CERTs) as the experience replay mechanism, targeting sample-efficient and stable MARL. The authors also apply a recurrent neural network (RNN) to estimate the non-observed state and hysteretic Q-learning to address the problem of non-stationarity in Dec-POMDP. Han <i>et al.</i><sup>[<a href="#B78" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B78">78</a>]</sup> design a neural network architecture, IPOMDP-net, which extends the QMDP-net planning algorithm<sup>[<a href="#B79" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B79">79</a>]</sup> to MARL settings under POMDP. Besides, Mao <i>et al.</i><sup>[<a href="#B80" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B80">80</a>]</sup> introduce the concept of information state embedding to compress agents’ histories and propose an RNN model combining the state embedding. Their method, <i>i.e.</i>, the embed-then-learn pipeline, is universal since the embedding can be fed into any existing partially observable MARL algorithm as a black box. In the study from Mao <i>et al.</i><sup>[<a href="#B81" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B81">81</a>]</sup>, the proposed attention MADDPG (ATT-MADDPG) has several critic networks for various agents under POMDP. A centralized critic is adopted to collect the observations and actions of the teammate agents. Specifically, the attention mechanism is applied to enhance the centralized critic. The last work reviewed here is from Lee <i>et al.</i><sup>[<a href="#B82" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B82">82</a>]</sup>. 
They present an augmented MARL algorithm based on pretraining to address the challenges of disaster response. Notably, they use behavioral cloning (BC), a supervised learning method where agents learn their policy from demonstration samples, to pretrain the neural network. BC can generate a feasible Dec-POMDP policy from demonstration samples, which offers advantages over plain MARL in terms of solution quality and computation time.</p><p class="">Some MARL algorithms also concentrate on the communication issue of POMDP. In the study from Sukhbaatar <i>et al.</i><sup>[<a href="#B83" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B83">83</a>]</sup>, communication between the agents is performed for a number of rounds before their action is selected. The communication protocol is learned concurrently with the optimal policy. Foerster <i>et al.</i><sup>[<a href="#B84" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B84">84</a>]</sup> propose a deep recurrent network architecture, <i>i.e.</i>, Deep Distributed Recurrent Q-network (DDRQN), to address the communication problem in a multi-agent partially-observable setting. This work makes three fundamental modifications to previous algorithms. The first is last-action inputs, which give each agent access to its previous action as an input for the next time step. The second is inter-agent weight sharing, which still allows diverse behavior between agents, as the agents receive different observations and thus evolve different hidden states. The third is disabling experience replay, because the non-stationarity of the environment renders old experiences obsolete or misleading. Foerster <i>et al.</i><sup>[<a href="#B84" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B84">84</a>]</sup> consider the communication task of fully cooperative, partially observable, sequential multi-agent decision-making problems. 
In their system model, each agent can receive a private observation and take actions that affect the environment. In addition, the agent can also communicate with its fellow agents via a discrete limited-bandwidth channel. Despite the partial observability and limited channel capacity, the authors show that two agents can discover a communication protocol that enables them to coordinate their behavior based on the approach of deep recurrent Q-networks.</p><p class="">While there are some similarities between MARL and VFRL, several important differences deserve attention, <i>i.e.</i>,</p><ul class="tipsDisc"><li><p>VFRL and some MARL algorithms are able to address similar problems, <i>e.g.</i>, the issues of POMDP. However, the solution ideas of the two algorithms differ. Since VFRL is the product of applying VFL to RL, the FL component of VFRL has, since its inception, focused more on the aggregation of partial features, including states and rewards, observed by different agents. Security is also an essential issue in VFRL. In contrast, MARL may arise as the most natural way of adding more than one agent to an RL system<sup>[<xref ref-type="bibr" rid="B85">85</xref>]</sup>. In MARL, agents not only interact with the environment, but also have complex interactive relationships with other agents, which creates a great obstacle to policy optimization. Therefore, the original intentions of the two algorithms are different.</p></li><li><p>The two algorithms differ slightly in structure. The agents in MARL must have rewards, even if some of them may not have their own local actions. However, in some cases, the agents in VFRL are not able to generate a corresponding operation policy, so in these cases, some agents have no actions and rewards<sup>[<xref ref-type="bibr" rid="B65">65</xref>]</sup>. 
Therefore, VFRL can solve a wider range of problems that MARL is not capable of solving.</p></li><li><p>Both algorithms involve the communication problem between agents. In MARL, information such as the states of other agents and model parameters can be directly and freely propagated among agents. During communication, some MARL methods such as DDRQN in the work of Foerster <i>et al.</i><sup>[<xref ref-type="bibr" rid="B84">84</xref>]</sup> consider the previous action as an input for the next time-step state. Weight sharing is also allowed between agents. However, VFRL assumes states cannot be shared among agents. Since these agents do not exchange experience and data directly, VFRL focuses more on security and privacy issues of communication between agents, as well as how to process mid-products transferred by other agents and aggregate federated models.</p></li></ul><p class="">In summary, as a promising algorithm, VFRL has the following advantages:</p><ul class="tipsDisc"><li><p>Excellent privacy protection. VFRL inherits the FL algorithm’s idea of data privacy protection, so for the task of multi-agent cooperation in the same environment, information interaction can be carried out confidently to enhance the learning efficiency of the RL model. In this process, each participant does not have to worry about any leakage of raw real-time data.</p></li><li><p>Wide application scenarios. With appropriate knowledge extraction methods, including algorithm design and system modeling, VFRL can solve more real-world problems compared with MARL algorithms. This is because VFRL can include agents that cannot generate rewards in the system model, so as to integrate their partial observation information of the environment based on FL while protecting privacy, train a more robust RL agent, and further improve learning efficiency.</p></li></ul></div><div id="sec2-10" class="article-Section"><h3 >4.4. 
Other types of FRL</h3><p class="">The above HFRL and VFRL algorithms borrow ideas from FL for federation between RL agents. Meanwhile, there are also some existing works on FRL that are less affected by FL. Hence, they do not belong to either HFRL or VFRL, but federation between agents is also implemented.</p><p class="">The study from Hu <i>et al.</i><sup>[<a href="#B86" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B86">86</a>]</sup> is a typical example, which proposes a general FRL algorithm based on reward shaping, called federated reward shaping (FRS). It uses reward shaping to share federated information to improve policy quality and training speed. FRS adopts the server-client architecture. The server includes the federated model, while each client completes its own tasks based on the local model. This algorithm can be combined with different kinds of RL algorithms. However, since FRS focuses on reward shaping, it cannot be used when some agents in VFRL receive no reward. In addition, FRS performs knowledge aggregation by sharing high-level information such as the reward shaping value or embedding between client and server instead of sharing experience or policy directly. The convergence of FRS is also guaranteed since only minor changes are made during the learning process, namely the modification of the rewards in the replay buffer.</p><p class="">As another example, Anwar <i>et al.</i><sup>[<a href="#B87" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B87">87</a>]</sup> achieve federation between agents by smoothing the average weight. This work analyzes the multi-task FRL algorithms (MT-FedRL) with adversaries. Agents only interact and make observations in their own environments, which can be characterized by different MDPs. Different from HFRL, the state and action spaces do not need to be the same in these environments. 
The goal of MT-FedRL is to learn a unified policy, which is jointly optimized across all of the environments. MT-FedRL adopts policy gradient methods for RL. In other words, policy parameters are needed to learn the optimal policy. The server-client architecture is also applied and all agents should share their own information with a centralized server. The role of the non-negative smoothing average weights is to achieve a consensus among the agents’ parameters. As a result, they help to incorporate the knowledge from other agents as the process of federation.</p></div></div><div id="sec1-5" class="article-Section"><h2 >5. Applications of FRL</h2><p class="">In this section, we provide an extensive discussion of the application of FRL in a variety of tasks, such as edge computing, communications, control optimization, attack detection, <i>etc.</i> This section is aimed at enabling readers to understand the applicable scenarios and research status of FRL.</p><div id="sec2-11" class="article-Section"><h3 >5.1. FRL for edge computing</h3><p class="">In recent years, edge equipment, such as BSs and road side units (RSUs), has been equipped with increasingly advanced communication, computing and storage capabilities. As a result, edge computing has been proposed to delegate more tasks to edge equipment in order to reduce the communication load and the corresponding delay. However, the issue of privacy protection remains challenging, since data owners may not trust a third-party edge server with their private information<sup>[<a href="#B4" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B4">4</a>]</sup>. FRL offers a potential solution for achieving privacy-protected intelligent edge computing, especially in decision-making tasks like caching and offloading. Additionally, the multi-layer processing architecture of edge computing is also suitable for implementing FRL through the server-client model. 
Therefore, many researchers have focused on applying FRL to edge computing.</p><p class="">The distributed data of large-scale edge computing architectures makes it possible for FRL to provide distributed intelligent solutions to achieve resource optimization at the edge. For mobile edge networks, a potential FRL framework for edge systems, named “In-Edge AI”, is presented<sup>[<a href="#B88" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B88">88</a>]</sup> to address the optimization of mobile edge computing, caching, and communication problems. The authors also propose some ideas and paradigms for solving these problems by using DRL and Distributed DRL. To carry out dynamic system-level optimization and reduce the unnecessary transmission load, the “In-Edge AI” framework takes advantage of the collaboration among edge nodes to exchange learning parameters for better training and inference of models. The evaluation shows that the framework achieves high performance with relatively low learning overhead, while the mobile communication system is cognitive and adaptive to the environment. The paper provides good prospects for the application of FRL to edge computing, but there are still many challenges to overcome, including the adaptive improvement of the algorithm, and the training time of the model from scratch <i>etc.</i></p><p class="">Edge caching has been considered a promising technique for edge computing to meet the growing demands for next-generation mobile networks and beyond. Addressing the adaptability and collaboration challenges of the dynamic network environment, Wang <i>et al.</i><sup>[<a href="#B89" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B89">89</a>]</sup> propose a device-to-device (D2D)-assisted heterogeneous collaborative edge caching framework. User equipment (UE) in a mobile network uses the local DQN model to make node selection and cache replacement decisions based on network status and historical information. 
In other words, each UE decides where to fetch content and which content should be replaced in its cache list. The BS calculates aggregation weights based on the training evaluation indicators from the UEs. To solve the long-term mixed-integer linear programming problem, an attention-weighted federated deep reinforcement learning (AWFDRL) framework is presented, which optimizes the aggregation weights to avoid the imbalance of the local model quality and improve the learning efficiency of the DQN. The convergence of the proposed algorithm is verified and simulation results show that the AWFDRL framework can perform well on average delay, hit rate, and offload traffic.</p><p class="">A federated solution for cooperative edge caching management in fog radio access networks (F-RANs) is proposed<sup>[<a href="#B90" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B90">90</a>]</sup>. Both edge computing and fog computing involve bringing intelligence and processing closer to where the data originates. The key difference between the two architectures is where the computing node is positioned. A dueling deep Q-network based cooperative edge caching method is proposed to overcome the curse of dimensionality in the RL problem and improve caching performance. Agents are deployed in fog access points (F-APs) and build local caching models for optimal caching decisions based on user content requests and content popularity. HFRL is applied to aggregate the local models into a global model in the cloud server. 
The proposed method outperforms three classical content caching methods and two RL algorithms in terms of reducing content request delays and increasing cache hit rates.</p><p class="">For edge-enabled IoT, Majidi <i>et al.</i><sup>[<a href="#B91" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B91">91</a>]</sup> propose a dynamic cooperative caching method based on hierarchical federated deep reinforcement learning (HFDRL), which is used to determine which content should be cached or evicted by predicting future user requests. Edge devices that have a strong relationship are grouped into a cluster and one head is selected for this cluster. The BS trains the Q-value based local model by using BS states, content states, and request states. The head has enough processing and caching capabilities to deal with model aggregation in the cluster. By categorizing edge devices hierarchically, HFDRL improves the response delay and keeps both small and large clusters from experiencing the disadvantages they could otherwise encounter. Storage partitioning allows content to be stored in clusters at different levels using the storage space of each device. Simulation results on the MovieLens datasets show that the proposed method improves the average content access delay and the hit rate.</p><p class="">Considering the low latency requirements and privacy protection issues of the IoV, the study of efficient and secure caching methods has attracted many researchers. An FRL-empowered task caching problem in the IoV has been analyzed by Zhao <i>et al.</i><sup>[<a href="#B92" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B92">92</a>]</sup>. The work proposes a novel cooperative caching algorithm (CoCaRL) for vehicular networks with multi-level FRL to dynamically determine which contents should be replaced and where the content requests should be served. 
This paper develops a two-level aggregation mechanism for federated learning to speed up convergence and reduce communication overhead, while DRL is employed to optimize the cooperative caching policy between RSUs of vehicular networks. Simulation results show that the proposed algorithm can achieve a high hit rate, good adaptability and fast convergence in a complex environment.</p><p class="">Apart from caching services, FRL has demonstrated its strong ability to facilitate resource allocation in edge computing. In the study from Zhu <i>et al.</i><sup>[<a href="#B93" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B93">93</a>]</sup>, the authors specifically focus on the data offloading task for mobile edge computing (MEC) systems. To achieve joint collaboration, the heterogeneous multi-agent actor-critic (H-MAAC) framework is proposed, in which edge devices independently learn the interactive strategies through their own observations. The problem is formulated as a multi-agent MDP for modeling edge devices’ data allocation strategies, <i>i.e.</i>, moving the data, executing locally, or offloading to a cloud server. The corresponding joint cooperation algorithm that combines the edge federated model with the multi-agent actor-critic RL is also presented. Dual lightweight neural networks are built, comprising original actor/critic networks and target actor/critic networks.</p><p class="">Blockchain technology has also attracted a lot of attention from researchers in edge computing since it is able to provide reliable data management across massive numbers of distributed edge nodes. In the study from Yu <i>et al.</i><sup>[<a href="#B94" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B94">94</a>]</sup>, the intelligent ultra-dense edge computing (I-UDEC) framework is proposed, integrating blockchain and RL technologies into 5G ultra-dense edge computing networks. 
In order to achieve low-overhead computation offloading decisions and resource allocation strategies, the authors design a two-timescale deep reinforcement learning (2TS-DRL) approach, which consists of a fast-timescale and a slow-timescale learning process. The target model can be trained in a distributed manner via an FL architecture, protecting the privacy of edge devices.</p><p class="">Additionally, to deal with the different types of optimization tasks, variants of FRL are being studied. Zhu <i>et al.</i><sup>[<a href="#B95" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B95">95</a>]</sup> present a resource allocation method for edge computing systems, called concurrent federated reinforcement learning (CFRL). The edge node continuously receives tasks from serviced IoT devices and stores those tasks in a queue. Depending on its own resource allocation status, the node determines the scheduling strategy so that all tasks are completed as soon as possible. If the edge host does not have enough available resources for a task, the task can be offloaded to the server. Unlike the central server in basic FRL, the central server in CFRL aims to complete the tasks that the edge nodes cannot handle instead of aggregating local models. Therefore, the server needs to train a special resource allocation model based on its own resource status, forwarded tasks and unique rewards. The main idea of CFRL is that edge nodes and the server cooperatively participate in all task processing in order to reduce total computing time and provide a degree of privacy protection.</p></div><div id="sec2-12" class="article-Section"><h3 >5.2. FRL for communication networks</h3><p class="">In parallel with the continuous evolution of communication technology, a number of heterogeneous communication systems are also being developed to adapt to different scenarios. 
Many researchers are also working toward intelligent management of communication systems. The traditional ML-based management methods are often inefficient due to their centralized data processing architecture and the risk of privacy leakage<sup>[<a href="#B5" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B5">5</a>]</sup>. FRL can play an important role in service slicing and access control, replacing centralized ML methods.</p><p class="">In communication network services, network function virtualization (NFV) is a critical component of achieving scalability and flexibility. Huang <i>et al.</i><sup>[<a href="#B96" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B96">96</a>]</sup> propose a novel scalable service function chain (SFC) orchestration (SSCO) scheme for NFV-enabled networks via FRL. In the work, a federated-learning-based framework for training a global model, along with a time-variant local model exploration, is designed for scalable SFC orchestration. It avoids data sharing among stakeholders and enables quick convergence of the global model. To reduce communication costs, SSCO allows the parameters of local models to be updated just at the beginning and end of each episode through distributed clients and the cloud server. A DRL approach is used to map virtual network functions (VNFs) into networks with local knowledge of resources and instantiation cost. In addition, the authors propose a loss-weight-based mechanism for the generation and exploitation of reference samples for training in replay buffers, avoiding strong correlation between samples. Simulation results demonstrate that SSCO can significantly reduce placement errors and improve resource utilization ratios when placing time-variant VNFs, as well as achieving desirable scalability.</p><p class="">Network slicing (NS) is also a form of virtual network architecture to support divergent requirements sustainably. 
The work from Liu <i>et al.</i><sup>[<a href="#B97" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B97">97</a>]</sup> proposes a device association scheme (such as access control and handover management) for radio access network (RAN) slicing by exploiting a hybrid federated deep reinforcement learning (HDRL) framework. In view of the large state-action space and variety of services, HDRL is designed with two layers of model aggregation. Horizontal aggregation deployed on BSs is used for the same type of service. Generally, data samples collected by different devices within the same service have similar features. A discrete-action DRL algorithm, <i>i.e.</i>, DDQN, is employed to train the local model on individual smart devices. The BS is able to aggregate model parameters and establish a cooperative global model. Vertical aggregation, deployed on an encrypted third party, is responsible for services of different types. In order to promote collaboration between devices with different tasks, the authors aggregate local access features to form a global access feature, in which the data from different flows is strongly correlated since different data flows are competing for radio resources with each other. Furthermore, the Shapley value<sup>[<a href="#B98" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B98">98</a>]</sup>, which represents the average marginal contribution of a specific feature across all possible feature combinations, is used to reduce communication cost in vertical aggregation based on the global access feature. Simulation results show that HDRL can improve network throughput and communication efficiency.</p><p class="">The open radio access network (O-RAN) has emerged as a paradigm for supporting multi-class wireless services in 5G and beyond networks. 
To deal with the two critical issues of load balancing and handover control, Cao <i>et al.</i><sup>[<a href="#B99" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B99">99</a>]</sup> propose a federated DRL-based scheme to train the model for user access control in the O-RAN. Due to the mobility of UEs and the high cost of the handover between BSs, it is necessary for each UE to access the appropriate BS to optimize its throughput performance. As independent agents, UEs make access decisions with assistance from a global model server, which updates global DQN parameters by averaging DQN parameters of selected UEs. Further, the scheme exchanges only part of the DQN parameters to reduce communication overheads, and uses the dueling structure to allow convergence for independent agents. Simulation results demonstrate that the scheme increases long-term throughput while avoiding frequent handovers of users with limited signaling overheads.</p><p class="">The issue of optimizing user access is important in wireless communication systems. FRL can provide interesting solutions for enabling efficient and privacy-enhanced management of access control. Zhang <i>et al.</i><sup>[<a href="#B100" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B100">100</a>]</sup> study the problem of multi-user access in Wi-Fi networks. In order to mitigate collision events on channel access, an enhanced multiple access mechanism based on FRL is proposed for user-dense scenarios. In particular, distributed stations train their local Q-learning networks using the channel state, access history and feedback from the central access point (AP). The AP uses a central aggregation algorithm to update the global model periodically and broadcasts it to all stations. 
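</p><p class="">The aggregate-and-broadcast round described above, in which a coordinator averages the parameters reported by a subset of clients and then pushes the result to every client, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the cited works: the function names <code>average_selected</code> and <code>broadcast</code>, the flat-list parameter layout, and the station identifiers are all hypothetical.</p>

```python
# Minimal sketch with assumed names and data layout (not from the cited papers).
# Each client holds its model parameters as a flat list of floats.

def average_selected(local_params, selected_ids):
    """Element-wise mean of the parameter vectors of the selected clients."""
    if not selected_ids:
        raise ValueError("at least one client must be selected")
    chosen = [local_params[cid] for cid in selected_ids]
    dim = len(chosen[0])
    if any(len(p) != dim for p in chosen):
        raise ValueError("all parameter vectors must have the same length")
    return [sum(p[i] for p in chosen) / len(chosen) for i in range(dim)]


def broadcast(local_params, global_params):
    """Give every client its own copy of the global parameters."""
    return {cid: list(global_params) for cid in local_params}


# Hypothetical round: three stations, two of which report in this period.
local_params = {
    "sta_a": [1.0, 2.0],
    "sta_b": [3.0, 4.0],
    "sta_c": [10.0, 10.0],
}
global_params = average_selected(local_params, ["sta_a", "sta_b"])
local_params = broadcast(local_params, global_params)
print(global_params)  # [2.0, 3.0]
```

<p class="">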
In addition, a Monte Carlo (MC) reward estimation method for the training phase of the local model is introduced, which allocates more weight to the reward of the current state by reducing the previous cumulative reward.</p><p class="">FRL is also studied for intelligent cyber-physical systems (ICPS), which aim to meet the requirements of intelligent applications for high-precision, low-latency analysis of big data. In light of the heterogeneity brought by multiple agents, the central RL-based resource allocation scheme suffers from non-stationarity and does not consider privacy. Therefore, the work from Xu <i>et al.</i><sup>[<a href="#B101" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B101">101</a>]</sup> proposes a multi-agent FRL (MA-FRL) mechanism which synthesizes a good inferential global policy from encrypted local policies of agents without revealing private information. The data resource allocation and secure communication problems are formulated as a Stackelberg game with multiple participants, including near devices (NDs), far devices (FDs) and relay devices (RDs). Taking into account the limited scope of the heterogeneous devices, the authors model this multi-agent system as a POMDP. Furthermore, the authors prove that MA-FRL is <i>µ</i>-strongly convex and <i>β</i>-smooth and derive its convergence speed in expectation.</p><p class="">Zhang <i>et al.</i><sup>[<a href="#B102" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B102">102</a>]</sup> address the challenges in cellular vehicle-to-everything (V2X) communication for future vehicular applications. A joint optimization problem of selecting the transmission mode and allocating the resources is presented. This paper proposes a decentralized DRL algorithm for maximizing the amount of available vehicle-to-infrastructure capacity while meeting the latency and reliability requirements of vehicle-to-vehicle (V2V) pairs. 
Considering the limited local training data at vehicles, the federated learning algorithm is conducted on a small timescale. On the other hand, the graph theory-based vehicle clustering algorithm is conducted on a large timescale.</p><p class="">The development of communication technologies for extreme environments, including deep underwater exploration, is important. The architecture and philosophy of FRL are applied to smart ocean applications in the study of Kwon<sup>[<a href="#B103" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B103">103</a>]</sup>. To deal with the nonstationary environment and unreliable channels of underwater wireless networks, the authors propose a multi-agent DRL-based algorithm that can realize FL computation with internet-of-underwater-things (IoUT) devices in the ocean environment. The cooperative model is trained by MADDPG for cell association and resource allocation problems. As for downlink throughput, it is found that the proposed MADDPG-based algorithm performed 80% and 41% better than the standard actor-critic and DDPG algorithms, respectively.</p></div><div id="sec2-13" class="article-Section"><h3 >5.3. FRL for control optimization</h3><p class="">Reinforcement learning based control schemes are considered one of the most effective ways to learn a nonlinear control strategy in complex scenarios, such as robotics. An individual agent’s exploration of the environment is limited by its own field of vision, and the agent usually needs a great deal of training to obtain the optimal strategy. The FRL-based approach has emerged as an appealing way to realize control optimization without exposing agent data or compromising privacy.</p><p class="">Automated control of robots is a typical example of control optimization problems. 
Liu <i>et al.</i><sup>[<a href="#B57" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B57">57</a>]</sup> discuss robot navigation scenarios and focus on how to make robots transfer their experience so that they can make use of prior knowledge and quickly adapt to changing environments. As a solution, a cooperative learning architecture, called LFRL, is proposed for navigation in cloud robotic systems. Under the FRL-based architecture, the authors propose a corresponding knowledge fusion algorithm to upgrade the shared model deployed on the cloud. In addition, the paper also discusses the problems and feasibility of applying transfer learning algorithms to different tasks and network structures between the shared model and the local model.</p><p class="">FRL is combined with autonomous driving of robotic vehicles in the study of Liang <i>et al.</i><sup>[<a href="#B104" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B104">104</a>]</sup>. To enable rapid transfer of training from a simulation environment to a real-world environment, Liang <i>et al.</i><sup>[<a href="#B104" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B104">104</a>]</sup> present a federated transfer reinforcement learning (FTRL) framework for knowledge extraction where all the vehicles make corresponding actions with the knowledge learned by others. The framework can potentially be used to train more powerful models by pooling the resources of multiple entities without revealing raw data information in real-life scenarios. To evaluate the feasibility of the proposed framework, the authors perform real-life experiments on steering control tasks for collision avoidance of autonomous driving robotic cars and demonstrate that the framework outperforms the non-federated local training process. 
Note that the framework can be considered an extension of HFRL, because the target tasks to be accomplished are highly related and all observation data are pre-aligned.</p><p class="">FRL also appears as an attractive approach for enabling intelligent control of IoT devices without revealing private information. Lim <i>et al.</i><sup>[<a href="#B105" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B105">105</a>]</sup> propose an FRL architecture which allows agents working on independent IoT devices to share their learning experiences with each other, and transfer the policy model parameters to other agents. The aim is to effectively control multiple IoT devices of the same type but with slightly different dynamics. Whenever an agent meets the predefined criteria, its mature model will be shared by the server with all other agents in training. The agents continue training based on the shared model until the local model converges in the respective environment. The actor-critic proximal policy optimization (Actor-Critic PPO) algorithm is integrated into the control of multiple rotary inverted pendulum (RIP) devices. The results show that the proposed architecture facilitates the learning process and that the learning speed improves as more agents participate. In addition, Lim <i>et al.</i><sup>[<a href="#B106" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B106">106</a>]</sup> use an FRL architecture based on a multi-agent environment to address the problems and limitations of RL in real-world applications. The proposed federation policy allows multiple agents to share their learning experiences to get better learning efficacy. The proposed scheme adopts the Actor-Critic PPO algorithm for four types of RL simulation environments from OpenAI Gym as well as RIP in real control systems. 
Compared to a previous real-environment study, the scheme enhances learning performance by approximately 1.2 times.</p></div><div id="sec2-14" class="article-Section"><h3 >5.4. FRL for attack detection</h3><p class="">With the heterogeneity of services and the sophistication of threats, it is challenging to detect attacks using traditional methods or centralized ML-based methods, which have a high false alarm rate and do not take privacy into account. FRL offers a powerful alternative for detecting attacks and provides support for network defense in different scenarios.</p><p class="">Because of various constraints, IoT applications have become a primary target for malicious adversaries that can disrupt normal operations or steal confidential information. In order to address the security issues in flying ad-hoc networks (FANETs), Mowla <i>et al.</i><sup>[<a href="#B107" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B107">107</a>]</sup> propose an adaptive FRL-based jamming attack defense strategy for unmanned aerial vehicles (UAVs). A model-free Q-learning mechanism is developed and deployed on distributed UAVs to cooperatively learn detection models for jamming attacks. According to the results, the average accuracy of the federated jamming detection mechanism, employed in the proposed defense strategy, is 39.9% higher than that of the distributed mechanism when verified with the CRAWDAD standard and the ns-3 simulated FANET jamming attack dataset.</p><p class="">An efficient traffic monitoring framework, known as DeepMonitor, is presented in the study of Nguyen <i>et al.</i><sup>[<a href="#B108" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B108">108</a>]</sup> to provide fine-grained traffic analysis capability at the edge of software defined network (SDN) based IoT networks. 
The agents deployed in edge nodes consider the different granularity-level requirements and their maximum flow-table capacity to achieve the optimal flow rule match-field strategy. The control optimization problem is formulated as an MDP and a federated DDQN algorithm is developed to improve the learning performance of agents. The results show that the proposed monitoring framework remains reliable at all levels of traffic granularity and substantially mitigates the issue of flow-table overflows. In addition, the distributed denial of service (DDoS) attack detection performance of an intrusion detection system can be enhanced by up to 22.83% by using DeepMonitor instead of FlowStat.</p><p class="">In order to reduce manufacturing costs and improve production efficiency, the industrial internet of things (IIoT) is proposed as a potentially promising research direction. It is a challenge to implement anomaly detection mechanisms in IIoT applications with data privacy protection. Wang <i>et al.</i><sup>[<a href="#B109" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B109">109</a>]</sup> propose a reliable anomaly detection strategy for IIoT using FRL techniques. In the system framework, there are four entities involved in establishing the detection model, <i>i.e.</i>, the Global Anomaly Detection Center (GADC), the Local Anomaly Detection Center (LADC), the Regional Anomaly Detection Center (RADC), and the users. Anomaly detection is implemented in two phases: anomaly detection for RADCs and anomaly detection for users. In particular, the GADC can build global RADC anomaly detection models based on local models trained by LADCs. Different from RADC anomaly detection based on action deviations, user anomaly detection is mainly concerned with privacy leakage and is performed by the RADC and GADC. 
Note that the DDPG algorithm is applied for local anomaly detection model training.</p></div><div id="sec2-15" class="article-Section"><h3 >5.5. FRL for other applications</h3><p class="">Owing to its outstanding training efficiency and privacy protection, many researchers are exploring the possible applications of FRL.</p><p class="">FL has been applied to realize distributed energy management in IoT applications. In smart homes, smart meters are deployed in the advanced metering infrastructure (AMI) to monitor and analyze the energy consumption of users in real time. As an example<sup>[<a href="#B110" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B110">110</a>]</sup>, an FRL-based approach is proposed for the energy management of multiple smart homes with solar PVs, home appliances, and energy storage. Multiple local home energy management systems (LHEMSs) and a global server (GS) make up the FRL architecture of the smart home. DRL agents for LHEMSs construct local models using energy consumption data and upload them to the GS. The GS updates the global model based on local models of LHEMSs using the federated stochastic gradient descent (FedSGD) algorithm. Under heterogeneous home environments, simulation results indicate that the proposed approach outperforms others when it comes to convergence speed, appliance energy consumption, and the number of agents.</p><p class="">Moreover, FRL offers an alternative way to share information with low latency and privacy preservation. The collaborative perception of vehicles provided by IoV can greatly enhance the ability to sense things beyond their line of sight, which is important for autonomous driving. Region quadtrees have been proposed as a storage and communication resource-saving solution for sharing perception information<sup>[<a href="#B111" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B111">111</a>]</sup>. 
It is challenging to tailor the number and resolution of transmitted quadtree blocks to bandwidth availability. In the framework of FRL, Mohamed <i>et al.</i><sup>[<a href="#B112" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B112">112</a>]</sup> present a quadtree-based point cloud compression mechanism to select cooperative perception messages. Specifically, at regular intervals, each vehicle covered by an RSU transfers its latest network weights to the RSU, which then averages all of the received model parameters and broadcasts the result back to the vehicles. The main objectives of a vehicle are to select the optimal sensory information to transmit (<i>i.e.</i>, quadtree blocks) and the appropriate resolution levels for a given vehicle pair. The dueling and branching concepts are also applied to overcome the vast action space inherent in the formulation of the RL problem. Simulation results show that the learned policies achieve higher vehicular satisfaction and the training process is enhanced by FRL.</p></div><div id="sec2-16" class="article-Section"><h3 >5.6. Lessons learned</h3><p class="">In the following, we summarize the major lessons learned from this survey in order to provide a comprehensive understanding of current research on FRL applications.</p><div id="sec3-4" class="article-Section"><h4 >5.6.1. Lessons learned from the aggregation algorithms</h4><p class="">The existing FRL literature usually uses classical DRL algorithms, such as DQN and DDPG, at the participants, while the gradients or parameters of the critic and/or actor networks are periodically reported synchronously or asynchronously by the participants to the coordinator. The coordinator then aggregates the parameters or gradients and sends the updated values to the participants. In order to meet the challenges presented by different scenarios, the aggregation algorithms have been designed as a key feature of FRL. 
In the original FedAvg algorithm<sup>[<a href="#B12" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B12">12</a>]</sup>, the number of samples in a participant’s dataset determines its influence on the global model. In accordance with this idea, several papers propose different methods to calculate the weights in the aggregation algorithms according to the requirements of the application. In the study from Lim <i>et al.</i><sup>[<a href="#B106" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B106">106</a>]</sup>, the aggregation weight is derived from the average of the cumulative rewards of the last ten episodes. Greater weights are placed on the models of those participants with higher rewards. In contrast to this positive correlation with reward, Huang <i>et al.</i><sup>[<a href="#B96" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B96">96</a>]</sup> take the error rate of actions as an essential factor to assign weights for participating in the global model training. In D2D-assisted edge caching, Wang <i>et al.</i><sup>[<a href="#B89" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B89">89</a>]</sup> use the reward and some device-related indicators as the measurement to evaluate the local model’s contribution to the global model. Moreover, the existing FRL methods based on off-policy DRL algorithms, such as DQN and DDPG, usually use experience replay. Sampling a random batch from replay memory breaks the correlations between consecutive transition tuples and accelerates the training process. 
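</p><p class="">The weighting schemes above differ mainly in how each participant's score is obtained; the aggregation itself can be written as a normalized weighted sum. The Python sketch below illustrates this under stated assumptions and is not the code of any cited paper: the helper names <code>normalize</code>, <code>weighted_aggregate</code> and <code>mean_recent_return</code> are hypothetical, parameters are flat lists, weights are taken to be directly proportional to the score, and scores are assumed to be non-negative.</p>

```python
# Minimal sketch with assumed names; not the cited authors' code.

def normalize(scores):
    """Turn non-negative scores into weights that sum to one."""
    total = sum(scores)
    if not (min(scores) >= 0 and total > 0):
        raise ValueError("scores must be non-negative with a positive sum")
    return [s / total for s in scores]


def weighted_aggregate(models, weights):
    """Weighted element-wise sum of equally sized parameter vectors."""
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]


def mean_recent_return(returns, window=10):
    """Average cumulative reward over the most recent `window` episodes."""
    recent = returns[-window:]
    return sum(recent) / len(recent)


# Hypothetical participants: parameters, dataset sizes, per-episode returns.
models = [[0.0, 0.0], [4.0, 8.0]]
sample_counts = [100, 300]
episode_returns = [[2.0] * 10, [2.0] * 10]

# FedAvg-style weights: proportional to the number of local samples.
by_samples = weighted_aggregate(models, normalize(sample_counts))

# Reward-based weights: proportional to the mean of the last ten returns.
scores = [mean_recent_return(r) for r in episode_returns]
by_reward = weighted_aggregate(models, normalize(scores))

print(by_samples)  # [3.0, 6.0]
print(by_reward)   # [2.0, 4.0]
```

<p class="">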
To arrive at an accurate evaluation of the participants, the paper<sup>[<a href="#B102" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B102">102</a>]</sup> calculates the aggregation weight based on the size of the training batch in each iteration.</p><p class="">The above aggregation methods can effectively deal with the issues of data imbalance and performance discrepancy between participants, but it is hard for participants to cope with subtle environmental differences. According to the paper<sup>[<a href="#B105" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B105">105</a>]</sup>, as soon as a participant reaches the predefined criteria in its own environment, it should stop learning and send its model parameters as a reference to the remaining individuals. Exchanging mature network models (satisfying terminal conditions) can help other participants complete their training quickly. Participants in other similar environments can continue to use FRL for further updating their parameters to achieve the desired model performance according to their individual environments. Liu <i>et al.</i><sup>[<a href="#B57" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B57">57</a>]</sup> also suggest that the shared global model in the cloud is not the final policy model for local participants. An effective transfer learning method should be applied to resolve the structural differences between the shared network and the private network.</p></div><div id="sec3-5" class="article-Section"><h4 >5.6.2. Lessons learned from the relationship between FL and RL</h4><p class="">In most of the literature on FRL, FL is used to improve the performance of RL. With FL, the learning experience can be shared among multiple decentralized parties while ensuring privacy and scalability without requiring direct data offloading to servers or third parties. Therefore, FL can expand the scope and enhance the security of RL. 
Among the applications of FRL, most researchers focus on communication network systems due to their stringent security requirements, advanced distributed architecture, and variety of decision-making tasks. FRL enables data offloading<sup>[<a href="#B93" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B93">93</a>]</sup> and caching<sup>[<a href="#B89" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B89">89</a>]</sup> solutions powered by distributed AI. In addition, with the ability to detect a wide range of attacks and support defense solutions, FRL has emerged as a strong alternative for performing distributed learning in security-sensitive scenarios. Enabled by the privacy-enhancing and cooperative features, detection and defense solutions can be learned quickly where multiple participants join to build a federated model<sup>[<a href="#B107" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B107">107</a>,<a href="#B109" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B109">109</a>]</sup>. FRL can also provide viable solutions to realize intelligence for control systems in many applied domains such as robotics<sup>[<a href="#B57" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B57">57</a>]</sup> and autonomous driving<sup>[<a href="#B104" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B104">104</a>]</sup> without data exchange and privacy leakage. The data owners (robots or vehicles) may not trust the third-party server and therefore hesitate to upload their private information to potentially insecure learning systems. Each participant of FRL runs a separate RL model for determining its own control policy and gains experience by sharing model parameters, gradients or losses.</p><p class="">Meanwhile, RL may have the potential to optimize FL schemes and improve the efficiency of training. 
Due to unstable network connectivity, it is not practical for FL to update and aggregate models simultaneously across all participants. Therefore, Wang <i>et al.</i><sup>[<a href="#B113" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B113">113</a>]</sup> propose an RL-based control framework that intelligently chooses the participants for each round of FL with the aim of speeding up convergence. Similarly, Zhang <i>et al.</i><sup>[<a href="#B114" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B114">114</a>]</sup> apply RL to pre-select a set of candidate edge participants, and then determine reliable edge participants through social attribute perception. In IoT or IoV scenarios, due to the heterogeneous nature of participating devices, different computing and communication resources are available to them. RL can speed up training by coordinating the allocation of resources between participants. Zhan <i>et al.</i><sup>[<a href="#B115" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B115">115</a>]</sup> define the L4L (Learning for Learning) concept, <i>i.e.</i>, using RL to improve FL. Taking into account the heterogeneity of participants and dynamic network connections, the work investigates a computational resource control problem for FL that simultaneously considers learning time and energy efficiency. An experience-driven resource control approach based on RL is presented to derive the near-optimal strategy with only the participants’ bandwidth information in the previous training rounds. In addition, as with any other ML algorithm, FL algorithms are vulnerable to malicious attacks. RL has been studied to defend against attacks in various scenarios, and it can also enhance the security of FL. The paper<sup>[<a href="#B116" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B116">116</a>]</sup> proposes a reputation-aware RL (RA-RL) based selection method to ensure that FL is not disrupted. 
The participating devices’ attributes, including computing resources and trust values, <i>etc</i>, are used as part of the environment in RL. In the aggregation of the global model, devices with high reputation levels will have a greater chance of being considered to reduce the effects of malicious devices mixed into FL.</p></div><div id="sec3-6" class="article-Section"><h4 >5.6.3. Lessons learned from categories of FRL</h4><p class="">As discussed above, FRL can be divided into two main categories, <i>i.e.</i>, HFRL and VFRL. Currently, most of the existing research is focused on HFRL, while little attention is devoted to VFRL. The reason for this is that HFRL has obvious application scenarios, where multiple participants have similar decision-making tasks with individual environments, such as caching allocation<sup>[<a href="#B59" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B59">59</a>]</sup>, offloading optimization<sup>[<a href="#B58" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B58">58</a>]</sup>, and attack monitoring<sup>[<a href="#B108" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B108">108</a>]</sup>. The participants and coordinator only need to train a similar model with the same state and action spaces. Consequently, the algorithm design can be implemented and the training convergence can be verified relatively easily. On the other hand, even though VFRL has a higher degree of technical difficulty at the algorithm design level, it also has a wide range of possible applications. In a multi-agent scenario, for example, a single agent is limited by its ability to observe only part of the environment, whereas the transition of the environment is determined by the behavior of all the agents. 
Zhuo <i>et al.</i><sup>[<a href="#B65" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B65">65</a>]</sup> assumes agents cannot share their partial observations of the environment and some agents are unable to receive rewards. The federated Q-network aggregation algorithm between two agents is proposed for VFRL. The paper<sup>[<a href="#B97" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B97">97</a>]</sup> specifically applies both HFRL and VFRL for radio access network slicing. For the same type of services, similar data samples are trained locally at participating devices, and BSs perform horizontal aggregation to integrate a cooperative access model by adopting an iterative approach. The terminal device also can optimize the selection of base stations and network slices based on the global model of VFRL, which aggregates access features generated by different types of services on the third encrypted party. The method improves the device’s ability to select the appropriate access points when initiating different types of service requests under restrictions regarding privacy protection. The feasible implementation of VFRL also provides guidance for future research.</p></div></div></div><div id="sec1-6" class="article-Section"><h2 >6. Open issues and future research directions</h2><p class="">As we presented in the previous section, FRL serves an increasingly important role as an enabler of various applications. While the FRL-based approach possesses many advantages, there are a number of critical open issues to consider for future implementation. Therefore, this section focuses on several key challenges, including those inherited from FL such as security and communication issues, as well as those unique to FRL. Research on tackling these issues offers interesting directions for the future.</p><div id="sec2-17" class="article-Section"><h3 >6.1. 
Learning convergence in HFRL</h3><p class="">In realistic HFRL scenarios, while the agents perform similar tasks, the inherent dynamics of the different environments in which the agents reside are usually not identically distributed. Slight differences in the stochastic properties of the transition models for multiple agents could cause learning convergence issues. One possible method to address this problem is to adjust the frequency of global aggregation, <i>i.e.</i>, after each global aggregation, a period of time is left for each agent to fine-tune its local parameters according to its own environment. Apart from the non-identical environment problem, another interesting and important problem is how to leverage FL to make RL algorithms converge better and faster. It is well known that DRL algorithms can be unstable and diverge, especially when off-policy training is combined with function approximation and bootstrapping. In FRL, the training curves of some agents could diverge while others converge, even though the agents are trained in exact replicas of the same environment. By leveraging FL, it is envisioned that we could expedite the training process as well as increase the stability. For example, we could selectively aggregate the parameters of a subset of agents with a larger potential for convergence, and later transfer the converged parameters to all the agents. To tackle the above problems, several solutions proposed for FL algorithms can serve as useful references. For example, server operators could account for heterogeneity inherent in partial information by adding a proximal term<sup>[<a href="#B117" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B117">117</a>]</sup>. The local updates submitted by agents are constrained by the tunable term and have a different effect on the global parameters. 
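</p><p class="">As a rough illustration of how a proximal term constrains local updates, the following minimal sketch runs gradient descent on a local loss augmented with a penalty of the form (mu / 2) * ||w - w_global||^2, which is one common way to write such a term. The function name <i>proximal_local_update</i>, the parameter names, and the toy quadratic local loss are assumptions made for this example only; they are not taken from the cited work.</p>
<pre>
```python
from typing import List


def proximal_local_update(
    w_global: List[float],
    local_target: List[float],
    mu: float,
    lr: float = 0.1,
    steps: int = 50,
) -> List[float]:
    """Gradient descent on a toy local loss plus a proximal penalty.

    Toy local loss (an assumption): 0.5 * sum((w - local_target)^2).
    Proximal penalty: (mu / 2) * sum((w - w_global)^2).
    """
    w = list(w_global)  # start from the latest global parameters
    for _ in range(steps):
        for i in range(len(w)):
            grad_local = w[i] - local_target[i]    # gradient of the toy local loss
            grad_prox = mu * (w[i] - w_global[i])  # pulls w back toward the global model
            w[i] -= lr * (grad_local + grad_prox)
    return w


if __name__ == "__main__":
    w_global = [0.0, 0.0]
    local_target = [2.0, -4.0]  # hypothetical optimum of one agent's environment
    for mu in (0.0, 1.0, 9.0):
        print(mu, proximal_local_update(w_global, local_target, mu))
```
</pre>
<p class="">For this toy loss, the minimizer of the penalized objective is (local_target + mu * w_global) / (1 + mu). With mu = 0 the update reduces to ordinary local training, while a larger mu keeps the agent's parameters closer to the global model and therefore limits how far a heterogeneous agent can drift between aggregations.</p><p class="">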
In addition, a probabilistic agent selection scheme can be implemented to select the agents whose local FL models have significant effects on the global model to minimize the FL convergence time and the FL training loss<sup>[<a href="#B118" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B118">118</a>]</sup>. Another problem is the theoretical analysis of the convergence bounds. Although some existing studies have been directed at this problem<sup>[<a href="#B119" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B119">119</a>]</sup>, convergence can be guaranteed only when the loss function is convex. How to analyze and evaluate the non-convex loss functions in HFRL is also an important topic for future research.</p></div><div id="sec2-18" class="article-Section"><h3 >6.2. Agents without rewards in VFRL</h3><p class="">In most existing works, all the RL agents have the ability to take part in full interaction with the environment and can generate their own actions and rewards. Even though some MARL agents may not participate in the policy decision, they still generate their own reward for evaluation. In some scenarios, special agents in VFRL take the role of providing assistance to other agents. They can only observe the environment and pass on the knowledge of their observation, so as to help other agents make more effective decisions. Therefore, such agents do not have their own actions and rewards. The traditional RL models cannot effectively deal with this thorny problem. Many algorithms either directly use the states of such agents as public knowledge in the system model or design corresponding actions and rewards for such agents, which may serve only for convenience of calculation and have no practical significance. These approaches cannot fundamentally overcome the challenge, especially when privacy protection is also an essential objective. 
Although the FedRL algorithm<sup>[<a href="#B65" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B65">65</a>]</sup> has been proposed to deal with the above problem and has demonstrated good performance, it still has some limitations. First of all, the number of agents used in experiments and algorithms is limited to two, which means the scalability of this algorithm is not high and VFRL algorithms for a large number of agents need to be designed. Secondly, this algorithm uses a Q-network as the federated model, which is a relatively simple model. Therefore, how to design VFRL models based on other more complex and changeable networks remains an open issue.</p></div><div id="sec2-19" class="article-Section"><h3 >6.3. Communications</h3><p class="">In FRL, the agents need to exchange the model parameters, gradients, intermediate results, etc., between themselves or with a central server. Due to the limited communication resources and battery capacity, the communication cost is an important consideration when implementing these applications. With an increased number of participants, the coordinator has to bear more network workload within the client-server FRL model<sup>[<a href="#B120" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B120">120</a>]</sup>. This is because each participant needs to upload and download model updates through the coordinator. Although the distributed peer-to-peer model does not require a central coordinator, each agent may have to exchange information with other participants more frequently. In current research for distributed models, there are no effective model exchange protocols to determine when to share experiences with which agents. In addition, DRL involves updating parameters in deep neural networks. 
Several popular DRL algorithms, such as DQN<sup>[<a href="#B121" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B121">121</a>]</sup> and DDPG<sup>[<a href="#B122" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B122">122</a>]</sup>, consist of multiple layers or multiple networks. Model updates can contain millions of parameters, which is not feasible for scenarios with limited communication resources. The research directions for the above issues can be divided into three categories. First, it is necessary to design a dynamic update mechanism for participants to optimize the number of model exchanges. A second research direction is to use model compression algorithms to reduce the amount of communication data. Finally, aggregation algorithms that allow participants to submit only the important parts of the local model should be studied further.</p></div><div id="sec2-20" class="article-Section"><h3 >6.4. Privacy and Security</h3><p class="">Although FL provides privacy protection that allows the agents to exchange information in a secure manner during the learning process, it still has several privacy and security vulnerabilities associated with communication and attack<sup>[<a href="#B123" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B123">123</a>]</sup>. As FRL is implemented based on FL algorithms, these problems also exist in FRL in the same or variant form. It is important to note that the data poisoning attack takes different forms in FL and FRL. In the existing classification tasks of FL, each piece of training data in the dataset corresponds to a respective label. The attacker flips the labels on training examples from one category to another while the features of the examples are kept unchanged, misguiding the establishment of a target model<sup>[<a href="#B124" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B124">124</a>]</sup>. 
However, in the decision-making task of FRL, the training data is continuously generated from the interaction between the agent and the environment. As a result, the data poisoning attack is implemented in another way. For example, the malicious agent tampers with the reward, which causes the evaluative function to shift. One option is to conduct regular safety assessments for all participants. Participants whose evaluation indicator falls below the threshold are punished to reduce the impact on the global model<sup>[<a href="#B125" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B125">125</a>]</sup>. Apart from the insider attacks which are launched by the agents in the FRL system, there may be various outsider attacks which are launched by intruders or eavesdroppers. Intruders may hide in the environment where the agent is and manipulate the transitions of the environment to achieve specific goals. In addition, by listening to the communication between the coordinator and the agent, the eavesdropper may infer sensitive information from the exchanged parameters and gradients<sup>[<a href="#B126" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B126">126</a>]</sup>. Therefore, the development of technology that detects and protects against attacks and privacy threats has great potential and is urgently needed.</p></div><div id="sec2-21" class="article-Section"><h3 >6.5. Join and exit mechanisms design</h3><p class="">One overlooked aspect of FRL-based research is the join and exit process of participants. In practice, the management of participants is essential to the normal progression of cooperation. As mentioned earlier in the security issue, the penetration of malicious participants severely impacts the performance of the cooperative model and the speed of training. The joining mechanism provides participants with the legal status to engage in federated cooperation. It is the first line of defense against malicious attackers. 
In contrast, the exit mechanism signifies the cancellation of the permission for cooperation. Participant-driven or enforced exit mechanisms are both possible. In particular, for synchronous algorithms, ignoring the exit mechanism can negatively impact learning efficiency. This is because the coordinator needs to wait for all participants to submit their information. In the event that any participant is offline or compromised and unable to upload, the time for one round of training will be increased indefinitely. To address the bottleneck, a few studies consider updating the global model using the selected models from a subset of participants <sup>[<a href="#B113" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B113">113</a>,<a href="#B127" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B127">127</a>]</sup>. Unfortunately, there is no comprehensive consideration of the exit mechanism, and the communication of participants is typically assumed to be reliable. Therefore, research gaps of FRL still exist in joining and exiting mechanisms. It is expected that the coordinator or monitoring system, upon discovering a failure, disconnection, or malicious participant, will use the exit mechanism to reduce its impact on the global model or even eliminate it.</p></div><div id="sec2-22" class="article-Section"><h3 >6.6. Incentive mechanisms</h3><p class="">For most studies, the agents taking part in the FRL process are assumed to be honest and voluntary. Each agent provides assistance for the establishment of the cooperation model following the rules and freely shares the masked experience through encrypted parameters or gradients. An agent’s motivation for participation may come from regulation or incentive mechanisms. The FRL process within an organization is usually governed by regulations. For example, BSs belonging to the same company establish a joint model for offloading and caching. 
Nevertheless, because participants may be members of different organizations or use disparate equipment, it is difficult for regulation to force all parties to share information learned from their own data in the same manner. If there are no regulatory measures, participants prone to selfish behavior will only benefit from the cooperation model but not submit local updates. Therefore, the cooperation of multiple parties, organizations, or individuals requires a fair and efficient incentive mechanism to encourage their active participation. In this way, agents providing more contributions can benefit more and selfish agents unwilling to share their learning experience will receive less benefit. As an example, Google Keyboard<sup>[<a href="#B128" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B128">128</a>]</sup> users can choose whether or not to allow Google to use their data, but if they do, they can benefit from more accurate word prediction. Although a context-aware incentive mechanism among data owners is proposed in the study from Yu <i>et al.</i><sup>[<a href="#B129" class="Link_style" data-jats-ref-type="bibr" data-jats-rid="B129">129</a>]</sup>, it is not suitable for RL problems. There is still no clear plan of action regarding how an FRL-based application can be designed to create a reasonable incentive mechanism for inspiring agents to participate in collaborative learning. To be successful, future research needs to propose a quantitative standard for evaluating the contribution of agents in FRL.</p></div><div id="sec2-23" class="article-Section"><h3 >6.7. Peer-to-peer cooperation</h3><p class="">FRL applications have the option of choosing between a central server-client model and a distributed peer-to-peer model. A distributed model can not only eliminate the single point of failure, but it can also improve energy efficiency significantly by allowing models to be exchanged directly between two agents. 
In a typical application, two adjacent cars share experience learned from the road environment in the form of models with D2D communications to assist autonomous driving. However, the distributed cooperation increases the complexity of the learning system and imposes stricter requirements for application scenarios. Future research on peer-to-peer cooperation should include, but not be limited to, the agent selection method for the exchange model, the mechanism for triggering the model exchange, the improvement of algorithm adaptability, and the convergence analysis of the aggregation algorithm.</p></div></div><div id="sec1-7" class="article-Section"><h2 >7. Conclusion</h2><p class="">As a new and promising branch of RL, FRL can make learning safer and more efficient while leveraging the benefits of FL. We have discussed the basic definitions of FL and RL as well as our thoughts on their integration in this paper. In general, FRL algorithms can be classified into two categories, <i>i.e.</i>, HFRL and VFRL. Thus, the definition and general framework of these two categories have been given. Specifically, we have highlighted the difference between HFRL and VFRL. Then, many existing FRL schemes have been summarized and analyzed according to different applications. Finally, the potential challenges in the development of FRL algorithms have been explored. 
Several open issues of FRL have been identified, which will encourage more efforts toward further research in FRL.</p></div><div class="article-Section article-declarations"><h2>Declarations</h2><h3>Authors’ contributions</h3><p>Made substantial contributions to the research and investigation process, reviewed and summarized the literature, wrote and edited the original draft: Qi J, Zhou Q</p><p>Performed oversight and leadership responsibility for the research activity planning and execution, as well as developed ideas and evolution of overarching research aims: Lei L</p><p>Performed critical review, commentary and revision, as well as provided administrative, technical, and material support: Zheng K</p><h3>Availability of data and materials</h3><p>Not applicable.</p><h3>Financial support and sponsorship</h3><p>This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada (Discovery Grant No. 401718) and the CARE-AI Seed Fund at the University of Guelph.</p><h3>Conflicts of interest</h3><p>The authors declared that there are no conflicts of interest.</p><h3>Ethical approval and consent to participate</h3><p>Not applicable.</p><h3>Consent for publication</h3><p>Not applicable.</p><h3>Copyright</h3><p>© The Author(s) 2021.</p></div></div> <!----> <div class="art_list" data-v-6dffe839></div> <div class="article_references" data-v-6dffe839><div class="ReferencesBox" data-v-6dffe839><h2 id="References" class="bg_d" data-v-6dffe839><span data-v-6dffe839><a href="/articles//reference" data-v-6dffe839>REFERENCES</a></span> <span class="icon" data-v-6dffe839><i class="el-icon-arrow-down" data-v-6dffe839></i> <i class="el-icon-arrow-up hidden" data-v-6dffe839></i></span></h2> <div class="references_list heightHide" data-v-6dffe839><div id="b1" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>1. </span> <span data-v-6dffe839>Nair A, Srinivasan P, Blackwell S, et al. 
Massively parallel methods for deep reinforcement learning. CoRR 2015;abs/1507.04296. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1507.04296">http://arxiv.org/abs/1507.04296.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b2" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>2. </span> <span data-v-6dffe839>Grounds M, Kudenko D. Parallel reinforcement learning with linear function approximation. In: Tuyls K, Nowe A, Guessoum Z, Kudenko D, editors. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. pp. 60-74.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1007/978-3-540-77949-0_5" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b3" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>3. </span> <span data-v-6dffe839>Clemente AV, Martínez HNC, Chandra A. Efficient parallel methods for deep reinforcement learning. CoRR 2017;abs/1705.04862. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1705.04862">http://arxiv.org/abs/1705.04862.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b4" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>4. </span> <span data-v-6dffe839>Lim WYB, Luong NC, Hoang DT, et al. Federated learning in mobile edge networks: a comprehensive survey. 
<i>IEEE Communications Surveys Tutorials</i> 2020;22:2031-63.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/COMST.2020.2986024" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b5" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>5. </span> <span data-v-6dffe839>Nguyen DC, Ding M, Pathirana PN, et al. Federated learning for internet of things: a comprehensive survey. <i>IEEE Communications Surveys Tutorials</i> 2021;23:1622-58.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/COMST.2021.3075439" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b6" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>6. </span> <span data-v-6dffe839>Khan LU, Saad W, Han Z, Hossain E, Hong CS. Federated learning for internet of things: recent advances, taxonomy, and open challenges. <i>IEEE Communications Surveys Tutorials</i> 2021;23:1759-99.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/COMST.2021.3090430" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b7" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>7. </span> <span data-v-6dffe839>Yang Q, Liu Y, Cheng Y, et al. 1st ed. Morgan & Claypool; 2019.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b8" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>8. 
</span> <span data-v-6dffe839>Yang Q, Liu Y, Chen T, Tong Y. Federated machine learning: concept and applications. <i>ACM Transactions on Intelligent Systems and Technology (TIST)</i> 2019;10:1-19.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1145/3298981" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b9" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>9. </span> <span data-v-6dffe839>Qinbin L, Zeyi W, Bingsheng H. Federated learning systems: vision, hype and reality for data privacy and protection. CoRR 2019;abs/1907.09693. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1907.09693">http://arxiv.org/abs/1907.09693.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b10" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>10. </span> <span data-v-6dffe839>Li T, Sahu AK, Talwalkar A, Smith V. Federated learning: challenges, methods, and future directions. <i>IEEE Signal Processing Magazine</i> 2020;37:50-60.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/MSP.2020.2975749" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b11" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>11. </span> <span data-v-6dffe839>Wang S, Tuor T, Salonidis T, Leung KK, Makaya C, et al. Adaptive federated learning in resource constrained edge computing systems. 
<i>IEEE Journal on Selected Areas in Communications</i> 2019;37:1205-21.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JSAC.2019.2904348" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b12" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>12. </span> <span data-v-6dffe839>McMahan HB, Moore E, Ramage D, y Arcas BA. Communication-efficient learning of deep networks from decentralized data. CoRR 2016;abs/1602.05629. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1602.05629">http://arxiv.org/abs/1602.05629.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b13" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>13. </span> <span data-v-6dffe839>Phong LT, Aono Y, Hayashi T, Wang L, Moriai S. Privacy-preserving deep learning via additively homomorphic encryption. <i>IEEE Transactions on Information Forensics and Security</i> 2018;13:1333-45.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TIFS.2017.2787987" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b14" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>14. </span> <span data-v-6dffe839>Zhu H, Jin Y. Multi-objective evolutionary federated learning. 
<i>IEEE Transactions on Neural Networks and Learning Systems</i> 2020;31:1310-22.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TNNLS.2019.2919699" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b15" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>15. </span> <span data-v-6dffe839>Kairouz P, McMahan HB, Avent B, et al. Advances and open problems in federated learning. CoRR 2019;abs/1912.04977. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1912.04977">http://arxiv.org/abs/1912.04977.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b16" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>16. </span> <span data-v-6dffe839>Pan SJ, Yang Q. A survey on transfer learning. <i>IEEE Transactions on Knowledge and Data Engineering</i> 2010;22:1345-59.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TKDE.2009.191" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b17" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>17. </span> <span data-v-6dffe839>Li Y. Deep reinforcement learning: an overview. CoRR 2017;abs/1701.07274. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1701.07274">http://arxiv.org/abs/1701.07274.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b18" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>18. 
</span> <span data-v-6dffe839>Xu Z, Tang J, Meng J, et al. Experience-driven networking: a deep reinforcement learning based approach. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE; 2018. pp. 1871-79.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/INFOCOM.2018.8485853" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b19" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>19. </span> <span data-v-6dffe839>Mohammadi M, Al-Fuqaha A, Guizani M, Oh JS. Semisupervised deep reinforcement learning in support of IoT and smart city services. <i>IEEE Internet of Things Journal</i> 2018;5:624-35.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2017.2712560" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b20" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>20. </span> <span data-v-6dffe839>Bu F, Wang X. A smart agriculture IoT system based on deep reinforcement learning. Future Generation Computer Systems 2019;99:500–507. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://www.sciencedirect.com/science/article/pii/S0167739X19307277">https://www.sciencedirect.com/science/article/pii/S0167739X19307277.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b21" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>21. </span> <span data-v-6dffe839>Xiong X, Zheng K, Lei L, Hou L. Resource allocation based on deep reinforcement learning in IoT edge computing. 
<i>IEEE Journal on Selected Areas in Communications</i> 2020;38:1133-46.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JSAC.2020.2986615" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b22" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>22. </span> <span data-v-6dffe839>Lei L, Qi J, Zheng K. Patent analytics based on feature vector space model: a case of IoT. <i>IEEE Access</i> 2019;7:45705-15.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ACCESS.2019.2909123" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b23" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>23. </span> <span data-v-6dffe839>Shalev-Shwartz S, Shammah S, Shashua A. Safe, multi-agent, reinforcement learning for autonomous driving. CoRR 2016;abs/1610.03295. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1610.03295">http://arxiv.org/abs/1610.03295.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b24" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>24. </span> <span data-v-6dffe839>Sallab AE, Abdou M, Perot E, Yogamani S. Deep reinforcement learning framework for autonomous driving. 
<i>Electronic Imaging</i> 2017;2017:70-76.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.2352/ISSN.2470-1173.2017.19.AVM-023" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b25" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>25. </span> <span data-v-6dffe839>Taylor ME. Teaching reinforcement learning with Mario: an argument and case study. In: Second AAAI Symposium on Educational Advances in Artificial Intelligence; 2011. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://www.aaai.org/ocs/index.php/EAAI/EAAI11/paper/viewPaper/3515">https://www.aaai.org/ocs/index.php/EAAI/EAAI11/paper/viewPaper/3515.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b26" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>26. </span> <span data-v-6dffe839>Holcomb SD, Porter WK, Ault SV, Mao G, Wang J. Overview on DeepMind and its AlphaGo Zero AI. In: Proceedings of the 2018 International Conference on Big Data and Education; 2018. pp. 67-71.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1145/3206157.3206174" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b27" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>27. </span> <span data-v-6dffe839>Watkins CJ, Dayan P. Q-learning. <i>Machine learning</i> 1992;8:279–92. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://link.springer.com/content/pdf/10.1007/BF00992698.pdf">https://link.springer.com/content/pdf/10.1007/BF00992698.pdf.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b28" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>28. </span> <span data-v-6dffe839>Thorpe TL. Vehicle traffic light control using SARSA. Citeseer; 1997. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://citeseer.ist.psu.edu/thorpe97vehicle.html">https://citeseer.ist.psu.edu/thorpe97vehicle.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b29" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>29. </span> <span data-v-6dffe839>Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Xing EP, Jebara T, editors. Proceedings of the 31st International Conference on Machine Learning. vol. 32 of Proceedings of Machine Learning Research. Beijing, China: PMLR; 2014. pp. 387–95. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v32/silver14.html">https://proceedings.mlr.press/v32/silver14.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b30" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>30. </span> <span data-v-6dffe839>Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. 
<i>Machine learning</i> 1992;8:229-56.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1007/BF00992696" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b31" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>31. </span> <span data-v-6dffe839>Konda VR, Tsitsiklis JN. Actor-critic algorithms. In: Advances in neural information processing systems; 2000. pp. 1008–14. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.neurips.cc/paper/1786-actor-critic-algorithms.pdf">https://proceedings.neurips.cc/paper/1786-actor-critic-algorithms.pdf</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b32" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>32. </span> <span data-v-6dffe839>Henderson P, Islam R, Bachman P, et al. Deep reinforcement learning that matters. In: Proceedings of the AAAI conference on artificial intelligence. vol. 32; 2018. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://ojs.aaai.org/index.php/AAAI/article/view/11694">https://ojs.aaai.org/index.php/AAAI/article/view/11694.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b33" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>33. </span> <span data-v-6dffe839>Lei L, Tan Y, Dahlenburg G, Xiang W, Zheng K. Dynamic energy dispatch based on Deep Reinforcement Learning in IoT-Driven Smart Isolated Microgrids. 
<i>IEEE Internet of Things Journal</i> 2021;8:7938-53.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2020.3042007" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b34" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>34. </span> <span data-v-6dffe839>Lei L, Xu H, Xiong X, Zheng K, Xiang W, et al. Multiuser resource control with deep reinforcement learning in IoT edge computing. <i>IEEE Internet of Things Journal</i> 2019;6:10119-33.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2019.2935543" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b35" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>35. </span> <span data-v-6dffe839>Ohnishi S, Uchibe E, Yamaguchi Y, Nakanishi K, Yasui Y, et al. Constrained deep q-learning gradually approaching ordinary q-learning. 
<i>Frontiers in Neurorobotics</i> 2019;13:103.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.3389/fnbot.2019.00103" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <a href="http://www.ncbi.nlm.nih.gov/pubmed/31920613" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>PubMed</span></button></a> <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914867" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>PMC</span></button></a></div></div><div id="b36" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>36. </span> <span data-v-6dffe839>Peng J, Williams RJ. Incremental multi-step Q-learning. In: Machine Learning Proceedings 1994. Elsevier; 1994. pp. 226-32.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1016/B978-1-55860-335-6.50035-0" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b37" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>37. </span> <span data-v-6dffe839>Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. <i>Nature</i> 2015;518:529-33.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1038/nature14236" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b38" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>38. 
</span> <span data-v-6dffe839>Lei L, Tan Y, Zheng K, et al. Deep reinforcement learning for autonomous internet of things: model, applications and challenges. <i>IEEE Communications Surveys &amp; Tutorials</i> 2020;22:1722-60.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/COMST.2020.2988367" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b39" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>39. </span> <span data-v-6dffe839>Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 30; 2016. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://ojs.aaai.org/index.php/AAAI/article/view/10295">https://ojs.aaai.org/index.php/AAAI/article/view/10295.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b40" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>40. </span> <span data-v-6dffe839>Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. arXiv preprint arXiv:1511.05952 2015. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://arxiv.org/abs/1511.05952">https://arxiv.org/abs/1511.05952.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b41" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>41. </span> <span data-v-6dffe839>Gu S, Lillicrap TP, Ghahramani Z, Turner RE, Levine S. Q-Prop: sample-efficient policy gradient with an off-policy critic. CoRR 2016;abs/1611.02247. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1611.02247">http://arxiv.org/abs/1611.02247.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b42" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>42. </span> <span data-v-6dffe839>Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1861–70. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v80/haarnoja18b.html">https://proceedings.mlr.press/v80/haarnoja18b.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b43" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>43. </span> <span data-v-6dffe839>Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928–37. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v48/mniha16.html">https://proceedings.mlr.press/v48/mniha16.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b44" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>44. </span> <span data-v-6dffe839>Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 2015. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://arxiv.org/abs/1509.02971">https://arxiv.org/abs/1509.02971.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b45" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>45. </span> <span data-v-6dffe839>Barth-Maron G, Hoffman MW, Budden D, et al. Distributed distributional deterministic policy gradients. CoRR 2018;abs/1804.08617. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1804.08617">http://arxiv.org/abs/1804.08617.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b46" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>46. </span> <span data-v-6dffe839>Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1587–96. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v80/fujimoto18a.html">https://proceedings.mlr.press/v80/fujimoto18a.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b47" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>47. </span> <span data-v-6dffe839>Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning. vol. 37 of Proceedings of Machine Learning Research. Lille, France: PMLR; 2015. pp. 1889–97. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v37/schulman15.html">https://proceedings.mlr.press/v37/schulman15.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b48" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>48. </span> <span data-v-6dffe839>Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1707.06347">http://arxiv.org/abs/1707.06347.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b49" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>49. </span> <span data-v-6dffe839>Zhu P, Li X, Poupart P. On improving deep reinforcement learning for POMDPs. CoRR 2017;abs/1704.07978. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1704.07978">http://arxiv.org/abs/1704.07978.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b50" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>50. </span> <span data-v-6dffe839>Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: 2015 AAAI Fall Symposium Series; 2015. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673">https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b51" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>51. </span> <span data-v-6dffe839>Heess N, Hunt JJ, Lillicrap TP, Silver D. 
Memory-based control with recurrent neural networks. CoRR 2015;abs/1512.04455. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1512.04455">http://arxiv.org/abs/1512.04455.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b52" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>52. </span> <span data-v-6dffe839>Foerster J, Nardelli N, Farquhar G, et al. Stabilising experience replay for deep multi-agent reinforcement learning. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 1146–55. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v70/foerster17b.html">https://proceedings.mlr.press/v70/foerster17b.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b53" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>53. </span> <span data-v-6dffe839>Van der Pol E, Oliehoek FA. Coordinated deep reinforcement learners for traffic light control. Proceedings of learning, inference and control of multi-agent systems (at NIPS 2016) 2016. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://www.elisevanderpol.nl/papers/vanderpolNIPSMALIC2016.pdf">https://www.elisevanderpol.nl/papers/vanderpolNIPSMALIC2016.pdf.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b54" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>54. </span> <span data-v-6dffe839>Foerster J, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 32; 2018. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://ojs.aaai.org/index.php/AAAI/article/view/11794">https://ojs.aaai.org/index.php/AAAI/article/view/11794.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b55" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>55. </span> <span data-v-6dffe839>Lowe R, Wu Y, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments. CoRR 2017;abs/1706.02275. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1706.02275">http://arxiv.org/abs/1706.02275.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b56" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>56. </span> <span data-v-6dffe839>Nadiger C, Kumar A, Abdelhak S. Federated Reinforcement Learning for Fast Personalization. In: 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE) 2019. pp. 123-27.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/AIKE.2019.00031" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b57" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>57. </span> <span data-v-6dffe839>Liu B, Wang L, Liu M, Xu C. Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems. CoRR 2019;abs/1901.06455. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1901.06455">http://arxiv.org/abs/1901.06455.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b58" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>58. </span> <span data-v-6dffe839>Ren J, Wang H, Hou T, Zheng S, Tang C. Federated learning-based computation offloading optimization in edge computing-supported internet of things. <i>IEEE Access</i> 2019;7:69194-201.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ACCESS.2019.2919736" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b59" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>59. </span> <span data-v-6dffe839>Wang X, Wang C, Li X, Leung VCM, Taleb T. Federated deep reinforcement learning for internet of things with decentralized cooperative edge caching. <i>IEEE Internet of Things Journal</i> 2020;7:9441-55.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2020.2986803" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b60" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>60. </span> <span data-v-6dffe839>Chen J, Monga R, Bengio S, Józefowicz R. Revisiting distributed synchronous SGD. CoRR 2016;abs/1604.00981. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1604.00981">http://arxiv.org/abs/1604.00981.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b61" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>61. </span> <span data-v-6dffe839>Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928–37. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v48/mniha16.html">https://proceedings.mlr.press/v48/mniha16.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b62" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>62. </span> <span data-v-6dffe839>Espeholt L, Soyer H, Munos R, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1407–16. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://proceedings.mlr.press/v80/espeholt18a.html">http://proceedings.mlr.press/v80/espeholt18a.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b63" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>63. </span> <span data-v-6dffe839>Horgan D, Quan J, Budden D, et al. Distributed prioritized experience replay. CoRR 2018;abs/1803.00933. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1803.00933">http://arxiv.org/abs/1803.00933.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b64" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>64. </span> <span data-v-6dffe839>Liu T, Tian B, Ai Y, et al. Parallel reinforcement learning: a framework and case study. <i>IEEE/CAA Journal of Automatica Sinica</i> 2018;5:827-35.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JAS.2018.7511144" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b65" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>65. </span> <span data-v-6dffe839>Zhuo HH, Feng W, Xu Q, Yang Q, Lin Y. Federated reinforcement learning. CoRR 2019;abs/1901.08277. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1901.08277">http://arxiv.org/abs/1901.08277.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b66" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>66. </span> <span data-v-6dffe839>Canese L, Cardarilli GC, Di Nunzio L, et al. Multi-agent reinforcement learning: a review of challenges and applications. <i>Applied Sciences</i> 2021;11:4948. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://doi.org/10.3390/app11114948">https://doi.org/10.3390/app11114948.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b67" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>67. </span> <span data-v-6dffe839>Busoniu L, Babuska R, De Schutter B. 
A comprehensive survey of multiagent reinforcement learning. <i>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</i> 2008;38:156-72.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TSMCC.2007.913919" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b68" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>68. </span> <span data-v-6dffe839>Zhang K, Yang Z, Başar T. Multi-agent reinforcement learning: a selective overview of theories and algorithms. <i>Handbook of Reinforcement Learning and Control</i> 2021:321-84.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1007/978-3-030-60990-0_12" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b69" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>69. </span> <span data-v-6dffe839>Stone P, Veloso M. Multiagent systems: a survey from a machine learning perspective. <i>Autonomous Robots</i> 2000;8:345-83.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1023/A:1008942012299" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b70" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>70. </span> <span data-v-6dffe839>Szepesvári C, Littman ML. A unified analysis of value-function-based reinforcement-learning algorithms. 
<i>Neural computation</i> 1999;11:2017-60.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.5555/1121924.1121927" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b71" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>71. </span> <span data-v-6dffe839>Littman ML. Value-function reinforcement learning in Markov games. <i>Cognitive systems research</i> 2001;2:55-66.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1016/S1389-0417(01)00015-8" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b72" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>72. </span> <span data-v-6dffe839>Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the Tenth International Conference on Machine Learning; 1993. pp. 330-37.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b73" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>73. </span> <span data-v-6dffe839>Lauer M, Riedmiller M. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the Seventeenth International Conference on Machine Learning. Citeseer; 2000. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://citeseerx.ist.psu.edu/viewdoc/summary">http://citeseerx.ist.psu.edu/viewdoc/summary.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b74" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>74. 
</span> <span data-v-6dffe839>Monahan GE. State of the art—a survey of partially observable Markov decision processes: theory, models, and algorithms. <i>Management science</i> 1982;28:1-16.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1287/mnsc.28.1.1" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b75" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>75. </span> <span data-v-6dffe839>Oroojlooyjadid A, Hajinezhad D. A review of cooperative multi-agent deep reinforcement learning. CoRR 2019;abs/1908.03963. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1908.03963">http://arxiv.org/abs/1908.03963.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b76" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>76. </span> <span data-v-6dffe839>Bernstein DS, Givan R, Immerman N, Zilberstein S. The complexity of decentralized control of Markov decision processes. <i>Mathematics of operations research</i> 2002;27:819-40.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1287/moor.27.4.819.297" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b77" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>77. </span> <span data-v-6dffe839>Omidshafiei S, Pazis J, Amato C, How JP, Vian J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 
70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 2681–90. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.mlr.press/v70/omidshafiei17a.html">https://proceedings.mlr.press/v70/omidshafiei17a.html.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b78" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>78. </span> <span data-v-6dffe839>Han Y, Gmytrasiewicz P. Ipomdp-net: A deep neural network for partially observable multi-agent planning using interactive pomdps. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33 2019. pp. 6062-69.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1609/aaai.v33i01.33016062" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b79" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>79. </span> <span data-v-6dffe839>Karkus P, Hsu D, Lee WS. QMDP-Net: Deep learning for planning under partial observability; 2017. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://arxiv.org/abs/1703.06692">https://arxiv.org/abs/1703.06692.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b80" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>80. </span> <span data-v-6dffe839>Mao W, Zhang K, Miehling E, Başar T. Information state embedding in partially observable cooperative multi-agent reinforcement learning. In: 2020 59th IEEE Conference on Decision and Control (CDC) 2020. pp. 
6124-31.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/CDC42340.2020.9303801" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b81" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>81. </span> <span data-v-6dffe839>Mao H, Zhang Z, Xiao Z, Gong Z. Modelling the dynamic joint policy of teammates with attention multi-agent DDPG. CoRR 2018;abs/1811.07029. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1811.07029">http://arxiv.org/abs/1811.07029.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b82" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>82. </span> <span data-v-6dffe839>Lee HR, Lee T. Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response. <i>European Journal of Operational Research</i> 2021;291:296-308.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1016/j.ejor.2020.09.018" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b83" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>83. </span> <span data-v-6dffe839>Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 29. Curran Associates, Inc.; 2016. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://proceedings.neurips.cc/paper/2016/file/55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf">https://proceedings.neurips.cc/paper/2016/file/55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b84" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>84. </span> <span data-v-6dffe839>Foerster JN, Assael YM, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. CoRR 2016;abs/1605.06676. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1605.06676">http://arxiv.org/abs/1605.06676.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b85" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>85. </span> <span data-v-6dffe839>Buşoniu L, Babuška R, De Schutter B. Multi-agent reinforcement learning: an overview. <i>Innovations in multiagent systems and applications 1</i> 2010:183-221.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1007/978-3-642-14435-6_7" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b86" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>86. </span> <span data-v-6dffe839>Hu Y, Hua Y, Liu W, Zhu J. Reward shaping based federated reinforcement learning. 
<i>IEEE Access</i> 2021;9:67259-67.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ACCESS.2021.3074221" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b87" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>87. </span> <span data-v-6dffe839>Anwar A, Raychowdhury A. Multi-task federated reinforcement learning with adversaries. CoRR 2021;abs/2103.06473. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://arxiv.org/abs/2103.06473">https://arxiv.org/abs/2103.06473.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b88" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>88. </span> <span data-v-6dffe839>Wang X, Han Y, Wang C, et al. In-edge AI: intelligentizing mobile edge computing, caching and communication by federated learning. <i>IEEE Network</i> 2019;33:156-65.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/MNET.2019.1800286" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b89" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>89. </span> <span data-v-6dffe839>Wang X, Li R, Wang C, et al. Attention-weighted federated deep reinforcement learning for device-to-device assisted heterogeneous collaborative edge caching. 
<i>IEEE Journal on Selected Areas in Communications</i> 2021;39:154-69.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JSAC.2020.3036946" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b90" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>90. </span> <span data-v-6dffe839>Zhang M, Jiang Y, Zheng FC, Bennis M, You X. Cooperative edge caching via federated deep reinforcement learning in Fog-RANs. In: 2021 IEEE International Conference on Communications Workshops (ICC Workshops) 2021. pp. 1-6.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ICCWorkshops50388.2021.9473609" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b91" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>91. </span> <span data-v-6dffe839>Majidi F, Khayyambashi MR, Barekatain B. HFDRL: an intelligent dynamic cooperate cashing method based on hierarchical federated deep reinforcement learning in edge-enabled IoT. <i>IEEE Internet of Things Journal</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/MECO52532.2021.9460304" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b92" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>92. </span> <span data-v-6dffe839>Zhao L, Ran Y, Wang H, Wang J, Luo J. Towards cooperative caching for vehicular networks with multi-level federated reinforcement learning. 
In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ICC42927.2021.9500714" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b93" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>93. </span> <span data-v-6dffe839>Zhu Z, Wan S, Fan P, Letaief KB. Federated multi-agent actor-critic learning for age sensitive mobile edge computing. <i>IEEE Internet of Things Journal</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2021.3078514" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b94" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>94. </span> <span data-v-6dffe839>Yu S, Chen X, Zhou Z, Gong X, Wu D. When deep reinforcement learning meets federated learning: intelligent multi-timescale resource management for multi-access edge computing in 5G ultra dense network. arXiv:2009.10601 [cs] 2020 Sep. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/2009.10601">http://arxiv.org/abs/2009.10601.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b95" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>95. </span> <span data-v-6dffe839>Tianqing Z, Zhou W, Ye D, Cheng Z, Li J. Resource allocation in IoT edge computing via concurrent federated reinforcement learning. 
<i>IEEE Internet of Things Journal</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2021.3086910" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b96" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>96. </span> <span data-v-6dffe839>Huang H, Zeng C, Zhao Y, et al. Scalable orchestration of service function chains in NFV-enabled networks: a federated reinforcement learning approach. <i>IEEE Journal on Selected Areas in Communications</i> 2021;39:2558-71.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JSAC.2021.3087227" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b97" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>97. </span> <span data-v-6dffe839>Liu YJ, Feng G, Sun Y, Qin S, Liang YC. Device association for RAN slicing based on hybrid federated deep reinforcement learning. <i>IEEE Transactions on Vehicular Technology</i> 2020;69:15731-45.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TVT.2020.3033035" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b98" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>98. </span> <span data-v-6dffe839>Wang G, Dang CX, Zhou Z. Measure Contribution of participants in federated learning. In: 2019 IEEE International Conference on Big Data (Big Data) 2019. pp. 
2597-604.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/BigData47090.2019.9006179" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b99" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>99. </span> <span data-v-6dffe839>Cao Y, Lien SY, Liang YC, Chen KC. Federated deep reinforcement learning for user access control in open radio access networks. In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ICC42927.2021.9500603" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b100" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>100. </span> <span data-v-6dffe839>Zhang L, Yin H, Zhou Z, Roy S, Sun Y. Enhancing WiFi multiple access performance with federated deep reinforcement learning. In: 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall) 2020. pp. 1-6.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/VTC2020-Fall49728.2020.9348485" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b101" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>101. </span> <span data-v-6dffe839>Xu M, Peng J, Gupta BB, et al. Multi-agent federated reinforcement learning for secure incentive mechanism in intelligent cyber-physical systems. 
<i>IEEE Internet of Things Journal</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2021.3081626" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b102" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>102. </span> <span data-v-6dffe839>Zhang X, Peng M, Yan S, Sun Y. Deep-reinforcement-learning-based mode selection and resource allocation for cellular V2X communications. <i>IEEE Internet of Things Journal</i> 2020;7:6380-91.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2019.2962715" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b103" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>103. </span> <span data-v-6dffe839>Kwon D, Jeon J, Park S, Kim J, Cho S. Multiagent DDPG-based deep learning for smart ocean federated learning IoT networks. <i>IEEE Internet of Things Journal</i> 2020;7:9895-903.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2020.2988033" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b104" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>104. </span> <span data-v-6dffe839>Liang X, Liu Y, Chen T, Liu M, Yang Q. Federated transfer reinforcement learning for autonomous driving. arXiv:1910.06001 [cs] 2019 Oct. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1910.06001">http://arxiv.org/abs/1910.06001.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b105" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>105. </span> <span data-v-6dffe839>Lim HK, Kim JB, Heo JS, Han YH. Federated reinforcement learning for training control policies on multiple IoT devices. Sensors 2020 Mar;20:1359. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://www.mdpi.com/1424-8220/20/5/1359">https://www.mdpi.com/1424-8220/20/5/1359.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b106" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>106. </span> <span data-v-6dffe839>Lim HK, Kim JB, Ullah I, Heo JS, Han YH. Federated reinforcement learning acceleration method for precise control of multiple devices. <i>IEEE Access</i> 2021;9:76296-306.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ACCESS.2021.3083087" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b107" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>107. </span> <span data-v-6dffe839>Mowla NI, Tran NH, Doh I, Chae K. AFRL: Adaptive federated reinforcement learning for intelligent jamming defense in FANET. 
<i>Journal of Communications and Networks</i> 2020;22:244-58.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JCN.2020.000015" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b108" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>108. </span> <span data-v-6dffe839>Nguyen TG, Phan TV, Hoang DT, Nguyen TN, So-In C. Federated deep reinforcement learning for traffic monitoring in SDN-Based IoT networks. <i>IEEE Transactions on Cognitive Communications and Networking</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TCCN.2021.3102971" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b109" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>109. </span> <span data-v-6dffe839>Wang X, Garg S, Lin H, et al. Towards accurate anomaly detection in industrial internet-of-things using hierarchical federated learning. <i>IEEE Internet of Things Journal</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2021.3074382" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b110" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>110. </span> <span data-v-6dffe839>Lee S, Choi DH. Federated reinforcement learning for energy management of multiple smart homes with distributed energy resources. 
<i>IEEE Transactions on Industrial Informatics</i> 2020:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TII.2020.3035451" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b111" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>111. </span> <span data-v-6dffe839>Samet H. The quadtree and related hierarchical data structures. ACM Comput Surv 1984;16:187–260. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://doi.org/10.1145/356924.356930">https://doi.org/10.1145/356924.356930.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b112" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>112. </span> <span data-v-6dffe839>Abdel-Aziz MK, Samarakoon S, Perfecto C, Bennis M. Cooperative perception in vehicular networks using multi-agent reinforcement learning. In: 2020 54th Asilomar Conference on Signals, Systems, and Computers 2020. pp. 408-12.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/IEEECONF51394.2020.9443539" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b113" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>113. </span> <span data-v-6dffe839>Wang H, Kaplan Z, Niu D, Li B. Optimizing federated learning on Non-IID data with reinforcement learning. In: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. Toronto, ON, Canada: IEEE; 2020. pp. 1698–707. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://ieeexplore.ieee.org/document/9155494/">https://ieeexplore.ieee.org/document/9155494/.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b114" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>114. </span> <span data-v-6dffe839>Zhang P, Gan P, Aujla GS, Batth RS. Reinforcement learning for edge device selection using social attribute perception in industry 4.0. <i>IEEE Internet of Things Journal</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/JIOT.2021.3088577" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b115" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>115. </span> <span data-v-6dffe839>Zhan Y, Li P, Leijie W, Guo S. L4L: experience-driven computational resource control in federated learning. <i>IEEE Transactions on Computers</i> 2021:1-1.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TC.2021.3068219" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b116" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>116. </span> <span data-v-6dffe839>Dong Y, Gan P, Aujla GS, Zhang P. RA-RL: reputation-aware edge device selection method based on reinforcement learning. In: 2021 IEEE 22nd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM) 2021. pp. 
348-53.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/WoWMoM51794.2021.00063" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b117" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>117. </span> <span data-v-6dffe839>Sahu AK, Li T, Sanjabi M, et al. On the convergence of federated optimization in heterogeneous networks. CoRR 2018;abs/1812.06127. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1812.06127">http://arxiv.org/abs/1812.06127.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b118" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>118. </span> <span data-v-6dffe839>Chen M, Poor HV, Saad W, Cui S. Convergence time optimization for federated learning over wireless networks. <i>IEEE Transactions on Wireless Communications</i> 2021;20:2457-71.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/TWC.2020.3042530" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b119" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>119. </span> <span data-v-6dffe839>Li X, Huang K, Yang W, Wang S, Zhang Z. On the convergence of FedAvg on Non-IID data; 2020. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://arxiv.org/abs/1907.02189?context=stat.ML">https://arxiv.org/abs/1907.02189?context=stat.ML.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b120" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>120. </span> <span data-v-6dffe839>Bonawitz KA, Eichner H, Grieskamp W, et al. Towards federated learning at scale: system design. CoRR 2019;abs/1902.01046. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1902.01046">http://arxiv.org/abs/1902.01046.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b121" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>121. </span> <span data-v-6dffe839>Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529–33. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://doi.org/10.1038/nature14236">https://doi.org/10.1038/nature14236.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b122" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>122. </span> <span data-v-6dffe839>Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning; 2019. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://arxiv.org/abs/1509.02971">https://arxiv.org/abs/1509.02971.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b123" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>123. </span> <span data-v-6dffe839>Lyu L, Yu H, Yang Q. Threats to federated learning: a survey. CoRR 2020;abs/2003.02133. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://arxiv.org/abs/2003.02133">https://arxiv.org/abs/2003.02133.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b124" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>124. </span> <span data-v-6dffe839>Fung C, Yoon CJM, Beschastnikh I. Mitigating sybils in federated learning poisoning. CoRR 2018;abs/1808.04866. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1808.04866">http://arxiv.org/abs/1808.04866.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b125" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>125. </span> <span data-v-6dffe839>Anwar A, Raychowdhury A. Multi-task federated reinforcement learning with adversaries 2021.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b126" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>126. </span> <span data-v-6dffe839>Zhu L, Liu Z, Han S. Deep leakage from gradients. CoRR 2019;abs/1906.08935. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1906.08935">http://arxiv.org/abs/1906.08935.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b127" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>127. </span> <span data-v-6dffe839>Nishio T, Yonetani R. Client Selection for federated learning with heterogeneous resources in mobile edge. In: ICC 2019-2019 IEEE International Conference on Communications (ICC) 2019. pp. 
1-7.</span></p> <div class="refrences" data-v-6dffe839><a href="https://dx.doi.org/10.1109/ICC.2019.8761315" target="_blank" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>DOI</span></button></a> <!----> <!----></div></div><div id="b128" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>128. </span> <span data-v-6dffe839>Yang T, Andrew G, Eichner H, et al. Applied federated learning: improving google keyboard query suggestions. CoRR 2018;abs/1812.02903. Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="http://arxiv.org/abs/1812.02903">http://arxiv.org/abs/1812.02903.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div><div id="b129" class="references_item" data-v-6dffe839><p data-v-6dffe839><span data-v-6dffe839>129. </span> <span data-v-6dffe839>Yu H, Liu Z, Liu Y, et al. A fairness-aware incentive scheme for federated learning. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. AIES ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 393–399. 
Available from: <a target="_blank" xmlns:xlink="http://www.w3.org/1999/xlink" href="https://doi.org/10.1145/3375627.3375840">https://doi.org/10.1145/3375627.3375840.</a>.</span></p> <div class="refrences" data-v-6dffe839><!----> <!----> <!----></div></div></div> <div class="line" data-v-6dffe839></div></div> <div class="article_cite cite_layout" data-v-6dffe839><div id="cite" data-v-6dffe839></div> <div class="el-row" style="margin-left:-10px;margin-right:-10px;" data-v-6dffe839><div class="el-col el-col-24 el-col-xs-24 el-col-sm-16" style="padding-left:10px;padding-right:10px;" data-v-6dffe839><div class="left_box" data-v-6dffe839><div data-v-6dffe839><h2 style="margin-top:0!important;padding-top:0;" data-v-6dffe839>Cite This Article</h2> <div class="cite_article" data-v-6dffe839><div class="cite_article_sec" data-v-6dffe839>Review</div> <div class="cite_article_open" style="color:#aa0c2f;" data-v-6dffe839><img src="https://g.oaes.cc/oae/nuxt/img/open_icon.bff5dde.png" alt="" style="width:10px;" data-v-6dffe839> Open Access</div> <div class="cite_article_tit" data-v-6dffe839><span data-v-6dffe839>Federated reinforcement learning: techniques, applications, and open challenges</span></div> <div class="cite_article_editor" data-v-6dffe839><span data-v-6dffe839>Jiaju Qi, ... Kan Zheng</span></div></div></div> <div class="color_000" data-v-6dffe839><h2 data-v-6dffe839>How to Cite</h2> <p data-v-6dffe839>Qi, J.; Zhou, Q.; Lei, L.; Zheng, K. Federated reinforcement learning: techniques, applications, and open challenges. <i>Intell. Robot.</i> <b>2021</b>, <i>1</i>, 18-57. http://dx.doi.org/10.20517/ir.2021.02</p></div> <div data-v-6dffe839><h2 data-v-6dffe839>Download Citation</h2> <div class="font_12" data-v-6dffe839>If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. 
Simply select your manager software from the list below and click on download.</div></div> <div class="line" data-v-6dffe839></div> <div data-v-6dffe839><h2 data-v-6dffe839>Export Citation File:</h2> <div style="margin-top:10px;" data-v-6dffe839><label role="radio" aria-checked="true" tabindex="0" class="el-radio is-checked" style="display:block;margin-bottom:20px;" data-v-6dffe839><span class="el-radio__input is-checked"><span class="el-radio__inner"></span><input type="radio" aria-hidden="true" tabindex="-1" autocomplete="off" value="1" checked="checked" class="el-radio__original"></span><span class="el-radio__label">RIS<!----></span></label> <label role="radio" tabindex="0" class="el-radio" style="display:block;margin-bottom:20px;" data-v-6dffe839><span class="el-radio__input"><span class="el-radio__inner"></span><input type="radio" aria-hidden="true" tabindex="-1" autocomplete="off" value="2" class="el-radio__original"></span><span class="el-radio__label">BibTeX<!----></span></label> <label role="radio" tabindex="0" class="el-radio" style="display:block;margin-bottom:20px;" data-v-6dffe839><span class="el-radio__input"><span class="el-radio__inner"></span><input type="radio" aria-hidden="true" tabindex="-1" autocomplete="off" value="3" class="el-radio__original"></span><span class="el-radio__label">EndNote<!----></span></label></div></div> <div class="line" data-v-6dffe839></div> <div data-v-6dffe839><h2 data-v-6dffe839>Type of Import</h2> <div style="margin-top:10px;" data-v-6dffe839><label role="radio" aria-checked="true" tabindex="0" class="el-radio is-checked" style="display:block;margin-bottom:20px;" data-v-6dffe839><span class="el-radio__input is-checked"><span class="el-radio__inner"></span><input type="radio" aria-hidden="true" tabindex="-1" autocomplete="off" value="1" checked="checked" class="el-radio__original"></span><span class="el-radio__label">Direct Import<!----></span></label> <label role="radio" tabindex="0" class="el-radio" 
style="display:block;margin-bottom:20px;" data-v-6dffe839><span class="el-radio__input"><span class="el-radio__inner"></span><input type="radio" aria-hidden="true" tabindex="-1" autocomplete="off" value="2" class="el-radio__original"></span><span class="el-radio__label">Indirect Import<!----></span></label></div></div> <div data-v-6dffe839><button type="button" class="el-button el-button--primary" data-v-6dffe839><!----><!----><span>Download</span></button></div></div></div> <div class="el-col el-col-24 el-col-xs-24 el-col-sm-8" style="padding-left:10px;padding-right:10px;" data-v-6dffe839><div class="grid-content bg-purple" data-v-6dffe839><div data-v-6dffe839><h3 style="margin-top:0;" data-v-6dffe839>Tips on Downloading Citation</h3> <div class="color_666 font_12" data-v-6dffe839>This feature enables you to download the bibliographic information (also called citation data, header data, or metadata) for the articles on our site.</div></div> <div data-v-6dffe839><h3 data-v-6dffe839>Citation Manager File Format</h3> <div class="color_666 font_12" data-v-6dffe839>Use the radio buttons to choose how to format the bibliographic data you are downloading. Several citation manager formats are available, including EndNote and BibTeX.</div></div> <div data-v-6dffe839><h3 data-v-6dffe839>Type of Import</h3> <div class="color_666 font_12" data-v-6dffe839>If you have citation management software installed on your computer, your web browser should be able to import metadata directly into your reference database.<br data-v-6dffe839><br data-v-6dffe839> <b data-v-6dffe839>Direct Import:</b> When the Direct Import option is selected (the default state), a dialogue box will give you the option to Save or Open the downloaded citation data. Choosing Open will either launch your citation manager or give you a choice of applications with which to use the metadata. 
The Save option saves the file locally for later use.<br data-v-6dffe839><br data-v-6dffe839> <b data-v-6dffe839>Indirect Import:</b> When the Indirect Import option is selected, the metadata is displayed and may be copied and pasted as needed. </div></div></div></div></div></div> <h2 id="about" data-v-6dffe839>About This Article</h2> <!----> <!----> <!----> <h3 id="copyright" class="btn_h3" data-v-6dffe839>Copyright</h3> <div class="CorrsAdd" data-v-6dffe839><div class="cor_left" data-v-6dffe839><a href="https://creativecommons.org/licenses/by/4.0/" data-v-6dffe839><img alt="" src="https://g.oaes.cc/oae/nuxt/img/ccb_4.229daa2.png" class="media-object" data-v-6dffe839></a></div> <div class="cor_right font12" data-v-6dffe839>© The Author(s) 2021. <b>Open Access</b> This article is licensed under a Creative Commons Attribution 4.0 International License (<a target="_blank"href="https://creativecommons.org/licenses/by/4.0/" xmlns:xlink="http://www.w3.org/1999/xlink">https://creativecommons.org/licenses/by/4.0/</a>), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</div></div> <!----> <!----> <div class="clearfix" data-v-6dffe839></div> <div class="line_box" data-v-6dffe839></div> <!----> <h2 id="data" data-v-6dffe839>Data & Comments</h2> <h3 class="btn_h3" data-v-6dffe839>Data</h3> <div class="article_viewnum" data-v-6dffe839><div class="viewnum_item" data-v-6dffe839><div class="item_top" data-v-6dffe839><b data-v-6dffe839>Views</b></div> <div class="item_ctn" data-v-6dffe839>7412</div></div> <div class="viewnum_item" data-v-6dffe839><div class="item_top" data-v-6dffe839><b data-v-6dffe839>Downloads</b></div> <div class="item_ctn" data-v-6dffe839>1789</div></div> <div class="viewnum_item" data-v-6dffe839><div 
class="item_top" data-v-6dffe839><b data-v-6dffe839>Citations</b></div> <div class="item_ctn" data-v-6dffe839><img alt="" src="" class="Crossref" data-v-6dffe839> <a href="/articles//citation/" target="_blank" style="color:#4475e1;" data-v-6dffe839>74</a></div></div> <!----> <div class="viewnum_item" data-v-6dffe839><div class="item_top" data-v-6dffe839><b data-v-6dffe839>Comments</b></div> <div class="item_ctn" data-v-6dffe839>0</div></div> <div class="viewnum_item" data-v-6dffe839><div class="item_top" data-v-6dffe839><img alt="" src="https://g.oaes.cc/oae/nuxt/img/like.08d1ca4.png" data-v-6dffe839></div> <span class="num" data-v-6dffe839><b data-v-6dffe839>13</b></span></div></div> <div class="article_comments" data-v-6dffe839><h3 id="comment" class="btn_h3" data-v-6dffe839>Comments</h3> <p data-v-6dffe839> Comments must be written in English. Spam, offensive content, impersonation, and private information will not be permitted. If any comment is reported and identified as inappropriate content by OAE staff, the comment will be removed without notice. If you have any queries or need any help, please contact us at <a href="mailto:support@oaepublish.com" data-v-6dffe839>support@oaepublish.com</a>. 
</p> <div class="commentlist" data-v-6dffe839> <div style="height:70px;display:none;" data-v-6dffe839></div> <div id="comment_input" class="commentinput" data-v-6dffe839><div class="contain contain_log" data-v-6dffe839><div class="user_icon" data-v-6dffe839><img src="https://g.oaes.cc/oae/nuxt/img/comments_u25.e5672eb.png" alt="" data-v-6dffe839></div> <div class="el-textarea height30" data-v-6dffe839><textarea autocomplete="off" placeholder="Write your comments here…" class="el-textarea__inner"></textarea><!----></div> <div class="contain_right" data-v-6dffe839><button disabled="disabled" type="button" class="el-button postbtn el-button--text el-button--mini is-disabled" data-v-6dffe839><!----><!----><span>Post</span></button> <button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span>Login</span></button> <div class="el-badge item" data-v-6dffe839><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span><i class="icon-comment" data-v-6dffe839></i></span></button><sup class="el-badge__content is-fixed">0</sup></div> <button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span><i class="icon-like-line font20" data-v-6dffe839></i></span></button> <span data-v-6dffe839><div role="tooltip" id="el-popover-7712" aria-hidden="true" class="el-popover el-popper" style="width:170px;display:none;"><!----><div class="icon_share" style="text-align:right;margin:0;" data-v-6dffe839><a href="http://pinterest.com/pin/create/button/?url=&media=&description=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="pinterest-sign" data-v-6dffe839><i class="iconfont icon-pinterest" data-v-6dffe839></i></a> <a href="https://www.facebook.com/sharer/sharer.php?u=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="facebook-sign" data-v-6dffe839><i aria-hidden="true" class="iconfont icon-facebook" 
data-v-6dffe839></i></a> <a href="https://twitter.com/intent/tweet?url=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="twitter-sign" data-v-6dffe839><i class="iconfont icon-tuite1" data-v-6dffe839></i></a> <a href="https://www.linkedin.com/shareArticle?url=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="linkedin-sign" data-v-6dffe839><i class="iconfont icon-linkedin" data-v-6dffe839></i></a></div> </div><span class="el-popover__reference-wrapper"><button type="button" class="el-button el-button--text el-button--mini" data-v-6dffe839><!----><!----><span><i class="iconfont icon-zhuanfa" data-v-6dffe839></i></span></button></span></span></div></div></div></div></div></div></div></div></div></div> <div class="hidden-sm-and-down box_right el-col el-col-24 el-col-md-6" style="padding-left:10px;padding-right:10px;" data-v-6dffe839><div class="art_right pad_l_10" data-v-6dffe839><div class="top_banner" data-v-6dffe839><div class="oae_header" data-v-6dffe839>Author's Talk</div> <div class="line" data-v-6dffe839></div> <div class="img_box" data-v-6dffe839><img src="https://i.oaes.cc/uploads/20240205/1103dab1ab644cf2818a1cab0dd2b8ff.jpg" alt="" data-itemid="4325" data-itemhref="https://v1.oaepublish.com/files/talkvideo/4325.mp4" data-itemimg="https://i.oaes.cc/uploads/20240205/1103dab1ab644cf2818a1cab0dd2b8ff.jpg" data-v-6dffe839> <i data-itemid="4325" data-itemhref="https://v1.oaepublish.com/files/talkvideo/4325.mp4" data-itemimg="https://i.oaes.cc/uploads/20240205/1103dab1ab644cf2818a1cab0dd2b8ff.jpg" class="bo_icon" data-v-6dffe839></i></div></div> <!----> <a href="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325_down.pdf?v=42" data-v-6dffe839><div class="down_pdf" data-v-6dffe839><span data-v-6dffe839>Download PDF</span> <i class="el-icon-download" data-v-6dffe839></i></div></a> <div class="right_btn" data-v-6dffe839><div class="btn_item" data-v-6dffe839><a 
href="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.xml" data-v-6dffe839><img alt="" src="https://g.oaes.cc/oae/nuxt/img/xml.6704117.png" data-v-6dffe839> <span class="name" data-v-6dffe839>Download XML</span> <span class="num" data-v-6dffe839><b data-v-6dffe839>18</b> downloads </span></a></div> <div class="btn_item" data-v-6dffe839><a href="/articles//cite/" data-v-6dffe839><img alt="" src="https://g.oaes.cc/oae/nuxt/img/cite.7c6f3cb.png" data-v-6dffe839> <span class="name" data-v-6dffe839>Cite This Article</span> <span class="num" data-v-6dffe839><b data-v-6dffe839>76</b> clicks </span></a></div> <div class="btn_item" data-v-6dffe839><a href="https://f.oaes.cc/ris/4325.ris" data-v-6dffe839><img alt="" src="https://g.oaes.cc/oae/nuxt/img/cita.fa0c5fb.png" data-v-6dffe839> <span class="name" data-v-6dffe839>Export Citation</span> <span class="num" data-v-6dffe839><b data-v-6dffe839>109</b> clicks </span></a></div> <div class="btn_item" data-v-6dffe839><img alt="" src="" data-v-6dffe839> <span class="name" data-v-6dffe839>Like This Article</span> <span class="num" data-v-6dffe839><b data-v-6dffe839>13</b> likes </span></div></div> <!----> <div class="right_list" data-v-6dffe839><div class="oae_header" data-v-6dffe839>Share This Article</div> <div class="icon_share" data-v-6dffe839><a href="http://pinterest.com/pin/create/button/?url=&media=&description=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="pinterest-sign" data-v-6dffe839><i class="iconfont icon-pinterest" data-v-6dffe839></i></a> <a href="https://www.facebook.com/sharer/sharer.php?u=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="facebook-sign" data-v-6dffe839><i aria-hidden="true" class="iconfont icon-facebook" data-v-6dffe839></i></a> <a href="https://twitter.com/intent/tweet?url=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="twitter-sign" data-v-6dffe839><i class="iconfont icon-tuite1" data-v-6dffe839></i></a> <a 
href="https://www.linkedin.com/shareArticle?url=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="linkedin-sign" data-v-6dffe839><i class="iconfont icon-linkedin" data-v-6dffe839></i></a></div> <div class="Journal_qrcode m-top30" data-v-6dffe839><img title="https://www.oaepublish.com/articles/ir.2021.02" alt="" src="https://api.qrserver.com/v1/create-qr-code/?size=80x80&data=https://www.oaepublish.com/articles/ir.2021.02" class="code" data-v-6dffe839> <span class="tip" data-v-6dffe839>Scan the QR code for reading!</span></div></div> <div class="right_list" data-v-6dffe839><div class="oae_header" data-v-6dffe839>See Updates</div> <div class="checkNew" data-v-6dffe839><img alt="" src="https://g.oaes.cc/oae/nuxt/img/CROSSMARK_Color_horizontal.a6fa1ee.svg" data-v-6dffe839></div></div> <div id="boxFixed" class="right_tab" data-v-6dffe839><div class="a_tab_top" data-v-6dffe839><div class="top_item tab_active1" data-v-6dffe839>Contents</div> <div class="top_item" data-v-6dffe839>Figures</div> <div class="top_item" data-v-6dffe839>Related</div></div> <div class="tab_ctn" data-v-6dffe839><div class="ctn_item" data-v-6dffe839></div></div> <!----></div></div></div></div></div></div> <div class="ipad_menu" data-v-6dffe839><span class="el-icon-s-unfold" data-v-6dffe839></span></div> <div class="ipad_con" style="display:none;" data-v-6dffe839><div id="Share-con" class="Journal_right_Share publish-content" data-v-6dffe839><h2 data-v-6dffe839>Share This Article</h2> <div class="line_btn" data-v-6dffe839></div> <div class="shareicon-Con" data-v-6dffe839><a href="http://pinterest.com/pin/create/button/?url=&media=&description=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="pinterest-sign" data-v-6dffe839><i class="iconfont icon-pinterest" data-v-6dffe839></i></a> <a href="https://www.facebook.com/sharer/sharer.php?u=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="facebook-sign" data-v-6dffe839><i aria-hidden="true" 
class="iconfont icon-facebook" data-v-6dffe839></i></a> <a href="https://twitter.com/intent/tweet?url=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="twitter-sign" data-v-6dffe839><i class="iconfont icon-tuite1" data-v-6dffe839></i></a> <a href="https://www.linkedin.com/shareArticle?url=https://www.oaepublish.com/articles/ir.2021.02" target="_blank" class="linkedin-sign" data-v-6dffe839><i class="iconfont icon-linkedin" data-v-6dffe839></i></a> <a href="javascript:void(0);" id="wx_wd" class="weixin-sign" data-v-6dffe839><i aria-hidden="true" class="iconfont icon-weixin" data-v-6dffe839></i></a> <div class="wx_code" style="display: none" data-v-6dffe839><div class="arrow" style="top: 50%; z-idnex: 9999" data-v-6dffe839></div> <img src="https://api.qrserver.com/v1/create-qr-code/?size=80x80&data=https://www.oaepublish.comarticle/view/426" alt class="code" data-v-6dffe839></div> <a id="wx_ph" onclick="call('wechatFriend')" target="_blank" class="weixin-sign" style="display: none" data-v-6dffe839><i aria-hidden="true" class="fab2 fa-weixin" data-v-6dffe839></i></a></div></div> <div class="ipad_btn" data-v-6dffe839><ul class="btn_list" data-v-6dffe839><div class="btn_item" data-v-6dffe839><a href="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325_down.pdf?v=42" data-v-6dffe839><img alt src="https://g.oaes.cc/oae/nuxt/img/pdf.c310b0c.png" data-v-6dffe839> <span class="name" data-v-6dffe839>Download PDF (1789)</span></a></div> <div class="btn_item" data-v-6dffe839><a href="https://f.oaes.cc/xmlpdf/38ebf366-5fba-47ab-86ac-aa0a6cc124ef/4325.xml" data-v-6dffe839><img alt src="https://g.oaes.cc/oae/nuxt/img/xml.6704117.png" data-v-6dffe839> <span class="name" data-v-6dffe839>Download XML (18)</span></a></div> <div class="btn_item" data-v-6dffe839><img alt src="https://g.oaes.cc/oae/nuxt/img/cite.7c6f3cb.png" data-v-6dffe839> <span class="name" data-v-6dffe839>Cite This Article (76)</span></div> <div class="btn_item" data-v-6dffe839><a 
data-v-6dffe839><img alt src="" data-v-6dffe839> <span class="name" data-v-6dffe839>Export Citation (109)</span></a></div></ul></div> <div id="Updates-con2" class="Journal-right-Updates publish-content" data-v-6dffe839><h2 data-v-6dffe839>See Updates</h2> <img alt src="https://g.oaes.cc/oae/nuxt/img/CROSSMARK_Color_horizontal.a6fa1ee.svg" type="image/svg+xml" width="150" data-v-6dffe839></div> <div id="Contents-con" class="Journal-right-Contents publish-content" data-v-6dffe839><div id="scrollspy" class="scroll-box" data-v-6dffe839><div data-v-6dffe839></div></div></div></div> <div class="imgDolg" style="display:none;" data-v-6dffe839><div class="img_btn_box" data-v-6dffe839><i class="el-icon-error img_btn" data-v-6dffe839></i> <i class="el-icon-circle-plus img_btn" data-v-6dffe839></i> <i class="el-icon-remove img_btn" data-v-6dffe839></i></div> <img alt src="" style="transform:scale(1);" data-v-6dffe839></div> <div class="el-dialog__wrapper" style="display:none;" data-v-6dffe839><div role="dialog" aria-modal="true" aria-label="Citation" class="el-dialog" style="margin-top:15vh;width:700px;"><div class="el-dialog__header"><span class="el-dialog__title">Citation</span><button type="button" aria-label="Close" class="el-dialog__headerbtn"><i class="el-dialog__close el-icon el-icon-close"></i></button></div><!----><!----></div></div> <div class="el-dialog__wrapper delogCheck" style="display:none;" data-v-6dffe839><div role="dialog" aria-modal="true" aria-label="CrossMark" class="el-dialog" style="margin-top:15vh;width:700px;"><div class="el-dialog__header"><span class="el-dialog__title">CrossMark</span><button type="button" aria-label="Close" class="el-dialog__headerbtn"><i class="el-dialog__close el-icon el-icon-close"></i></button></div><!----><!----></div></div> <div class="el-dialog__wrapper delogTable" style="display:none;" data-v-6dffe839><div role="dialog" aria-modal="true" aria-label="dialog" class="el-dialog" style="margin-top:15vh;width:800px;"><div 
class="el-dialog__header"><span class="el-dialog__title"></span><button type="button" aria-label="Close" class="el-dialog__headerbtn"><i class="el-dialog__close el-icon el-icon-close"></i></button></div><!----><!----></div></div> <div data-v-7adb8971 data-v-6dffe839><div class="el-dialog__wrapper loginDol" style="display:none;" data-v-7adb8971><div role="dialog" aria-modal="true" aria-label="Login to Online Publication System" class="el-dialog" style="margin-top:15vh;width:700px;"><div class="el-dialog__header"><span class="el-dialog__title">Login to Online Publication System</span><button type="button" aria-label="Close" class="el-dialog__headerbtn"><i class="el-dialog__close el-icon el-icon-close"></i></button></div><!----><div class="el-dialog__footer"><span class="dialog-footer" data-v-7adb8971><div class="shareicon-Con" data-v-7adb8971><div class="title" data-v-7adb8971>Login with:</div> <div id="g_id_signin" class="g_id_signin" data-v-7adb8971></div></div></span></div></div></div></div></div></main> <div data-v-11ddc367 data-v-0baa1603><!----></div> <div class="PcComment" data-v-576acc14 data-v-0baa1603><div class="foot_one clo_bor" data-v-576acc14><div class="wrapper foot_box" data-v-576acc14><div class="mgb_20 el-row" style="margin-left:-10px;margin-right:-10px;" data-v-576acc14><div class="el-col el-col-24 el-col-sm-12 el-col-md-8" style="padding-left:10px;padding-right:10px;" data-v-576acc14><div class="grid-content" data-v-576acc14><div class="foot_title" data-v-576acc14>Intelligence & Robotics</div> <div class="foot-con" data-v-576acc14><div data-v-576acc14>ISSN 2770-3541 (Online)</div> <div data-v-576acc14><a href="mailto:editorial@intellrobot.com" data-v-576acc14>editorial@intellrobot.com</a></div></div></div></div> <div class="hidden-sm-and-down el-col el-col-24 el-col-sm-12 el-col-md-8" style="padding-left:10px;padding-right:10px;" data-v-576acc14><div class="grid-content" data-v-576acc14><div class="foot_title" data-v-576acc14>Navigation</div> <div 
data-v-351032c4="" class="foot-con" data-v-576acc14><div data-v-576acc14><a href="/ir/contact_us" data-v-576acc14>Contact Us</a></div><div data-v-576acc14><a href="/ir/sitemap" data-v-576acc14>Sitemap</a></div></div></div></div> <div class="el-col el-col-24 el-col-sm-12 el-col-md-8" style="padding-left:10px;padding-right:10px;" data-v-576acc14><div class="grid-content" data-v-576acc14><div class="foot_title" data-v-576acc14>Follow Us</div> <div class="oaemedia-link" data-v-576acc14><ul data-v-576acc14><li data-v-576acc14><a href="https://www.linkedin.com/in/intelligence-robotics/" rel="nofollow" target="_blank" data-v-576acc14><span class="bk LinkedIn" data-v-576acc14><i class="iconfont icon-linkedin" data-v-576acc14></i></span>LinkedIn</a></li><li data-v-576acc14><a href="https://twitter.com/OAE_IR" rel="nofollow" target="_blank" data-v-576acc14><span class="bk Twitter" data-v-576acc14><i class="iconfont icon-tuite1" data-v-576acc14></i></span>Twitter</a></li> <!----> <div class="wx_code_and wx_code2" style="display:none;" data-v-576acc14><img src="" alt="" class="code" style="width:100px;" data-v-576acc14></div></ul></div></div></div></div> <div class="pad_box" data-v-576acc14><div class="foot_nav1" data-v-576acc14><div class="foot_title" data-v-576acc14>Navigation</div> <div data-v-351032c4 class="foot-con" data-v-576acc14><div data-v-576acc14><a href="/ir/contact_us" data-v-576acc14>Contact Us</a></div><div data-v-576acc14><a href="/ir/sitemap" data-v-576acc14>Sitemap</a></div></div></div> <div class="foot_box" data-v-576acc14><div class="foot_cont" data-v-576acc14><img src="https://i.oaes.cc/uploads/20230811/49f92f416c9845b58a01de02ecea785f.jpg" alt data-v-576acc14> <div class="fot_c_right" data-v-576acc14><h4 data-v-576acc14>Committee on Publication Ethics</h4> <a href="https://publicationethics.org/members/intelligence-robotics" data-v-576acc14>https://publicationethics.org/members/intelligence-robotics</a></div></div> <div class="foot_cont" 
data-v-576acc14><img src="https://i.oaes.cc/uploads/20230911/67d78ebf8c55485db6ae5b5b4bcda421.jpg" alt class="img2" style="padding: 5px;" data-v-576acc14> <div class="fot_c_right" data-v-576acc14><h4 data-v-576acc14>Portico</h4> <p data-v-576acc14>All published articles are preserved here permanently:</p> <a href="https://www.portico.org/publishers/oae/" data-v-576acc14>https://www.portico.org/publishers/oae/</a></div></div></div></div> <div class="foot_btn hidden-sm-and-down" data-v-576acc14><div class="foot_cont" data-v-576acc14><img src="https://i.oaes.cc/uploads/20230811/49f92f416c9845b58a01de02ecea785f.jpg" alt data-v-576acc14> <div class="fot_c_right" data-v-576acc14><h4 data-v-576acc14>Committee on Publication Ethics</h4> <a href="https://publicationethics.org/members/intelligence-robotics" data-v-576acc14>https://publicationethics.org/members/intelligence-robotics</a></div></div> <div class="foot_cont" data-v-576acc14><img src="https://i.oaes.cc/uploads/20230911/67d78ebf8c55485db6ae5b5b4bcda421.jpg" alt class="img2" style="padding: 5px;" data-v-576acc14> <div class="fot_c_right" data-v-576acc14><h4 data-v-576acc14>Portico</h4> <p data-v-576acc14>All published articles are preserved here permanently:</p> <a href="https://www.portico.org/publishers/oae/" data-v-576acc14>https://www.portico.org/publishers/oae/</a></div></div></div></div> <div class="footer_container" data-v-43de1bb7 data-v-576acc14><div class="footer_content container wrapper" data-v-43de1bb7><div class="el-row" style="margin-left:-10px;margin-right:-10px;" data-v-43de1bb7><div class="el-col el-col-24 el-col-xs-24 el-col-sm-8" style="padding-left:10px;padding-right:10px;" data-v-43de1bb7><div class="left_box" data-v-43de1bb7><a href="/" class="nuxt-link-active" style="display:inline-block;width:auto;border-bottom:none!important;" data-v-43de1bb7><img src="https://g.oaes.cc/oae/nuxt/img/footer_logo.4510953.png" alt="" style="width:123px;height:53px;margin-bottom:20px;margin-top:10px;" 
data-v-43de1bb7></a> <div class="small-box" data-v-43de1bb7><div data-v-43de1bb7><a href="mailto:partners@oaepublish.com" data-v-43de1bb7>partners@oaepublish.com</a></div> <div data-v-43de1bb7><a href="/about/who_we_are" data-v-43de1bb7>Company</a></div> <div data-v-43de1bb7><a href="/about/contact_us" data-v-43de1bb7>Contact Us</a></div></div></div></div> <div class="el-col el-col-24 el-col-xs-24 el-col-sm-8" style="padding-left:10px;padding-right:10px;" data-v-43de1bb7><div class="center_box" data-v-43de1bb7><div class="tit" data-v-43de1bb7>Discover Content</div> <div data-v-43de1bb7><div data-v-43de1bb7><a href="/alljournals" data-v-43de1bb7>Journals A-Z</a></div> <div data-v-43de1bb7><a href="/about/language_editing_services" data-v-43de1bb7>Language Editing</a></div> <div data-v-43de1bb7><a href="/about/layout_and_production" data-v-43de1bb7>Layout & Production</a></div> <div data-v-43de1bb7><a href="/about/graphic_abstracts" data-v-43de1bb7>Graphical Abstracts</a></div> <div data-v-43de1bb7><a href="/about/video_abstracts" data-v-43de1bb7>Video Abstracts</a></div> <div data-v-43de1bb7><a href="/about/expert_lecture" data-v-43de1bb7>Expert Lecture</a></div> <div data-v-43de1bb7><a href="/about/organizer" data-v-43de1bb7>Conference Organizer</a></div> <div data-v-43de1bb7><a href="/about/collaborators" data-v-43de1bb7>Strategic Collaborators</a></div> <div data-v-43de1bb7><a href="https://www.scierxiv.com/" target="_blank" data-v-43de1bb7>ScieRxiv.com</a></div> <div data-v-43de1bb7><a href="https://www.oaescience.com/" target="_blank" data-v-43de1bb7>Think Tank</a></div></div></div></div> <div class="el-col el-col-24 el-col-xs-24 el-col-sm-8" style="padding-left:10px;padding-right:10px;" data-v-43de1bb7><div class="right_box" data-v-43de1bb7><div class="tit" data-v-43de1bb7>Follow OAE</div> <div class="small-box" data-v-43de1bb7><div class="item_cla" data-v-43de1bb7><a href="https://twitter.com/OAE_Publishing" target="_blank" data-v-43de1bb7><span class="bk 
color1" data-v-43de1bb7><span class="iconfont icon-tuite1" data-v-43de1bb7></span></span> <span class="title" data-v-43de1bb7>Twitter</span></a></div> <div class="item_cla" data-v-43de1bb7><a href="https://www.facebook.com/profile.php?id=100018840961346" target="_blank" data-v-43de1bb7><span class="bk color2" data-v-43de1bb7><span class="iconfont icon-facebook" data-v-43de1bb7></span></span> <span class="title" data-v-43de1bb7>Facebook</span></a></div> <div class="item_cla" data-v-43de1bb7><a href="https://www.linkedin.com/company/oae-publishing-inc/mycompany/" target="_blank" data-v-43de1bb7><span class="bk color3" data-v-43de1bb7><span class="iconfont icon-linkedin" data-v-43de1bb7></span></span> <span class="title" data-v-43de1bb7>LinkedIn</span></a></div> <div class="item_cla" data-v-43de1bb7><a href="https://www.youtube.com/channel/UCjlAxBPaZErsc7qJ3fFYBLg" target="_blank" data-v-43de1bb7><span class="bk color4" data-v-43de1bb7><span class="iconfont icon-youtube" data-v-43de1bb7></span></span> <span class="title" data-v-43de1bb7>YouTube</span></a></div> <div class="item_cla" data-v-43de1bb7><a href="https://space.bilibili.com/1259987867" target="_blank" data-v-43de1bb7><span class="bk color5" data-v-43de1bb7><span class="iconfont icon-bilibili-line" style="color:#000000;" data-v-43de1bb7></span></span> <span class="title" data-v-43de1bb7>BiLiBiLi</span></a></div> <div class="oaemedia-link" data-v-43de1bb7><span data-v-43de1bb7><a href="javascript:void(0);" class="weixin-sign-and" data-v-43de1bb7><span class="bk wxbk" data-v-43de1bb7><i aria-hidden="true" class="iconfont icon-weixin" data-v-43de1bb7></i></span>WeChat</a></span> <div class="wx_code_and wx_code2" style="display:none;" data-v-43de1bb7><img src="https://g.oaes.cc/oae/nuxt/img/oaeshare.b5aa6a9.png" alt="" class="code" style="width:100px;" data-v-43de1bb7></div></div></div></div></div></div></div> <div class="backUp" data-v-43de1bb7><div class="go_top" data-v-43de1bb7><i class="iconfont icon-ai-top" 
data-v-43de1bb7></i></div></div> <div class="footer_text wrapper" data-v-43de1bb7><div class="mgb_10" data-v-43de1bb7> © 2016-2024 OAE Publishing Inc., except certain content provided by third parties </div> <div data-v-43de1bb7><a href="/abouts/privacy" data-v-43de1bb7>Privacy</a> <a href="/abouts/cookies" data-v-43de1bb7>Cookies</a> <a href="/abouts/terms_of_service" data-v-43de1bb7>Terms of Service</a></div></div> <!----></div></div> <div class="backUp" data-v-576acc14><div class="go_top" data-v-576acc14><i class="iconfont icon-tuite1" data-v-576acc14></i> <i class="iconfont icon-iconfonterweima" data-v-576acc14></i> <div class="wx_code" style="display:none;" data-v-576acc14><img src="https://i.oaes.cc/uploads/20230824/5249ddabb6d642558c9843fba9283219.png" alt class="code" style="width: 100px" data-v-576acc14></div> <i class="iconfont icon-ai-top" data-v-576acc14></i></div></div></div> <!----></div></div></div><script>window.__NUXT__=(function(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,_,$,aa,ab,ac,ad,ae,af,ag,ah,ai,aj,ak,al,am,an,ao,ap,aq,ar,as,at,au,av,aw,ax,ay,az,aA,aB,aC,aD,aE,aF,aG,aH,aI,aJ,aK,aL,aM,aN,aO,aP,aQ,aR,aS,aT,aU,aV,aW,aX,aY,aZ,a_,a$,ba,bb,bc,bd,be,bf,bg,bh,bi,bj,bk,bl,bm,bn,bo,bp,bq,br,bs,bt,bu,bv,bw,bx,by,bz,bA,bB,bC,bD,bE,bF,bG,bH,bI,bJ,bK,bL,bM,bN,bO,bP,bQ,bR,bS,bT,bU,bV,bW,bX,bY,bZ,b_,b$,ca,cb,cc,cd,ce,cf,cg,ch,ci,cj,ck,cl,cm,cn,co,cp,cq,cr,cs,ct,cu,cv,cw,cx,cy,cz,cA,cB,cC,cD,cE,cF,cG,cH,cI,cJ,cK,cL,cM,cN,cO,cP,cQ,cR,cS,cT,cU,cV,cW,cX,cY,cZ,c_,c$,da,db,dc,dd,de,df,dg,dh,di,dj,dk,dl,dm,dn,do0,dp,dq,dr,ds,dt,du,dv,dw,dx,dy,dz,dA,dB,dC,dD,dE,dF,dG,dH,dI,dJ,dK,dL,dM,dN,dO,dP,dQ,dR,dS,dT,dU,dV,dW,dX,dY,dZ,d_,d$,ea,eb,ec,ed,ee,ef,eg,eh,ei,ej,ek,el,em,en,eo,ep,eq,er,es,et,eu,ev,ew,ex,ey,ez,eA,eB,eC,eD,eE,eF,eG,eH,eI,eJ,eK,eL,eM,eN,eO,eP,eQ,eR,eS,eT,eU,eV,eW,eX,eY,eZ,e_,e$,fa,fb,fc,fd,fe,ff,fg,fh,fi,fj,fk,fl,fm,fn,fo,fp,fq,fr,fs,ft,fu,fv,fw,fx,fy,fz,fA,fB,fC,fD,fE,fF,fG,fH,fI,fJ,fK,fL,fM,fN,
fO,fP,fQ,fR,fS,fT,fU,fV,fW,fX,fY,fZ,f_,f$,ga,gb,gc,gd,ge,gf,gg,gh,gi,gj,gk,gl,gm,gn,go,gp,gq,gr,gs,gt,gu,gv,gw,gx,gy,gz,gA,gB,gC,gD,gE,gF,gG,gH,gI,gJ,gK,gL,gM,gN,gO,gP,gQ,gR,gS,gT,gU,gV,gW,gX,gY,gZ,g_,g$,ha,hb,hc,hd,he,hf,hg,hh,hi,hj,hk,hl,hm,hn,ho,hp,hq,hr,hs,ht,hu,hv,hw,hx,hy,hz,hA,hB,hC,hD,hE,hF,hG,hH,hI,hJ,hK,hL,hM,hN,hO,hP,hQ,hR,hS,hT,hU,hV,hW,hX,hY,hZ,h_,h$,ia,ib,ic,id,ie,if0,ig,ih,ii,ij,ik,il,im,in0,io,ip,iq,ir,is,it,iu,iv,iw,ix,iy,iz,iA,iB,iC,iD,iE,iF,iG,iH,iI,iJ,iK,iL,iM,iN,iO,iP,iQ,iR,iS,iT,iU,iV,iW,iX,iY,iZ,i_,i$,ja,jb,jc,jd,je,jf,jg,jh,ji,jj,jk,jl,jm,jn,jo,jp,jq,jr,js,jt,ju,jv,jw,jx,jy,jz,jA,jB,jC,jD,jE,jF,jG,jH,jI,jJ,jK,jL,jM,jN,jO,jP,jQ,jR,jS,jT,jU,jV,jW,jX,jY,jZ,j_,j$,ka,kb,kc,kd,ke,kf,kg,kh,ki,kj,kk,kl,km,kn,ko,kp,kq,kr,ks,kt,ku,kv,kw,kx,ky,kz,kA,kB,kC,kD,kE,kF,kG,kH,kI,kJ,kK,kL,kM,kN,kO,kP,kQ,kR,kS,kT,kU,kV,kW,kX,kY,kZ,k_,k$,la,lb,lc,ld,le,lf,lg,lh,li,lj,lk,ll,lm,ln,lo,lp,lq,lr,ls,lt,lu,lv,lw,lx,ly,lz,lA,lB,lC,lD,lE,lF,lG,lH,lI,lJ,lK,lL,lM,lN,lO,lP,lQ,lR,lS,lT,lU,lV,lW,lX,lY,lZ,l_){return {layout:"oaelayouta",data:[{ArtData:{date_published:"2021-10-12 00:00:00",section:bw,title:aD,doi:"10.20517\u002Fir.2021.02",abstract:cz,pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.pdf",xmlurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.xml",elocation_id:c,fpage:H,article_id:b,viewed:6062,downloaded:1496,video_url:cA,volume:d,year:cB,cited:cC,corresponding:"Correspondence Address: Dr. Lei Lei, School of Engineering, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada. 
E-mail: \u003Cemail\u003Eleil@uoguelph.ca\u003C\u002Femail\u003E",editor:[],editor_time:"\u003Cspan\u003E\u003Cb\u003EReceived:\u003C\u002Fb\u003E 24 Aug 2021 | \u003C\u002Fspan\u003E\u003Cspan\u003E\u003Cb\u003EFirst Decision:\u003C\u002Fb\u003E 14 Sep 2021 | \u003C\u002Fspan\u003E\u003Cspan\u003E\u003Cb\u003ERevised:\u003C\u002Fb\u003E 21 Sep 2021 | \u003C\u002Fspan\u003E\u003Cspan\u003E\u003Cb\u003EAccepted:\u003C\u002Fb\u003E 22 Sep 2021 | \u003C\u002Fspan\u003E\u003Cspan\u003E\u003Cb\u003EPublished:\u003C\u002Fb\u003E 12 Oct 2021\u003C\u002Fspan\u003E",cop_link:"https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002F",cop_info:"© The Author(s) 2021. \u003Cb\u003EOpen Access\u003C\u002Fb\u003E This article is licensed under a Creative Commons Attribution 4.0 International License (\u003Ca target=\"_blank\"href=\"https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002F\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\"\u003Ehttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002F\u003C\u002Fa\u003E), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.",keywords:["Federated Learning",cD,"Federated Reinforcement Learning"],issue:d,image:w,tag:" \u003Ci\u003EIntell Robot\u003C\u002Fi\u003E 2021;1(1):18-57.",authors:"Jiaju Qi, ... 
Kan Zheng",picurl:c,expicurl:c,picabstract:c,interview_pic:a,interview_url:c,review:a,cop_statement:"© The Author(s) 2021.",seo:{title:aD,keywords:a,description:cE},video_img:cF,lpage:57,author:[{base:"Jiaju Qi\u003Csup\u003E1\u003C\u002Fsup\u003E",email:a,orcid:a},{base:"Qihao Zhou\u003Csup\u003E2\u003C\u002Fsup\u003E",email:a,orcid:"https:\u002F\u002Forcid.org\u002F0000-0002-1839-1439"},{base:"Lei Lei\u003Csup\u003E1\u003C\u002Fsup\u003E",email:a,orcid:"https:\u002F\u002Forcid.org\u002F0000-0002-2577-6644"},{base:"Kan Zheng\u003Csup\u003E2\u003C\u002Fsup\u003E",email:a,orcid:a}],specialissue:a,specialinfo:a,date_published_stamp:1633968000,year1:cB,CitedImage:"https:\u002F\u002Fi.oaes.cc\u002Fimages_2018\u002Fjournals\u002FCrossref.png",article_editor:[],editoruser:"\u003Cspan\u003E\u003Cb\u003EAcademic Editor:\u003C\u002Fb\u003E Simon X. Yang | \u003C\u002Fspan\u003E\u003Cspan\u003E\u003Cb\u003ECopy Editor:\u003C\u002Fb\u003E Xi-Jun Chen | \u003C\u002Fspan\u003E\u003Cspan\u003E\u003Cb\u003EProduction Editor:\u003C\u002Fb\u003E Xi-Jun Chen\u003C\u002Fspan\u003E",commentsNums:g,oaestyle:cG,amastyle:cH,ctstyle:cI,acstyle:cJ,copyImage:"https:\u002F\u002Fi.oaes.cc\u002Fimages_2018\u002Fjournals\u002Fccb_4.png",affiliation:[{id:80873,article_id:b,Content:"\u003Csup\u003E1\u003C\u002Fsup\u003E\u003Cinstitution\u003ESchool of Engineering, University of Guelph\u003C\u002Finstitution\u003E, \u003Caddr-line\u003EGuelph, ON N1G 2W1, Canada\u003C\u002Faddr-line\u003E."},{id:80874,article_id:b,Content:"\u003Csup\u003E2\u003C\u002Fsup\u003E\u003Cinstitution\u003EIntelligent Computing and Communications (IC\u003Csup\u003E2\u003C\u002Fsup\u003E) Lab, Beijing University of Posts and Telecommunications\u003C\u002Finstitution\u003E, \u003Caddr-line\u003EBeijing 100876, 
China\u003C\u002Faddr-line\u003E."}],related:[{article_id:aE,journal_id:l,section_id:r,path:q,journal:n,ar_title:aF,date_published:cK,doi:aG,author:[{first_name:cL,middle_name:a,last_name:cM,ans:i,email:a,bio:a,photoUrl:a},{first_name:cN,middle_name:a,last_name:cO,ans:i,email:cP,bio:a,photoUrl:a},{first_name:cQ,middle_name:a,last_name:cR,ans:i,email:a,bio:a,photoUrl:a},{first_name:cS,middle_name:a,last_name:aH,ans:i,email:a,bio:a,photoUrl:a}]},{article_id:bx,journal_id:l,section_id:r,path:q,journal:n,ar_title:by,date_published:cT,doi:bz,author:[{first_name:cU,middle_name:a,last_name:A,ans:o,email:a,bio:a,photoUrl:a},{first_name:cV,middle_name:a,last_name:I,ans:o,email:a,bio:a,photoUrl:a},{first_name:cW,middle_name:a,last_name:J,ans:o,email:a,bio:a,photoUrl:a},{first_name:aI,middle_name:a,last_name:cX,ans:o,email:a,bio:a,photoUrl:a},{first_name:aH,middle_name:a,last_name:aJ,ans:o,email:a,bio:a,photoUrl:a},{first_name:K,middle_name:a,last_name:A,ans:o,email:a,bio:a,photoUrl:a},{first_name:cY,middle_name:a,last_name:aK,ans:cZ,email:c_,bio:a,photoUrl:a}]},{article_id:bA,journal_id:l,section_id:r,path:q,journal:n,ar_title:bB,date_published:c$,doi:bC,author:[{first_name:da,middle_name:a,last_name:db,ans:o,email:dc,bio:a,photoUrl:a},{first_name:dd,middle_name:a,last_name:J,ans:y,email:a,bio:a,photoUrl:a},{first_name:de,middle_name:a,last_name:I,ans:y,email:a,bio:a,photoUrl:a},{first_name:df,middle_name:a,last_name:dg,ans:o,email:a,bio:a,photoUrl:a}]},{article_id:bD,journal_id:l,section_id:r,path:q,journal:n,ar_title:bE,date_published:dh,doi:bF,author:[{first_name:di,middle_name:a,last_name:dj,ans:o,email:a,bio:a,photoUrl:a},{first_name:dk,middle_name:a,last_name:A,ans:y,email:a,bio:a,photoUrl:a},{first_name:dl,middle_name:a,last_name:dm,ans:y,email:dn,bio:a,photoUrl:a},{first_name:do0,middle_name:a,last_name:aJ,ans:dp,email:a,bio:a,photoUrl:a}]},{article_id:bG,journal_id:l,section_id:r,path:q,journal:n,ar_title:bH,date_published:dq,doi:bI,author:[{first_name:dr,middle_name
:a,last_name:aI,ans:i,email:a,bio:a,photoUrl:a},{first_name:ds,middle_name:a,last_name:dt,ans:i,email:du,bio:a,photoUrl:a}]},{article_id:bJ,journal_id:l,section_id:L,path:q,journal:n,ar_title:bK,date_published:dv,doi:bL,author:[{first_name:dw,middle_name:a,last_name:aK,ans:i,email:a,bio:a,photoUrl:a},{first_name:A,middle_name:a,last_name:dx,ans:i,email:a,bio:a,photoUrl:a},{first_name:dy,middle_name:a,last_name:J,ans:i,email:dz,bio:a,photoUrl:a}]},{article_id:bM,journal_id:l,section_id:r,path:q,journal:n,ar_title:bN,date_published:dA,doi:bO,author:[{first_name:dB,middle_name:a,last_name:dC,ans:i,email:a,bio:a,photoUrl:a},{first_name:K,middle_name:a,last_name:K,ans:i,email:a,bio:a,photoUrl:a},{first_name:dD,middle_name:a,last_name:I,ans:i,email:a,bio:a,photoUrl:a}]},{article_id:bP,journal_id:l,section_id:L,path:q,journal:n,ar_title:bQ,date_published:dE,doi:bR,author:[{first_name:dF,middle_name:a,last_name:dG,ans:o,email:a,bio:a,photoUrl:a},{first_name:dH,middle_name:a,last_name:dI,ans:o,email:a,bio:a,photoUrl:a},{first_name:dJ,middle_name:a,last_name:dK,ans:y,email:a,bio:a,photoUrl:a}]}],down:"https:\u002F\u002Ff.oaes.cc\u002Fris\u002F4325.ris",xml:{id:2025,article_id:b,xml_down:M,cite_click:bS,export_click:bT},zan:W,cited_type:"cited",subarray:[],issn:bU,uuid:"a5b57b973fee94cb191c55343b14f6fe",abstractUuid:"efc77e3e87613bd6534f879221f2f549",apiurl:a,api_abstract_url:a,journal_id:aL,journal_path:q},loadingAbs:void 0,loading:X,ArtDataC:{content:"\u003Cdiv id=\"sec1-1\" class=\"article-Section\"\u003E\u003Ch2 \u003E1. Introduction\u003C\u002Fh2\u003E\u003Cp class=\"\"\u003EAs machine learning (ML) develops, it becomes capable of solving increasingly complex problems, such as image recognition, speech recognition, and semantic understanding. Despite the effectiveness of traditional machine learning algorithms in several areas, the researchers found that scenes involving many parties are still difficult to resolve, especially when privacy is concerned. 
In these cases, federated learning (FL) has attracted increasing interest among ML researchers. Technically, FL is a decentralized collaborative approach that allows multiple partners to train on their own data and build a shared model while maintaining privacy. With its innovative learning architecture and concepts, FL provides safer experience exchange and enhances the capabilities of ML in distributed scenarios.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn ML, reinforcement learning (RL) is the branch that focuses on how individuals, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, agents, interact with their environment to maximize their cumulative reward. The process allows agents to improve their behavior by trial and error. Following a policy, they take actions to explore the environment and receive rewards in return. RL has been an active research area in recent years and has shown great potential in various applications, including games, robotics, and communication.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EHowever, there are still many problems in applying RL to practical scenarios. For example, when the action space and state space are large, the performance of agents is sensitive to the collected samples, since it is nearly impossible to explore the entire sampling space. In addition, many RL algorithms learn slowly because of low sample efficiency. Exchanging information between agents can therefore greatly accelerate learning. 
Although distributed RL and parallel RL algorithms\u003Csup\u003E[\u003Ca href=\"#B1\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B1\"\u003E1\u003C\u002Fa\u003E-\u003Ca href=\"#B3\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B3\"\u003E3\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E can be used to address the above problems, they usually need to collect all the data, parameters, or gradients from each agent in a central server for model training. However, some tasks need to prevent agent information leakage and protect agent privacy during the application of RL. Agents’ distrust of the central server and the risk of eavesdropping on the transmission of raw data have become a major bottleneck for such RL applications. FL can not only complete information exchange while avoiding privacy disclosure, but also adapt various agents to their different environments. Another problem of RL is how to bridge the simulation-reality gap. Many RL algorithms require pre-training in simulated environments as a prerequisite for deployment, but the simulated environments cannot accurately reflect the real world. FL can aggregate information from both environments and thus bridge the gap between them. Finally, in some cases, each agent in RL can observe only partial features. These features, whether observations or rewards, do not provide enough information to make decisions. In this case, FL makes it possible to integrate this information through aggregation.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThus, the above challenges give rise to the idea of federated reinforcement learning (FRL). As FRL can be considered as an integration of FL and RL under privacy protection, several elements of RL can be presented in FL frameworks to deal with sequential decision-making tasks. 
For example, the three dimensions of sample, feature, and label in FL can be replaced by environment, state, and action, respectively, in FRL. Since FL can be divided into several categories according to the distribution characteristics of data, including horizontal federated learning (HFL) and vertical federated learning (VFL), we can similarly categorize FRL algorithms into horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL).\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThough a few survey papers on FL\u003Csup\u003E[\u003Ca href=\"#B4\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B4\"\u003E4\u003C\u002Fa\u003E-\u003Ca href=\"#B6\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B6\"\u003E6\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E have been published, to the best of our knowledge, there are currently no survey papers focused on FRL. Because FRL is a relatively new technique, many researchers may be unfamiliar with it. We hope to identify achievements from current studies and serve as a stepping stone to further research. In summary, this paper sheds light on the following aspects.\u003C\u002Fp\u003E\u003Cul class=\"tipsDecimal\"\u003E\u003Cli\u003E\u003Cp\u003E\u003Ci\u003ESystematic tutorial on FRL methodology.\u003C\u002Fi\u003E As a review focusing on FRL, this paper tries to explain FRL to researchers systematically and in detail. The definition and categories of FRL are introduced first, including system model, algorithm process, \u003Ci\u003Eetc.\u003C\u002Fi\u003E To explain the frameworks of HFRL and VFRL and the difference between them clearly, two specific cases are introduced, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, autonomous driving and smart grid. 
Moreover, we comprehensively introduce the existing research on FRL’s algorithm design.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003E\u003Ci\u003EComprehensive summary for FRL applications.\u003C\u002Fi\u003E This paper collects a large number of references in the field of FRL, and provides a comprehensive and detailed investigation of the FRL applications in various areas, including edge computing, communications, control optimization, attack detection, and some other applications. For each reference, we discuss the authors’ research ideas and methods, and summarize how the researchers combine the FRL algorithm with the specific practical problems.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003E\u003Ci\u003EOpen issues for future research.\u003C\u002Fi\u003E This paper identifies several open issues for FRL as a guide for further research. The scope covers communication, privacy and security, join and exit mechanisms design, learning convergence and some other issues. We hope that they can broaden the thinking of interested researchers and provide help for further research on FRL.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EThe organization of this paper is as follows. To quickly gain a comprehensive understanding of FRL, the paper starts with FL and RL in \u003Ca href=\"#sec1-2\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"sec1-2\"\u003ESection 2\u003C\u002Fa\u003E and \u003Ca href=\"#sec1-3\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"sec1-3\"\u003ESection 3\u003C\u002Fa\u003E, respectively, and extends the discussion further to FRL in \u003Ca href=\"#sec1-4\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"sec1-4\"\u003ESection 4\u003C\u002Fa\u003E. 
The existing applications of FRL are summarized in \u003Ca href=\"#sec1-5\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"sec1-5\"\u003ESection 5\u003C\u002Fa\u003E. In addition, a few open issues and future research directions for FRL are highlighted in \u003Ca href=\"#sec1-6\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"sec1-6\"\u003ESection 6\u003C\u002Fa\u003E. Finally, the conclusion is given in \u003Ca href=\"#sec1-7\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"sec1-7\"\u003ESection 7\u003C\u002Fa\u003E.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec1-2\" class=\"article-Section\"\u003E\u003Ch2 \u003E2. Federated learning\u003C\u002Fh2\u003E\u003Cdiv id=\"sec2-1\" class=\"article-Section\"\u003E\u003Ch3 \u003E2.1. Federated learning definition and basics\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn general, FL is a ML algorithmic framework that allows multiple parties to perform ML under the requirements of privacy protection, data security, and regulations\u003Csup\u003E[\u003Ca href=\"#B7\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B7\"\u003E7\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. In FL architecture, model construction includes two processes: model training and model inference. It is possible to exchange information about the model between parties during training, but not the data itself, so that data privacy will not be compromised in any way. An individual party or multiple parties can possess and maintain the trained model. In the process of model aggregation, more data instances collected from various parties contribute to updating the model. As the last step, a fair value-distribution mechanism should be used to share the profits obtained by the collaborative model\u003Csup\u003E[\u003Ca href=\"#B8\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B8\"\u003E8\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
A well-designed mechanism keeps the federation sustainable. Aiming to build a joint ML model without sharing local data, FL involves technologies from different research fields such as distributed systems, information communication, ML and cryptography\u003Csup\u003E[\u003Ca href=\"#B9\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B9\"\u003E9\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. FL has the following characteristics as a result of these techniques, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EDistribution. There are two or more parties that hope to jointly build a model to tackle similar tasks. Each party holds independent data and would like to use it for model training.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EData protection. The data held by each party does not need to be sent to the others during the training of the model. The learned profits or experiences are conveyed through model parameters that do not expose private data.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003ESecure communication. The model can be transmitted between parties with the support of an encryption scheme. The original data cannot be inferred even if the transmission is eavesdropped on.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EGenerality. It is possible to apply FL to different data structures and institutions without regard to domains or algorithms.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EGuaranteed performance. The performance of the resulting model is very close to that of the ideal model established with all data transferred to one centralized party.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStatus equality. To ensure the fairness of cooperation, all participating parties are on an equal footing. 
The shared model can be used by each party to improve its local models when needed.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EA formal definition of FL is presented as follows. Consider \u003Ci\u003EN\u003C\u002Fi\u003E parties \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\left\\{\\mathcal{F}_i\\right\\}_{i=1}^N $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E interested in establishing and training a cooperative ML model. Each party has its own dataset \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. Traditional ML approaches collect all data \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\left\\{\\mathcal{D}_i\\right\\}_{i=1}^N $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E together to form a centralized dataset \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathbb{R} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E at one data server. The expected model \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{M}_{S U M} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E is trained by using the dataset \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathbb{R} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. In contrast, FL reforms the ML process so that the participants \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E with data \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E jointly train a target model \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{M}_{F E D} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E without aggregating their data. 
The data \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E is stored by its owner \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and not exposed to others. In addition, the performance measure of the federated model \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{M}_{F E D} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E, such as accuracy, recall, or F1-score, is denoted as \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{V}_{F E D} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. It should be a good approximation of the performance of the expected model \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{M}_{S U M} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{V}_{S U M} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. To quantify the difference in performance, let \u003Ci\u003Eδ\u003C\u002Fi\u003E be a non-negative real number; we say that the federated learning model \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{M}_{F E D} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E has \u003Ci\u003Eδ\u003C\u002Fi\u003E performance loss if\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\left |\\mathcal{V}_{SUM} - \\mathcal{V}_{FED}\\right |< \\delta. 
$$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003ESpecifically, the FL model held by each party is basically the same as a conventional ML model, and it also includes a set of parameters \u003Ci\u003Ew\u003Csub\u003Ei\u003C\u002Fsub\u003E\u003C\u002Fi\u003E which is learned based on the respective training dataset \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E\u003Csup\u003E[\u003Ca href=\"#B10\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B10\"\u003E10\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. A training sample \u003Ci\u003Ej\u003C\u002Fi\u003E typically contains both the input of the FL model and the expected output. For example, in the case of image recognition, the input is the pixels of the image, and the expected output is the correct label. The learning process is facilitated by defining a loss function on the parameter vector \u003Ci\u003Ew\u003C\u002Fi\u003E for every data sample \u003Ci\u003Ej\u003C\u002Fi\u003E. The loss function represents the error of the model in relation to the training data. 
For each dataset \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E at party \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E, the loss function on the collection of training samples can be defined as follows\u003Csup\u003E[\u003Ca href=\"#B11\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B11\"\u003E11\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ F_{i}(w)=\\frac{1}{\\left|\\mathcal{D}_{i}\\right|} \\sum_{j \\in \\mathcal{D}_{i}} f_{j}(w), $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Ci\u003Ef\u003Csub\u003Ej\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Ew\u003C\u002Fi\u003E) denotes the loss function of the sample \u003Ci\u003Ej\u003C\u002Fi\u003E with the given model parameter vector \u003Ci\u003Ew\u003C\u002Fi\u003E and |·| represents the size of the set. In FL, it is important to define the global loss function since multiple parties are jointly training a global statistical model without sharing a dataset. The common global loss function on all the distributed datasets is given by\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ F_g(w)=\\sum_{i=1}^{N} \\eta _{i}F_{i}(w), $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Ci\u003Eη\u003Csub\u003Ei\u003C\u002Fsub\u003E\u003C\u002Fi\u003E indicates the relative impact of each party on the global model. 
In addition, \u003Ci\u003Eη\u003Csub\u003Ei\u003C\u002Fsub\u003E\u003C\u002Fi\u003E > 0 and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\sum_{i=1}^{N} \\eta_{i}=1 $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. The weights \u003Ci\u003Eη\u003Csub\u003Ei\u003C\u002Fsub\u003E\u003C\u002Fi\u003E can be flexibly defined to improve training efficiency. The natural setting is averaging between parties, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, \u003Ci\u003Eη\u003Csub\u003Ei\u003C\u002Fsub\u003E\u003C\u002Fi\u003E = 1\u002F\u003Ci\u003EN\u003C\u002Fi\u003E. The goal of the learning problem is to find the optimal parameter that minimizes the global loss function \u003Ci\u003EF\u003Csub\u003Eg\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Ew\u003C\u002Fi\u003E). Formally,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ w^{*}=\\underset{w}{\\arg \\min } F_{g}(w). $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EBecause FL is designed to adapt to various scenarios, the appropriate objective function depends on the application. However, a closed-form solution is almost impossible to find for most FL models due to their inherent complexity. The canonical federated averaging algorithm (FedAvg), which is based on gradient descent, was presented by McMahan \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B12\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B12\"\u003E12\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E and is widely used in FL systems. In general, the coordinator has the initial FL model and is responsible for aggregation. Distributed participants know the optimizer settings and can upload information that does not affect privacy. The specific architecture of FL will be discussed in the next subsection. 
Each participant uses its local data to perform one step (or multiple steps) of gradient descent on the current global model parameter \u003Ci\u003Ew̄\u003Csub\u003Eg\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Et\u003C\u002Fi\u003E) according to the following formula,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\forall i,\\; w_i(t+1)=\\bar{w}_g(t)-\\gamma \\nabla F_i(\\bar{w}_g(t)), $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Ci\u003Eγ\u003C\u002Fi\u003E denotes a fixed learning rate for each gradient descent step. After receiving the local parameters from participants, the central coordinator updates the global model using a weighted average, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\bar{w}_g(t+1)=\\sum_{i=1}^{N} \\frac{n_i}{n} w_i(t+1), $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Ci\u003En\u003Csub\u003Ei\u003C\u002Fsub\u003E\u003C\u002Fi\u003E indicates the number of training data samples that the \u003Ci\u003Ei\u003C\u002Fi\u003E-th participant has and \u003Ci\u003En\u003C\u002Fi\u003E denotes the total number of samples contained in all the datasets. Finally, the coordinator sends the aggregated model weights \u003Ci\u003Ew̄\u003Csub\u003Eg\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Et\u003C\u002Fi\u003E + 1) back to the participants. The aggregation process is performed at a predetermined interval or iteration round. Additionally, FL leverages privacy-preserving techniques to prevent the leakage of gradients or model weights. 
For example, existing encryption algorithms can be added on top of the original FedAvg to provide secure FL\u003Csup\u003E[\u003Ca href=\"#B13\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B13\"\u003E13\u003C\u002Fa\u003E,\u003Ca href=\"#B14\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B14\"\u003E14\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-2\" class=\"article-Section\"\u003E\u003Ch3 \u003E2.2. Architecture of federated learning\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EAccording to the application characteristics, the architecture of FL can be divided into two types\u003Csup\u003E[\u003Ca href=\"#B7\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B7\"\u003E7\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the client-server model and the peer-to-peer model.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EAs shown in \u003Ca href=\"#fig1\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig1\"\u003EFigure 1\u003C\u002Fa\u003E, there are two major components in the client-server model, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the participants and the coordinator. The participants are the data owners and can perform local model training and updates. In different scenarios, the participants are different kinds of devices, such as the vehicles in the Internet of Vehicles (IoV) or the smart devices in the Internet of Things (IoT). In addition, participants usually possess at least two characteristics. Firstly, each participant has a certain level of hardware performance, including computation power, communication and storage. The hardware capabilities ensure that the FL algorithm operates normally. Secondly, participants are independent of one another and located in a wide geographic area. 
In the client-server model, the coordinator can be considered as a central aggregation server, which can initialize a model and aggregate model updates from participants\u003Csup\u003E[\u003Ca href=\"#B12\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B12\"\u003E12\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Because participants train concurrently on their local datasets and share their experience through the coordinator’s model aggregation mechanism, both the efficiency of training and the performance of the model are greatly improved. However, since participants cannot communicate directly, the coordinator must reliably aggregate the global model and maintain communication with all participants. The architecture therefore faces challenges such as a single point of failure. If the coordinator fails to complete the model aggregation task, the participants’ local models will have difficulty reaching the target performance. The basic workflow of the client-server model can be summarized in the following five steps. Steps 2 to 5 are repeated until the model converges or the maximum number of iterations is reached.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig1\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig1\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.1.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 1. 
An example of federated learning architecture: Client-Server Model.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EStep 1: In the process of setting up a client-server-based learning system, the coordinator creates an initial model and sends it to each participant. Participants who join later can access the latest global model.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 2: Each participant trains a local model based on its respective dataset.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 3: Updates of model parameters are sent to the central coordinator.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 4: The coordinator combines the model updates using specific aggregation algorithms.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 5: The combined model is sent back to the corresponding participant.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EThe peer-to-peer based FL architecture does not require a coordinator, as illustrated in \u003Ca href=\"#fig2\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig2\"\u003EFigure 2\u003C\u002Fa\u003E. Participants can directly communicate with each other without relying on a third party. Therefore, each participant in the architecture is equal and can initiate a model exchange request with anyone else. As there is no central server, participants must agree in advance on the order in which models should be sent and received. Common transfer modes are cyclic transfer and random transfer. The peer-to-peer model is suitable and important for specific scenarios. For example, multiple banks jointly develop an ML-based attack detection model. With FL, there is no need to establish a central authority between banks to manage and store all attack patterns. 
The attack record is only held at the server of the attacked bank, but the detection experience can be shared with other participants through model parameters. The FL procedure of the peer-to-peer model is simpler than that of the client-server model.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig2\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig2\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.2.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 2. An example of federated learning architecture: Peer-to-Peer Model.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EStep 1: Each participant initializes their local model depending on its needs.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 2: Train the local model based on the respective dataset.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 3: Create a model exchange request to other participants and send local model parameters.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 4: Aggregate the model received from other participants into the local model.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EThe termination conditions of the process can be designed by participants according to their needs. This architecture further guarantees security since there is no centralized server. 
However, it requires more communication resources and potentially increased computation for more messages.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-3\" class=\"article-Section\"\u003E\u003Ch3 \u003E2.3. Categories of federated learning\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EBased on the way data is partitioned within a feature and sample space, FL may be classified as HFL, VFL, or federated transfer learning (FTL)\u003Csup\u003E[\u003Ca href=\"#B8\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B8\"\u003E8\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. In \u003Ca href=\"#fig3\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig3\"\u003EFigure 3\u003C\u002Fa\u003E, \u003Ca href=\"#fig4\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig4\"\u003EFigure 4\u003C\u002Fa\u003E, and \u003Ca href=\"#fig5\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig5\"\u003EFigure 5\u003C\u002Fa\u003E, these three federated learning categories for a two-party scenario are illustrated. In order to define each category more clearly, some parameters are formalized. We suppose that the \u003Ci\u003Ei\u003C\u002Fi\u003E-th participant has its own dataset \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. The dataset includes three types of data, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the feature space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{X}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E, the label space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{Y}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and the sample ID space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{I}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. 
In particular, the feature space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{X}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E is a high-dimensional abstraction of the variables within each pattern sample. Various features are used to characterize data held by the participant. All categories of association between input and task target are collected in the label space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{Y}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. The sample ID space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{I}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E is added in consideration of actual application requirements. The identification can facilitate the discovery of possible connections among different features of the same individual.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig3\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig3\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.3.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 3. 
Illustration of horizontal federated learning.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"Figure-block\" id=\"fig4\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig4\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.4.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 4. Illustration of vertical federated learning.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"Figure-block\" id=\"fig5\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig5\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.5.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 5. Illustration of federated transfer learning.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cp class=\"\"\u003EHFL indicates the case in which participants hold datasets with a small sample overlap, while most of the data features are aligned. The word “horizontal” is derived from the term “horizontal partition”. 
This is similar to the situation where data is horizontally partitioned inside the traditional tabular view of a database. As shown in \u003Ca href=\"#fig3\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig3\"\u003EFigure 3\u003C\u002Fa\u003E, the training data of two participants with the aligned features is horizontally partitioned for HFL. A cuboid with a red border represents the training data required in learning. In particular, a row corresponds to complete data features collected from a sampling ID. Columns correspond to different sampling IDs. The overlapping part means there can be more than one participant sampling the same ID. In addition, HFL is also known as feature-aligned FL, sample-partitioned FL, or example-partitioned FL. Formally, the conditions for HFL can be summarized as\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\mathcal{X}_i=\\mathcal{X}_j, \\mathcal{Y}_i=\\mathcal{Y}_j, \\mathcal{I}_i\\not=\\mathcal{I}_j, \\forall\\mathcal{D}_i,\\mathcal{D}_j,i\\not=j, $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E denote the datasets of participant \u003Ci\u003Ei\u003C\u002Fi\u003E and participant \u003Ci\u003Ej\u003C\u002Fi\u003E respectively. 
In both datasets, the feature space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{X} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and label space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{Y} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E are assumed to be the same, but the sampling ID space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{I} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E is assumed to be different. The objective of HFL is to increase the amount of data with similar features, while keeping the original data from being transmitted, thus improving the performance of the training model. Participants can still perform feature extraction and classification if new samples appear. HFL can be applied in various fields because it benefits from privacy protection and experience sharing\u003Csup\u003E[\u003Ca href=\"#B15\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B15\"\u003E15\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. For example, regional hospitals may receive different patients, and the clinical manifestations of patients with the same disease are similar. It is imperative to protect the patient’s privacy, so data about patients cannot be shared. HFL provides a good way to jointly build a ML model for identifying diseases between hospitals.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EVFL refers to the case where different participants with various targets usually have datasets that have different feature spaces, but those participants may serve a large number of common users. The heterogeneous feature spaces of distributed datasets can be used to build more general and accurate models without releasing the private data. The word “vertical” derives from the term “vertical partition”, which is also widely used in reference to the traditional tabular view. 
Different from HFL, the training data of each participant are divided vertically. \u003Ca href=\"#fig4\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig4\"\u003EFigure 4\u003C\u002Fa\u003E shows an example of VFL in a two-party scenario. The important step in VFL is to align samples, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, determine which samples are common to the participants. Although the features of the data are different, the sampled identity can be verified with the same ID. Therefore, VFL is also called sample-aligned FL or feature-partitioned FL. Multiple features are vertically divided into one or more columns. The common samples exposed to different participants can be marked by different labels. The formal definition of VFL’s applicable scenario is given.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\mathcal{X}_i\\not=\\mathcal{X}_j, \\mathcal{Y}_i\\not=\\mathcal{Y}_j, \\mathcal{I}_i=\\mathcal{I}_j, \\forall\\mathcal{D}_i,\\mathcal{D}_j,i\\not=j, $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E represent the dataset held by different participants, and the data feature space pair \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ (\\mathcal{X}_i,\\mathcal{X}_j) $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and label space pair \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ (\\mathcal{Y}_i,\\mathcal{Y}_j) $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E are assumed to be different. 
The sample ID spaces \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{I}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{I}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E are assumed to be the same. The objective of VFL is to collaboratively build a shared ML model by exploiting all features collected by each participant. The fusion and analysis of existing features can even infer new features. An example of the application of VFL is the evaluation of trust. Banks and e-commerce companies can create an ML model for evaluating the trustworthiness of users. The credit card record held at the bank and the purchasing history held at the e-commerce company for the same set of users can be used as training data to improve the evaluation model.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EFTL applies to a more general case where the datasets of participants are not aligned with each other in terms of samples or features. FTL involves finding the invariant between a resource-rich source domain and a resource-scarce target domain, and exploiting that invariant to transfer knowledge. In comparison with traditional transfer learning\u003Csup\u003E[\u003Ca href=\"#B6\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B6\"\u003E6\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, FTL focuses on privacy-preserving issues and addresses distributed challenges. An example of FTL is shown in \u003Ca href=\"#fig5\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig5\"\u003EFigure 5\u003C\u002Fa\u003E. The training data required by FTL may include all data owned by multiple parties for comprehensive information extraction. In order to predict labels for unlabeled new samples, a prediction model is built using additional feature representations for mixed samples from participants A and B. 
More formally, FTL is applicable for the following scenarios:\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\mathcal{X}_i\\not=\\mathcal{X}_j, \\mathcal{Y}_i\\not=\\mathcal{Y}_j, \\mathcal{I}_i\\not=\\mathcal{I}_j, \\forall\\mathcal{D}_i,\\mathcal{D}_j,i\\not=j, $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn datasets \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{D}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E, there is no duplication or similarity in terms of features, labels and samples. The objective of FTL is to generate as accurate a label prediction as possible for newly incoming samples or unlabeled samples already present. Another benefit of FTL is that it is capable of overcoming the absence of data or labels. For example, a bank and an e-commerce company in two different countries want to build a shared ML model for user risk assessment. In light of geographical restrictions, the user groups of these two organizations have limited overlap. Due to the fact that businesses are different, only a small number of data features are the same. It is important in this case to introduce FTL to solve the problem of small unilateral data and fewer sample labels, and improve the model performance.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec1-3\" class=\"article-Section\"\u003E\u003Ch2 \u003E3. Reinforcement learning\u003C\u002Fh2\u003E\u003Cdiv id=\"sec2-4\" class=\"article-Section\"\u003E\u003Ch3 \u003E3.1. 
Reinforcement learning definition and basics\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EGenerally, the field of ML includes supervised learning, unsupervised learning, RL, \u003Ci\u003Eetc\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B17\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B17\"\u003E17\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. While supervised and unsupervised learning attempt to make the agent copy the data set, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, learning from the pre-provided samples, RL makes the agent gradually stronger through interaction with the environment, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, generating samples to learn from by itself\u003Csup\u003E[\u003Ca href=\"#B18\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B18\"\u003E18\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. RL has been an active research direction in the field of ML in recent years and has made great progress in many applications, such as IoT\u003Csup\u003E[\u003Ca href=\"#B19\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B19\"\u003E19\u003C\u002Fa\u003E-\u003Ca href=\"#B22\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B22\"\u003E22\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, autonomous driving\u003Csup\u003E[\u003Ca href=\"#B23\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B23\"\u003E23\u003C\u002Fa\u003E,\u003Ca href=\"#B24\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B24\"\u003E24\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, and game design\u003Csup\u003E[\u003Ca href=\"#B25\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B25\"\u003E25\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. For example, the AlphaGo program developed by DeepMind reflects the thinking of RL\u003Csup\u003E[\u003Ca href=\"#B26\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B26\"\u003E26\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
The agent gradually accumulates the intelligent judgment on the sub-environment of each move by playing game after game with different opponents, so as to continuously improve its level.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe RL problem can be defined as a model of the agent-environment interaction, which is represented in \u003Ca href=\"#fig6\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig6\"\u003EFigure 6\u003C\u002Fa\u003E. The basic model of RL contains several important concepts, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig6\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig6\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.6.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 6. The agent-environment interaction of the basic reinforcement learning model.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EEnvironment and agent: Agents are a part of an RL model that exists in an external environment, such as the player in the environment of chess. Agents can improve their behavior by interacting with the environment. Specifically, they take a series of actions on the environment through a set of policies and expect to get a high payoff or achieve a certain goal.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003ETime step: The whole process of RL can be discretized into different time steps. 
At every time step, the environment and the agent interact accordingly.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EState: The state reflects agents’ observations of the environment. When agents take an action, the state will also change. In other words, the environment will move to the next state.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EActions: Agents can assess the environment, make decisions and finally take certain actions. These actions are imposed on the environment.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EReward: After receiving the action of the agent, the environment will give the agent the state of the current environment and the reward due to the previous action. Reward represents an assessment of the action taken by agents.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EMore formally, we assume that there are a series of time steps \u003Ci\u003Et =\u003C\u002Fi\u003E 0,1,2,\u003Ci\u003E…\u003C\u002Fi\u003E in a basic RL model. At a certain time step \u003Ci\u003Et\u003C\u002Fi\u003E, the agent will receive a state signal \u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E of the environment. In each step, the agent will select one of the actions allowed by the state to take an action \u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E.\u003C\u002Fi\u003E After the environment receives the action signal \u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E, the environment will feed back to the agent the corresponding state signal \u003Ci\u003ES\u003Csub\u003Et+\u003C\u002Fsub\u003E\u003C\u002Fi\u003E\u003Csub\u003E1\u003C\u002Fsub\u003E at the next step \u003Ci\u003Et\u003C\u002Fi\u003E + 1 and the immediate reward \u003Ci\u003ER\u003Csub\u003Et+\u003C\u002Fsub\u003E\u003C\u002Fi\u003E\u003Csub\u003E1\u003C\u002Fsub\u003E. 
The set of all possible states, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the state space, is denoted as \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{S} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. Similarly, the action space is denoted as \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{A} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. Since our goal is to maximize the total reward, we can quantify this total reward, usually referred to as the return, with\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ G_t=R_{t+1}+R_{t+2}+\\dots +R_T, $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Ci\u003ET\u003C\u002Fi\u003E is the last step, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, \u003Ci\u003ES\u003Csub\u003ET\u003C\u002Fsub\u003E\u003C\u002Fi\u003E is the termination state. An episode is completed when the agent completes the termination action.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn addition to this type of episodic task, there is another type of task that does not have a termination state; in other words, it can in principle run forever. This type of task is called a continuing task. For continuing tasks, since there is no termination state, the above definition of return may diverge. 
Thus, another way to calculate return is introduced, which is called discounted return, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ G_t = R_{t+1}+\\gamma R_{t+2}+\\gamma ^{2} R_{t+3}+\\dots=\\sum_{k=0}^{\\infty}\\gamma ^{k}R_{t+k+1}, $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere the discount factor \u003Ci\u003Eγ\u003C\u002Fi\u003E satisfies 0 ≤ \u003Ci\u003Eγ ≤\u003C\u002Fi\u003E 1. When \u003Ci\u003Eγ\u003C\u002Fi\u003E = 1, the agent can obtain the full value of all future steps, while when \u003Ci\u003Eγ\u003C\u002Fi\u003E = 0, the agent can only see the current reward. As \u003Ci\u003Eγ\u003C\u002Fi\u003E changes from 0 to 1, the agent gradually becomes more forward-looking, considering not only the immediate reward but also future rewards.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe value function is the agent’s prediction of future rewards, which is used to evaluate the quality of the state and select actions. The difference between the value function and rewards is that the latter is an immediate evaluation of a single interaction, while the former is the expected return of actions over a long period of time. In other words, the value function of the current state \u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E = s\u003C\u002Fi\u003E is its long-term expected return. There are two significant value functions in the field of RL, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, state value function \u003Ci\u003EV\u003Csub\u003Eπ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Es\u003C\u002Fi\u003E) and action value function \u003Ci\u003EQ\u003Csub\u003Eπ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Es, a\u003C\u002Fi\u003E). 
The function \u003Ci\u003EV\u003Csub\u003Eπ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Es\u003C\u002Fi\u003E) represents the expected return obtained if the agent continues to follow strategy \u003Ci\u003Eπ\u003C\u002Fi\u003E all the time after reaching a certain state \u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E = s\u003C\u002Fi\u003E, while the function \u003Ci\u003EQ\u003Csub\u003Eπ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Es, a\u003C\u002Fi\u003E) represents the expected return obtained if action \u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E = a\u003C\u002Fi\u003E is taken after reaching the current state \u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E = s\u003C\u002Fi\u003E and the following actions are taken according to the strategy \u003Ci\u003Eπ\u003C\u002Fi\u003E. The two functions are specifically defined as follows, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ V_\\pi (s)=\\mathbb{E}_\\pi[G_t|S_t=s],\\forall s\\in \\mathcal{S} $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ Q_\\pi(s,a)=\\mathbb{E}_\\pi[G_t|S_t=s,A_t=a],\\forall s\\in \\mathcal{S},a\\in\\mathcal{A} $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe results of RL are action decisions, called the policy. 
The policy gives agents the action \u003Ci\u003Ea\u003C\u002Fi\u003E that should be taken for each state \u003Ci\u003Es.\u003C\u002Fi\u003E It is denoted as \u003Ci\u003Eπ\u003C\u002Fi\u003E (\u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E = a|S\u003Csub\u003Et\u003C\u002Fsub\u003E = s\u003C\u002Fi\u003E), which represents the probability of taking action \u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E = a\u003C\u002Fi\u003E in state \u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E = s.\u003C\u002Fi\u003E The goal of RL is to learn the optimal policy that can maximize the value function by interacting with the environment. Our purpose is not to get the maximum reward after a single action in the short term, but to get more reward in the long term. Therefore, the optimal policy can be expressed as\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\pi^*=\\underset{\\pi}{\\arg\\max }V_\\pi(s),\\forall s\\in\\mathcal{S}. $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-5\" class=\"article-Section\"\u003E\u003Ch3 \u003E3.2. Categories of reinforcement learning\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn RL, there are several categories of algorithms. One is value-based and another is policy-based. 
In addition, there is also an actor-critic algorithm that can be obtained by combining the two, as shown in \u003Ca href=\"#fig7\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig7\"\u003EFigure 7\u003C\u002Fa\u003E.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig7\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig7\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.7.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 7. The categories and representative algorithms of reinforcement learning.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec3-1\" class=\"article-Section\"\u003E\u003Ch4 \u003E3.2.1. Value-based methods\u003C\u002Fh4\u003E\u003Cp class=\"\"\u003EBy recursively expanding the formula of the action value function, the corresponding Bellman equation is obtained, which describes the recursive relationship between the value functions of the current state and the subsequent state. 
The recursive expansion formula of the action value function \u003Ci\u003EQ\u003Csub\u003Eπ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Es, a\u003C\u002Fi\u003E) is\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ Q_{\\pi}(s, a)=\\sum_{s^{\\prime}, r} p\\left(s^{\\prime}, r \\mid s, a\\right)\\left[r+\\gamma \\sum_{a^{\\prime}} \\pi\\left(a^{\\prime} \\mid s^{\\prime}\\right) Q_{\\pi}\\left(s^{\\prime}, a^{\\prime}\\right)\\right], $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere the function \u003Ci\u003Ep\u003C\u002Fi\u003E (\u003Ci\u003Es\u003C\u002Fi\u003Eʹ\u003Ci\u003E,r|s,a\u003C\u002Fi\u003E) \u003Ci\u003E= Pr\u003C\u002Fi\u003E {\u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E = s\u003C\u002Fi\u003Eʹ, \u003Ci\u003ER\u003Csub\u003Et\u003C\u002Fsub\u003E = r|S\u003Csub\u003Et-\u003C\u002Fsub\u003E\u003C\u002Fi\u003E\u003Csub\u003E1\u003C\u002Fsub\u003E\u003Ci\u003E= s, A\u003Csub\u003Et-\u003C\u002Fsub\u003E\u003C\u002Fi\u003E\u003Csub\u003E1\u003C\u002Fsub\u003E\u003Ci\u003E= a\u003C\u002Fi\u003E} defines the trajectory probability to characterize the environment’s dynamics. 
\u003Ci\u003ER\u003Csub\u003Et\u003C\u002Fsub\u003E = r\u003C\u002Fi\u003E indicates the reward obtained by the agent taking action \u003Ci\u003EA\u003Csub\u003Et-1\u003C\u002Fsub\u003E = a\u003C\u002Fi\u003E in state \u003Ci\u003ES\u003Csub\u003Et-1\u003C\u002Fsub\u003E = s.\u003C\u002Fi\u003E Besides, \u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E = s\u003C\u002Fi\u003Eʹ and \u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E = a\u003C\u002Fi\u003Eʹ respectively represent the state and the action taken by the agent at the next moment \u003Ci\u003Et.\u003C\u002Fi\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn value-based algorithms, the above value function \u003Ci\u003EQ\u003Csub\u003Eπ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Es, a\u003C\u002Fi\u003E) is calculated iteratively, and the policy is then improved based on this value function. If the value of every action in a given state is known, the agent can select the action with the highest value. In this way, once the optimal \u003Ci\u003EQ\u003Csub\u003Eπ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003Es, a = a*\u003C\u002Fi\u003E) is determined, the best action \u003Ci\u003Ea\u003C\u002Fi\u003E* is found. There are many classical value-based algorithms, including Q-learning\u003Csup\u003E[\u003Ca href=\"#B27\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B27\"\u003E27\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, state-action-reward-state-action (SARSA)\u003Csup\u003E[\u003Ca href=\"#B28\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B28\"\u003E28\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, \u003Ci\u003Eetc.\u003C\u002Fi\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EQ-learning is a typical and widely used value-based RL algorithm. 
It is also a model-free algorithm, which means that it does not need to know the model of the environment but directly estimates the Q value of each executed action in each encountered state through interacting with the environment\u003Csup\u003E[\u003Ca href=\"#B27\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B27\"\u003E27\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Then, the optimal policy is formulated by selecting the action with the highest Q value in each state. This policy maximizes the expected return for all subsequent actions from the current state. The most important part of Q-learning is the update of the Q value. It uses a table, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the Q-table, to store all Q values. The Q-table has one row per state and one column per action. Each (\u003Ci\u003Es, a\u003C\u002Fi\u003E) pair corresponds to a Q value, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, \u003Ci\u003EQ\u003C\u002Fi\u003E(\u003Ci\u003Es, a\u003C\u002Fi\u003E), in the Q-table, which is updated as follows,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ Q(s,a)\\longleftarrow Q(s,a)+\\alpha [r+\\gamma\\underset{a'}{\\max} Q(s',a')-Q(s,a)] $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Ci\u003Er\u003C\u002Fi\u003E is the reward given by taking action \u003Ci\u003Ea\u003C\u002Fi\u003E under state \u003Ci\u003Es\u003C\u002Fi\u003E at the current time step. \u003Ci\u003Es\u003C\u002Fi\u003Eʹ is the state at the next time step, and \u003Ci\u003Ea\u003C\u002Fi\u003Eʹ ranges over the actions available to the agent in that state. \u003Ci\u003Eα\u003C\u002Fi\u003E is the learning rate, which determines how much of the error is applied in each update, and \u003Ci\u003Eγ\u003C\u002Fi\u003E is the discount factor applied to future rewards. 
If the agent continues to visit all state-action pairs, the Q-learning algorithm will converge to the optimal Q function. Q-learning is suitable for simple problems, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, those with a small state space or a small number of actions. It has high data utilization and stable convergence.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec3-2\" class=\"article-Section\"\u003E\u003Ch4 \u003E3.2.2. Policy-based methods\u003C\u002Fh4\u003E\u003Cp class=\"\"\u003EThe above value-based method is an indirect approach to policy selection, and has trouble handling an infinite number of actions. Therefore, we want to be able to model the policy directly. Different from the value-based method, the policy-based algorithm does not need to estimate the value function, but directly fits the policy function, updates the policy parameters through training, and directly generates the best policy. In policy-based methods, we input a state and output the corresponding action directly, rather than the value \u003Ci\u003EV\u003C\u002Fi\u003E (\u003Ci\u003Es\u003C\u002Fi\u003E) or Q value \u003Ci\u003EQ\u003C\u002Fi\u003E (\u003Ci\u003Es, a\u003C\u002Fi\u003E) of the state. One of the most representative algorithms is policy gradient, which is also the most basic policy-based algorithm.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EPolicy gradient optimizes the policy directly and updates the parameters of the policy network by calculating the gradient of the expected reward\u003Csup\u003E[\u003Ca href=\"#B29\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B29\"\u003E29\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
Therefore, its objective function \u003Ci\u003EJ\u003C\u002Fi\u003E (\u003Ci\u003Eθ\u003C\u002Fi\u003E) is directly designed as the expected cumulative reward, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ J(\\theta )=\\mathbb{E}_{\\tau \\sim \\pi_\\theta(\\tau)}[r(\\tau)]=\\int r(\\tau)\\pi_\\theta(\\tau)d\\tau. $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EBy taking the derivative of \u003Ci\u003EJ\u003C\u002Fi\u003E (\u003Ci\u003Eθ\u003C\u002Fi\u003E), we get\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\nabla _\\theta J(\\theta)=\\mathbb{E}_{\\tau \\sim \\pi_\\theta(\\tau)}[\\sum_{t=1}^{T}\\nabla _\\theta\\log \\pi_\\theta(A_t|S_t)\\sum_{t=1}^{T}r(S_t,A_t)]. $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe above formula consists of two parts. One is \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\sum_{t=1}^{T} \\nabla _\\theta \\log \\pi_\\theta(A_t|S_t) $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E which denotes the gradient of the log-probability of the current trajectory. The other is \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\sum_{t=1}^{T}r(S_t,A_t) $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E which represents the return of the current trajectory. Since the return is the total reward and can only be obtained after an episode ends, the policy gradient algorithm can only be updated once per episode, not at each time step.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe expected value can be expressed in a variety of ways, corresponding to different ways of calculating the loss function. 
The advantage of the policy gradient algorithm is that it can be applied in the continuous action space. In addition, the change of the action probability is smoother, and the convergence is better guaranteed.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe REINFORCE algorithm is a classic policy gradient algorithm\u003Csup\u003E[\u003Ca href=\"#B30\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B30\"\u003E30\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Since the expected value of the cumulative reward cannot be calculated directly, the Monte Carlo method is applied to approximate it by the average over multiple samples. REINFORCE thus obtains an unbiased estimate of the gradient by Monte Carlo sampling. Each sampling run generates one trajectory, and sampling is repeated iteratively. After obtaining a large number of trajectories, the cumulative reward can be calculated by using certain transformations and approximations as the loss function for the gradient update. However, the variance of this algorithm is large since it needs to interact with the environment until the terminal state. The reward for each interaction is a random variable, so the variances of the individual rewards accumulate in the variance of the return. 
In particular, the REINFORCE algorithm has three steps:\u003C\u002Fp\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EStep 1: sample \u003Ci\u003Eτ\u003Csub\u003Ei\u003C\u002Fsub\u003E\u003C\u002Fi\u003E from \u003Ci\u003Eπ\u003Csub\u003Eθ\u003C\u002Fsub\u003E\u003C\u002Fi\u003E (\u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E|S\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E)\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 2: \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\nabla _\\theta J(\\theta)\\approx \\sum_{i}^{}[\\sum_{t=1}^{T}\\nabla _\\theta\\log_{}{\\pi_\\theta}(A_{t}^{i}|S_{t}^{i})\\sum_{t=1}^{T}r(S_{t}^{i},A_{t}^{i})] $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 3: \u003Ci\u003Eθ\u003C\u002Fi\u003E ← \u003Ci\u003Eθ\u003C\u002Fi\u003E + \u003Ci\u003Eα\u003C\u002Fi\u003E∇\u003Ci\u003E\u003Csub\u003Eθ\u003C\u002Fsub\u003EJ\u003C\u002Fi\u003E (\u003Ci\u003Eθ\u003C\u002Fi\u003E)\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EValue-based and policy-based methods each have their own characteristics and disadvantages. The disadvantages of the value-based methods are that the action cannot be output directly, and that they are difficult to extend to the continuous action space. The value-based methods can also lead to the problem of high bias, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, it is difficult to eliminate the error between the estimated value function and the actual value function. For the policy-based methods, a large number of trajectories must be sampled, and the differences between trajectories may be huge. As a result, high variance and large gradient noise are introduced. 
This leads to unstable training and makes it difficult for the policy to converge.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec3-3\" class=\"article-Section\"\u003E\u003Ch4 \u003E3.2.3. Actor-critic methods\u003C\u002Fh4\u003E\u003Cp class=\"\"\u003EThe actor-critic architecture combines the characteristics of the value-based and policy-based algorithms, and to a certain extent addresses their respective weaknesses, as well as the trade-off between high variance and high bias. The constructed agent can not only directly output policies, but also evaluate the performance of the current policies through the value function. Specifically, the actor-critic architecture consists of an actor, which is responsible for generating the policy, and a critic, which evaluates this policy. While the actor is performing, the critic evaluates its performance, and both are constantly updated\u003Csup\u003E[\u003Ca href=\"#B31\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B31\"\u003E31\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. This complementary training is generally more effective than a purely policy-based or purely value-based method.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EMore precisely, the input of the actor is state \u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E, and the output is action \u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E.\u003C\u002Fi\u003E The role of the actor is to approximate the policy model \u003Ci\u003Eπ\u003C\u002Fi\u003E\u003Csub\u003E\u003Ci\u003Eθ\u003C\u002Fi\u003E\u003C\u002Fsub\u003E (\u003Ci\u003EA\u003Csub\u003Et\u003C\u002Fsub\u003E|S\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E). 
The critic uses the value function \u003Ci\u003EQ\u003C\u002Fi\u003E as the output to evaluate the value of the policy, and this Q value \u003Ci\u003EQ\u003C\u002Fi\u003E (\u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E, A\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E) can be directly applied to calculate the loss function of the actor. The gradient of the expected return \u003Ci\u003EJ\u003C\u002Fi\u003E (\u003Ci\u003Eθ\u003C\u002Fi\u003E) in the actor-critic framework is developed from the basic policy gradient algorithm. The policy gradient algorithm can only update once per episode, so it is difficult to feed the reward back accurately. Therefore, it has poor training efficiency. Instead, the actor-critic algorithm replaces \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\sum_{t=1}^{T}r(S_{t}^{i},A_{t}^{i}) $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E with \u003Ci\u003EQ\u003C\u002Fi\u003E (\u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E,A\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E) to evaluate the expected return of the state-action tuple {\u003Ci\u003ES\u003Csub\u003Et\u003C\u002Fsub\u003E,A\u003Csub\u003Et\u003C\u002Fsub\u003E\u003C\u002Fi\u003E} in the current time step \u003Ci\u003Et.\u003C\u002Fi\u003E The gradient of \u003Ci\u003EJ\u003C\u002Fi\u003E (\u003Ci\u003Eθ\u003C\u002Fi\u003E) can be expressed as\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\nabla _\\theta J(\\theta)=\\mathbb{E}_{\\tau \\sim \\pi_\\theta(\\tau)}[\\sum_{t=1}^{T}\\nabla _\\theta\\log \\pi_\\theta(A_{t}|S_{t})Q(S_{t},A_{t})]. $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-6\" class=\"article-Section\"\u003E\u003Ch3 \u003E3.3. 
Deep reinforcement learning\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EAs the applications of deep learning have expanded, it has also been adopted in the RL field, resulting in deep reinforcement learning (DRL), \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, using a multi-layer deep neural network to approximate the value function or policy function in the RL algorithm \u003Csup\u003E[\u003Ca href=\"#B32\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B32\"\u003E32\u003C\u002Fa\u003E,\u003Ca href=\"#B33\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B33\"\u003E33\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. DRL mainly solves the curse-of-dimensionality problem in real-world RL applications with large or continuous state and\u002For action space, where the traditional tabular RL algorithms cannot store and extract a large amount of feature information \u003Csup\u003E[\u003Ca href=\"#B17\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B17\"\u003E17\u003C\u002Fa\u003E,\u003Ca href=\"#B34\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B34\"\u003E34\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EQ-learning, as a classical RL algorithm, is a good example for understanding the purpose of DRL. The main limitation of Q-learning lies in its tabular nature: when the state and action spaces are very large, it is impractical to build a Q table large enough to store all the Q values\u003Csup\u003E[\u003Ca href=\"#B35\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B35\"\u003E35\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Besides, it computes and iterates Q values based only on states encountered in the past. Therefore, on the one hand, the applicable state and action space of Q-learning is very small. 
On the other hand, if a state has never appeared, Q-learning cannot deal with it\u003Csup\u003E[\u003Ca href=\"#B36\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B36\"\u003E36\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. In other words, Q-learning has neither prediction ability nor generalization ability in this case.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003ETo give Q-learning prediction ability, and considering that neural networks can extract feature information well, the deep Q network (DQN) was proposed, which applies a deep neural network to approximate the Q value function. Specifically, DQN extends the Q-learning algorithm to continuous or large state spaces by replacing the Q table with a neural network that approximates the Q value function\u003Csup\u003E[\u003Ca href=\"#B37\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B37\"\u003E37\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn addition to value-based DRL algorithms such as DQN, we summarize a variety of classical DRL algorithms according to algorithm types by referring to some DRL related surveys\u003Csup\u003E[\u003Ca href=\"#B38\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B38\"\u003E38\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E in \u003Ca href=\"#t1\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"t1\"\u003ETable 1\u003C\u002Fa\u003E, including not only the policy-based and actor-critic DRL algorithms, but also the advanced DRL algorithms for partially observable Markov decision processes (POMDP) and multi-agent settings.\u003C\u002Fp\u003E\u003Cdiv id=\"t1\" class=\"Figure-block\"\u003E\u003Cdiv class=\"table-note\"\u003E\u003Cspan class=\"\"\u003ETable 1\u003C\u002Fspan\u003E\u003Cp class=\"\"\u003ETaxonomy of representative algorithms for DRL\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"table-responsive article-table\"\u003E\u003Ctable class=\"a-table\"\u003E\u003Cthead\u003E\u003Ctr\u003E\u003Cth 
align=\"center\" valign=\"middle\" colspan=\"2\"\u003ETypes\u003C\u002Fth\u003E\u003Cth align=\"left\" valign=\"top\"\u003ERepresentative algorithms\u003C\u002Fth\u003E\u003C\u002Ftr\u003E\u003C\u002Fthead\u003E\u003Ctbody\u003E\u003Ctr\u003E\u003Ctd align=\"center\" valign=\"middle\" colspan=\"2\"\u003EValue-based\u003C\u002Ftd\u003E\u003Ctd align=\"left\" valign=\"top\"\u003EDeep Q-Network (DQN)\u003Csup\u003E[\u003Ca href=\"#B37\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B37\"\u003E37\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, Double Deep Q-Network (DDQN)\u003Csup\u003E[\u003Ca href=\"#B39\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B39\"\u003E39\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003EDDQN with proportional prioritization\u003Csup\u003E[\u003Ca href=\"#B40\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B40\"\u003E40\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E\u003C\u002Ftd\u003E\u003C\u002Ftr\u003E\u003Ctr\u003E\u003Ctd align=\"center\" valign=\"middle\" colspan=\"2\"\u003EPolicy-based\u003C\u002Ftd\u003E\u003Ctd align=\"left\" valign=\"top\"\u003EREINFORCE\u003Csup\u003E[\u003Ca href=\"#B30\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B30\"\u003E30\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, Q-prop\u003Csup\u003E[\u003Ca href=\"#B41\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B41\"\u003E41\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E\u003C\u002Ftd\u003E\u003C\u002Ftr\u003E\u003Ctr\u003E\u003Ctd align=\"center\" valign=\"middle\" colspan=\"2\"\u003EActor-critic\u003C\u002Ftd\u003E\u003Ctd align=\"left\" valign=\"top\"\u003ESoft Actor-Critic (SAC)\u003Csup\u003E[\u003Ca href=\"#B42\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B42\"\u003E42\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, Asynchronous Advantage Actor Critic (A3C)\u003Csup\u003E[\u003Ca href=\"#B43\" class=\"Link_style\" data-jats-ref-type=\"bibr\" 
data-jats-rid=\"B43\"\u003E43\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003EDeep Deterministic Policy Gradient (DDPG)\u003Csup\u003E[\u003Ca href=\"#B44\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B44\"\u003E44\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003EDistributed Distributional Deep Deterministic Policy Gradients (D4PG)\u003Csup\u003E[\u003Ca href=\"#B45\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B45\"\u003E45\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003ETwin Delayed Deep Deterministic Policy Gradient (TD3)\u003Csup\u003E[\u003Ca href=\"#B46\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B46\"\u003E46\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003ETrust Region Policy Optimization (TRPO)\u003Csup\u003E[\u003Ca href=\"#B47\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B47\"\u003E47\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003EProximal Policy Optimization (PPO)\u003Csup\u003E[\u003Ca href=\"#B48\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B48\"\u003E48\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E\u003C\u002Ftd\u003E\u003C\u002Ftr\u003E\u003Ctr\u003E\u003Ctd align=\"center\" valign=\"middle\" rowspan=\"2\"\u003EAdvanced\u003C\u002Ftd\u003E\u003Ctd align=\"center\" valign=\"middle\"\u003EPOMDP\u003C\u002Ftd\u003E\u003Ctd align=\"left\" valign=\"top\"\u003EDeep Belief Q-Network (DBQN)\u003Csup\u003E[\u003Ca href=\"#B49\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B49\"\u003E49\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003EDeep Recurrent Q-Network (DRQN)\u003Csup\u003E[\u003Ca href=\"#B50\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B50\"\u003E50\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003ERecurrent Deterministic Policy Gradients (RDPG)\u003Csup\u003E[\u003Ca href=\"#B51\" class=\"Link_style\" 
data-jats-ref-type=\"bibr\" data-jats-rid=\"B51\"\u003E51\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E\u003C\u002Ftd\u003E\u003C\u002Ftr\u003E\u003Ctr\u003E\u003Ctd align=\"center\" valign=\"middle\"\u003EMulti-agents\u003C\u002Ftd\u003E\u003Ctd align=\"left\" valign=\"top\"\u003EMulti-Agent Importance Sampling (MAIS)\u003Csup\u003E[\u003Ca href=\"#B52\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B52\"\u003E52\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003ECoordinated Multi-agent DQN\u003Csup\u003E[\u003Ca href=\"#B53\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B53\"\u003E53\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003EMulti-agent Fingerprints (MAF)\u003Csup\u003E[\u003Ca href=\"#B52\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B52\"\u003E52\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003ECounterfactual Multiagent Policy Gradient (COMAPG)\u003Csup\u003E[\u003Ca href=\"#B54\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B54\"\u003E54\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E,\r\n\u003Cbr \u002F\u003EMulti-Agent DDPG (MADDPG)\u003Csup\u003E[\u003Ca href=\"#B55\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B55\"\u003E55\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E\u003C\u002Ftd\u003E\u003C\u002Ftr\u003E\u003C\u002Ftbody\u003E\u003C\u002Ftable\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"table_footer\"\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec1-4\" class=\"article-Section\"\u003E\u003Ch2 \u003E4. Federated reinforcement learning\u003C\u002Fh2\u003E\u003Cp class=\"\"\u003EIn this section, the detailed background and categories of FRL will be discussed.\u003C\u002Fp\u003E\u003Cdiv id=\"sec2-7\" class=\"article-Section\"\u003E\u003Ch3 \u003E4.1. 
Federated reinforcement learning background\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EDespite the excellent performance that RL and DRL have achieved in many areas, they still face several important technical and non-technical challenges in solving real-world problems. The successful application of FL in supervised learning tasks arouses interest in exploiting similar ideas in RL, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, FRL. On the other hand, although FL is useful in some specific situations, it fails to deal with cooperative control and optimal decision-making in dynamic environments\u003Csup\u003E[\u003Ca href=\"#B10\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B10\"\u003E10\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. FRL not only provides the experience for agents to learn to make good decisions in an unknown environment, but also ensures that the data collected privately during the agent’s exploration does not have to be shared with others. A forward-looking and interesting research direction is how to conduct RL under the premise of protecting privacy. Therefore, it has been proposed to use the FL framework to enhance the security of RL and to define FRL as a security-enhanced distributed RL framework that accelerates the learning process, protects agent privacy and handles non-independent and identically distributed (non-IID) data\u003Csup\u003E[\u003Ca href=\"#B8\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B8\"\u003E8\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
Apart from improving the security and privacy of RL, we believe that FRL has a wider and larger potential in helping RL to achieve better performance in various aspects, which will be elaborated in the following subsections.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn order to facilitate understanding and maintain consistency with FL, FRL is divided into two categories depending on environment partition\u003Csup\u003E[\u003Ca href=\"#B7\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B7\"\u003E7\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, HFRL and VFRL. \u003Ca href=\"#fig8\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig8\"\u003EFigure 8\u003C\u002Fa\u003E gives the comparison between HFRL and VFRL. In HFRL, the environment that each agent interacts with is independent of the others, while the state space and action space of different agents are aligned to solve similar problems. The action of each agent only affects its own environment and results in corresponding rewards. As an agent can hardly explore all states of its environment, multiple agents interacting with their own copy of the environment can accelerate training and improve model performance by sharing experience. Therefore, horizontal agents use server-client model or peer-to-peer model to transmit and exchange the gradients or parameters of their policy models (actors) and\u002For value function models (critics). In VFRL, multiple agents interact with the same global environment, but each can only observe limited state information in the scope of its view. Agents can perform different actions depending on the observed environment and receive local reward or even no reward. Based on the actual scenario, there may be some observation overlap between agents. In addition, all agents’ actions affect the global environment dynamics and total rewards. 
As opposed to the horizontal arrangement of independent environments in HFRL, the vertical arrangement of observations in VFRL poses a more complex problem and is less studied in the existing literature.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig8\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig8\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.8.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 8. Comparison of horizontal federated reinforcement learning and vertical federated reinforcement learning.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-8\" class=\"article-Section\"\u003E\u003Ch3 \u003E4.2. Horizontal federated reinforcement learning\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EHFRL can be applied in scenarios in which the agents may be distributed geographically, but they face similar decision-making tasks and have very little interaction with each other in the observed environments. Each participating agent independently executes decision-making actions based on the current state of environment and obtains positive or negative rewards for evaluation. Since the environment explored by one agent is limited and each agent is unwilling to share the collected data, multiple agents try to train the policy and\u002For value model together to improve model performance and increase learning efficiency. 
The purpose of HFRL is to alleviate the sample-efficiency problem in RL, and help each agent quickly obtain the optimal policy which can maximize the expected cumulative reward for specific tasks, while considering privacy protection.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn the HFRL problem, the environment, state space, and action space can replace the data set, feature space, and label space of basic FL. More formally, we assume that \u003Ci\u003EN\u003C\u002Fi\u003E agents \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{F}_i\\}_{i=1}^{N} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E can observe the environments \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{E}_i\\}_{i=1}^{N} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E within their fields of vision, where \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{G} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E denotes the collection of all environments. The environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E where the \u003Ci\u003Ei\u003C\u002Fi\u003E-th agent is located has a model, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, state transition probability and reward function, similar to those of the other environments. Note that the environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E is independent of the other environments, in that the state transition and reward model of \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E do not depend on the states and actions of the other environments. 
Each agent \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E interacts with its own environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E to learn an optimal policy. Therefore, the conditions for HFRL are as follows:\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\mathcal{S}_{i}=\\mathcal{S}_{j}, \\mathcal{A}_{i}=\\mathcal{A}_{j}, \\mathcal{E}_{i} \\neq \\mathcal{E}_{j}, \\forall i, j \\in\\{1,2, \\ldots, N\\}, \\mathcal{E}_{i}, \\mathcal{E}_{j} \\in \\mathcal{G}, i \\neq j, $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{S}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{S}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E denote the similar state spaces encountered by the \u003Ci\u003Ei\u003C\u002Fi\u003E-th and \u003Ci\u003Ej\u003C\u002Fi\u003E-th agents, respectively. 
\u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{A}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{A}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E denote the similar action spaces of the \u003Ci\u003Ei\u003C\u002Fi\u003E-th and \u003Ci\u003Ej\u003C\u002Fi\u003E-th agents, respectively. The observed environments \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E are two different environments that are assumed to be independent and, ideally, identically distributed.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Ca href=\"#fig9\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig9\"\u003EFigure 9\u003C\u002Fa\u003E shows HFRL in graphic form. Each agent is represented by a cuboid. The axes of the cuboid denote three dimensions of information, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the environment, state space, and action space. We can intuitively see that all environments are arranged horizontally, and multiple agents have aligned state and action spaces. In other words, each agent explores independently in its respective environment and needs to obtain optimal strategies for similar tasks. 
In HFRL, agents share their experiences by exchanging masked models to enhance sample efficiency and accelerate the learning process.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig9\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig9\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.9.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 9. Illustration of horizontal federated reinforcement learning.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cp class=\"\"\u003EA typical example of HFRL is the autonomous driving system in IoV. As vehicles drive on roads throughout the city and country, they can collect various environmental information and train the autonomous driving models locally. Due to driving regulations, weather conditions, driving routes, and other factors, one vehicle cannot be exposed to every possible situation in the environment. Moreover, the vehicles have basically the same operations, including braking, acceleration, steering, \u003Ci\u003Eetc.\u003C\u002Fi\u003E Therefore, vehicles driving on different roads, in different cities, or even in different countries can share their learned experience with each other through FRL without revealing their driving data, thereby preserving privacy. In this case, even if a vehicle has never encountered a particular situation, it can still perform the best action by using the shared model. 
Joint exploration by multiple vehicles also increases the chance of learning rare cases, which improves the reliability of the model.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EFor a better understanding of HFRL, \u003Ca href=\"#fig10\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig10\"\u003EFigure 10\u003C\u002Fa\u003E shows an example of an HFRL architecture using the server-client model. The coordinator is responsible for establishing encrypted communication with agents and implementing aggregation of shared models. The multiple parallel agents may be composed of heterogeneous equipment (\u003Ci\u003Ee.g.\u003C\u002Fi\u003E, IoT devices, smartphones, and computers) and distributed geographically. It is worth noting that there is no specific requirement for the number of agents, and agents are free to join or leave. The basic procedure for conducting HFRL can be summarized as follows.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig10\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig10\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.10.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 10. 
An example of horizontal federated reinforcement learning architecture.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EStep 1: The initialization\u002Fjoin process can be divided into two cases: either the agent has no local model, or it already has one. In the first case, the agent can directly download the shared global model from the coordinator. In the second case, the agent needs to confirm the model type and parameters with the coordinator.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 2: Each agent independently observes the state of the environment and determines the private strategy based on the local model. The selected action is evaluated by the next state and received reward. All agents train their respective models in state-action-reward-state (SARS) cycles.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 3: Local model parameters are encrypted and transmitted to the coordinator. Agents may submit local models at any time as long as the trigger conditions are met.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 4: The coordinator runs the chosen aggregation algorithm to evolve the global federated model. There is no need to wait for submissions from all agents, and appropriate aggregation conditions can be formulated depending on communication resources.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 5: The coordinator sends the aggregated model back to the agents.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 6: The agents improve their respective models by fusing the federated model.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EFollowing the above architecture and process, applications suitable for HFRL should have the following characteristics. 
First, agents have similar decision-making tasks under dynamic environments. Different from the FL setting, the goal of an HFRL-based application is to find the optimal strategy that maximizes future reward. To accomplish the task requirements, the optimal strategy directs the agents to perform certain actions, such as control, scheduling, navigation, \u003Ci\u003Eetc.\u003C\u002Fi\u003E Second, distributed agents maintain independent observations. Each agent can only observe the environment within its field of view, and there is no guarantee that the collected data follow the same distribution. Third, it is important to protect the data that each agent collects and explores. Agents are presumed to be honest but curious, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, they honestly follow the learning mechanism but are curious about private information held by other agents. For this reason, the data used for training are stored only at the owner and are not transferred to the coordinator. HFRL provides an implementation method for sharing experiences under the constraints of privacy protection. Additionally, various factors limit an agent’s ability to explore the environment in a balanced manner. Participating agents may include heterogeneous devices. The amount of data collected by each agent is unbalanced due to mobility, observation, energy and other factors. However, all participants are assumed to have sufficient computing, storage, and communication capabilities. These capabilities allow the agent to complete model training, merging, and other basic processes. Finally, the environment observed by an agent may change dynamically, causing differences in data distribution. The participating agents need to update the model in time to quickly adapt to environmental changes and construct a personalized local model.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn existing RL studies, some applications that meet the above characteristics can be classified as HFRL. 
Nadiger \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B56\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B56\"\u003E56\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E present a typical HFRL architecture, which includes the grouping policy, the learning policy, and the federation policy. In this work, RL is used to show the applicability of granular personalization and FL is used to reduce training time. To demonstrate the effectiveness of the proposed architecture, a non-player character in the Atari game Pong is implemented and evaluated. In the study from Liu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B57\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B57\"\u003E57\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the authors propose lifelong federated reinforcement learning (LFRL) for navigation in cloud robotic systems. It enables the robot to learn efficiently in a new environment and use prior knowledge to quickly adapt to changes in the environment. Each robot trains a local model according to its own navigation task, and the centralized cloud server implements a knowledge fusion algorithm for upgrading a shared model. Considering that the local model and the shared model might have different network structures, this paper proposes to apply transfer learning to improve the performance and efficiency of the shared model. Further, researchers also focus on HFRL-based applications in the IoT due to the high demand for privacy protection. Ren \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B58\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B58\"\u003E58\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E suggest deploying the FL architecture between edge nodes and IoT devices for computation offloading tasks. 
IoT devices can download the RL model from edge nodes and train the local model using their own data, such as the remaining energy resources and the workload of the IoT device. The edge node aggregates the updated private models into the shared model. Although this method considers privacy protection issues, the communication cost of the model exchange requires further evaluation. In addition, the work\u003Csup\u003E[\u003Ca href=\"#B59\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B59\"\u003E59\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes a federated deep-reinforcement-learning-based framework (FADE) for edge caching. Edge devices, including base stations (BSs), can cooperatively learn a predictive model using the first round of training parameters for local learning, and then upload the tuned local parameters to the next round of global training. By keeping the training on local devices, FADE can enable fast training and decouple the learning process between the cloud and data owner in a distributed-centralized manner. More HFRL-based applications will be classified and summarized in the next section.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EPrior to HFRL, a variety of distributed RL algorithms that are closely related to HFRL had been extensively investigated. In general, distributed RL algorithms can be divided into two types: synchronous and asynchronous. 
In synchronous RL algorithms, such as synchronous stochastic optimization (Sync-Opt)\u003Csup\u003E[\u003Ca href=\"#B60\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B60\"\u003E60\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E and parallel advantage actor critic (PAAC)\u003Csup\u003E[\u003Ca href=\"#B3\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B3\"\u003E3\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the agents explore their own environments separately, and after a number of samples are collected, the global parameters are updated synchronously. In contrast, in asynchronous RL algorithms the coordinator updates the global model immediately after receiving a gradient from any agent, rather than waiting for the other agents. Several asynchronous RL algorithms have been proposed, including A3C\u003Csup\u003E[\u003Ca href=\"#B61\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B61\"\u003E61\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, IMPALA\u003Csup\u003E[\u003Ca href=\"#B62\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B62\"\u003E62\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, Ape-X\u003Csup\u003E[\u003Ca href=\"#B63\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B63\"\u003E63\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E and the general reinforcement learning architecture (Gorila)\u003Csup\u003E[\u003Ca href=\"#B1\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B1\"\u003E1\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. From the perspective of technology development, HFRL can also be considered security-enhanced parallel RL. 
In parallel RL, multiple agents interact with a stochastic environment to seek the optimal policy for the same task\u003Csup\u003E[\u003Ca href=\"#B1\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B1\"\u003E1\u003C\u002Fa\u003E,\u003Ca href=\"#B2\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B2\"\u003E2\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. By building a closed loop of data and knowledge in parallel systems, parallel RL helps determine the next course of action for each agent. The state and action representations are fed into a designed neural network to approximate the action value function\u003Csup\u003E[\u003Ca href=\"#B64\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B64\"\u003E64\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. However, parallel RL typically transfers the experience of agents without considering privacy protection issues\u003Csup\u003E[\u003Ca href=\"#B7\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B7\"\u003E7\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. The implementation of HFRL imposes further restrictions related to privacy protection and communication cost in order to suit special scenarios, such as IoT applications\u003Csup\u003E[\u003Ca href=\"#B59\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B59\"\u003E59\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Another point to consider is non-IID data. In order to ensure convergence of the RL model, it is generally assumed in parallel RL that the state transitions in the environment follow the same distribution, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the environments of different agents are IID. In actual scenarios, however, the situations faced by agents may differ slightly, so that the environment models of different agents are not identically distributed. 
Therefore, compared with parallel RL, HFRL needs to improve the generalization ability of the model to meet the challenges posed by non-IID data.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EBased on the potential issues faced by the current RL technology, the advantages of HFRL can be summarized as follows.\u003C\u002Fp\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EEnhancing training speed. In the case of a similar target task, multiple agents sharing training experiences gained from different environments can expedite the learning process. The local model rapidly evolves through aggregation and update algorithms to assess the unexplored environment. Moreover, the data obtained by different agents are independent, reducing correlations between the observed data. Furthermore, this also helps to solve the issue of unbalanced data caused by various restrictions.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EImproving the reliability of the model. When the dimensions of the state of the environment are enormous or even uncountable, it is difficult for a single agent to train an optimal strategy for situations with extremely low occurrence probabilities. Horizontal agents explore independently while building a cooperative model to improve the local model’s performance on rare states.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EMitigating the problems of device heterogeneity. Different devices deploying RL agents in the HFRL architecture may have different computational and communication capabilities. Some devices may not meet the basic requirements for training, yet they still need strategies to guide their actions. HFRL makes it possible for all agents to obtain the shared model equally for the target task.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EAddressing the issue of non-identical environment. 
Considering the differences in the environment dynamics for the different agents, the assumption of IID data may be broken. Under the HFRL architecture, agents whose environment models are not identically distributed can still cooperate to learn a federated model. In order to address the difference in environment dynamics, a personalized update algorithm for the local model could be designed to minimize the impact of this issue.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EIncreasing the flexibility of the system. An agent can decide at any time whether to participate in the cooperative system, because HFRL allows asynchronous requests and aggregation of shared models. In existing HFRL-based applications, new agents can also apply for membership and benefit from downloading the shared model.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-9\" class=\"article-Section\"\u003E\u003Ch3 \u003E4.3. Vertical federated reinforcement learning\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn VFL, samples of multiple data sets have different feature spaces but these samples may belong to the same groups or common users. The training data of each participant are divided vertically according to their features. More general and accurate models can be generated by building heterogeneous feature spaces without releasing private information. VFRL applies the methodology of VFL to RL and is suitable for POMDP scenarios where different RL agents are in the same environment but have different interactions with the environment. Specifically, different agents could have different observations that are only part of the global state. They could take actions from different action spaces and observe different rewards, and some agents may even take no actions or observe no rewards. 
Since a single agent’s observation range of the environment is limited, multiple agents cooperate to collect the knowledge needed for decision making. The role of FL in VFRL is to aggregate the partial features observed by various agents. Especially for those agents without rewards, the aggregation effect of FL greatly enhances the value of such agents in their interactions with the environment, and ultimately helps with the strategy optimization. It is worth noting that in VFRL the issue of privacy protection needs to be considered, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, private data collected by some agents do not have to be shared with others. Instead, agents can transmit encrypted model parameters, gradients, or mid-products to each other. In short, the goal of VFRL is for agents interacting with the same environment to improve the performance of their policies and the effectiveness of learning them by sharing experiences without compromising privacy.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EMore formally, we denote \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{F}_i\\}_{i=1}^{N} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E as \u003Ci\u003EN\u003C\u002Fi\u003E agents in VFRL, which interact with a global environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. 
The \u003Ci\u003Ei\u003C\u002Fi\u003E-th agent \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E is located in the environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_i=\\mathcal{E} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E, obtains the local partial observation \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{O}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E, and can perform the set of actions \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{A}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. Different from HFRL, the state\u002Fobservation and action spaces of two agents \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E may not be identical, but the aggregation of the state\u002Fobservation spaces and action spaces of all the agents constitutes the global state and action spaces of the global environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. 
The conditions for VFRL can be defined as follows:\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Cdiv class=\"disp-formula\"\u003E\u003Clabel\u003E\u003C\u002Flabel\u003E\u003Ctex-math id=\"E1\"\u003E $$ \\mathcal{O}_i\\not=\\mathcal{O}_j,\\mathcal{A}_i\\not=\\mathcal{A}_j,\\mathcal{E}_i=\\mathcal{E}_j=\\mathcal{E},\\bigcup_{i=1}^{N}\\mathcal{O}_i=\\mathcal{S},\\bigcup_{i=1}^{N}\\mathcal{A}_i=\\mathcal{A},\\forall i,j\\in\\{1,2,\\dots,N\\},i\\not=j, $$ \u003C\u002Ftex-math\u003E\u003C\u002Fdiv\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003Ewhere \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{S} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{A} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E denote the global state space and action space of all participating agents, respectively. It can be seen that all the observations of the \u003Ci\u003EN\u003C\u002Fi\u003E agents together constitute the global state space \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{S} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E of the environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. In addition, the environments \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E are the same environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. 
In most cases, there is a great difference between the observations of two agents \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_i $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{F}_j $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003E\u003Ca href=\"#fig11\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig11\"\u003EFigure 11\u003C\u002Fa\u003E shows the architecture of VFRL. The dataset and feature space in VFL are converted to the environment and state space respectively. VFL divides the dataset vertically according to the features of samples, and VFRL divides agents based on the state spaces observed from the global environment. Generally speaking, every agent has its local state which can be different from that of the other agents and the aggregation of these local partial states corresponds to the entire environment state\u003Csup\u003E[\u003Ca href=\"#B65\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B65\"\u003E65\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
In addition, after interacting with the environment, agents may generate their local actions which correspond to the labels in VFL.\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig11\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig11\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.11.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 11. Illustration of vertical federated reinforcement learning.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cp class=\"\"\u003ETwo types of agents can be defined for VFRL, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, decision-oriented agents and support-oriented agents. Decision-oriented agents \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{F}_i\\}_{i=1}^K $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E can interact with the environment \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\mathcal{E} $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E based on their local state \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{S}_i\\}_{i=1}^K $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E and action \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{A}_i\\}_{i=1}^K $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. 
Meanwhile, support-oriented agents \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{F}_i\\}_{i=K+1}^N $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E take no actions and receive no rewards; they only obtain observations of the environment, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, their local states \u003Cinline-formula\u003E\u003Ctex-math id=\"M1\"\u003E$$ \\{\\mathcal{S}_i\\}_{i=K+1}^N $$\u003C\u002Ftex-math\u003E\u003C\u002Finline-formula\u003E. In general, the basic procedure for VFRL consists of the following six steps, as shown in \u003Ca href=\"#fig12\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"fig12\"\u003EFigure 12\u003C\u002Fa\u003E:\u003C\u002Fp\u003E\u003Cdiv class=\"Figure-block\" id=\"fig12\"\u003E\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"article-figure-image\"\u003E\u003Ca href=\"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig12\" class=\"Article-img\" alt=\"\" target=\"_blank\"\u003E\u003Cimg alt=\"Federated reinforcement learning: techniques, applications, and open challenges\" src=\"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.12.jpg\" class=\"\" title=\"\" alt=\"\" \u002F\u003E\u003C\u002Fa\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-figure-note\"\u003E\u003Cp class=\"figure-note\"\u003E\u003C\u002Fp\u003E\u003Cp class=\"figure-note\"\u003EFigure 12. An example of vertical federated reinforcement learning architecture.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EStep 1: Initialization is performed for all agent models.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 2: Agents obtain states from the environment. 
For decision-oriented agents, actions are obtained based on the local models, and feedback is obtained through interactions with the environment, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the states of the next time step and rewards. The data tuple of state-action-reward-state (SARS) is used to train the local models.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 3: All agents calculate the mid-products of the local models and then transmit the encrypted mid-products to the federated model.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 4: The federated model performs the aggregation calculation for mid-products and trains the federated model based on the aggregation results.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 5: The federated model encrypts model parameters, such as weights and gradients, and passes them back to the agents.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EStep 6: All agents update their local models based on the received encrypted parameters.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EAs an example of VFRL, consider a microgrid (MG) system including household users, the power company and the photovoltaic (PV) management company as the agents. All the agents observe the same MG environment while their local state spaces are quite different. The global states of the MG system generally consist of several dimensions\u002Ffeatures, \u003Ci\u003Ee.g.\u003C\u002Fi\u003E, state-of-charge (SOC) of the batteries, load consumption of the household users, and power generation from PV. The household agents can obtain the SOC of their own batteries and their own load consumption, the power company knows the load consumption of all the users, and the PV management company knows the power generation of PV. 
As for actions, the power company needs to make decisions on the power dispatch of the diesel generators (DG), and the household users can make decisions to manage their electrical utilities with demand response. Finally, the power company can observe rewards such as the cost of DG power generation and the balance between power generation and consumption, and the household users can observe rewards such as their electricity bill, which is related to their power consumption. In order to learn the optimal policies, these agents need to communicate with each other to share their observations. However, PV managers do not want to expose their data to other companies, and household users also want to keep their consumption data private. VFRL is well suited to this setting and can help improve policy decisions without exposing specific data.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003ECompared with HFRL, there are currently few works on VFRL. Zhuo \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B65\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B65\"\u003E65\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E present the federated deep reinforcement learning (FedRL) framework. The purpose of that work is to address the case where the feature space of states is small and the training data are limited. Transfer learning approaches in DRL are also solutions for this case. However, in privacy-aware applications, data or models should not be transferred directly. Hence, FedRL combines the advantages of FL with RL, which is suitable for the case when agents need to consider their privacy. The FedRL framework assumes agents cannot share their partial observations of the environment and some agents are unable to receive rewards. 
It builds a shared value network, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, a multilayer perceptron (MLP), that takes an agent’s own Q-network output and the encrypted value as input to calculate a global Q-network output. Based on the output of the global Q-network, the shared value network and the agent’s own Q-network are updated. Two agents are used in the FedRL algorithm, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, agents \u003Ci\u003Eα\u003C\u002Fi\u003E and \u003Ci\u003Eβ\u003C\u002Fi\u003E, which interact with the same environment. However, agent \u003Ci\u003Eβ\u003C\u002Fi\u003E cannot build its own policies and rewards. Finally, FedRL is applied in two different games, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, Grid-World and Text2Action, and achieves better results than the other baselines. Although the VFRL model in this paper only contains two agents, and the structure of the aggregated neural network model is relatively simple, we believe that it is a valuable first attempt to implement VFRL and verify its effectiveness.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EMulti-agent RL (MARL) is very closely related to VFRL. As the name implies, MARL takes into account the existence of multiple agents in the RL system. However, empirical evaluation shows that applying simple single-agent RL algorithms directly to scenarios with multiple agents cannot converge to the optimal solution, since the environment is no longer static from the perspective of each agent\u003Csup\u003E[\u003Ca href=\"#B66\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B66\"\u003E66\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Specifically, the action of each agent will affect the next state, thus affecting all agents in the future time step\u003Csup\u003E[\u003Ca href=\"#B67\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B67\"\u003E67\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
Besides, the actions performed by a given agent will yield different rewards depending on the actions taken by other agents. This means that agents in MARL correlate with each other, rather than being independent of each other. This challenge, called the non-stationarity of the environment, is the main problem to be solved in the development of an efficient MARL algorithm\u003Csup\u003E[\u003Ca href=\"#B68\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B68\"\u003E68\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EMARL and VFRL both study the problem of multiple agents learning concurrently how to solve a task by interacting with the same environment\u003Csup\u003E[\u003Ca href=\"#B69\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B69\"\u003E69\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Since MARL and VFRL have a large range of similarities, the review of MARL’s related works is a very useful guide to help researchers summarize the research focus and better understand VFRL. There is abundant literature related to MARL. However, most MARL research\u003Csup\u003E[\u003Ca href=\"#B70\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B70\"\u003E70\u003C\u002Fa\u003E-\u003Ca href=\"#B73\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B73\"\u003E73\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E is based on a fully observed Markov decision process (MDP), where each agent is assumed to have the global observation of the system state\u003Csup\u003E[\u003Ca href=\"#B68\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B68\"\u003E68\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
These MARL algorithms are not applicable to the case of POMDP where the observations of individual agents are often only a part of the overall environment\u003Csup\u003E[\u003Ca href=\"#B74\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B74\"\u003E74\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Partial observability is a crucial consideration for the development of algorithms that can be applied to real-world problems\u003Csup\u003E[\u003Ca href=\"#B75\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B75\"\u003E75\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Since VFRL is mainly oriented towards POMDP scenarios, it is more important to analyze the related works of MARL based on POMDP as the guidance of VFRL.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EAgents in the above scenarios partially observe the system state and make decisions at each step to maximize the overall rewards for all agents, which can be formalized as a decentralized partially observable Markov decision process (Dec-POMDP)\u003Csup\u003E[\u003Ca href=\"#B76\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B76\"\u003E76\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Optimally solving a Dec-POMDP model is well known to be a very challenging problem. Among the early works, Omidshafiei \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B77\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B77\"\u003E77\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E propose a two-phase MT-MARL approach that includes cautiously-optimistic learners for action-value approximation and concurrent experience replay trajectories (CERTs) as an experience replay mechanism targeting sample-efficient and stable MARL. The authors also apply a recurrent neural network (RNN) to estimate the non-observed state and hysteretic Q-learning to address the problem of non-stationarity in Dec-POMDP. 
Han \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B78\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B78\"\u003E78\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E design a neural network architecture, IPOMDP-net, which extends the QMDP-net planning algorithm\u003Csup\u003E[\u003Ca href=\"#B79\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B79\"\u003E79\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E to MARL settings under POMDP. Besides, Mao \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B80\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B80\"\u003E80\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E introduce the concept of information state embedding to compress agents’ histories and propose an RNN model combining the state embedding. Their method, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the embed-then-learn pipeline, is universal since the embedding can be fed into any existing partially observable MARL algorithm as a black box. In the study from Mao \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B81\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B81\"\u003E81\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the proposed attention MADDPG (ATT-MADDPG) has several critic networks for various agents under POMDP. A centralized critic is adopted to collect the observations and actions of the teammate agents. Specifically, the attention mechanism is applied to enhance the centralized critic. The last work reviewed here is from Lee \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B82\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B82\"\u003E82\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. They present an augmented MARL algorithm based on pretraining to address the challenge in disaster response. 
Notably, they use behavioral cloning (BC), a supervised learning method where agents learn their policy from demonstration samples, as the approach to pretrain the neural network. BC can generate a feasible Dec-POMDP policy from demonstration samples, which offers advantages over plain MARL in terms of solution quality and computation time.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003ESome MARL algorithms also concentrate on the communication issue of POMDP. In the study from Sukhbaatar \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B83\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B83\"\u003E83\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, communication between the agents is performed for a number of rounds before their action is selected. The communication protocol is learned concurrently with the optimal policy. Foerster \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B84\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B84\"\u003E84\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E propose a deep recurrent network architecture, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, Deep Distributed Recurrent Q-network (DDRQN), to address the communication problem in a multi-agent partially-observable setting. This work makes three fundamental modifications to previous algorithms. The first is last-action inputs, which let each agent access its previous action as an input for the next time-step. The second is inter-agent weight sharing, which still allows diverse behavior between agents, as the agents receive different observations and thus evolve different hidden states. The third is disabling experience replay, because the non-stationarity of the environment renders old experiences obsolete or misleading. 
Foerster \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B84\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B84\"\u003E84\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E consider the communication task of fully cooperative, partially observable, sequential multi-agent decision-making problems. In their system model, each agent can receive a private observation and take actions that affect the environment. In addition, the agent can also communicate with its fellow agents via a discrete limited-bandwidth channel. Despite the partial observability and limited channel capacity, the authors show that two agents can discover a communication protocol that enables them to coordinate their behavior using deep recurrent Q-networks.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EWhile there are some similarities between MARL and VFRL, several important differences deserve attention, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EVFRL and some MARL algorithms are able to address similar problems, \u003Ci\u003Ee.g.\u003C\u002Fi\u003E, the issues of POMDP. However, the solution ideas of the two algorithms differ. Since VFRL is the product of applying VFL to RL, its FL component has, from the outset, focused on the aggregation of partial features, including states and rewards, observed by different agents. Security is also an essential issue in VFRL. In contrast, MARL may arise as the most natural way of adding more than one agent to an RL system\u003Csup\u003E[\u003Cxref ref-type=\"bibr\" rid=\"B85\"\u003E85\u003C\u002Fxref\u003E]\u003C\u002Fsup\u003E. In MARL, agents not only interact with the environment, but also have complex interactive relationships with other agents, which creates a great obstacle to the solution of policy optimization. 
Therefore, the original intentions of the two algorithms are different.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EThe two algorithms also differ slightly in structure. Agents in MARL always have rewards, even if some of them may not have their own local actions. However, in some cases, the agents in VFRL are not able to generate a corresponding operation policy, so in these cases, some agents have no actions and rewards\u003Csup\u003E[\u003Cxref ref-type=\"bibr\" rid=\"B65\"\u003E65\u003C\u002Fxref\u003E]\u003C\u002Fsup\u003E. Therefore, VFRL can solve more extensive problems that MARL is not capable of solving.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EBoth algorithms involve the communication problem between agents. In MARL, information such as the states of other agents and model parameters can be directly and freely propagated among agents. During communication, some MARL methods such as DDRQN in the work of Foerster \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Cxref ref-type=\"bibr\" rid=\"B84\"\u003E84\u003C\u002Fxref\u003E]\u003C\u002Fsup\u003E consider the previous action as an input for the next time-step state. Weight sharing is also allowed between agents. However, VFRL assumes states cannot be shared among agents. Since these agents do not exchange experience and data directly, VFRL focuses more on security and privacy issues of communication between agents, as well as how to process mid-products transferred by other agents and aggregate federated models.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003Cp class=\"\"\u003EIn summary, as a promising approach, VFRL has several advantages, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E,\u003C\u002Fp\u003E\u003Cul class=\"tipsDisc\"\u003E\u003Cli\u003E\u003Cp\u003EExcellent privacy protection. 
VFRL inherits the FL algorithm’s idea of data privacy protection, so when multiple agents cooperate in the same environment, information can be exchanged with confidence to enhance the learning efficiency of the RL model. In this process, each participant does not have to worry about any leakage of raw real-time data.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003Cli\u003E\u003Cp\u003EWide application scenarios. With appropriate knowledge extraction methods, including algorithm design and system modeling, VFRL can solve more real-world problems compared with MARL algorithms. This is because VFRL can incorporate agents that cannot generate rewards into the system model, so as to integrate their partial observation information of the environment based on FL while protecting privacy, train a more robust RL agent, and further improve learning efficiency.\u003C\u002Fp\u003E\u003C\u002Fli\u003E\u003C\u002Ful\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-10\" class=\"article-Section\"\u003E\u003Ch3 \u003E4.4. Other types of FRL\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EThe above HFRL or VFRL algorithms borrow ideas from FL for federation between RL agents. Meanwhile, there are also some existing works on FRL that are less influenced by FL. Hence, they do not belong to either HFRL or VFRL, but federation between agents is also implemented.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe study from Hu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B86\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B86\"\u003E86\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E is a typical example, which proposes a general FRL algorithm based on reward shaping, called federated reward shaping (FRS). It uses reward shaping to share federated information to improve policy quality and training speed. FRS adopts the server-client architecture. 
The server includes the federated model, while each client completes its own tasks based on the local model. This algorithm can be combined with different kinds of RL algorithms. However, because FRS focuses on reward shaping, it cannot be used when some agents receive no rewards, as in VFRL. In addition, FRS performs knowledge aggregation by sharing high-level information such as reward shaping value or embedding between client and server instead of sharing experience or policy directly. The convergence of FRS is also guaranteed since only minor changes are made during the learning process, which is the modification of the reward in the replay buffer.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EAs another example, Anwar \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B87\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B87\"\u003E87\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E achieve federation between agents by smoothing the average weight. This work analyzes the multi-task FRL algorithms (MT-FedRL) with adversaries. Each agent interacts with and observes only its own environment, and these environments can be characterized by different MDPs. Different from HFRL, the state and action spaces do not need to be the same in these environments. The goal of MT-FedRL is to learn a unified policy, which is jointly optimized across all of the environments. MT-FedRL adopts policy gradient methods for RL. In other words, policy parameters need to be learned to obtain the optimal policy. The server-client architecture is also applied and all agents should share their own information with a centralized server. The role of non-negative smoothing average weights is to achieve a consensus among the agents’ parameters. 
As a result, they help each agent incorporate knowledge from the other agents, which constitutes the process of federation.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec1-5\" class=\"article-Section\"\u003E\u003Ch2 \u003E5. Applications of FRL\u003C\u002Fh2\u003E\u003Cp class=\"\"\u003EIn this section, we provide an extensive discussion of the application of FRL in a variety of tasks, such as edge computing, communications, control optimization, attack detection, \u003Ci\u003Eetc.\u003C\u002Fi\u003E This section is aimed at enabling readers to understand the applicable scenarios and research status of FRL.\u003C\u002Fp\u003E\u003Cdiv id=\"sec2-11\" class=\"article-Section\"\u003E\u003Ch3 \u003E5.1. FRL for edge computing\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn recent years, edge equipment, such as BSs and road side units (RSUs), has been equipped with increasingly advanced communication, computing and storage capabilities. As a result, edge computing has been proposed to delegate more tasks to edge equipment in order to reduce the communication load and the corresponding delay. However, the issue of privacy protection remains challenging, since handing private information to a third-party edge server may be risky for the data owner\u003Csup\u003E[\u003Ca href=\"#B4\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B4\"\u003E4\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. FRL offers a potential solution for achieving privacy-protected intelligent edge computing, especially in decision-making tasks like caching and offloading. Additionally, the multi-layer processing architecture of edge computing is also suitable for implementing FRL through the server-client model. 
Therefore, many researchers have focused on applying FRL to edge computing.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe distributed data of large-scale edge computing architecture makes it possible for FRL to provide distributed intelligent solutions to achieve resource optimization at the edge. For mobile edge networks, a potential FRL framework for edge systems, named “In-Edge AI”, is presented\u003Csup\u003E[\u003Ca href=\"#B88\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B88\"\u003E88\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E to address optimization of mobile edge computing, caching, and communication problems. The authors also propose some ideas and paradigms for solving these problems by using DRL and Distributed DRL. To carry out dynamic system-level optimization and reduce the unnecessary transmission load, the “In-Edge AI” framework takes advantage of the collaboration among edge nodes to exchange learning parameters for better training and inference of models. Evaluations show that the framework achieves high performance and relatively low learning overhead, while the mobile communication system is cognitive and adaptive to the environment. The paper provides good prospects for the application of FRL to edge computing, but there are still many challenges to overcome, including the adaptive improvement of the algorithm and the time needed to train the model from scratch, \u003Ci\u003Eetc.\u003C\u002Fi\u003E\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EEdge caching has been considered a promising technique for edge computing to meet the growing demands for next-generation mobile networks and beyond. 
Addressing the adaptability and collaboration challenges of the dynamic network environment, Wang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B89\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B89\"\u003E89\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E propose a device-to-device (D2D)-assisted heterogeneous collaborative edge caching framework. User equipment (UE) in a mobile network uses the local DQN model to make node selection and cache replacement decisions based on network status and historical information. In other words, the UE decides where to fetch content and which content should be replaced in its cache list. The BS calculates aggregation weights based on the training evaluation indicators from the UEs. To solve the long-term mixed-integer linear programming problem, the attention-weighted federated deep reinforcement learning (AWFDRL) is presented, which optimizes the aggregation weights to avoid the imbalance of the local model quality and improve the learning efficiency of the DQN. The convergence of the proposed algorithm is verified and simulation results show that the AWFDRL framework can perform well on average delay, hit rate, and offload traffic.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EA federated solution for cooperative edge caching management in fog radio access networks (F-RANs) is proposed\u003Csup\u003E[\u003Ca href=\"#B90\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B90\"\u003E90\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Both edge computing and fog computing involve bringing intelligence and processing closer to where data originates. The key difference between the two architectures is where the computing node is positioned. A dueling deep Q-network based cooperative edge caching method is proposed to overcome the curse of dimensionality in the RL problem and improve caching performance. 
Agents are developed in fog access points (F-APs) and allowed to build a local caching model for optimal caching decisions based on the user content request and the popularity of content. HFRL is applied to aggregate the local models into a global model in the cloud server. The proposed method outperforms three classical content caching methods and two RL algorithms in terms of reducing content request delays and increasing cache hit rates.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EFor edge-enabled IoT, Majidi \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B91\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B91\"\u003E91\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E propose a dynamic cooperative caching method based on hierarchical federated deep reinforcement learning (HFDRL), which is used to determine which content should be cached or evicted by predicting future user requests. Edge devices that have a strong relationship are grouped into a cluster and one head is selected for this cluster. The BS trains the Q-value based local model by using BS states, content states, and request states. The head has enough processing and caching capabilities to deal with model aggregation in the cluster. By categorizing edge devices hierarchically, HFDRL improves the response delay and keeps both small and large clusters from experiencing the disadvantages they could otherwise encounter. Storage partitioning allows content to be stored in clusters at different levels using the storage space of each device. Simulation results on MovieLens datasets show that the proposed method improves the average content access delay and the hit rate.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EConsidering the low latency requirements and privacy protection issue of IoV, the study of efficient and secure caching methods has attracted many researchers. 
An FRL-empowered task caching problem in IoV has been analyzed by Zhao \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B92\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B92\"\u003E92\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. The work proposes a novel cooperative caching algorithm (CoCaRL) for vehicular networks with multi-level FRL to dynamically determine which contents should be replaced and where the content requests should be served. This paper develops a two-level aggregation mechanism for federated learning to speed up the convergence rate and reduce communication overhead, while a DRL task is employed to optimize the cooperative caching policy between RSUs of vehicular networks. Simulation results show that the proposed algorithm can achieve a high hit rate, good adaptability and fast convergence in a complex environment.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EApart from caching services, FRL has demonstrated its strong ability to facilitate resource allocation in edge computing. In the study from Zhu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B93\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B93\"\u003E93\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the authors specifically focus on the data offloading task for mobile edge computing (MEC) systems. To achieve joint collaboration, the heterogeneous multi-agent actor-critic (H-MAAC) framework is proposed, in which edge devices independently learn the interactive strategies through their own observations. The problem is formulated as a multi-agent MDP for modeling edge devices’ data allocation strategies, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, moving the data, locally executing or offloading to a cloud server. The corresponding joint cooperation algorithm that combines the edge federated model with the multi-agent actor-critic RL is also presented. 
Dual lightweight neural networks are built, comprising original actor\u002Fcritic networks and target actor\u002Fcritic networks.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EBlockchain technology has also attracted a lot of attention from researchers in edge computing fields since it is able to provide reliable data management within the massive distributed edge nodes. In the study from Yu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B94\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B94\"\u003E94\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the intelligent ultra-dense edge computing (I-UDEC) framework is proposed, integrating blockchain and RL technologies into 5G ultra-dense edge computing networks. In order to achieve low overhead computation offloading decisions and resource allocation strategies, the authors design a two-timescale deep reinforcement learning (2TS-DRL) approach, which consists of a fast-timescale and a slow-timescale learning process. The target model can be trained in a distributed manner via FL architecture, protecting the privacy of edge devices.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EAdditionally, to deal with the different types of optimization tasks, variants of FRL are being studied. Zhu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B95\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B95\"\u003E95\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E present a resource allocation method for edge computing systems, called concurrent federated reinforcement learning (CFRL). The edge node continuously receives tasks from serviced IoT devices and stores those tasks in a queue. Depending on its own resource allocation status, the node determines the scheduling strategy so that all tasks are completed as soon as possible. In case the edge host does not have enough available resources for the task, the task can be offloaded to the server. 
Contrary to the definition of the central server in basic FRL, the aim of the central server in CFRL is to complete the tasks that the edge nodes cannot handle instead of aggregating local models. Therefore, the server needs to train a special resource allocation model based on its own resource status, forwarded tasks and unique rewards. The main idea of CFRL is that edge nodes and the server cooperatively participate in all task processing in order to reduce total computing time and provide a degree of privacy protection.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-12\" class=\"article-Section\"\u003E\u003Ch3 \u003E5.2. FRL for communication networks\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn parallel with the continuous evolution of communication technology, a number of heterogeneous communication systems are also being developed to adapt to different scenarios. Many researchers are also working toward intelligent management of communication systems. The traditional ML-based management methods are often inefficient due to their centralized data processing architecture and the risk of privacy leakage\u003Csup\u003E[\u003Ca href=\"#B5\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B5\"\u003E5\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. FRL can play an important role in service slicing and access control to replace centralized ML methods.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn communication network services, network function virtualization (NFV) is a critical component of achieving scalability and flexibility. Huang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B96\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B96\"\u003E96\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E propose a novel scalable service function chains orchestration (SSCO) scheme for NFV-enabled networks via FRL. 
In this work, a federated-learning-based framework for training a global learning model, along with a time-variant local model exploration, is designed for scalable SFC orchestration. It prevents data sharing among stakeholders and enables quick convergence of the global model. To reduce communication costs, SSCO allows the parameters of local models to be updated just at the beginning and end of each episode through distributed clients and the cloud server. A DRL approach is used to map virtual network functions (VNFs) into networks with local knowledge of resources and instantiation cost. In addition, the authors also propose a loss-weight-based mechanism for generation and exploitation of reference samples for training in replay buffers, avoiding the strong relevance of each sample. Simulation results demonstrate that SSCO can significantly reduce placement errors and improve resource utilization ratios to place time-variant VNFs, as well as achieving desirable scalability.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003ENetwork slicing (NS) is also a form of virtual network architecture to support divergent requirements sustainably. The work from Liu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B97\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B97\"\u003E97\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes a device association scheme (such as access control and handover management) for radio access network (RAN) slicing by exploiting a hybrid federated deep reinforcement learning (HDRL) framework. In view of the large state-action space and variety of services, HDRL is designed with two layers of model aggregations. Horizontal aggregation deployed on BSs is used for the same type of service. Generally, data samples collected by different devices within the same service have similar features. 
The discrete-action DRL algorithm, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, DDQN, is employed to train the local model on individual smart devices. BS is able to aggregate model parameters and establish a cooperative global model. Vertical aggregation developed on the third encrypted party is responsible for the services of different types. In order to promote collaboration between devices with different tasks, authors aggregate local access features to form a global access feature, in which the data from different flows is strongly correlated since different data flows are competing for radio resources with each other. Furthermore, the Shapley value\u003Csup\u003E[\u003Ca href=\"#B98\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B98\"\u003E98\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, which represents the average marginal contribution of a specific feature across all possible feature combinations, is used to reduce communication cost in vertical aggregation based on the global access feature. Simulation results show that HDRL can improve network throughput and communication efficiency.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe open radio access network (O-RAN) has emerged as a paradigm for supporting multi-class wireless services in 5G and beyond networks. To deal with the two critical issues of load balance and handover control, Cao \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B99\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B99\"\u003E99\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes a federated DRL-based scheme to train the model for user access control in the O-RAN. Due to the mobility of UEs and the high cost of the handover between BSs, it is necessary for each UE to access the appropriate BS to optimize its throughput performance. 
As independent agents, UEs make access decisions with assistance from a global model server, which updates global DQN parameters by averaging DQN parameters of selected UEs. Furthermore, the scheme exchanges only part of the DQN parameters to reduce communication overhead, and uses the dueling structure to allow convergence for independent agents. Simulation results demonstrate that the scheme increases long-term throughput while avoiding frequent handovers of users with limited signaling overhead.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe issue of optimizing user access is important in wireless communication systems. FRL can provide interesting solutions for enabling efficient and privacy-enhanced management of access control. Zhang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B100\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B100\"\u003E100\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E studies the problem of multi-user access in Wi-Fi networks. In order to mitigate collision events on channel access, an enhanced multiple access mechanism based on FRL is proposed for user-dense scenarios. In particular, distributed stations train their local Q-learning networks using channel state, access history and feedback from the central access point (AP). The AP uses the central aggregation algorithm to update the global model periodically and broadcasts it to all stations. In addition, a Monte Carlo (MC) reward estimation method for the training phase of the local model is introduced, which allocates more weight to the reward of the current state by reducing the previous cumulative reward.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EFRL is also studied for intelligent cyber-physical systems (ICPS), which aim to meet the requirements of intelligent applications for high-precision, low-latency analysis of big data. 
In light of the heterogeneity brought by multiple agents, the centralized RL-based resource allocation scheme suffers from non-stationarity and does not address privacy. Therefore, the work from Xu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B101\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B101\"\u003E101\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes a multi-agent FRL (MA-FRL) mechanism which synthesizes a good inferential global policy from encrypted local policies of agents without revealing private information. The data resource allocation and secure communication problems are formulated as a Stackelberg game with multiple participants, including near devices (NDs), far devices (FDs) and relay devices (RDs). Taking into account the limited scope of the heterogeneous devices, the authors model this multi-agent system as a POMDP. Furthermore, the authors prove that MA-FRL is \u003Ci\u003Eµ\u003C\u002Fi\u003E-strongly convex and \u003Ci\u003Eβ\u003C\u002Fi\u003E-smooth and derive its convergence speed in expectation.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EZhang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B102\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B102\"\u003E102\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E addresses the challenges in cellular vehicle-to-everything (V2X) communication for future vehicular applications. A joint optimization problem of selecting the transmission mode and allocating the resources is presented. This paper proposes a decentralized DRL algorithm for maximizing the amount of available vehicle-to-infrastructure capacity while meeting the latency and reliability requirements of vehicle-to-vehicle (V2V) pairs. Considering limited local training data at vehicles, the federated learning algorithm is conducted on a small timescale. 
On the other hand, the graph theory-based vehicle clustering algorithm is conducted on a large timescale.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe development of communication technologies in extreme environments is important, including deep underwater exploration. The architecture and philosophy of FRL are applied to smart ocean applications in the study of Kwon\u003Csup\u003E[\u003Ca href=\"#B103\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B103\"\u003E103\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. To deal with the nonstationary environment and unreliable channels of underwater wireless networks, the authors propose a multi-agent DRL-based algorithm that can realize FL computation with internet-of-underwater-things (IoUT) devices in the ocean environment. The cooperative model is trained by MADDPG for cell association and resource allocation problems. As for downlink throughput, it is found that the proposed MADDPG-based algorithm performed 80% and 41% better than the standard actor-critic and DDPG algorithms, respectively.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-13\" class=\"article-Section\"\u003E\u003Ch3 \u003E5.3. FRL for control optimization\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EReinforcement learning based control schemes are considered as one of the most effective ways to learn a nonlinear control strategy in complex scenarios, such as robotics. Individual agent’s exploration of the environment is limited by its own field of vision and usually needs a great deal of training to obtain the optimal strategy. The FRL-based approach has emerged as an appealing way to realize control optimization without exposing agent data or compromising privacy.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EAutomated control of robots is a typical example of control optimization problems. 
Liu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B57\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B57\"\u003E57\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E discusses robot navigation scenarios and focuses on how to make robots transfer their experience so that they can make use of prior knowledge and quickly adapt to changing environments. As a solution, a cooperative learning architecture, called LFRL, is proposed for navigation in cloud robotic systems. Under the FRL-based architecture, the authors propose a corresponding knowledge fusion algorithm to upgrade the shared model deployed on the cloud. In addition, the paper also discusses the problems and feasibility of applying transfer learning algorithms to different tasks and network structures between the shared model and the local model.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EFRL is combined with autonomous driving of robotic vehicles in the study of Liang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B104\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B104\"\u003E104\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. To reach rapid training from a simulation environment to a real-world environment, Liang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B104\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B104\"\u003E104\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E presents a federated transfer reinforcement learning (FTRL) framework for knowledge extraction where all the vehicles make corresponding actions with the knowledge learned by others. The framework can potentially be used to train more powerful tasks by pooling the resources of multiple entities without revealing raw data information in real-life scenarios. 
To evaluate the feasibility of the proposed framework, the authors perform real-life experiments on steering control tasks for collision avoidance of autonomous driving robotic cars and demonstrate that the framework outperforms the non-federated local training process. Note that the framework can be considered an extension of HFRL, because the target tasks to be accomplished are highly related and all observation data are pre-aligned.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EFRL also appears as an attractive approach for enabling intelligent control of IoT devices without revealing private information. Lim \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B105\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B105\"\u003E105\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes an FRL architecture which allows agents working on independent IoT devices to share their learning experiences with each other, and transfer the policy model parameters to other agents. The aim is to effectively control multiple IoT devices of the same type but with slightly different dynamics. Whenever an agent meets the predefined criteria, its mature model will be shared by the server with all other agents in training. The agents continue training based on the shared model until the local model converges in the respective environment. The actor-critic proximal policy optimization (Actor-Critic PPO) algorithm is integrated into the control of multiple rotary inverted pendulum (RIP) devices. The results show that the proposed architecture facilitates the learning process and that the learning speed improves as more agents participate. 
In addition, Lim \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B106\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B106\"\u003E106\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E uses FRL architecture based on a multi-agent environment to solve the problems and limitations of RL for applications to the real-world problems. The proposed federation policy allows multiple agents to share their learning experiences to get better learning efficacy. The proposed scheme adopts Actor-Critic PPO algorithm for four types of RL simulation environments from OpenAI Gym as well as RIP in real control systems. Compared to a previous real-environment study, the scheme enhances learning performance by approximately 1.2 times.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-14\" class=\"article-Section\"\u003E\u003Ch3 \u003E5.4. FRL for attack detection\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EWith the heterogeneity of services and the sophistication of threats, it is challenging to detect these attacks using traditional methods or centralized ML-based methods, which have a high false alarm rate and do not take privacy into account. FRL offers a powerful alternative to detecting attacks and provides support for network defense in different scenarios.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EBecause of various constraints, IoT applications have become a primary target for malicious adversaries that can disrupt normal operations or steal confidential information. In order to address the security issues in flying ad-hoc network (FANET), Mowla \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B107\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B107\"\u003E107\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes an adaptive FRL-based jamming attack defense strategy for unmanned aerial vehicles (UAVs). 
A model-free Q-learning mechanism is developed and deployed on distributed UAVs to cooperatively learn detection models for jamming attacks. According to the results, the average accuracy of the federated jamming detection mechanism, employed in the proposed defense strategy, is 39.9% higher than the distributed mechanism when verified with the CRAWDAD standard and the ns-3 simulated FANET jamming attack dataset.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EAn efficient traffic monitoring framework, known as DeepMonitor, is presented in the study of Nguyen \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B108\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B108\"\u003E108\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E to provide fine-grained traffic analysis capability at the edge of software defined network (SDN) based IoT networks. The agents deployed in edge nodes consider the different granularity-level requirements and their maximum flow-table capacity to achieve the optimal flow rule match-field strategy. The control optimization problem is formulated as the MDP and a federated DDQN algorithm is developed to improve the learning performance of agents. The results show that the proposed monitoring framework can produce reliable traffic granularity at all levels of traffic granularity and substantially mitigate the issue of flow-table overflows. In addition, the distributed denial of service (DDoS) attack detection performance of an intrusion detection system can be enhanced by up to 22.83% by using DeepMonitor instead of FlowStat.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EIn order to reduce manufacturing costs and improve production efficiency, the industrial internet of things (IIoT) is proposed as a potentially promising research direction. It is a challenge to implement anomaly detection mechanisms in IIoT applications with data privacy protection. 
Wang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B109\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B109\"\u003E109\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes a reliable anomaly detection strategy for IIoT using FRL techniques. In the system framework, there are four entities involved in establishing the detection model, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, the Global Anomaly Detection Center (GADC), the Local Anomaly Detection Center (LADC), the Regional Anomaly Detection Center (RADC), and the users. The anomaly detection is suggested to be implemented in two phases, including anomaly detection for RADC and users. Especially, the GADC can build global RADC anomaly detection models based on local models trained by LADCs. Different from RADC anomaly detection based on action deviations, user anomaly detection is mainly concerned with privacy leakage and is employed by RADC and GADC. Note that the DDPG algorithm is applied for local anomaly detection model training.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-15\" class=\"article-Section\"\u003E\u003Ch3 \u003E5.5. FRL for other applications\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EDue to the outstanding performance of training efficiency and privacy protection, many researchers are exploring the possible applications of FRL.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EFL has been applied to realize distributed energy management in IoT applications. In the revolution of smart home, smart meters are deployed in the advanced metering infrastructure (AMI) to monitor and analyze the energy consumption of users in real-time. As an example\u003Csup\u003E[\u003Ca href=\"#B110\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B110\"\u003E110\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the FRL-based approach is proposed for the energy management of multiple smart homes with solar PVs, home appliances, and energy storage. 
Multiple local home energy management systems (LHEMSs) and a global server (GS) make up FRL architecture of the smart home. DRL agents for LHEMSs construct and upload local models to the GS by using energy consumption data. The GS updates the global model based on local models of LHEMSs using the federated stochastic gradient descent (FedSGD) algorithm. Under heterogeneous home environments, simulation results indicate that the proposed approach outperforms others when it comes to convergence speed, appliance energy consumption, and the number of agents.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EMoreover, FRL offers an alternative to share information with low latency and privacy preservation. The collaborative perception of vehicles provided by IoV can greatly enhance the ability to sense things beyond their line of sight, which is important for autonomous driving. Region quadtrees have been proposed as a storage and communication resource-saving solution for sharing perception information\u003Csup\u003E[\u003Ca href=\"#B111\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B111\"\u003E111\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. It is challenging to tailor the number and resolution of transmitted quadtree blocks to bandwidth availability. In the framework of FRL, Mohamed \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B112\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B112\"\u003E112\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E presents a quadtree-based point cloud compression mechanism to select cooperative perception messages. Specifically, over a period of time, each vehicle covered by an RSU transfers its latest network weights with the RSU, which then averages all of the received model parameters and broadcasts the result back to the vehicles. 
Optimal sensory information transmission (\u003Ci\u003Ei.e.\u003C\u002Fi\u003E, quadtree blocks) and appropriate resolution levels for a given vehicle pair are the main objectives of a vehicle. The dueling and branching concepts are also applied to overcome the vast action space inherent in the formulation of the RL problem. Simulation results show that the learned policies achieve higher vehicular satisfaction and the training process is enhanced by FRL.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-16\" class=\"article-Section\"\u003E\u003Ch3 \u003E5.6. Lessons Learned\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn the following, we summarize the major lessons learned from this survey in order to provide a comprehensive understanding of current research on FRL applications.\u003C\u002Fp\u003E\u003Cdiv id=\"sec3-4\" class=\"article-Section\"\u003E\u003Ch4 \u003E5.6.1. Lessons learned from the aggregation algorithms\u003C\u002Fh4\u003E\u003Cp class=\"\"\u003EThe existing FRL literature usually uses classical DRL algorithms, such as DQN and DDPG, at the participants, while the gradients or parameters of the critic and\u002For actor networks are periodically reported synchronously or asynchronously by the participants to the coordinator. The coordinator then aggregates the parameters or gradients and sends the updated values to the participants. In order to meet the challenges presented by different scenarios, the aggregation algorithms have been designed as a key feature of FRL. In the original FedAvg algorithm\u003Csup\u003E[\u003Ca href=\"#B12\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B12\"\u003E12\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the number of samples in a participant’s dataset determines its influence on the global model. In accordance with this idea, several papers propose different methods to calculate the weights in the aggregation algorithms according to the requirement of application. 
In the study from Lim \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B106\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B106\"\u003E106\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, the aggregation weight is derived from the average of the cumulative rewards of the last ten episodes. Greater weights are placed on the models of those participants with higher rewards. In contrast to this reward-proportional weighting, Huang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B96\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B96\"\u003E96\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E takes the error rate of action as an essential factor to assign weights for participating in the global model training. In D2D-assisted edge caching, Wang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B89\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B89\"\u003E89\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E uses the reward and some device-related indicators as the measurement to evaluate the local model’s contribution to the global model. Moreover, the existing FRL methods based on off-policy DRL algorithms, such as DQN and DDPG, usually use experience replay. Sampling random batches from the replay memory can break correlations between consecutive transition tuples and accelerate the training process. To arrive at an accurate evaluation of the participants, the paper\u003Csup\u003E[\u003Ca href=\"#B102\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B102\"\u003E102\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E calculates the aggregation weight based on the size of the training batch in each iteration.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EThe above aggregation methods can effectively deal with the issue of data imbalance and performance discrepancy between participants, but it is hard for participants to cope with subtle environmental differences. 
According to the paper\u003Csup\u003E[\u003Ca href=\"#B105\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B105\"\u003E105\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, as soon as a participant meets the predefined criteria in its own environment, it should stop learning and send its model parameters as a reference to the remaining participants. Exchanging mature network models (satisfying terminal conditions) can help other participants complete their training quickly. Participants in other similar environments can continue to use FRL for further updating their parameters to achieve the desired model performance according to their individual environments. Liu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B57\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B57\"\u003E57\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E also suggests that the shared global model in the cloud is not the final policy model for local participants. An effective transfer learning method should be applied to resolve the structural difference between the shared network and private network.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec3-5\" class=\"article-Section\"\u003E\u003Ch4 \u003E5.6.2. Lessons learned from the relationship between FL and RL\u003C\u002Fh4\u003E\u003Cp class=\"\"\u003EIn most of the literature on FRL, FL is used to improve the performance of RL. With FL, the learning experience can be shared among multiple decentralized parties while ensuring privacy and scalability without requiring direct data offloading to servers or third parties. Therefore, FL can expand the scope and enhance the security of RL. Among the applications of FRL, most researchers focus on the communication network system due to its stringent security requirements, advanced distributed architecture, and variety of decision-making tasks. 
Data offloading\u003Csup\u003E[\u003Ca href=\"#B93\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B93\"\u003E93\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E and caching\u003Csup\u003E[\u003Ca href=\"#B89\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B89\"\u003E89\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E solutions powered by distributed AI are available from FRL. In addition, with the ability to detect a wide range of attacks and support defense solutions, FRL has emerged as a strong alternative for performing distributed learning for security-sensitive scenarios. Enabled by the privacy-enhancing and cooperative features, detection and defense solutions can be learned quickly where multiple participants join to build a federated model \u003Csup\u003E[\u003Ca href=\"#B107\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B107\"\u003E107\u003C\u002Fa\u003E,\u003Ca href=\"#B109\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B109\"\u003E109\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. FRL can also provide viable solutions to realize intelligence for control systems with many applied domains such as robotics\u003Csup\u003E[\u003Ca href=\"#B57\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B57\"\u003E57\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E and autonomous driving\u003Csup\u003E[\u003Ca href=\"#B104\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B104\"\u003E104\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E without data exchange and privacy leakage. The data owners (robot or vehicle) may not trust the third-party server and therefore hesitate to upload their private information to potentially insecure learning systems. 
Each participant of FRL runs a separate RL model for determining its own control policy and gains experience by sharing model parameters, gradients or losses.\u003C\u002Fp\u003E\u003Cp class=\"\"\u003EMeanwhile, RL may have the potential to optimize FL schemes and improve the efficiency of training. Due to the unstable network connectivity, it is not practical for FL to update and aggregate models simultaneously across all participants. Therefore, Wang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B113\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B113\"\u003E113\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes a RL-based control framework that intelligently chooses the participants to participate in each round of FL with the aim to speed up convergence. Similarly, Zhang \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B114\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B114\"\u003E114\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E applies RL to pre-select a set of candidate edge participants, and then determine reliable edge participants through social attribute perception. In IoT or IoV scenarios, due to the heterogeneous nature of participating devices, different computing and communication resources are available to them. RL can speed up training by coordinating the allocation of resources between participants. Zhan \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B115\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B115\"\u003E115\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E defines the L4L (Learning for Learning) concept, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, use RL to improve FL. Using the heterogeneity of participants and dynamic network connections, this paper investigates a computational resource control problem for FL that simultaneously considers learning time and energy efficiency. 
An experience-driven resource control approach based on RL is presented to derive the near-optimal strategy with only the participants’ bandwidth information in the previous training rounds. In addition, as with any other ML algorithm, FL algorithms are vulnerable to malicious attacks. RL has been studied to defend against attacks in various scenarios, and it can also enhance the security of FL. The paper\u003Csup\u003E[\u003Ca href=\"#B116\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B116\"\u003E116\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E proposes a reputation-aware RL (RA-RL) based selection method to ensure that FL is not disrupted. The participating devices’ attributes, including computing resources and trust values, \u003Ci\u003Eetc\u003C\u002Fi\u003E, are used as part of the environment in RL. In the aggregation of the global model, devices with high reputation levels will have a greater chance of being considered to reduce the effects of malicious devices mixed into FL.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec3-6\" class=\"article-Section\"\u003E\u003Ch4 \u003E5.6.3. Lessons learned from categories of FRL\u003C\u002Fh4\u003E\u003Cp class=\"\"\u003EAs discussed above, FRL can be divided into two main categories, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, HFRL and VFRL. Currently, most of the existing research is focused on HFRL, while little attention is devoted to VFRL. 
The reason for this is that HFRL has obvious application scenarios, where multiple participants have similar decision-making tasks with individual environments, such as caching allocation\u003Csup\u003E[\u003Ca href=\"#B59\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B59\"\u003E59\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, offloading optimization\u003Csup\u003E[\u003Ca href=\"#B58\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B58\"\u003E58\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, and attack monitoring\u003Csup\u003E[\u003Ca href=\"#B108\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B108\"\u003E108\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. The participants and coordinator only need to train a similar model with the same state and action spaces. Consequently, the algorithm design can be implemented and the training convergence can be verified relatively easily. On the other hand, even though VFRL has a higher degree of technical difficulty at the algorithm design level, it also has a wide range of possible applications. In a multi-agent scenario, for example, a single agent is limited by its ability to observe only part of the environment, whereas the transition of the environment is determined by the behavior of all the agents. Zhuo \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B65\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B65\"\u003E65\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E assumes agents cannot share their partial observations of the environment and some agents are unable to receive rewards. The federated Q-network aggregation algorithm between two agents is proposed for VFRL. The paper\u003Csup\u003E[\u003Ca href=\"#B97\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B97\"\u003E97\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E specifically applies both HFRL and VFRL for radio access network slicing. 
For the same type of services, similar data samples are trained locally at participating devices, and BSs perform horizontal aggregation to integrate a cooperative access model by adopting an iterative approach. The terminal device also can optimize the selection of base stations and network slices based on the global model of VFRL, which aggregates access features generated by different types of services on the third encrypted party. The method improves the device’s ability to select the appropriate access points when initiating different types of service requests under restrictions regarding privacy protection. The feasible implementation of VFRL also provides guidance for future research.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec1-6\" class=\"article-Section\"\u003E\u003Ch2 \u003E6. Open issues and future research directions\u003C\u002Fh2\u003E\u003Cp class=\"\"\u003EAs we presented in the previous section, FRL serves an increasingly important role as an enabler of various applications. While the FRL-based approach possesses many advantages, there are a number of critical open issues to consider for future implementation. Therefore, this section focuses on several key challenges, including those inherited from FL such as security and communication issues, as well as those unique to FRL. Research on tackling these issues offers interesting directions for the future.\u003C\u002Fp\u003E\u003Cdiv id=\"sec2-17\" class=\"article-Section\"\u003E\u003Ch3 \u003E6.1. Learning convergence in HFRL\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn realistic HFRL scenarios, while the agents perform similar tasks, the inherent dynamics for the different environments in which the agents reside are usually not exactly identically distributed. The slight difference in the stochastic properties of the transition models for multiple agents could cause the learning convergence issue. 
One possible method to address this problem is to adjust the frequency of global aggregation, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, after each global aggregation, a period of time is left for each agent to fine-tune its local parameters according to its own environment. Apart from the non-identical environment problem, another interesting and important problem is how to leverage FL to make RL algorithms converge better and faster. It is well known that DRL algorithms can be unstable and diverge, especially when off-policy training is combined with function approximation and bootstrapping. In FRL, the training curves of some agents could diverge while others converge, even though the agents are trained in exact replicas of the same environment. By leveraging FL, it is envisioned that we could expedite the training process as well as increase the stability. For example, we could selectively aggregate the parameters of a subset of agents with a larger potential for convergence, and later transfer the converged parameters to all the agents. To tackle the above problems, several solutions proposed for FL algorithms offer useful reference points. For example, server operators could account for heterogeneity inherent in partial information by adding a proximal term\u003Csup\u003E[\u003Ca href=\"#B117\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B117\"\u003E117\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. The local updates submitted by agents are constrained by the tunable term and have a different effect on the global parameters. In addition, a probabilistic agent selection scheme can be implemented to select the agents whose local FL models have significant effects on the global model to minimize the FL convergence time and the FL training loss\u003Csup\u003E[\u003Ca href=\"#B118\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B118\"\u003E118\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. 
Another problem is the theoretical analysis of convergence bounds. Although some existing studies have addressed this problem\u003Csup\u003E[\u003Ca href=\"#B119\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B119\"\u003E119\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, their convergence guarantees rely on the loss function being convex. How to analyze and evaluate the non-convex loss functions in HFRL is also an important topic for future research.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-18\" class=\"article-Section\"\u003E\u003Ch3 \u003E6.2. Agents without rewards in VFRL\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn most existing works, all the RL agents have the ability to take part in full interaction with the environment and can generate their own actions and rewards. Even though some MARL agents may not participate in the policy decision, they still generate their own reward for evaluation. In some scenarios, special agents in VFRL take the role of providing assistance to other agents. They can only observe the environment and pass on the knowledge of their observation, so as to help other agents make more effective decisions. Therefore, such agents do not have their own actions and rewards. The traditional RL models cannot effectively deal with this thorny problem. Many algorithms either directly use the states of such agents as public knowledge in the system model or design corresponding actions and rewards for such agents, which may serve only computational convenience and have no practical significance. These approaches cannot fundamentally overcome the challenge, especially when privacy protection is also an essential objective. 
Although the FedRL algorithm\u003Csup\u003E[\u003Ca href=\"#B65\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B65\"\u003E65\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, which was proposed to deal with the above problem, has demonstrated good performance, some limitations remain. First of all, the number of agents used in experiments and algorithms is limited to two, which means the scalability of this algorithm is not high and VFRL algorithms for a large number of agents need to be designed. Secondly, this algorithm uses a Q-network as the federated model, which is a relatively simple model. Therefore, how to design VFRL models based on other more complex and changeable networks remains an open issue.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-19\" class=\"article-Section\"\u003E\u003Ch3 \u003E6.3. Communications\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EIn FRL, the agents need to exchange the model parameters, gradients, intermediate results, etc., between themselves or with a central server. Due to the limited communication resources and battery capacity, the communication cost is an important consideration when implementing these applications. With an increased number of participants, the coordinator has to bear more network workload within the client-server FRL model\u003Csup\u003E[\u003Ca href=\"#B120\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B120\"\u003E120\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. This is because each participant needs to upload and download model updates through the coordinator. Although the distributed peer-to-peer model does not require a central coordinator, each agent may have to exchange information with other participants more frequently. In current research for distributed models, there are no effective model exchange protocols to determine when to share experiences with which agents. In addition, DRL involves updating parameters in deep neural networks. 
Several popular DRL algorithms, such as DQN\u003Csup\u003E[\u003Ca href=\"#B121\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B121\"\u003E121\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E and DDPG\u003Csup\u003E[\u003Ca href=\"#B122\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B122\"\u003E122\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, consist of multiple layers or multiple networks. Model updates can contain millions of parameters, and transmitting them is not feasible in scenarios with limited communication resources. The research directions for the above issues can be divided into three categories. First, it is necessary to design a dynamic update mechanism for participants to optimize the number of model exchanges. A second research direction is to use model compression algorithms to reduce the amount of communication data. Finally, aggregation algorithms that allow participants to only submit the important parts of the local model should be studied further.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-20\" class=\"article-Section\"\u003E\u003Ch3 \u003E6.4. Privacy and Security\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EAlthough FL provides privacy protection that allows the agents to exchange information in a secure manner during the learning process, it still has several privacy and security vulnerabilities associated with communication and attack\u003Csup\u003E[\u003Ca href=\"#B123\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B123\"\u003E123\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. As FRL is implemented based on FL algorithms, these problems also exist in FRL in the same or variant form. It is important to note that data poisoning attacks take different forms in FL and FRL. In the existing classification tasks of FL, each piece of training data in the dataset corresponds to a respective label. 
The attacker flips the labels on training examples in one category onto another while the features of the examples are kept unchanged, misguiding the establishment of a target model\u003Csup\u003E[\u003Ca href=\"#B124\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B124\"\u003E124\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. However, in the decision-making task of FRL, the training data is continuously generated from the interaction between the agent and the environment. As a result, the data poisoning attack is implemented in another way. For example, the malicious agent tampers with the reward, which causes the evaluative function to shift. One countermeasure is to conduct regular safety assessments for all participants. Participants whose evaluation indicator falls below the threshold are punished to reduce the impact on the global model\u003Csup\u003E[\u003Ca href=\"#B125\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B125\"\u003E125\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Apart from the insider attacks launched by agents in the FRL system, there may be various outsider attacks launched by intruders or eavesdroppers. Intruders may hide in the environment where the agent is and manipulate the transitions of the environment to achieve specific goals. In addition, by listening to the communication between the coordinator and the agent, the eavesdropper may infer sensitive information from the exchanged parameters and gradients\u003Csup\u003E[\u003Ca href=\"#B126\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B126\"\u003E126\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Therefore, the development of technology that detects and protects against attacks and privacy threats has great potential and is urgently needed.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-21\" class=\"article-Section\"\u003E\u003Ch3 \u003E6.5. 
Join and exit mechanisms design\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EOne overlooked aspect of FRL-based research is the join and exit process of participants. In practice, the management of participants is essential to the normal progression of cooperation. As mentioned earlier in the security issue, the penetration of malicious participants severely impacts the performance of the cooperative model and the speed of training. The joining mechanism provides participants with the legal status to engage in federated cooperation. It is the first line of defense against malicious attackers. In contrast, the exit mechanism signifies the cancellation of the permission for cooperation. Participant-driven or enforced exit mechanisms are both possible. In particular, for synchronous algorithms, ignoring the exit mechanism can negatively impact learning efficiency. This is because the coordinator needs to wait for all participants to submit their information. In the event that any participant is offline or compromised and unable to upload, the time for one round of training will be increased indefinitely. To address the bottleneck, a few studies consider updating the global model using the selected models from a subset of participants \u003Csup\u003E[\u003Ca href=\"#B113\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B113\"\u003E113\u003C\u002Fa\u003E,\u003Ca href=\"#B127\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B127\"\u003E127\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E. Unfortunately, there is no comprehensive consideration of the exit mechanism, and the communication of participants is typically assumed to be reliable. Therefore, research gaps of FRL still exist in joining and exiting mechanisms. 
It is expected that the coordinator or monitoring system, upon discovering a failure, disconnection, or malicious participant, will use the exit mechanism to reduce its impact on the global model or even eliminate it.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-22\" class=\"article-Section\"\u003E\u003Ch3 \u003E6.6. Incentive mechanisms\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EFor most studies, the agents taking part in the FRL process are assumed to be honest and voluntary. Each agent provides assistance for the establishment of the cooperation model following the rules and freely shares the masked experience through encrypted parameters or gradients. An agent’s motivation for participation may come from regulation or incentive mechanisms. The FRL process within an organization is usually governed by regulations. For example, BSs belonging to the same company establish a joint model for offloading and caching. Nevertheless, because participants may be members of different organizations or use disparate equipment, it is difficult for regulation to force all parties to share information learned from their own data in the same manner. If there are no regulatory measures, participants prone to selfish behavior will only benefit from the cooperation model but not submit local updates. Therefore, the cooperation of multiple parties, organizations, or individuals requires a fair and efficient incentive mechanism to encourage their active participation. In this way, agents providing more contributions can benefit more and selfish agents unwilling to share their learning experience will receive less benefit. As an example, Google Keyboard\u003Csup\u003E[\u003Ca href=\"#B128\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B128\"\u003E128\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E users can choose whether or not to allow Google to use their data, but if they do, they can benefit from more accurate word prediction. 
Although a context-aware incentive mechanism among data owners is proposed in the study by Yu \u003Ci\u003Eet al.\u003C\u002Fi\u003E\u003Csup\u003E[\u003Ca href=\"#B129\" class=\"Link_style\" data-jats-ref-type=\"bibr\" data-jats-rid=\"B129\"\u003E129\u003C\u002Fa\u003E]\u003C\u002Fsup\u003E, it is not suitable for RL problems. There is still no clear plan of action regarding how an FRL-based application can be designed to create a reasonable incentive mechanism for inspiring agents to participate in collaborative learning. To be successful, future research needs to propose a quantitative standard for evaluating the contribution of agents in FRL.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec2-23\" class=\"article-Section\"\u003E\u003Ch3 \u003E6.7. Peer-to-peer cooperation\u003C\u002Fh3\u003E\u003Cp class=\"\"\u003EFRL applications can choose between a centralized client-server model and a distributed peer-to-peer model. A distributed model can not only eliminate the single point of failure, but it can also improve energy efficiency significantly by allowing models to be exchanged directly between two agents. In a typical application, two adjacent cars use D2D communications to share, in the form of models, the experience learned from the road conditions in order to assist autonomous driving. However, the distributed cooperation increases the complexity of the learning system and imposes stricter requirements for application scenarios. Future research on this topic should include, but not be limited to, the agent selection method for the exchange model, the mechanism for triggering the model exchange, the improvement of algorithm adaptability, and the convergence analysis of the aggregation algorithm.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003C\u002Fdiv\u003E\u003Cdiv id=\"sec1-7\" class=\"article-Section\"\u003E\u003Ch2 \u003E7. 
Conclusion\u003C\u002Fh2\u003E\u003Cp class=\"\"\u003EAs a new and promising branch of RL, FRL can make learning safer and more efficient while leveraging the benefits of FL. We have discussed the basic definitions of FL and RL as well as our thoughts on their integration in this paper. In general, FRL algorithms can be classified into two categories, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, HFRL and VFRL. The definitions and general frameworks of these two categories have been given. Specifically, we have highlighted the difference between HFRL and VFRL. Then, many existing FRL schemes have been summarized and analyzed according to different applications. Finally, the potential challenges in the development of FRL algorithms have been explored. Several open issues of FRL have been identified, which will encourage more efforts toward further research in FRL.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E\u003Cdiv class=\"article-Section article-declarations\"\u003E\u003Ch2\u003EDeclarations\u003C\u002Fh2\u003E\u003Ch3\u003EAuthors’ contributions\u003C\u002Fh3\u003E\u003Cp\u003EMade substantial contributions to the research and investigation process, reviewed and summarized the literature, wrote and edited the original draft: Qi J, Zhou Q\u003C\u002Fp\u003E\u003Cp\u003EPerformed oversight and leadership responsibility for the research activity planning and execution, as well as developed ideas and evolution of overarching research aims: Lei L\u003C\u002Fp\u003E\u003Cp\u003EPerformed critical review, commentary and revision, as well as provided administrative, technical, and material support: Zheng K\u003C\u002Fp\u003E\u003Ch3\u003EAvailability of data and materials\u003C\u002Fh3\u003E\u003Cp\u003ENot applicable.\u003C\u002Fp\u003E\u003Ch3\u003EFinancial support and sponsorship\u003C\u002Fh3\u003E\u003Cp\u003EThis work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada (Discovery Grant No. 
401718) and the CARE-AI Seed Fund at the University of Guelph.\u003C\u002Fp\u003E\u003Ch3\u003EConflicts of interest\u003C\u002Fh3\u003E\u003Cp\u003EThe authors declared that there are no conflicts of interest.\u003C\u002Fp\u003E\u003Ch3\u003EEthical approval and consent to participate\u003C\u002Fh3\u003E\u003Cp\u003ENot applicable.\u003C\u002Fp\u003E\u003Ch3\u003EConsent for publication\u003C\u002Fh3\u003E\u003Cp\u003ENot applicable.\u003C\u002Fp\u003E\u003Ch3\u003ECopyright\u003C\u002Fh3\u003E\u003Cp\u003E© The Author(s) 2021.\u003C\u002Fp\u003E\u003C\u002Fdiv\u003E",translate:[{language:"en",new_title:aD,new_abstract:cz,new_keywords:"Federated Learning, Reinforcement Learning, Federated Reinforcement Learning",is_check:m},{language:"cn",new_title:"联邦强化学习:技术、应用和挑战",new_abstract:"本文介绍了联邦强化学习(FRL)的全面调查,这是强化学习(RL)领域中新兴且有前景的领域。我们从联邦学习(FL)和RL的教程开始,然后专注于介绍FRL作为一种新方法,通过利用FL的基本思想来提高RL的性能,同时保护数据隐私。根据框架中代理的分布特征,FRL算法可以分为两类,即水平联合强化学习和垂直联合强化学习(VFRL)。我们通过公式提供了每个类别的详细定义,从技术角度探讨了FRL的演变,并强调了其相对于先前RL算法的优势。此外,我们按应用领域总结了FRL的现有研究成果,包括边缘计算、通信、控制优化和攻击检测。最后,我们描述并讨论了几个对解决FRL中存在的问题至关重要的研究方向。",new_keywords:"联邦学习,强化学习,联邦强化学习",is_check:h},{language:"de",new_title:"Federiertes Verstärkungslernen: Techniken, Anwendungen und offene Herausforderungen",new_abstract:"Dieses Papier präsentiert eine umfassende Untersuchung des föderierten Verstärkungslernens (FRL), einem aufstrebenden und vielversprechenden Bereich im Verstärkungslernen (RL). Wir beginnen mit einem Tutorial über föderiertes Lernen (FL) und RL und konzentrieren uns dann auf die Einführung von FRL als eine neue Methode mit großem Potenzial, indem wir die Grundidee von FL nutzen, um die Leistung von RL zu verbessern, während die Datenprivatsphäre erhalten bleibt. Basierend auf den Verteilungsmerkmalen der Agenten im Rahmen können FRL-Algorithmen in zwei Kategorien unterteilt werden, nämlich Horizontal Federated Reinforcement Learning und vertikales föderiertes Verstärkungslernen (VFRL). 
Wir geben die detaillierten Definitionen jeder Kategorie durch Formeln an, untersuchen die Evolution von FRL aus technischer Sicht und heben ihre Vorteile gegenüber früheren RL-Algorithmen hervor. Darüber hinaus werden die bestehenden Arbeiten zu FRL nach Anwendungsbereichen zusammengefasst, einschließlich Edge Computing, Kommunikation, Steuerungsoptimierung und Angriffserkennung. Schließlich beschreiben und diskutieren wir mehrere wichtige Forschungsrichtungen, die entscheidend für die Lösung der offenen Probleme innerhalb von FRL sind.",new_keywords:"Federated Learning, Verstärkendes Lernen, Föderiertes Verstärkendes Lernen",is_check:h},{language:"fa",new_title:"Apprentissage par renforcement fédéré : techniques, applications et défis ouverts",new_abstract:"Cet article présente une enquête approfondie sur l'apprentissage par renforcement fédéré (FRL), un domaine émergent et prometteur dans l'apprentissage par renforcement (RL). En commençant par un tutoriel sur l'apprentissage fédéré (FL) et le RL, nous nous concentrons ensuite sur l'introduction du FRL en tant que nouvelle méthode avec un grand potentiel en exploitant l'idée de base du FL pour améliorer les performances du RL tout en préservant la confidentialité des données. Selon les caractéristiques de distribution des agents dans le cadre, les algorithmes FRL peuvent être divisés en deux catégories, à savoir l'apprentissage par renforcement fédéré horizontal et l'apprentissage par renforcement fédéré vertical (VFRL). Nous fournissons les définitions détaillées de chaque catégorie par des formules, étudions l'évolution du FRL d'un point de vue technique, et soulignons ses avantages par rapport aux algorithmes RL précédents. De plus, les travaux existants sur le FRL sont résumés par domaines d'application, incluant le traitement en périphérie, la communication, l'optimisation de contrôle, et la détection d'attaques. 
Enfin, nous décrivons et discutons plusieurs directions de recherche clés cruciales pour résoudre les problèmes ouverts au sein du FRL.",new_keywords:"L'apprentissage fédéré, l'apprentissage par renforcement, l'apprentissage fédéré par renforcement.",is_check:h},{language:"jp",new_title:"フェデレーテッド強化学習:技術、アプリケーション、そしてオープンな課題",new_abstract:"この論文では、強化学習(RL)の新興で有望な分野であるフェデレーテッド強化学習(FRL)の包括的な調査を行います。フェデレーテッドラーニング(FL)とRLのチュートリアルから始め、基本的なFLのアイデアを活用してRLのパフォーマンスを改善し、データプライバシーを保持しながら高い潜在能力を持つ新しいメソッドであるFRLの紹介に焦点を当てます。枠組み内のエージェントの分布特性に基づいて、FRLアルゴリズムは、水平フェデレーテッド強化学習と垂直フェデレーテッド強化学習(VFRL)の2つのカテゴリに分類されます。各カテゴリの詳細な定義を式で示し、技術的な観点からFRLの進化を調査し、以前のRLアルゴリズムに比してその利点を強調します。さらに、FRLに関する既存の研究は、エッジコンピューティング、通信、制御最適化、攻撃検出などの応用分野についてまとめられています。最後に、FRL内の未解決の問題を解決するために重要ないくつかの研究方向を説明し、議論します。",new_keywords:"フェデレーテッドラーニング、強化学習、フェデレーテッド強化学習",is_check:h},{language:"py",new_title:"Федеративное обучение с подкреплением: техники, приложения и открытые задачи.",new_abstract:"Эта статья представляет собой всесторонний обзор федеративного обучения с подкреплением (FRL), восходящего и многообещающего направления в обучении с подкреплением (RL). Начиная с учебника федеративного обучения (FL) и RL, мы затем сосредотачиваемся на введении FRL как нового метода с большим потенциалом путем использования основной идеи FL для улучшения производительности RL, сохраняя при этом конфиденциальность данных. Согласно распределительным характеристикам агентов в рамках, алгоритмы FRL можно разделить на две категории, т. е. Горизонтальное федеративное обучение с подкреплением и вертикальное федеративное обучение с подкреплением (VFRL). Мы предоставляем подробные определения каждой категории формулами, исследуем эволюцию FRL с технической точки зрения, и выделяем его преимущества по сравнению с предыдущими алгоритмами RL. Кроме того, существующие работы по FRL резюмируются по областям применения, включая вычисления на краю, коммуникации, оптимизацию управления и обнаружение атак. 
Наконец, мы описываем и обсуждаем несколько ключевых научных направлений, которые важны для решения открытых проблем в FRL.",new_keywords:"Федеративное обучение, обучение с подкреплением, федеративное обучение с подкреплением.",is_check:h},{language:"sk",new_title:"페데레이티드 강화 학습: 기술, 응용 및 오픈 챌린지",new_abstract:"이 논문은 강화 학습 (RL)의 신흥 분야 인 연합 강화 학습 (FRL)의 포괄적인 조사를 제공합니다. 연합 학습 (FL) 및 RL의 자습서로 시작하여 기본 FL의 개념을 활용하여 RL의 성능을 향상시키고 데이터 개인 정보 보호를 유지하면서 큰 잠재력을 가진 새로운 방법인 FRL의 소개에 초점을 맞춥니다. 프레임워크의 에이전트의 분포 특성에 따라 FRL 알고리즘을 수평 연합 강화 학습과 수직 연합 강화 학습 (VFRL) 두 가지 범주로 나눌 수 있습니다. 우리는 각 범주의 세부 정의를 수식으로 제공하고 기술적으로 FRL의 진화를 조사하고 이전 RL 알고리즘보다 그 장점을 강조합니다. 게다가, 연합 강화 학습에 대한 기존 연구들이 엣지 컴퓨팅, 통신, 제어 최적화 및 공격 탐지를 포함한 응용 분야별로 요약됩니다. 마지막으로, FRL 내의 미해결 문제를 해결하는 데 중요한 여러 핵심 연구 방향을 설명하고 논의합니다.",new_keywords:"연합 학습, 강화 학습, 연합 강화 학습",is_check:h},{language:"it",new_title:"Apprendimento federato con rinforzo: tecniche, applicazioni e sfide aperte",new_abstract:"Questo articolo presenta un'ampia panoramica del reinforcement learning federato (FRL), un campo emergente e promettente nell'apprendimento per rinforzo (RL). Iniziando con un tutorial sull'apprendimento federato (FL) e RL, ci concentriamo quindi sull'introduzione del FRL come nuovo metodo con grandi potenzialità, sfruttando l'idea di base del FL per migliorare le prestazioni del RL preservando al contempo la privacy dei dati. In base alle caratteristiche di distribuzione degli agenti nel framework, gli algoritmi FRL possono essere divisi in due categorie, ossia Horizontal Federated Reinforcement Learning e vertical federated reinforcement learning (VFRL). Forniamo le definizioni dettagliate di ciascuna categoria attraverso formule, indaghiamo sull'evoluzione del FRL da un punto di vista tecnico e evidenziamo i suoi vantaggi rispetto agli algoritmi RL precedenti. Inoltre, vengono riassunti i lavori esistenti sul FRL per settori applicativi, tra cui edge computing, comunicazioni, ottimizzazione del controllo e rilevamento degli attacchi. 
Infine, descriviamo e discutiamo diverse direzioni di ricerca chiave cruciali per risolvere i problemi aperti all'interno del FRL.",new_keywords:"Apprendimento federato, Apprendimento per rinforzo, Apprendimento federato per rinforzo",is_check:h},{language:"fs",new_title:"Aprendizaje por refuerzo federado: técnicas, aplicaciones y desafíos abiertos",new_abstract:"Este documento presenta una revisión exhaustiva del aprendizaje por refuerzo federado (FRL), un campo emergente y prometedor en el aprendizaje por refuerzo (RL). Comenzando con un tutorial de aprendizaje federado (FL) y RL, nos enfocamos en la introducción de FRL como un nuevo método con un gran potencial al aprovechar la idea básica de FL para mejorar el rendimiento de RL mientras se preserva la privacidad de los datos. Según las características de distribución de los agentes en el marco de trabajo, los algoritmos de FRL se pueden dividir en dos categorías, es decir, Aprendizaje por Refuerzo Federado Horizontal y Aprendizaje por Refuerzo Federado Vertical (VFRL). Proporcionamos las definiciones detalladas de cada categoría mediante fórmulas, investigamos la evolución de FRL desde una perspectiva técnica y destacamos sus ventajas sobre los algoritmos de RL anteriores. Además, se resumen los trabajos existentes sobre FRL por campos de aplicación, incluyendo la informática de borde, la comunicación, la optimización del control y la detección de ataques. Finalmente, describimos y discutimos varias direcciones de investigación clave que son cruciales para resolver los problemas abiertos dentro de FRL.",new_keywords:"Aprendizaje federado, Aprendizaje por refuerzo, Aprendizaje federado por refuerzo.",is_check:h},{language:"po",new_title:"Aprendizado por reforço federado: técnicas, aplicações e desafios abertos.",new_abstract:"Este artigo apresenta uma revisão abrangente da aprendizagem por reforço federada (FRL), um campo emergente e promissor na aprendizagem por reforço (RL). 
Começando com um tutorial de aprendizagem federada (FL) e RL, nós nos concentramos na introdução da FRL como um novo método com grande potencial, aproveitando a ideia básica da FL para melhorar o desempenho da RL enquanto preserva a privacidade dos dados. De acordo com as características de distribuição dos agentes no framework, os algoritmos FRL podem ser divididos em duas categorias, ou seja, Aprendizagem por Reforço Federada Horizontal e aprendizagem por reforço federada vertical (VFRL). Fornecemos definições detalhadas de cada categoria por meio de fórmulas, investigamos a evolução da FRL de uma perspectiva técnica e destacamos suas vantagens sobre os algoritmos RL anteriores. Além disso, os trabalhos existentes sobre FRL são resumidos por campos de aplicação, incluindo computação de borda, comunicação, otimização de controle e detecção de ataques. Finalmente, descrevemos e discutimos várias direções de pesquisa chave que são cruciais para resolver os problemas em aberto dentro da FRL.",new_keywords:"Aprendizagem Federada, Aprendizagem por Reforço, Aprendizagem Federada por Reforço",is_check:h}]},ArtDataF:[{id:1786980,article_id:b,reference_num:h,reference:"Nair A, Srinivasan P, Blackwell S, et al. Massively parallel methods for deep reinforcement learning. CoRR 2015;abs\u002F1507.04296. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1507.04296\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1507.04296.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1786981,article_id:b,reference_num:m,reference:"Grounds M, Kudenko D. Parallel reinforcement learning with linear function approximation. In: Tuyls K, Nowe A, Guessoum Z, Kudenko D, editors. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. pp. 
60-74.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1007\u002F978-3-540-77949-0_5",pubmed:a,pmc:a},{id:1786982,article_id:b,reference_num:s,reference:"Clemente AV, Martínez HNC, Chandra A. Efficient parallel methods for deep reinforcement learning. CoRR 2017;abs\u002F1705.04862. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1705.04862\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1705.04862.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1786983,article_id:b,reference_num:u,reference:"Lim WYB, Luong NC, Hoang DT, et al. Federated learning in mobile edge networks: a comprehensive survey. \u003Ci\u003EIEEE Communications Surveys Tutorials\u003C\u002Fi\u003E 2020;22:2031-63.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FCOMST.2020.2986024",pubmed:a,pmc:a},{id:1786984,article_id:b,reference_num:v,reference:"Nguyen DC, Ding M, Pathirana PN, et al. Federated learning for internet of things: a comprehensive survey. \u003Ci\u003EIEEE Communications Surveys Tutorials\u003C\u002Fi\u003E 2021;23:1622-58.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FCOMST.2021.3075439",pubmed:a,pmc:a},{id:1786985,article_id:b,reference_num:x,reference:"Khan LU, Saad W, Han Z, Hossain E, Hong CS. Federated learning for internet of things: recent advances, taxonomy, and open challenges. \u003Ci\u003EIEEE Communications Surveys Tutorials\u003C\u002Fi\u003E 2021;23:1759-99.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FCOMST.2021.3090430",pubmed:a,pmc:a},{id:1786986,article_id:b,reference_num:aM,reference:"Yang Q, Liu Y, Cheng Y, et al. 1st ed. Morgan & Claypool; 2019.",refdoi:a,pubmed:a,pmc:a},{id:1786987,article_id:b,reference_num:aN,reference:"Yang Q, Liu Y, Chen T, Tong Y. Federated machine learning: concept and applications. 
\u003Ci\u003EACM Transactions on Intelligent Systems and Technology (TIST)\u003C\u002Fi\u003E 2019;10:1-19.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1145\u002F3298981",pubmed:a,pmc:a},{id:1786988,article_id:b,reference_num:aO,reference:"Qinbin L, Zeyi W, Bingsheng H. Federated learning systems: vision, hype and reality for data privacy and protection. CoRR 2019;abs\u002F1907.09693. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1907.09693\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1907.09693.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1786989,article_id:b,reference_num:bV,reference:"Li T, Sahu AK, Talwalkar A, Smith V. Federated learning: challenges, methods, and future directions. \u003Ci\u003EIEEE Signal Processing Magazine\u003C\u002Fi\u003E 2020;37:50-60.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FMSP.2020.2975749",pubmed:a,pmc:a},{id:1786990,article_id:b,reference_num:aP,reference:"Wang S, Tuor T, Salonidis T, Leung KK, Makaya C, et al. Adaptive federated learning in resource constrained edge computing systems. \u003Ci\u003EIEEE Journal on Selected Areas in Communications\u003C\u002Fi\u003E 2019;37:1205-21.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJSAC.2019.2904348",pubmed:a,pmc:a},{id:1786991,article_id:b,reference_num:dL,reference:"McMahan HB, Moore E, Ramage D, y Arcas BA. Communication-efficient learning of deep networks from decentralized data. CoRR 2016;abs\u002F1602.05629. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1602.05629\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1602.05629.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1786992,article_id:b,reference_num:dM,reference:"Phong LT, Aono Y, Hayashi T, Wang L, Moriai S. 
Privacy-preserving deep learning via additively homomorphic encryption. \u003Ci\u003EIEEE Transactions on Information Forensics and Security\u003C\u002Fi\u003E 2018;13:1333-45.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTIFS.2017.2787987",pubmed:a,pmc:a},{id:1786993,article_id:b,reference_num:bW,reference:"Zhu H, Jin Y. Multi-objective evolutionary federated learning. \u003Ci\u003EIEEE Transactions on Neural Networks and Learning Systems\u003C\u002Fi\u003E 2020;31:1310-22.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTNNLS.2019.2919699",pubmed:a,pmc:a},{id:1786994,article_id:b,reference_num:bX,reference:"Kairouz P, McMahan HB, Avent B, et al. Advances and open problems in federated learning. CoRR 2019;abs\u002F1912.04977. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1912.04977\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1912.04977.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1786995,article_id:b,reference_num:bY,reference:"Pan SJ, Yang Q. A survey on transfer learning. \u003Ci\u003EIEEE Transactions on Knowledge and Data Engineering\u003C\u002Fi\u003E 2010;22:1345-59.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTKDE.2009.191",pubmed:a,pmc:a},{id:1786996,article_id:b,reference_num:"17",reference:"Li Y. Deep reinforcement learning: an overview. CoRR 2017;abs\u002F1701.07274. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1701.07274\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1701.07274.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1786997,article_id:b,reference_num:"18",reference:"Xu Z, Tang J, Meng J, et al. Experience-driven networking: a deep reinforcement learning based approach. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE; 2018. pp. 
1871-79.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FINFOCOM.2018.8485853",pubmed:a,pmc:a},{id:1786998,article_id:b,reference_num:"19",reference:"Mohammadi M, Al-Fuqaha A, Guizani M, Oh JS. Semisupervised deep reinforcement learning in support of IoT and smart city services. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2018;5:624-35.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2017.2712560",pubmed:a,pmc:a},{id:1786999,article_id:b,reference_num:bZ,reference:"Bu F, Wang X. A smart agriculture IoT system based on deep reinforcement learning. Future Generation Computer Systems 2019;99:500–507. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0167739X19307277\"\u003Ehttps:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0167739X19307277.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787000,article_id:b,reference_num:"21",reference:"Xiong X, Zheng K, Lei L, Hou L. Resource allocation based on deep reinforcement learning in IoT edge computing. \u003Ci\u003EIEEE Journal on Selected Areas in Communications\u003C\u002Fi\u003E 2020;38:1133-46.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJSAC.2020.2986615",pubmed:a,pmc:a},{id:1787001,article_id:b,reference_num:"22",reference:"Lei L, Qi J, Zheng K. Patent analytics based on feature vector space model: a case of IoT. \u003Ci\u003EIEEE Access\u003C\u002Fi\u003E 2019;7:45705-15.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FACCESS.2019.2909123",pubmed:a,pmc:a},{id:1787002,article_id:b,reference_num:"23",reference:"Shalev-Shwartz S, Shammah S, Shashua A. Safe, multi-agent, reinforcement learning for autonomous driving. CoRR 2016;abs\u002F1610.03295. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1610.03295\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1610.03295.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787003,article_id:b,reference_num:"24",reference:"Sallab AE, Abdou M, Perot E, Yogamani S. Deep reinforcement learning framework for autonomous driving. \u003Ci\u003EElectronic Imaging\u003C\u002Fi\u003E 2017;2017:70-76.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.2352\u002FISSN.2470-1173.2017.19.AVM-023",pubmed:a,pmc:a},{id:1787004,article_id:b,reference_num:b_,reference:"Taylor ME. Teaching reinforcement learning with mario: an argument and case study. In: Second AAAI Symposium on Educational Advances in Artificial Intelligence; 2011. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fwww.aaai.org\u002Focs\u002Findex.php\u002FEAAI\u002FEAAI11\u002Fpaper\u002FviewPaper\u002F3515\"\u003Ehttps:\u002F\u002Fwww.aaai.org\u002Focs\u002Findex.php\u002FEAAI\u002FEAAI11\u002Fpaper\u002FviewPaper\u002F3515.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787005,article_id:b,reference_num:"26",reference:"Holcomb SD, Porter WK, Ault SV, Mao G, Wang J. Overview on deepmind and its alphago zero ai. In: Proceedings of the 2018 international conference on big data and education 2018. pp. 67-71.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1145\u002F3206157.3206174",pubmed:a,pmc:a},{id:1787006,article_id:b,reference_num:"27",reference:"Watkins CJ, Dayan P. Q-learning. Machine learning 1992;8:279–92. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Flink.springer.com\u002Fcontent\u002Fpdf\u002F10.1007\u002FBF00992698.pdf\"\u003Ehttps:\u002F\u002Flink.springer.com\u002Fcontent\u002Fpdf\u002F10.1007\u002FBF00992698.pdf.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787007,article_id:b,reference_num:b$,reference:"Thorpe TL. Vehicle traffic light control using sarsa. Citeseer; 1997. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fciteseer.ist.psu.edu\u002Fthorpe97vehicle.html\"\u003Ehttps:\u002F\u002Fciteseer.ist.psu.edu\u002Fthorpe97vehicle.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787008,article_id:b,reference_num:ca,reference:"Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Xing EP, Jebara T, editors. Proceedings of the 31st International Conference on Machine Learning. vol. 32 of Proceedings of Machine Learning Research. Beijing, China: PMLR; 2014. pp. 387–95. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv32\u002Fsilver14.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv32\u002Fsilver14.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787009,article_id:b,reference_num:cb,reference:"Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. \u003Ci\u003EMachine learning\u003C\u002Fi\u003E 1992;8:229-56.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1007\u002FBF00992696",pubmed:a,pmc:a},{id:1787010,article_id:b,reference_num:"31",reference:"Konda VR, Tsitsiklis JN. Actor-critic algorithms. In: Advances in neural information processing systems; 2000. pp. 1008–14.
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F1786-actor-critic-algorithms.pdf\"\u003Ehttps:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F1786-actor-critic-algorithms.pdf\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787011,article_id:b,reference_num:"32",reference:"Henderson P, Islam R, Bachman P, et al. Deep reinforcement learning that matters. In: Proceedings of the AAAI conference on artificial intelligence. vol. 32; 2018. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F11694\"\u003Ehttps:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F11694.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787012,article_id:b,reference_num:F,reference:"Lei L, Tan Y, Dahlenburg G, Xiang W, Zheng K. Dynamic energy dispatch based on Deep Reinforcement Learning in IoT-Driven Smart Isolated Microgrids. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2021;8:7938-53.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2020.3042007",pubmed:a,pmc:a},{id:1787013,article_id:b,reference_num:"34",reference:"Lei L, Xu H, Xiong X, Zheng K, Xiang W, et al. Multiuser resource control with deep reinforcement learning in IoT edge computing. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2019;6:10119-33.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2019.2935543",pubmed:a,pmc:a},{id:1787014,article_id:b,reference_num:"35",reference:"Ohnishi S, Uchibe E, Yamaguchi Y, Nakanishi K, Yasui Y, et al. Constrained deep q-learning gradually approaching ordinary q-learning. 
\u003Ci\u003EFrontiers in neurorobotics\u003C\u002Fi\u003E 2019;13:103.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.3389\u002Ffnbot.2019.00103",pubmed:"http:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpubmed\u002F31920613",pmc:"https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC6914867"},{id:1787015,article_id:b,reference_num:cc,reference:"Peng J, Williams RJ. Incremental multi-step Q-learning. In: machine learning proceedings 1994. Elsevier; 1994. pp. 226-32.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1016\u002FB978-1-55860-335-6.50035-0",pubmed:a,pmc:a},{id:1787016,article_id:b,reference_num:cd,reference:"Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. \u003Ci\u003ENature\u003C\u002Fi\u003E 2015;518:529-33.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1038\u002Fnature14236",pubmed:a,pmc:a},{id:1787017,article_id:b,reference_num:"38",reference:"Lei L, Tan Y, Zheng K, et al. Deep reinforcement learning for autonomous internet of things: model, applications and challenges. \u003Ci\u003EIEEE Communications Surveys Tutorials\u003C\u002Fi\u003E 2020;22:1722-60.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FCOMST.2020.2988367",pubmed:a,pmc:a},{id:1787018,article_id:b,reference_num:Y,reference:"Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double q-learning. In: proceedings of the AAAI conference on artificial intelligence. vol. 30; 2016. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F10295\"\u003Ehttps:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F10295.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787019,article_id:b,reference_num:aL,reference:"Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. arXiv preprint arXiv:1511.05952 2015.
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05952\"\u003Ehttps:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05952.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787020,article_id:b,reference_num:"41",reference:"Gu S, Lillicrap TP, Ghahramani Z, Turner RE, Levine S. Q-Prop: sample-efficient policy gradient with an off-policy critic. CoRR 2016;abs\u002F1611.02247. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1611.02247\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1611.02247.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787021,article_id:b,reference_num:ce,reference:"Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1861–70. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Fhaarnoja18b.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Fhaarnoja18b.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787022,article_id:b,reference_num:"43",reference:"Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928–37. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv48\u002Fmniha16.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv48\u002Fmniha16.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787023,article_id:b,reference_num:cf,reference:"Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 2015. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971\"\u003Ehttps:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787024,article_id:b,reference_num:"45",reference:"Barth-Maron G, Hoffman MW, Budden D, et al. Distributed distributional deterministic policy gradients. CoRR 2018;abs\u002F1804.08617. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1804.08617\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1804.08617.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787025,article_id:b,reference_num:"46",reference:"Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1587–96. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Ffujimoto18a.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Ffujimoto18a.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787026,article_id:b,reference_num:"47",reference:"Schulman J, Levine S, Abbeel P, Jordan M, Moritz P.
Trust region policy optimization. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning. vol. 37 of Proceedings of Machine Learning Research. Lille, France: PMLR; 2015. pp. 1889–97. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv37\u002Fschulman15.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv37\u002Fschulman15.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787027,article_id:b,reference_num:cg,reference:"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs\u002F1707.06347. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787028,article_id:b,reference_num:"49",reference:"Zhu P, Li X, Poupart P. On improving deep reinforcement learning for POMDPs. CoRR 2017;abs\u002F1704.07978. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1704.07978\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1704.07978.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787029,article_id:b,reference_num:"50",reference:"Hausknecht M, Stone P. Deep recurrent q-learning for partially observable mdps. In: 2015 aaai fall symposium series; 2015. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fwww.aaai.org\u002Focs\u002Findex.php\u002FFSS\u002FFSS15\u002Fpaper\u002FviewPaper\u002F11673\"\u003Ehttps:\u002F\u002Fwww.aaai.org\u002Focs\u002Findex.php\u002FFSS\u002FFSS15\u002Fpaper\u002FviewPaper\u002F11673.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787030,article_id:b,reference_num:"51",reference:"Heess N, Hunt JJ, Lillicrap TP, Silver D. Memory-based control with recurrent neural networks. CoRR 2015;abs\u002F1512.04455. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1512.04455\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1512.04455.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787031,article_id:b,reference_num:"52",reference:"Foerster J, Nardelli N, Farquhar G, et al. Stabilising experience replay for deep multi-agent reinforcement learning. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 1146–55. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv70\u002Ffoerster17b.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv70\u002Ffoerster17b.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787032,article_id:b,reference_num:ch,reference:"Van der Pol E, Oliehoek FA. Coordinated deep reinforcement learners for traffic light control. Proceedings of learning, inference and control of multi-agent systems (at NIPS 2016) 2016. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fwww.elisevanderpol.nl\u002Fpapers\u002FvanderpolNIPSMALIC2016.pdf\"\u003Ehttps:\u002F\u002Fwww.elisevanderpol.nl\u002Fpapers\u002FvanderpolNIPSMALIC2016.pdf.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787033,article_id:b,reference_num:"54",reference:"Foerster J, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 32; 2018. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F11794\"\u003Ehttps:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F11794.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787034,article_id:b,reference_num:"55",reference:"Lowe R, Wu Y, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments. CoRR 2017;abs\u002F1706.02275. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02275\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02275.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787035,article_id:b,reference_num:"56",reference:"Nadiger C, Kumar A, Abdelhak S. Federated Reinforcement Learning for Fast Personalization. In: 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE) 2019. pp. 123-27.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FAIKE.2019.00031",pubmed:a,pmc:a},{id:1787036,article_id:b,reference_num:"57",reference:"Liu B, Wang L, Liu M, Xu C. Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems. CoRR 2019;abs\u002F1901.06455. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1901.06455\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1901.06455.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787037,article_id:b,reference_num:ci,reference:"Ren J, Wang H, Hou T, Zheng S, Tang C. Federated learning-based computation offloading optimization in edge computing-supported internet of things. \u003Ci\u003EIEEE Access\u003C\u002Fi\u003E 2019;7:69194-201.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FACCESS.2019.2919736",pubmed:a,pmc:a},{id:1787038,article_id:b,reference_num:"59",reference:"Wang X, Wang C, Li X, Leung VCM, Taleb T. Federated deep reinforcement learning for internet of things with decentralized cooperative edge caching. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2020;7:9441-55.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2020.2986803",pubmed:a,pmc:a},{id:1787039,article_id:b,reference_num:"60",reference:"Chen J, Monga R, Bengio S, Józefowicz R. Revisiting distributed synchronous SGD. CoRR 2016;abs\u002F1604.00981. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1604.00981\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1604.00981.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787040,article_id:b,reference_num:"61",reference:"Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928–37. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv48\u002Fmniha16.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv48\u002Fmniha16.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787041,article_id:b,reference_num:"62",reference:"Espeholt L, Soyer H, Munos R, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1407–16. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Fespeholt18a.html\"\u003Ehttp:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Fespeholt18a.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787042,article_id:b,reference_num:"63",reference:"Horgan D, Quan J, Budden D, et al. Distributed prioritized experience replay. CoRR 2018;abs\u002F1803.00933. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00933\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00933.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787043,article_id:b,reference_num:"64",reference:"Liu T, Tian B, Ai Y, et al. Parallel reinforcement learning: a framework and case study. \u003Ci\u003EIEEE\u002FCAA Journal of Automatica Sinica\u003C\u002Fi\u003E 2018;5:827-35.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJAS.2018.7511144",pubmed:a,pmc:a},{id:1787044,article_id:b,reference_num:"65",reference:"Zhuo HH, Feng W, Xu Q, Yang Q, Lin Y. Federated reinforcement learning. CoRR 2019;abs\u002F1901.08277.
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1901.08277\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1901.08277.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787045,article_id:b,reference_num:"66",reference:"Canese L, Cardarilli GC, Di Nunzio L, et al. Multi-agent reinforcement learning: a review of challenges and applications. \u003Ci\u003EApplied Sciences\u003C\u002Fi\u003E 2021;11:4948. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fdoi.org\u002F10.3390\u002Fapp11114948\"\u003Ehttps:\u002F\u002Fdoi.org\u002F10.3390\u002Fapp11114948.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787046,article_id:b,reference_num:cj,reference:"Busoniu L, Babuska R, De Schutter B. A Comprehensive survey of multiagent reinforcement learning. \u003Ci\u003EIEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)\u003C\u002Fi\u003E 2008;38:156-72.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTSMCC.2007.913919",pubmed:a,pmc:a},{id:1787047,article_id:b,reference_num:ck,reference:"Zhang K, Yang Z, Başar T. Multi-agent reinforcement learning: a selective overview of theories and algorithms. \u003Ci\u003EHandbook of Reinforcement Learning and Control\u003C\u002Fi\u003E 2021:321-84.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1007\u002F978-3-030-60990-0_12",pubmed:a,pmc:a},{id:1787048,article_id:b,reference_num:"69",reference:"Stone P, Veloso M. Multiagent systems: a survey from a machine learning perspective. \u003Ci\u003EAutonomous Robots\u003C\u002Fi\u003E 2000;8:345-83.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1023\u002FA:1008942012299",pubmed:a,pmc:a},{id:1787049,article_id:b,reference_num:"70",reference:"Szepesvári C, Littman ML. A unified analysis of value-function-based reinforcement-learning algorithms.
\u003Ci\u003ENeural computation\u003C\u002Fi\u003E 1999;11:2017-60.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.5555\u002F1121924.1121927",pubmed:a,pmc:a},{id:1787050,article_id:b,reference_num:"71",reference:"Littman ML. Value-function reinforcement learning in markov games. \u003Ci\u003ECognitive systems research\u003C\u002Fi\u003E 2001;2:55-66.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1016\u002FS1389-0417(01)00015-8",pubmed:a,pmc:a},{id:1787051,article_id:b,reference_num:"72",reference:"Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents. In: proceedings of the tenth international conference on machine learning 1993. pp. 330-37.",refdoi:a,pubmed:a,pmc:a},{id:1787052,article_id:b,reference_num:"73",reference:"Lauer M, Riedmiller M. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the Seventeenth International Conference on Machine Learning. Citeseer; 2000. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fsummary\"\u003Ehttp:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fsummary.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787053,article_id:b,reference_num:"74",reference:"Monahan GE. State of the art—a survey of partially observable Markov decision processes: theory, models, and algorithms. \u003Ci\u003EManagement science\u003C\u002Fi\u003E 1982;28:1-16.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1287\u002Fmnsc.28.1.1",pubmed:a,pmc:a},{id:1787054,article_id:b,reference_num:"75",reference:"Oroojlooyjadid A, Hajinezhad D. A review of cooperative multi-agent deep reinforcement learning. CoRR 2019;abs\u002F1908.03963.
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1908.03963\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1908.03963.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787055,article_id:b,reference_num:"76",reference:"Bernstein DS, Givan R, Immerman N, Zilberstein S. The complexity of decentralized control of Markov decision processes. \u003Ci\u003EMathematics of operations research\u003C\u002Fi\u003E 2002;27:819-40.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1287\u002Fmoor.27.4.819.297",pubmed:a,pmc:a},{id:1787056,article_id:b,reference_num:"77",reference:"Omidshafiei S, Pazis J, Amato C, How JP, Vian J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 2681–90. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.mlr.press\u002Fv70\u002Fomidshafiei17a.html\"\u003Ehttps:\u002F\u002Fproceedings.mlr.press\u002Fv70\u002Fomidshafiei17a.html.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787057,article_id:b,reference_num:"78",reference:"Han Y, Gmytrasiewicz P. Ipomdp-net: A deep neural network for partially observable multi-agent planning using interactive pomdps. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33 2019. pp. 6062-69.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1609\u002Faaai.v33i01.33016062",pubmed:a,pmc:a},{id:1787058,article_id:b,reference_num:"79",reference:"Karkus P, Hsu D, Lee WS. QMDP-Net: Deep learning for planning under partial observability; 2017. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.06692\"\u003Ehttps:\u002F\u002Farxiv.org\u002Fabs\u002F1703.06692.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787059,article_id:b,reference_num:"80",reference:"Mao W, Zhang K, Miehling E, Başar T. Information state embedding in partially observable cooperative multi-agent reinforcement learning. In: 2020 59th IEEE Conference on Decision and Control (CDC) 2020. pp. 6124-31.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FCDC42340.2020.9303801",pubmed:a,pmc:a},{id:1787060,article_id:b,reference_num:"81",reference:"Mao H, Zhang Z, Xiao Z, Gong Z. Modelling the dynamic joint policy of teammates with attention multi-agent DDPG. CoRR 2018;abs\u002F1811.07029. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1811.07029\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1811.07029.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787061,article_id:b,reference_num:"82",reference:"Lee HR, Lee T. Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response. \u003Ci\u003EEuropean Journal of Operational Research\u003C\u002Fi\u003E 2021;291:296-308.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1016\u002Fj.ejor.2020.09.018",pubmed:a,pmc:a},{id:1787062,article_id:b,reference_num:"83",reference:"Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 29. Curran Associates, Inc.; 2016.
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2016\u002Ffile\u002F55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf\"\u003Ehttps:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2016\u002Ffile\u002F55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787063,article_id:b,reference_num:"84",reference:"Foerster JN, Assael YM, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. CoRR 2016;abs\u002F1605.06676. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1605.06676\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1605.06676.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787064,article_id:b,reference_num:"85",reference:"Buşoniu L, Babuška R, De Schutter B. Multi-agent reinforcement learning: an overview. \u003Ci\u003EInnovations in multiagent systems and applications 1\u003C\u002Fi\u003E 2010:183-221.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1007\u002F978-3-642-14435-6_7",pubmed:a,pmc:a},{id:1787065,article_id:b,reference_num:"86",reference:"Hu Y, Hua Y, Liu W, Zhu J. Reward shaping based federated reinforcement learning. \u003Ci\u003EIEEE Access\u003C\u002Fi\u003E 2021;9:67259-67.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FACCESS.2021.3074221",pubmed:a,pmc:a},{id:1787066,article_id:b,reference_num:"87",reference:"Anwar A, Raychowdhury A. Multi-task federated reinforcement learning with adversaries. CoRR 2021;abs\u002F2103.06473. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.06473\"\u003Ehttps:\u002F\u002Farxiv.org\u002Fabs\u002F2103.06473.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787067,article_id:b,reference_num:"88",reference:"Wang X, Han Y, Wang C, et al. In-edge AI: intelligentizing mobile edge computing, caching and communication by federated learning. \u003Ci\u003EIEEE Network\u003C\u002Fi\u003E 2019;33:156-65.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FMNET.2019.1800286",pubmed:a,pmc:a},{id:1787068,article_id:b,reference_num:cl,reference:"Wang X, Li R, Wang C, et al. Attention-weighted federated deep reinforcement learning for device-to-device assisted heterogeneous collaborative edge caching. \u003Ci\u003EIEEE Journal on Selected Areas in Communications\u003C\u002Fi\u003E 2021;39:154-69.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJSAC.2020.3036946",pubmed:a,pmc:a},{id:1787069,article_id:b,reference_num:"90",reference:"Zhang M, Jiang Y, Zheng FC, Bennis M, You X. Cooperative edge caching via federated deep reinforcement learning in Fog-RANs. In: 2021 IEEE International Conference on Communications Workshops (ICC Workshops) 2021. pp. 1-6.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FICCWorkshops50388.2021.9473609",pubmed:a,pmc:a},{id:1787070,article_id:b,reference_num:"91",reference:"Majidi F, Khayyambashi MR, Barekatain B. HFDRL: an intelligent dynamic cooperate cashing method based on hierarchical federated deep reinforcement learning in edge-enabled IoT. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FMECO52532.2021.9460304",pubmed:a,pmc:a},{id:1787071,article_id:b,reference_num:cm,reference:"Zhao L, Ran Y, Wang H, Wang J, Luo J. Towards cooperative caching for vehicular networks with multi-level federated reinforcement learning. 
In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FICC42927.2021.9500714",pubmed:a,pmc:a},{id:1787072,article_id:b,reference_num:"93",reference:"Zhu Z, Wan S, Fan P, Letaief KB. Federated multi-agent actor-critic learning for age sensitive mobile edge computing. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2021.3078514",pubmed:a,pmc:a},{id:1787073,article_id:b,reference_num:"94",reference:"Yu S, Chen X, Zhou Z, Gong X, Wu D. When deep reinforcement learning meets federated learning: intelligent multi-timescale resource management for multi-access edge computing in 5G ultra dense network. arXiv:200910601 [cs] 2020 Sep. ArXiv: 2009.10601. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2009.10601\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F2009.10601.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787074,article_id:b,reference_num:"95",reference:"Tianqing Z, Zhou W, Ye D, Cheng Z, Li J. Resource allocation in IoT edge computing via concurrent federated reinforcement learning. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2021.3086910",pubmed:a,pmc:a},{id:1787075,article_id:b,reference_num:"96",reference:"Huang H, Zeng C, Zhao Y, et al. Scalable orchestration of service function chains in NFV-enabled networks: a federated reinforcement learning approach. \u003Ci\u003EIEEE Journal on Selected Areas in Communications\u003C\u002Fi\u003E 2021;39:2558-71.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJSAC.2021.3087227",pubmed:a,pmc:a},{id:1787076,article_id:b,reference_num:"97",reference:"Liu YJ, Feng G, Sun Y, Qin S, Liang YC. 
Device association for RAN slicing based on hybrid federated deep reinforcement learning. \u003Ci\u003EIEEE Transactions on Vehicular Technology\u003C\u002Fi\u003E 2020;69:15731-45.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTVT.2020.3033035",pubmed:a,pmc:a},{id:1787077,article_id:b,reference_num:"98",reference:"Wang G, Dang CX, Zhou Z. Measure contribution of participants in federated learning. In: 2019 IEEE International Conference on Big Data (Big Data) 2019. pp. 2597-604.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FBigData47090.2019.9006179",pubmed:a,pmc:a},{id:1787078,article_id:b,reference_num:"99",reference:"Cao Y, Lien SY, Liang YC, Chen KC. Federated deep reinforcement learning for user access control in open radio access networks. In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FICC42927.2021.9500603",pubmed:a,pmc:a},{id:1787079,article_id:b,reference_num:"100",reference:"Zhang L, Yin H, Zhou Z, Roy S, Sun Y. Enhancing WiFi multiple access performance with federated deep reinforcement learning. In: 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall) 2020. pp. 1-6.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FVTC2020-Fall49728.2020.9348485",pubmed:a,pmc:a},{id:1787080,article_id:b,reference_num:"101",reference:"Xu M, Peng J, Gupta BB, et al. Multi-agent federated reinforcement learning for secure incentive mechanism in intelligent cyber-physical systems. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2021.3081626",pubmed:a,pmc:a},{id:1787081,article_id:b,reference_num:"102",reference:"Zhang X, Peng M, Yan S, Sun Y. Deep-reinforcement-learning-based mode selection and resource allocation for cellular V2X communications. 
\u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2020;7:6380-91.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2019.2962715",pubmed:a,pmc:a},{id:1787082,article_id:b,reference_num:"103",reference:"Kwon D, Jeon J, Park S, Kim J, Cho S. Multiagent DDPG-based deep learning for smart ocean federated learning IoT networks. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2020;7:9895-903.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2020.2988033",pubmed:a,pmc:a},{id:1787083,article_id:b,reference_num:"104",reference:"Liang X, Liu Y, Chen T, Liu M, Yang Q. Federated transfer reinforcement learning for autonomous driving. arXiv:191006001 [cs] 2019 Oct. ArXiv: 1910.06001. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1910.06001\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1910.06001.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787084,article_id:b,reference_num:"105",reference:"Lim HK, Kim JB, Heo JS, Han YH. Federated reinforcement learning for training control policies on multiple IoT devices. Sensors 2020 Mar;20:1359. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F20\u002F5\u002F1359\"\u003Ehttps:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F20\u002F5\u002F1359.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787085,article_id:b,reference_num:"106",reference:"Lim HK, Kim JB, Ullah I, Heo JS, Han YH. Federated reinforcement learning acceleration method for precise control of multiple devices. \u003Ci\u003EIEEE Access\u003C\u002Fi\u003E 2021;9:76296-306.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FACCESS.2021.3083087",pubmed:a,pmc:a},{id:1787086,article_id:b,reference_num:"107",reference:"Mowla NI, Tran NH, Doh I, Chae K. 
AFRL: Adaptive federated reinforcement learning for intelligent jamming defense in FANET. \u003Ci\u003EJournal of Communications and Networks\u003C\u002Fi\u003E 2020;22:244-58.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJCN.2020.000015",pubmed:a,pmc:a},{id:1787087,article_id:b,reference_num:"108",reference:"Nguyen TG, Phan TV, Hoang DT, Nguyen TN, So-In C. Federated deep reinforcement learning for traffic monitoring in SDN-Based IoT networks. \u003Ci\u003EIEEE Transactions on Cognitive Communications and Networking\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTCCN.2021.3102971",pubmed:a,pmc:a},{id:1787088,article_id:b,reference_num:"109",reference:"Wang X, Garg S, Lin H, et al. Towards accurate anomaly detection in industrial internet-of-things using hierarchical federated learning. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2021.3074382",pubmed:a,pmc:a},{id:1787089,article_id:b,reference_num:"110",reference:"Lee S, Choi DH. Federated reinforcement learning for energy management of multiple smart homes with distributed energy resources. \u003Ci\u003EIEEE Transactions on Industrial Informatics\u003C\u002Fi\u003E 2020:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTII.2020.3035451",pubmed:a,pmc:a},{id:1787090,article_id:b,reference_num:"111",reference:"Samet H. The quadtree and related hierarchical data structures. ACM Comput Surv 1984;16:187–260. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fdoi.org\u002F10.1145\u002F356924.356930\"\u003Ehttps:\u002F\u002Fdoi.org\u002F10.1145\u002F356924.356930.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787091,article_id:b,reference_num:"112",reference:"Abdel-Aziz MK, Samarakoon S, Perfecto C, Bennis M. 
Cooperative perception in vehicular networks using multi-agent reinforcement learning. In: 2020 54th Asilomar Conference on Signals, Systems, and Computers 2020. pp. 408-12.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FIEEECONF51394.2020.9443539",pubmed:a,pmc:a},{id:1787092,article_id:b,reference_num:"113",reference:"Wang H, Kaplan Z, Niu D, Li B. Optimizing federated learning on Non-IID data with reinforcement learning. In: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. Toronto, ON, Canada: IEEE; 2020. pp. 1698–707. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9155494\u002F\"\u003Ehttps:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9155494\u002F.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787093,article_id:b,reference_num:"114",reference:"Zhang P, Gan P, Aujla GS, Batth RS. Reinforcement learning for edge device selection using social attribute perception in industry 4.0. \u003Ci\u003EIEEE Internet of Things Journal\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FJIOT.2021.3088577",pubmed:a,pmc:a},{id:1787094,article_id:b,reference_num:"115",reference:"Zhan Y, Li P, Leijie W, Guo S. L4L: experience-driven computational resource control in federated learning. \u003Ci\u003EIEEE Transactions on Computers\u003C\u002Fi\u003E 2021:1-1.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTC.2021.3068219",pubmed:a,pmc:a},{id:1787095,article_id:b,reference_num:"116",reference:"Dong Y, Gan P, Aujla GS, Zhang P. RA-RL: reputation-aware edge device selection method based on reinforcement learning. In: 2021 IEEE 22nd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM) 2021. pp. 
348-53.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FWoWMoM51794.2021.00063",pubmed:a,pmc:a},{id:1787096,article_id:b,reference_num:"117",reference:"Sahu AK, Li T, Sanjabi M, et al. On the convergence of federated optimization in heterogeneous networks. CoRR 2018;abs\u002F1812.06127. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1812.06127\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1812.06127.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787097,article_id:b,reference_num:"118",reference:"Chen M, Poor HV, Saad W, Cui S. Convergence time optimization for federated learning over wireless networks. \u003Ci\u003EIEEE Transactions on Wireless Communications\u003C\u002Fi\u003E 2021;20:2457-71.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FTWC.2020.3042530",pubmed:a,pmc:a},{id:1787098,article_id:b,reference_num:"119",reference:"Li X, Huang K, Yang W, Wang S, Zhang Z. On the convergence of FedAvg on Non-IID data; 2020. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.02189?context=stat.ML\"\u003Ehttps:\u002F\u002Farxiv.org\u002Fabs\u002F1907.02189?context=stat.ML.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787099,article_id:b,reference_num:"120",reference:"Bonawitz KA, Eichner H, Grieskamp W, et al. Towards federated learning at scale: system design. CoRR 2019;abs\u002F1902.01046. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1902.01046\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1902.01046.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787100,article_id:b,reference_num:"121",reference:"Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. 
Nature 2015;518:529–33. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fdoi.org\u002F10.1038\u002Fnature14236\"\u003Ehttps:\u002F\u002Fdoi.org\u002F10.1038\u002Fnature14236.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787101,article_id:b,reference_num:"122",reference:"Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning; 2019. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971\"\u003Ehttps:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787102,article_id:b,reference_num:"123",reference:"Lyu L, Yu H, Yang Q. Threats to federated learning: a survey. CoRR 2020;abs\u002F2003.02133. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.02133\"\u003Ehttps:\u002F\u002Farxiv.org\u002Fabs\u002F2003.02133.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787103,article_id:b,reference_num:"124",reference:"Fung C, Yoon CJM, Beschastnikh I. Mitigating sybils in federated learning poisoning. CoRR 2018;abs\u002F1808.04866. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1808.04866\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1808.04866.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787104,article_id:b,reference_num:"125",reference:"Anwar A, Raychowdhury A. Multi-task federated reinforcement learning with adversaries 2021.",refdoi:a,pubmed:a,pmc:a},{id:1787105,article_id:b,reference_num:"126",reference:"Zhu L, Liu Z, Han S. Deep leakage from gradients. CoRR 2019;abs\u002F1906.08935. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1906.08935\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1906.08935.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787106,article_id:b,reference_num:"127",reference:"Nishio T, Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge. In: ICC 2019-2019 IEEE International Conference on Communications (ICC) 2019. pp. 1-7.",refdoi:"https:\u002F\u002Fdx.doi.org\u002F10.1109\u002FICC.2019.8761315",pubmed:a,pmc:a},{id:1787107,article_id:b,reference_num:"128",reference:"Yang T, Andrew G, Eichner H, et al. Applied federated learning: improving google keyboard query suggestions. CoRR 2018;abs\u002F1812.02903. Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F1812.02903\"\u003Ehttp:\u002F\u002Farxiv.org\u002Fabs\u002F1812.02903.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a},{id:1787108,article_id:b,reference_num:"129",reference:"Yu H, Liu Z, Liu Y, et al. A fairness-aware incentive scheme for federated learning. In: Proceedings of the AAAI\u002FACM Conference on AI, Ethics, and Society. AIES ’20. New York, NY, USA: Association for Computing Machinery; 2020. p. 393–399. 
Available from: \u003Ca target=\"_blank\" xmlns:xlink=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxlink\" href=\"https:\u002F\u002Fdoi.org\u002F10.1145\u002F3375627.3375840\"\u003Ehttps:\u002F\u002Fdoi.org\u002F10.1145\u002F3375627.3375840.\u003C\u002Fa\u003E.",refdoi:a,pubmed:a,pmc:a}],ArtDataP:[{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig1",post_id:"fig1",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.1.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig2",post_id:"fig2",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.2.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig3",post_id:"fig3",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.3.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig4",post_id:"fig4",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.4.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig5",post_id:"fig5",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.5.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig6",post_id:"fig6",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.6.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig7",post_id:"fig7",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.7.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig8",post_id:"fig8",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.8.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig9",post_id:"fig9",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.9.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig10",post_id:"fig10",image:"https:\u002F\u0
02Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.10.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig11",post_id:"fig11",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.11.jpg"},{href:"\u002Farticles\u002Fir.2021.02\u002Fimage\u002Ffig12",post_id:"fig12",image:"https:\u002F\u002Fimage.oaes.cc\u002F38ebf366-5fba-47ab-86ac-aa0a6cc124ef\u002F4325.fig.12.jpg"}],ArtDataT:[{date_published:cn,section:B,section_id:r,title:aF,doi:dN,abstract:dO,pdfurl:dP,elocation_id:c,fpage:dQ,article_id:aE,viewed:g,downloaded:g,video_url:c,volume:e,year:co,tag:dR,image:w,authors:aQ,video_img:a,journal_path:z,lpage:dS,author:aQ,specialissue:a,specialinfo:a,url_doi:aG},{date_published:1694534400,section:B,section_id:r,title:by,doi:"10.20517\u002Fir.2023.25",abstract:"\u003Cp\u003EMulti-vehicle pursuit (MVP) is one of the most challenging problems for intelligent traffic management systems due to multi-source heterogeneous data and its mission nature. While many reinforcement learning (RL) algorithms have shown promising abilities for MVP in structured grid-pattern roads, their lack of dynamic and effective traffic awareness limits pursuing efficiency. The sparse reward of pursuing tasks still hinders the optimization of these RL algorithms. Therefore, this paper proposes a distributed generative multi-adversarial RL for MVP (DGMARL-MVP) in urban traffic scenes. In DGMARL-MVP, a generative multi-adversarial network is designed to improve the Bellman equation by generating the potential dense reward, thereby properly guiding strategy optimization of distributed multi-agent RL. Moreover, a graph neural network-based intersecting cognition is proposed to extract integrated features of traffic situations and relationships among agents from multi-source heterogeneous data. These integrated and comprehensive traffic features are used to assist RL decision-making and improve pursuing efficiency. 
Extensive experimental results show that the DGMARL-MVP can reduce the pursuit time by 5.47% compared with proximal policy optimization and improve the pursuing average success rate up to 85.67%. Codes are open-sourced in Github.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F0a37142e-e036-4772-9977-2648f878a551\u002Fir3025.pdf",elocation_id:c,fpage:436,article_id:bx,viewed:168,downloaded:M,video_url:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20230920\u002F658f4508ead6418080d7bb603c9156a2.mp4",volume:k,year:aR,tag:"436-52",image:"https:\u002F\u002Foaepublishstorage.blob.core.windows.net\u002F0a37142e-e036-4772-9977-2648f878a551\u002Fir3025-coverimg.jpg",authors:dT,video_img:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240205\u002F5bce74c396294f068bdaa590b992d51f.jpg",journal_path:z,lpage:452,author:dT,specialissue:a,specialinfo:a,url_doi:bz,image_list:"https:\u002F\u002Fimage.oaes.cc\u002F0a37142e-e036-4772-9977-2648f878a551\u002Fir3025-coverimg.jpg"},{date_published:1693411200,section:B,section_id:r,title:bB,doi:"10.20517\u002Fir.2023.23",abstract:"\u003Cp\u003EA significant challenge in self-driving technology involves the domain-specific training of prediction models on intentions of other surrounding vehicles. Separately processing domain-specific models requires substantial human resources, time, and equipment for data collection and training. For instance, substantial difficulties arise when directly applying a prediction model developed with data from China to the United States market due to complex factors such as differing driving behaviors and traffic rules. The emergence of transfer learning seems to offer solutions, enabling the reuse of models and data to enhance prediction efficiency across international markets. However, many transfer learning methods require a comparison between source and target data domains to determine what can be transferred, a process that can often be legally restricted. 
A specialized area of transfer learning, known as network-based transfer, could potentially provide a solution. This approach involves pre-training and fine-tuning \"student\" models using selected parameters from a \"teacher\" model. However, as networks typically have a large number of parameters, it raises questions about the most efficient methods for parameter selection to optimize transfer learning. An automatic parameter selector through reinforcement learning has been developed in this paper, named \"Automatic Transfer Selector via Reinforcement Learning\". This technique enhances the efficiency of parameter selection for transfer prediction between international self-driving markets, in contrast to manual methods. With this innovative approach, technicians are relieved from the labor-intensive task of testing each parameter combination, or enduring lengthy training periods to evaluate the impact of prediction transfer. Experiments have been conducted using a temporal convolutional neural network fully trained with the data from the Chinese market and one month's US data, focusing on improving the training efficiency of specific driving scenarios in the US. 
Results show that the proposed approach significantly improves the prediction transfer process.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002Fbc131583-6d9c-43ba-8016-51066f5c5189\u002Fir3023.pdf",elocation_id:c,fpage:402,article_id:bA,viewed:173,downloaded:G,video_url:"https:\u002F\u002Fv.oaes.cc\u002Fuploads\u002F20231120\u002Fbf7afa20e5b44e598f7c8d2f0fd811ca.mp4",volume:k,year:aR,tag:"402-19",image:w,authors:dU,video_img:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240205\u002F9d3040b52e3448d7aa6611b890486d8a.jpg",journal_path:z,lpage:419,author:dU,specialissue:{id:1243,name:" Scene Understanding for Autonomous Robotics"},specialinfo:a,url_doi:bC},{date_published:1687795200,section:B,section_id:r,title:bE,doi:"10.20517\u002Fir.2023.10",abstract:"\u003Cp\u003EAutonomous navigation of unmanned aerial vehicles (UAVs) is widely used in building rescue systems. As the complexity of the task increases, traditional methods based on environment models are hard to apply. In this paper, a reinforcement learning (RL) algorithm is proposed to solve the UAV navigation problem. The UAV navigation task is modeled as a Markov Decision Process (MDP) with parameterized actions. In addition, the sparse reward problem is also taken into account. To address these issues, we develop the HER-MPDQN by combining Multi-Pass Deep Q-Network (MP-DQN) and Hindsight Experience Replay (HER). Two UAV navigation simulation environments with progressive difficulty are constructed to evaluate our method. The results show that HER-MPDQN outperforms other baselines in relatively simple tasks. 
Especially for complex tasks involving relay operations, only our method can achieve satisfactory performance.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F5a85a66c-d57d-4c7d-81a0-cd7a69f1fdbe\u002F5735.pdf",elocation_id:c,fpage:161,article_id:bD,viewed:255,downloaded:aS,video_url:c,volume:k,year:aR,tag:"161-75",image:"https:\u002F\u002Foaepublishstorage.blob.core.windows.net\u002F5a85a66c-d57d-4c7d-81a0-cd7a69f1fdbe\u002F5735-coverimg.jpg",authors:dV,video_img:a,journal_path:z,lpage:175,author:dV,specialissue:{id:dW,name:dX},specialinfo:a,url_doi:bF,image_list:"https:\u002F\u002Fimage.oaes.cc\u002F5a85a66c-d57d-4c7d-81a0-cd7a69f1fdbe\u002F5735-coverimg.jpg"},{date_published:1678291200,section:B,section_id:r,title:bH,doi:"10.20517\u002Fir.2023.04",abstract:"\u003Cp\u003EThe unmanned aerial vehicle (UAV) has been applied in unmanned air combat because of its flexibility and practicality. The short-range air combat situation is rapidly changing, and the UAV has to make the autonomous maneuver decision as quickly as possible. In this paper, a type of short-range air combat maneuver decision method based on deep reinforcement learning is proposed. Firstly, the combat environment, including UAV motion model and the position and velocity relationships, is described. On this basis, the combat process is established. Secondly, some improved points based on proximal policy optimization (PPO) are proposed to enhance the maneuver decision-making ability. The gated recurrent unit (GRU) can help PPO make decisions with continuous timestep data. The actor network's input is the observation of UAV; however, the input of the critic network, named state, includes the blood values which cannot be observed directly. In addition, the action space with 15 basic actions and well-designed reward function are proposed to combine the air combat environment and PPO. 
In particular, the reward function is divided into dense reward, event reward and end-game reward to ensure the training feasibility. The training process is composed of three phases to shorten the training time. Finally, the designed maneuver decision method is verified through the ablation study and confrontment tests. The results show that the UAV with the proposed maneuver decision method can obtain an effective action policy to make a more flexible decision in air combat.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F555d9665-2c08-46be-982d-813dc0379fb9\u002F5495.pdf",elocation_id:c,fpage:dY,article_id:bG,viewed:809,downloaded:582,video_url:c,volume:k,year:aR,tag:"76-94",image:"https:\u002F\u002Foaepublishstorage.blob.core.windows.net\u002F555d9665-2c08-46be-982d-813dc0379fb9\u002F5495-coverimg.jpg",authors:dZ,video_img:a,journal_path:z,lpage:94,author:dZ,specialissue:{id:dW,name:dX},specialinfo:a,url_doi:bI,image_list:"https:\u002F\u002Fimage.oaes.cc\u002F555d9665-2c08-46be-982d-813dc0379fb9\u002F5495-coverimg.jpg"},{date_published:1661961600,section:bw,section_id:L,title:bK,doi:"10.20517\u002Fir.2022.20",abstract:"\u003Cp\u003EBuilding controllers for legged robots with agility and intelligence has been one of the typical challenges in the pursuit of artificial intelligence (AI). As an important part of the AI field, deep reinforcement learning (DRL) can realize sequential decision making without physical modeling through end-to-end learning and has achieved a series of major breakthroughs in quadrupedal locomotion research. In this review article, we systematically organize and summarize relevant important literature, covering DRL algorithms from problem setting to advanced learning methods. These algorithms alleviate the specific problems encountered in the practical application of robots to a certain extent. 
We first elaborate on the general development trend in this field from several aspects, such as the DRL algorithms, simulation environments, and hardware platforms. Moreover, core components in the algorithm design, such as state and action spaces, reward functions, and solutions to reality gap problems, are highlighted and summarized. We further discuss open problems and propose promising future research directions to discover new areas of research.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F12bbd333-926b-4682-b8bc-e9945c452687\u002F5115.pdf",elocation_id:c,fpage:275,article_id:bJ,viewed:2659,downloaded:768,video_url:"https:\u002F\u002Fv1.oaepublish.com\u002Ffiles\u002Ftalkvideo\u002F5115.mp4",volume:p,year:cp,tag:"275-97",image:w,authors:d_,video_img:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240205\u002F647c73d0a5884b76a7d7697b7584a99d.jpg",journal_path:z,lpage:297,author:d_,specialissue:a,specialinfo:a,url_doi:bL},{date_published:1653840000,section:B,section_id:r,title:bN,doi:"10.20517\u002Fir.2022.11",abstract:"\u003Cp\u003ESince 2016 federated learning (FL) has been an evolving topic of discussion in the artificial intelligence (AI) research community. Applications of FL led to the development and study of federated reinforcement learning (FRL). Few works exist on the topic of FRL applied to autonomous vehicle (AV) platoons. In addition, most FRL works choose a single aggregation method (usually weight or gradient aggregation). We explore FRL's effectiveness as a means to improve AV platooning by designing and implementing an FRL framework atop a custom AV platoon environment. The application of FRL in AV platooning is studied under two scenarios: (1) Inter-platoon FRL (Inter-FRL) where FRL is applied to AVs across different platoons; (2) Intra-platoon FRL (Intra-FRL) where FRL is applied to AVs within a single platoon. 
Both Inter-FRL and Intra-FRL are applied to a custom AV platooning environment using both gradient and weight aggregation to observe the performance effects FRL can have on AV platoons relative to an AV platooning environment trained without FRL. It is concluded that Intra-FRL using weight aggregation (Intra-FRLWA) provides the best performance for controlling an AV platoon. In addition, we found that weight aggregation in FRL for AV platooning provides increases in performance relative to gradient aggregation. Finally, a performance analysis is conducted for Intra-FRLWA versus a platooning environment without FRL for platoons of length 3, 4 and 5 vehicles. It is concluded that Intra-FRLWA largely out-performs the platooning environment that is trained without FRL.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F84a3c9e0-beb5-4cc8-a73e-622ffb2f9749\u002F4885.pdf",elocation_id:c,fpage:145,article_id:bM,viewed:1269,downloaded:614,video_url:"https:\u002F\u002Fv1.oaepublish.com\u002Ffiles\u002Ftalkvideo\u002F4885.mp4",volume:p,year:cp,tag:"145-67",image:w,authors:d$,video_img:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240205\u002F56b67049975e4ed4826671b6d1efd788.jpg",journal_path:z,lpage:167,author:d$,specialissue:{id:ea,name:eb},specialinfo:a,url_doi:bO},{date_published:1647360000,section:bw,section_id:L,title:bQ,doi:"10.20517\u002Fir.2021.19",abstract:"\u003Cp\u003EThe extreme nonlinearity of robotic systems renders the control design step harder. The consideration of adaptive control in robotic manipulation started in the 1970s. However, in the presence of bounded disturbances, the limitations of adaptive control rise considerably, which led researchers to exploit some “algorithm modifications”. Unfortunately, these modifications often require a priori knowledge of bounds on the parameters and the perturbations and noise. 
In the 1990s, Artificial Neural Networks were investigated extensively, both in general and for the control of dynamical systems in particular. Several types of Neural Networks (NNs) appear to be promising candidates for control system applications. In robotics, the problem ultimately reduces to making the actuator perform the desired action. While purely control-based robots use the system model to define their input-output relations, Artificial Intelligence (AI)-based robots may or may not use the system model; instead, they manipulate the robot based on the experience gained with the system during training, and possibly enhance it in real time as well. In this paper, after discussing the drawbacks of adaptive control with bounded disturbances and the modifications proposed to overcome these limitations, we focus on work that applied AI to nonlinear dynamical systems, particularly in robotics. We cite some work that targeted the inverted pendulum control problem using NNs. Finally, we emphasize previous research on RL and Deep RL-based control problems and their implementation in robotic manipulation, while highlighting some of their major drawbacks in the 
field.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002Fbc6ab2f2-3f45-49dd-8070-a1f091416d5a\u002F4634.pdf",elocation_id:c,fpage:37,article_id:bP,viewed:1757,downloaded:609,video_url:c,volume:p,year:cp,tag:"37-71",image:"https:\u002F\u002Foaepublishstorage.blob.core.windows.net\u002Fbc6ab2f2-3f45-49dd-8070-a1f091416d5a\u002F4634-coverimg.jpg",authors:ec,video_img:a,journal_path:z,lpage:aS,author:ec,specialissue:{id:ea,name:eb},specialinfo:a,url_doi:bR,image_list:"https:\u002F\u002Fimage.oaes.cc\u002Fbc6ab2f2-3f45-49dd-8070-a1f091416d5a\u002F4634-coverimg.jpg"},{date_published:cn,section:B,section_id:r,title:aF,doi:dN,abstract:dO,pdfurl:dP,elocation_id:c,fpage:dQ,article_id:aE,viewed:g,downloaded:g,video_url:c,volume:e,year:co,tag:dR,image:w,authors:aQ,video_img:a,journal_path:z,lpage:dS,author:aQ,specialissue:a,specialinfo:a,url_doi:aG},{date_published:cn,section:B,section_id:r,title:"Neurodynamics-based formation tracking control of leader-follower nonholonomic multiagent systems",doi:"10.20517\u002Fir.2024.21",abstract:"\u003Cp\u003EThis paper uses a bioinspired neurodynamic (BIN) approach to investigate the formation control problem of leader-follower nonholonomic multiagent systems. In scenarios where not all followers can receive the leader's state, a distributed adaptive estimator is presented to estimate the leader's state. The distributed formation controller, designed using the backstepping technique, utilizes the estimated leader states and neighboring formation tracking error. To address the issue of impractical velocity jumps, a BIN-based approach is integrated into the backstepping controller. Furthermore, considering the practical applications of nonholonomic multiagent systems, a backstepping controller with a saturation velocity constraint is proposed. Rigorous proofs are provided. 
Finally, the effectiveness of the presented formation control law is illustrated through numerical simulations.\u003C\u002Fp\u003E",pdfurl:"https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002Ff39a731b-c097-4c5a-a2b2-af4573b6f3f1\u002Fir4021.pdf",elocation_id:c,fpage:339,article_id:7351,viewed:g,downloaded:g,video_url:c,volume:e,year:co,tag:"339-62",image:w,authors:ed,video_img:a,journal_path:z,lpage:362,author:ed,specialissue:a,specialinfo:a,url_doi:"ir.2024.21"}],ArtNum:{view:7412,click:g,down:1789,read:g,like:Z,comment:w,xml_down:H,cite_click:dY,export_click:cq,cite_count:cC,share_count:k,tran_click:_,mp3_click:g,sharenum:g,id:b},articleShow:X,ArtBase:{seo:{title:aD,keywords:a,description:cE},picabstract:a,interview_pic:a,interview_url:a,review:a,video_url:cA,video_img:cF,oaestyle:cG,amastyle:cH,ctstyle:cI,acstyle:cJ,related:[{article_id:aE,journal_id:l,section_id:r,path:q,journal:n,ar_title:aF,date_published:cK,doi:aG,author:[{first_name:cL,middle_name:a,last_name:cM,ans:i,email:a,bio:a,photoUrl:a},{first_name:cN,middle_name:a,last_name:cO,ans:i,email:cP,bio:a,photoUrl:a},{first_name:cQ,middle_name:a,last_name:cR,ans:i,email:a,bio:a,photoUrl:a},{first_name:cS,middle_name:a,last_name:aH,ans:i,email:a,bio:a,photoUrl:a}]},{article_id:bx,journal_id:l,section_id:r,path:q,journal:n,ar_title:by,date_published:cT,doi:bz,author:[{first_name:cU,middle_name:a,last_name:A,ans:o,email:a,bio:a,photoUrl:a},{first_name:cV,middle_name:a,last_name:I,ans:o,email:a,bio:a,photoUrl:a},{first_name:cW,middle_name:a,last_name:J,ans:o,email:a,bio:a,photoUrl:a},{first_name:aI,middle_name:a,last_name:cX,ans:o,email:a,bio:a,photoUrl:a},{first_name:aH,middle_name:a,last_name:aJ,ans:o,email:a,bio:a,photoUrl:a},{first_name:K,middle_name:a,last_name:A,ans:o,email:a,bio:a,photoUrl:a},{first_name:cY,middle_name:a,last_name:aK,ans:cZ,email:c_,bio:a,photoUrl:a}]},{article_id:bA,journal_id:l,section_id:r,path:q,journal:n,ar_title:bB,date_published:c$,doi:bC,author:[{first_name:da,middle_name:a,last_name:db,
ans:o,email:dc,bio:a,photoUrl:a},{first_name:dd,middle_name:a,last_name:J,ans:y,email:a,bio:a,photoUrl:a},{first_name:de,middle_name:a,last_name:I,ans:y,email:a,bio:a,photoUrl:a},{first_name:df,middle_name:a,last_name:dg,ans:o,email:a,bio:a,photoUrl:a}]},{article_id:bD,journal_id:l,section_id:r,path:q,journal:n,ar_title:bE,date_published:dh,doi:bF,author:[{first_name:di,middle_name:a,last_name:dj,ans:o,email:a,bio:a,photoUrl:a},{first_name:dk,middle_name:a,last_name:A,ans:y,email:a,bio:a,photoUrl:a},{first_name:dl,middle_name:a,last_name:dm,ans:y,email:dn,bio:a,photoUrl:a},{first_name:do0,middle_name:a,last_name:aJ,ans:dp,email:a,bio:a,photoUrl:a}]},{article_id:bG,journal_id:l,section_id:r,path:q,journal:n,ar_title:bH,date_published:dq,doi:bI,author:[{first_name:dr,middle_name:a,last_name:aI,ans:i,email:a,bio:a,photoUrl:a},{first_name:ds,middle_name:a,last_name:dt,ans:i,email:du,bio:a,photoUrl:a}]},{article_id:bJ,journal_id:l,section_id:L,path:q,journal:n,ar_title:bK,date_published:dv,doi:bL,author:[{first_name:dw,middle_name:a,last_name:aK,ans:i,email:a,bio:a,photoUrl:a},{first_name:A,middle_name:a,last_name:dx,ans:i,email:a,bio:a,photoUrl:a},{first_name:dy,middle_name:a,last_name:J,ans:i,email:dz,bio:a,photoUrl:a}]},{article_id:bM,journal_id:l,section_id:r,path:q,journal:n,ar_title:bN,date_published:dA,doi:bO,author:[{first_name:dB,middle_name:a,last_name:dC,ans:i,email:a,bio:a,photoUrl:a},{first_name:K,middle_name:a,last_name:K,ans:i,email:a,bio:a,photoUrl:a},{first_name:dD,middle_name:a,last_name:I,ans:i,email:a,bio:a,photoUrl:a}]},{article_id:bP,journal_id:l,section_id:L,path:q,journal:n,ar_title:bQ,date_published:dE,doi:bR,author:[{first_name:dF,middle_name:a,last_name:dG,ans:o,email:a,bio:a,photoUrl:a},{first_name:dH,middle_name:a,last_name:dI,ans:o,email:a,bio:a,photoUrl:a},{first_name:dJ,middle_name:a,last_name:dK,ans:y,email:a,bio:a,photoUrl:a}]}],editor:[]}}],fetch:{"data-v-0baa1603:0":{qKname:q,component:X,screenwidth:a}},error:c,state:{token:a,index:{da
ta:{data:{footer:{},info:{},middle:{},nav:{},top:{}}},oaeNav:[{name:ee,sort:h,children:[{name:"Company",sort:h,url:"\u002Fabout\u002Fwho_we_are"},{name:"Latest News",sort:m,url:"\u002Fnews"},{name:cr,sort:s,url:"\u002Fabout\u002Fcontact_us"},{name:"History",sort:u,url:"\u002Fabout\u002Fhistory"},{name:"Careers",sort:v,url:"\u002Fabout\u002Fjoin_us"},{name:"Policies",sort:x,children:[{name:ef,sort:h,url:"\u002Fabout\u002Feditorial_policies"},{name:"Open Access Policy",sort:m,url:"\u002Fabout\u002Fopen_access_policy"},{name:"Research and Publication Ethics",sort:s,url:"\u002Fabout\u002Fresearch_and_publication_ethics"},{name:"Peer-review Policies",sort:u,url:"\u002Fabout\u002Fpeer_review_policies"},{name:"Publication Fees",sort:v,url:"\u002Fabout\u002Fpublication_fees"},{name:"Advertising Policy",sort:x,url:"\u002Fabout\u002Fadvertising_policy"}]}]},{name:eg,sort:m,children:[{name:"All Journals",sort:h,url:"\u002Falljournals"},{name:"Active Journals",sort:m,url:"\u002Factivejournals"},{name:"Archived Journals",sort:s,url:"\u002Factivedjournals"}]},{name:"Services",sort:s,children:[{name:"Language Editing",sort:h,url:"\u002Fabout\u002Flanguage_editing_services"},{name:"Layout & Production",sort:m,url:"\u002Fabout\u002Flayout_and_production"},{name:"Graphic Abstracts",sort:s,url:"\u002Fabout\u002Fgraphic_abstracts"},{name:"Video Abstracts",sort:u,url:"\u002Fabout\u002Fvideo_abstracts"},{name:"Think Tank",url:"https:\u002F\u002Fwww.oaescience.com\u002F"},{name:"Scierxiv",url:"https:\u002F\u002Fwww.scierxiv.com\u002F"},{name:"Submission System",url:"https:\u002F\u002Foaemesas.com\u002FLogin"}]},{name:"Collaborations",sort:u,children:[{name:"Strategic Collaborators",url:"\u002Fabout\u002Fcollaborators"},{name:"Journal Membership",url:"\u002Fpartners"},{name:"Conference Parterships",url:"\u002Fabout\u002Fparterships"}]},{name:"Insights",sort:v,children:[{name:"Latest Articles",url:"\u002Farticles"},{name:eh,url:"\u002Facademic"},{name:"Interactive 
Webinars",url:"\u002Fwebinars"}]},{name:"Academic Support",sort:x,children:[{name:"Author Hub",sort:h,url:"\u002Fabout\u002Fauthor"},{name:"Editor Hub",sort:m,url:"\u002Fabout\u002Feditor"},{name:"Reviewer Hub",sort:s,url:"\u002Fabout\u002Freviewer"},{name:"Conference Organizer",sort:u,url:"\u002Fabout\u002Forganizer"},{name:"Expert Lecture",sort:v,url:"\u002Fabout\u002Fexpert_lecture"}]}],oaeQkNav:[{name:eg,sort:m,children:[{name:ei,sort:h,children:[{name:$,sort:h,url:"\u002Fevcna"},{name:aa,sort:m,url:"\u002Fmrr"},{name:ab,sort:s,url:"\u002Fohir"}]},{name:ej,sort:m,children:[{name:ac,sort:h,url:"\u002Fcs"},{name:ad,sort:m,url:"\u002Fenergymater"},{name:ae,sort:s,url:"\u002Fjmi"},{name:af,sort:u,url:"\u002Fmicrostructures"},{name:ag,sort:v,url:"\u002Fminerals"},{name:ah,sort:x,url:"\u002Fss"}]},{name:ek,sort:s,children:[{name:ai,sort:h,url:"\u002Fcomengsys"},{name:aj,sort:m,url:"\u002Fdpr"},{name:ak,sort:s,url:"\u002Fgmo"},{name:n,sort:u,url:el},{name:al,sort:v,url:"\u002Fjsegc"},{name:am,sort:x,url:"\u002Fjsss"}]},{name:em,sort:u,children:[{name:an,sort:h,url:"\u002Fand"},{name:ao,sort:m,url:"\u002Fais"},{name:ap,sort:s,url:"\u002Fcdr"},{name:aq,sort:u,url:"\u002Fchatmed"},{name:ar,sort:v,url:"\u002Fhr"},{name:as,sort:x,url:"\u002Fjcmt"},{name:at,sort:aM,url:"\u002Fjtgg"},{name:au,sort:aN,url:"\u002Fmtod"},{name:av,sort:aO,url:"\u002Fmis"},{name:aw,sort:bV,url:"\u002Fpar"},{name:ax,sort:aP,url:"\u002Frdodj"},{name:ay,sort:dL,url:"\u002Fjca"},{name:az,sort:dM,url:"\u002Fvp"}]},{name:en,sort:v,children:[{name:aA,sort:h,url:"\u002Fcf"},{name:aB,sort:m,url:"\u002Fjeea"},{name:aC,sort:s,url:"\u002Fwecn"}]}]}],comList:{},jourtabs:[{category:g,category_name:"Whole",list:[{id:d,journal_id:aT,colour_tag:eo,isn:ep,title:an,image:eq,content:{url:er,eic:es,pub:f,total:F,index:et,rgba:eu,log_image:ev},time:c,category:e,sort:N,impact_factor:[],data_record:a,journal_name:an,journal_path:ew,is_show:d,total:aU},{id:p,journal_id:aV,colour_tag:ex,isn:ey,title:ao,image:ez,content:{ur
l:eA,eic:eB,pub:f,total:cc,index:eC,rgba:eD,log_image:eE},time:c,category:e,sort:O,impact_factor:[{factor:eF,url:eG},{factor:j,url:a}],data_record:a,journal_name:ao,journal_path:eH,is_show:d,total:eI},{id:k,journal_id:M,colour_tag:eJ,isn:eK,title:ap,image:eL,content:{url:eM,eic:eN,pub:eO,total:eP,index:eQ,rgba:eR,log_image:eS},time:c,category:e,sort:d,impact_factor:[{factor:eT,url:eU},{factor:eV,url:eW},{factor:eX,url:eY},{factor:eZ,url:a},{factor:j,url:C}],data_record:a,journal_name:ap,journal_path:e_,is_show:d,total:e$},{id:e,journal_id:fa,colour_tag:fb,isn:fc,title:ai,image:fd,content:{url:fe,eic:ff,pub:f,total:cd,index:fg,rgba:fh,log_image:fi},time:c,category:k,sort:G,impact_factor:[{factor:fj,url:fk}],data_record:i,journal_name:ai,journal_path:fl,is_show:d,total:aS},{id:t,journal_id:bS,colour_tag:fm,isn:fn,title:aq,image:fo,content:{url:fp,eic:fq,total:bZ,pub:f,index:P,rgba:fr,log_image:fs},time:c,category:e,sort:aW,impact_factor:[],data_record:a,journal_name:aq,journal_path:ft,is_show:g,total:Q},{id:R,journal_id:G,colour_tag:fu,isn:fv,title:ac,image:fw,content:{url:fx,eic:fy,pub:f,total:cj,index:fz,rgba:fA,log_image:fB},time:c,category:p,sort:aX,impact_factor:[{factor:fC,url:fD}],data_record:a,journal_name:ac,journal_path:fE,is_show:d,total:fF},{id:S,journal_id:fG,colour_tag:fH,isn:fI,title:aA,image:fJ,content:{url:fK,eic:fL,pub:D,total:bY,index:P,rgba:fM,log_image:fN},time:c,category:t,sort:T,impact_factor:[],data_record:a,journal_name:aA,journal_path:fO,is_show:d,total:fP},{id:aY,journal_id:aZ,colour_tag:fQ,isn:fR,title:aj,image:fS,content:{url:fT,eic:fU,pub:f,total:bW,index:P,rgba:fV,log_image:fW},time:c,category:k,sort:l,impact_factor:[],data_record:a,journal_name:aj,journal_path:fX,is_show:g,total:l},{id:W,journal_id:Q,colour_tag:fY,isn:fZ,title:ad,image:f_,content:{url:f$,eic:ga,pub:f,total:cm,index:a_,rgba:gb,log_image:gc},time:c,category:p,sort:g,impact_factor:[{factor:j,url:C},{factor:gd,url:a},{factor:ge,url:a}],data_record:a,journal_name:ad,journal_
path:gf,is_show:d,total:gg},{id:N,journal_id:a$,colour_tag:gh,isn:gi,title:$,image:gj,content:{url:gk,eic:gl,pub:U,total:ck,index:gm,rgba:gn,log_image:go},time:c,category:d,sort:p,impact_factor:[{factor:gp,url:gq},{factor:j,url:gr}],data_record:a,journal_name:$,journal_path:gs,is_show:d,total:gt},{id:ba,journal_id:gu,colour_tag:gv,isn:gw,title:ak,image:gx,content:{url:gy,eic:gz,pub:D,total:aP,index:a,rgba:gA,log_image:gB},time:c,category:k,sort:bb,impact_factor:[],data_record:a,journal_name:ak,journal_path:gC,is_show:d,total:V},{id:bc,journal_id:S,colour_tag:gD,isn:gE,title:ar,image:gF,content:{url:gG,eic:gH,pub:bd,total:gI,index:be,rgba:gJ,log_image:gK},time:c,category:e,sort:t,impact_factor:[{factor:gL,url:gM},{factor:gN,url:gO},{factor:j,url:C}],data_record:a,journal_name:ar,journal_path:gP,is_show:d,total:gQ},{id:Z,journal_id:l,colour_tag:cs,isn:bU,title:n,image:ct,content:{url:gR,eic:cu,pub:f,total:Y,index:gS,rgba:cv,log_image:cw},time:c,category:k,sort:gT,impact_factor:[{factor:cx,url:bf},{factor:j,url:a}],data_record:a,journal_name:n,journal_path:q,is_show:d,total:bg},{id:M,journal_id:R,colour_tag:gU,isn:gV,title:as,image:gW,content:{url:gX,pub:bd,total:gY,index:be,rgba:gZ,log_image:g_},time:c,category:e,sort:k,impact_factor:[{factor:g$,url:ha},{factor:hb,url:hc},{factor:j,url:C}],data_record:a,journal_name:as,journal_path:hd,is_show:d,total:he},{id:O,journal_id:bh,colour_tag:hf,isn:hg,title:ae,image:hh,content:{url:hi,eic:hj,pub:f,total:ce,index:a_,rgba:hk,log_image:hl},time:c,category:p,sort:aU,impact_factor:[{factor:j,url:a}],data_record:a,journal_name:ae,journal_path:hm,is_show:d,total:hn},{id:bi,journal_id:O,colour_tag:ho,isn:hp,title:at,image:hq,content:{url:hr,eic:hs,pub:E,total:ht,index:hu,rgba:hv,log_image:hw},time:c,category:e,sort:R,impact_factor:[{factor:hx,url:hy},{factor:j,url:hz}],data_record:a,journal_name:at,journal_path:hA,is_show:d,total:hB},{id:a$,journal_id:ba,colour_tag:hC,isn:hD,title:bj,image:hE,content:{url:hF,eic:hG,pub:hH,total:cg,i
ndex:a,rgba:hI,log_image:hJ},time:c,category:e,sort:hK,impact_factor:[],data_record:a,journal_name:bj,journal_path:hL,is_show:g,total:hM},{id:H,journal_id:H,colour_tag:hN,isn:hO,title:al,image:hP,content:{url:hQ,eic:hR,pub:f,total:F,index:hS,rgba:hT,log_image:hU},time:c,category:k,sort:hV,impact_factor:[],data_record:a,journal_name:al,journal_path:hW,is_show:g,total:hX},{id:hY,journal_id:bi,colour_tag:hZ,isn:h_,title:am,image:h$,content:{url:ia,pub:U,total:cb,index:ib,rgba:ic,log_image:id},time:c,category:k,sort:ie,impact_factor:[],data_record:a,journal_name:am,journal_path:if0,is_show:d,total:ig},{id:G,journal_id:ih,colour_tag:ii,isn:ij,title:aB,image:ik,content:{url:il,eic:im,pub:f,total:F,index:in0,rgba:io,log_image:ip},time:c,category:t,sort:bg,impact_factor:[{factor:iq,url:ir}],data_record:i,journal_name:aB,journal_path:is,is_show:d,total:it},{id:_,journal_id:iu,colour_tag:iv,isn:iw,title:aC,image:ix,content:{url:iy,eic:iz,pub:D,total:ca,index:iA,rgba:iB,log_image:iC},time:c,category:t,sort:iD,impact_factor:[{factor:iE,url:iF}],data_record:a,journal_name:aC,journal_path:iG,is_show:d,total:iH},{id:aV,journal_id:bk,colour_tag:iI,isn:iJ,title:au,image:iK,content:{url:iL,eic:iM,pub:f,total:Y,index:bl,rgba:iN,log_image:iO},time:c,category:e,sort:e,impact_factor:[{factor:iP,url:iQ},{factor:j,url:iR}],data_record:a,journal_name:au,journal_path:iS,is_show:d,total:iT},{id:aT,journal_id:iU,colour_tag:iV,isn:iW,title:aa,image:iX,content:{url:iY,eic:iZ,pub:f,total:cf,index:i_,rgba:i$,log_image:ja},time:c,category:d,sort:bT,impact_factor:[{factor:jb,url:jc},{factor:jd,url:je},{factor:bm,url:jf}],data_record:i,journal_name:aa,journal_path:jg,is_show:d,total:jh},{id:aZ,journal_id:bn,colour_tag:ji,isn:jj,title:af,image:jk,content:{url:jl,eic:jm,pub:f,total:ch,index:jn,rgba:jo,log_image:jp},time:c,category:p,sort:jq,impact_factor:[{factor:jr,url:js},{factor:j,url:jt}],data_record:a,journal_name:af,journal_path:ju,is_show:d,total:jv},{id:aW,journal_id:jw,colour_tag:jx,isn:jy,tit
le:ag,image:jz,content:{url:jA,eic:jB,pub:f,total:bX,index:a,rgba:jC,log_image:jD},time:c,category:p,sort:jE,impact_factor:[],data_record:a,journal_name:ag,journal_path:jF,is_show:g,total:V},{id:bk,journal_id:Z,colour_tag:jG,isn:jH,title:av,image:jI,content:{url:jJ,eic:jK,pub:E,total:jL,index:jM,rgba:jN,log_image:jO},time:c,category:e,sort:W,impact_factor:[{factor:jP,url:jQ},{factor:j,url:a}],data_record:a,journal_name:av,journal_path:jR,is_show:d,total:jS},{id:jT,journal_id:t,colour_tag:jU,isn:jV,title:bo,image:jW,content:{url:jX,pub:bp,total:jY,index:a,rgba:jZ,log_image:j_},time:c,category:e,sort:j$,impact_factor:[],data_record:a,journal_name:bo,journal_path:ka,is_show:g,total:kb},{id:kc,journal_id:_,colour_tag:kd,isn:ke,title:ab,image:kf,content:{url:kg,pub:U,total:b_,index:kh,eic:ki,rgba:kj,log_image:kk},time:c,category:d,sort:kl,impact_factor:[],data_record:a,journal_name:ab,journal_path:km,is_show:g,total:bh},{id:bq,journal_id:k,colour_tag:kn,isn:ko,title:aw,image:kp,content:{url:kq,eic:kr,pub:bp,total:ks,rgba:kt,index:ku,log_image:kv},time:c,category:e,sort:aY,impact_factor:[{factor:kw,url:kx},{factor:j,url:ky}],data_record:i,journal_name:aw,journal_path:kz,is_show:d,total:kA},{id:aX,journal_id:bq,colour_tag:kB,isn:kC,title:ax,image:kD,content:{url:kE,eic:kF,pub:f,total:b$,index:kG,rgba:kH,log_image:kI},time:c,category:e,sort:kJ,impact_factor:[],data_record:a,journal_name:ax,journal_path:kK,is_show:d,total:kL},{id:br,journal_id:br,colour_tag:kM,isn:kN,title:ah,image:kO,content:{url:kP,eic:kQ,pub:f,total:ci,index:kR,rgba:kS,log_image:kT},time:c,category:p,sort:kU,impact_factor:[{factor:kV,url:kW},{factor:j,url:a}],data_record:a,journal_name:ah,journal_path:kX,is_show:d,total:kY},{id:V,journal_id:kZ,colour_tag:k_,isn:bs,title:bt,image:k$,content:{url:la,eic:lb,pub:f,total:m,index:a,rgba:lc,log_image:ld},time:c,category:t,sort:le,impact_factor:[],data_record:a,journal_name:bt,journal_path:lf,is_show:g,total:p},{id:bn,journal_id:bb,colour_tag:lg,isn:bs,title:bu,i
mage:lh,content:{url:li,eic:lj,pub:D,total:s,index:a,rgba:lk,log_image:ll},time:c,category:k,sort:cq,impact_factor:[],data_record:a,journal_name:bu,journal_path:lm,is_show:g,total:k},{id:Q,journal_id:bc,colour_tag:ln,isn:lo,title:bv,image:lp,content:{url:lq,eic:lr,pub:E,total:aL,index:a,rgba:ls,log_image:lt},time:c,category:e,sort:lu,impact_factor:[],data_record:a,journal_name:bv,journal_path:lv,is_show:g,total:l},{id:T,journal_id:T,colour_tag:lw,isn:lx,title:ay,image:ly,content:{url:lz,pub:f,total:cl,index:bl,rgba:lA,log_image:lB},time:c,category:e,sort:t,impact_factor:[{factor:lC,url:lD},{factor:bm,url:lE}],data_record:i,journal_name:ay,journal_path:lF,is_show:d,total:lG},{id:lH,journal_id:N,colour_tag:lI,isn:lJ,title:az,image:lK,content:{url:lL,pub:E,total:lM,index:lN,rgba:lO,log_image:lP},time:c,category:e,sort:S,impact_factor:[{factor:lQ,url:lR}],data_record:a,journal_name:az,journal_path:lS,is_show:d,total:lT}]},{category:d,category_name:ei,list:[{id:N,journal_id:a$,colour_tag:gh,isn:gi,title:$,image:gj,content:{url:gk,eic:gl,pub:U,total:ck,index:gm,rgba:gn,log_image:go},time:c,category:d,sort:p,impact_factor:[{factor:gp,url:gq},{factor:j,url:gr}],data_record:a,journal_name:$,journal_path:gs,is_show:d,total:gt},{id:aT,journal_id:iU,colour_tag:iV,isn:iW,title:aa,image:iX,content:{url:iY,eic:iZ,pub:f,total:cf,index:i_,rgba:i$,log_image:ja},time:c,category:d,sort:bT,impact_factor:[{factor:jb,url:jc},{factor:jd,url:je},{factor:bm,url:jf}],data_record:i,journal_name:aa,journal_path:jg,is_show:d,total:jh},{id:kc,journal_id:_,colour_tag:kd,isn:ke,title:ab,image:kf,content:{url:kg,pub:U,total:b_,index:kh,eic:ki,rgba:kj,log_image:kk},time:c,category:d,sort:kl,impact_factor:[],data_record:a,journal_name:ab,journal_path:km,is_show:g,total:bh}]},{category:p,category_name:ej,list:[{id:R,journal_id:G,colour_tag:fu,isn:fv,title:ac,image:fw,content:{url:fx,eic:fy,pub:f,total:cj,index:fz,rgba:fA,log_image:fB},time:c,category:p,sort:aX,impact_factor:[{factor:fC,url:fD}],data_re
cord:a,journal_name:ac,journal_path:fE,is_show:d,total:fF},{id:W,journal_id:Q,colour_tag:fY,isn:fZ,title:ad,image:f_,content:{url:f$,eic:ga,pub:f,total:cm,index:a_,rgba:gb,log_image:gc},time:c,category:p,sort:g,impact_factor:[{factor:j,url:C},{factor:gd,url:a},{factor:ge,url:a}],data_record:a,journal_name:ad,journal_path:gf,is_show:d,total:gg},{id:O,journal_id:bh,colour_tag:hf,isn:hg,title:ae,image:hh,content:{url:hi,eic:hj,pub:f,total:ce,index:a_,rgba:hk,log_image:hl},time:c,category:p,sort:aU,impact_factor:[{factor:j,url:a}],data_record:a,journal_name:ae,journal_path:hm,is_show:d,total:hn},{id:aZ,journal_id:bn,colour_tag:ji,isn:jj,title:af,image:jk,content:{url:jl,eic:jm,pub:f,total:ch,index:jn,rgba:jo,log_image:jp},time:c,category:p,sort:jq,impact_factor:[{factor:jr,url:js},{factor:j,url:jt}],data_record:a,journal_name:af,journal_path:ju,is_show:d,total:jv},{id:aW,journal_id:jw,colour_tag:jx,isn:jy,title:ag,image:jz,content:{url:jA,eic:jB,pub:f,total:bX,index:a,rgba:jC,log_image:jD},time:c,category:p,sort:jE,impact_factor:[],data_record:a,journal_name:ag,journal_path:jF,is_show:g,total:V},{id:br,journal_id:br,colour_tag:kM,isn:kN,title:ah,image:kO,content:{url:kP,eic:kQ,pub:f,total:ci,index:kR,rgba:kS,log_image:kT},time:c,category:p,sort:kU,impact_factor:[{factor:kV,url:kW},{factor:j,url:a}],data_record:a,journal_name:ah,journal_path:kX,is_show:d,total:kY}]},{category:k,category_name:ek,list:[{id:e,journal_id:fa,colour_tag:fb,isn:fc,title:ai,image:fd,content:{url:fe,eic:ff,pub:f,total:cd,index:fg,rgba:fh,log_image:fi},time:c,category:k,sort:G,impact_factor:[{factor:fj,url:fk}],data_record:i,journal_name:ai,journal_path:fl,is_show:d,total:aS},{id:aY,journal_id:aZ,colour_tag:fQ,isn:fR,title:aj,image:fS,content:{url:fT,eic:fU,pub:f,total:bW,index:P,rgba:fV,log_image:fW},time:c,category:k,sort:l,impact_factor:[],data_record:a,journal_name:aj,journal_path:fX,is_show:g,total:l},{id:ba,journal_id:gu,colour_tag:gv,isn:gw,title:ak,image:gx,content:{url:gy,eic:gz,pub:D,tot
al:aP,index:a,rgba:gA,log_image:gB},time:c,category:k,sort:bb,impact_factor:[],data_record:a,journal_name:ak,journal_path:gC,is_show:d,total:V},{id:Z,journal_id:l,colour_tag:cs,isn:bU,title:n,image:ct,content:{url:gR,eic:cu,pub:f,total:Y,index:gS,rgba:cv,log_image:cw},time:c,category:k,sort:gT,impact_factor:[{factor:cx,url:bf},{factor:j,url:a}],data_record:a,journal_name:n,journal_path:q,is_show:d,total:bg},{id:H,journal_id:H,colour_tag:hN,isn:hO,title:al,image:hP,content:{url:hQ,eic:hR,pub:f,total:F,index:hS,rgba:hT,log_image:hU},time:c,category:k,sort:hV,impact_factor:[],data_record:a,journal_name:al,journal_path:hW,is_show:g,total:hX},{id:hY,journal_id:bi,colour_tag:hZ,isn:h_,title:am,image:h$,content:{url:ia,pub:U,total:cb,index:ib,rgba:ic,log_image:id},time:c,category:k,sort:ie,impact_factor:[],data_record:a,journal_name:am,journal_path:if0,is_show:d,total:ig},{id:bn,journal_id:bb,colour_tag:lg,isn:bs,title:bu,image:lh,content:{url:li,eic:lj,pub:D,total:s,index:a,rgba:lk,log_image:ll},time:c,category:k,sort:cq,impact_factor:[],data_record:a,journal_name:bu,journal_path:lm,is_show:g,total:k}]},{category:e,category_name:em,list:[{id:d,journal_id:aT,colour_tag:eo,isn:ep,title:an,image:eq,content:{url:er,eic:es,pub:f,total:F,index:et,rgba:eu,log_image:ev},time:c,category:e,sort:N,impact_factor:[],data_record:a,journal_name:an,journal_path:ew,is_show:d,total:aU},{id:p,journal_id:aV,colour_tag:ex,isn:ey,title:ao,image:ez,content:{url:eA,eic:eB,pub:f,total:cc,index:eC,rgba:eD,log_image:eE},time:c,category:e,sort:O,impact_factor:[{factor:eF,url:eG},{factor:j,url:a}],data_record:a,journal_name:ao,journal_path:eH,is_show:d,total:eI},{id:k,journal_id:M,colour_tag:eJ,isn:eK,title:ap,image:eL,content:{url:eM,eic:eN,pub:eO,total:eP,index:eQ,rgba:eR,log_image:eS},time:c,category:e,sort:d,impact_factor:[{factor:eT,url:eU},{factor:eV,url:eW},{factor:eX,url:eY},{factor:eZ,url:a},{factor:j,url:C}],data_record:a,journal_name:ap,journal_path:e_,is_show:d,total:e$},{id:t,journal_id:
bS,colour_tag:fm,isn:fn,title:aq,image:fo,content:{url:fp,eic:fq,total:bZ,pub:f,index:P,rgba:fr,log_image:fs},time:c,category:e,sort:aW,impact_factor:[],data_record:a,journal_name:aq,journal_path:ft,is_show:g,total:Q},{id:bc,journal_id:S,colour_tag:gD,isn:gE,title:ar,image:gF,content:{url:gG,eic:gH,pub:bd,total:gI,index:be,rgba:gJ,log_image:gK},time:c,category:e,sort:t,impact_factor:[{factor:gL,url:gM},{factor:gN,url:gO},{factor:j,url:C}],data_record:a,journal_name:ar,journal_path:gP,is_show:d,total:gQ},{id:M,journal_id:R,colour_tag:gU,isn:gV,title:as,image:gW,content:{url:gX,pub:bd,total:gY,index:be,rgba:gZ,log_image:g_},time:c,category:e,sort:k,impact_factor:[{factor:g$,url:ha},{factor:hb,url:hc},{factor:j,url:C}],data_record:a,journal_name:as,journal_path:hd,is_show:d,total:he},{id:bi,journal_id:O,colour_tag:ho,isn:hp,title:at,image:hq,content:{url:hr,eic:hs,pub:E,total:ht,index:hu,rgba:hv,log_image:hw},time:c,category:e,sort:R,impact_factor:[{factor:hx,url:hy},{factor:j,url:hz}],data_record:a,journal_name:at,journal_path:hA,is_show:d,total:hB},{id:a$,journal_id:ba,colour_tag:hC,isn:hD,title:bj,image:hE,content:{url:hF,eic:hG,pub:hH,total:cg,index:a,rgba:hI,log_image:hJ},time:c,category:e,sort:hK,impact_factor:[],data_record:a,journal_name:bj,journal_path:hL,is_show:g,total:hM},{id:aV,journal_id:bk,colour_tag:iI,isn:iJ,title:au,image:iK,content:{url:iL,eic:iM,pub:f,total:Y,index:bl,rgba:iN,log_image:iO},time:c,category:e,sort:e,impact_factor:[{factor:iP,url:iQ},{factor:j,url:iR}],data_record:a,journal_name:au,journal_path:iS,is_show:d,total:iT},{id:bk,journal_id:Z,colour_tag:jG,isn:jH,title:av,image:jI,content:{url:jJ,eic:jK,pub:E,total:jL,index:jM,rgba:jN,log_image:jO},time:c,category:e,sort:W,impact_factor:[{factor:jP,url:jQ},{factor:j,url:a}],data_record:a,journal_name:av,journal_path:jR,is_show:d,total:jS},{id:jT,journal_id:t,colour_tag:jU,isn:jV,title:bo,image:jW,content:{url:jX,pub:bp,total:jY,index:a,rgba:jZ,log_image:j_},time:c,category:e,sort:j$,impact_f
actor:[],data_record:a,journal_name:bo,journal_path:ka,is_show:g,total:kb},{id:bq,journal_id:k,colour_tag:kn,isn:ko,title:aw,image:kp,content:{url:kq,eic:kr,pub:bp,total:ks,rgba:kt,index:ku,log_image:kv},time:c,category:e,sort:aY,impact_factor:[{factor:kw,url:kx},{factor:j,url:ky}],data_record:i,journal_name:aw,journal_path:kz,is_show:d,total:kA},{id:aX,journal_id:bq,colour_tag:kB,isn:kC,title:ax,image:kD,content:{url:kE,eic:kF,pub:f,total:b$,index:kG,rgba:kH,log_image:kI},time:c,category:e,sort:kJ,impact_factor:[],data_record:a,journal_name:ax,journal_path:kK,is_show:d,total:kL},{id:Q,journal_id:bc,colour_tag:ln,isn:lo,title:bv,image:lp,content:{url:lq,eic:lr,pub:E,total:aL,index:a,rgba:ls,log_image:lt},time:c,category:e,sort:lu,impact_factor:[],data_record:a,journal_name:bv,journal_path:lv,is_show:g,total:l},{id:T,journal_id:T,colour_tag:lw,isn:lx,title:ay,image:ly,content:{url:lz,pub:f,total:cl,index:bl,rgba:lA,log_image:lB},time:c,category:e,sort:t,impact_factor:[{factor:lC,url:lD},{factor:bm,url:lE}],data_record:i,journal_name:ay,journal_path:lF,is_show:d,total:lG},{id:lH,journal_id:N,colour_tag:lI,isn:lJ,title:az,image:lK,content:{url:lL,pub:E,total:lM,index:lN,rgba:lO,log_image:lP},time:c,category:e,sort:S,impact_factor:[{factor:lQ,url:lR}],data_record:a,journal_name:az,journal_path:lS,is_show:d,total:lT}]},{category:t,category_name:en,list:[{id:S,journal_id:fG,colour_tag:fH,isn:fI,title:aA,image:fJ,content:{url:fK,eic:fL,pub:D,total:bY,index:P,rgba:fM,log_image:fN},time:c,category:t,sort:T,impact_factor:[],data_record:a,journal_name:aA,journal_path:fO,is_show:d,total:fP},{id:G,journal_id:ih,colour_tag:ii,isn:ij,title:aB,image:ik,content:{url:il,eic:im,pub:f,total:F,index:in0,rgba:io,log_image:ip},time:c,category:t,sort:bg,impact_factor:[{factor:iq,url:ir}],data_record:i,journal_name:aB,journal_path:is,is_show:d,total:it},{id:_,journal_id:iu,colour_tag:iv,isn:iw,title:aC,image:ix,content:{url:iy,eic:iz,pub:D,total:ca,index:iA,rgba:iB,log_image:iC},time:c,cate
gory:t,sort:iD,impact_factor:[{factor:iE,url:iF}],data_record:a,journal_name:aC,journal_path:iG,is_show:d,total:iH},{id:V,journal_id:kZ,colour_tag:k_,isn:bs,title:bt,image:k$,content:{url:la,eic:lb,pub:f,total:m,index:a,rgba:lc,log_image:ld},time:c,category:t,sort:le,impact_factor:[],data_record:a,journal_name:bt,journal_path:lf,is_show:g,total:p}]}],trdId:a,timeLong:g,oldPath:a,qkactiveIndex:lU,oaeactiveIndex:lU,scopusCite:"data:image\u002Fjpeg;base64,\u002F9j\u002F4QnqRXhpZgAATU0AKgAAAAgADAEAAAMAAAABAZAAAAEBAAMAAAABAZAAAAECAAMAAAADAAAAngEGAAMAAAABAAIAAAESAAMAAAABAAEAAAEVAAMAAAABAAMAAAEaAAUAAAABAAAApAEbAAUAAAABAAAArAEoAAMAAAABAAIAAAExAAIAAAAfAAAAtAEyAAIAAAAUAAAA04dpAAQAAAABAAAA6AAAASAACAAIAAgACvyAAAAnEAAK\u002FIAAACcQQWRvYmUgUGhvdG9zaG9wIDIyLjQgKFdpbmRvd3MpADIwMjQ6MDY6MTIgMTQ6MDI6NDYAAAAEkAAABwAAAAQwMjMxoAEAAwAAAAEAAQAAoAIABAAAAAEAAABgoAMABAAAAAEAAABgAAAAAAAAAAYBAwADAAAAAQAGAAABGgAFAAAAAQAAAW4BGwAFAAAAAQAAAXYBKAADAAAAAQACAAACAQAEAAAAAQAAAX4CAgAEAAAAAQAACGQAAAAAAAAASAAAAAEAAABIAAAAAf\u002FY\u002F+0ADEFkb2JlX0NNAAH\u002F7gAOQWRvYmUAZIAAAAAB\u002F9sAhAAMCAgICQgMCQkMEQsKCxEVDwwMDxUYExMVExMYEQwMDAwMDBEMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMAQ0LCw0ODRAODhAUDg4OFBQODg4OFBEMDAwMDBERDAwMDAwMEQwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAz\u002FwAARCABgAGADASIAAhEBAxEB\u002F90ABAAG\u002F8QBPwAAAQUBAQEBAQEAAAAAAAAAAwABAgQFBgcICQoLAQABBQEBAQEBAQAAAAAAAAABAAIDBAUGBwgJCgsQAAEEAQMCBAIFBwYIBQMMMwEAAhEDBCESMQVBUWETInGBMgYUkaGxQiMkFVLBYjM0coLRQwclklPw4fFjczUWorKDJkSTVGRFwqN0NhfSVeJl8rOEw9N14\u002FNGJ5SkhbSVxNTk9KW1xdXl9VZmdoaWprbG1ub2N0dXZ3eHl6e3x9fn9xEAAgIBAgQEAwQFBgcHBgU1AQACEQMhMRIEQVFhcSITBTKBkRShsUIjwVLR8DMkYuFygpJDUxVjczTxJQYWorKDByY1wtJEk1SjF2RFVTZ0ZeLys4TD03Xj80aUpIW0lcTU5PSltcXV5fVWZnaGlqa2xtbm9ic3R1dnd4eXp7fH\u002F9oADAMBAAIRAxEAPwCwkkkuYepUkkkkpSSSSSlJJJJKUkkkkpSSSSSn\u002F9Cwkkr3RumnqWc2gkilo33OHO0fmt\u002FlWOXMwhKchGIuUjQennOMImUjUYiygxMHMzXbcWl1saFw0aP61joYtFv1U6wRJFTT4F5\u002FgxdZbbhdMw9z9tGNSAGgD\u002FNa1o+k5yxnfXPFDoZjWuZ2cS0H\u002FN3LRPJ8tiAGbKeM9v4epzhznNZbODEOAdT\u002FAN96Y
",crossref:"data:image\u002Fpng;base64,
YG66M6X6qoAzB\u002FwUW3aR7surCsQHCc+yM8gsFdCbmvpFf3pzhJ\u002Fs\u002FkRIodQ4JdBP6fAlBC4xDeUdG0nakwL9E1+EmYarWDO7cn2IpBUC5cNO+RxmcpCCMMf374jnxXPGoH9TnfcrW+c5qfHHt3hEwTUMlhnchsmCFCcD1taZD\u002FfWeeOL4Dw45fnzp17\u002FfrzO0c+9v4PELjMN7mhhRfcNz5k7r2VhiR+Xb\u002FYh4EeAQfiS6xx39gF3LOaFyERiMvndz7N6TAFBCg51VP+Gy9qTI7QobMIji7BPlpxbWkQoGSZ6hn\u002FFdIfdQsgfEQLJA5\u002FzOWgQygQojriU3nZDmSBiQR9xO4rAH5Due4dTaUArMsRuIGDTNuf\u002FZ4rz7ipqTmCQCAuX+Rw0M0T3NwIfOp7\u002FPnGq2zO1GOWSEF2x2umjjuZnVAY+kiMgBuyJZI+VwYcmyHCSOC1vHzxlqGDdsbcyIvAPzvj9re+mO1cWIShaNdr+ER0dUnU8LGBAgRH9RRJsoDmCD+pBaLyBzMHbe5I3h+1g6Sw6wlQhrCTOcN9DmHYFm4+xiwDEh2E0ukhQYD5EYAhwk+regFzBwlCutVCPjJDLymgNEIa6TQd0Q5kl+0Vwxkclu+MnTbCP8wFUgev1HqCq90glbRjR5L7VnRHDU+0baaJQaaOM3vMAggzL024gSECch1J8XUIuQUSh7++pXgctCtryNFMXsdXKkNwZHO1NDLM7l0K65h9IrBg63EbcBBQjeALqjbXlpfhB4UEtA4yhDTzqUVIwiEZwvi4\u002Fv3MKe4qayUA9FKwZG7Wpt8iipMZ+ifBL4jwUWEBtYN+t4XkXZAmP2roELzM8tDBmi4EQymDPFamHhmoRCCpI1ARwsjhjsDBYN+ReKkf8PsgZQikGu\u002Fucn9ZXNfPCElAiuRLfA12AJMikOEJVocQpTVqnIMUoYOV299xZhukDIE8MpmR+d+eQcuEkumvMEcYrQPKEaDpDHSqCK8zaT7FXlSk6JDaONNZyxCk\u002FdG1Gn3+JmpFBQWKzXBofFBd\u002FBMaBCC9oUZcmjNCiB08T4tALxBAvmZwNmyRIpBjjy6KkxZRw+xI+gqyQbpFfcVD4kvVdEfk6eb7v3oy54GzQ4jSfB\u002FHz4MCAdNbFZgVKnrK6g91COn7X0vv0DO1JXYNSJL2cOler4PFPwKNBuZ0oEPi75MAYTYIUZpv5KB6U4dJa7ow3ivYwcyr5U5Hi0BtgaEWmdNdDg7XVwBhI4fsyhpTwWQ7gA6BjE\u002FM4h\u002FZLwiHs0aIHRwFAp\u002FWxI4DubxN+kqXAoGa1x29++rEi4PXTtyvLfCzQbIUzVQQAYTcHBIKoqM4glIgpHeUvgtfJ7eUrtbNEuH16urbyihB9RYzXxUKhICS21h\u002F9OgRtRKLsvu6EMmJpm\u002FVJ3FyPZ2+kCUdiA1nzGx7QHC0JSecdAtyA7NDWF1e\u002Fp0uVlMrYGdohCDde8gwUlTRXiFAbZ4mTwc1UI2WdACduUh2bykR6tI3s+lucEYIJgKyFL+gL9IgyHeA0hWFsUEFyZsFcoZ6hGFHcgJmKJoFgqlAnEpDsvoLhsYIktO4bNIIufpGKm0W5LFUI0hOwO4bmDpCHgE+gUM3OySZVsimv9jJrl7yrzo1hes13E+1RJ5Uk9AhCJaO+BclpouQWyCbqoyrL5tESF5Sil4tkjDQN4\u002FF\u002F3NVE2Z+CutrEdMbL5r6KwkAlfXDbmbbhjs+hT9xhGICJHfvuNjzsBsOh62ib8122w4YbTUHAHXVPxVOX93RT0leHuygcESOLie6nk7+S2ni6EoggIb\u002FScokEEKB1WHJUu90O8OZlHq3262bHDScWSmNMAmBN72URLACVSMsLy9\u002FYGuwSoRVK1Ax
ghWoGsEKVI1QZkJgyyQQrEDVCFagaoTV5VUrUCmCFagawQpUjWDHgaoRrEDVCFagagQrUDWCFagawQpUjWAXyapGsAJVI1iByhGWl5dtRVSNYJdoKi6\u002FCjAAtUPuhnb2u1cAAAAASUVORK5CYII=",dimensions:"data:image\u002Fjpeg;base64,\u002F9j\u002F4Qr6RXhpZgAATU0AKgAAAAgADAEAAAMAAAABEnUAAAEBAAMAAAABEnYAAAECAAMAAAADAAAAngEGAAMAAAABAAIAAAESAAMAAAABAAEAAAEVAAMAAAABAAMAAAEaAAUAAAABAAAApAEbAAUAAAABAAAArAEoAAMAAAABAAIAAAExAAIAAAAfAAAAtAEyAAIAAAAUAAAA04dpAAQAAAABAAAA6AAAASAACAAIAAgAD0JAAAAnEAAPQkAAACcQQWRvYmUgUGhvdG9zaG9wIDIyLjQgKFdpbmRvd3MpADIwMjQ6MDE6MjQgMTk6MjU6MjEAAAAEkAAABwAAAAQwMjMxoAEAAwAAAAH\u002F\u002FwAAoAIABAAAAAEAAABkoAMABAAAAAEAAABkAAAAAAAAAAYBAwADAAAAAQAGAAABGgAFAAAAAQAAAW4BGwAFAAAAAQAAAXYBKAADAAAAAQACAAACAQAEAAAAAQAAAX4CAgAEAAAAAQAACXQAAAAAAAAASAAAAAEAAABIAAAAAf\u002FY\u002F+0ADEFkb2JlX0NNAAL\u002F7gAOQWRvYmUAZIAAAAAB\u002F9sAhAAMCAgICQgMCQkMEQsKCxEVDwwMDxUYExMVExMYEQwMDAwMDBEMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMAQ0LCw0ODRAODhAUDg4OFBQODg4OFBEMDAwMDBERDAwMDAwMEQwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAz\u002FwAARCABIAEgDASIAAhEBAxEB\u002F90ABAAF\u002F8QBPwAAAQUBAQEBAQEAAAAAAAAAAwABAgQFBgcICQoLAQABBQEBAQEBAQAAAAAAAAABAAIDBAUGBwgJCgsQAAEEAQMCBAIFBwYIBQMMMwEAAhEDBCESMQVBUWETInGBMgYUkaGxQiMkFVLBYjM0coLRQwclklPw4fFjczUWorKDJkSTVGRFwqN0NhfSVeJl8rOEw9N14\u002FNGJ5SkhbSVxNTk9KW1xdXl9VZmdoaWprbG1ub2N0dXZ3eHl6e3x9fn9xEAAgIBAgQEAwQFBgcHBgU1AQACEQMhMRIEQVFhcSITBTKBkRShsUIjwVLR8DMkYuFygpJDUxVjczTxJQYWorKDByY1wtJEk1SjF2RFVTZ0ZeLys4TD03Xj80aUpIW0lcTU5PSltcXV5fVWZnaGlqa2xtbm9ic3R1dnd4eXp7fH\u002F9oADAMBAAIRAxEAPwD1VJJJJSLKysbDofk5VraKK9X2PIa0Sdo1P7zlmO+t\u002FwBWG\u002FS6njj4vCzOrVv+sfXGdKrcW4HT5sybW\u002F6UgsZtnczfW79G3\u002F0L\u002FwBGubxeq9FwsL7Fn9Gx+qdSqsdWy5oqFdrA7ZXY7JsDnv8Af7N\u002FpP8A0fp2K3DlQY2eIz0kYR4Rwxn8t8TWPM+uhQhZiJn9KUPm4XtD9dfqkOerYv8A24E3\u002FPj6o\u002F8Alti\u002F9uBcn\u002FzQ+svV5a\u002FpvSPq9jOJBNdDLsrafCxm9n\u002Fbd2K9aeD\u002FAIq+hUVO+2XX517mbQ57vTra7tbXRRs+j\u002Fw1t6jnDFH9InyqX4tiJJ30ewxsnHy6GZOLa26i0bq7ayHNcD3a5qKvN\u002F8AFl1qzEuf0DMJaLi6zHBn23Nn7XR\u002FJbdsfkM\u002F4T7R\u002FpF6QopR4TTJlx
yxy4ZdrHkpJJJNWP8A\u002F9D1VUOtdQOBhOfXrkWeygfyiPpx\u002FwAG33q+uZy2v6p1t2NbYKGVE1MB+lAh9npB3t9e\u002FwCl\u002FwAT7\u002FegZxiY31NV3Y8xkIER+aXpDUxMLIy6HdJwH7KXu3dTzh+eT9LHqf8ASexrPZ\u002Fw\u002FwDOfo6LP1knQuhdNzKOq9OzaBdjV5ZbU130mhg9Jj2WM2vrs2N+nWupx8enGpbRQwMrZw0flP7znfvLL+r7QLupeeU\u002F8qsy5skSiTwylRjXzGYPFKVtaHK+3PF+l80Zfuxhw+mEf5et5yzo31v+qJN31etPWekM1f0q8\u002FpWNHuf9kc3\u002FoNo\u002Ff8A6FkvW99X\u002Frj0nrkUsLsTPGlmDf7bA4fzjWfm3bNrvofpP9LVUt1Y\u002FXPqr0jrY35Nfp5TY2ZdXttERtl3+E4\u002Fwv0P8HsUU8hlqQDLv8t+boYhivhyGUYnaUfVwf4P6UXieo9Hud1vrmLhyM\u002FDtZ1fp72D3a7bMqpmnv8AddV6LP8AT0f8I9d39XetVdb6TTnsgPcNt9Y\u002FMtb\u002FADjOXe38+v8A4J9a4hmR1Do31qosuyD1u2it2O443uufXD9tF7fd+tU2fpnbrLrPT\u002FnLVc+rOe7F+tluNj4t2JidUBsdh3t2Oqe1rrvWrZH8x\u002FOVs\u002Fm\u002Fps\u002F0Nare+DIA9+F1+Y5OU8F6S9vFHPDINOKMfTmhwy4Z\u002Foe9H0vfJJJKZxX\u002F0fVVg\u002FWTA+jn1jiG37dDof0Nwj8+t\u002Ft\u002F9VreUbGMsY6t43MeC1zT3B0IUXMYvdxShdEj0y\u002Fdl+iUEA7uV0vqr97cLOI9UgGi\u002FwDNtafoO7e93+v6X9GpdEbttz\u002FPJefxKqU4VfqWdHyifaTZh3fnAH3GP60e9v7\u002FAKqFj5OXRVaKSS19h9TKaC7Ucls\u002FvfT9yyfvk8UsMswMo4zkGnqyicY8MsWQf1P87\u002Fm2WOPi0G\u002Fi7mZn42Gwuudr2Y3Vx+SxsjG67132PsPTOnO5ayfWsb3l3td\u002F57\u002F4vIV7p46YCLBaLbyZL7dHT\u002FJD1pz3V7DKfMjjnkiMfTFhlx\u002F+HZo\u002F9CDKJjAfRC8n+cyD5f8AZ45f9KTxWNidN6P1\u002FJvpr24fRsUAQZc++2Gy5x\u002Fwt3qPp\u002FcWl9UenXu+0dfz9c3qR3N59lPLGs3fmP2s2f8AAV46zcTEd1jPfQ4Rj33HLyiOSwbm01T\u002FAG9jP+uWf4JdqAGgNaIA0AHACbyR928gFY4ylweMj\u002F3kPS2+ezmEfb4jLLkjCOSR1l7UPVw\u002F9VzceWa6SSSvuW\u002F\u002F0vVUklyGPbgspqvxsl37Zsz7WNrrtc99kZVrLqcjH3WN+zMxd\u002Fq76v1KpnrVej6NaSnoupYDstrH0uDL6j7HGRIP0m7m+5V8fE6vj1Cqp9DWN4GpOvj7VmN6j1N1Afg5FNNNTsOllbq3Wbvtb2VPtse+11v6JtzH41e\u002F\u002FwAMb\u002FU\u002FQhzvrB1TFJxftFQsxheTkvFTG2mlzNjMj7Tfi1VMZVaz7d9md6v6T1aPstf6NVcnI4p5TmBnjySHCZY5cFrxMgVQI8Q7F3Ts+9pFjcdznfnQQf8AOa1V29I6xU0tpvYxpBBZvdtg6H2Orc1Bf1XrDQ\u002FLORjNrGRdiCh7HCthYyw1W3ZO71f0WRV+sWbfR+yf4Cu39IqlXXss1NufdW6+ltrLL7WgsqDndO3Xu+w5DsTKxcarLflWWfo\u002F0Pp13WYf6zaov9FYOLj4snH+
9x+r\u002FGZY83kAoCNdiNHoekdMb0\u002FGLCQ66w7rXDjTRrGz7tlbVeXN09R6lkZR6bj9QptaLS057aw58Cr1zQ1od9ldk0v9L1bdnpfZ766\u002FS+0fpEPB+sGdb1HDryrq2syA2s1UNa+bC2z1PVrN327G9Syr1sa3078b7L\u002FSPf8ApVcx4444RhAVGIoBhnOU5GUjcpal6hJJJPWv\u002F9P1VQbVW17ntY1r3\u002FScAAT\u002FAFivlhJJT9VKD6qrNvqMa\u002FY4PbuAMOH0Xtn85q+WEklP1Uo1111NDK2hjBMNaABqdztB\u002FKXyukkp+p66q6mCupja2N+ixoAA+DWp\u002FTr9QW7R6gG0PgbtpM7d37ui+V0klP1UkvlVJJT\u002FAP\u002FZ\u002F+0SyFBob3Rvc2hvcCAzLjAAOEJJTQQEAAAAAAAHHAIAAAIAAAA4QklNBCUAAAAAABDo8VzzL8EYoaJ7Z63FZNW6OEJJTQQ6AAAAAADXAAAAEAAAAAEAAAAAAAtwcmludE91dHB1dAAAAAUAAAAAUHN0U2Jvb2wBAAAAAEludGVlbnVtAAAAAEludGUAAAAASW1nIAAAAA9wcmludFNpeHRlZW5CaXRib29sAAAAAAtwcmludGVyTmFtZVRFWFQAAAABAAAAAAAPcHJpbnRQcm9vZlNldHVwT2JqYwAAAAVoIWg3i75\u002FbgAAAAAACnByb29mU2V0dXAAAAABAAAAAEJsdG5lbnVtAAAADGJ1aWx0aW5Qcm9vZgAAAAlwcm9vZkNNWUsAOEJJTQQ7AAAAAAItAAAAEAAAAAEAAAAAABJwcmludE91dHB1dE9wdGlvbnMAAAAXAAAAAENwdG5ib29sAAAAAABDbGJyYm9vbAAAAAAAUmdzTWJvb2wAAAAAAENybkNib29sAAAAAABDbnRDYm9vbAAAAAAATGJsc2Jvb2wAAAAAAE5ndHZib29sAAAAAABFbWxEYm9vbAAAAAAASW50cmJvb2wAAAAAAEJja2dPYmpjAAAAAQAAAAAAAFJHQkMAAAADAAAAAFJkICBkb3ViQG\u002FgAAAAAAAAAAAAR3JuIGRvdWJAb+AAAAAAAAAAAABCbCAgZG91YkBv4AAAAAAAAAAAAEJyZFRVbnRGI1JsdAAAAAAAAAAAAAAAAEJsZCBVbnRGI1JsdAAAAAAAAAAAAAAAAFJzbHRVbnRGI1B4bEBZAAAAAAAAAAAACnZlY3RvckRhdGFib29sAQAAAABQZ1BzZW51bQAAAABQZ1BzAAAAAFBnUEMAAAAATGVmdFVudEYjUmx0AAAAAAAAAAAAAAAAVG9wIFVudEYjUmx0AAAAAAAAAAAAAAAAU2NsIFVudEYjUHJjQFkAAAAAAAAAAAAQY3JvcFdoZW5QcmludGluZ2Jvb2wAAAAADmNyb3BSZWN0Qm90dG9tbG9uZwAAAAAAAAAMY3JvcFJlY3RMZWZ0bG9uZwAAAAAAAAANY3JvcFJlY3RSaWdodGxvbmcAAAAAAAAAC2Nyb3BSZWN0VG9wbG9uZwAAAAAAOEJJTQPtAAAAAAAQAGQAAAABAAIAZAAAAAEAAjhCSU0EJgAAAAAADgAAAAAAAAAAAAA\u002FgAAAOEJJTQQNAAAAAAAEAAAAHjhCSU0EGQAAAAAABAAAAB44QklNA\u002FMAAAAAAAkAAAAAAAAAAAEAOEJJTScQAAAAAAAKAAEAAAAAAAAAAjhCSU0D9QAAAAAASAAvZmYAAQBsZmYABgAAAAAAAQAvZmYAAQChmZoABgAAAAAAAQAyAAAAAQBaAAAABgAAAAAAAQA1AAAAAQAtAAAABgAAAAAAAThCSU0D+AAAAAAAcAAA\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u0
02F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002FwPoAAAAAP\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F8D6AAAAAD\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002FA+gAAAAA\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002F\u002FwPoAAA4QklNBAgAAAAAABAAAAABAAACQAAAAkAAAAAAOEJJTQQeAAAAAAAEAAAAADhCSU0EGgAAAAADlQAAAAYAAAAAAAAAAAAAAGQAAABkAAAAMABpAG0AZwBfAHYAMwBfADAAMgA3AGQAXwAxADIAMwA0ADQAZgAwAGMALQA5ADAAMgA1AC0ANAAyAGMANQAtADgAZQBmAGEALQA4ADMAOAAzAGIAYQBmADcAYwBiADYAZwAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAZAAAAGQAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAQAAAAAAAG51bGwAAAACAAAABmJvdW5kc09iamMAAAABAAAAAAAAUmN0MQAAAAQAAAAAVG9wIGxvbmcAAAAAAAAAAExlZnRsb25nAAAAAAAAAABCdG9tbG9uZwAAAGQAAAAAUmdodGxvbmcAAABkAAAABnNsaWNlc1ZsTHMAAAABT2JqYwAAAAEAAAAAAAVzbGljZQAAABIAAAAHc2xpY2VJRGxvbmcAAAAAAAAAB2dyb3VwSURsb25nAAAAAAAAAAZvcmlnaW5lbnVtAAAADEVTbGljZU9yaWdpbgAAAA1hdXRvR2VuZXJhdGVkAAAAAFR5cGVlbnVtAAAACkVTbGljZVR5cGUAAAAASW1nIAAAAAZib3VuZHNPYmpjAAAAAQAAAAAAAFJjdDEAAAAEAAAAAFRvcCBsb25nAAAAAAAAAABMZWZ0bG9uZwAAAAAAAAAAQnRvbWxvbmcAAABkAAAAAFJnaHRsb25nAAAAZAAAAAN1cmxURVhUAAAAAQAAAAAAAG51bGxURVhUAAAAAQAAAAAAAE1zZ2VURVhUAAAAAQAAAAAABmFsdFRhZ1RFWFQAAAABAAAAAAAOY2VsbFRleHRJc0hUTUxib29sAQAAAAhjZWxsVGV4dFRFWFQAAAABAAAAAAAJaG9yekFsaWduZW51bQAAAA9FU2xpY2VIb3J6QWxpZ24AAAAHZGVmYXVsdAAAAAl2ZXJ0QWxpZ25lbnVtAAAAD0VTbGljZVZlcnRBbGlnbgAAAAdkZWZhdWx0AAAAC2JnQ29sb3JUeXBlZW51bQAAABFFU2xpY2VCR0NvbG9yVHlwZQAAAABOb25lAAAACXRvcE91dHNldGxvbmcAAAAAAAAACmxlZnRPdXRzZXRsb25nAAAAAAAAAAxib3R0b21PdXRzZXRsb25nAAAAAAAAAAtyaWdodE91dHNldGxvbmcAAAAAADhCSU0EKAAAAAAADAAAAAI\u002F8AAAAAAAADhCSU0EEQAAAAAAAQEAOEJJTQQUAAA
AAAAEAAAAAThCSU0EDAAAAAAJkAAAAAEAAABIAAAASAAAANgAADzAAAAJdAAYAAH\u002F2P\u002FtAAxBZG9iZV9DTQAC\u002F+4ADkFkb2JlAGSAAAAAAf\u002FbAIQADAgICAkIDAkJDBELCgsRFQ8MDA8VGBMTFRMTGBEMDAwMDAwRDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAENCwsNDg0QDg4QFA4ODhQUDg4ODhQRDAwMDAwREQwMDAwMDBEMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwM\u002F8AAEQgASABIAwEiAAIRAQMRAf\u002FdAAQABf\u002FEAT8AAAEFAQEBAQEBAAAAAAAAAAMAAQIEBQYHCAkKCwEAAQUBAQEBAQEAAAAAAAAAAQACAwQFBgcICQoLEAABBAEDAgQCBQcGCAUDDDMBAAIRAwQhEjEFQVFhEyJxgTIGFJGhsUIjJBVSwWIzNHKC0UMHJZJT8OHxY3M1FqKygyZEk1RkRcKjdDYX0lXiZfKzhMPTdePzRieUpIW0lcTU5PSltcXV5fVWZnaGlqa2xtbm9jdHV2d3h5ent8fX5\u002FcRAAICAQIEBAMEBQYHBwYFNQEAAhEDITESBEFRYXEiEwUygZEUobFCI8FS0fAzJGLhcoKSQ1MVY3M08SUGFqKygwcmNcLSRJNUoxdkRVU2dGXi8rOEw9N14\u002FNGlKSFtJXE1OT0pbXF1eX1VmZ2hpamtsbW5vYnN0dXZ3eHl6e3x\u002F\u002FaAAwDAQACEQMRAD8A9VSSSSUiysrGw6H5OVa2iivV9jyGtEnaNT+85Zjvrf8AVhv0up44+Lwszq1b\u002FrH1xnSq3FuB0+bMm1v+lILGbZ3M31u\u002FRt\u002F9C\u002F8ARrm8XqvRcLC+xZ\u002FRsfqnUqrHVsuaKhXawO2V2OybA57\u002FAH+zf6T\u002FANH6ditw5UGNniM9JGEeEcMZ\u002FLfE1jzProUIWYiZ\u002FSlD5uF7Q\u002FXX6pDnq2L\u002FANuBN\u002Fz4+qP\u002FAJbYv\u002FbgXJ\u002F80PrL1eWv6b0j6vYziQTXQy7K2nwsZvZ\u002F23divWng\u002FwCKvoVFTvtl1+de5m0Oe7062u7W10UbPo\u002F8Nbeo5wxR\u002FSJ8ql+LYiSd9HsMbJx8uhmTi2tuotG6u2shzXA92uairzf\u002FABZdasxLn9AzCWi4usxwZ9tzZ+10fyW3bH5DP+E+0f6RekKKUeE0yZccscuGXax5KSSSTVj\u002FAP\u002FQ9VVDrXUDgYTn165FnsoH8oj6cf8ABt96vrmctr+qdbdjW2ChlRNTAfpQIfZ6Qd7fXv8Apf8AE+\u002F3oGcYmN9TVd2PMZCBEfml6Q1MTCyMuh3ScB+yl7t3U84fnk\u002FSx6n\u002FAEnsaz2f8P8Azn6Oiz9ZJ0LoXTcyjqvTs2gXY1eWW1Nd9JoYPSY9ljNr67Njfp1rqcfHpxqW0UMDK2cNH5T+8537yy\u002Fq+0C7qXnlP\u002FKrMubJEok8MpUY18xmDxSlbWhyvtzxfpfNGX7sYcPphH+Xrecs6N9b\u002FqiTd9XrT1npDNX9KvP6VjR7n\u002FZHN\u002F6DaP3\u002FAOhZL1vfV\u002F649J65FLC7EzxpZg3+2wOH841n5t2za76H6T\u002FS1VLdWP1z6q9I62N+TX6eU2NmXV7bREbZd\u002FhOP8L9D\u002FB7FFPIZakAy7\u002FLfm6GIYr4chlGJ2lH1cH+D+lF4nqPR7ndb65i4cjPw7WdX6e9g92u2zKqZp7\u002FAHXVeiz\u002FAE9H\u002FCPXd\u002FV3rVXW+k057ID3DbfWPzLW\u002FwA4zl3
t\u002FPr\u002FAOCfWuIZkdQ6N9aqLLsg9btordjuON7rn1w\u002FbRe33frVNn6Z26y6z0\u002F5y1XPqznuxfrZbjY+LdiYnVAbHYd7djqnta671q2R\u002FMfzlbP5v6bP9DWq3vgyAPfhdfmOTlPBekvbxRzwyDTijH05ocMuGf6HvR9L3ySSSmcV\u002F9H1VYP1kwPo59Y4ht+3Q6H9DcI\u002FPrf7f\u002FVa3lGxjLGOreNzHgtc09wdCFFzGL3cUoXRI9Mv3ZfolBAO7ldL6q\u002Fe3CziPVIBov8AzbWn6Du3vd\u002Fr+l\u002FRqXRG7bc\u002FzyXn8SqlOFX6lnR8on2k2Yd35wB9xj+tHvb+\u002FwCqhY+Tl0VWikktfYfUymgu1HJbP730\u002Fcsn75PFLDLMDKOM5Bp6sonGPDLFkH9T\u002FO\u002F5tljj4tBv4u5mZ+NhsLrna9mN1cfksbIxuu9d9j7D0zpzuWsn1rG95d7Xf+e\u002F+LyFe6eOmAiwWi28mS+3R0\u002FyQ9ac91ewynzI455IjH0xYZcf\u002Fh2aP\u002FQgyiYwH0QvJ\u002FnMg+X\u002FAGeOX\u002FSk8VjYnTej9fyb6a9uH0bFAEGXPvthsucf8Ld6j6f3FpfVHp17vtHX8\u002FXN6kdzefZTyxrN35j9rNn\u002FAAFeOs3ExHdYz30OEY99xy8ojksG5tNU\u002FwBvYz\u002Frln+CXagBoDWiANABwAm8kfdvIBWOMpcHjI\u002F95D0tvns5hH2+Iyy5IwjkkdZe1D1cP\u002FVc3Hlmukkkr7lv\u002F9L1VJJchj24LKar8bJd+2bM+1ja67XPfZGVay6nIx91jfszMXf6u+r9SqZ61Xo+jWkp6LqWA7Lax9Lgy+o+xxkSD9Ju5vuVfHxOr49QqqfQ1jeBqTr4+1Zjeo9TdQH4ORTTTU7DpZW6t1m77W9lT7bHvtdb+ibcx+NXv\u002F8ADG\u002F1P0Ic76wdUxScX7RULMYXk5LxUxtppczYzI+034tVTGVWs+3fZner+k9Wj7LX+jVXJyOKeU5gZ48khwmWOXBa8TIFUCPEOxd07PvaRY3Hc5350EH\u002FADmtVdvSOsVNLab2MaQQWb3bYOh9jq3NQX9V6w0PyzkYzaxkXYgoexwrYWMsNVt2Tu9X9FkVfrFm30fsn+Art\u002FSKpV17LNTbn3Vuvpbayy+1oLKg53Tt17vsOQ7EysXGqy35Vln6P9D6dd1mH+s2qL\u002FRWDi4+LJx\u002Fvcfq\u002FxmWPN5AKAjXYjR6HpHTG9PxiwkOusO61w400axs+7ZW1XlzdPUepZGUem4\u002FUKbWi0tOe2sOfAq9c0NaHfZXZNL\u002FS9W3Z6X2e+uv0vtH6RDwfrBnW9Rw68q6trMgNrNVDWvmwts9T1azd9uxvUsq9bGt9O\u002FG+y\u002F0j3\u002FAKVXMeOOOEYQFRiKAYZzlORlI3KWpeoSSST1r\u002F\u002FT9VUG1Vte57WNa9\u002F0nAAE\u002FwBYr5YSSU\u002FVSg+qqzb6jGv2OD27gDDh9F7Z\u002FOavlhJJT9VKNdddTQytoYwTDWgAanc7Qfyl8rpJKfqeuqupgrqY2tjfosaAAPg1qf06\u002FUFu0eoBtD4G7aTO3d+7ovldJJT9VJL5VSSU\u002FwD\u002F2ThCSU0EIQAAAAAAVwAAAAEBAAAADwBBAGQAbwBiAGUAIABQAGgAbwB0AG8AcwBoAG8AcAAAABQAQQBkAG8AYgBlACAAUABoAG8AdABvAHMAaABvAHAAIAAyADAAMgAxAAAAAQA4QklNBAYAAAAAAAcACAEBAAEBAP\u002FhDNdodHRwOi8vbnMuYWRvYmUuY29
tL3hhcC8xLjAvADw\u002FeHBhY2tldCBiZWdpbj0i77u\u002FIiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDcuMC1jMDAwIDc5LjIxN2JjYTYsIDIwMjEvMDYvMTQtMTg6Mjg6MTEgICAgICAgICI+IDxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI+IDxyZGY6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiIHhtbG5zOnhtcE1NPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvbW0vIiB4bWxuczpzdEV2dD0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL3NUeXBlL1Jlc291cmNlRXZlbnQjIiB4bWxuczpkYz0iaHR0cDovL3B1cmwub3JnL2RjL2VsZW1lbnRzLzEuMS8iIHhtbG5zOnBob3Rvc2hvcD0iaHR0cDovL25zLmFkb2JlLmNvbS9waG90b3Nob3AvMS4wLyIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bXBNTTpEb2N1bWVudElEPSJCRjgyRkJFNzZFMjA0MURDMUQ3NDgwNzdDRjg3Qzc4NyIgeG1wTU06SW5zdGFuY2VJRD0ieG1wLmlpZDo5ODQ2N2YwYS0yZjBhLTZiNDMtODU4YS05NGRiNTJlODIyNDYiIHhtcE1NOk9yaWdpbmFsRG9jdW1lbnRJRD0iQkY4MkZCRTc2RTIwNDFEQzFENzQ4MDc3Q0Y4N0M3ODciIGRjOmZvcm1hdD0iaW1hZ2UvanBlZyIgcGhvdG9zaG9wOkNvbG9yTW9kZT0iMyIgcGhvdG9zaG9wOklDQ1Byb2ZpbGU9IiIgeG1wOkNyZWF0ZURhdGU9IjIwMjQtMDEtMjRUMTk6MjQ6NTIrMDg6MDAiIHhtcDpNb2RpZnlEYXRlPSIyMDI0LTAxLTI0VDE5OjI1OjIxKzA4OjAwIiB4bXA6TWV0YWRhdGFEYXRlPSIyMDI0LTAxLTI0VDE5OjI1OjIxKzA4OjAwIj4gPHhtcE1NOkhpc3Rvcnk+IDxyZGY6U2VxPiA8cmRmOmxpIHN0RXZ0OmFjdGlvbj0ic2F2ZWQiIHN0RXZ0Omluc3RhbmNlSUQ9InhtcC5paWQ6OTg0NjdmMGEtMmYwYS02YjQzLTg1OGEtOTRkYjUyZTgyMjQ2IiBzdEV2dDp3aGVuPSIyMDI0LTAxLTI0VDE5OjI1OjIxKzA4OjAwIiBzdEV2dDpzb2Z0d2FyZUFnZW50PSJBZG9iZSBQaG90b3Nob3AgMjIuNCAoV2luZG93cykiIHN0RXZ0OmNoYW5nZWQ9Ii8iLz4gPC9yZGY6U2VxPiA8L3htcE1NOkhpc3Rvcnk+IDwvcmRmOkRlc2NyaXB0aW9uPiA8L3JkZjpSREY+IDwveDp4bXBtZXRhPiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgI
CAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgI
CAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDw\u002FeHBhY2tldCBlbmQ9InciPz7\u002F7gAhQWRvYmUAZEAAAAABAwAQAwIDBgAAAAAAAAAAAAAAAP\u002FbAIQAAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQICAgICAgICAgICAwMDAwMDAwMDAwEBAQEBAQEBAQEBAgIBAgIDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMD\u002F8IAEQgAZABkAwERAAIRAQMRAf\u002FEAOAAAAICAgMBAQAAAAAAAAAAAAAJBwgGCgIEBQMBAQEAAgMBAQEBAAAAAAAAAAAABwgFBgkEAQMCEAABBAICAwABBAMAAAAAAAAGBAUHCAIDAQkAECBAUDETCjARFhEAAQUBAAEDAwIFAQQLAAAABAECAwUGBxEAEgghExQVCRAgMSIWFzBAMiVBUXGBkbFCIyTVJhIAAgECBAMDCAQHCg8AAAAAAQIDEQQhEgUGABMHMUEiIFFhcTIjFAgQgUJSkWKSM9PUFUBQwdFyU4MkhBbwobHh8YKiwtJDYzR0lKT\u002F2gAMAwEBAhEDEQAAAN\u002FgAAAAAAAAAAT3YWuaI7DQtYCKrLo32DY4K1vbmh6HOu+zBML\u002FAKAABiuTwmv7crn6s+QfLSD7N22LSa0ja47yujHM17Hhx7F75NZrGAACepvpPVWboDxH37azGsF+tZiR8nsiQfL9CPHZFXmFs9uuOUGW\u002FwBYoAh\u002FH4FC3ynr\u002Fc7bxZUp1VcDEFzuIhXJWyTdEfQK+n7wt6WO\u002FfZ3lTlaAUzi\u002FUqaQBhXLXPkdUVKtNa9dXcgwXCbHra1L6u0y+T36P7+XdAuTwOAAQVxCxdjZQz+UfM2zzrPrdI+bFkqZeOdfXm3F0MjOx747u83rTb5DgAEFQD6KI859lhynciTdJm0MC6\u002Fw\u002FK1j9GRnyk6OdmQ\u002FM8PoxzsAAAFn8epHjOtW6yVuefjeAJnye2+uUlh2wD\u002FAPvdy8y7Y8GABCp8iVcB+8K1e2rs\u002FjkOXzJR17vbMUs61LkwaIAAFEipxhhxItLRmBnhHrmcjiwAiA+xkR3SFyWiPDqEpEbk+AAAAAAAAAB\u002F\u002F9oACAECAAEFAPwIDjNCYOchqIjjvRxJ0ILckrSNO\u002FD6iAGbx+HWAhYfpmaHB\u002FdpbcEcWx0xDerXEpby4x2yEUqHJDllllnkAOuvFgKkunS5\u002FMPq0IurZiUWFFTwVOJ1XQFlowAsnhyjmVeH4Wc2Da0umTb4vef40+GeOzD26OSZpRjxqrW5rV6tx3NElakcW+ft4penhQmfyxWy8sTqkJBwFKv5l\u002FuXWxyWh8VWKHUO\u002FjnjngnmXY1W89EBIwijYX2M\u002FwCr0RsDoTcPhZDkTmfxa+PMolmKEbKmFcjE4W6FPZqqVJkKaZuxwJZneOK1TjMrpZjkWh+DDQvWDUXxqEpwAP8Aiy0KIZ4ietqYTs1F29Wb07tC3msv9hRdAlU4aro3edhMgI0hzUQXeZPLvq7IW\u002F
14mqB5XgI+m+TxPq3NvEcrm1fMYv7VSLXwavSq5VphEVZggY+ikWHDZg5oRUnnzKgFQ8vMOv8AqFr510Rqhq8jOAogh5T+l\u002F\u002FaAAgBAwABBQD8Cf5VVBSGKM5ROlSIWPEeKfYr1cNeOW3ahANZQNft9EpE0CQ\u002FX3Fym+T3ExxWy7HbnrMdyJnb0OHlVU6Z1GrNxXkDlPzaZ4Wme1a0GslaBkdaYzte9hTA+bmBpc028kio4GGujbjozkoLF0UxAz6yOo09eyZ81DrK0Ry8HjwKibEGNBMF4qrb+oZtaSx7nD0QwBKJzY3EvgG4t\u002F4SQOjL7MmXl5ahBfy2aPNw1ioLvQoIE5y914pwPRK82HvdIENyxfCdFIhXb41sujLambduadMg\u002FwBsOGGezOOa9Ln3BdZ2P4XZKlvZlLlkojZWsrn2epcc5tk34S7uNG9Zp3NW1A3pSZgSJWWMkJdIpUac+UhZ96EXtPIDdH0ffQnnoe0DozkCRmGls0NeSEaYjjh9qgzruAjhLAEKkxG7FxB9JlKhHv1n5fq51SmfafNc1yhq8wn2XdfhbJp4dJv0v\u002F\u002FaAAgBAQABBQD8DtwuqZV3jyp4b2ZXfdznrE7vRtstPYy4db9cBzd3r26eKNdkV\u002F8Arj7COOeMuPk4NB2OQ6mUcPFxbfzzKRJj2GRiHxV2CzPW3o46xq6bEDegakP9kqv2LNfnoTvOrsnWj57FpWcJAJXRnObQs4fXQUrj2yXv6o6k39ZiLnuu\u002Fr5unX93AVB7CGj+zFEK0pqZLJKQUYs6AnQnKAR7niXWmD4vh6DpKtNIMSxECwmGnzPpUdrPmeGG3C8HQVWOxCu0NgexquVd+vAUii8HTj0L2rJgF+93niV3lGFKT25YQBJxzxzwbcc8dnfqR5Oj2IBK8naab2YFKOdN8IWnrX020+bZQvL8WpgBkiCboBn40rcSFerQq7Fd27Un1TZchsE8W7r8mK1Rd2UiEWVnoZZsrIo3pbTOsjBUmv3xYuG0M4RhXtuGZ\u002FjNtSlFa5yayCRbWkEbwuAxdo87aCNG9SB13wm9TVNv1aEMdIal2IzGLiqUigbpaQ+ORqVRZqGOwkkZ8pbwcLkWnAgccjQL+JlsZBtek41ZWvxlGj4xMRayYVVr\u002Fr8wrBBGHm6scFb+DKv1Pw3yOYHiKJV3zOvOvi89vzceB7HzqssLsELLn06UR5lgwuJBIHNYpPNeykNlOSI1WaTaT2+nBPrn4dhVnK5Mje1nuXIAhOe0otAsIg4EsjSO3BndwUJIHIDp1VeMWU0jKOJITx9VmuUVM6SpdZkEt7o3j5QMNNWK4MMsf5\u002F\u002F2gAIAQICBj8A\u002FcGobk3LbB9qaaPErnLHNOVLBHII91CgM02KjKEVjlc8aRdav0rtbixvQ3LmtrW2MYZQGyMXZcWRlkQiodDUE0NEtrHpI73LHBVs7RmPoAWQk\u002FUOEkTpL8DbsKhrmC3iY180QZpT+SPXw37R0zS4G+6Y0z\u002FUi1f\u002FAGeP2ptuyjjmCs0fLTJzMrEMjKO80OWuIIXEAmvladomlQGXUbqZYo1HezkAVpWgHax7FUEnAcaD0h2o5l1nUIcspQVd4nak0mUVYtezAxRilfh4yo7Rxo23utS24s7Tlls8jJyI1cLapPKlCksefkvkagiZY5GqGpz+mnTayubRoi7vAyqyLTwuEiBuLqMjxGRJio+0KY8Spd6u1vAxxjgHJX1Er429Od29PDO7EsTUk4kn08XVvM35iYkfyXAOP1huJbm2p8PMc1BSgY4mlOwE4js7wBQeVc7uvULXSIyQKBVwCKSMg\u002FnHB5MeGGZz2Y8at1a6hyC633dE\u002FDwIQwtlChY7a1rVRIiZUluiMkK1WIM7FpN4bp1VEW5nupFVEFEiiS6thHGveQg
JGZiXY1LEnhLfTr3n6Lmq1rNVose0x4hoX\u002FHiZTXFgww4N0sB03dT9vsh3Y+kBY7oekiOegxqMeCLhRJak+GROw+sHFTiK1FK4Ani7GekciUPrBwr+E8RiQloPsmtcB3d+OX2QB3EVx4V0aqEVB9HkSXl04CCgHpY4AD1\u002FwCSp7uMliapHTMR7K+YV+8fMMaYmg4595OzyUoK9gA7AB3D0D19vGvdPXlGecllXvBM0MlR9UZ+io4MDXzMQKAuS3cRQmtaY9ta4AVoKcTWur2jRxygiOT7BaoplcDKe0DI2STzBhjxdwSXaRzpIyAlgCrDxxtj6x66EcXm19RIW\u002FjLGMVB9k+8jBqa5T41IwKN+L5F1f6QGe+sD8Ryx2yoisJEA73yFmTzsoUe1xabZ3Tdw2+mTuBb3ZIVEdzglyewK5PgnOCkhZSFo4BBw42l0oUuUvYIyQPZFbO6mNcfxF7u8en6ZtZ3Hq0FlpiYGSVgoJNaKoOLu1PCiBnY4KpOHGr6T0\u002F0R59IhhZ7i5liz5YVqGk5ZBWGMHLSSarHMAI0kpxe7w17W5rOyhllUFRGVEMCgySMzqx9rN2GgCYivE2r6c0v92dMdmR3pnkLqyQo2UBMxQtJJkwUAL9oHybG31LUriy6RbjleWKeGLmtZGQ5bqIJ\u002FwA1baV1uBAAC9tKIlbOKrb9CfmIuebtBljOmaurmeFLaX\u002Ft5o7mn9a0mdaGCX87Z+KGRVWNobXpdLbXCS2stlC6MhDI6NpV0VZWFQysKEEEgg1B4uL29uI4bOGNnkkdgiIiAszuzEKqqoLMxIAAJJAHDbB6BaUd376klEKzxrI+npMxKqsIipNqMhYAKtuY4HDq0VzKQYjadQfmY3bdWkLUaLT0ZPiVQ0bJkQfDWEbUjYxojSsc4mggm95x\u002FdfaelxWNtqd\u002Fb2qpEDncLWZ3kkYtJIxWBY2kldmowBbs46VdENto0m6tVs7ee6jjxcfFvzo7aoK+8lmkq4JVuUi1BSXjStuxsr3arzLiQCgluHoZX89K0RK4iNUBxHk6\u002Fsx1RdeRfidPlYD3V5ErGMFj7McwLW8pxyxys4BZFpuH5V+skLW3UTaZmOl3TBfjLWEPypIlBIZ\u002Fg5wIri2cqktu8KKEe2SaLad91Wgm1ibbULm1WO5zLcWT211BZ8iSQSGCDNJ+baPNCodeVUBTc7c3l1r2\u002Fs7p5FcoBpZukheatWT4awaZJ9UnQopeS4mWKF2LQmEsICIth7dEm4nQrNqd1lmv5q+0BLlVYI2oM0NskMT5VaRXkBc8dLdnXNZbC0t3vbmJWoXW4mEagkYqwW1ky94Ele8cbo+Y3esVWnuZI7BCAVDU5bOnhB5drBltYO1RV1weAeXsX5uem9ows5L+NdTiQlUa4ClXEmUEJDqVoJIZmphKsjlubOnHVHrv82Gv2We9Z0sdMubO61GGk4ZBmWK0nj5VhaJHbW3MozFhLQSQK5ubza3VrUNr6s9SDaWesT2xY97211YStlH83bz26juAGHHO6BfOXFuPa1uQFspbbUrchagELpmrWk1kgIwZre4aQAYEEKeLe06q9PLW8jwDXOnSNbyADvNvMZ45XPflmtkr2ADABNjxXC6LqMltDAZowGtrSC3jE8syo8igIUnmYLJRiciEsy10PaO37flaNp9skMSk1bKgoWdvtSO1Xkc4u7MxxJ8vU9q7t0a31Db17Hkmt5lDxutQwqD2Mjqro6kPG6q6MrqrDHo\u002FCf7fqv69x4ujkH\u002FAL+q\u002Fr3FU6OQA\u002F8An6r+vce76Rwj+3ap+vcX97052Pb6bfXKZJJeZPPKUqGMay3Ms0kcbMqs0cbKjsiMykqpH71\u002F\u002F9oACAEDAgY\u002FAP3Bpe19uXJXdupmqlRmeG3DBC6Ch95NIRBDgTmLsozIC
NUitOo9ylzZMolgubi45lCSCygK2CuDG4NCjihAqOA+obuVox2kzTU\u002FCyjgK+t85x9xmYflEAf4+I+fM3JqKkk0p\u002FD9WPEt\u002FoUGa+SNnjCrQyGMnOhH3iActftBQCATXytZ3Pr12sGjWFtJPM5I8KRqWNKkVY0yovazEKMSON1dd95R8jbumzloVkOWOOWNKwQ5jRcmnW5EszVp8TKHbsPGtbh6ISXDzXZkC5I1b4iQqWungieueKTJzUzjM0gaSNaFKht17quFv1fJklBNWB8SlmPLhYHDIYwT3GuHCiGLMR3scx\u002Fi\u002FAB9GuNcMA2m3eYk9ixSR58zeZQY5CTSlAa9nFvujSrem1tbrKhUeCK5oGmiFMAsgYTw9mZHYKMsflaZ0i0O6jSB5Y5b13bLEGHjgjmY0AhhAN3PUmoSJaVNDt\u002F5feiKNZ9KtNC\u002FGXsqshvXLl5b6\u002FpR+TLLnktrAHm3LZXnKRIFh2X050eeSS2gsIXaSU1knmksrtpZm+ypdkrkQKiCgVQBib2a0EWqU\u002FPRgK5p2B8KSL6HBoPZI4tNKEvNjd1RTjTE0GBqV9VSvEWvX+hzPoDgf1hFLRqTTCWmMRxAq4CsTRGY8a\u002FtS7ANrqWlu2U4hmtXEhUjvrA09R5q93HVj5a90XAXem17yaC1mlOZ+RHM40y8JoCRDmWzuAuJtpImZqtXjVdva5ZPbazZXEkE8TijRyxMUdT6mBFRge0Eg+Re6o65pUWkaffkbBF9RPb5lBPdxdGSQ+9lMl1cuKqpc5moD7ch7I4+wAKWooxh0bQLMRWqnMx7XlkPtSSv2vI3eTgBRVCqFUbT3eqESQwRqSOwgWl2lD+X9Nvou8tP\u002Fb2zivLZHYLdxRnwkRTMGWVApNYLlZEagRXiXHja3Wz5eNzRWOvaXcc7UNJCFM1vLG8NxHJp+fmWTMkhVLi2a409mUjKpOYaJ1V2roF5eaZqWn2l1cxxRSOlzCyfBahbNlBXM4gzjNgkrRS0JA42v80WwImm21qsFtHqDZCje+QfAX0qModHmjHwl1nAKXEUQIzS1P0qI6maB+YB96gIYevKSR6RTv4TSrmNRaVqjgUoWNSH89T2Oce5sKH6Itw8sZokUV\u002F1HX\u002Fe+m025s\u002FQrnUdcnPghgQuxGFWamCItavI5VEGLMBjxtHqF1p39yN4y3kcem6dYXht1a8auSB7yNhNeSnHmWtgOSFDCa6aIsp0TpX072Rpus6ne2lrK4la5SQ3l9Iwgt4UtpoUFY+SaMpYtKAGCinC9Pt0x2MnVDeEEMNxFb52trdLeWK4v54eYzSmNbiNLS2aRyzsJpakR+TIUi8Jxp5vP\u002FHTzV83HuPFl7u8U7vq\u002FwAO4m+uyniXD8DAfw8JHGpaRiAABUknAAAYkk9g4XW9+aj+yNtxrndSUW4ZBiSzSe6tUp2vNVhT80ag8XOzuge1rWS8YUlvHVuQzDDO5alxqDjHKZmjtlNDHHImHFlvrfOvXWpalpNhcXCyTPVY5HUWsEcSCkcKB7jMkUSJGpUkKDx1o+abecwg2not\u002FdW+nSS1yL8DFyJr1a4FLS1jAhoSDdTLTxx8a\u002Fvm+Vo7ByILKAmvw9lDVbeHv8WUmSU1OaaSRvteTHIR4a4+r\u002FNxb6rbLmtZKZh3HCox9I7D569oNDcRacwi55AJK+ywZS1QO00GFDQ+fhNStNu3V\u002FqpUkShCQtO3PKAVgU17EUuwwJ+1wsWragV0xDVLaOqQL5jkr42\u002FHkLviaEDD6Ooe57aTlajezR2cEhFQhiiaUuB3hZZoWI7ylK8bW+XXZMvLgFrFJflT4uUG5scchHbLeXBe+uamprEDUMfLvNvXjDPkJQ99PR5yjUIHmoOwHjTdF2pbOXWheVXWNvDjQEspq7Es1M
MKdhpwiy7fivoB3SPCj08wkSVfwsH4WHe\u002FSd7O9cYzI8Df8A0W0iSj0cyNh68eHm2lumW2lxIiu0EqegCaIK6j1xOfTwJNyvE82mrPNMI2qs9zPK3JhjYhc3MHKjrQEKGY4KeNX3NrlyZdWvp2llbuzMa0UEmiKKKi1oqAKMB5cVzazNHcIaqwNCD\u002FowI7CMDhxWPWWB\u002FkRf8HA5W4nFP+nD+j493uyQf0Vv+i48G85R\u002FQ236Hi0st17lnu7KBy6RkIkYcimfJEqKz0JAZgWAJAIBIP71\u002F\u002FaAAgBAQEGPwD\u002FAHDMcb4Zdk1HfO0SowG5pworPQYnCxnxVRdzQBTwFxP12vvZ4qWk8wEPSeWeaKNZR2KnSs1yj9xjofP9jywgFulxHU+yddrdPNXWExgH67WC0FZbq+vrb6uJrTmSpFMGfD9uRrffH7rDSXX7uVPjM5WxKRYXmj+R\u002FfqKmrYEXws51kdkoQQ4W+f+J70T1cCs\u002FflP7nrKkuUSfKfG3t\u002FyD31WLJC5qELedRLoaTl9DHEir4\u002F5kS9VaqexPKL6gr\u002Fi53b9y7swchUME+pyPS9+BzkFs7vEUx\u002FUtTYZ\u002Fm4w6u\u002FqqWjnoiKvtXwvqP40fuS9z6H0TA2t5lOedzp+rdB\u002F1YH4\u002FadBpKW75x1nEdAJGbZJnq+W+DhvIYJJK6UAsmZIVnCjl9I5qo5rkRWuRUVFRU8oqKn0VFT+bT7zW2EdXmsjSWOgujpXMT7IFYNITMkTXvYkxUyR\u002FbhiRfdLK5rG+XORPXQ\u002Fnj3OCOrwnKbv9Wy4l3M0Wlq9LR1z5cXTKbP+OHFV8ZyczbM2VXpH+vGpO\u002F6seqdG7P8AtfybVmx3w+rErv0HO1F27oOgsaaey65bYTHX7SBL\u002FMXyUDLoJpo7p5bUeYwOBzZIGvmyH7lP7jXXuVW8GjDz+cpOpVZ1vS6W1dO9LDNz6neFR8d4PpwTfA8FZZZqAghX+YZEkRInU+jz3x2z3XtfXtQis6H346TtV0xs8bHRE1NfqUJwtGiN\u002FvidV1ISN8+WeERvgSsqwhK2tAHiEBrwBoQwQhYGJHAMIIOyOAYeGNqNYxjWta1PCJ49cx6ZWVEp0XyQ4VTZwwdkKvddbbmepKyTKsVjERSTbPO7WoHbH5971a1qf1T1P8e+m3ExXefirFX4qyfbSql1sOUxPnqsDqymkPaWXdZ79Mmz14qsV0VhXMkmd9wtE\u002FmzXxZw9oAK2Q6rvekWh5zAaQMt3g\u002FO1OgNlWKAekz4cTr608vVPtQjM8K93tXO\u002FCj4qflZ\u002FwCPOMhHdvN5ZgEVzOhEEGSGW3Q+kxxuiNSivrf7xdRmVkQu4k9kpn2h4mxjfHbm2HWwJpM1l6wwy3tpnT2Wh0lxyzo6aHRGu9344pFoTCxyDDNjFHY1rImNan1siOl5OXFdYlrJK+p7jzmOupt8O1o6wiBamMkIyg6Pm4XIxH1t6KbEkSK2B473fcRDs7Az5t\u002Ft7VZzVIDWLTXmHxWaQp6OcQPEum6n8Tj2QkefvDf5NztCJEV8cL3Ijaetwmmn5l2kuuYbY8F6cTWVW2crIEkLMw9mIYTmeo51j2SOjMpCiJWwMSQocRXfbTi\u002FaqJZodFxHuIQgxw3ujJra\u002FqNGZnobCIlio8ZQtrVUUrXJ9UkY1U+qJ6+G37tvEKZ0nD\u002FAJh4HLbXqeQzQ6AVDtnpsrTkfJTlMYySkwjTbR4U2wo1ma2OPSgGMijRsaNXI9Hwd2HpMVus5T6zK39fJ9wO3oL4CCyqz4HKiOa0gQhrla5EexfLXIjkVE\u002FjpOg2bGFEV47Qs9UuejH3uosVUeip4\u002F743e0oxUdM5F8xjskk
\u002FoxfVy2IuTwfcT3nUulWAyk1tQXbz\u002FnGRjwSKxltojGu8A13u9kMLY3TqyBie8HD4GpSuqhXOJNMnehNxf20zWIZe6CyVjJbO3OcxFfI5EaxqNjjbHExkbedWb40dKLn6NY5FRFVq\u002F4BvYvov9U+kq\u002FwfHIxskcjXMkje1Hsex6K1zHtcitc1zV8Ki\u002FRU9WHTPjr+H8V+5od\u002FkMBGPrHxck0ukgf+VBZ3WEp56ubGaKUyJipe5ietMher55YTJF8L1b4HfuMYG06nzvpuaZmuPfIY60jtT6rYUF1V6XHXdT2JtV+ldWrRLGmjlJqtBBT7SKGZFdPK1qRv1fxB65ssrlL7m+63+NwNxd29ZXWOF0IdsvTuO7MBD3tIUapK1CDvSJPcSBCUL5Rr3eujftpd1ngrNzzC72dzyEZ1hDYCRMpbSVercvpbAYkgE2sztvK6\u002Fo\u002FsK9s1RYEKjvYMjW\u002FxIMzrZyr7nVom6DqIv7v1sECuPCvK+KJXNR9glOdNML\u002FAFV08SMRP7\u002FVXx\u002Fof6VUY08t0mM3kA8IA1cfbzJN+l7eVjWMeLYkzJ+NbyeFic5sJSoz2TNRUVFRURUVF8oqL9UVFT6Kip6wrkYrkXP0f9\u002Fj6N\u002F\u002FAA+2RfK\u002F9f1T\u002FwAf42e76fsaDDZCoZ7jr3R2ENeEx6se+IQf7rvvHWBP21SAaBshE7\u002F7Y2Od4T103iHxR4r+tcsAy1nada6r03BwaucXCB\u002FbSw0IHP7ceegwNKxytUW40r1OdI+NRa1hCRyJrfkd3Lq245bQZLTbSor56UHIG1sOE53WDT6HUXhusz14c90Vo05vuikZG2ENVVjnuV3p\u002Fbedv1tf8efi9b3l7lLrUsAF1ulsNPSXmZ5pmrtKgcSpgtScrakXlxCLBHEPC4UZUas6\u002FwArIj3l0\u002FHemGGXNbaVoLTf8SebKrdRWsBlRkB42dsDG2DQ\u002FPmasndExyPjTwLwX5AkfmY2KER2F6BEXJa1tfQGKjaU8C6cjn6DmljErfxCneSadf8A45DUjZ4HwtkPJFPDNnqSYciF7ZYZoZMZrFjlilYrmSRyMeitc1VRUXynqUgiWOCCCN8000z2xRQxRtV8kssj1ayOONjVVzlVERE8r6XOchpv9RtodJ+DXExRGFZyI57vtxxhwVjXWusK96eGwheyF3lF\u002FITwqegem\u002FMjoehDqoHumqMJXGDN0I4cyI\u002F8IGEVs2Z5mBO1WpI0CIi2lRFSaeGX+71bce5BjM9g6HpGxyuXLqqINsRdtWgFv2GlsrexlWa0vbGeuzH2ZyzJpyZGzI1z1T18U\u002F26eVCyW3Ser4\u002FG6XrFTTIxDzXdFun6Sh54T9hXTwn7rZ2znno9rXx01e9XeYp09Yvj9Q8c65Ejm0PQNFBGrF1fRL9IidTeu8tY78dxLGChtVGrEALBH4\u002Fs\u002Fluck9sEV8L4u8fYytav6fpq+OVwKPe5W+0OyY94hP18fYncvhVanq1+N\u002FVoJ67b8wcfNgbyVkaaTPV8ZDq86tGdM1HE\u002FwCMWbfwzBJHLEUA+BqoixtkZnpegDzaEvACvZUxA2bXj2udnq7iszrKkkx0z6qqbKW5UHlb7xUa9jWO8N90tbp+h5TBZOAqFEyLLNgriWv\u002FAL4kr6GaeA7YGR+xPfOXKwaF6+WRNVfYrXZ2oZNcvgSAvS2iRm3xbPH90SGLGxoIrvp5gGbDCvhFVqqnn+HE8AfAtjSZWtsd3oKyCVI5Dm3duHTxAPk+v2JC6ijOia5f+BpCu8L66H84etDocbJoLms5tCTE1REu5Bm01pb1sL\u002FCRVODzMMGcqEaz7bEYS5vh8aL\u002FPkfk
fhx1igOug4dSLC50Izr1IvxpWFLCxPs1+1o2PEId9V\u002FKja\u002F6vkT1u+r99uKv7tvKTFRZW1qbPQAvbYsWBrnxQVp47gM3TQxBCJKiOV7nSoiORF9Pmz21Pxha\u002F3Rtp63TmVLZfPn3uqbelLjY3z\u002FAOmCSBE\u002F6PHpCOW\u002FIAbSVYvj7dMWDejucxfDU8Z3VVp9W5Hp\u002FX8YqN6In08eE9MG6JgAb0dv22Ptsca6qsEai+HzS0tw8kEiRU+vhhkDfP8ARE9fYwsZ4g25IoKGknshVhKz2Tz9SP8Art9ZCMlIaOlYrjCPYj3NfLJGxFVZE9ZnA5EFtdm8nTh0tSKntV7RhI0Ys5D2MYk5hcvumnlVEWWaRz1+rl\u002FlzRPaunZbnTdlYH1WTg0Br4ztGfVV8ltaj0taNETYWK1dZGpBLoonMHi8OkVqKnnSdjyvZec33LsWNYlbPb1mrqCaHGsqAY7O2i1xTCfOYMq66Vk5A5zYJ4Intc9jUciqbRX9aDeUNwMkJtedCwkIwd6slZ7mORUVWva18b08OY9rXNVHIions5yA3x\u002FT\u002Fm+kX\u002Fzul9f2c+Cb\u002FwBlto\u002F\u002FALj17ZefhPTx48LbaJPp\u002FwB1wnrNruc1k83\u002FAJlrKbB5VbzV6Cu\u002FyHZ6JSEosvUff0Ef513bqJL+OPH5kl+272ovhfVjac8wtRm7S2HYGfZwvOOspg2SNmQJp9qWeXAE6ZjXuhjeyN72tc5qq1FT+X4EI5WJIue+XX20VU96omC557\u002FYi\u002FVfCKnnx67Xv\u002FjxfhUHYcL8A\u002FkbZ\u002FKfUY6KMyGgIranLmfEJ\u002FS0eKfkH9SrtTBfz5mG0iltWZtLXyz8B8SO\u002FcZ+SuS+V\u002FWMhrvjR2K2zvx4wg8mfn4piKzNcb4vdWtX0HnsNZX\u002FAOr1brNPrrAwht0a8quimjjrJwnMWV\u002FWaPCfITpnTw9N8Z6fo8l33S5zO00HNdxD8juPcZ3PXMlbX0GdyOXzwmF68XdEU9ixmOqD6QadIBa\u002F86KXv+ezPUO2cfbb\u002FHoPSZm3+R3cfj38h+yYjqqdu4\u002FzrM9V5pn83c6W5IwumqdnaxaMC1D\u002FAMUBtAwIwGwflkwSWnL+N\u002FLfdyn\u002FACC+OfabbO6P5P66k1lZzvtfMdZxwSm1+f19lSjMwwnQ6XqB1RKDEEXQ1VslYYFUqkRARmN5URr\u002FAJU4nqAXyj+KMfUeZfIroXK+3TB8664D1CuBE5r2CiFs36fEdWvebHDyMPOHtayxrZfYLWQStSbnv7iAXy56Vqu3bG253oX8mhua5fjxp9RuejUeFuvhxRcOjqjZQbqisTS8aGaKSm1g1YP5hhsytLBk+SfzMf8AKvsl1vOOfKHv\u002FwDprzb9QpK7honIeVfJTU8\u002FD4pq+cVtPCNtYLzJVZAhF+aQ7RCkSwShlj\u002FisY8Tad76r3EnAdO+Tug5ZxToPKesc12fxUvhdMRqKTm3x56Nwv8ARU3HN9tm5aNaiwvgRLCQnVgvnLuwoCvwIf40AfZuXYvpUGUPMtMwutow7UjPWFiDJWWJVKZNH+XWS2NdK6Aj7L2JPEvtf7moierjlmP5JzjM830TD2aPD0mNoK7MaRbUSIC1m0VOMBGFfE2gUDIipi2TSkMaiSOciJ62OfPwuRNouh2JNvvaYrPVU9ZtLU0GtrC7LUgyiuGvjiq6nEgklJbK98I0TFVWxtRCLi+yOZu7UvK3OFLsLejrLIwrE6KcMq\u002FyJM5g00hGZuya+B5YL1UYl0LFkY5Wp41Ocwnx+5RnaPbPq11daLjacge9GobCO1z1SchwxTn5\u002FNWUTZ6utR
UArZU9wsMSqvmQToWBxu6Emob7LTCbDNU2kFmzWqfUyabPzDXAZkEtLoJKEFxgzmrCS4OFZGuWJntjoufcU5zmKyLYVnQWxBZitmIXc0kLxqLWSWBsJVhNf0AkiwV5L5XSgj+IoFjjRGpP3cPhfMxuukWRN7Ju4srWNuU0hsUg52th\u002FwDZ\u002FDH2Z4sr4SLmOJtpPA90chDo1Vq3OJnxGTmx2iOubS\u002Fy0lBVvz11ZaO5J0V+faU7hVAPMur8yU0qSWNzyCpHSvVz3K5Su60vEeaVXXjSrWxJ3wGTqRb99zetnjvtE2eEdkUOpvoCpYjrVjG2JkMj45pnxuc1f9v\u002FAP\u002FZ",nav:[{name:"Home",url:el,thispartype:w,sort:h},{name:ee,url:a,thispartype:h,sort:m,children:[{name:"About the Journal",url:"\u002Fir\u002Fabout_the_journal",sort:h},{name:"Aims and Scope",url:"\u002Fir\u002Faims_and_scope",sort:m},{name:ef,url:"\u002Fir\u002Feditorial_policies",sort:s},{name:"Editorial Board",url:"\u002Fir\u002Feditor",sort:u},{name:"Junior Editorial Board",url:"\u002Fir\u002Fjunior_editorial_board",sort:v},{name:"Journal Awards",url:"\u002Fir\u002Fawards",sort:x},{name:"News",url:"\u002Fir\u002Fnews",sort:aM},{name:"Partners",url:"\u002Fir\u002Fpartners",sort:aN},{name:"Advertise",url:"\u002Fir\u002Fadvertise",sort:aO},{name:cr,url:lV,sort:bV}]},{name:"Publish with us",url:a,thispartype:h,sort:s,children:[{name:"For Authors",url:a,sort:h,children:[{name:"Author Instructions",url:"\u002Fir\u002Fauthor_instructions",sort:h},{name:"Article Processing Charges",url:"\u002Fir\u002Farticle_processing_charges",sort:m},{name:"Editorial Process",url:"\u002Fir\u002Feditorial_process",sort:s},{name:"Manuscript Templates",url:"\u002Fir\u002Fmanuscript_templates",sort:u},{name:"Submit a Manuscript",url:"https:\u002F\u002Foaemesas.com\u002Flogin?JournalId=ir",sort:v},{name:lW,url:lX,sort:x}]},{name:"For Reviewers",url:a,sort:m,children:[{name:"Peer Review Guidelines",url:"\u002Fir\u002Fpeer_review_guidelines",sort:h}]}]},{name:"Articles",url:a,thispartype:h,sort:v,children:[{name:"All Articles",url:"\u002Fir\u002Farticles",sort:h},{name:"Articles With Video Abstracts",url:"\u002Fir\u002Farticles_videos",sort:m},{name:lW,url:lX,sort:a}]},{name:"Special 
Issues",url:a,thispartype:h,sort:x,children:[{name:"All Special Issues",url:"\u002Fir\u002Fspecial_issues",sort:h},{name:"Ongoing Special Issues",url:"\u002Fir\u002Fongoing_special_issues",sort:m},{name:"Completed Special Issues",url:"\u002Fir\u002Fcompleted_special_issues",sort:s},{name:"Closed Special Issue",url:"\u002Fir\u002Fclosed_special_issues",sort:u},{name:"Special Issue Ebooks",url:"\u002Fir\u002Fspecial_issues_ebooks",sort:v},{name:"Special Issue Guidelines",url:"\u002Fir\u002Fspecial_issue_guidelines",sort:x}]},{name:"Volumes",url:"\u002Fir\u002Fvolumes",thispartype:w,sort:aM},{name:"Pre-onlines",url:"\u002Fir\u002Fpre_onlines",thispartype:w,sort:aN},{name:"Features",url:a,thispartype:h,sort:aO,children:[{name:"Webinars",url:"\u002Fir\u002Fwebinars",sort:h},{name:eh,url:"\u002Fir\u002Facademic_talks",sort:m},{name:"Videos",url:"\u002Fir\u002Fvideos",sort:s},{name:"Interviews",url:"\u002Fir\u002Finterviews",sort:u}]}],qksearch:{},footer:{journal_name:n,issn:lY,email:"editorial@intellrobot.com",navigation:[{title:cr,url:lV},{title:"Sitemap",url:"\u002Fir\u002Fsitemap"}],cope:{title:"Committee on Publication Ethics",url:"https:\u002F\u002Fpublicationethics.org\u002Fmembers\u002Fintelligence-robotics",img:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20230811\u002F49f92f416c9845b58a01de02ecea785f.jpg"},open:{title:"Portico",desc:"All published articles are preserved here permanently:",url:"https:\u002F\u002Fwww.portico.org\u002Fpublishers\u002Foae\u002F",img:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20230911\u002F67d78ebf8c55485db6ae5b5b4bcda421.jpg"},follow:[{title:"LinkedIn",url:"https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fintelligence-robotics\u002F",icon:"icon-linkedin"},{title:"Twitter",url:lZ,icon:"icon-tuite1"}],wechat_img:a,twitter:{url:lZ,img:"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20230824\u002F5249ddabb6d642558c9843fba9283219.png"}},top:{path:q,pid:l,journal_img:ct,journal_name:n,mpt:"40 
days",issn:lY,indexing:{ESCI:"https:\u002F\u002Fwww.oaepublish.com\u002Fnews\u002Fir.852",Scopus:bf,"Google Scholar":"https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?view_op=list_works&hl=zh-CN&hl=zh-CN&user=-Hx5OVYAAAAJ",Dimensions:"https:\u002F\u002Fapp.dimensions.ai\u002Fdiscover\u002Fpublication?and_facet_source_title=jour.1423782",Lens:"https:\u002F\u002Fwww.lens.org\u002Flens\u002Fsearch\u002Fscholar\u002Flist?p=0&n=10&s=date_published&d=%2B&f=false&e=false&l=en&authorField=author&dateFilterField=publishedYear&orderBy=%2Bdate_published&presentation=false&preview=true&stemmed=true&useAuthorId=false&publicationType.must=journal%20article&sourceTitle.must=Intelligence%20%26%20Robotics&publisher.must=OAE%20Publishing%20Inc."},editor:cu,journal_rank:a,journal_flyer:"https:\u002F\u002Ff.oaes.cc\u002Findex_ad\u002Fflyer\u002FIR-flyer.pdf",qksearch:["Intelligence","Robotics",cD,"Machine Learning","Unmanned Vehicles","UAV"],sitetag:"Intell Robot",ad:[],colour_tag:cs,score:a,mobile_top_img:a,impact_factor:[{factor:cx,url:bf},{factor:j,url:a}],rgba:cv,log_image:cw},webinfo:{},searchKey:a,loading:X,appid:a,videoPlay:{show:cy,href:a}},editer:{editList:{list:{}}},userdata:{showLogin:cy,logined:cy}},serverRendered:X,routePath:"\u002Farticles\u002Fir.2021.02",config:{_app:{basePath:l_,assetsPath:l_,cdnURL:"https:\u002F\u002Fg.oaes.cc\u002Foae\u002Fnuxt\u002F"}}}}("",4325,null,1,4,"2021",0,"1","[]","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240506\u002Fea3d9071c35b4bf3982ffe25f1083620.png",3,40,"2","Intelligence & Robotics","[\"1\"]",2,"ir",927,"3",5,"4","5","0","6","[\"2\"]","IR","Li","Research Article","https:\u002F\u002Fmjl.clarivate.com\u002Fsearch-results","2022","2017","33",20,18,"Yang","Wang","Lei",935,14,10,15,"Lens",34,6,7,35,"2020",32,9,true,"39",13,21,"Extracellular Vesicles and Circulating Nucleic Acids","Microbiome Research Reports","One Health & Implementation Research","Chemical Synthesis","Energy Materials","Journal of Materials 
Informatics","Microstructures","Minerals and Mineral Materials","Soft Science","Complex Engineering Systems","Disaster Prevention and Resilience","Green Manufacturing Open","Journal of Smart Environments and Green Computing","Journal of Surveillance, Security and Safety","Ageing and Neurodegenerative Diseases","Artificial Intelligence Surgery","Cancer Drug Resistance","Connected Health And Telemedicine","Hepatoma Research","Journal of Cancer Metastasis and Treatment","Journal of Translational Genetics and Genomics","Metabolism and Target Organ Damage","Mini-invasive Surgery","Plastic and Aesthetic Research","Rare Disease and Orphan Drugs Journal","The Journal of Cardiovascular Aging","Vessel Plus","Carbon Footprints","Journal of Environmental Exposure Assessment","Water Emerging Contaminants & Nanoplastics","Federated reinforcement learning: techniques, applications, and open challenges",7352,"Improved DDPG algorithm-based path planning for unmanned surface vehicles","ir.2024.22","Chen","Zheng","Xu","Zhang","40","7","8","9","11","Menglong Hua, ... 
Zihao Chen",2023,71,23,65,22,25,30,8,24,"ESCI, CAS, Dimensions, Lens, CNKI",17,11,55,12,"2015","ESCI, CAS, Scopus, Wanfang Data, CNKI, Dimensions, Embase","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101199351",90,39,16,"Journal of Unexplored Medical Data",26,"ESCI, Scopus, CAS, Dimensions, Lens, CNKI","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240809\u002Fccc8ac23b789404781787655e4495af4.png",33,"Neuroimmunology and Neuroinflammation","2014",29,31,"ISSN XXXX-XXXX (Coming soon)","Soil Health","Space Mission Planning & Operations","Stomatological Disease and Science","Review",6104,"A distributed multi-vehicle pursuit scheme: generative multi-adversarial reinforcement learning","ir.2023.25",6065,"Reinforcement learning methods for network-based transfer parameter selection","ir.2023.23",5735,"Reinforcement learning with parameterized action space and sparse reward for UAV navigation","ir.2023.10",5495,"UAV maneuver decision-making via deep reinforcement learning for short-range air combat","ir.2023.04",5115,"Deep reinforcement learning for real-world quadrupedal locomotion: a comprehensive review","ir.2022.20",4885,"AVDDPG – Federated reinforcement learning applied to autonomous platoon control","ir.2022.11",4634,"Evolution of adaptive learning for nonlinear dynamic systems: a systematic survey","ir.2021.19",58,101,"ISSN 2770-3541 (Online)","10","14","15","16","20","25","28","29","30","36","37","42","44","48","53","58","67","68","89","92",1731427200,2024,2022,109,"Contact Us","#0047bb","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231121\u002F59802903b17e4eebae240e004311d193.jpg","Simon X. Yang","rgb(0,71,187)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fir.png","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240813\u002F49390c7e86ab40a58ee862e8c1af65ba.png",false,"\u003Cp\u003EThis paper presents a comprehensive survey of federated reinforcement learning (FRL), an emerging and promising field in reinforcement learning (RL). 
Starting with a tutorial of federated learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data-privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, \u003Ci\u003Ei.e.\u003C\u002Fi\u003E, horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL). We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL.\u003C\u002Fp\u003E","https:\u002F\u002Fv1.oaepublish.com\u002Ffiles\u002Ftalkvideo\u002F4325.mp4",2021,74,"Reinforcement Learning","This paper presents a comprehensive survey of federated reinforcement learning (FRL), an emerging and promising field in reinforcement learning (RL). Starting with a tutorial of federated learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data-privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e., horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL). We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. 
In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL.","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240205\u002F1103dab1ab644cf2818a1cab0dd2b8ff.jpg","Qi J, Zhou Q, Lei L, Zheng K. Federated reinforcement learning: techniques, applications, and open challenges. \u003Ci\u003EIntell Robot\u003C\u002Fi\u003E 2021;1(1):18-57. http:\u002F\u002Fdx.doi.org\u002F10.20517\u002Fir.2021.02","Qi J, Zhou Q, Lei L, Zheng K. Federated reinforcement learning: techniques, applications, and open challenges. \u003Ci\u003EIntell Robot\u003C\u002Fi\u003E. 2021;1:18-57. http:\u002F\u002Fdx.doi.org\u002F10.20517\u002Fir.2021.02","Qi, Jiaju, Qihao Zhou, Lei Lei, and Kan Zheng. 2021. \"Federated reinforcement learning: techniques, applications, and open challenges\" \u003Ci\u003EIntell Robot\u003C\u002Fi\u003E. 1, no.1: 18-57. http:\u002F\u002Fdx.doi.org\u002F10.20517\u002Fir.2021.02","Qi, J.; Zhou, Q.; Lei, L.; Zheng, K. Federated reinforcement learning: techniques, applications, and open challenges. \u003Ci\u003EIntell. Robot.\u003C\u002Fi\u003E \u003Cb\u003E2021\u003C\u002Fb\u003E, \u003Ci\u003E1\u003C\u002Fi\u003E, 18-57. 
http:\u002F\u002Fdx.doi.org\u002F10.20517\u002Fir.2021.02","2024-11-13 00:00:00","Menglong","Hua","Weixiang","Zhou","zhouwx@shmtu.edu.cn","Hongying","Cheng","Zihao","2023-09-13 00:00:00","Xinhang","Yiying","Qinwen","Yuan","Lin","[\"1\",\"2\"]","zhanglin@bupt.edu.cn","2023-08-31 00:00:00","Yue","Guo","yueguo@cs.cmu.edu","Yu","I-Hsuan","Katia","Sycara","2023-06-27 00:00:00","Shiying","Feng","Xiaofeng","Lu","Ren","penny_lu@ahu.edu.cn","Shuiqing","[\"3\"]","2023-03-09 00:00:00","Zhiqiang","Haibin","Duan","hbduan@buaa.edu.cn","2022-09-01 00:00:00","Hongyin","He","Donglin","wangdonglin@westlake.edu.cn","2022-05-30 00:00:00","Christian","Boin","Simon X.","2022-03-16 00:00:00","Mouhcine","Harib","Hicham","Chaoui","Suruz","Miah","12","13","10.20517\u002Fir.2024.22","\u003Cp\u003EAs a promising mode of water transportation, unmanned surface vehicles (USVs) are used in various fields owing to their small size, high flexibility, favorable price, and other advantages. Traditional navigation algorithms are affected by various path planning issues. To address the limitations of the traditional deep deterministic policy gradient (DDPG) algorithm, namely slow convergence speed and sparse reward and punishment functions, we proposed an improved DDPG algorithm for USV path planning. First, the principle and workflow of the DDPG deep reinforcement learning (DRL) algorithm are described. Second, the improved method (based on the USVs kinematic model) is proposed, and a continuous state and action space is designed. The reward and punishment function are improved, and the principle of collision avoidance at sea is introduced. Dynamic region restriction is added, distant obstacles in the state space are ignored, and the nearby obstacles are observed to reduce the number of algorithm iterations and save computational resources. 
The introduction of a multi-intelligence approach combined with a prioritized experience replay mechanism accelerates algorithm convergence, thereby increasing the efficiency and robustness of training. Finally, through a combination of theory and simulation, the DDPG DRL is explored for USV obstacle avoidance and optimal path planning.\u003C\u002Fp\u003E","https:\u002F\u002Ff.oaes.cc\u002Fxmlpdf\u002F69bc157a-c03e-48d7-b08c-14cc87ec711f\u002Fir4022.pdf",363,"363-84",384,"Xinhang Li, ... Lin Zhang","Yue Guo, ... Katia Sycara","Shiying Feng, ... Shuiqing Xu\u003Ca href='https:\u002F\u002Forcid.org\u002F0000-0003-3081-3726' target='_blank'\u003E\u003Cimg src='https:\u002F\u002Fi.oaes.cc\u002Fimages\u002Forcid.png' class='author_id' alt='Shuiqing Xu'\u003E\u003C\u002Fa\u003E",1005," Collaborative Optimization and Control of Intelligent Unmanned Autonomous Systems",76,"Zhiqiang Zheng, Haibin Duan\u003Ca href='http:\u002F\u002Forcid.org\u002F0000-0002-4926-3202' target='_blank'\u003E\u003Cimg src='https:\u002F\u002Fi.oaes.cc\u002Fimages\u002Forcid.png' class='author_id' alt='Haibin Duan'\u003E\u003C\u002Fa\u003E","Hongyin Zhang, ... Donglin Wang\u003Ca href='http:\u002F\u002Forcid.org\u002F0000-0002-8188-3735' target='_blank'\u003E\u003Cimg src='https:\u002F\u002Fi.oaes.cc\u002Fimages\u002Forcid.png' class='author_id' alt='Donglin Wang'\u003E\u003C\u002Fa\u003E","Christian Boin, ... Simon X. Yang\u003Ca href='https:\u002F\u002Forcid.org\u002F0000-0002-6888-7993' target='_blank'\u003E\u003Cimg src='https:\u002F\u002Fi.oaes.cc\u002Fimages\u002Forcid.png' class='author_id' alt='Simon X. Yang'\u003E\u003C\u002Fa\u003E",1002," Evolutionary Computation for Deep Learning and Machine Learning","Mouhcine Harib, ... Suruz Miah","Xiao-Wen Zhao\u003Ca href='https:\u002F\u002Forcid.org\u002F0000-0001-7873-8708' target='_blank'\u003E\u003Cimg src='https:\u002F\u002Fi.oaes.cc\u002Fimages\u002Forcid.png' class='author_id' alt='Xiao-Wen Zhao'\u003E\u003C\u002Fa\u003E, ... 
Zhi-Wei Liu","About","Editorial Policies","Journals","Academic Talks","Biology & Life Science","Chemistry & Materials Science","Computer Science & Engineering","\u002Fir","Medicine & Public Health","Environmental Science","#837fbc","ISSN 2769-5301 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231218\u002F6135b005a8674b878a7d1a5c91ac9869.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fand","Wei-Dong Le","CAS, Dimensions, Lens, CNKI","rgb(99,94,171)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fand.jpg","and","#030072","ISSN 2771-0408 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F14bc775310574f91b457ff09d6c6d950.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fais","Andrew A. Gumbs","ESCI, Scopus, Dimensions, Lens","rgb(3,0,114)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fais.png","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240813\u002Fb7358d0632aa413fa84630ef0a2fb7a0.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101189039","ais",83,"#e8b475","ISSN 2578-532X (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231113\u002F9db4b73f2e954900807499f00094fcf0.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fcdr","Godefridus J. 
(Frits) Peters","2018","337","ESCI, PMC, Scopus, CAS, CNKI, Dimensions, Lens, Embase","rgb(224,155,71)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fcdr.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240104\u002F502e35965fc448e0bc9f2d6c1b0bb3d7.png","https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Fjournals\u002F4180\u002F","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240611\u002F5def3ab5829743ce8d55e664cabe0dcb.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101047803","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240620\u002Fed79fcbfce8e4cd38a36cc4d5f7d473d.png","https:\u002F\u002Fjcr.clarivate.com\u002Fjcr-jp\u002Fjournal-profile?journal=CANCER%20DRUG%20RESIST&year=2022","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240620\u002F86311aafb7a945a98a7ee95c3627173b.png","cdr",418,38,"#1d57a5","ISSN 2770-6249 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fimages_2018\u002Foae\u002Fcover_comengsys.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fcomengsys","Hamid Reza Karimi","Scopus, Dimensions, Lens","rgb(29,87,165)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fces.png","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240722\u002F4748814aab87441d809e5b9ddcd9d56e.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101189038","comengsys","#75ac9d","ISSN 2993-2920 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231218\u002F0b7a80c0f6e545818e143ea48fe9be90.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fchatmed","Yuan-Ting Zhang","rgb(117, 172, 157)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fchatmed.jpg","chatmed","#00a5b3","ISSN 2769-5247 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231121\u002F1598f33bf2284642830511e489eb31bf.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fcs","Bao-Lian Su","Scopus, CAS, Dimensions, Lens, 
CNKI","rgb(0,165,179)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fcs.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240813\u002F6d91bea2ff474cb9b7ec10017b26ec2a.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101189036","cs",157,51,"#6cc24a","ISSN 2831-932X (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fb1a3c337f6a54bcb945e0d9ab6afe0ec.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fcf","Yong Geng","rgb(86,164,55)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fcf.jpg","cf",45,"#01588b","ISSN 2832-4056 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fe5808de306274ba287f6f24bf45eb910.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fdpr","Jie Li","rgb(1,88,139)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fdpr.jpg","dpr","#008c15","ISSN 2770-5900 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fe1df1779a10c4a7486e89c5f5c4994f3.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fenergymater","Yuping Wu","rgb(0,140,21)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fem.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240621\u002Fab41932db942489c841586b032349a31.png","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240625\u002F3f1358bb24954c8eae11907a77ab8b18.png","energymater",196,"#b15ee9","ISSN 2767-6641 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fb7fc9d6a51bf4adda4ff6dec5002690b.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fevcna","Yoke Peng Loh","ESCI, Scopus, CAS, CNKI, Dimensions, 
Lens","rgb(177,94,233)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fevcna.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240228\u002Ff15fa97315a24fcc8bc2910eed05ec8a.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101160588","https:\u002F\u002Fmjl.clarivate.com:\u002Fsearch-results?issn=2767-6641&hide_exact_match_fl=true&utm_source=mjl&utm_medium=share-by-link&utm_campaign=search-results-share-this-journal","evcna",131,56,"#1d6960","ISSN 2835-7590 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fb95b79391630482d83323d705e486a35.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fgmo","Hongchao Zhang","rgb(29,105,96)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fgmo.png","gmo","#f3906d","ISSN 2454-2520 (Online)\u003Cbr\u003EISSN 2394-5079 (Print)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F4ef51b2e673a40ca9ebf0a414f21adb5.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fhr","Guang-Wen Cao","465","rgb(239,108,62)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fhr.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240313\u002Fee358186a06449d5b2c8ea15f712cbed.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101058282","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240625\u002Faf5ee9028a554c6aa6f0030bbcbfe55d.png","https:\u002F\u002Fjcr.clarivate.com\u002Fjcr-jp\u002Fjournal-profile?journal=HEPATOMA%20RES&year=2023","hr",543,"https:\u002F\u002Fwww.oaepublish.com\u002Fir","ESCI, Scopus, Google Scholar, Dimensions, Lens",60,"#44b762","ISSN 2454-2857 (Online)\u003Cbr\u003EISSN 2394-4722 
(Print)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231121\u002Fe46043c3fcaf49f38eef7654adff79e1.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjcmt","476","rgb(68,183,98)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjcmt.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231201\u002F8b13deaaf6e24f9189ca91e62e3b84ab.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101058912","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240621\u002F38a515f5996349019b5018be997f4446.png","https:\u002F\u002Fmjl.clarivate.com\u002F","jcmt",527,"#00629b","ISSN 2770-372X (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F4fb1d2ad901a4d94945ad58e1a208c41.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjmi","Tong-Yi Zhang","rgb(0,98,155)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjmi.jpg","jmi",81,"#5aa8d9","ISSN 2578-5281 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fe326d72dd71546d193c9f1e77d5aea9c.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjtgg","Sanjay Gupta, Andrea L. 
Gropman","135","ESCI, Scopus, CAS, Dimensions","rgb(38,117,166)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjtgg.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240611\u002Fd422f7a41d93496284c73d6e6355460c.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101105293","https:\u002F\u002Fmjl.clarivate.com:\u002Fsearch-results?issn=2578-5281&hide_exact_match_fl=true&utm_source=mjl&utm_medium=share-by-link&utm_campaign=search-results-share-this-journal","jtgg",171,"#a67ebd","ISSN 2572-8180 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fimages_2018\u002Foae\u002Fcover_jumd.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjumd","Tarek Shalaby","2016","rgb(143,93,172)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjumd.jpg",70,"jumd",42,"#00d1d9","ISSN 2767-6595 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231121\u002Fde2367aef0bc452484a3d180488f3430.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjsegc","Witold Pedrycz","Lens, Dimensions","rgb(0,162,168)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjsegc.jpg",80,"jsegc",43,19,"#3e9aff","ISSN 2694-1015 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Ffac7bed365f1465799f61e5b2212492b.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjsss","CNKI, Dimensions, Lens","rgb(62,154,255)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjsss.jpg",85,"jsss",49,47,"#00b2a9","ISSN 2771-5949 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231115\u002F1dd8d978556f47cda9d63c2e03bb7e29.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjeea","Stuart Harrad","Scopus, CAS, Dimensions, Lens","rgb(0,128,121)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjeea.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240617\u002F3a87e8e1c3f145e78ac49d8d775b0e7d.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101196042","jeea",68,52,"#2d68c4","ISSN 2831-2597 
(Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F78caf22c90e94711a856e3528819610e.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fwecn","Daoji Li, Joana C Prata","Accepted by Scopus, CAS, Lens","rgb(45,104,196)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fwecn.jpg",95,"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240506\u002F1af7834d877a40a992f1eb45e4904d07.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101230595?origin=resultslist","wecn",62,"#AD9E00","ISSN 2769-6375 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231113\u002Fa711ad20ef204fc392177dcc0829f9f5.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fmtod","Amedeo Lonardo","rgb(173,158,0)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fmtod.png","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240611\u002F450a21a9a0474ec89913f38bcda8785a.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101174245","https:\u002F\u002Fwww.oaepublish.com\u002Fnews\u002Fmtod.844","mtod",96,46,"#7ba4db","ISSN 2771-5965 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F0b98ef6cf95b4a948209ac292b1d3d6b.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fmrr","Marco Ventura","ESCI, Scopus, PMC, CAS, Lens","rgb(123,164,219)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fmrr.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240104\u002F206342e31750460bb7449112e6aafba6.png","https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Fjournals\u002F4522\u002F","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240621\u002F1dfbaaeea68e40c98d7494b8c702b695.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101195678?origin=resultslist","https:\u002F\u002Fwww.oaepublish.com\u002Fnews\u002Fmrr.836","mrr",117,"#6f7bd4","ISSN 2770-2995 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231113\u002F1738c68b2d0a41edae268adf4ea5c7e2.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fmicrostructures","Shujun 
Zhang","ESCI, Scopus, CAS, Dimensions, Lens","rgb(111,123,212)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fmic.jpg",102,"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240815\u002F5e862cb5242a48fa8c18754471220198.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101193861","https:\u002F\u002Fwww.oaepublish.com\u002Fnews\u002Fmicrostructures.800","microstructures",135,53,"#bb5e00","ISSN 2832-269X (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231121\u002F24ab06d9b34b4daf8a13194a93e8da10.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fminerals","Shaoxian Song","rgb(187,94,0)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fmmm.jpg",103,"minerals","#35cccd","ISSN 2574-1225 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fa2ede06450eb4b0f9df7bef7efc9d60c.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fmis","Giulio Belli","343","ESCI, Scopus, CNKI, Dimensions","rgb(41,167,168)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fmis.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240208\u002F6020c4a092dd44578a776056b5057abc.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101115668","mis",395,27,"#33a7d9","ISSN 2349-6142 (Online)\u003Cbr\u003EISSN 2347-8659 (Print)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F026484b9f2b042fda9b7afffc0d9f62e.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fneurosciences","296","rgb(51,167,217)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fnn.jpg",104,"neurosciences",296,28,"#43b02a","ISSN 2769-6413 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Ff65f06a7eae24d1484bad38ff0a2bef2.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fohir","Dimensions","Jose M. 
Martin-Moreno","rgb(67,176,42)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fohir.jpg",105,"ohir","#c45284","ISSN 2349-6150 (Online)\u003Cbr\u003EISSN 2347-9264 (Print)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F8d72c3f669d5490ea3b9abc424d66cb6.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fpar","Wen-Guo Cui","520","rgb(196,82,132)","ESCI, Scopus, Dimensions, CNKI, Lens, Wanfang","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fpar.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240607\u002F9bbbcad710cf457e85d63b1811faefb1.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101111784","https:\u002F\u002Fclarivate.com\u002Fproducts\u002Fscientific-and-academic-research\u002Fresearch-discovery-and-workflow-solutions\u002Fwebofscience-platform\u002Fweb-of-science-core-collection\u002Femerging-sources-citation-index\u002F","par",619,"#d62598","ISSN 2771-2893 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231121\u002F6ee75e8ab583446b90c97865c2c68464.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Frdodj","Daniel Scherman","Dimensions, Lens","rgb(214,37,152)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Frdodj.jpg",106,"rdodj",77,"#1f4e79","ISSN 2769-5441 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002F98e2ed5023ee466c8890d749bfd97597.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fss","YongAn Huang","ESCI, Scopus, CAS, Lens, Dimensions, CNKI","rgb(31,78,121)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fss.jpg",107,"https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240606\u002Fcfa200290630438c8e3a3cae1536af37.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101178114","ss",116,54,"#a96318","https:\u002F\u002Fi.oaes.cc\u002Fimages_2018\u002Foae\u002Fcover_sh.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fsh","Manoj 
Shukla","rgb(169,99,24)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fsh.jpg",108,"sh","#4475e1","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231218\u002F79ccda0bc909441587a0b70e0a8ef1cc.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fsmpo","Madjid Tavana","rgb(68,117,225)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fsmpo.jpg","smpo","#53be9b","ISSN 2573-0002 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231218\u002F54ba11ec5c57448a84169c318bb94fd4.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fsds","Nikolaos G. Nikitakis","rgb(62,163,130)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fsds.jpg",110,"sds","#0038a0","ISSN 2768-5993 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231113\u002F695c0f0de82c43f189957ff572f0798c.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fjca","rgb(0,56,160)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fjca.png","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240611\u002Fd7d617bceee246428da2ed07a84efd5e.png","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101176723","https:\u002F\u002Fwww.oaepublish.com\u002Fnews\u002Fjca.838","jca",129,36,"#db6868","ISSN 2574-1209 (Online)","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20231110\u002Fe64051237c8a454085bb02abfdc6dd12.jpg","https:\u002F\u002Fwww.oaepublish.com\u002Fvp","290","CAS, Scopus, CNKI, Dimensions, Lens","rgb(219,104,104)","https:\u002F\u002Fi.oaes.cc\u002Fupload\u002Fjournal_logo\u002Fvp.jpg","https:\u002F\u002Fi.oaes.cc\u002Fuploads\u002F20240606\u002Fa5cb40102c2d4ec4b1a362fe1f27d324.jpg","https:\u002F\u002Fwww.scopus.com\u002Fsourceid\u002F21101064905","vp",351,"5555","\u002Fir\u002Fcontact_us","Video Abstract Guidelines","\u002Fir\u002Fvideo_abstract_guidelines","2770-3541 (Online)","https:\u002F\u002Ftwitter.com\u002FOAE_IR","\u002F"));</script><script src="https://g.oaes.cc/oae/nuxt/b06ddfb.js" defer></script><script src="https://g.oaes.cc/oae/nuxt/d1923ac.js" 
defer></script><script src="https://g.oaes.cc/oae/nuxt/0a3b980.js" defer></script><script src="https://g.oaes.cc/oae/nuxt/3e8004d.js" defer></script><script src="https://g.oaes.cc/oae/nuxt/b19d7ea.js" defer></script> </body> </html>