LAPSE:2023.9509
Published Article

Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control
February 27, 2023
Abstract
The proliferation of unmanned aerial vehicles (UAVs) has spawned a variety of intelligent services, in which efficient coordination plays a significant role in increasing the effectiveness of cooperative execution. However, due to the limited operational time and range of UAVs, achieving highly efficient coordinated actions is difficult, particularly in unknown dynamic environments. This paper proposes a multiagent deep reinforcement learning (MADRL)-based fusion-multiactor-attention-critic (F-MAAC) model for the energy-efficient cooperative navigation control of multiple UAVs. The proposed model is built on the multiactor-attention-critic (MAAC) model and offers two significant advances over it. The first is a sensor fusion layer, which enables the actor network to utilize all required sensor information effectively. The second is a layer that computes the dissimilarity weights of different agents, added to compensate for the information lost through the attention layer of the MAAC model. We use the UAV LDS (logistic delivery service) environment, created with the Unity engine, to train the proposed model and verify its energy efficiency. A feature that measures the total distance traveled by the UAVs is incorporated into the UAV LDS environment to validate energy efficiency. To demonstrate the performance of the proposed model, the F-MAAC model is compared with several conventional reinforcement learning models in two use cases. First, we compare the F-MAAC model to the DDPG, MADDPG, and MAAC models based on the mean episode rewards over 20k training episodes. The two top-performing models (F-MAAC and MAAC) are then chosen and retrained for 150k episodes. Our study uses the total number of deliveries completed within the same period, and within the same distance traveled, to represent energy efficiency. According to our simulation results, the F-MAAC model outperforms the MAAC model, making 38% more deliveries in 3000 time steps and 30% more deliveries per 1000 m of distance traveled.
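The dissimilarity-weighting idea in the abstract can be illustrated with a small numerical sketch: an attention critic weights other agents' encodings by similarity to a query, while a parallel set of weights favors the agents whose encodings differ most, so that information suppressed by attention is not lost entirely. The formulation below is a minimal illustration only, not the paper's actual F-MAAC architecture; the distance-based dissimilarity measure and the blending coefficient `alpha` are assumptions made for this sketch.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weights(query, keys):
    # scaled dot-product attention: agents similar to the query get high weight
    scores = keys @ query / np.sqrt(len(query))
    return softmax(scores)

def dissimilarity_weights(query, keys):
    # hypothetical dissimilarity measure: normalized Euclidean distance,
    # so agents most unlike the query get high weight
    d = np.linalg.norm(keys - query, axis=1)
    return d / d.sum()

def combined_value(query, keys, values, alpha=0.5):
    # blend the two weightings, then aggregate the other agents' values
    w = alpha * attention_weights(query, keys) \
        + (1 - alpha) * dissimilarity_weights(query, keys)
    return w @ values

# toy example: one agent's query against three other agents' encodings
q = np.array([1.0, 0.0])
K = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
V = np.array([[1.0], [2.0], [3.0]])
print(combined_value(q, K, V))
```

Both weightings are convex combinations (each sums to 1), so their blend is as well; the dissimilarity term simply redistributes mass toward agents that pure attention would down-weight.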
Record ID
LAPSE:2023.9509
Keywords
actor-attention-critic, air logistics, multiagent reinforcement learning, multiple UAV, sensor fusion
Subject
Suggested Citation
Jeon S, Lee H, Kaliappan VK, Nguyen TA, Jo H, Cho H, Min D. Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control. (2023). LAPSE:2023.9509
Author Affiliations
Jeon S: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea [ORCID]
Lee H: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea [ORCID]
Kaliappan VK: Konkuk Aerospace Design-Airworthiness Research Institute, Konkuk University, Seoul 05029, Korea [ORCID]
Nguyen TA: Konkuk Aerospace Design-Airworthiness Research Institute, Konkuk University, Seoul 05029, Korea [ORCID]
Jo H: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea
Cho H: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea
Min D: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea
Journal Name
Energies
Volume
15
Issue
19
First Page
7426
Year
2022
Publication Date
2022-10-10
ISSN
1996-1073
Version Comments
Original Submission
Other Meta
PII: en15197426, Publication Type: Journal Article
Record Map
Published Article (This Record)
LAPSE:2023.9509
External Link (Publisher Version)
https://doi.org/10.3390/en15197426
Record Statistics
Record Views
181
Version History
[v1] (Original Submission)
Feb 27, 2023
Verified by curator on
Feb 27, 2023
This Version Number
v1
Citations
URL
https://psecommunity.org/LAPSE:2023.9509
Record Owner
Auto Uploader for LAPSE
Links to Related Works
