RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent

Wenjia Xu^*, Zijian Yu^*, Boyang Mu, Zhiwei Wei, Yuanben Zhang, Guangzuo Li and Mugen Peng
^* Equal Contribution

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
School of Geographic Sciences, Hunan Normal University
Aerospace Information Research Institute, Chinese Academy of Sciences

RS-Agent.mp4

Introduction

Recent advancements in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have led to impressive performance in remote sensing tasks. However, these models are limited to basic vision and language tasks and lack specialized expertise for complex remote sensing applications. To address these, we propose RS-Agent, an intelligent agent for remote sensing. RS-Agent is powered by an LLM as its "Central Controller," enabling it to understand and respond to various problems. It integrates high-performance remote sensing image processing tools, allowing multi-tool, multi-turn conversations for complex tasks. Additionally, RS-Agent utilizes a knowledge graph-enhanced Retrieval-Augmented Generation (RAG) framework to access domain-specific knowledge, ensuring accurate responses to expert-level queries. Experimental results show RS-Agent achieves over 95% task planning accuracy and demonstrates strong domain-specific knowledge retrieval, excelling across various tasks.

Core Components

Central Controller: Serves as the decision-making core of the agent. It interprets user queries, plans task execution, manages dialogue history, and synthesizes final responses.
Toolkit: A collection of state-of-the-art remote sensing tools for various applications. These tools are invoked based on the Central Controller’s planning.
Solution Space: Stores predefined expert-level task solutions. It guides the Controller in selecting appropriate tools and execution strategies by retrieving relevant task-specific instructions.
Knowledge Space: Provides domain-specific information via a curated knowledge database. It supports expert-level reasoning by retrieving relevant content.

Supported Function

Tool	Function	Example Input
cloud_removal	Cloud removal from satellite images	Remove the clouds in this image.
image_dehazing	Haze removal from images	Dehaze this foggy image.
super_resolution	Image super-resolution (2×)	Enhance the resolution of this image.
denoising	Image denoising	Remove noise from this image.
caption	Geo-specific VQA and captioning	What is in this remote sensing image?
optical_detection	Optical image target detection	Detect objects in this optical image.
optical_plane_type	Aircraft type recognition in optical images	What type of aircraft is in this image?
scene	Scene classification	What is the scene category of this image?
sar_detection	Target detection in SAR images	Find the objects in this SAR image.
sar_plane_type	Aircraft type recognition in SAR images	Identify the aircraft in this SAR image.
knowledge_search	Aircraft info retrieval via Knowledge Database	Who manufactures Boeing 747?
building_damage_detection	Building damage assessment	Which buildings are damaged?
building_extraction	Building extraction from images	Extract all buildings from the image.
road_extraction	Road extraction from images	Extract roads from the scene.
horizontal_object_detection	Horizontal bounding box detection	Detect objects using horizontal boxes.
rotated_object_detection	Rotated object detection	Detect objects using rotated boxes.
semantic_segmentation	Pixel-wise semantic segmentation	Segment the different regions in this image.
land_use_classification	Land use categorization	What are the land use types in this image?

Results

Quantitative Results

To evaluate RS-Agent’s adaptability, we evaluate its task planning accuracy when paired with different closed-source (GPT series) and open-source LLMs. This analysis helps reveal whether RS-Agent can maintain or improve performance as the underlying model changes.

Task	ChatGPT (3.5-turbo-1106)	ChatGPT (3.5-turbo)	ChatGPT (4o-mini)	LLaMa 3.1 (8B)	LLaMa 3.1 (70B)	Qwen2.5 (14B)	Qwen2.5 (32B)	Qwen2.5 (72B)	DeepSeek-r1 (70B)
	(87.71t/s)	(65.03t/s)	(58.87t/s)	(100.78t/s)	(17.71t/s)	(69.61t/s)	(36.77t/s)	(16.24t/s)	(18.25t/s)
Cloud Removal	95.00%	95.00%	100%	100%	100%	100%	95.00%	100%	100%
Image Dehazing	30.00%	95.00%	100%	100%	100%	100%	100%	100%	75.00%
Super Resolution	100%	100%	100%	0.00%	100%	100%	100%	100%	95.00%
Denoising	90.00%	100%	100%	100%	100%	100%	100%	100%	90.00%
Image Captioning	55.00%	45.00%	90.00%	15.00%	60.00%	70.00%	80.00%	80.00%	10.00%
Object Detection	75.00%	60.00%	95.00%	30.00%	90.00%	90.00%	85.00%	100%	85.00%
Optical Plane Classification	100%	100%	100%	100%	100%	100%	100%	100%	95.00%
Scene Classification	20.00%	90.00%	100%	80.00%	90.00%	90.00%	100%	100%	50.00%
SAR Detection	30.00%	100%	100%	75.00%	95.00%	100%	100%	100%	100%
SAR Plane Classification	100%	100%	100%	100%	100%	100%	100%	100%	90.00%
Knowledge Search	100%	100%	100%	100%	80.00%	100%	100%	100%	10.00%
Building Damage Detection	100%	100%	100%	100%	100%	95.00%	100%	100%	100%
Building Extraction	10.00%	70.00%	100%	55.00%	100%	100%	100%	100%	100%
Road Extraction	15.00%	55.00%	100%	65.00%	100%	100%	100%	100%	100%
Horizontal Detection	20.00%	55.00%	100%	95.00%	100%	100%	100%	100%	100%
Rotated Detection	15.00%	35.00%	100%	85.00%	90.00%	100%	100%	100%	100%
Semantic Segmentation	60.00%	100%	100%	80.00%	100%	100%	100%	100%	80.00%
Land Use Classification	15.00%	100%	100%	75.00%	100%	100%	100%	95.00%	95.00%
Average Accuracy	57.22%	82.50%	99.17%	75.28%	94.72%	96.94%	97.78%	98.61%	81.94%

Qualitative Results

This figure shows several key tools that highlight the core capabilities of RS-Agent. The input image represents the image input by the user. The blue box shows the user's request, and the red box shows the RS-Agent's reply.

🏆 Contributions

We present RS-Agent, a novel architecture designed to interpret user queries and orchestrate diverse tools for accurate and efficient remote sensing task execution. Its four core components—Central Controller, Toolkits, Solution Space, and Knowledge Space—work in concert, seamlessly interacting and complementing one another to enable robust, adaptive performance across a wide range of applications.
To enhance the agent’s task planning accuracy, we propose an innovative Task-Aware Retrieval method. By retrieving and understanding expert-level task solutions, RS-Agent is able to emulate the decision-making and tool selection processes of professional remote sensing analysts.
To strengthen RS-Agent’s domain-specific knowledge, we propose DualRAG, a retrieval augmented generation method that assigns weights to extracted keywords and performs dual path retrieval, thereby enhancing the accuracy and relevance of knowledge retrieval.
Extensive experiments demonstrate that RS-Agent consistently surpasses previous SOTA Multimodal Large Language Models across a range of remote sensing applications, and significantly boosts the task planning accuracy. These results establish RS-Agent as a major step forward in adapting AI agents to the remote sensing field, and, for the first time, present a comprehensive and modular architecture tailored for remote sensing applications.

Acknowledgments

We are thankful to the amazing open-sourced LLMs and the tools used in our RS-Agent for releasing their models and code as open-source contributions.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
images		images
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent

Introduction

Core Components

Supported Function

Results

Quantitative Results

Qualitative Results

🏆 Contributions

Acknowledgments

About

Uh oh!

Releases

Packages

License

IntelliSensing/RS-Agent

Folders and files

Latest commit

History

Repository files navigation

RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent

Introduction

Core Components

Supported Function

Results

Quantitative Results

Qualitative Results

🏆 Contributions

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages