Human-agent-mobility

Target distribution search task

This section provides Python scripts to simulate search tasks in which targets follow a parametric distribution and the agent's objective is to maximize the number of targets collected while minimizing the distance traveled.

Getting started

Software

  • Python 3.10

Depending on the size of the dataset and the simulation parameters, the user might require access to high-performance computing (HPC).

Collective search with social learning

Social learning in collective search was first proposed by Bhattacharya and Vicsek [1] in 2014 and later investigated by Garg et al. [2, 3]. Based on their work, we modified the model to improve its clarity.

Target distribution

To generate a heterogeneous target distribution, we manipulate the initial spatial clustering of targets using a power-law growth model. Additionally, we assume that targets do not regenerate after being collected by agents.
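
A minimal sketch of how such a landscape could be generated (parameter names and the exact generator are illustrative, not necessarily the repository's implementation): cluster centers are placed uniformly at random and cluster sizes are drawn from a heavy-tailed power-law distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def clustered_targets(n_clusters=20, alpha=2.0, spread=5.0, box=1000.0):
    """Place targets in power-law-sized clusters inside a square box of side `box`."""
    centers = rng.uniform(0.0, box, size=(n_clusters, 2))
    # Pareto(alpha) gives heavy-tailed cluster sizes; +1 keeps every cluster non-empty.
    sizes = (rng.pareto(alpha, size=n_clusters) + 1).astype(int)
    points = [c + rng.normal(0.0, spread, size=(s, 2)) for c, s in zip(centers, sizes)]
    return np.clip(np.vstack(points), 0.0, box)

targets = clustered_targets()
collected = np.zeros(len(targets), dtype=bool)  # targets do not regenerate once collected
```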

Social learning range

After collecting a target, the agent emits a social signal from the location where the target was collected to attract nearby agents. Other agents within a radius of $\rho$ can detect the signal, while those outside this region cannot. The parameter $\rho$ determines the strength of social learning.
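
A minimal sketch of the detection rule (function and variable names are illustrative): only agents whose distance to the signalling location is at most $\rho$ receive the cue.

```python
import numpy as np

def receivers(agent_positions, signal_pos, rho):
    """Return the indices of agents that detect a signal emitted at `signal_pos`."""
    distances = np.linalg.norm(agent_positions - signal_pos, axis=1)
    return np.flatnonzero(distances <= rho)

agents = np.random.uniform(0.0, 1000.0, size=(50, 2))
signal = agents[0]                       # agent 0 just collected a target here
informed = receivers(agents, signal, rho=100.0)
```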

Varied $\mu$ search

Before detecting a target, agents perform a Lévy walk with $\mu = 1.1$ (exploration). After collecting a target, an agent switches $\mu$ from 1.1 (exploration) to 3 (exploitation) and maintains this value until it moves out of the region of radius $R$ around the detected target without detecting new targets.
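
The sketch below illustrates the varied-$\mu$ rule under the assumption that step lengths follow $P(\ell) \propto \ell^{-\mu}$ and are sampled by inverse transform; the switching logic is a paraphrase of the text above, not the repository's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def levy_step(mu, l_min=1.0):
    """Inverse-transform sample from a power-law step-length distribution P(l) ~ l^(-mu)."""
    u = rng.random()
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))

MU_EXPLORE, MU_EXPLOIT, R = 1.1, 3.0, 50.0

pos, last_target = np.zeros(2), None
for _ in range(1000):
    exploiting = last_target is not None and np.linalg.norm(pos - last_target) <= R
    mu = MU_EXPLOIT if exploiting else MU_EXPLORE
    theta = rng.uniform(0.0, 2.0 * np.pi)
    pos = pos + levy_step(mu) * np.array([np.cos(theta), np.sin(theta)])
    # ... detecting a target would set last_target = pos and trigger the social signal
```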

Negative targets

Agents not only share information about the location of resources but also communicate potential risks in the area. Based on this, we introduce negative targets, assuming that agents receiving the social signal take repulsive steps away from the detected negative target.
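
A minimal sketch of the repulsive response (illustrative only): an informed agent takes its next step along the direction pointing away from the reported negative target.

```python
import numpy as np

def repulsive_step(agent_pos, negative_target_pos, step_length):
    """Move one step directly away from the detected negative target."""
    away = agent_pos - negative_target_pos
    away = away / np.linalg.norm(away)
    return agent_pos + step_length * away

new_pos = repulsive_step(np.array([10.0, 10.0]), np.array([12.0, 9.0]), step_length=5.0)
```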

RL models for a deterministic environment:

We simulate search tasks where target locations follow a hierarchical distribution and are fixed across search tasks (episodes) but unknown to the agent.

The following methods were used previously to simulate target search. Although they can be used to learn optimal policies, they have some limitations, which are presented below.

Tabular SARSA & Q-learning methods:

In tabular methods, each state is represented as an entry in the value-function table. In our problem setting, we define states as the agent's location in the search space; however, this requires discretizing the search space so that each possible location corresponds to a table entry.

Limitations:

  • Large number of states: If the radius of detection is very small compared to the search space, the number of states becomes so large that updating all of them requires a long computation time.
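
A minimal tabular sketch (grid size, learning rate, and action set are illustrative): the search space is discretized into cells, each cell is a state, and Q(s, a) is stored in a table, which makes the state-space growth explicit.

```python
import numpy as np

BOX, CELL = 1000.0, 10.0            # a small detection radius forces a fine grid ...
N_SIDE = int(BOX / CELL)            # ... here 100 x 100 = 10,000 states
N_ACTIONS = 4                       # up / down / left / right
Q = np.zeros((N_SIDE * N_SIDE, N_ACTIONS))

def state_index(pos):
    """Map a continuous (x, y) position to the index of its grid cell."""
    i = min(int(pos[0] / CELL), N_SIDE - 1)
    j = min(int(pos[1] / CELL), N_SIDE - 1)
    return i * N_SIDE + j

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```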

Planning & learning:

This method is similar to the previous one; the difference is that the value updates are made without extra information about the environment.

Limitations:

  • Large number of states: If the radius of detection is tiny compared to the search space, the number of states becomes so large that updating all of them requires a long computation time.
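
A Dyna-Q-style sketch (the specific planning algorithm is an assumption, not stated above): observed transitions are stored in a simple model and replayed for additional value updates, so no extra interaction with the environment is required.

```python
import random
import numpy as np

N_STATES, N_ACTIONS = 10_000, 4
Q = np.zeros((N_STATES, N_ACTIONS))
model = {}                                   # (s, a) -> (r, s_next)

def dyna_q_step(s, a, r, s_next, n_planning=10, alpha=0.1, gamma=0.99):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])   # direct RL update
    model[(s, a)] = (r, s_next)                                  # learn the model
    for _ in range(n_planning):                                  # planning updates from the model
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])
```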

Actor-critic methods:

To reduce the number of states, we use approximation methods. The value function and policies are both parametrized and states are defined by a specific transformation of the search space.

Limitations:

  • Rewards give information on the following quantities: distance, direction, and whether a target has been found. However, we need to modify each of these quantities independently during the learning process. Additionally, information on the targets collected can be used more efficiently.
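
A minimal actor-critic sketch (PyTorch and the particular feature transformation are assumed choices, not necessarily the repository's): the agent's position is mapped to a feature vector that feeds both a parametrized policy and a parametrized value function.

```python
import math
import torch
import torch.nn as nn

def features(pos, box=1000.0):
    """Example state transformation: normalized position plus sine/cosine encodings."""
    p = pos / box
    return torch.cat([p, torch.sin(2 * math.pi * p), torch.cos(2 * math.pi * p)])

N_FEATURES, N_ACTIONS = 6, 4
actor = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
critic = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, 1))

phi = features(torch.tensor([123.0, 456.0]))
action = torch.distributions.Categorical(logits=actor(phi)).sample()
value = critic(phi)
```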

Actor-critic with posterior approximation

This is the most recent model used. Here, we plan to use all previous experience to improve the learning process. Namely, we plan to split the reward from the environment into two separate rewards and learn two independent value functions.
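
A minimal sketch of the two-critic idea (the particular split into a distance-related reward and a target-collection reward is an assumption used for illustration): each reward stream gets its own value function and its own TD error.

```python
import torch
import torch.nn as nn

N_FEATURES = 6
critic_distance = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, 1))
critic_targets = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, 1))

def td_errors(phi, phi_next, r_distance, r_targets, gamma=0.99):
    """One TD error per reward stream; each critic is updated from its own error."""
    delta_d = r_distance + gamma * critic_distance(phi_next) - critic_distance(phi)
    delta_t = r_targets + gamma * critic_targets(phi_next) - critic_targets(phi)
    return delta_d, delta_t
```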

Multi-Agent Deep Deterministic Policy Gradient

Each agent is assigned its own actor network for decentralized execution. A single critic network takes in the states of all agents and acts as a centralized value function to encourage coordination among agents.
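
A minimal MADDPG-style sketch (PyTorch and the dimensions are illustrative): each agent owns an actor for decentralized execution, while a single centralized critic scores the joint observations and actions.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2

actors = [nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                        nn.Linear(64, ACT_DIM), nn.Tanh())
          for _ in range(N_AGENTS)]

critic = nn.Sequential(nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 128), nn.ReLU(),
                       nn.Linear(128, 1))

obs = [torch.randn(OBS_DIM) for _ in range(N_AGENTS)]
acts = [actor(o) for actor, o in zip(actors, obs)]   # decentralized execution
q_value = critic(torch.cat(obs + acts))              # centralized evaluation
```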

Continuous Action Space

Agents use a continuous action space rather than a discrete one. This allows agents to take advantage of the wind flow and maneuver smoothly.
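
A minimal sketch of a continuous action (the action parametrization and wind handling are assumptions): the actor's Tanh output in [-1, 1]^2 is scaled to a 2-D velocity, and the wind velocity is added to the displacement.

```python
import torch

MAX_SPEED, DT = 5.0, 1.0

def step(pos, actor_output, wind):
    """Advance one time step: continuous 2-D action plus wind drift."""
    velocity = MAX_SPEED * actor_output          # no discretization of the action space
    return pos + DT * (velocity + wind)

new_pos = step(torch.zeros(2), torch.tanh(torch.randn(2)), wind=torch.tensor([0.5, -0.2]))
```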

References

  1. Bhattacharya, K., & Vicsek, T. (2014). Collective foraging in heterogeneous landscapes. Journal of the Royal Society Interface, 11(100), 20140674.

  2. Garg, K., Kello, C. T., & Smaldino, P. E. (2022). Individual exploration and selective social learning: balancing exploration–exploitation trade-offs in collective foraging. Journal of the Royal Society Interface, 19(189), 20210915.

  3. Garg, K., Smaldino, P. E., & Kello, C. T. (2024). Evolution of explorative and exploitative search strategies in collective foraging. Collective Intelligence, 3(1), 26339137241228858.
