This project provides Python scripts to simulate search tasks in which targets follow a parametric distribution and the agent's objective is to maximize the number of targets collected while minimizing the distance traveled.
- Python 3.10
Depending on the size of the dataset and the simulation parameters, the user might require access to high-performance computing (HPC) resources.
Social learning in collective search was first proposed by Bhattacharya and Vicsek[^1] in 2014 and later investigated by Garg et al.[^2][^3] Based on their work, we modified the model to improve its clarity.
To generate a heterogeneous target distribution, we manipulate the initial spatial clustering of targets using a power-law growth model. Additionally, we assume that targets do not regenerate after being collected by agents.
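As a rough illustration, the snippet below sketches one way such a clustered target field could be generated with a preferential-growth rule, where each new target is placed near an existing one chosen with probability proportional to its growth weight raised to an exponent. The function name `generate_clustered_targets` and the parameters `beta` and `cluster_std` are illustrative assumptions, not the exact implementation used in the scripts.

```python
import numpy as np

def generate_clustered_targets(n_targets, arena_size, beta=1.5, cluster_std=2.0, rng=None):
    """Place targets with power-law-like spatial clustering.

    Each new target is attached near an existing target chosen with probability
    proportional to its growth weight raised to the exponent `beta`.
    """
    rng = np.random.default_rng(rng)
    positions = [rng.uniform(0, arena_size, size=2)]   # first target seeds the field
    weights = [1.0]                                     # growth weight per existing target

    for _ in range(n_targets - 1):
        probs = np.power(weights, beta)
        probs /= probs.sum()
        parent = rng.choice(len(positions), p=probs)    # preferential choice of a parent target
        offset = rng.normal(0.0, cluster_std, size=2)   # place the new target near the parent
        positions.append(np.clip(positions[parent] + offset, 0, arena_size))
        weights[parent] += 1.0
        weights.append(1.0)

    return np.array(positions)

targets = generate_clustered_targets(n_targets=500, arena_size=100.0)
```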
After collecting a target, the agent emits a social signal from the location where the target was collected to attract nearby agents. Other agents within a given radius of the signal move toward the location from which it was emitted.
Before detecting a target, agents perform a Lévy walk, with step lengths drawn from a heavy-tailed power-law distribution.
Agents not only share information about the location of resources but also communicate potential risks in the area. Based on this, we introduce a negative target and assume that agents receiving its social signal take a repulsive step away from the detected negative target.
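The following sketch illustrates these movement rules under stated assumptions: a power-law step length sampled by inverse transform with exponent `mu`, a step toward the location of a positive social signal, and a repulsive step away from a detected negative target. The function and parameter names (`levy_step`, `next_position`, `mu`, `l_min`) are hypothetical, not taken from the scripts.

```python
import numpy as np

def levy_step(mu=2.0, l_min=1.0, rng=None):
    """Draw a step length from a power law P(l) ~ l^(-mu), l >= l_min,
    via inverse-transform sampling."""
    rng = np.random.default_rng(rng)
    u = rng.uniform()
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))

def next_position(pos, signal_pos=None, negative_pos=None, mu=2.0, rng=None):
    """One movement update: Lévy walk by default, attraction toward a positive
    social signal, or a repulsive step away from a detected negative target."""
    rng = np.random.default_rng(rng)
    step = levy_step(mu=mu, rng=rng)
    if negative_pos is not None:                      # move directly away from the risk location
        direction = pos - negative_pos
    elif signal_pos is not None:                      # move toward the broadcast target location
        direction = signal_pos - pos
    else:                                             # uninformed search: random heading
        theta = rng.uniform(0, 2 * np.pi)
        direction = np.array([np.cos(theta), np.sin(theta)])
    direction = direction / (np.linalg.norm(direction) + 1e-12)
    return pos + step * direction
```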
We simulate search tasks in which target locations follow a hierarchical distribution; the locations are fixed across search tasks (episodes) but unknown to the agent.
The following methods have been used previously to simulate target search. Although they can be used to learn optimal policies, they have some limitations, which are presented here.
In tabular methods, each state is represented as an entry in a value-function table. In our problem setting, we define states as the agent's location in the search space; however, we have to discretize the search space so that each possible location can be assigned its own entry (see the sketch after the limitation below).
- Large number of states: if the detection radius is very small compared to the search space, the number of states becomes so large that updating all of them takes a long computational time.
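A minimal sketch of such a discretization, assuming a square arena and a cell size on the order of the detection radius (both values are illustrative):

```python
import numpy as np

# Assumed arena geometry: a square of side ARENA_SIZE discretized into cells of
# roughly the detection radius; these values are illustrative.
ARENA_SIZE = 100.0
CELL_SIZE = 1.0
N_CELLS = int(ARENA_SIZE / CELL_SIZE)

# One value-table entry per discretized agent location.
value_table = np.zeros((N_CELLS, N_CELLS))

def state_index(pos):
    """Map a continuous (x, y) location to its cell indices in the value table."""
    i = min(int(pos[0] // CELL_SIZE), N_CELLS - 1)
    j = min(int(pos[1] // CELL_SIZE), N_CELLS - 1)
    return i, j
```

Even this modest arena with unit cells already yields 10,000 table entries, which is why the state count grows quickly as the detection radius shrinks.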
This method is similar to the previous one; the difference is that the values are updated without extra information about the environment, i.e., from sampled experience alone (see the sketch after the limitation below).
- Large number of states: if the detection radius is very small compared to the search space, the number of states becomes so large that updating all of them takes a long computational time.
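As an illustration of a model-free tabular update, the sketch below applies one-step Q-learning over the same kind of discretized states, using only sampled transitions and no model of the dynamics; the action set of eight movement headings is an assumption.

```python
import numpy as np

# Same illustrative discretization as above, plus a small set of movement headings.
N_CELLS, N_ACTIONS = 100, 8
q_table = np.zeros((N_CELLS, N_CELLS, N_ACTIONS))

def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.99):
    """One-step Q-learning: values are updated from sampled transitions only,
    with no model of the environment's dynamics."""
    td_target = reward + gamma * np.max(q_table[s_next])
    q_table[s][a] += alpha * (td_target - q_table[s][a])

# Example transition: the agent moves from cell (3, 4) to (3, 5) by taking action 2.
q_update(q_table, s=(3, 4), a=2, reward=-1.0, s_next=(3, 5))
```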
To reduce the number of states, we use approximation methods. The value function and the policy are both parametrized, and states are defined by a specific transformation of the search space (a sketch follows the limitation below).
- Rewards give information on the following quantities: distance, direction, and whether a target has been found. However, we need to modify each quantity independently during the learning process, and information about the targets collected could be used more efficiently.
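A minimal sketch of this idea, assuming a hand-crafted feature transformation and a linear value function updated with semi-gradient TD(0); the specific features are illustrative, not the transformation used in the scripts.

```python
import numpy as np

ARENA_SIZE = 100.0
N_FEATURES = 5

def features(pos, targets_collected):
    """A hypothetical state transformation: the normalized location plus a simple
    progress summary, instead of one table entry per grid cell."""
    x, y = pos[0] / ARENA_SIZE, pos[1] / ARENA_SIZE
    return np.array([1.0, x, y, x * y, targets_collected / 10.0])

# Linear value-function approximation: V(s) ~ w . phi(s).
w = np.zeros(N_FEATURES)

def td_update(w, phi, reward, phi_next, alpha=0.01, gamma=0.99):
    """Semi-gradient TD(0) update of the weight vector (in place)."""
    td_error = reward + gamma * (w @ phi_next) - (w @ phi)
    w += alpha * td_error * phi
```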
This is the most recent model used. Here, we build on all the previous experience to improve the learning process: we modify the rewards from the environment, splitting them into two separate rewards and learning two independent value functions.
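A minimal sketch of this reward split, under the assumption that the two components are a distance cost and a target-collection reward, each driving its own value function:

```python
import numpy as np

N_FEATURES = 5   # same illustrative feature dimension as the sketch above

def split_reward(distance_step, n_collected):
    """Assumed decomposition of the environment reward into two independent
    components: a movement cost and a target-collection reward."""
    return -distance_step, float(n_collected)

# Two independent value functions, one per reward component.
w_distance = np.zeros(N_FEATURES)
w_collect = np.zeros(N_FEATURES)

def td_update_pair(phi, rewards, phi_next, alpha=0.01, gamma=0.99):
    """Update each value function with its own reward stream (in place)."""
    for w, r in ((w_distance, rewards[0]), (w_collect, rewards[1])):
        td_error = r + gamma * (w @ phi_next) - (w @ phi)
        w += alpha * td_error * phi
```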
Each agent is assigned its own actor network for decentralized execution. A single critic network takes in the states of all agents and acts as a centralized value function to encourage coordination among agents.
Agents use a continuous action space rather than a discrete one. This allows them to exploit the wind flow and maneuver smoothly.
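The sketch below shows one way such a layout could look, assuming PyTorch and illustrative observation/action dimensions and layer sizes: one small actor per agent producing a bounded continuous action, and a single critic scoring the joint observations and actions of all agents during training.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 4, 2, 5   # assumed sizes: per-agent observation, 2-D continuous action

class Actor(nn.Module):
    """Decentralized actor: each agent maps its own observation to a continuous action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh(),   # bounded continuous action (e.g., a heading/velocity)
        )

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all agents
    to encourage coordination during training."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]   # one actor per agent for decentralized execution
critic = CentralCritic()                      # one shared critic over the joint state-action
```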
[^1]: Bhattacharya, K., & Vicsek, T. (2014). Collective foraging in heterogeneous landscapes. Journal of the Royal Society Interface, 11(100), 20140674.

[^2]: Garg, K., Kello, C. T., & Smaldino, P. E. (2022). Individual exploration and selective social learning: balancing exploration–exploitation trade-offs in collective foraging. Journal of the Royal Society Interface, 19(189), 20210915.

[^3]: Garg, K., Smaldino, P. E., & Kello, C. T. (2024). Evolution of explorative and exploitative search strategies in collective foraging. Collective Intelligence, 3(1), 26339137241228858.