OwenZhu's Blog

2017/08/16

Social-ecological system in multi-agent environment through deep reinforcement learning

Introduction

A number of urgent global challenges today are caused by the overexploitation of common resources such as groundwater, petroleum, and minerals. As a result, these resources may be depleted at high speed.

To address these challenges, people try to find optimal policies that avoid overexploitation. Classic game theory rests on the widely accepted assumption that individuals are rational and self-interested: they maximise their own profits regardless of the whole population. Learning in social-ecological systems allows people to gain a better understanding of resource management. However, in a cooperative dilemma (the Prisoner's Dilemma), researchers have found that multiple self-interested agents can hardly find the globally optimal social-ecological solution, e.g. in a common resource pool (Perolat et al., 2017).
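To make the dilemma concrete, here is a minimal Python sketch of the standard Prisoner's Dilemma payoffs (the numbers are conventional illustrative values, not taken from any of the cited papers):

```python
# Standard Prisoner's Dilemma payoffs (illustrative numbers), with
# T > R > P > S: temptation > mutual cooperation > mutual defection > sucker.
PAYOFF = {
    ("C", "C"): (3, 3),  # both cooperate
    ("C", "D"): (0, 5),  # I cooperate, you defect
    ("D", "C"): (5, 0),  # I defect, you cooperate
    ("D", "D"): (1, 1),  # both defect
}

def best_response(opponent_action):
    """Return the action maximising my payoff against a fixed opponent."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)][0])

# Defection is a dominant strategy: it is the best response to either action,
# yet (D, D) yields (1, 1) while mutual cooperation would have yielded (3, 3).
print(best_response("C"), best_response("D"))  # -> D D
```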

As Rand and Nowak (2013) observe, natural selection tends to favour defection. The situation in which most agents choose not to cooperate may therefore end in the famous pessimistic social dilemma known as the tragedy of the commons. However, altruistic behaviour does occur in real life, and exploitation of common resources has sometimes been kept at a sustainable level. The simple classic game model therefore cannot fully explain the phenomena we observe in reality, and we need much more complex models in simulation. As Battersby (2017) says, "We just need to change the game." (p. 8)

Many researchers have invested a great deal of effort in game theory, in order to find optimal solutions for both individuals and populations. They have developed a variety of strategies to apply in social-ecological systems, and various frameworks and mechanisms have emerged to extend the expressive power of these systems. Rezaei and Kirley (2012) introduce a social-network-based model of the N-player Prisoner's Dilemma game; Rand and Nowak (2013) discuss five mechanisms of the evolution of cooperation; Hauser et al. (2014) devise a new paradigm, the 'Intergenerational Goods Game'; Osten, Kirley and Miller (2017) present a dynamic common resource pool model that combines profitability with sustainability goals; Leibo et al. (2017) propose sequential social dilemmas to address the problem of dynamic policies.

Key problem

From a biological point of view, a different approach, evolutionary game theory, has been proposed. Traulsen and Hauert (2009) describe two aspects of evolutionary game theory: strategies encoded by the genome, and cultural evolution. The concepts of evolutionary game theory can help us better understand how to build social-ecological systems with multiple participants.

An interesting observation is that when exploitation decisions are made by individuals separately, the common resource tends to be exhausted easily. By contrast, if the decisions are made by all participants democratically, the over-exploitation problem is much easier to solve (Hauser, Rand, Peysakhovich, & Nowak, 2014). Moreover, Taylor et al. (2014) point out that it is important to develop cooperative strategies in multi-agent games: only when most agents work cooperatively can the appropriation behaviour of a single agent be effectively prevented.

Now the key problem is: how can a self-interested agent learn from the environment to cooperate rather than defect? What is the best strategy to maximise overall returns in the long run? Since traditional game theory predicts defection rather than cooperation, a variety of other methods have been proposed to approach this cooperative social dilemma.

Key methods

Two main learning techniques are widely applied to find solutions to social dilemmas: social learning and reinforcement learning.

  • Social learning (or imitation learning)

The key idea of social learning is that an individual can imitate, or directly copy, successful behaviours from other individuals. Chatterjee, Zufferey and Nowak (2012) propose a framework based on social learning in which an agent learns strategies from other, more successful agents. They also explore the situation where individuals have different learning abilities; unsurprisingly, the learning outcomes then vary from individual to individual. They further add noise with a small probability to represent learning mistakes, which makes the learning process more convincing and reliable. Testing the first mechanism, direct reciprocity, they show that the agents' policies still nearly converge to an optimal policy, although oscillations occur. A minimal sketch of this kind of imitation dynamic is given below.
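The sketch implements one common form of social learning, pairwise imitation with a Fermi update rule and a small mutation rate standing in for learning mistakes; the payoffs, parameters, and update rule are illustrative assumptions, not the exact model of Chatterjee et al. (2012):

```python
import math
import random

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def mean_payoff(i, pop):
    """Average Prisoner's Dilemma payoff of individual i against all others."""
    return sum(PAYOFF[(pop[i], pop[j])] for j in range(len(pop)) if j != i) / (len(pop) - 1)

def step(pop, beta=1.0, mutation=0.01):
    """One social-learning update: individual i compares itself with a random
    role model j and imitates j with a probability that increases with their
    payoff difference (Fermi rule); with a small probability it instead makes
    a random 'learning mistake'."""
    i, j = random.sample(range(len(pop)), 2)
    if random.random() < mutation:
        pop[i] = random.choice("CD")   # noisy imitation error
    else:
        p = 1.0 / (1.0 + math.exp(-beta * (mean_payoff(j, pop) - mean_payoff(i, pop))))
        if random.random() < p:
            pop[i] = pop[j]            # copy the more successful peer

pop = list("C" * 10 + "D" * 10)
for _ in range(2000):
    step(pop)
print("".join(pop))  # direct imitation alone typically drifts toward defection
```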

  • Reinforcement learning

Another approach is reinforcement learning. A reinforcement-learning agent can learn from its own or others' experience to improve its policy. With this mechanism, agents can learn autonomously to solve problems, especially those framed as Markov decision processes. Since many reinforcement learning methods are value-based, researchers combine them with deep learning in order to approximate complex value functions, and the reported outcomes suggest that deep reinforcement learning can perform much better than other learning methods on such tasks. For instance, Taylor et al. (2014) propose a framework based on agents advising agents in complex video games, StarCraft and Pac-Man, and highlight the strong performance of RL agents in learning control policies.
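For reference, the core of a value-based method such as tabular Q-learning fits in a few lines; in deep reinforcement learning the table is replaced by a neural network. The sketch below assumes a hypothetical environment object with `reset()`, `step(action)`, and an `actions` list, in the style of common RL toolkits:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; deep RL replaces the Q table with a neural network."""
    Q = defaultdict(float)                      # (state, action) -> value estimate
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: explore occasionally, otherwise act greedily
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)   # assumed interface
            # TD update towards reward + discounted value of the best next action
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```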

Perolat et al. (2017) apply deep reinforcement learning to a common resource pool game and achieve a satisfying outcome: after training, agents have learned different strategies for cooperating with each other. Additionally, they observe that tracking only the log of rewards during the experiment is far from enough. To detect emergent events during training, they put forward "social outcome metrics" consisting of evaluations of efficiency, equality, sustainability, and peace.
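As a rough illustration of how such metrics could be computed from per-agent reward logs, here is a sketch of efficiency, equality, and sustainability (the exact definitions and normalisations in Perolat et al. differ, and the peace metric additionally requires logging tagging events, which are omitted here):

```python
def efficiency(rewards):
    """Mean total reward per agent; rewards[i][t] is agent i's reward at step t."""
    return sum(map(sum, rewards)) / len(rewards)

def equality(rewards):
    """1 minus the Gini coefficient of the agents' total rewards (1 = equal)."""
    totals = sorted(sum(r) for r in rewards)
    n, s = len(totals), sum(totals)
    gini = sum((2 * (k + 1) - n - 1) * x for k, x in enumerate(totals)) / (n * s)
    return 1 - gini

def sustainability(rewards):
    """Average timestep at which rewards are collected; later means the
    resource was consumed more slowly."""
    times = [t for r in rewards for t, x in enumerate(r) if x > 0]
    return sum(times) / len(times)

logs = [[1, 0, 1], [0, 1, 1]]      # two agents, three timesteps
print(efficiency(logs), equality(logs), sustainability(logs))
```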

Another study, by Leibo et al. (2017), concentrates on the cooperativeness of learned policies. They point out that besides learning game strategies, agents also need to learn policies dynamically. The outcomes differ considerably between models: in the game "Gathering", agents tend to defect when resources become scarce, whereas in the game "Wolfpack" agents find it much easier to cooperate, purely because of the change in the game environment.

Furthermore, Osten, Kirley and Miller (2017) implement a reinforcement learning method, Q-learning, to guide decision strategies in a stochastic common resource pool system; Tampuu et al. (2017) introduce deep reinforcement learning agents that play the video game "Pong", illustrating how cooperative or competitive strategies can be learned.
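The common thread in these common-pool resource studies is a shared stock that regrows as a function of what remains. The toy simulation below assumes logistic regrowth with made-up parameters, rather than any of the cited models, to show why greedy harvesting lowers long-run returns:

```python
def simulate_cpr(harvest_per_agent, n_agents=5, stock=100.0,
                 growth=0.25, capacity=100.0, steps=50):
    """Toy common-pool resource: agents harvest, then the stock regrows
    logistically. Over-harvesting collapses the pool; restraint sustains it."""
    total = 0.0
    for _ in range(steps):
        harvest = min(stock, n_agents * harvest_per_agent)
        stock -= harvest
        total += harvest
        stock += growth * stock * (1 - stock / capacity)  # logistic regrowth
    return total, stock

print(simulate_cpr(1.0))   # modest harvest: the pool survives, returns accumulate
print(simulate_cpr(10.0))  # greedy harvest: the pool collapses, low long-run total
```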

Evaluation

Overall, many methods have been put forward to address real-world social dilemmas, and each has made progress on its specific model. In the absence of one general strategy that applies to all of these situations, more study is needed to fill the gap. The current work is clearly still at a very early stage, and finding a general learning method is a genuinely hard task; even humans can hardly find optimal solutions to these real-world social dilemmas.

We have now discussed two popular families of learning approaches: social learning (or imitation learning) and reinforcement learning. The two address different issues but share some similarities. Researchers have applied social learning widely to explore a range of real-world social dilemmas and succeeded in many models. However, Osten, Kirley and Miller (2017) argue that evolutionary game theory and the social learning method may not be effective in some stochastic models because of slow convergence, as discussed above. While evolutionary game theory and social learning work well on many real-world issues, reinforcement learning may be more suitable for Markov decision problems, which include most social dilemma models.

Although reinforcement learning performs well in many AI learning tasks, there is still much room for improvement. For instance, DeepMind and Blizzard recently announced SC2LE, a reinforcement learning environment based on the famous real-time strategy game StarCraft II, in which existing reinforcement learning and deep reinforcement learning training methods break down (Vinyals et al., 2017). They report that current agents make progress only on mini-games; applied to the main game of StarCraft II, the agents can hardly make any progress. Both partial observability and the complexity of the state space cause difficulties in such a complicated strategy game. One possible solution they suggest is to divide the main game into multiple mini-games and solve each with reinforcement learning; another possibility is that a new reinforcement learning method will emerge that is powerful enough to find the optimal solution.

In conclusion, a large body of studies shows that deep reinforcement learning often performs much better than traditional learning methods. In multi-agent environments in particular, most agents can learn cooperative strategies. Nevertheless, when we place agents in much more complicated environments, they hit a bottleneck. Future work should concentrate on learning in complex, highly interactive environments. Reinforcement learning looks like one possible key to entering the era of artificial general intelligence.

References

  • Battersby, S. (2017). News Feature: Can humankind escape the tragedy of the commons? Proceedings of the National Academy of Sciences, 114(1), 7–10. https://doi.org/10.1073/pnas.1619877114
  • Chatterjee, K., Zufferey, D., & Nowak, M. A. (2012). Evolutionary game dynamics in populations with different learners. Journal of Theoretical Biology, 301, 161–173.
  • Hauser, O. P., Rand, D. G., Peysakhovich, A., & Nowak, M. A. (2014). Cooperating with the future. Nature, 511(7508), 220–223. https://doi.org/10.1038/nature13530
  • Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017). Multi-agent Reinforcement Learning in Sequential Social Dilemmas. ArXiv:1702.03037 [Cs]. Retrieved from http://arxiv.org/abs/1702.03037
  • Osten, F. B. von der, Kirley, M., & Miller, T. (2017). Sustainability is possible despite greed - Exploring the nexus between profitability and sustainability in common pool resource systems. Scientific Reports, 7. https://doi.org/10.1038/s41598-017-02151-y
  • Perolat, J., Leibo, J. Z., Zambaldi, V., Beattie, C., Tuyls, K., & Graepel, T. (2017). A multi-agent reinforcement learning model of common-pool resource appropriation. ArXiv:1707.06600 [Cs, q-Bio]. Retrieved from http://arxiv.org/abs/1707.06600
  • Rand, D. G., & Nowak, M. A. (2013). Human cooperation. Trends in Cognitive Sciences, 17(8), 413–425. https://doi.org/10.1016/j.tics.2013.06.003
  • Rezaei, G., & Kirley, M. (2012). Dynamic social networks facilitate cooperation in the N-player Prisoner’s Dilemma. Physica A: Statistical Mechanics and Its Applications, 391(23), 6199–6211. https://doi.org/10.1016/j.physa.2012.06.071
  • Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., & Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4), e0172395. https://doi.org/10.1371/journal.pone.0172395
  • Taylor, M. E., Carboni, N., Fachantidis, A., Vlahavas, I., & Torrey, L. (2014). Reinforcement learning agents providing advice in complex video games. Connection Science, 26(1), 45–63. https://doi.org/10.1080/09540091.2014.885279
  • Traulsen, A., & Hauert, C. (2009). Stochastic evolutionary game dynamics. Reviews of Nonlinear Dynamics and Complexity, 2, 25–61.
  • Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A. S., Yeo, M., … others. (2017). StarCraft II: A New Challenge for Reinforcement Learning. Retrieved from https://deepmind.com/documents/110/sc2le.pdf