Multi-agent reinforcement learning has recently shown great promise as an approach to networked system control. Arguably, one of the most difficult and important tasks to which large-scale networked system control applies is common-pool resource management. Crucial common-pool resources include arable land, fresh water, wetlands, wildlife, fish stock, forests and the atmosphere, whose proper management bears on some of society’s greatest challenges, such as food security, inequality and climate change. Here we take inspiration from a recent research program investigating the game-theoretic incentives of humans in social dilemma situations such as the well-known tragedy of the commons. However, instead of focusing on biologically evolved human-like agents, we aim to better understand the learning and operating behaviour of engineered networked systems comprising general-purpose reinforcement learning agents, subject only to nonbiological constraints such as memory, computation and communication bandwidth. Harnessing tools from empirical game-theoretic analysis, we analyse how the resulting solution concepts differ when different information structures are employed in the design of networked multi-agent systems. These information structures pertain to the type of information shared between agents as well as the communication protocol and network topology employed. Our analysis contributes new insights into the consequences of certain design choices and provides an additional dimension of comparison between systems beyond efficiency, robustness, scalability and mean control performance.

In our latest paper, accepted at #NeurIPS2020, we evaluate networked multi-agent RL (NMARL) systems for common-pool resource (e.g. water) management using empirical game-theoretic analysis. This work is a collaboration with Stellenbosch University, Wits University, and UTC.

NMARL systems display distinct equilibrium profiles that depend on the information structure they employ. Systems with differentiable communication protocols (e.g. DIAL, CommNet and NeurComm) tend to show improved agent cooperation.

However, most system profiles still exhibit inefficiency at equilibrium. One exception is the NeurComm algorithm (Chu et al., 2020), which reaches a stable equilibrium in which cooperating is optimal both for the system as a whole and for each individual agent.
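To make "inefficiency at equilibrium" concrete, here is a minimal sketch, assuming a two-agent game in which each agent either cooperates or defects. The payoff tables are made up for illustration and are not results from the paper, and the helper names (`pure_nash_equilibria`, `welfare_optimal_profiles`) are hypothetical. It is not the analysis pipeline used in the paper; it only shows the two properties being compared: whether a profile is stable (no agent gains by deviating alone) and whether it maximises total payoff.

```python
from itertools import product

STRATEGIES = ("cooperate", "defect")


def pure_nash_equilibria(payoffs):
    """Return the pure-strategy profiles where no agent gains by deviating alone."""
    equilibria = []
    for profile in product(STRATEGIES, repeat=2):
        stable = True
        for player in range(2):
            for alternative in STRATEGIES:
                deviation = list(profile)
                deviation[player] = alternative
                if payoffs[tuple(deviation)][player] > payoffs[profile][player]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


def welfare_optimal_profiles(payoffs):
    """Return the profiles with the highest total payoff across both agents."""
    best = max(sum(p) for p in payoffs.values())
    return [profile for profile, p in payoffs.items() if sum(p) == best]


# Hypothetical payoffs (agent 0, agent 1): a classic social dilemma in which
# defecting is individually tempting but mutual defection is collectively poor.
DILEMMA = {
    ("cooperate", "cooperate"): (3.0, 3.0),
    ("cooperate", "defect"): (0.0, 4.0),
    ("defect", "cooperate"): (4.0, 0.0),
    ("defect", "defect"): (1.0, 1.0),
}

# Hypothetical payoffs in which cooperating is best for each agent individually
# and for the system as a whole.
ALIGNED = {
    ("cooperate", "cooperate"): (4.0, 4.0),
    ("cooperate", "defect"): (2.0, 3.0),
    ("defect", "cooperate"): (3.0, 2.0),
    ("defect", "defect"): (1.0, 1.0),
}

for name, game in (("dilemma", DILEMMA), ("aligned", ALIGNED)):
    print(name, pure_nash_equilibria(game), welfare_optimal_profiles(game))
```

In the first table the only stable profile is mutual defection, while mutual cooperation has the highest total payoff, so the equilibrium is inefficient. In the second table the stable profile and the welfare-optimal profile coincide at mutual cooperation, which is the kind of outcome described above for NeurComm.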

This work is a first step towards a better understanding of NMARL systems for common-pool resource management. In follow-up work, we will scale to larger systems and more environments (beyond water systems) and investigate further why certain solution concepts arise.