I'm hoping someone can lend a hand porting this to python3/gtk3. I've made considerable progress here: https://github.com/smearle/gym-micropolis/tree/master/micropolis-4bots-gtk3, in service of an OpenAI gym environment for training city-building bots. There are just a few glitches in the GUI that still need to be worked out.
These bots store their own internal representation of the game map (built via the engine.getTile(x, y) function). The gtk3 port above, if initialized with a bot (via pyMicropolis.gtkFrontend.main.train(bot)), will notify the bot of any human-player actions executed via the GUI, so that it can update its own map representation and respond accordingly. See below for a simple example of interactive inference with a bot from early on in training:

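For concreteness, here's a minimal sketch of the bookkeeping just described. Only `engine.getTile(x, y)` is the codebase's actual API; the class, method names, and map dimensions are assumptions of mine:

```python
import numpy as np

MAP_W, MAP_H = 120, 100  # classic Micropolis map size (an assumption here)

class MirrorBot:
    """Hypothetical bot that mirrors the game map locally and refreshes
    that mirror whenever the GUI reports a human build action."""

    def __init__(self, engine):
        self.engine = engine
        self.map = np.zeros((MAP_W, MAP_H), dtype=np.int32)
        self.sync_all()

    def sync_all(self):
        # Rebuild the full local representation from the engine.
        for x in range(MAP_W):
            for y in range(MAP_H):
                self.map[x, y] = self.engine.getTile(x, y)

    def on_human_action(self, x, y):
        # Called by the GUI when the player builds at (x, y):
        # re-read just the affected tile.
        self.map[x, y] = self.engine.getTile(x, y)
```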
The networks I'm using are trained with actor-critic (A2C) and are made up entirely of convolutional layers (which makes them too cumbersome for ACKTR, since Kronecker factorization seems to need to jump through extra hoops to deal with convolutions). Passing activations repeatedly through a single convolutional layer seems to yield performance at least on par with, and sometimes apparently better than, that of the same network without the repeated passes. Finally, each layer of activations has the same width and height as the original input feature map (where each 'pixel' of the image is a tile of the game map).
By adopting a convolution-only architecture, we pave the way for scalability to larger map sizes (since the action space is the map itself, with build actions as channels, a linear layer between it and the input would grow quadratically with map area). By using recursive (repeated) convolution, we allow distant activations on the feature map to affect one another, again at minimal computational cost. And by making our hidden activations occur "on the map," we can think of our network as executing a kind of non-discrete Conway's Game of Life to determine its next build, or as letting activations flow spatially through the map itself, which has a certain appeal (though I can't help but feel that the notion of a compressed map, for more abstract, high-level spatial planning, might also be invaluable in this problem space...).
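To make the architecture concrete, here's a hedged PyTorch sketch of a fully-convolutional actor in the spirit described above; the layer widths, kernel sizes, and repeat count are illustrative guesses, not the actual trained network:

```python
import torch
import torch.nn as nn

class RecursiveConvActor(nn.Module):
    """All-convolutional actor whose hidden activations stay at the map's
    spatial resolution, with one shared conv layer applied repeatedly."""

    def __init__(self, in_channels, n_build_actions, hidden=64, n_repeats=10):
        super().__init__()
        self.embed = nn.Conv2d(in_channels, hidden, 3, padding=1)
        # A single shared layer: repeated application lets distant tiles
        # influence one another without adding parameters.
        self.shared = nn.Conv2d(hidden, hidden, 3, padding=1)
        # One output channel per build action -- the action space is the map.
        self.head = nn.Conv2d(hidden, n_build_actions, 1)
        self.n_repeats = n_repeats

    def forward(self, x):  # x: (batch, in_channels, H, W)
        h = torch.relu(self.embed(x))
        for _ in range(self.n_repeats):
            h = torch.relu(self.shared(h))
        return self.head(h)  # (batch, n_build_actions, H, W) action logits
```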
Another such network, a bit further along, without my bullying it:

If anyone has a spare GPU, I'm currently using a pytorch port of the OpenAI baselines repo for training: https://github.com/smearle/pytorch-baselines-micropolis. I'm very interested in exploring the space of neural architectures that lend themselves well to this environment, since it seems to foreshadow a more general game-playing paradigm: one that takes the GUI-as-image as input and outputs something like an equal-size grayscale image, with pixel intensity corresponding to mouse-click likelihood, for example.
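As a rough sketch of that output convention (entirely hypothetical, just to pin down the idea): treat the network's single-channel output as unnormalized click log-likelihoods and sample a mouse click from its softmax:

```python
import torch

def sample_click(logits):
    """logits: (1, 1, H, W) raw network output over the GUI image."""
    _, _, H, W = logits.shape
    probs = torch.softmax(logits.view(-1), dim=0)
    idx = torch.multinomial(probs, 1).item()
    return divmod(idx, W)  # (row, col) of the sampled mouse click
```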
In the current context, however, I want to make these agents as interactive as possible: the player should be able to make whatever builds they desire and have the bot complement their design to optimize for some combination of factors (population, traffic, happiness, pollution, etc.). The player should be able to control these bots from the GUI (stopping/starting them, setting their optimization goals) and confine their building area to subsections of the map (one way to do this is sketched in the postscript below). To this end, any help completing the port to gtk3 would be appreciated. Thanks :)
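P.S. For the building-area confinement mentioned above, one simple approach (a sketch under my own assumptions, not anything in the repo yet) is to mask the action logits outside a player-chosen rectangle before sampling:

```python
import torch

def mask_outside_region(logits, x0, y0, x1, y1):
    """logits: (batch, n_actions, H, W). Push probability mass outside
    the player-chosen rectangle [y0:y1, x0:x1] to zero."""
    mask = torch.full_like(logits, float('-inf'))
    mask[..., y0:y1, x0:x1] = 0.0
    return logits + mask  # softmax over these gives zero prob outside
```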