Monday, December 12, 2016

Self-awareness game - Byzantine agent cell war

Target


Conjecture


  • Consciousness is formed by the process of abstraction when it is applied to self-modeling and self-awareness

Objective


  • to attempt an attack on the problem of self-awareness and self-modeling
  • to force an ML agent to be aware of its own presence in the neighboring cells.

The Game


Game Board


The core items


  • 2 learning agents: red and blue
  • a 2d grid of cells


The grid of cells


  • each cell has 3 states: red, blue, ownerless
    • the owning agent (red or blue) can read from, act through, and learn from the cells it owns
  • each cell has 4 comms channels (in the directions north, south, east and west)
  • each cell has one memory channel that can be written/read
  • the grid is circular (it wraps around at every edge, like a torus)
    • the comms channel of the easternmost cell talks to the westernmost cell (and vice versa)
    • the comms channel of the northernmost cell talks to the southernmost cell (and vice versa)
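The wrap-around rule can be expressed with modular arithmetic on the cell coordinates. This is a minimal sketch under stated assumptions: the `DIRECTIONS` table, the `neighbor` helper and the convention that row 0 is the northern edge are all hypothetical choices made for illustration.

```python
# (row delta, column delta) for each comms direction.
# Assumption: row 0 is the northern edge and column 0 is the western edge.
DIRECTIONS = {
    "north": (-1, 0),
    "south": (1, 0),
    "east": (0, 1),
    "west": (0, -1),
}


def neighbor(row, col, direction, height, width):
    """Return the coordinates of the cell adjacent to (row, col) in `direction`.

    The modulo makes the grid circular: stepping east from the easternmost
    column lands in the westernmost column, and stepping north from the top
    row lands in the bottom row.
    """
    dr, dc = DIRECTIONS[direction]
    return (row + dr) % height, (col + dc) % width


print(neighbor(0, 3, "east", 3, 4))  # (0, 0): east edge wraps to the west edge
```

Because every cell has exactly four neighbors, there are no edge cases to special-case anywhere else in the game logic.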


The agent


  • each agent is one global entity
  • each cell owned by an agent runs its own instance of the agent, individually and independently of all other cells
  • each agent is installed in a cell that it owns and can only act on information for the *specific* cell it is working with
    • there is *NO* back-door communication inside the agent between cells
    • the agent acts like a CNN processing unit (the same agent applied at every cell) and is isolated from neighboring cells, even ones it owns
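The "one global agent, one isolated instance per cell" idea resembles a convolution kernel: a single set of weights applied independently at each position. A minimal sketch, assuming a toy linear policy; `SharedPolicy` and `run_owned_cells` are hypothetical names, and a real agent would be a trained network rather than random weights.

```python
import random


class SharedPolicy:
    """One global agent: a single set of weights shared by every owned cell.

    Hypothetical sketch: a linear map from a cell's local input vector to an
    output vector. `act` only sees the `local_inputs` argument, so there is
    no back-door channel between cells.
    """

    def __init__(self, input_size, output_size, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(input_size)]
            for _ in range(output_size)
        ]

    def act(self, local_inputs):
        # A pure function of the local inputs: identical inputs give identical outputs.
        return [sum(w * x for w, x in zip(row, local_inputs)) for row in self.weights]


def run_owned_cells(policy, local_inputs_by_cell):
    """Run one independent instance of the shared policy for each owned cell."""
    return {cell: policy.act(inputs) for cell, inputs in local_inputs_by_cell.items()}
```

The design consequence is the one the post relies on: two cells that see the same local inputs must behave identically, so any coordination has to travel through the comms and memory channels.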


The play sequence


Game initialization


  • cells are randomly assigned one of the 3 states of ownership
  • memory is randomly filled
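Initialization can be sketched as two nested grids, one for ownership and one for memory vectors. Assumptions, all hypothetical: the three states are drawn uniformly, memory values are uniform in [-1, 1], and the memory size of 4 follows the tentative comms size in the Game Notes.

```python
import random

RED, BLUE, OWNERLESS = "red", "blue", "ownerless"
VECTOR_SIZE = 4  # assumption: memory is the same size as one comms vector


def init_board(height, width, seed=None):
    """Randomly assign ownership and fill memory for every cell."""
    rng = random.Random(seed)
    owner = [
        [rng.choice([RED, BLUE, OWNERLESS]) for _ in range(width)]
        for _ in range(height)
    ]
    memory = [
        [[rng.uniform(-1.0, 1.0) for _ in range(VECTOR_SIZE)] for _ in range(width)]
        for _ in range(height)
    ]
    return owner, memory
```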


The round of play


  1. randomize phase
    • NOTE: this breaks the potential unique identification of a cell by its memory contents. Such identification is a data leak that would let the network hard-bake a unique action per cell and never generalize
    • randomly select some percentage of cells and randomize their ownership and memory values
    • the randomization percentage can vary over time, e.g. ramping up or down if the game is in a stalemate or a single agent comes to dominate it
  2. "question" phase
    • the agent reads "memory" and outputs 4 "questions", one for each direction
    • the cells send the "question" to each of the four directions
  3. "answer" phase
    • the agent reads the 4 incoming "questions" and "memory" and decides on "answers"
    • the cells then send an "answer" to each of the four directions
  4. action phase
    • given "memory", "questions" and "answers", the agent decides on an "action"
    • the agent chooses 1 action, attack or defend, and the direction in which to apply it
    • the agent chooses what to write to memory
  5. update phase
    • cell updates memory
    • env computes action outcome and updates cell ownership status
  6. learning phase
    • agents are informed of the question, answer, attack, defend, memory and ownership of each cell they *owned* at the start of the round
    • agents are told the score: how many cells they *now own* after the round
    • agents can then learn, with unlimited time, before signaling ready for the next round
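The question and answer phases both route one vector per direction to the neighboring cell, where it arrives on the opposite side: a question a cell sends north is received by its northern neighbor as coming from the south. A minimal sketch of that routing, assuming a hypothetical `exchange` helper and dict-of-dicts layout; the same function would be called once for questions and once for answers.

```python
DIRECTIONS = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def exchange(outgoing, height, width):
    """Deliver every cell's outgoing vectors to its neighbors on a circular grid.

    outgoing[(r, c)][d] is the vector cell (r, c) sends toward direction d.
    Returns incoming[(r, c)][d]: the vector that arrived at (r, c) from the
    neighbor on its side d.
    """
    incoming = {}
    for (r, c), messages in outgoing.items():
        for d, vec in messages.items():
            dr, dc = DIRECTIONS[d]
            target = ((r + dr) % height, (c + dc) % width)  # wrap around the edges
            incoming.setdefault(target, {})[OPPOSITE[d]] = vec
    return incoming
```

Note that the receiving cell learns only which side a message came from, not which cell sent it, which keeps the agent without a direct way to know its position.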

Updating ownership of a cell


  • a battle is decided by the simple formula: defend - attack
  • the current cell owner counts as +1 defend for its team
  • if the result is positive, ownership is unchanged
  • if the result is negative, the enemy takes ownership of the cell
  • if the result is exactly 0, the cell becomes ownerless
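The ownership rule for a red or blue cell can be written as a small function. This is a sketch under assumptions: `resolve_battle` is a hypothetical name, `defend_count` is the number of defend actions the owner's team directs at the cell, and `attack_count` is the number of attacks from the enemy team. How battles over ownerless cells are tallied is not specified above, so the sketch deliberately leaves that case out.

```python
RED, BLUE, OWNERLESS = "red", "blue", "ownerless"
ENEMY = {RED: BLUE, BLUE: RED}


def resolve_battle(owner, defend_count, attack_count):
    """Return the new owner of a cell currently owned by red or blue."""
    if owner not in ENEMY:
        raise ValueError("this sketch only covers cells owned by red or blue")
    result = (defend_count + 1) - attack_count  # the current owner counts as +1 defend
    if result > 0:
        return owner          # net positive: ownership unchanged
    if result < 0:
        return ENEMY[owner]   # net negative: the enemy takes the cell
    return OWNERLESS          # exactly 0: the cell becomes ownerless
```

A consequence of the +1 owner bonus: a single unanswered attack only neutralizes a cell, and it takes two net attacks to capture it outright.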


Game Notes


The comms


  • each message is a vector of numbers (i.e. an embedding, like a word2vec vector)
  • a vector size of 4 seems OK, but this is still a guess

The memory


  • acts like a 5th comms channel
  • same size as a single comms channel
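Since memory is the same size as a comms vector, each phase's input is just a concatenation of equally sized vectors. A minimal sketch assuming a size of 4, a hypothetical fixed direction ordering, and hypothetical helper names; it also assumes the "questions" used in the action phase are the incoming ones, which the round description leaves open.

```python
COMMS_SIZE = 4  # tentative vector size from the notes above
DIRECTION_ORDER = ("north", "south", "east", "west")  # hypothetical fixed ordering


def flatten(vectors_by_direction):
    """Concatenate one vector per direction in a fixed order."""
    out = []
    for d in DIRECTION_ORDER:
        vec = vectors_by_direction[d]
        if len(vec) != COMMS_SIZE:
            raise ValueError(f"{d} vector has size {len(vec)}, expected {COMMS_SIZE}")
        out.extend(vec)
    return out


def question_input(memory):
    return list(memory)                                      # 4 numbers


def answer_input(memory, incoming_questions):
    return flatten(incoming_questions) + list(memory)        # 4*4 + 4 = 20 numbers


def action_input(memory, incoming_questions, incoming_answers):
    # 4 + 16 + 16 = 36 numbers; memory acts as a fifth channel of the same size
    return list(memory) + flatten(incoming_questions) + flatten(incoming_answers)
```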

Ownerless cells


  • generate random responses and actions
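An ownerless cell can be modeled as a policy that ignores its inputs entirely. A minimal sketch with hypothetical helper names, assuming message values uniform in [-1, 1]; whether an ownerless cell's random attack or defend counts toward either team in battle is not specified above and is not decided here.

```python
import random

COMMS_SIZE = 4
DIRECTIONS = ("north", "south", "east", "west")
ACTIONS = ("attack", "defend")


def random_messages(rng):
    """Random question or answer vectors, one per direction."""
    return {d: [rng.uniform(-1.0, 1.0) for _ in range(COMMS_SIZE)] for d in DIRECTIONS}


def random_action(rng):
    """A random action together with a random direction."""
    return rng.choice(ACTIONS), rng.choice(DIRECTIONS)
```

Random replies from ownerless cells also act as noise in the comms, so an agent cannot assume that every message it receives is meaningful.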

Experimental notes and Expectations


Issues with the agents


  • the agent has no global communication method
  • the agent has no direct way to know what its position in the grid is
  • HOWEVER, there are some massive *buts*, and tweaking is expected
    • ISSUE: the contents of the memory might act as a unique identification signal. Combined with the comms channels, this can cause a data leak that exposes the location of the cell and its relationship to its neighbors in an indirectly computable way. The learning method could then bake this information into its neural net each round, producing a hard-coded response keyed on the magic ID provided by the memory
    • COUNTER ACTION: add a "randomize phase" at the start of each round that damages the memory and ownership of a small percentage of cells. This forces the agent to treat every cell as a potential Byzantine failure, rather than relying solely on baked-in knowledge of what the cell is and who owns it... hopefully this forces the agent to generalize better in the long term
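The randomize phase can be sketched as sampling a fraction of cells without replacement and overwriting their ownership and memory. Assumptions: `randomize_phase` is a hypothetical helper, new owners are drawn uniformly from the three states, and memory values are uniform in [-1, 1]. How the fraction should ramp up or down over time is left to the caller, since the rule for that is still open.

```python
import random

RED, BLUE, OWNERLESS = "red", "blue", "ownerless"


def randomize_phase(owner, memory, fraction, rng):
    """Damage a random `fraction` of cells in place and return the chosen cells."""
    height, width = len(owner), len(owner[0])
    cells = [(r, c) for r in range(height) for c in range(width)]
    chosen = rng.sample(cells, round(fraction * len(cells)))  # no cell picked twice
    for r, c in chosen:
        owner[r][c] = rng.choice([RED, BLUE, OWNERLESS])
        # Keep the memory vector's size, but replace its contents.
        memory[r][c] = [rng.uniform(-1.0, 1.0) for _ in memory[r][c]]
    return chosen
```

Because any cell can be hit, a memory pattern that happened to identify a cell last round may be gone this round, which is exactly the leak the counter action targets.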


Tools and methods in play


  • A Byzantine environment.
  • Question/answer phases use adversarial techniques (i.e. generator and discriminator)
  • Uses min-max game dynamics
  • Requires communication between cells of the same agent, recovery from random events and long-distance coordination


Expectations of the experiment


  • I expect the agents to learn to communicate with friendly neighbors
  • I expect to see indirect message relay using comms -> memory -> comms
  • I expect agents to be aware of which neighbor is an enemy and which is friend
    • i.e. attacks enemies only, supports friends only
  • deceptive comms: an enemy cell will attempt to mask itself as a friend
  • coordinated attack: neighboring cells may focus their attacks on a target enemy to overpower it
  • self-supportive defense: cells of the same agent work together to support their front-line cells, so that a front-line cell's defense is boosted when it is attacked
  • testudo formation
    • in theory, a cell with 2 friendly neighbors and 2 enemies can fully defend itself with the support of its neighbors
    • i.e. the cell can always defend to the east, creating a +2 internal defense
    • with support from behind, it can become a +3 defense (an unkillable cell)
    • this will lead to a stalemate when 2 testudo walls push up against each other; at that point the agents need to break the formation using comms-based infiltration... hopefully the randomization phase will allow sufficient *spying* for this to be possible
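The testudo numbers follow from the defend - attack rule. A cell with 2 enemy neighbors faces at most 2 attacks (assuming each enemy neighbor spends its single action attacking this cell), so +2 only neutralizes the cell while +3 always holds. A small worked check; the helper names and the labels are hypothetical.

```python
def battle_result(total_defense, total_attack):
    """defend - attack, with the owner's +1 already included in total_defense."""
    return total_defense - total_attack


def outcome(result):
    if result > 0:
        return "holds"
    if result < 0:
        return "captured"
    return "ownerless"


MAX_ATTACK = 2  # two enemy neighbors, one attack action each

for label, defense in [("owner bonus only", 1), ("internal defense", 2), ("support from behind", 3)]:
    r = battle_result(defense, MAX_ATTACK)
    print(f"{label}: +{defense} vs {MAX_ATTACK} attacks -> {r:+d} ({outcome(r)})")
```

Running it prints:

    owner bonus only: +1 vs 2 attacks -> -1 (captured)
    internal defense: +2 vs 2 attacks -> +0 (ownerless)
    support from behind: +3 vs 2 attacks -> +1 (holds)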

