Adversary, Attractor, Astonishment
As of last month, [AlphaStar], an AI player, beat 2 top Starcraft players with a score of 10–1. Did AI conquer Starcraft the same way it conquered Go? Is AlphaStar actually SkyNet? Or are we just falling into some avoidable traps?
In this (my second :D) blog, we will explain which games have unbeatable AIs, and which games do humans still have a shot of winning. We will explore what it takes for a game to be “solved” and the gaps still lacking in AI approaches. No click baits, no mindless hypes, no needless jargons. Enjoy.
Note: All external links have brackets around them [like so], they are not critical to the flow of the blog but have very high entertainment values.
Adversary : WIN at NIM
Let’s start with a simple game called NIM, some of you might have heard of it. It goes something like this:
There is a pile of 13 coins, and we take turns taking a numbers of coins from the pile. On each turn we can take either 1, 2, or 3 coins, and the person who takes the last coin wins the game. You get to go first, how would you win?
I have pythoned a simple text-based version of this game, if you have not played NIM before, you can play it [here]. Just click “run” and it will start.
. . . I’ll wait here until you have finished beating it . . .
If you have not beaten it it’s fine, because without knowing the [trick] behind the game, it is difficult to win. Even if you know the trick, you can still lose with a single misplay. This has the feel of a [solved game], which is to say, there is “book” which records, for every position of the game, which optimal move one should take. The game of 13 coin NIM is a simple solved game, and the game of [connect four] is a complicated solved game. Human vs AI in a solved game is nearly impossible: The AI, armed with the [book], could trap you down a path which irrevocably leads to defeat off of a single mistake.
Attractor : No Escape
The case where your opponent can trap you into a sequence that leads to defeat has a mathematical analogy called [attractor]. An attractor is a set of configurations in a dynamical system which, for a range of starting configurations, will eventually evolve towards and getting stuck there forever.
One of the coolest attractor is the [hailstone sequence], starting with any positive integer number, you evolve the number with the following rule:
Let’s do a quick example: Starting with 13, since it is odd, we multiply it by 3 and add 1, which makes 40, and since 40 is even, we divide it by 2, making 20. The whole sequence is [13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1 . . . ]. It is conjectured that for any positive integer, with the above update rule, one would always get stuck in the 4, 2, 1 cycle. You can mess with it [here].
The set of configurations that eventually evolves into an attractor is called the [basin of attraction]. The simplest basin is literally a physical [bowl]. More than 90% of Wikipedia articles are in the basin of attraction of [philosophy]. In the hailstone sequence, the [entire set of positive integers] is postulated to be the basin that eventually ends in the 4, 2, 1 cycle.
Games : Adversarial Attractions
Now, the crux of this blog: viewing games as adversarial attractions. Consider this simple game, let’s call it “Attractor”, based on [sloppy] 2D physics. Attractor is like a 2 player soccer: Two disembodied players, “Left” and “Right”, attempt to maneuver the blue particle into their respective goals by directly exerting force vectors. The blue particle is initially placed at a random location with a random starting velocity.
What would be a good way to win this game? It seems like keeping nudging the particle directly toward your own goal (like a magnet) is a good strategy. Below is a simulation of 100 different games (i.e. each particle is a different game) played between the Left player, which uses the magnet strategy, and the Right player, who just pushes the particle at random.
Can we define the basins of attraction for these two players? That is to say, given any particle at any given time, can you predict if the winner is Left or Right? It is difficult to tell when the particle is fast. However, as the particle slows down, the Left magnet player will always successfully attract the particle that lies close to its goal.
By understanding the basin of attraction for the Left magnet player, Right can formulate a counter strategy: When the particle is close to Left, subverts the particle’s path by [sling-shotting] it away, then attract it afterwards. The Left player is probably upset: How could games that were once confidently within its basin of attraction all of a sudden be unwinnable? Let’s see it in action:
How can we beat the sling player? I have no idea but I’m sure it is possible. We can even train one with a neural network. As these strategies progress, their respective basins of attractions will become harder to analyze.
Attractor is a simple game, but it is a good metaphor for other kinds of games: Instead of a particle with a position and a velocity, it could be a chess board position or a particular game state in Starcraft; Instead of nudging the particles, it could be moving the pawn or constructing additional pylons. However, the message here is the same: By understanding your opponent’s basins of attraction, one can formulate the appropriate counters.
Traps : Avoid to Win
As we can see, even for a simple game like Attractor, the basins of attraction can become difficult to analyze and describe. Therefore, we tend to focus instead on traps: states that most definitely lie within a basin of attraction.
An amateur has a limited grasp on traps, to him, the trap is only few steps away from actually winning or losing the game. Thus, the amateur may be overconfident, realizing something was wrong way way way [too late].
By increasing our understanding of different plays and traps, we can focus on critical errors that lead one to be trapped, rather than the subsequent pointless struggles. Now, let’s examine some traps in modern AI agents.
Writing the Book of Traps
The unbeatable AI is [the book], describing for each scenario, which of the basins of attraction it belongs to (i.e. everything is a trap). For simple games such as the 13-coin NIM and [connect four], such book can be written down. Once such a book is written, the humans would have no hope of winning.
However, for more complex games such as Chess, Go, or Starcraft, the number of scenarios is simply too numerous to be written down into a book. Thus, one have to write a simpler book that nonetheless gives estimates on the attractor of each scenario: better estimates create stronger AI agents.
The Chess AI, [DeepBlue], employed [tree search] and hand-crafted [heuristics] as the simpler book. In Chess, it is easy to manually specify the “goodness” of a given board as a [heuristic]. As the human players largely rely on the same heuristics to play Chess as DeepBlue, it is game over for humans: DeepBlue is doing the same computation as a human player, except better at it thanks to deep searches into the future.
For Go, it is difficult to manually specify a good heuristic on a given board, a major road-block on creating a strong Go AI. Rather than relying on hand-crafted heuristics to evaluate the goodness of a board, [AlphaGo] learns the goodness of each board with a [value function]. These learned heuristics, combined with similar deep searches as DeepBlue, produced a super human Go playing book. This particular book does have some [missing pages], but they appear to be patched by massive amount of [self plays].
Astonishment : When Machines Fail
The most reliable way of defeating an AI opponent is via [astonishment]. To pull this off against a modern AI agent built with a neural network, such as AlphaStar, we need to first understand what is a neural-network and how is it susceptible to failure. It will be bit technical so hold on to your smart hats.
In the hype literature, neural-network is often compared to the actual brain with a tagline of “biologically inspired”. These definitions are not wrong but misleading. Here, let’s understand it as a kind of [function approximator].
We’ll explain what is a function approximator with this simple example: Consider the following 1000 points, how might we succinctly describe them? Specifically, given an input x, how do we describe its corresponding output y?
I came up with the following red curve. As you can see, it is making a lot of mistakes, especially on inputs around -5 and 5, and a few edge points on the far left and right. However, the red-curve is defined with only 5 numbers, which is far more succinct than the original 1000 points.
Let’s improve our approximation. It turns out that if you were to tweak the 5 numbers, you can end up with a much better approximation as follows:
The errors around 5 and -5 were reduced, while the edge points in the far left and right remain inaccurate. Nonetheless, it is a worthwhile sacrifice to accurately represent the majority of the points toward the centre. The result is that the inaccuracies along the edges remain unaddressed.
The punchline : This red curve of mere 5 parameters is a simple function approximator, mapping input x to its corresponding output y. AlphaStar is also a function approximator, a complex neural network with many parameters, mapping a Starcraft scenario to how likely is it going to win. Both will be deadly accurate in scenarios that occur frequently during training, and both may have inaccuracies in scenarios that are rare or absent.
AlphaStar : Traps and Missing Pages
In Starcraft, as in any game, the goal is to write the book, perfectly identifying the basins of attractions (aka [value]) and the appropriate maneuvers (aka [policy]) in every scenario. AlphaStar approximates this perfect book with a neural-network, and improves it by exposing this book to different scenarios.
First, AlphaStar is trained to imitate human plays from a large body of replays; Second, AlphaStar refines what it imitated from the humans in a league system that actively preserves diversity of strategies. From reading between the lines in the blog post, I conjecture the following to be true:
- con. It is difficult to spontaneously develop diverse high-level strategies by playing against itself. Thus the need of imitating humans first.
- con. It is tricky to maintain diversity of strategies within the league. Strategies that are novel but low win-rate are explicitly kept around.
- pro. Surprisingly, the total number of steps for a game is about 3000, and as thus, directly reward an agent with +1 for winning and -1 is sufficient.
- pro. With ~200 years of grinding (per agent), the micro engagement and battle is superhuman level, effectively traps for humans.
- pro. Rather than a single agent, AlphaStar is a stable mixture of different agents, so it packs the surprise in terms of opening strategies.
- con. It seems the agent does not plan (in a model-based [MCTS] sense), which makes adapting and pivoting strategies on the fly difficult.
The following would be my personal opinions on how to fight. Simply, avoid familiar scenarios that were exposed to AlphaStar during its training (traps) and try to come up with weird game scenarios that it has no exposure to.
Traps to Avoid, Instant Losses:
- Micro battles. Curb your ego, it’s 200 years better than you.
- Early game gambles that relies on execution such as proxy (see 1).
- Overestimate your army strength: its 5 units are stronger than your 10.
Traps to Set:
- Long games. All games start identical, but diverge as time goes on. Thus long games contain more opportunities for the AI to [act astonished].
- Mind games. AlphaStar is unlike AlphaGo because it is [reactionary] rather than [planning]. And its different openings a [gambit] of multiple reactionary agents. It cannot adapt on the fly as well as a human can.
- Sneaky games. Although repeatedly emphasised as an important aspect of Starcraft, the AlphaStar is bad at scouting (as a consequence of its reactionary nature). Hide things from it to give it a nasty surprise later.
The perfect win for AlphaStar is a predictable, overwhelming victory of executions on a refined strategy. The perfect win for human is coaxing the game to uncharted territories until AlphaStar becomes astonished.
As we do not chop off tall people’s legs when they play basketball, perfect execution could be deemed as a natural talent of AlphaStar, and focusing on it is bit dogmatic. There’s more strategic depth to Starcraft than brute-forcing a bunch of blink stalkers around and winning every game. Unlike Chess and Go, where the AI’s book is dense and packed with traps, AlphaStar is still missing many pages its book: adaptability and creativity. It would be such a shame to declare Starcraft “solved” when these defects are still glaringly present.
The danger of AI is not that it will be more capable than us, but us projecting human sentiments onto the machines as we do with stuffed animals.
Well I’d really like to thank you for reading this far, high five !