Final board position Game 1. AlphaGo (White) vs Fan Hui (Black). AlphaGo wins by 2.5 points.
My fascination with Go began after I read an article about the four-stone handicap game between Crazy Stone, a computer program, and Yoda Norimoto, a 9-dan professional. Although Crazy Stone won, the four-stone handicap diminishes the victory. Rémi Coulom, the programmer of Crazy Stone, hesitantly predicted in March 2014 that computers might defeat humans without a handicap in about ten years.
Two Different Algorithms
Final board position Game 2. AlphaGo (Black) vs Fan Hui (White). AlphaGo wins by resignation.
Last January 20, John Tromp announced the number of legal positions on a 19×19 Go board: approximately 2.08 × 10^170. Written out in full, that is 171 digits for anyone who did not bother counting.
This number of possible positions prevents computer programs from solving Go by brute force, i.e., playing out every sequence of moves to the end to find the best one.
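To get a feel for the scale, here is a quick sanity check in Python. Each of the 361 intersections on the board is empty, black, or white, so 3^361 is a crude upper bound on the number of board configurations (most of which are not legal positions; Tromp's exact count of legal ones is the 171-digit number above):

```python
# Crude upper bound on Go board configurations: every one of the 361
# intersections is empty, black, or white, giving at most 3**361
# arrangements. Tromp's count of *legal* positions is smaller
# (171 digits); this bound has 173 digits.
upper_bound = 3 ** 361
digits = len(str(upper_bound))
print(digits)  # -> 173
```

Either way, the number dwarfs anything a brute-force search could enumerate.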
Final board position Game 3. AlphaGo (White) vs Fan Hui (Black). AlphaGo wins by resignation.
Coulom improved the Go-playing strength of computers with an algorithm he called Monte-Carlo Tree Search (MCTS). In Go, MCTS plays random moves from the current position and continues each sequence until the game ends and yields a value (the final score). MCTS then selects the best move based on the values obtained from many such simulated games. Read more about Crazy Stone and MCTS here. The number of random simulations MCTS can run limits its strength.
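The core idea can be sketched in a few lines. This is flat Monte Carlo, a simplified ancestor of MCTS (no search tree, no exploration bonus), and the toy game is Nim (take 1–3 stones, taking the last stone wins) standing in for Go:

```python
import random

# Flat Monte Carlo move selection: score each legal move by random
# playouts to the end of the game and pick the move with the best
# average result. The toy game is Nim: take 1-3 stones, and the
# player who takes the last stone wins.

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def random_playout(stones, to_move):
    # Play uniformly random moves until the stones run out;
    # return the winner (player 0 or player 1).
    if stones == 0:
        return 1 - to_move  # the previous player took the last stone
    player = to_move
    while True:
        stones -= random.choice(legal_moves(stones))
        if stones == 0:
            return player
        player = 1 - player

def flat_monte_carlo(stones, to_move, playouts=2000):
    # Evaluate each legal move by the fraction of playouts it wins for
    # the player to move. Full MCTS would instead grow a search tree
    # and balance exploration against exploitation move by move.
    best_move, best_rate = None, -1.0
    for m in legal_moves(stones):
        wins = sum(random_playout(stones - m, 1 - to_move) == to_move
                   for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = m, rate
    return best_move
```

From a three-stone position, for example, the playouts quickly discover that taking all three stones wins outright. The `playouts` budget is exactly the strength limit described above: more simulations, stronger play.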
AlphaGo, on the other hand, uses an advanced tree search and two neural networks: the policy network and the value network. Google’s DeepMind, the team behind AlphaGo, fed the neural networks 29.4 million board positions from games between 6- to 9-dan human players on KGS. After learning from this data, AlphaGo’s policy network can narrow the search to a limited number of promising moves. The value network then evaluates the resulting board position after the search plays out only a certain number of moves, rather than always simulating to the end of the game. This method minimizes the number of moves AlphaGo needs to test.
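The division of labor between the two networks can be sketched as follows. Note that the `policy` and `value` functions here are trivial stand-ins I made up for illustration, not AlphaGo's trained networks: the policy prunes the candidate moves, and the value function scores the resulting positions so nothing has to be played out to the end:

```python
# Sketch of the policy/value division of labor: a "policy" prunes the
# candidate moves, and a "value" function scores the successor
# positions, so no game is ever simulated to its end. The policy and
# value used below are toy stand-ins, not AlphaGo's trained networks.

def choose_move(state, legal_moves, apply_move, policy, value, k=3):
    # Keep only the policy's top-k candidates, then pick the one whose
    # successor position the value function rates highest.
    candidates = policy(state, legal_moves(state))[:k]
    return max(candidates, key=lambda m: value(apply_move(state, m)))

# Toy usage: states are integers, a move adds its amount, and the
# "value network" prefers states close to 10.
best = choose_move(
    state=5,
    legal_moves=lambda s: [1, 2, 3, 4, 6],
    apply_move=lambda s, m: s + m,
    policy=lambda s, moves: moves,  # stub: keeps moves in given order
    value=lambda s: -abs(10 - s),
    k=4,
)
print(best)  # -> 4 (move 6, which would reach 10 exactly, was pruned by k=4)
```

The toy also shows the trade-off: pruning with the policy saves enormous amounts of search, but a move the policy discards is never considered at all.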
AlphaGo then takes another step, in what DeepMind calls reinforcement learning, by playing against itself (10,000 sets of 128 parallel games per set per day). AlphaGo improves its neural networks by learning the game on its own and developing its own strategies. This system, however, needs the Google Cloud Platform because of the immense computing power it uses. Learn more about how AlphaGo works here.
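A toy analogue of self-play learning, again on Nim (take 1–3 stones, last stone wins): a table of move preferences is nudged toward moves that appeared in winning games and away from moves that appeared in losing ones. This is a crude sketch of the self-play idea, not AlphaGo's actual policy-gradient training:

```python
import random
from collections import defaultdict

# Toy self-play reinforcement learning on Nim (take 1-3 stones,
# taking the last stone wins). A preference table stands in for the
# policy network; it is nudged toward moves from winning games.

prefs = defaultdict(lambda: 1.0)  # (stones, move) -> preference weight

def pick(stones):
    # Sample a move in proportion to its learned preference.
    moves = [m for m in (1, 2, 3) if m <= stones]
    return random.choices(moves, [prefs[(stones, m)] for m in moves])[0]

def self_play(start=10):
    # Play one game of the current policy against itself, recording
    # which (position, move) pairs each player produced.
    history = {0: [], 1: []}
    stones, player = start, 0
    while True:
        move = pick(stones)
        history[player].append((stones, move))
        stones -= move
        if stones == 0:
            return player, history  # this player took the last stone
        player = 1 - player

for _ in range(5000):
    winner, history = self_play()
    for key in history[winner]:
        prefs[key] *= 1.05  # reinforce the winner's moves
    for key in history[1 - winner]:
        prefs[key] *= 0.95  # discourage the loser's moves
```

After a few thousand games, the table learns, for instance, that taking all three stones from a three-stone position is always right, with no human examples involved. AlphaGo's networks play this role at vastly greater scale, which is why the training needs Google's cloud infrastructure.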
AlphaGo vs Fan Hui
Final board position Game 4. AlphaGo (Black) vs Fan Hui (White). AlphaGo wins by resignation.
AlphaGo and Fan Hui, the three-time European Go champion ranked around 600th in the world, played ten games last October 2015: five official (one hour main time plus three 30-second byoyomi periods) and five unofficial (three 30-second byoyomi periods only). Of these ten games, Fan Hui won only two, both unofficial. For the fifth official game, Younggil An’s game review shows the strong and weak moves by both players. Get the complete game records in SGF here.
Artificial and Human Intelligence
Final board position Game 5. AlphaGo (White) vs Fan Hui (Black). AlphaGo wins by resignation.
Most reactions to AlphaGo’s 5–0 win against Fan Hui dwelt on the supremacy of machines and artificial intelligence over the human mind. I think, however, that AlphaGo is proof of how amazing the human mind is.
AlphaGo trained on 160,000 games between strong Go players. Professionals also become strong by studying their own games and games between top players. A thousand or so game reviews, together with familiarity with good shape, tesuji, and the proper direction of play, might be achievable within a lifetime; studying 160,000 games would take more than one. AlphaGo also played 1.28 million games (if my understanding is correct) against itself in a single day. A game of competitive Go lasts more than an hour on average, and professionals play about 50 games a year, so for a human to play a million games in a lifetime would be impossible. How many games did Fan Hui study and play to reach his current rating? Far fewer, I think, than AlphaGo did.
Fan Hui, despite losing all five formal games, still won two unofficial ones. Not to underestimate Fan Hui’s skills, but AlphaGo, with all the data and experience it has accumulated, seems neither perfect nor near the level of the top Go professionals. AlphaGo will undergo an acid test when it plays Lee Sedol, currently ranked 5th among professionals, in March 2016. The live stream coverage of the games will be available here.
DeepMind claims that AlphaGo plays with an “approach that is perhaps closer to how humans play.” Close, but maybe still light-years away.
The human mind learns Go far more efficiently than AlphaGo does. As humans, we have other things to think about (what to eat, what to wear, what to say to that beautiful girl, how much money to place in that investment) besides playing Go. AlphaGo dedicates its whole existence to thinking about Go, yet Fan Hui has proven we can still win against such a machine.
AlphaGo is just a small step in AI’s light-year journey toward mimicking the human mind. The difference between AlphaGo and the human mind is like the difference between humans and God. The more complex and capable AI becomes, the more amazed I am at the workings of our brains. Let us not forget, too, that AlphaGo is a product of the human mind. Even if AlphaGo beats Lee Sedol in March, AlphaGo’s victory will still be the human mind’s victory.