Nearly two years ago, an AI program named Claudico, designed by Carnegie Mellon University researchers, played 80,000 heads-up hands of no-limit hold ’em against four poker pros.
At the end of the tournament, the “robot” was down more than $700,000.
Undeterred, CMU’s researchers built another hold ’em AI, named Libratus.
The key to not dumping hundreds of thousands of virtual dollars on the competition this time around? Achieving the perfect balance of risk and reward during the run of play, researchers told The Verge’s Ben Popper this week.
They feel confident they can find success because heads-up poker is a two-player zero-sum game: assuming there is no rake, the total money on the table never changes, because the sum of all the wins at the table equals the sum of all the losses.
“In these two-player zero sum games, if the other player doesn’t play a Nash equilibrium strategy, that means they are playing worse, and we are making more money,” CMU Professor Tuomas Sandholm told The Verge. “In such games, playing Nash equilibrium is safe. It has the flavor where it plays rationally and is not exploitable anywhere.”
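To make the "not exploitable" idea concrete, here is a minimal sketch using rock-paper-scissors rather than poker, since it is a far smaller two-player zero-sum game. The payoff table and the names `expected_payoff` and `worst_case` are illustrative assumptions, not anything taken from CMU's software. It shows that the equilibrium mix (each move one third of the time) cannot lose on average against any counter-strategy, while a lopsided mix can. One caveat: in rock-paper-scissors the equilibrium only guarantees breaking even, whereas the quote above concerns poker, where an opponent's departures from equilibrium can cost that opponent money.

```python
from fractions import Fraction

# Row player's winnings in rock-paper-scissors (order: rock, paper, scissors).
# The game is zero-sum, so the column player always receives the negative.
PAYOFF = [
    [0, -1, 1],   # row plays rock
    [1, 0, -1],   # row plays paper
    [-1, 1, 0],   # row plays scissors
]

PURE_STRATEGIES = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def expected_payoff(row_strategy, col_strategy):
    """Average winnings for the row player when both sides play mixed strategies."""
    return sum(
        p * q * PAYOFF[i][j]
        for i, p in enumerate(row_strategy)
        for j, q in enumerate(col_strategy)
    )


def worst_case(row_strategy):
    """Row player's average winnings against the most damaging pure reply."""
    return min(expected_payoff(row_strategy, reply) for reply in PURE_STRATEGIES)


# The equilibrium mix: every move with probability one third.
equilibrium = [Fraction(1, 3)] * 3
# A hypothetical lopsided player who throws rock half the time.
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(worst_case(equilibrium))  # 0: no reply beats the equilibrium mix on average
print(worst_case(rock_heavy))   # -1/4: always answering with paper exploits it
```

Checking only pure replies is enough here, because against a fixed strategy some pure reply always does at least as well as any mixture of replies.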
With claims of a tie the first time, anticipation builds for Libratus
The first run for CMU’s hold ’em AI didn’t go well, judging by how the chips were distributed at the end of the marathon heads-up session at Pittsburgh’s Rivers Casino.
While the chips told one story, the statistics, researchers said, told an entirely different one. Sandholm said that they could not definitively prove that humans were better than the computer, and that, if they played again, the dollar results might be different.
Their conclusion? The first go-round was a statistical tie because researchers couldn’t say with full certainty that humans were better.
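For readers wondering how a loss of that size can still count as a tie, the sketch below shows one common way to frame the question: build a confidence interval around the average result per hand and check whether it includes zero. This is an illustration under stated assumptions, not the researchers' actual analysis. The function names are made up for this example, the $700,000 figure is the rounded loss quoted above, and the per-hand standard deviations are purely hypothetical inputs, since the article does not report one.

```python
import math


def mean_confidence_interval(total_result, hands, per_hand_std, z=1.96):
    """Approximate 95% confidence interval for the average result per hand.

    per_hand_std is an assumed standard deviation of single-hand results;
    it is a hypothetical input, not a figure reported by the researchers.
    """
    mean = total_result / hands
    margin = z * per_hand_std / math.sqrt(hands)
    return mean - margin, mean + margin


def is_statistical_tie(total_result, hands, per_hand_std, z=1.96):
    """True when break-even (zero) lies inside the confidence interval."""
    low, high = mean_confidence_interval(total_result, hands, per_hand_std, z)
    return low <= 0 <= high


# A loss of $700,000 over 80,000 hands, under two hypothetical volatilities.
for std in (1000, 1500):
    low, high = mean_confidence_interval(-700_000, 80_000, std)
    verdict = "tie" if is_statistical_tie(-700_000, 80_000, std) else "not a tie"
    print(f"std ${std}: interval ({low:.2f}, {high:.2f}) per hand -> {verdict}")
```

The verdict flips between the two assumed volatilities, which is the point: the swingier the game, the more hands it takes before a dollar total says anything reliable about who is better.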
Libratus is no Deep Blue, and there’s a good reason why
The concept of AI programs vanquishing human opponents became a reality in 1997 when IBM’s Deep Blue defeated chess legend Garry Kasparov in a much-ballyhooed showdown.
This past year, the robots dealt another blow to human hubris when Google’s AlphaGo beat top professional Lee Sedol 4-1 in Go, the ancient board game seen in A Beautiful Mind.
While these two games are considered the pinnacle of what are known as complete-information games (past moves are known and future moves are numerous but limited), poker is an incomplete-information game, and thus the possibilities for moves and outcomes are fantastically complex.
Since a specific set of moves can’t be planned or calculated in advance of a play, researchers had to adopt a different strategy: risk-reward equilibrium.
As for what we can expect from Libratus, the poker pros who battled Claudico say the unexpected will surely appear: random donk bets, limping in, and other moves normally viewed as rookie mistakes.
If all this sounds like an elaborate publicity stunt, don’t be fooled. Go experts say it was AlphaGo’s use of unconventional strategies that served as the catalyst for its victory over Sedol.