In 1992, IBM announced another major step in developing artificial intelligence through games: a program written by IBM researcher Gerald Tesauro had taught itself to play backgammon well enough to compete with professional players. That year, TD-Gammon, as it was known, went 19–19 in 38 games at a World Cup of Backgammon event — a far better performance than any backgammon program up to that point.
In some ways, TD-Gammon was an electronic brain as much as it was a computer program. It represented an early example of a neural network, a computer application comprising nodes and connections, modeled after the human brain’s neurons and synapses. The TD-Gammon network “learned” through a temporal difference algorithm, which used delayed reinforcement: the system was rewarded only at the end of a successful game, and that reward signal was propagated back through the moves that produced it.
The TD algorithm, which was designed to mimic the way humans learn, was conceived by computer scientist Richard Sutton of GTE Laboratories. But Tesauro was the first to apply it on such a large scale. He chose backgammon, reasoning that the game’s clear-cut rules and criteria for success would help the network learn through trial and error. Running on an IBM RS/6000 workstation, TD-Gammon played approximately 300,000 games against itself over the course of a month — about three times as many as most backgammon masters play in a lifetime.
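The core of Sutton's idea can be illustrated with a toy sketch (this is not Tesauro's code, and the three-state "game" below is a made-up example): after each step, the value estimate of the previous position is nudged toward the estimate of the position that followed it, so the reward delivered at the end of a game gradually propagates backward through the earlier positions that led to it.

```python
def td_update(values, prev_state, next_state, reward, alpha=0.1):
    """TD(0) update: V(s) += alpha * (reward + V(s') - V(s))."""
    td_error = reward + values[next_state] - values[prev_state]
    values[prev_state] += alpha * td_error
    return values

# Toy example: a three-state "game" that always ends in a win (reward 1.0).
# Only the final transition carries a reward; earlier states learn from
# the improving estimates of the states that follow them.
values = {"start": 0.0, "mid": 0.0, "end": 0.0}
for _ in range(100):                         # repeated self-play episodes
    td_update(values, "mid", "end", 1.0)     # final move earns the win reward
    td_update(values, "start", "mid", 0.0)   # earlier move gets no direct reward
```

After many episodes, the value of "mid" climbs toward 1.0, and "start" follows it with a lag — the delayed reward has leaked backward through the game, which is exactly the effect TD-Gammon exploited over hundreds of thousands of self-play games.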
After each turn’s roll of the dice, TD-Gammon considered every legal move and estimated the probability that it would lead to a win. These estimates were based on the connection-strength values stored in the synapses of the neural network; if a move or series of moves had previously featured in a winning game, it got more weight in the probability estimate. These weights were adjusted after each game, enabling TD-Gammon to “learn” from wins and losses.
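The decision rule described above — score every legal move with the network and play the best one — can be sketched as follows. `estimate_win_prob` here is a hypothetical stand-in for TD-Gammon's neural network forward pass, reduced to a weighted sum of position features squashed to a probability; the feature vectors and weights are invented for illustration.

```python
import math

def estimate_win_prob(position, weights):
    """Stand-in for the network: weighted sum of features, squashed to (0, 1)."""
    score = sum(w * f for w, f in zip(weights, position))
    return 1.0 / (1.0 + math.exp(-score))   # logistic squash

def choose_move(legal_positions, weights):
    """Pick the resulting position with the highest estimated win probability."""
    return max(legal_positions, key=lambda pos: estimate_win_prob(pos, weights))

# Toy usage: three candidate positions, each a two-feature vector.
weights = [0.5, -0.2]
candidates = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
best = choose_move(candidates, weights)
```

In the real system, the weights were what the temporal-difference updates adjusted after each game, so positions that had featured in wins gradually earned higher estimates.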
With 30,000 artificial synapses, TD-Gammon had the brain power equivalent of a sea slug. It was powerful enough to compete at backgammon, but its abilities were dwarfed by the game-playing system that came after it: Deep Blue.