AI Learns to Play Tag (and breaks the game)

Title: How Artificial Intelligence Learns: The Battle between Albert and Kai

Subtitle: Witness the evolution of two AI entities as they learn to excel at tag

Introduction:

Witness an epic battle unfold between two AI entities, Albert and Kai, as they embark on a journey of learning. In this simulated game of tag, Albert is the runner and Kai is the tagger. Initially, neither of them knows anything, not even the game's rules. However, being artificial intelligence, the more they play, the better they become. Soon enough, they develop clever strategies to outsmart each other, ultimately leading to a thrilling final showdown.

Paragraph 1:

Albert, initially a novice, starts off as the runner trying to evade Kai, the tagger. Losing means punishment. However, as they play more and more, their skills rapidly improve. Astonishingly, after only 51 attempts, Kai learns to tag Albert! Well done, Kai! With the reward of winning, it doesn't take long for Kai to dominate every game, subjecting Albert to constant punishment. But Albert, quick to adapt in order to avoid punishment, learns how to dodge Kai's moves, although he is still not as skilled as his opponent.

Paragraph 2:

Once Albert becomes a fully-fledged competitor, they engage in the ultimate battle. "Albert, don't underestimate me," warns Kai. Surprisingly, Albert's decision to jump helps him escape punishment. However, Kai quickly adapts and becomes faster, tagging Albert before he can react. At this rate, Albert is doomed to lose the final battle. Let's speed things up.

Paragraph 3:

After just one week of training, Kai effortlessly tags Albert on the ground. But what about Albert's evasive strategy? Kai is learning much faster, throwing Albert off balance. Another week of training goes by, and the question arises: has Albert found a way to escape? It seems the walls were not high enough; Albert has learned to exploit a loophole that lets him win. Let's make things more exciting!

Paragraph 4:

Now, our contenders have new additions to their game: blocks and a wall. These elements add complexity to their strategies. With the room looking different, both Albert and Kai are perplexed, and they are not performing well. Interacting with the new objects produces unfamiliar data, making it difficult to choose the right moves. Yes, you can grab the blocks! Let's see what Albert can do with them. No, Albert, use them to avoid Kai! Ah, he's using them correctly! Kai is no longer confused by the wall. Great job, Albert! But Kai won't let you win for long.

Paragraph 5:

Albert has now mastered running in circles to escape Kai's attempts to tag him. Kai, what happened to your skills in room number 1? Go, Kai! Kai returns to his old tactics, struggling with the wall once again. Meanwhile, Albert's performance with the cube is improving. You make it too easy for Albert, Kai! With Brilliant's interactive lessons, you can build real knowledge in just a few minutes a day. Now, after Albert's victory, let's see what Kai can do…

Paragraph 6:

Kai has become more aggressive, grabbing whatever he can and attacking Albert. Surprisingly, these tactics work! It’s incredible how you learn through practice, just like Albert and Kai. Albert stumbles near the edge! Kai is unsure how to react. Now, Kai feels frustrated. Brilliant offers thousands of interactive lessons on mathematics, data analysis, programming, and artificial intelligence. You guys, don’t miss out!

Paragraph 7:

With Albert on the edge, it becomes challenging for Kai, but he eventually finds a way to corner Albert. It appears Albert's strategy has caused maximum chaos. Let's make this even more interesting. Are you tossing cubes at Kai? Did you manage to kick Kai out of the room? It seems Albert's strategy disrupts Kai once again. Albert has become remarkably adept at handling the cube, but where did the cube disappear to, Albert? Calm down, Kai. Not again, Albert. Escaping with the cube should be impossible. Let's see if Albert can keep doing it. And there he goes. Again and again, Albert breaks the game to escape Kai's clutches. Congratulations, Albert!

Conclusion:

After over two months of training, both Albert and Kai have become excellent players, pushing each other to their limits. In the end, they both prove to be formidable opponents, culminating in a battle that leaves spectators in awe. It’s now down to 1 versus 5! Congratulations, Albert! Get ready to witness their final fight.

5 Questions and Answers:

1. How do Albert and Kai improve in the game of tag?
– Albert and Kai improve by playing the game repeatedly, learning from each move and developing more intelligent strategies.

2. Can Albert escape Kai’s punishment and win against him?
– Yes, Albert learns to dodge Kai’s moves and eventually develops skills to beat him.

3. What does Brilliant offer to enhance learning in various fields?
– Brilliant provides interactive lessons in mathematics, data analysis, programming, and artificial intelligence to help users build real knowledge in minutes per day.

4. How do the new additions of blocks and walls affect Albert and Kai’s strategies?
– The introduction of blocks and walls complicates their strategies, as they encounter unfamiliar data and struggle to adapt to the changes.

5. Who becomes the ultimate winner in the battle between Albert and Kai?
– After a fierce and thrilling battle, Albert emerges as the ultimate winner by outsmarting Kai with his innovative tactics.

COMMENTS

  1. More information about how Albert and Kai were trained:

    Time it took to train:

    Room 1: 12h 30m (though I stopped the recording after Albert broke the game)

    Room 2: 13h 40m

    Room 3: 1d 20h 2m

    Final Battle: 6h 48m (this wasn’t shown but was needed since the agents weren’t used to seeing other teammates)

    We continue training on top of the previous brains, meaning by the end of the video Albert and Kai have both trained for 3 days and 5 hours.

    Thank you so much for watching! These videos take a lot of time and money to make, which is why I recently enabled channel memberships! By becoming a member, your name can appear in future videos, you can see more behind-the-scenes material that doesn't fit in the regular videos, and you can use stickers of Albert, Kai, and some other characters our team made in comments (more coming) 😀

    NOTES

    When I mention it took x days to train, that's in-game time, and it's much longer than the displays indicate since there are 200 copies training simultaneously.

    This is a very long comment going over more of the details of how Albert and Kai work, issues they've had, unexpected results, etc.

    THE BASICS:

    Albert and Kai were trained using reinforcement learning, meaning they were rewarded for doing things correctly and punished for doing them incorrectly (the reward is just increasing their score, and the punishment is decreasing it). After each attempt, the actions they took are analyzed and the weights in their neural networks (brains) are adjusted using an algorithm called MA-POCA, to try to prioritize the actions that led to the most reward. The agents start off making essentially random decisions until Kai accidentally tags Albert in the first room and is rewarded; then, as mentioned above, the weights in his neural network brain are adjusted in order to try to replicate that reward (it wasn't quite this simple for this video, since we use self-play to train both agents at the same time; more on that later). This leads to Kai learning that tagging Albert is good, and since Albert is punished when he's tagged, it also leads to Albert learning that getting tagged by Kai isn't good. This process continues through tens of millions of steps until one of the agents consistently loses, or the agents are able to counter each other well enough that it's a draw.
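
    To make that core loop concrete, here is a deliberately simplified sketch in Python/PyTorch. The actual algorithm is MA-POCA with self-play and the group/individual rewards described below; this stand-in is a plain REINFORCE-style update, and every name and value in it is illustrative rather than taken from the real project.

```python
import torch

# Simplified stand-in for the real algorithm (MA-POCA): a REINFORCE-style
# update that nudges the network toward actions that led to reward and
# away from actions that led to punishment.
def update_policy(optimizer, log_probs, rewards, gamma=0.99):
    """log_probs: log-probability of each action taken during one attempt.
       rewards:   reward/punishment received after each of those actions."""
    returns, g = [], 0.0
    for r in reversed(rewards):            # discounted return from each step onward
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns, dtype=torch.float32)
    # Actions followed by high return get pushed up; punished actions get pushed down.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```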

    REWARD FUNCTION:

    Albert and Kai are given two types of rewards: group rewards and individual rewards. When Albert gets tagged, he's punished with a -1 group reward and Kai is rewarded with a +1 group reward, and vice versa; this encourages Kai to tag Albert and Albert to avoid being tagged. Additionally, Albert is given an individual reward of 0.001 for each frame he's alive (0.6 total in a room lasting 10 s), and Kai -0.001, to encourage Kai to tag Albert as quickly as possible. When we introduce the grabbable cubes, we also give Albert an individual reward of +1 the first time he picks up a cube, to make sure Albert actually starts using it (without this, the rewards were too infrequent for Albert to learn to use the cube effectively).
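
    A rough Python sketch of that reward scheme (the function and constant names are mine; the numeric values are the ones from this comment):

```python
# Sketch of the per-frame reward logic described above.
FRAME_ALIVE_REWARD = 0.001   # Albert per frame alive; Kai gets the negative
TAG_GROUP_REWARD   = 1.0     # +1 to the winner, -1 to the loser when a tag happens
FIRST_GRAB_BONUS   = 1.0     # one-time individual reward for Albert's first cube grab

def frame_rewards(albert_tagged, albert_grabbed_cube, first_grab_already_done):
    albert, kai = 0.0, 0.0
    # Individual rewards: Albert is paid for surviving, Kai is charged for waiting.
    albert += FRAME_ALIVE_REWARD
    kai    -= FRAME_ALIVE_REWARD
    # One-time bonus the first time Albert picks up a cube.
    if albert_grabbed_cube and not first_grab_already_done:
        albert += FIRST_GRAB_BONUS
    # Group rewards when the episode ends with a tag.
    if albert_tagged:
        albert -= TAG_GROUP_REWARD
        kai    += TAG_GROUP_REWARD
    return albert, kai
```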

    BRAIN:

    Albert and Kai's brains are neural networks with four layers each (one input layer, two hidden layers, and one output layer).
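
    For illustration, such a network could be sketched in PyTorch roughly as follows. The layer widths and observation size are assumptions; only the overall layer count and the four discrete outputs (see the action example below) come from this comment.

```python
import torch.nn as nn

# Hypothetical sketch of a brain with one input layer, two hidden layers,
# and one output layer producing four discrete action branches.
class AgentBrain(nn.Module):
    def __init__(self, obs_size=64, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_size, hidden), nn.ReLU(),   # hidden layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),     # hidden layer 2
        )
        self.heads = nn.ModuleList([                  # output layer, one head per action
            nn.Linear(hidden, 3),   # move
            nn.Linear(hidden, 3),   # turn
            nn.Linear(hidden, 2),   # jump (no / yes)
            nn.Linear(hidden, 2),   # grab (no / yes)
        ])

    def forward(self, obs):
        x = self.body(obs)
        return [head(x) for head in self.heads]       # one logit vector per action branch
```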

    The agents collect information about the scene through direct values and raycasts. Every 5 frames they're fed data about their position in the room, the opponent's position, velocity, direction, etc., and they also collect information through raycasts (a simplified version of eyes). The agents' eyes (raycasts) can differentiate between walls, ground, moveableObjects and Kai/Albert.
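
    A hedged sketch of what assembling such an observation might look like (the function names and exact layout are guesses; the categories come from the list above):

```python
import numpy as np

# Hypothetical sketch of assembling one observation every 5 frames.
RAY_TAGS = ["wall", "ground", "moveableObject", "agent"]   # what a ray can report hitting

def one_hot(tag):
    v = np.zeros(len(RAY_TAGS))
    v[RAY_TAGS.index(tag)] = 1.0
    return v

def build_observation(self_pos, opp_pos, opp_vel, self_dir, ray_hits):
    """ray_hits: list of (tag, normalized_distance) pairs, one per raycast."""
    direct = np.concatenate([self_pos, opp_pos, opp_vel, self_dir])   # direct values
    rays = np.concatenate([np.append(one_hot(tag), dist) for tag, dist in ray_hits])
    return np.concatenate([direct, rays])   # this vector is what the brain receives
```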

    The agents' brains (neural networks) take the data the agents collect from direct values and raycasts and use it to predict 4 numbers for the respective agent, which control how that agent moves. An example output of one of the neural networks is [1, 2, 0, 1], which would be interpreted as [1 = move forward, 2 = turn right, 0 = don't jump, 1 = try to grab], so the agent controlled by this network would try to move forward while turning right and grabbing.
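
    Decoding that example output could look like this (only the four meanings shown in the example above are confirmed; the other entries in each table are my guesses at plausible values):

```python
# Decoding the example output [1, 2, 0, 1] from the paragraph above.
MOVE = {0: "don't move", 1: "move forward", 2: "move backward"}
TURN = {0: "don't turn", 1: "turn left", 2: "turn right"}
JUMP = {0: "don't jump", 1: "jump"}
GRAB = {0: "don't grab", 1: "try to grab"}

def decode_action(branches):
    move, turn, jump, grab = branches
    return MOVE[move], TURN[turn], JUMP[jump], GRAB[grab]

print(decode_action([1, 2, 0, 1]))
# -> ('move forward', 'turn right', "don't jump", 'try to grab')
```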

    The fact that we have two agents training simultaneously complicates things a bit. Normally we could just update an agent's brain every x steps, but if we did that for both brains at the same time, they would struggle to develop multiple strategies: reinforcement learning tends to be best at finding a single solution, so the winner would dominate and the loser would get stuck doing the same strategy over and over. The way we tackle this issue is with something called self-play. With self-play, we technically only train one agent at a time, and swap which one is being trained every 100k steps. When we're training Albert, we use a recent model of Kai's brain as his opponent, and to avoid there being only one strategy, we store 10 recent brains to use as opponents, swapping them out every couple thousand steps so that Albert learns to beat all of them and not just one. This results in a much more general AI that's hard to exploit.
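
    A minimal sketch of that opponent-pool idea, assuming a hypothetical brain object with a copy() method (the function names and the exact snapshot timing are mine, not the real setup's):

```python
import random
from collections import deque

SWAP_TRAINEE_EVERY  = 100_000   # steps before switching which agent is trained
SWAP_OPPONENT_EVERY = 2_000     # "every couple thousand steps"
POOL_SIZE           = 10        # recent opponent brains kept around

opponent_pool = deque(maxlen=POOL_SIZE)   # snapshots of the non-trained agent's brain

def trainee_at(step):
    # Swap which agent is being trained (Albert <-> Kai) every 100k steps.
    return "Albert" if (step // SWAP_TRAINEE_EVERY) % 2 == 0 else "Kai"

def maybe_snapshot(step, opponent_brain):
    # Periodically freeze a copy of the opponent so the pool stays recent.
    if step % SWAP_OPPONENT_EVERY == 0:
        opponent_pool.append(opponent_brain.copy())

def pick_opponent(step, current_opponent):
    # Swap in a different stored brain every couple thousand steps so the
    # trained agent has to beat many strategies, not just the latest one.
    if step % SWAP_OPPONENT_EVERY == 0 and opponent_pool:
        return random.choice(list(opponent_pool))
    return current_opponent
```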

    UNEXPECTED BEHAVIORS:

    In room 1, Albert manages to break out of the room by exploiting a small hole in the hitbox near the top of the room, which was there because I didn't make the hitboxes on the walls tall enough. Though Albert used it to escape, I'm not convinced he would actually learn to do it consistently. The challenge with this video is that it can be difficult to interpret the agents' behaviors; Albert could be making certain unexpected moves as a way to exploit Kai's poorly trained brain and get him to make bad moves, or Albert could just be making these unexpected moves because he hasn't trained enough. Albert was able to find the hole a few times, but he wasn't able to do it consistently. This could be because he didn't train long enough, because his observations don't make it easy to detect when he can jump out, or because Kai quickly learned to counter him by getting to the display in time.

    In room 2, Albert also manages to glitch out of the room, and he was able to do this consistently. We made sure the cube-grabbing functionality was coded as rigorously as possible, even having it automatically release the grab if the force exerted is too high. I couldn't find a single way of exploiting it in testing, but Albert certainly didn't have issues finding one.

    Albert also had a couple of moments of throwing the cubes at Kai and spinning with the cube to throw Kai out of the room. We didn't even consider this being a possibility before training; AI is able to come up with some really clever solutions to problems.

    OTHER

    Thank you so much to our amazing team that helped make this video! Jonas helped with setting up the character controls, Tyler helped create the clean grabbing functionality, Catt helped edit and Andrew and Steve helped solve any issues we ran into while making the video. If you want to meet our team and talk to all of us, join our discord server!:) https://discord.gg/qDRtuFe5gp