There are so many ways you can have a text chat with a large language model, from ChatGPT to Google Bard or MLC LLM, a local chatbot that can run on your phone. The next frontier for AI is bringing the power of LLMs to NPCs (non-player characters) in games where, instead of having a canned set of interactions, you can have a wide-open conversation.
During its Computex 2023 keynote, Nvidia CEO Jensen Huang unveiled ACE for Games, an AI model foundry service designed to bring game characters to life with natural-language conversation, audio-to-facial-expression animation and text-to-speech / speech-to-text capabilities. Huang showed a game demo in which an NPC named Jin, who runs a ramen noodle shop, interacted with a human player who asked questions by voice and got back natural-sounding answers that matched the NPC’s backstory.
In the demo, the gamer (named Kai) walks into Jin’s ramen shop, asks him how he’s doing (by voice) and has a conversation about the area’s high crime rate. Kai asks if he can help, and Jin responds: “if you want to do something about this, I have heard rumors that the powerful crime lord Kumon Aoki is causing all sorts of chaos in the city. He may be the root of this violence.” Kai asks where to find Aoki, and Jin tells him, setting the player off on his quest.
Nvidia ACE for Games will offer high-speed access to three different components that already exist. The first, Nvidia NeMo, is an AI framework for training and deploying LLMs, and it includes NeMo Guardrails, which is designed to prevent inappropriate or “unsafe” AI conversations. Presumably, this would stop NPCs from answering inappropriate, off-topic prompts from users. Guardrails also includes security features that should prevent users or would-be prompt injectors from “jailbreaking” the bots and getting them to do bad things.
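To get a feel for what a guardrail does, here’s a minimal sketch of the idea in Python: screen the player’s input before it reaches the LLM, and wrap allowed questions in the NPC’s persona. Note that NeMo Guardrails itself works from declarative configuration files rather than code like this; the topic list, function names and persona text below are all hypothetical.

```python
# Illustrative guardrail: block off-limits topics before the LLM sees them,
# and pin the NPC to its persona. Everything here is a stand-in, not the
# actual NeMo Guardrails mechanism.

BLOCKED_TOPICS = ("politics", "real-world violence", "jailbreak")

def guarded_prompt(player_input: str) -> str:
    """Return the prompt to send to the LLM, or 'REFUSE' for blocked input."""
    lowered = player_input.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "REFUSE"  # the game would play a stock deflection line instead
    # Otherwise, wrap the input with the NPC's persona and topic limits.
    return (
        "You are Jin, a ramen shop owner. Stay in character and only "
        f"discuss the game world.\nPlayer: {player_input}"
    )

print(guarded_prompt("Tell me about politics"))  # REFUSE
print(guarded_prompt("How's business, Jin?"))
```

A real deployment would also filter the model’s *output*, since prompt injection can slip past simple input checks.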
Nvidia Riva is the company’s speech-to-text / text-to-speech solution. In the ACE for Games workflow, a gamer will ask a question via their microphone and Riva will convert it to text, which is fed to the LLM. The LLM will then generate a text response, which Riva turns back into speech that the user will hear. Of course, we’d expect games to also show the responses in text. You can try Nvidia Riva’s speech-to-text and text-to-speech capabilities yourself on the company’s site.
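The whole round trip can be sketched in a few lines of Python. The three stage functions here are stand-ins for Riva ASR, the LLM, and Riva TTS (real integrations would call those services over the network); the function names and canned strings are our own illustration, not Nvidia’s API.

```python
# Sketch of the ACE-style pipeline: mic audio -> text -> LLM -> text -> audio.
# Each stage is a stub standing in for a real service call.

def speech_to_text(audio: bytes) -> str:
    # Stand-in for Riva ASR: pretend the audio decodes to this question.
    return "How are you doing, Jin?"

def llm_reply(prompt: str) -> str:
    # Stand-in for the LLM: a real game would send the NPC's backstory
    # along with the player's question to a hosted model.
    return "Business is slow with all this crime around."

def text_to_speech(text: str) -> bytes:
    # Stand-in for Riva TTS: a real call would return synthesized audio.
    return text.encode("utf-8")

def npc_round_trip(mic_audio: bytes) -> bytes:
    question = speech_to_text(mic_audio)
    answer = llm_reply(question)
    return text_to_speech(answer)

print(npc_round_trip(b"<raw mic audio>").decode("utf-8"))
```

Latency matters at every stage here, which is presumably why Nvidia is pitching ACE as a high-speed hosted bundle rather than three separate services.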
Nvidia Omniverse Audio2Face provides the last step in the ACE for Games workflow, giving characters facial expressions that match what they’re saying. The company currently offers this product in beta and you can try it here.
The demo, called Kairos, was designed by Convai, an AI-in-gaming startup that’s part of Nvidia’s Inception program, which connects up-and-coming companies with venture capital. On its site, Convai offers a toolset that lets game developers build lifelike NPCs with complex backstories.
The company has a great explainer video about how its tools work and what they can do. In the video, you can see players talking to NPCs and asking them to do things that involve actual objects and other characters in the game.
For example, in the video, a player asks an NPC to hand him a gun that’s sitting on a table and the NPC complies. In another part of the video, the player asks a soldier NPC to shoot at a target that’s located in a particular place. We also see how Convai’s tools make this all possible.
Having that added context, so that the NPC is aware of what’s going on in-game, is crucial. Recently, we tested a Minecraft AI plugin that lets you talk to NPCs in that game, but those NPCs have no situational awareness at all. We were able to continue a conversation with a sheep after we had killed it (it didn’t know it was dead), for example.
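The fix for that kind of obliviousness is conceptually simple: inject the relevant game state into the prompt alongside the player’s words, so the model can’t contradict the world. Here’s a hypothetical sketch of that idea; the state dictionary, function name and prompt wording are our own illustration of the technique, not any particular vendor’s implementation.

```python
# Illustrative situational awareness: fold world state into the prompt so
# the NPC knows, for example, that a nearby entity is dead.

def build_prompt(player_input: str, game_state: dict) -> str:
    facts = "; ".join(f"{name} is {status}" for name, status in game_state.items())
    return (
        f"World state: {facts}.\n"
        f"Player says: {player_input}\n"
        "Respond in character, consistent with the world state."
    )

state = {"sheep": "dead", "village": "under attack"}
print(build_prompt("Hello again, sheep!", state))
```

Convai’s demo suggests its tools go further, exposing in-game objects and other characters as things the NPC can actually reference and act on, not just describe.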