KubeCon NA 2023: Ishan Sharma on Real-Time Generative AI for Gaming Apps Running on Kubernetes

Kubernetes provides an excellent platform for gaming applications that use generative artificial intelligence (GenAI) for both game development and gameplay. Ishan Sharma from Google spoke at the recent KubeCon + CloudNativeCon NA 2023 conference about real-time GenAI inference integrated with distributed game servers running on Kubernetes.

With the launch of ChatGPT and Bard, the term “GenAI” has become mainstream well beyond the technical community. Over the last decade, AI and ML technologies have been steadily improving. AI has been beating humans in perception tests in domains such as handwriting recognition, speech recognition, image recognition, reading comprehension and language understanding. The generative capabilities of AI have also improved considerably over the last nine years, going from pixelated black-and-white images in 2014 to very realistic images just three years later, in 2017. By 2021, text-to-image generation from prompts was possible.

GenAI has many applications in online gaming. Using a chart of the projected global market for GenAI in gaming from 2022 to 2032, Sharma said GenAI is being adopted first for game development use cases, which will later be eclipsed by new game experiences such as smart non-player characters (NPCs), level generation, image enhancement, scenarios and stories.

In game development, GenAI has a wide range of applications: creating art assets, auto-generating game code, holding life-like conversations with bots, and generating levels from player input. Generative AI is changing the games industry, which has moved from boxed software games in the past to live service games today, and is expected to move to “living games” in the near future. In living games, three parties (developer, game, and player) interact with each other to enrich the user experience. Game developers must build this AI responsibly and safely, protecting intellectual property while respecting the player’s privacy and safety.

GenAI use cases in games fall into two categories: improving productivity during game development and improving the player experience during gameplay.

In the game development phase, GenAI can accelerate time-to-market by creating content and simplifying development. This includes the development of game assets such as characters, props, audio and video. Turnkey APIs like Vertex AI, Amazon SageMaker, and ChatGPT can help in this category.

In the second category, the run-time gameplay phase, we can use AI/ML & GenAI to adapt the gameplay and empower players to generate game content in real-time. These capabilities include smart NPCs (bots), dynamic in-game content, and customized player experiences. GenAI during gameplay brings demanding requirements like low latency, high performance, fast scalability, and low cost. Runtime gameplay environments can use platforms like Google Kubernetes Engine (GKE) to host gaming apps.
 
Based on user research that his team conducted across SMEs in the gaming industry, Sharma discussed user pain points for GenAI in games in three categories: platform, AI maturity, and gameplay.

In the platform category, cost efficiency at scale is needed to make GenAI financially feasible for popular (AAA) games. For a seamless player experience, low latency is essential to ensure smooth gameplay; lag can hurt the success of games where even sub-second latency is not acceptable. Performance and the ability to run state-of-the-art models without vendor lock-in will drive platform decisions.

In the AI maturity category, LLM unpredictability is a big concern. Inference needs to be coherent, relevant, and contextually appropriate, and it needs to be repeatable. The models should not promote AI biases and stereotypes. Content filtering and moderation are needed to ensure a safe and inclusive gameplay environment for the players.

In the third category, gameplay, user-generated content and creativity must be balanced against game lore and structure. Some games need content for gameplay that LLMs filter out, so GenAI constraints have to be considered. Also, procedural generation with GenAI will still require human supervision in the near term as GenAI and LLMs continue to evolve.

Sharma mentioned that Kubernetes is a good computing solution for games because it solves the majority of IT operations problems: scheduling, health-checking, deployment methods, autoscaling and rollbacks, centralized logging and monitoring, a declarative paradigm, and primitives for isolation. The challenge is that Kubernetes, on its own, does not understand how game servers work. Game servers need additional capabilities such as maintaining in-memory state, starting and shutting down game servers on demand, and protecting running servers from being shut down (even for upgrades!), since an interrupted session results in a poor player experience.
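The requirement to protect running game servers can be illustrated with a minimal sketch. It assumes a hypothetical two-state model (`idle` and `in_session`) and a hypothetical `pick_scale_down_candidates` helper; neither comes from the talk or from any specific orchestrator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GameServer:
    name: str
    state: str  # hypothetical states: "idle" or "in_session"


def pick_scale_down_candidates(servers: List[GameServer], count: int) -> List[GameServer]:
    """Return up to `count` idle servers; servers with live sessions are never chosen."""
    idle = [s for s in servers if s.state == "idle"]
    return idle[:count]


fleet = [
    GameServer("gs-1", "in_session"),
    GameServer("gs-2", "idle"),
    GameServer("gs-3", "in_session"),
    GameServer("gs-4", "idle"),
]

# Ask for three removals; only the two idle servers are eligible.
victims = pick_scale_down_candidates(fleet, 3)
print([s.name for s in victims])
```

A generic scheduler treats all four replicas as interchangeable, which is why a game-aware layer is needed to track which servers are hosting players.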

The open source Agones framework can help with these game server scaling and orchestration requirements. It was developed in 2017 in a partnership between Google and Ubisoft. Agones brings the benefits of Kubernetes operations to game servers, including a better understanding of game matches and sessions, seamless scaling with player load, multiple UDP/TCP ports per node, and hot spares with tunable warm-up parameters.
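The idea of hot spares can be sketched as a buffer calculation: keep a fixed number of ready servers on top of those already hosting sessions, within fleet limits. The sketch below is loosely modeled on buffer-style fleet autoscaling; the function and parameter names are illustrative assumptions, not the Agones API.

```python
def desired_fleet_size(allocated: int, buffer_size: int,
                       min_replicas: int, max_replicas: int) -> int:
    """Servers in use plus a warm buffer, clamped to the fleet limits."""
    target = allocated + buffer_size
    return max(min_replicas, min(max_replicas, target))


# Hypothetical settings: 5 warm spares, fleet bounded between 10 and 50 servers.
for allocated in (0, 12, 48):
    size = desired_fleet_size(allocated, buffer_size=5, min_replicas=10, max_replicas=50)
    print(f"allocated={allocated} -> fleet size {size}")
```

With these assumed values, an empty fleet stays at the minimum of 10, a fleet with 12 sessions grows to 17, and a fleet with 48 sessions is capped at 50.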
 
Sharma discussed the high-level architecture of a live service game, using a multiplayer game session as the use case. The core components of the solution can all be hosted on a Kubernetes cluster: the Game Frontend, the Player Profile Service, and the Matchmaker Service, which directs players to a dedicated server where they can join other players in a shared environment and shared experience. Game servers also run on Kubernetes and are orchestrated by Agones.

When it comes to integrating GenAI inference with game servers, development teams have a few different options. As with game development, turnkey solutions like Vertex AI, SageMaker, and the Stable Diffusion API can be used for gameplay environments.

The second approach is a DIY solution on Kubernetes, where dedicated GenAI inference servers run on their own Kubernetes nodes. These servers can take advantage of hardware options like GPUs or high-performance CPUs.

Another approach is to run the GenAI inference server as a sidecar within the same pod as the game server, which means each game server gets its own dedicated inference server. The underlying hardware is optimal for both Agones Game Server and the GenAI Inference Server. When choosing any of these options, teams should find the right balance between raw performance and cost.
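The difference between the dedicated-node and sidecar layouts can be sketched as two pod manifests, built here as Python dictionaries. This is a simplified illustration using plain Kubernetes `Pod` objects rather than Agones resources; the image names, labels, taint key and port numbers are all hypothetical.

```python
import json


def dedicated_inference_pod() -> dict:
    """Inference server in its own pod, steered onto a GPU node pool."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "genai-inference", "labels": {"app": "genai-inference"}},
        "spec": {
            # Hypothetical node label identifying the inference node pool.
            "nodeSelector": {"pool": "gpu-inference"},
            # Tolerate a hypothetical taint that keeps other workloads off these nodes.
            "tolerations": [{
                "key": "dedicated", "operator": "Equal",
                "value": "inference", "effect": "NoSchedule",
            }],
            "containers": [{
                "name": "inference",
                "image": "example.com/genai-inference:latest",  # placeholder image
                "ports": [{"containerPort": 8080}],
                "resources": {"limits": {"nvidia.com/gpu": 1}},
            }],
        },
    }


def sidecar_game_server_pod() -> dict:
    """Game server and inference server share one pod (and one node)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "game-server-with-inference"},
        "spec": {
            "containers": [
                {
                    "name": "game-server",
                    "image": "example.com/game-server:latest",  # placeholder image
                    "ports": [{"containerPort": 7654, "protocol": "UDP"}],
                },
                {
                    "name": "inference",  # sidecar, reachable over localhost
                    "image": "example.com/genai-inference:latest",
                    "ports": [{"containerPort": 8080}],
                },
            ],
        },
    }


print(json.dumps(sidecar_game_server_pod(), indent=2))
```

In the sidecar layout the node must satisfy both containers at once, whereas the dedicated layout lets the inference pool be sized and scaled independently of the game servers.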
 
He discussed the advantages of the different options for integrating GenAI inference with game servers. Advantages of a turnkey solution include out-of-the-box game development use cases and faster time-to-value. Some models are available only through turnkey APIs and are not openly available in a form that can be containerized.

DIY solutions on Kubernetes work with openly available models that can be run in containers. For high-usage scenarios, such as game launches that bring an influx of concurrent users in a short amount of time, Kubernetes can be more cost-effective than pay-per-use APIs. Dedicated inference nodes are also easy to set up with Kubernetes features such as horizontal pod autoscaling (HPA) and scheduling with taints and tolerations.
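The cost argument comes down to a break-even comparison between a per-request price and a per-node hourly price. The sketch below uses entirely made-up prices and throughput figures to show the shape of that comparison; none of the numbers come from the talk or from any provider's price list.

```python
import math


def api_cost(requests: int, price_per_request: float) -> float:
    """Pay-per-use: cost grows linearly with the number of requests."""
    return requests * price_per_request


def node_cost(requests: int, node_hourly_price: float,
              requests_per_node_hour: int, hours: int) -> float:
    """Self-hosted: pay for enough nodes to serve the load spread evenly over `hours`."""
    nodes = max(1, math.ceil(requests / (requests_per_node_hour * hours)))
    return nodes * node_hourly_price * hours


# Hypothetical figures, for illustration only.
PRICE_PER_REQUEST = 0.002      # currency units per API call
NODE_HOURLY_PRICE = 2.0        # currency units per GPU node per hour
REQUESTS_PER_NODE_HOUR = 3600  # one inference per second per node
HOURS = 24

for daily_requests in (10_000, 1_000_000):
    api = api_cost(daily_requests, PRICE_PER_REQUEST)
    diy = node_cost(daily_requests, NODE_HOURLY_PRICE, REQUESTS_PER_NODE_HOUR, HOURS)
    cheaper = "pay-per-use API" if api < diy else "self-hosted nodes"
    print(f"{daily_requests} requests/day: API={api:.2f}, nodes={diy:.2f} -> {cheaper}")
```

Under these assumptions the API is cheaper at low volume because a self-hosted node has a fixed minimum cost, while self-hosted nodes win once request volume is high enough to keep them busy.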

Sharma's team ran tests using Stable Diffusion (for image generation) and Bloom (for text generation). Slightly better performance was observed with sidecars. In general, however, inference latency dominates any difference between the Kubernetes deployment methods. Dedicated inference nodes provide the most versatility, ease of use, and flexibility.
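The observation that inference latency dominates can be seen with simple arithmetic: when model inference takes far longer than the network hop, the choice of hop barely changes the total. The latency values below are assumptions chosen for illustration, not measurements from the talk.

```python
def relative_gap(inference_ms: float, hop_a_ms: float, hop_b_ms: float) -> float:
    """Fraction of end-to-end latency explained by the difference between two hops."""
    total_a = inference_ms + hop_a_ms
    total_b = inference_ms + hop_b_ms
    return abs(total_a - total_b) / max(total_a, total_b)


# Hypothetical values: a 2-second image generation, a localhost hop to a
# sidecar, and a cross-node hop to a dedicated inference server.
INFERENCE_MS = 2000.0
SIDECAR_HOP_MS = 0.2
CROSS_NODE_HOP_MS = 2.0

gap = relative_gap(INFERENCE_MS, SIDECAR_HOP_MS, CROSS_NODE_HOP_MS)
print(f"deployment choice changes total latency by {gap:.2%}")
```

With these assumed numbers the deployment method accounts for well under one percent of total latency, which is consistent with choosing a layout on operational grounds rather than raw speed.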
 
In concluding the talk, Sharma highlighted the advantages of using Kubernetes for GenAI in games in the areas of portability, flexibility, scalability and performance, and cost and efficiency. There is also a decent ecosystem of frameworks to choose from, including Spark, Beam, Dask, Ray, Rapids, and XGBoost.
 
Sharma ended the presentation with a demo that integrated GenAI into a multiplayer game with real-time image generation. The demo app is hosted on a GenAI inference cluster on GKE in Google Cloud and uses the dedicated nodes option. A GenAI API component routes traffic to the different models. The logic layer consists of NPC logic that handles text pre/post-processing for dialogue, image generation logic that handles image pre/post-processing, and a Vertex AI service that handles LLM pre/post-processing and talks to Vertex AI LLM endpoints. In terms of models, Llama 2 was used for text generation and Stable Diffusion was used for image generation.
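The role of a GenAI API component that routes traffic to different models can be sketched as a small dispatcher. This is not the demo's code: the `GenAIRouter` class, the request kinds and the stub backends are all hypothetical stand-ins, with the real model calls replaced by placeholder functions.

```python
from typing import Callable, Dict


def text_backend(prompt: str) -> str:
    """Stand-in for a call to a text generation model."""
    return f"[text model reply to: {prompt}]"


def image_backend(prompt: str) -> str:
    """Stand-in for a call to an image generation model."""
    return f"[image generated for: {prompt}]"


class GenAIRouter:
    """Routes each request kind to its registered model backend."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[str], str]] = {}

    def register(self, kind: str, handler: Callable[[str], str]) -> None:
        self._routes[kind] = handler

    def handle(self, kind: str, prompt: str) -> str:
        if kind not in self._routes:
            raise ValueError(f"no backend registered for request kind {kind!r}")
        cleaned = prompt.strip()  # minimal pre-processing step
        return self._routes[kind](cleaned)


router = GenAIRouter()
router.register("dialogue", text_backend)
router.register("image", image_backend)

print(router.handle("dialogue", "  Where is the blacksmith?  "))
print(router.handle("image", "a castle at sunset"))
```

Keeping the routing layer separate from the backends lets a team swap a self-hosted model for a turnkey endpoint without changing the game server's calls.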
 
For more information on KubeCon NA 2023, check out the conference website and the complete program schedule, as well as the Data and AI/ML session catalog.
 


