The latest release of MLPerf Inference introduces new LLM and recommendation benchmarks, marking a leap forward in the realm of AI testing.

The v3.1 iteration of the benchmark suite has seen record participation, boasting over 13,500 performance results and delivering up to a 40 percent improvement in performance. 

What sets this achievement apart is the diverse pool of 26 different submitters and over 2,000 power results, demonstrating the broad spectrum of industry players investing in AI innovation.

Among the list of submitters are tech giants like Google, Intel, and NVIDIA, as well as newcomers Connect Tech, Nutanix, Oracle, and TTA, who are participating in the MLPerf Inference benchmark for the first time.

David Kanter, Executive Director of MLCommons, highlighted the significance of this achievement:

“Submitting to MLPerf is not trivial. It’s a significant accomplishment, as this is not a simple point-and-click benchmark. It requires real engineering work and is a testament to our submitters’ commitment to AI, to their customers, and to ML.”

MLPerf Inference is a critical benchmark suite that measures the speed at which AI systems can execute models in various deployment scenarios. These scenarios span from the latest generative AI chatbots to the safety-enhancing features in vehicles, such as automatic lane-keeping and speech-to-text interfaces.

The spotlight of MLPerf Inference v3.1 shines on the introduction of two new benchmarks:

  • An LLM utilising the GPT-J reference model to summarise CNN news articles garnered submissions from 15 different participants, showcasing the rapid adoption of generative AI.
  • An updated recommender benchmark – refined to align more closely with industry practices – employs the DLRM-DCNv2 reference model and larger datasets, attracting nine submissions. These new benchmarks are designed to push the boundaries of AI and ensure that industry-standard benchmarks remain aligned with the latest trends in AI adoption, serving as a valuable guide for customers, vendors, and researchers alike.

Mitchelle Rasquinha, co-chair of the MLPerf Inference Working Group, commented: “The submissions for MLPerf Inference v3.1 are indicative of a wide range of accelerators being developed to serve ML workloads.

“The current benchmark suite has broad coverage among ML domains, and the most recent addition of GPT-J is a welcome contribution to the generative AI space. The results should be very helpful to users when selecting the best accelerators for their respective domains.”

MLPerf Inference benchmarks primarily focus on datacenter and edge systems. The v3.1 submissions showcase various processors and accelerators across use cases in computer vision, recommender systems, and language processing.

The benchmark suite encompasses both open and closed submissions in the performance, power, and networking categories. Closed submissions employ the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit a variety of models.

As AI continues to permeate various aspects of our lives, MLPerf’s benchmarks serve as vital tools for evaluating and shaping the future of AI technology.

Find the detailed results of MLPerf Inference v3.1 here.

(Photo by Mauro Sbicego on Unsplash)

See also: GitLab: Developers view AI as ‘essential’ despite concerns

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

  • Ryan Daws

    Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be sighted at tech conferences with a strong coffee in one hand and a laptop in the other. If it’s geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (

    View all posts

Tags: ai, artificial intelligence, benchmark, gpt-j, inference, large language model, llm, machine learning, mlcommons, mlperf, mlperf inference, testing


Please enter your comment!
Please enter your name here