Using a large-scale dataset holding a million real-world conversations to study how people interact with LLMs

A team of computer scientists at the University of California Berkeley, working with one colleague from the University of California San Diego and another from Carnegie Mellon University, has created a large-scale dataset of 1 million real-world conversations to study how people interact with large language models (LLMs). They have published a paper describing their work and findings on the arXiv preprint server.

Over the past few years, LLMs such as ChatGPT have burst into the public realm, giving users across the world an opportunity to interact with chatbots powered by artificial intelligence. Such access has led to millions of “intelligent” conversations between humans and chatbots, resulting not only in discussions but also in assistance with activities like programming, text writing and test taking.

In this new study, the research team wanted to know what sorts of interactions are occurring with AI chatbots, broken down by category: for example, what percentage of such conversations are about programming or a related topic. To find out, they obtained the texts of more than 1 million real-world conversations between people and 25 different AI chatbots and then categorized them by subject type.

The conversations were global in nature, involving people and their chatbots speaking 150 languages. To learn more about the nature of such conversations, the researchers used a program to randomly choose 100,000 of them for study.

The research team found that roughly half of all the AI chatbot conversations were centered on what they describe as “safe” topics, such as computer programming, requests for help in writing text, or even gardening. The most popular topic involved software errors and their solutions.

They also found that approximately 10% of such conversations involved what the team describes as “unsafe” topics: those with sexual or violent content. They found, for instance, many cases of people asking their chatbot to provide them with erotic stories or to engage with them in sexual role-playing.

The researchers suggest that studying real-world conversations between humans and LLMs can help makers of such systems define the way they want their products to be used and find out how well the controls designed to prevent “unsafe” use of such products are working.

More information:
Lianmin Zheng et al, LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset, arXiv (2023). DOI: 10.48550/arxiv.2309.11998

Journal information:
arXiv

Citation:
Using a large-scale dataset holding a million real-world conversations to study how people interact with LLMs (2023, October 16)
retrieved 18 October 2023
from https://techxplore.com/news/2023-10-large-scale-dataset-million-real-world-conversations.html

