People struggle to distinguish between humans and ChatGPT in five-minute text conversations: Study

Produced by: Tarun Mishra | Designed by: Manoj Kumar

Turing Test Experiment

Researchers from UC San Diego conducted a Turing test study to assess whether people can tell ChatGPT's responses apart from a human's in live conversation.

Research Origin

The idea stemmed from a class led by study co-author Cameron Jones, which explored whether large language models (LLMs) like GPT-4 could exhibit human-like intelligence.

First Study Findings

Initial results indicated that GPT-4 could pass as human in roughly 50% of interactions, which prompted a more controlled follow-up experiment.

Refining the Experiment

To control the setup more tightly, the team designed a two-player online game in which participants conversed with witnesses that were either humans or AI systems (GPT-4, GPT-3.5, or ELIZA).

Game Dynamics

Each participant held a five-minute conversation with a witness and then judged, based on that exchange alone, whether the witness was human or AI.
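
A minimal sketch of that game loop might look like the following. All names here are hypothetical, since the article does not describe the study's actual implementation.

```python
import random
import time

# Hypothetical sketch of the two-player game described above; none of
# these names come from the study's actual codebase.
WITNESS_TYPES = ["human", "GPT-4", "GPT-3.5", "ELIZA"]
ROUND_SECONDS = 300  # each conversation lasts five minutes

def run_round(send_to_witness):
    """One round: random witness assignment, a timed chat, a final verdict.

    send_to_witness(witness, message) -> reply stands in for routing a
    message to either a human confederate or a model API.
    """
    witness = random.choice(WITNESS_TYPES)  # hidden from the participant
    deadline = time.time() + ROUND_SECONDS
    while time.time() < deadline:
        message = input("You: ")
        print("Witness:", send_to_witness(witness, message))
    verdict = input("Was the witness human or AI? ")
    return witness, verdict

if __name__ == "__main__":
    # Trivial echo stub in place of the real routing layer; the echo
    # deliberately omits the witness label so the assignment stays hidden.
    print(run_round(lambda witness, message: f"(echo) {message}"))
```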

Model Performance

Participants could usually identify ELIZA and GPT-3.5 as machines, but they told GPT-4 apart from humans at a rate no better than random chance.

Human vs. AI Persuasion

Human witnesses, by comparison, convinced interrogators of their humanity only about two-thirds of the time, underscoring how blurred the line between human and AI conversation has become.
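
The claim that GPT-4's pass rate was "no better than random chance" can be made concrete with a binomial test against the 50% baseline a coin-flipping interrogator would produce. The sketch below uses hypothetical trial counts, since the article reports only rates, and assumes scipy is available.

```python
from scipy.stats import binomtest

# Hypothetical counts for illustration only: the article reports rates
# ("about 50%" for GPT-4, "two-thirds" for humans), not raw trial numbers.
TRIALS = 100
PASSES = {"GPT-4": 50, "Human witness": 67}

# Test each pass rate against the 50% an interrogator guessing
# at random would achieve.
for witness, judged_human in PASSES.items():
    result = binomtest(judged_human, TRIALS, p=0.5)
    print(f"{witness}: judged human {judged_human / TRIALS:.0%} "
          f"of the time, p-value vs. chance = {result.pvalue:.4f}")
```

With these illustrative numbers, GPT-4's rate is statistically indistinguishable from coin-flipping, while the human witnesses' rate clearly exceeds it.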

Real-world Implications

The study suggests that in practical settings, people may be unable to reliably tell whether they are conversing with a human or an AI system.

Future Research

The researchers plan to continue studying human-AI interaction to understand the social and technological implications of advanced LLMs like GPT-4.