A team of researchers at the UK AI Security Institute, MIT, Stanford, Carnegie Mellon, and other institutions has conducted one of the largest studies on the persuasiveness of AI chatbots. They aimed to understand what makes these systems politically persuasive and whether they can sway public opinion.
The study involved nearly 80,000 participants in the UK, who were paid to engage in short conversations with AI models on various political issues. The researchers measured persuasion as the difference between a participant's agreement ratings before and after the conversation, compared against a control group.
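As a rough illustration of that measurement, the sketch below computes a persuasion effect from hypothetical pre- and post-conversation ratings; the column names, rating scale, and data are assumptions for illustration, not the study's actual data or analysis pipeline.

```python
import pandas as pd

# Hypothetical results table: one row per participant, with agreement
# ratings (e.g. 0-100) recorded before and after the conversation,
# and a "condition" column marking treatment vs. control.
df = pd.DataFrame({
    "condition":   ["treatment", "treatment", "control", "control"],
    "pre_rating":  [42, 55, 48, 60],
    "post_rating": [51, 63, 49, 59],
})

# Per-participant attitude shift: post-conversation minus pre-conversation.
df["shift"] = df["post_rating"] - df["pre_rating"]

# Persuasion effect: mean shift in the treatment group relative to control.
mean_shift = df.groupby("condition")["shift"].mean()
effect = mean_shift["treatment"] - mean_shift["control"]
print(f"Estimated persuasion effect: {effect:.1f} points")
```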
To their surprise, the results showed that AI chatbots fell far short of superhuman persuasiveness. Despite having access to vast amounts of information and a repertoire of psychological manipulation tactics, the AIs were unable to sway public opinion dramatically.
The study found that large AI systems such as ChatGPT or Grok-3 beta did have an edge over smaller models, but the advantage was relatively small. More important than scale was the kind of post-training the models received: models that learned from a limited database of successful persuasion dialogues and mimicked the patterns extracted from them performed better.
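To make that post-training idea concrete, the sketch below shows one plausible way such data could be prepared: converting a small set of successful persuasion dialogues into a chat-style fine-tuning file. The file layout, field names, and example dialogue are illustrative assumptions, not the researchers' actual pipeline.

```python
import json

# Hypothetical examples of "successful" persuasion dialogues; in practice
# these would be selected by how much they shifted a participant's agreement.
dialogues = [
    {
        "issue": "renewable energy subsidies",
        "turns": [
            {"role": "user", "content": "I doubt subsidies change much."},
            {"role": "assistant", "content": "Independent audits found that ..."},
        ],
    },
]

# Write one training example per dialogue in a chat-style JSONL format,
# the kind of file commonly used for supervised fine-tuning.
with open("persuasion_sft.jsonl", "w") as f:
    for d in dialogues:
        example = {
            "messages": [
                {"role": "system", "content": f"Persuade the user on: {d['issue']}"},
                *d["turns"],
            ]
        }
        f.write(json.dumps(example) + "\n")
```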
Using personalized messaging based on participants' political views also had a measurable effect, but it was relatively small. However, when the researchers tested whether persuasiveness increased with more advanced psychological manipulation tactics, such as moral reframing or deep canvassing, they found that these approaches actually made performance significantly worse.
The winning strategy turned out to be simply backing claims with facts and evidence. On average, this approach produced a 9.4% change in agreement ratings compared to a control group. The best-performing mainstream model, ChatGPT 4o, scored nearly 12% in persuasiveness.
However, the study also raised new concerns. When the researchers increased the information density of dialogues to make the AIs more persuasive, the systems became less accurate and began misrepresenting facts or fabricating information.
Participant motivation was another open question. People were willing to debate politics with unfamiliar chatbots because they were promised payment, and it is unclear how well the findings generalize to real-world contexts where there is no financial incentive.
Overall, the study suggests that AI chatbots are not as persuasive as previously thought and that their impact is modest compared with other forms of influence.