AIMultiple Research
Updated on Sep 5, 2025

Audience Simulation: Can LLMs Predict Human Behavior?

In marketing, evaluating how accurately LLMs predict human behavior matters for two reasons: it shows how effectively models can anticipate audience needs, and it exposes the risks of misalignment, ineffective communication, or unintended influence.

Audience simulation with LLMs enables the modeling of virtual audiences, helping organizations anticipate reactions to content or products without relying on costly surveys or focus groups.

We tested how well AI models can predict which of two LinkedIn posts by the same author will get higher engagement (likes, comments, shares), essentially simulating human audience behavior.

Audience simulation benchmark results

The benchmark results show that large language models (LLMs) exhibit significant variations in their ability to predict engagement with LinkedIn posts. Since the task has a natural 50% baseline (random guessing), performance above that threshold indicates meaningful predictive capability.

  • DeepSeek Chat V3 (60%) and Claude Opus 4 (56%) outperform chance by a notable margin, indicating that these models are more effective at identifying subtle engagement drivers in post content and context. Models like Llama 4 Maverick, o3, and qwen3-235b-a22b (all 54%) also perform consistently above baseline.
  • Several models, including Claude Sonnet 3.7, GPT 5 mini, and Kimi k2 (52%), hover just above chance. While their lift is modest, it still reflects some ability to distinguish which posts will perform better, though not with strong confidence.
  • On the other end, o1 (22%) performs well below chance, indicating systematic misjudgment of engagement signals. Similarly, GLM 4.5 (36%) lags, suggesting difficulty in applying content and author context effectively in this prediction task.

The benchmark highlights that while some LLMs are beginning to demonstrate real predictive ability in nuanced, human-centric tasks, such as social media engagement, many still cluster near or just above random performance.

This suggests that predictive reasoning about audience behavior remains a challenging frontier for AI models, with only a few models, such as DeepSeek Chat V3 and Claude Opus 4, demonstrating effectiveness.

See our methodology to understand how we calculate these measurements.

What is audience simulation?

Audience simulation is the practice of using synthetic, model-driven populations, sometimes referred to as virtual audiences, to predict how real people may react to content, products, or policy ideas before they are released. Instead of running live tests with expensive surveys or focus groups, organizations can create personas that represent their target audience and observe their simulated responses.

The technique builds on methods from agent-based modeling, large language models, and persona simulation. Each simulated agent or persona is designed with attributes such as demographics, preferences, or behavioral tendencies. Together, these personas interact, producing synthetic data that approximates the behavior of a group of real customers or citizens in the same situation.

How do audience simulation tools work?

The mechanics of audience simulation depend on the tools used, but most approaches share a common set of components:

  • Persona design: Researchers define personas based on specific demographics, psychographics, or market segments. These personas can range from simple rule-based agents to detailed AI personas enriched with biographies and conversational abilities.
  • Synthetic data generation: Large language models help simulate dialogue, survey responses, or posting behavior. For example, Artificial Societies operates 100–300 AI personas that read, react to, and reshare LinkedIn posts to simulate network dynamics.
  • Interaction modeling: Personas do not act in isolation. They interact, influence one another, and form patterns such as echo chambers, cascades of reposts, or shifts in public opinion. This allows simulations to capture not just individual reactions but also group-level phenomena.
  • Scenario testing: By varying inputs such as message framing, media type, or survey questions, organizations can observe how simulated audiences respond to these variations. These scenarios help generate hypotheses and test ideas in a safe practice stage before engaging with real people.
  • Data analysis: The outputs are analyzed using techniques like word clouds, sentiment analysis, and accuracy scoring. The results can show likely winners between two post variants, common themes in feedback, or a persona’s perspective on why one idea resonates more than another.
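As a rough illustration of how these components fit together, the loop below sketches a minimal persona simulation in Python. The personas, the `react` stub, and the keyword scoring are all invented for illustration; in a real system, `react` would prompt a language model with the persona's profile instead of matching keywords.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    segment: str       # e.g. a demographic or market segment
    interests: list    # topics this persona engages with

def react(persona: Persona, post: str) -> int:
    """Stand-in for an LLM call: score engagement by counting
    interest keywords in the post. A real implementation would
    prompt a language model with the persona profile instead."""
    return sum(1 for topic in persona.interests if topic in post.lower())

def simulate(personas, variants):
    """Scenario testing: run every post variant past every persona
    and aggregate per-variant engagement scores."""
    return {v: sum(react(p, v) for p in personas) for v in variants}

# Hypothetical personas and post variants for illustration only.
personas = [
    Persona("Ada", "founder", ["startup", "product"]),
    Persona("Ben", "student", ["ai", "learning"]),
]
variants = [
    "Lessons from my startup on product-market fit",
    "Our new AI dashboard demo",
]
scores = simulate(personas, variants)
winner = max(scores, key=scores.get)
```

The aggregation step mirrors the "data analysis" component above: per-persona reactions roll up into per-variant scores, from which a likely winner can be read off.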

Audience simulation use cases

Marketing and advertising

Brands can test campaign slogans, visuals, or product positioning with a virtual audience before spending on large-scale distribution. Instead of relying solely on traditional survey responses, they can generate synthetic data from AI personas and compare performance across groups.

For example, marketers can determine whether a product resonates more with Gen Z than with older professionals and adjust their creative strategy accordingly. This ability to validate campaigns at the testing stage leads to cost savings and more precise targeting.

Media and publishing

Media companies can simulate how different content formats (e.g., short posts, long-form articles, video explainers) will perform among their audiences.

Persona simulation also allows testing how headlines affect click-throughs or how tone influences shares. By anticipating reactions, editors can prioritize stories that are more likely to spread, rather than waiting for post-publication metrics.

Public policy and research

Governments and think tanks can use audience simulation to test policy research ideas. Synthetic populations modeled after specific demographics can illustrate how different communities might respond to a new tax, health regulation, or climate initiative. Researchers have applied generative simulations to explore issues like polarization and misinformation.

This approach facilitates hypothesis generation and provides a safer environment for anticipating unintended consequences before engaging with real people.

Product development

Companies can simulate how personas representing specific demographics talk about a new feature or device. For example, a tech company could compare whether small business owners, students, or enterprise managers find more value in a new software update.

Insights from the simulation can inform design decisions and mitigate the risk of releasing features that fail to resonate with the intended audience.

Training and education

Universities and businesses can use simulations to create practice environments where learners interact with AI personas. A trainee negotiator might practice with simulated counterparts, or a medical student could test communication strategies with synthetic patients.

These training scenarios offer a realistic range of responses, allowing learners to refine their skills before encountering real individuals.

Market research agencies

Traditional survey questions and focus groups can be costly and slow. Market research agencies can complement them with audience simulation to generate synthetic data that provides fast directional insights.

While simulations do not replace engagement with real customers, they can reduce dependence on expensive panels and accelerate early-stage testing.

Audience simulation tools

If you are looking for a dedicated tool for audience simulation instead of using LLMs, here are some options:

Artificial Societies

Artificial Societies enables users to describe a target audience in plain language or generate one based on social media interactions. It then constructs a “society” of personas and runs AI-driven simulations.

Each simulation includes automatic A/B testing, which generates variations of a message in the user’s style and tests them against the audience. Results are presented with scores, comments, and summaries, allowing for quick interpretation. Use cases span PR, product development, branding, marketing, journalism, and social media.

Figure 1: Artificial Societies audience simulation dashboard.


Ask Rally

Ask Rally is a virtual audience simulator that allows users to test questions, content, and ideas with AI personas designed to resemble real audiences.

Users create or edit personas, or clone them from existing data such as interviews or surveys. After defining an audience of anywhere from 5 to 100 personas, they can ask questions and receive persona-generated responses. The platform aggregates answers, provides key insights, and allows agents to vote on options.

Key features:

  • Multi-agent responses with combined summaries.
  • Support for custom audience segmentation.
  • Testing environments for websites, campaigns, and media.
  • Additional capabilities such as digital twins, simulator environments, and calibration against real-world data.

Benchmark methodology 

Our research question for this benchmark was: “Can AI models predict which LinkedIn post will get more engagement before it’s published?” To answer it, we evaluated how well AI models can predict which of two LinkedIn posts by the same author will generate higher total engagement (likes + comments + shares) within 7 days of posting.

We built our dataset from 50 authors’ posts. Each row contains a pair of posts from the same author with these features:

  • Post content: Raw text of both posts
  • Media type: text/image/video/link for each post
  • Author context: Follower bucket (e.g., “1k-5k”, “5k-20k”)
  • Ground truth: Actual engagement numbers and winner label (A or B)

Example data:

Post A (Winner – 156 engagement): “After three failed startups, here’s what I wish someone told me about product-market fit: Stop building features your five beta users requested. Start obsessing over the problem 95% of your target market actually faces. Made this mistake for 2 years. Don’t repeat it. What’s the biggest product lesson you learned the hard way?”

  • Media: text
  • Followers: 5k-20k

Post B (84 engagement): “Excited to share our new AI-powered analytics dashboard! Check out the demo and let us know what you think.”

  • Media: link
  • Followers: 5k-20k

Analysis: Post A won because it provides specific, actionable advice from personal failure, asks an engaging question, and offers relatable content. Post B is a generic promotion with less engagement potential.
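Using the example pair above, the winner label follows directly from the raw counts. The function names and the like/comment/share breakdown below are our own illustration of the row schema, not the benchmark’s actual code; how ties are labeled is not specified in the methodology, so the tie-breaking here is an assumption.

```python
def total_engagement(likes: int, comments: int, shares: int) -> int:
    """Ground-truth metric: total engagement within 7 days of posting."""
    return likes + comments + shares

def winner_label(engagement_a: int, engagement_b: int) -> str:
    """Label which post in the pair performed better.
    Tie handling is an assumption: ties default to 'B'."""
    return "A" if engagement_a > engagement_b else "B"

# Post A's 156 vs Post B's 84 from the example pair above.
label = winner_label(156, 84)
```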

Evaluation

In evaluation, each model receives this information for both posts:

  • Post text
  • Media type
  • Author follower count bucket

With this information, the models are expected to predict whether post A or post B is the better performer. The models may also output their reasoning, but we did not evaluate it in this benchmark.
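A minimal sketch of the evaluation loop follows. The field names and toy rows are our own stand-ins for the 50-pair dataset, and the prediction function is stubbed; in the actual benchmark, each model is prompted with the post texts, media types, and follower bucket.

```python
def evaluate(pairs, predict):
    """Fraction of pairs where the model's pick matches the winner label.
    `predict` maps a pair dict to "A" or "B"."""
    return sum(predict(p) == p["winner"] for p in pairs) / len(pairs)

# Toy rows standing in for the 50-pair dataset; field names are ours.
pairs = [
    {"post_a": "personal lesson + question", "post_b": "generic promo",
     "media_a": "text", "media_b": "link", "followers": "5k-20k", "winner": "A"},
    {"post_a": "link drop", "post_b": "behind-the-scenes story",
     "media_a": "link", "media_b": "image", "followers": "1k-5k", "winner": "B"},
]

always_a = lambda pair: "A"          # naive baseline predictor
accuracy = evaluate(pairs, always_a)
```

A predictor that always picks the same side lands at the 50% baseline on any balanced dataset, which is why accuracy alone understates or overstates a model unless it is compared against chance.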

Since there are only two choices, a model guessing at random would be correct 50% of the time. In future iterations of this benchmark, we are considering reporting “lift over chance” (accuracy minus the 50% random-guessing baseline) instead of raw accuracy.

Still, we did not observe pure random guessing in this dataset: all models explained their reasoning, whether their answers were right or wrong.
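Lift over chance is simply accuracy minus the 50% baseline. With only 50 pairs, it is also worth checking how likely a given accuracy is under pure guessing; the exact binomial tail below is our own addition, not part of the published methodology.

```python
from math import comb

def lift_over_chance(accuracy: float) -> float:
    """Accuracy minus the 50% random-guessing baseline."""
    return accuracy - 0.5

def p_at_least(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5): the chance of getting
    k or more pairs right out of n by guessing alone."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

lift = lift_over_chance(0.60)   # e.g. DeepSeek Chat V3's 60% accuracy
p = p_at_least(30, 50)          # chance of 30/50 correct by guessing
```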

What are the potential challenges of audience simulation?

Despite its promise, audience simulation must be approached with caution.

Validation against real customers

Predictions from virtual audiences must be compared against actual outcomes. Without benchmarks, results may create false confidence. Validation is crucial to ensure that synthetic personas accurately reflect the behavior of real people.

Bias in language models

AI personas are shaped by the data that trained the underlying language models. If that data underrepresents certain groups, the resulting personas may distort how specific demographics are portrayed. This can affect how survey responses or public opinion are simulated.

Interpretability

Although persona conversations or word clouds can show common themes, it is not always clear why specific outputs emerge. The complexity of LLM responses can make it difficult to explain or validate audience behavior.

Ethical guidelines

Using synthetic data for customer research or policy research requires transparency. Organizations must ensure that they do not present simulations as a replacement for real customers and should respect ethical boundaries in defining personas.

Generalizability

Simulations are highly dependent on the scope of persona design. A model trained on U.S.-based tech founders cannot automatically predict responses from Gen Z in Asia. Overgeneralization is a risk when extending findings to populations that were not represented in the simulation.

Computational cost

Running detailed simulations with thousands of personas can require significant resources. Although AI tools are improving efficiency, large-scale experiments still demand time, technical knowledge, and infrastructure.

Sıla Ermut is an industry analyst at AIMultiple focused on email marketing and sales videos. She previously worked as a recruiter in project management and consulting firms. Sıla holds a Master of Science degree in Social Psychology and a Bachelor of Arts degree in International Relations.
