synthetic data Archives

Showing posts tagged with: synthetic data

04
May

Will Synthetic Data Take Over Market Research?

jerry9789
artificial intelligence, Burning Questions

Is Synthetic Data Replacing Consumer Research?

Interest in synthetic data has been gaining steam lately, judging from the conversations, posts, and discussions around it. Easier access to advanced modeling tools, improved efficiency and effectiveness, and the opportunity for better privacy governance are seen as the driving forces behind its surge in popularity. Some are even marketing synthetic research not just as a solution but as a replacement for traditional, slower, and often expensive research methodologies, presenting it as the faster, more cost-effective, and more modern approach to consumer research. But is synthetic data really the future of market research?

Image: Darlene Alderson

What Is Synthetic Data?

Put simply, synthetic data is information that wasn’t directly collected from real-world consumers or respondents. Instead, it’s artificially generated data produced by mathematical models or algorithms designed to mimic natural or real-world data.

It can be “fully synthetic,” meaning it was primarily created by algorithms with little direct connection to real respondents, or “partially synthetic,” where gaps in real data are filled in by AI. “Augmented” data, perhaps the more popular form of synthetic data, is simulated information built or extrapolated from a foundation of real-world information.

Aside from the benefits of speed, efficiency, and cost-effectiveness, synthetic data is helpful with various aspects of experimentation, such as preliminary testing, checking hypotheses, stress testing, iteration, and data fusion, even before any data is collected. It could help in cases where sample data is small or limited because real data is hard to acquire or the population is niche. And with rising expectations around privacy governance, synthetic data is being perceived as a way to share and analyze sensitive information more easily, with less risk of identifying respondents.

Image: Sherin Sam

What’s Keeping Researchers From Embracing Synthetic Data?

While researchers acknowledge the benefits offered by synthetic data and are interested enough to explore the new realms it unlocks, there’s no general rush to embrace the hot new tech. Instead, the push to adopt synthetic data seems to come more from research agencies and their marketing arms than from the researchers themselves or even their corporate clients.

So why aren’t more researchers jumping at the prospect of using synthetic data in their studies? Proponents of synthetic data extol its 80% match rate with real data; researchers, however, recognize that the remaining 20% could make or break the research, since that divergence may be exactly where the more nuanced opinions, emotion-driven responses, and meaningful differences are found.
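To see why a headline match rate can hide the part that matters, here is a minimal Python sketch. It assumes "match rate" means the share of questions where the synthetic answer equals the real one; the article does not define the metric, and the function name, question names, and answers below are invented purely for illustration.

```python
def match_report(real: dict, synthetic: dict) -> tuple[float, list[str]]:
    """Compare answers question by question.

    Returns the share of shared questions with identical answers, plus the
    list of questions where the two sources diverge.
    """
    shared = sorted(real.keys() & synthetic.keys())
    if not shared:
        raise ValueError("the two answer sets have no questions in common")
    mismatched = [q for q in shared if real[q] != synthetic[q]]
    rate = (len(shared) - len(mismatched)) / len(shared)
    return rate, mismatched


# Hypothetical answers: four factual items and one "why" item.
real_answers = {
    "q1_brand_aware": "yes",
    "q2_monthly_spend": "50-100",
    "q3_channel": "online",
    "q4_satisfaction": "satisfied",
    "q5_reason_for_switching": "felt ignored by support",
}
synthetic_answers = {
    "q1_brand_aware": "yes",
    "q2_monthly_spend": "50-100",
    "q3_channel": "online",
    "q4_satisfaction": "satisfied",
    "q5_reason_for_switching": "price",
}

rate, mismatched = match_report(real_answers, synthetic_answers)
print(f"match rate: {rate:.0%}")
print("diverging questions:", mismatched)
```

In this made-up case the overall rate looks healthy, yet the single mismatch is the emotion-driven "why" question, which is the kind of divergence researchers worry about.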

There’s also the stigma of associating the term “synthetic” with “fake.” There is a distinction, however, between synthetic data and fake data: the former is generated, while the latter is invented. Synthetic data draws from real data, so its outputs can be validated, tested, and compared; fake data offers no such accountability.

Understandably, there are concerns about the quality of the data the AI models are fed. Poor-quality data can lead to oversimplification, exaggeration, and bias reinforcement. Perhaps most importantly, researchers are concerned about losing the human element in synthetic data: a disconnect from the genuine behavior that is revealed when observing how people naturally, and often spontaneously, express themselves. These human truths are deeply tied to cultural, economic, and psychological factors; they ground insights in real-world behavior and elevate them above mere statistical guesswork.

In addition to AI hallucinations, synthetic data left to iterate on itself eventually produces nonsensical results. AI models have also been observed to be too eager to please, which could crowd out the contrarian responses, unexpected perspectives, and pain points that real participants often provide and that can lead to groundbreaking insights and discoveries.

Image: Michelangelo Buonarroti

The Future Of Market Research with Synthetic Data

Synthetic data might be far from the game-changer vendors are hyping it up to be, but researchers appreciate having it as another tool at their disposal. Synthetic research could work if you need to confirm or validate ideas quickly and on a budget, but it shouldn’t be expected to produce breakthroughs or reach deeper levels of understanding the way natural data does. It can help improve studies by filling in gaps, but those results would require validation, as well as transparency with stakeholders about the nature of the data behind them.

Rather than being a direct replacement, synthetic research could serve study objectives and goals better by complementing, supporting and/or augmenting consumer research. Synthetic data alone would give everybody the same information, but adding human input and oversight could mean the difference in uncovering resonant insights with a level of confidence that truly drives or influences decisions and actions.

Synthetic data is also not a one-and-done solution. Human behavior and attitudes aren’t fixed; they change over time, so why should synthetic data remain the same and stagnate? To foster credibility and uphold confidence, synthetic data would need consistent updating and stringent control, and it would need to be verifiable and reflective of the real world.

Yes, synthetic data can be powerful, but by itself it would falter without that all-important additional layer of humanity. Market research was, after all, founded on listening to real people, so synthetic data must be anchored in human truths to produce meaningful and relevant insights. AI-driven market research might be lauded by some as the way of the future, but it won’t spark anywhere near the same level of confidence that synthetic data empowered by human truth inspires.

Additional Reading:

Can Synthetic Respondents Take Over Surveys?

Trade Talk: Synthetic data: Intriguing, but is anyone actually sold?

Why We Don’t Talk About ‘Synthetic Data’—And Why You Shouldn’t Either

Synthetic data can benefit medical research — but risks must be recognized

When and How “Synthetic Research”– Qualitative Research Among AI-Generated Profiles– Might Be Useful, and its Limitations

Synthetic Data in Market Research: An Expert View on Why Natural Data First Still Wins

Featured and Top Images: cottonbro studio

15
Apr

Survey Fraud Is On The Rise- But Is AI To Blame?

jerry9789
artificial intelligence, Brand Surveys and Testing, Burning Questions

Is Rising Survey Fraud Due To AI?

Online survey fraud is on the rise, with 40% of surveys done in 2025 believed to be problematic. That translates to 2 billion of the 5 billion market research surveys completed each year. It’s easy to think that the increase is due to the growth of AI usage, but survey fraud was a problem long before ChatGPT caught fire in the mainstream.
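As a quick check on the arithmetic above, the short Python sketch below applies the quoted 40% rate to the quoted 5 billion annual surveys. It assumes the rate applies uniformly across all surveys, which is a simplification; the function and constant names are made up for this illustration.

```python
def problematic_surveys(total_surveys: int, fraud_rate_pct: int) -> int:
    """Return how many surveys are problematic at the given percentage rate."""
    return total_surveys * fraud_rate_pct // 100


TOTAL_SURVEYS_PER_YEAR = 5_000_000_000  # annual survey volume quoted in the article
FRAUD_RATE_PCT = 40                     # share believed to be problematic

estimate = problematic_surveys(TOTAL_SURVEYS_PER_YEAR, FRAUD_RATE_PCT)
print(f"{estimate:,} problematic surveys per year")
```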

Market research has struggled with survey fraud, and the poor-quality data it produces, for over two decades. More than simple “errors,” fraud can significantly skew or distort findings with noise and bias. This can lead to “flat” or negligible results, or to insights that prove unactionable on closer inspection. Additionally, survey fraud wastes time, effort, and resources, including those expended on detecting and cleaning up fraudulent data. More importantly, it undermines confidence in the market research industry.

AI may be poised to exacerbate survey fraud, especially now that we’ve begun exploring the realms of synthetic data. Experts agree that AI fraud is still in its early stages, but organizations have already prepared measures to combat it, such as observing typing and mouse movement patterns, identifying “copy/paste” behavior, and flagging nonsensical or incoherent responses. These measures also extend to anticipating or simulating how AI agents would be designed to convincingly mimic human respondents taking surveys and avoid detection, then devising ways to preemptively counter those tactics.

Image: Towfiqu barbhuiya

What Is Human Survey Fraud?

At present, data quality is under increasing threat mostly from human fraud powered by “click farms,” more than from the AI kind. For all the operational efficiency and productivity AI brings, building AI agents sophisticated enough to convince survey platforms that a “real human” is participating is difficult and expensive to scale at this time, while the models that can be deployed cheaply and in large numbers are comparatively easy to detect. It would therefore be more cost-effective for fraudsters to forgo sophisticated AI agents for now and simply stick with human-powered click farms.

Of course, that doesn’t stop those engaging in human survey fraud from utilizing AI along with bots, VPNs, and other contemporary technology; their efforts have resulted in the aforementioned 40% survey fraud rate. While the picture that comes to mind is an overseas operation in a room with several employees and computers, the pandemic pushed click farms in low-wage countries to expand to home-based setups that use multiple smartphones to take part in surveys simultaneously. They’ve even promoted their activities through social media, sharing experiences, information, and advice in groups, forums, and video-sharing sites on how to enter surveys, pass screenings, and the like, paving the way for more fraudsters to participate and aggravate the problem.

Another considerable contributing factor to the growing fraud rate is hyperactive respondents, or professional survey takers who attempt to participate in as many surveys as possible within a given period of time. They exploit systems and farm incentives by pretending to be legitimate participants and repeatedly entering the same survey with the help of VPNs, device spoofing, cookie clearing, browser emulators, and AI-generated text. Different studies on survey fraud have found hyperactive respondents in every source, panel, and exchange.
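The two behaviors described above, entering the same survey repeatedly and taking an unusually large number of surveys in a short period, can be sketched as simple rules. The Python below is a hypothetical illustration, not any panel's actual method: the `fingerprint` identifier, the 24-hour window, and the threshold of 10 are all assumptions, and a real fingerprint would itself have to survive the VPNs and device spoofing mentioned above.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def flag_hyperactive(entries, max_per_window=10, window=timedelta(hours=24)):
    """Flag respondents who look hyperactive.

    entries: iterable of (fingerprint, survey_id, timestamp) tuples.
    Returns {fingerprint: [reasons]} for flagged respondents only.
    """
    by_respondent = defaultdict(list)
    for fingerprint, survey_id, timestamp in entries:
        by_respondent[fingerprint].append((timestamp, survey_id))

    flagged = {}
    for fingerprint, items in by_respondent.items():
        items.sort()  # chronological order
        reasons = []

        # Rule 1: the same survey entered more than once.
        per_survey = Counter(survey_id for _, survey_id in items)
        if any(count > 1 for count in per_survey.values()):
            reasons.append("repeat_entry")

        # Rule 2: too many entries inside any sliding time window.
        start = 0
        for end in range(len(items)):
            while items[end][0] - items[start][0] > window:
                start += 1
            if end - start + 1 > max_per_window:
                reasons.append("too_many_surveys")
                break

        if reasons:
            flagged[fingerprint] = reasons
    return flagged


# Made-up entries: "a" re-enters one survey, "c" takes five surveys in minutes.
base = datetime(2025, 1, 1, 9, 0)
entries = [
    ("a", "s1", base),
    ("a", "s1", base + timedelta(hours=1)),
    ("b", "s1", base),
    ("b", "s2", base + timedelta(hours=2)),
]
entries += [("c", f"s{i}", base + timedelta(minutes=i)) for i in range(5)]

flags = flag_hyperactive(entries, max_per_window=3)
print(flags)
```

Flags like these are better treated as prompts for review than as verdicts, since a legitimate respondent can occasionally trip a threshold.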

Image: Darlene Alderson

The Importance of Ownership of Data Quality

Measures and solutions against human survey fraud, like verification checks, logic-based trap questions, and post-survey cleanup, exist, but the high fraud rate calls their effectiveness into question. The prevalence of hyperactive respondents indicates that the present system of vetting and filtering participants is not only falling short but also lacks teeth in flagging these repeat offenders.
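For readers unfamiliar with logic-based trap questions, the following Python sketch shows one way such checks might be wired up. Everything here is assumed for illustration: the `Response` class, the `q_trap` instructed-response item, and the age versus years-employed rule are invented examples rather than checks taken from any specific platform.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    respondent_id: str
    answers: dict = field(default_factory=dict)


# Hypothetical instructed-response trap: the question tells the respondent
# exactly which option to select.
TRAP_ANSWERS = {"q_trap": "blue"}


def age_is_consistent(answers: dict) -> bool:
    """Logic check: nobody can have worked longer than they have been alive."""
    return answers.get("years_employed", 0) <= answers.get("age", 0)


LOGIC_RULES = {"age_vs_years_employed": age_is_consistent}


def failed_checks(response, traps=TRAP_ANSWERS, rules=LOGIC_RULES) -> list:
    """Return the names of every trap or logic rule the response fails."""
    failures = [q for q, expected in traps.items()
                if response.answers.get(q) != expected]
    failures += [name for name, rule in rules.items()
                 if not rule(response.answers)]
    return failures


def triage(responses):
    """Split responses into accepted IDs and a queue for human review."""
    accepted, review_queue = [], []
    for response in responses:
        problems = failed_checks(response)
        if problems:
            review_queue.append((response.respondent_id, problems))
        else:
            accepted.append(response.respondent_id)
    return accepted, review_queue


responses = [
    Response("r1", {"q_trap": "blue", "age": 34, "years_employed": 10}),
    Response("r2", {"q_trap": "red", "age": 22, "years_employed": 30}),
]
accepted, review_queue = triage(responses)
print(accepted)
print(review_queue)
```

Failed responses go to a review queue rather than being dropped automatically, which keeps a person in the loop for the final call.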

Perhaps rethinking human survey fraud might be key to fighting or even reversing the increasing fraud rate. Online survey fraud has been around for more than twenty years, so we as an industry need to stop thinking of it as a mere disruption and start treating it as a systemic and persistent challenge moving forward. We are already anticipating the inevitable rise of AI fraud, and the exponential growth of synthetic data in the coming years, by arming ourselves with innovative detection methods and safeguards; why not apply the same rigor, dedication, and layered approach to combating the present threat of human survey fraud?

And instead of limiting our renewed approach to reacting, reviewing, and restructuring, why not empower ourselves with a greater focus on ownership of data quality? Rather than accepting at face value that fraud has been filtered out beforehand, or trusting that it will be handled post-survey, we can assume responsibility for data quality every step of the way: evaluating participant behavior at every stage, erring on the side of caution by flagging suspected hyperactive respondents, and leveraging human expertise when distinguishing fraudulent responses. We can take advantage of AI and modern technologies to help us measure, track, and flag possible instances of fraudulent behavior, automating the process wherever relevant while being guided by human oversight.

Ownership of data quality can also go hand in hand with improving participant engagement and polling representativeness, potentially unlocking opportunities to discover new insights that would otherwise have been hidden by poor-quality data.

Let’s be real: fraud can never be fully removed from the survey process. But by caring about the data quality your market research firm provides, you can mitigate the dangers fraud poses while gaining value from the insights and breakthroughs you uncover with every challenge you master in this protracted campaign for survey data.

Image: Tumisu

Additional Reading:

The Fraud Problem Reshaping Survey Research

The Rising Issue of Bad Data in Online Surveys Causes and Contributing Factors

The Pervasive Threat of Tech-Enabled Fraud in Survey Research

Featured Image: geralt

Top Image: Towfiqu barbhuiya

28
Jan

What’s Happening Nowadays With Survey Samples? (Part 1)

jerry9789
artificial intelligence, Brand Surveys and Testing, Brandview World, Burning Questions

What is The Op4G / Slice MR Scandal?

Op4G (Opinions4Good) and its offshoot Slice were US-based market research companies whose senior leaders were indicted in April 2025 for selling fake market research over a 10-year period, generating $10M in fraudulent revenue. While they marketed their business model as maintaining “a quality, engaged membership panel” of individuals eligible to participate in surveys, in 2014 they began recruiting certain individuals, called “ants,” to complete surveys and increase revenue, even though this produced fabricated market data. Companies that purchased survey data from Op4G or Slice between 2014 and 2024 are encouraged to contact the U.S. Attorney’s office.

The scheme raises questions about how far this fraudulent market data has permeated the industry, especially since Op4G and Slice presented their survey findings as high quality and backed by ISO certification. It highlights the importance of upholding transparency and accountability in the market research industry, despite the availability of shortcuts that cut cost and time.

Image: jesben

What is Enshittification?

The Op4G / Slice MR scandal is perhaps emblematic of the enshittification of platforms. The term was popularized by Canadian writer Cory Doctorow in a 2022 blog post, and Wikipedia defines it as “a process in which two-sided online products and services decline in quality over time.” JD Deitch, who on a Greenbook podcast cited Doctorow’s article as the inspiration for his ebook, described enshittification as “what happens in platforms when they start to seek yield and profitability and growth.”

Together with Lenny Murphy on that Greenbook podcast, JD touched on how enshittification compounds two long-standing issues the sample market faces in producing high-quality, reliable market data: participant engagement and polling representativeness. The participant experience has been neglected and treated as an afterthought by the industry for so long that attracting a wide and diverse pool of engaged and relevant respondents has remained a constant challenge. When participants aren’t incentivized enough to engage with the survey experience, the quality of the data and insights produced risks falling short of its true potential. And when you simply aren’t attracting enough respondents, or giving those who aren’t inclined to participate in surveys a reason to change their minds, you’re missing the opportunity to tap into subsets of the population that could have offered new and interesting perspectives.

The emergence of AI exacerbates issues with, and attitudes towards, the participant experience. When client companies have not just years but decades’ worth of survey data and studies, they could simply shift spending away from participant-driven research to developing AI that produces synthetic data from their stock. And when market research companies don’t own or have access to that kind of survey information, desperate firms might resort to shortcuts like programmatic sampling or, as in the case of Op4G and Slice, fraudulent means of generating survey data and revenue.

The quality of the synthetic data produced from all that past data and those studies comes into question, too. Yes, it would depend on the quality of the training data that Large Language Models (LLMs) are fed. Excellent synthetic data would enable scaling and efficiency. However, excellent synthetic data would be tethered to the subject matter it excels at; deviating from that subject matter might produce less-than-desired outputs, far from potential breakthroughs or new discoveries. And despite AI’s best attempts to optimize based on what it was trained on, there’s always the risk of it hallucinating. When one cares enough to understand, working or investing with flawed data is simply intolerable.

Image: Tumisu

Featured Image: andibreit

Top Image: Tima Miroshnichenko

02
Dec

Can AI Replace Human Respondents In Qualitative Research?

jerry9789
artificial intelligence, Brand Surveys and Testing, Brandview World, Burning Questions

Like most industries these days, market research is no stranger to AI. Its broad applications include the employment of synthetic respondents, which are individual profiles constructed by Large Language Models (LLMs) from real or simulated data. They offer fast, cheap, and scalable synthetic data that closely mimics how human participants would respond, a boon for quantitative research. But can synthetic respondents be just as effective in qualitative research? Can AI-powered profiles fully take over the role of human respondents in market research?

Image: Diana

Synthetic Respondents and Qualitative Research

L&E Research recently hosted a webinar sharing their findings and observations from testing synthetic respondents across a variety of qualitative research tasks. They shared that AI characteristically produces quick, structured, and consistent surface-level insights. It does well at detecting macro trends in usage or preferences, screening concepts when you need to compare multiple ideas at scale, and spotting issues during survey testing. It is also capable of gap-filling, or simulating missing segments from known data, as well as bulk analysis for summarizing large volumes of open-ends quickly.

The key takeaway from L&E is that AI can describe what people do, but it falls short of telling why they do it. AI fundamentally excels at following patterns, but it struggles to find the emotional driver, the motivation behind certain responses. AI can match logic, but it can’t fill in tone, nuance, or context the way human insight and experience can.

Most AI models are also built on public data and may not know how real people would respond to certain questions. When the engineers tried to steer AI agents toward how real participants would respond, the agents rejected the steering and firmly stood by the perspective formed from the vastness of public data.

Additionally, AI can be absolutely and confidently wrong. Synthetic data can look convincingly human but since AI relies on patterns instead of experience, the air of confidence it puts up doesn’t guarantee accuracy.

Of course, the hosts added a disclaimer that this is where synthetic respondents are right now, as no one can tell how different things might be in the years to come. But the continued utilization of AI in market research, or any other industry for that matter, is inevitable thanks to the operational and executional efficiency it grants, and that is reason enough to continue studying and developing synthetic respondents.

Image: Ron Lach

Why The Human Factor Matters

In market research, emotions matter and context counts. AI can prove to be a powerful partner but it is no replacement for lived insight or validation. Human researchers are simply going to remain essential.

AI’s inherent structure and consistency are representative of its pursuit of perfection; however, humans are neither perfect nor simple. Humans are emotional and, oftentimes, irrational. AI participants would respond based on their perfect approximation of how a human being would, but the synthetic logic behind that would be narrower and more consistent, as it discounts the fact that humans are imperfect.

Humans also bring incredible complexity and a broader range of perception to the table. We can contradict ourselves, and this would be natural. One human participant’s perception and experiences could inform the difference in how they respond from the next, while synthetic data would be uniformly shaped by congruence and invariability, no matter how much effort or work is put into making AI come close to mimicking humanlike responses.

The complexity, variability, and randomness of human nature are desirable in qualitative research. The engineers recognized this and cautioned against overly guiding or influencing randomness in AI, warning that doing so “will hard-code your picture of randomness to the point where it is no longer random.”

AI can quickly give you bulk analysis, but you might not want to rush to bring it to your stakeholders, as they would question and challenge the quality and reliability of synthetic data. Human insight continues to be vital and irreplaceable when it comes to trust, nuance, and real-world complexity in market research.

Image: Kathrine Birch

The Hybrid Approach

At the end of it all, the hosts made the point that the webinar wasn’t meant to scare people away from synthetic data but rather to open a useful conversation about when it makes sense to take advantage of AI-generated personas and when to steer clear of them. In fact, they recommended a hybrid approach of employing virtual respondents and recruiting human participants, striking a delicate balance between synthesis and empathy.

Synthetic data would be great during the early exploratory stages of market research when you want to get an initial pulse check, something quick and good enough before getting people involved. But once you’re at the point when you need to uncover the emotional driver behind responses and decisions, understand or predict behaviors, or even gain a bit more confidence and trust in your findings, that’s when you bring in your human respondents.

This all aligns not only with the recent trend of companies coming around from the AI hype of the last few years but also with our stance on the appropriate use of AI, where we advocate for the responsible and ethical use of artificial intelligence. Instead of handing AI complete reins over all aspects of a business, or in this case all stages of research work, we at Cascade Strategies encourage the thoughtful and practical application of artificial intelligence in combination with or enhanced by human experience, values and discretion.

To find out how our brand of inspired and enlightened human thinking can help you with your market research needs, please contact us here.

Additional Reading:

Can Synthetic Respondents Take Over Surveys?

Featured Image: Darlene Alderson

Top Image: Michelangelo Buonarroti

A highly innovative, award-winning market research and consulting firm with over 31 years’ experience in the field. Cascade provides consistent excellence not only in traditional methodologies such as mobile surveys and focus groups, but also in cutting-edge disciplines like Predictive Analytics, Deep Learning, Neuroscience, Biometrics, Eye Tracking, Virtual Reality, and Gamification.
