Jemma White, COO of Prolific, on why humans ensure AI safety

Jan-Erik Asplund

Background

We've tracked the rapid growth of a new cohort of data labeling companies like Handshake ($50M annualized AI revenue in May 2025), Mercor ($100M annualized revenue in February 2025), and Prolific, fueled by frontier labs’ transition to reasoning models.

To learn more, we chatted with Jemma White, the COO of Prolific ($32M Series A, Partech & Oxford Science Enterprises), which has built a network of 200,000+ participants serving thousands of organizations.

Questions

  1. Like a lot of companies in this space, when Prolific started it was working on a slightly different use case. How did Prolific evolve to meet AI use cases and how does Prolific's unique origin story inform its approach?
  2. Outsiders of this market look at companies like Prolific and think, well, these are more like professional services or at best labor marketplaces, matching demand with professionals to meet a specific need. What does that view get wrong about the market? You mentioned Prolific's technology platform. So if we could drill into that.
  3. Can we talk about specific tools like the API that can integrate into customer workflows, or the audience discovery tool? Is it possible to talk a little bit about those products and how they gel with the overall offering?
  4. The perception is that this market is really all about those contracts with Frontier Labs and that it’s a fairly small number of customers even if the spend is large. Is that accurate?
  5. So it's fair to say that as you're serving these other customer use cases, in addition to the AI use case, that helps feed the supply and availability of experts in the platform overall?
  6. AI model training and development is all about scale. How do you think about navigating that tension between quality of participants on the one side and breadth and depth on the other?
  7. So is there a sense in which you're seeing demand in AI across customers shift towards these more niche, for lack of a better term, needs that perhaps can be met by smaller groups? Is that a demand shift?
  8. Does that include cultural fluency, localization, and other language groups?
  9. And in terms of those personality or human attributes that you just mentioned, can you help us understand what those might be?
  10. How do you stay ahead of Frontier Lab and AI B2B demand? Do you read research and figure out what the latest post-training trends are and where labs are in their development arc with these foundation models and try to get ahead of it? How do you follow that ball as it moves?
  11. In terms of the AI use cases, what makes frontier labs or businesses stay with or expand their business with Prolific?
  12. How does a customer decide between build versus buy in this market? When would a customer decide to replicate this capability in-house versus rely on Prolific?
  13. In terms of Meta's investment into Scale AI, can we talk about what it reveals about the broader vendor landscape for human data and AI? How has it reshuffled the market from where you all sit?
  14. Is the rise of synthetic data a headwind for this market? Or is Prolific somehow positioned to benefit?
  15. Is there a sense in which synthetic data is good for achieving scale? But in terms of some of those nuances and the last mile, so to speak, of model development, you need human data?
  16. Many firms in this market report what are clearly gross revenue figures as topline results. How should customers and other stakeholders assess what the landscape is really like, and what the reality is behind those headline figures that we see?
  17. What's another big misconception outsiders to this market have about the market you're serving, specifically human data for AI?
  18. To paraphrase, the frontier models are getting smart. They've learned physics. They've learned advanced math. But what we're seeing is there's still a long tail need for ground truth in human data to ensure trust and safety, for evaluation purposes, for alignment to specific cultural, linguistic, and even personality vectors. Is that fair?
  19. In terms of benchmarks, can we talk about that? You guys have been putting out some messaging on benchmarks, how they should be treated, why they might be counterproductive in some contexts. If you could speak to that.
  20. If we could fast forward into the future, five years from now, what does this space look like for human data in AI? What's Prolific's role in it? If you could paint a picture for us about what that looks like.
  21. To dig into that layer, the app developer layer and the B2B AI layer as we called it, if we could just zoom in because I think that gets lost in discussions about Prolific’s market. Can you speak to how demand has grown and what you're seeing in terms of spend and what these use cases require?
  22. One follow-up. You mentioned 2,000,000 experts on the wait list. Have those folks been vetted?

Interview

Like a lot of companies in this space, when Prolific started it was working on a slightly different use case. How did Prolific evolve to meet AI use cases and how does Prolific's unique origin story inform its approach?

This dates back around twelve years, to the original inception of Prolific. Two PhD students at Oxford University needed to conduct research and set up their own technology platform to do it. Although they could run studies themselves, they were finding it difficult to get high-quality respondents on the other side of those studies.

They built a technology platform because there wasn't anything available to them, and it grew substantially over the next eight years. At its core was proprietary technology that matched researchers' needs to the right participants on the other end, and responses were almost instant, which was another big selling point for them.

As generative AI advanced, the demand for human-in-the-loop research grew and Prolific’s platform was perfectly positioned to meet that need for leading AI companies.

Prolific’s foundation of high-quality, rapid data delivery has remained constant since its inception. While technology drives the platform, what really resonates with AI customers is our commitment to fair participant treatment and precision matching: delivering the right participants to the right tasks at speed.

Prolific’s origins and PMF are very different from companies that started as recruitment firms or labor marketplaces and later pivoted into AI training. What sets Prolific apart is its technology, global reach, and scientific grounding. The academic rigor and unique selling points that defined us then are still our USPs today.

Outsiders of this market look at companies like Prolific and think, well, these are more like professional services or at best labor marketplaces, matching demand with professionals to meet a specific need. What does that view get wrong about the market? You mentioned Prolific's technology platform. So if we could drill into that.

Prolific has always been a technology platform, not a labor marketplace. From the very beginning, we built on two principles: ethical treatment of participants and smart technology to deliver high-quality data at speed. Twelve years ago we were already focused on fair pay and participant experience, and that remains core today.

What sets us apart are the technology layers under the hood: smart matching of participants to tasks, dynamic pricing, automated screening and verification, enrichment, and fraud detection. Together, these power a quality flywheel that ensures authentic human input and consistently high-quality responses. The result is speed: our customers can launch a study and get data back in minutes, whereas managed-services models often take weeks or months.

We operate a very transparent business model. Researchers can see how much our participants are being paid, set the payment themselves, and select their own participant groups based on thousands of attributes, including behaviors, credentials, experience, performance, and a growing number of personality traits too.

There’s a huge amount of profiling behind the participants on the Prolific platform, and that’s what sets us apart from marketplaces made up of job seekers with spare time. They simply don’t have the depth of profiling we have across 200,000 active participants, many of whom have been with us for years.

We rarely rely on external recruitment and we never look to cheaper labour markets to fill studies. With over 1 million people on our waitlist, we can meet almost all customer needs quickly from the participants we already have signed up while continually refreshing and enriching our data. That’s what enables both speed and quality. Prolific has the breadth of humanity as well as extensive expertise on the platform from doctors, scientists, teachers, smart generalists and so much more. That makes Prolific uniquely valuable.

Can we talk about specific tools like the API that can integrate into customer workflows, or the audience discovery tool? Is it possible to talk a little bit about those products and how they gel with the overall offering?

The audience discovery tool is an interesting one because it's just gone live on our external website in the last couple of weeks. Anyone can use it; you don't need to have signed up to Prolific. You can go onto the website now and type in the types of participants that you need, or think you need, and our algorithms in the background will tell you how many of those we have in the pool and give you a lot of different filters to narrow down to what you need.

If you're within the actual platform itself, it goes a lot deeper than that. It can actually suggest the participants you would need for your study. That's quite a unique feature. As I say, it's now available externally, so people can play around with it. It shows you multiple domain experts, the global reach, cultural preferences, languages and our qualified AI taskers. We developed our own proprietary AI training for a lot of our participants, and the number of qualified AI taskers active on the platform and available to all researchers is now over 5,000 and growing weekly.

API integrations are currently used mainly by our Frontier Labs, though they’re not limited to them. We have evolved this capability over the last few years as our customer base has grown, and we now work with a significant number of leading labs and are deeply integrated into their AI workflows. Customers can connect to our platform via API or operate on a self-serve basis directly. For those who want more support, we also provide managed services offering curation of bespoke participant pools, closer oversight, QA and more hands-on guidance for complex or larger studies, or simply a POC before self-serving in the future.

The perception is that this market is really all about those contracts with Frontier Labs and that it’s a fairly small number of customers even if the spend is large. Is that accurate?

Prolific works with four main customer segments, and Frontier Labs are a key driver of our growth in AI research. But I’ll start with academics, because that’s where Prolific began. Our academic base is our heritage, and it gives us strong credibility in the market thanks to their drive for quality and the scientific rigor of their research. Academics remain a significant part of our business, running everything from AI studies to behavioral research.

Our second customer category is Frontier Labs who are at the cutting edge of foundation model development. They rely on Prolific for a wide range of studies, from red teaming and safety evaluations to cultural nuance and other post-training and fine-tuning work. We have been a trusted partner in this space for over four years.

Our third customer category is AI focused B2B companies developing their own products and smaller models. They’re using Prolific for product testing, safety evaluations and a growing range of other use cases.

Finally, we serve the broader enterprise market: businesses seeking corporate insights, customer insights, and general market research. This segment is largely self-serve: customers value the diversity of our participant pool and can run their own studies with minimal interaction from our team. It is a growing market for us, with strong product-market fit and low-touch scalability.

So it's fair to say that as you're serving these other customer use cases, in addition to the AI use case, that helps feed the supply and availability of experts in the platform overall?

Absolutely. Because we’re still a relatively small company, we’re agile, and the variety of use cases we support keeps Prolific fresh while maintaining a diverse participant pool.

Frontier Labs’ demands change week to week, and we have to stay close to those needs and respond quickly. We’re fortunate that we can structure our roadmaps and supply networks to keep pace and ensure we continue to deliver.

We already have a very large participant base, around 200,000 active at any given time and a further 2 million on our waitlist that we can tap into as demand grows. This means we rarely need external recruitment. Only in recent months have we sourced small numbers of very specific skills to backfill what we couldn’t access from the 2.2 million already in our network.

This scale gives us much greater control over quality and really fuels the speed of research on the platform, with a time to fill of less than an hour in most cases. We have years of history with our participants: we know their performance metrics, behaviors, and preferences, and we have verified their credentials and skills, which allows us to confidently stand behind the quality of the work they deliver.

AI model training and development is all about scale. How do you think about navigating that tension between quality of participants on the one side and breadth and depth on the other?

That’s an easy one; we never compromise on quality. At Prolific, we would rather run something smaller at the highest standard and stand behind the results than chase massive pools of people where the quality bar can’t be guaranteed, or turn to lower-paid countries where fair pay and participant experience may be compromised.

There is a tension, of course. Earlier in the AI training cycle, labs often needed large-scale, broad participant pools for simpler data annotation, and we could deliver that. Today, demand is shifting. We are now in the experience era of AI training and evaluation, and model builders are increasingly focused on smaller, more specialized groups for red teaming, safety work, and cultural fluency, where depth and expertise matter more than scale. Again, we are well placed to deliver these whilst maintaining a high quality bar for authentic human data.

So whether it’s breadth of humanity at scale or highly targeted expertise, Prolific can deliver, and deliver faster than most other data providers. The line we’ll never cross is sacrificing quality for quantity; that’s not a path to success for us or our customers.

So is there a sense in which you're seeing demand in AI across customers shift towards these more niche, for lack of a better term, needs that perhaps can be met by smaller groups? Is that a demand shift?

Yes, the demand has definitely shifted. Over the past 12 to 18 months, AI companies have been highly focused on specialized expertise: linguists, PhDs, STEM graduates, scientists. But that phase won’t last forever. Models are already reaching a point where they surpass the knowledge of the experts who trained them.

In my opinion, the next frontier is humanity. Companies will need models that interact more like people, capturing very human attributes, personality, and behaviors. Human input will remain essential, both to provide those responses and to evaluate them for safety and trustworthiness.

That’s where Prolific is uniquely positioned to win. Our pool is deeply profiled not just on credentials, but on human attributes and behavioral data, giving us a truly human intelligence layer. That need isn’t going away; if anything, it will only increase over the next few years, across Frontier Labs and, more widely, any company developing AI products and tools, as ongoing monitoring of model outputs and user experience becomes critical.

Does that include cultural fluency, localization, and other language groups?

Definitely. Prolific has built a diverse base over many years. We have participants from over 40 countries with fluency in more than 80 languages, allowing us to offer customers globally representative samples, and many of these participants are qualified and experienced in completing AI tasks.

This diversity is especially valuable for AI app developers, who need cultural nuance and language fluency to ensure their outputs are appropriate for the markets they serve. While Frontier Labs have long focused on language specialisation, we’re now seeing growing demand from the AI B2B segment for exactly this capability.

And in terms of those personality or human attributes that you just mentioned, can you help us understand what those might be?

Resilience is a good example, especially in trust and safety work: can someone handle the outputs or images they’re exposed to? Perceptiveness and reasonableness matter too. Even personality traits like temperament, whether someone is quick-tempered or remains calm under pressure, can be highly relevant.

We are seeing increasing demand for these kinds of human attributes and traits, and we are able to track those signals through our Audience Finder tool and through requests directly from customers. Over the next five years, I think this is going to become a huge part of human data, both for model training and evaluation and for businesses building their own AI technology.

How do you stay ahead of Frontier Lab and AI B2B demand? Do you read research and figure out what the latest post-training trends are and where labs are in their development arc with these foundation models and try to get ahead of it? How do you follow that ball as it moves?

We do this in a couple of ways. First, we have a fantastic in-house AI research team, led by our VP of Data and AI, made up of data scientists and AI researchers who stay close to the latest research and developments. We also build many of our own AI tools internally, which helps us stay on the cutting edge.

Equally important, we are listening to demand signals from our customers every day. Working directly with Frontier Labs gives us early signals of where the market is heading, and tools like Audience Finder help us spot demand trends across our participant base. That combination of our in-house expertise and constant customer feedback definitely keeps us ahead of the curve, if that is at all possible in this field.

In terms of the AI use cases, what makes frontier labs or businesses stay with or expand their business with Prolific?

There are a few things. First is our API integration, which is deeply embedded into customer workflows. That gives them always-on access to participants and the ability to scale up or down instantly. If they need ten participants one day and a thousand the next, Prolific can flex with their needs, and they keep a great deal of control over their own research and data.

Second is quality. Once customers experience the consistency and standards of our participant base and platform, it’s very hard to walk away from that.

Transparency is another differentiator. Managed service providers often operate as a black box, while Prolific is far more transparent, giving customers control over workflows alongside speed to data.

And speed really is a major advantage. We can fill complex AI studies in hours, sometimes minutes, versus the weeks or months it can take with managed service providers. A customer can wake up with an urgent need, launch a study on Prolific, and have high-quality data the same day with no lengthy scoping, SOWs or negotiations.

That combination of self-serve flexibility with the option of managed services when needed gives Prolific a unique edge our competitors simply don’t offer.

How does a customer decide between build versus buy in this market? When would a customer decide to replicate this capability in-house versus rely on Prolific?

Frontier Labs often maintain their own annotator pools, but even the biggest labs still rely on external partners like Prolific. Internal pools simply don’t offer the same reach, depth, or diversity, and there’s always a need for external validation, a need that will grow as regulation slowly creeps in.

Even when labs use in-house annotators, demand for Prolific remains strong. Sometimes they don’t get the right answer internally and want to validate or get a second opinion. Other times they need access to a participant group that doesn’t exist in their internal pool.

So it’s rarely a choice of one or the other, it’s almost always both.

In terms of Meta's investment into Scale AI, can we talk about what it reveals about the broader vendor landscape for human data and AI? How has it reshuffled the market from where you all sit?

It was certainly an interesting transaction, and it has reshaped the market. Many customers have stepped back from Scale over independence concerns, which has pushed a wave of work back into the market. Some of that has come to Prolific, some to other managed service providers.

It also shows the lengths model creators will go to gain an edge, even spending billions to bring services in-house. But the results suggest it hasn’t been as successful as expected, and the reason is quality. At the end of the day, the quality of human data going into these models determines the quality of the outputs.

The good news is that model builders have become much smarter about spotting quality issues quickly; what used to take months to detect is now understood far earlier in the process. That shift reinforces Prolific’s position: quality and speed are what matter most.

Competitors like Surge and Handshake have also benefited from the reshuffling, but there is more than enough demand in this space. For us, it’s reaffirmed that staying focused on high quality human data and representing the breadth and depth of humanity will ensure our long-term success.

Is the rise of synthetic data a headwind for this market? Or is Prolific somehow positioned to benefit?

Probably two parts to my answer to this one. I don't think it's a headwind. Synthetic data will continue to be around to train models for quite some time, but it will be needed alongside human data. Synthetic data alone is not going to be enough, and you will always need that human evaluation and human in the loop in order to ensure that the synthetic data being put in and the outputs that are being generated are still safe and trustworthy and of the quality that the model creators want them to be.

The other part to that is that synthetic data can coexist with human data. In the future, our platform may be able to tell you whether your research needs synthetic data, human data, or a mixture of both. That's certainly an orchestration angle that Prolific will be investing in going forward.

So rather than reducing demand for human data, synthetic data may actually increase it. Some model creators, like Cohere, lean heavily on synthetic data, especially in enterprise contexts, but they still need human input to validate and guide it. Ultimately, quality human data remains essential, and that is not likely to change in the next 10 years at least.

Is there a sense in which synthetic data is good for achieving scale? But in terms of some of those nuances and the last mile, so to speak, of model development, you need human data?

I think that's pretty accurate. It's very good for generating volume, but it'll only get you so far. At the very tail end, as you said, you need humans to validate the outputs produced from the synthetic training data that's gone in.

Synthetic data has been used by all the major model creators for years, but there comes a point where domain experts outperform the data. A doctor evaluating a healthcare response, for example, will always provide more value than a textbook dataset. That human intelligence layer is essential for the nuance and safety required in model development and that is the space Prolific plays well in.

Many firms in this market report what are clearly gross revenue figures as topline results. How should customers and other stakeholders assess what the landscape is really like, and what the reality is behind those headline figures that we see?

I would love to know the reality behind some of those headline figures myself; I am not sure they always reflect the true picture. They make for great storytelling, but gross revenue or GMV includes everything that’s paid out to contractors and participants, plus service delivery costs for those using BPO models, which most of them are. What really matters for durability is net revenue and margins.

It’s encouraging to see growth across the market, but the key question is whether it’s sustainable. Having been a CFO before becoming COO, I always pay close attention to the numbers, and I enjoy the storytelling and bravado in podcasts of late. It certainly reassures me that we also have strong growth numbers and defensibility, as our margins are excellent. Ultimately, it comes down to net returns and margin, not just topline growth.

Share of wallet is great and important, and it’s clear many companies are doing a lot for their customers, who have deep pockets, but the real test is: what are participants being paid? What margins are companies actually generating? And is the model sustainable over the long term? Those are the questions stakeholders should be asking.

What's another big misconception outsiders to this market have about the market you're serving, specifically human data for AI?

The biggest misconception is that human data won’t be needed for long, that it is just a temporary bubble before models outgrow human input. In reality, the opposite is true. As models become smarter, the need to continually evaluate their outputs for safety, trustworthiness, and humanity only increases.

We see clear demand trends emerging. Trust and safety is becoming a major area of focus. Cultural nuance, multilingual data, and diversity of perspective are also going to be critical, especially as AI B2B customers move beyond the Frontier Labs and build products for global markets.

Prolific is in a fortunate position because of our academic research heritage. We expect to see a surge in academic work exploring how AI is developed, how it’s used, and its impact on society and human behavior. That’s a growing opportunity and a unique element of our platform.

And beyond AI, consumer insights and product testing aren’t going away either. With our four strong customer segments, we’re well placed to serve demand wherever it shifts, and we are experiencing positive growth in all areas of the business because we have great product-market fit and proven technology. Human-based research and validation is going to be needed for a long time to come.

To paraphrase, the frontier models are getting smart. They've learned physics. They've learned advanced math. But what we're seeing is there's still a long tail need for ground truth in human data to ensure trust and safety, for evaluation purposes, for alignment to specific cultural, linguistic, and even personality vectors. Is that fair?

That’s exactly right. The need for expertise isn’t going away anytime soon. Knowledge workers, especially in areas like healthcare, are still critical. We haven’t yet seen the full potential of AI in healthcare, and it is a space where I am personally excited to see the developments. AI has the ability to improve services globally, expand access where it doesn’t exist today, and make healthcare systems more efficient for all of humanity.

So yes, deep expertise will continue to matter. And beyond that, the long tail you mention, the uniquely human attributes, will remain essential for the longer term. Trust, safety, cultural nuance, personality and human context are things models can’t yet self-generate. Human input will be needed to keep AI safe, aligned and grounded.

In terms of benchmarks, can we talk about that? You guys have been putting out some messaging on benchmarks, how they should be treated, why they might be counterproductive in some contexts. If you could speak to that.

Yes, our in-house AI research team has developed *HUMAINE*, our leaderboard, now in its second version, and they’ve done a fantastic job building it out.

My skepticism about traditional benchmarks and leaderboards is that they can be easily gamed. Large companies, in particular, have the resources to optimise for benchmark performance, which doesn’t always reflect real-world quality. There’s still a lot of work to be done before benchmarks and leaderboards become truly trustworthy, not just for foundation models, but also for apps and smaller models.

We’re not there yet as an ecosystem, but it’s an area we’re watching closely. For Prolific, this is an exciting space, and enhancing our leaderboards and benchmarks will be a major focus over the next 12 months.

If we could fast forward into the future, five years from now, what does this space look like for human data in AI? What's Prolific's role in it? If you could paint a picture for us about what that looks like.

It's the million-dollar question, isn't it? Planning a business is not as simple as it was five years ago. Certainly, the AI space keeps us all on our toes, which is why I like it, and why I like scaling businesses: no two days are the same.

In five years, regulation will be a defining force, with AI safety and compliance work requiring human validation. Compliance and oversight will be far more developed and embedded. The EU is already ahead, and momentum is building in the US. That will require more humans in the loop to validate outputs, ensure models are safe and trustworthy, and meet regulatory standards.

As for the Frontier Labs, even they don’t fully know what their needs will be in five years. What I do know is that human data will remain critical, and Prolific will be there as a trusted partner. At the same time, our academic research base isn’t going anywhere, and I expect it will continue to grow as academics explore AI’s societal and behavioral impact.

Finally, the AI app developer space is going to be a major growth driver. That’s where we will see some of the most exciting innovation, and Prolific is well positioned to power it. So five years from now, I see a more regulated, more human-in-the-loop ecosystem and Prolific still right at the center of it.

To dig into that layer, the app developer layer and the B2B AI layer as we called it, if we could just zoom in because I think that gets lost in discussions about Prolific’s market. Can you speak to how demand has grown and what you're seeing in terms of spend and what these use cases require?

Spend is growing rapidly in this segment. It’s not just about safety and evals, there is a lot of product testing and grassroots research happening with us. We’re also seeing a shift toward multimodal work, not only text and reasoning but also audiovisual research. Many of these companies are building specialised models in a particular modality, so the research variety is huge.

This will be a really important growth area over the next three to five years. Customers in this segment like the self-serve nature of our platform, but at this stage they often need a bit more support. As they mature, those relationships deepen, and we already have some long-standing customers here. It’s an exciting and fast-growing space for Prolific.

One follow-up. You mentioned 2,000,000 experts on the wait list. Have those folks been vetted?

Yes, they’ve gone through initial screenings, so we already hold a lot of data points on them. But to join the platform, they still need to pass our full verification checks and be in demand. We manage supply and demand carefully to protect participant experience.

That balance has always been important to Prolific. We want participants to log in and find a wide variety of studies, so they stay engaged and with us for years. Just as customers are sticky, our participants need to be sticky too and that comes from looking after them. We treat participants as our customers as much as researchers. Ensuring they have a great experience and can meaningfully benefit from the platform is a big part of what makes Prolific unique.

Disclaimers

This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.
