The Design Psychologist | Psychology for UX, Product, Service, Instructional, Interior, and Game Designers

The Why Behind Sample Size: How Many People Do You Really Need to Test With?

Thomas Watkins Season 1 Episode 9

How many participants do you need to test in order to make valid research claims? In this episode, we dive deep into the science and psychology behind sample sizes in user testing. Whether you're working with five users or five hundred, the number you choose can shape the story your research tells—and how credible your findings appear to stakeholders.

  • Why sample size is one of the most misunderstood elements in product research
  • The psychological impact of “too few” vs. “just enough” users in high-stakes design reviews
  • Whether the popular idea that "you only need to test five users" is a myth or a useful research guideline
  • How to determine the right number of participants based on your research goals

By the end of this episode, you’ll have a clearer, more confident approach to choosing sample sizes. This will help you create better, more intuitive, and scientifically sound designs.

Never miss an episode.
If you’d like a note when new episodes of The Design Psychologist drop, join the newsletter. I’ll send you fresh insights on psychology and design straight to your inbox.
Sign up for the newsletter here; it only takes a moment. → https://3leafdesign.substack.com

[0:00] How many people do you really need to test before you can trust your research? Picture this. You're in a high-stakes meeting presenting your UX research findings. The design team is nodding along, the stakeholders seem engaged, until someone raises their hand and challenges your entire study. They ask, how can you possibly make these claims when you've only tested a dozen people? Suddenly, doubt creeps into the room as you're explaining yourself. Did you test enough people? Did you make the right research choices? In today's episode, we'll be peeling back the layers of sample size, a concept that can make or break the credibility of your research. We'll uncover how the size of your sample affects the reliability of your findings, and we'll explore why some studies require thousands of participants while other studies can draw powerful insights from just a handful of people. We'll also tackle one of the biggest UX research myths. Do you really need only five users to uncover usability issues? By the end of this episode, you'll have a clearer and more confident approach to choosing sample sizes. This will help you create better and more intuitive,

[1:17] scientifically sound designs. And if you've ever struggled with determining the right number of people for your research studies, you won't want to miss this episode. So let's dive in.

[1:27] A few years ago, I was presenting some research to a client of mine. We had done a whole bunch of different research methodologies, and we were presenting the results, saying, okay, this is what we found with this research, and this is what we found there, and the client is listening. But then a skeptical person in the audience raised his hand and said, how could you possibly draw these kinds of conclusions when you only researched a couple of dozen people? Well, we tried to explain, in this business setting where you can't get too much into academic details, that given the type of research, we thought we were standing on pretty firm ground with the number of participants in the study. We also weren't making extraordinary claims. So depending on the type of research and the types of claims you're making, the number of people required to draw certain conclusions can change. This skeptical individual seemed to be under the impression that in order to make a scientifically valid statement generalizing to a population of people, and in this case it was a small population of users, you needed hundreds or even thousands of people in order to make a statement of truth. So let's take a step back and think about sample sizes.

[2:52] Imagine that you are a politician running for office. You're looking at polling data, and you see that you are polling at 45% and your primary opponent is polling at 42%. Are you doing well? It might depend, right? If you hear that the margin of error is two points, then maybe you feel somewhat good that you're actually ahead, that your lead reflects a real difference. Versus if you hear that the margin of error is 10 points, then you're probably less confident, because there's a huge overlap between the two estimates; the numbers are varying so much that you can't be entirely confident who's really ahead.

[3:50] Sample size is about reducing that variance. Now, the exact details are a little tricky here, because there's a difference between variance, confidence interval, and margin of error, but the point is basically the same.
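
To make the polling example concrete, here's a minimal sketch of how a margin of error for a polled proportion is computed, using the standard normal-approximation formula (the poll numbers are hypothetical, not from the episode):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a polled proportion p
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 45% support.
print(f"n=1000: +/- {margin_of_error(0.45, 1000):.1%}")  # ~3.1 points
print(f"n=100:  +/- {margin_of_error(0.45, 100):.1%}")   # ~9.8 points
```

Note the square root: quadrupling the sample only halves the margin of error, which is exactly the shrinking variability described here.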

[4:07] The variability, how much something varies by, shrinks as you bring more and more people into the study. So here's the interesting thing: whatever we're studying, the sample we need depends on the normal level of variation in the thing we're studying. It also depends on a few other things, like effect size. Effect size is something like this: let's say you have an exercise program, and you say, okay, in this program, after weeks of exercise, people lost an average of two pounds in a month-long period. That would probably be a small effect size. Versus if you said, okay, in this exercise program, people lose 20 pounds on average, then that's a much larger effect size. You need to study a bigger sample of people in order to detect a tiny effect size. So that's a little bit of what plays into the need for sample size, and it addresses one misconception in research: the idea that you always need a huge number of people in order to make a statement.
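
One common way to put a number on effect size for a difference in means is Cohen's d: the difference between group means divided by the pooled standard deviation. A minimal sketch with made-up numbers echoing the exercise example (the 8-pound standard deviation is an assumption for illustration):

```python
def cohens_d(mean_a: float, mean_b: float, sd_pooled: float) -> float:
    """Standardized difference between two group means."""
    return (mean_a - mean_b) / sd_pooled

# Hypothetical: pounds lost vs. a control group, pooled SD of 8 lbs.
print(cohens_d(2.0, 0.0, 8.0))   # 0.25 -> small effect, needs a big sample
print(cohens_d(20.0, 0.0, 8.0))  # 2.5  -> huge effect, visible in a small one
```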

[5:24] Here's another misconception, on the other side, and it comes from an article written by Jakob Nielsen around the late 90s or early 2000s, in which he states that all you need to study is five people in usability testing in order to find the truth. At least, that's the takeaway a lot of people got from the article. This misconception resulted in a lot of researchers,

[5:55] especially junior researchers, thinking that they could dramatically cut back on research resources, regardless of the scenario, and just test five people. But what Jakob Nielsen was actually saying is that there is a curve of diminishing returns: when you test five people, you tend to find most of the major problems within those first five, and after five, there are far fewer big problems left to find.

[6:29] What Jakob Nielsen also advocated in his writings is that you have to do lots of iterative testing. You don't do just one experiment; you do lots of small experiments where you find some answers. You run a test and say, okay, when we ran this test, we found three major problems. So let's stop the research, fix those problems, and then do the research again. Then you test five more people, find some more problems, and keep going iteratively instead of running one huge, gigantic study. You tend to find the problems in a very economical fashion; it's very efficient, and you keep moving the product toward being a better product without doing a huge, expensive study. That's the point Jakob Nielsen was actually making, but so many people misinterpreted it and thought that research only ever requires five people, and that's not the case. There's another caveat as well: those five people, or whatever the size of a research sample is, is per group. So if you're testing multiple different groups and subgroups of people, each group needs the sample size that's being used for the study.
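
The diminishing-returns curve behind Nielsen's argument comes from a simple probability model (Nielsen and Landauer): if a typical problem affects a proportion L of users, which Nielsen estimated at around 31%, then the share of problems found after testing n users is 1 − (1 − L)^n. A quick sketch, where the 31% default is Nielsen's published estimate and something you'd ideally tune to your own data:

```python
def share_of_problems_found(n_users: int, l: float = 0.31) -> float:
    """Expected share of usability problems uncovered after testing
    n_users, assuming each problem affects a fraction l of users."""
    return 1 - (1 - l) ** n_users

for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} users -> {share_of_problems_found(n):.0%}")
# 1 -> 31%, 3 -> 67%, 5 -> 84%, 10 -> 98%, 15 -> ~100%
```

Five users uncovering roughly 85% of the problems is where the famous guideline comes from; the remaining few percent are what the iterative rounds mop up.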

[7:53] So when we're thinking about picking the right number of participants for research, it really depends on the type of research we're doing. It turns out that in each area of research, norms get established, because in every area, somebody has done something called a power analysis. A power analysis is a somewhat long, complicated exercise that's difficult to get right, but the good news is that not everybody in the field needs to do it. It needs to be done maybe a few times, and the experiments are similar enough that if it applies in one case, it likely applies in another. So different areas of research come to understand what sample size is probably needed in order to draw their conclusions or to detect the effect they're trying to find.
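
For the curious, here's what a basic power analysis looks like in practice. This is a sketch using the statsmodels Python library for a simple two-group comparison of means; the effect size, alpha, and power values are conventional defaults rather than anything from the episode:

```python
from statsmodels.stats.power import TTestIndPower

# How many participants per group do we need to detect a medium
# effect (Cohen's d = 0.5) with 80% power at alpha = 0.05?
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, alpha=0.05, power=0.8
)
print(round(n_per_group))  # ~64 per group
```

Change the effect size to 0.2 (a small effect) and the answer jumps to nearly 400 per group, which is the whole point: smaller effects demand larger samples.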

[8:47] So going back to our concept of effect size, depending on what you're researching,

[8:52] there are different effect sizes. Okay, so let's take a moment to consider the different research methods that are relevant to UX researchers, and what some of the standards are around how many participants, or what sample size, is typically necessary in each of those methods. For focus groups, it's often recommended that you have between half a dozen and a dozen groups, where each group has three to five people, before you reach some type of saturation of certain ideas, because these are small groups and you're really just trying to capture the thematic ideas that start emerging in the group discussions. In a lot of field research, it's often recommended to have between 20 and 50 people,

[9:45] so that you can capture consistent observations amid the real-world variability you're dealing with.

[9:54] Take something like a mental rotation task. Imagine that you sit down and you're shown a picture of an object, maybe with some kind of complicated shape, and you're given another set of objects below, labeled A, B, C, and D. You're asked: of A, B, C, and D, which one is a rotated version of the object on top? Your task is to pick the object that is a rotated version of the one on top. That is called a mental rotation test, and in that area, the effect sizes tend to be bigger, so you can often test with fewer people; maybe a few dozen participants will show you the effect you're looking for. Other experiments involve priming, where you're shown something in an experiment and it subtly affects your behavior. You might be shown a picture of a robber, and this makes you act differently in the experiment, maybe a little more cautious, because you've been primed by something you've been exposed to. In that area, the effect size might be small to medium, so you might need to involve more participants in the experiment in order to see the effect.

[11:19] In other cases, like an area such as prospective memory, there's very often a much smaller effect size, so you have to research many more people in order to see that tiny effect. When we think of something like survey data, there are often hundreds or even thousands of people we need to involve, because generalizability is a big issue: you might be studying a big population, there are lots of factors affecting their opinions, and there might be subgroups of people you need to make sure are represented in the sample. On the other hand, for something like usability testing, as discussed earlier, you might need maybe five to ten people per study. As long as you're doing heavy iteration, and as long as you understand what kinds of claims you can make, you can test far fewer people.

[12:21] In lots of areas of experimental research, it might be 20 to 100 people per group in your experiment, and you can still find the effects. In that case, you want to be aware of the norms in those areas of research so that you're in line with what's necessary to find your effect. In A/B testing, you might need hundreds or thousands of people, because you're looking at tiny behavioral differences in people using a digital product; it's a small effect size, so you need lots of people to shine a light on it. In other types of research, you might be doing in-depth interviews, and that might require just a few dozen people, because each person produces a hefty, detailed output of data. You often don't need hundreds of people to gain valuable insights; a dozen or a few dozen people can be enough to pick up on things.
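
To see why the A/B-testing numbers get so large, here's a sketch of a sample-size calculation for detecting a lift in conversion rate from 5% to 6%, again using statsmodels (the conversion rates are hypothetical):

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# A 5% -> 6% conversion lift is a tiny effect in standardized terms.
effect = proportion_effectsize(0.06, 0.05)  # Cohen's h ~= 0.044
n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(round(n))  # ~4,000 users per variant
```

A one-percentage-point difference that could meaningfully move revenue still needs about 4,000 users in each variant before you can reliably tell it apart from noise.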

[13:28] Longitudinal analyses might require hundreds or even thousands of people, because you're looking at things over a prolonged period of time and there are lots of factors affecting the data. So those are some things to think about when considering how many people to include in your research study, and some of the factors that explain why it can be valid to have a few people in one study versus lots and lots of people in another.

[13:59] Now let's take a moment to think about extraordinary claims, and what might make a claim reasonable versus extraordinary.

[14:08] Think about the following example. Let's say I have a design, I want to see whether it works, and I test five people. In one situation, four of the five people got through the design flow; they think it's a terrific user experience, and they performed well in it. And then you say, yes, this design is definitely good and we can move forward with it. Versus if you test those five people and four of them had a catastrophic problem while trying to finish the flow, we have enough evidence to say that we're not confident in this design. That's not an extraordinary claim. But in the first case, if you said, yes, based on only these five people, it worked for four of them, and we can extrapolate and generalize to a larger population that this is the design to go forward with, that would not be a safe conclusion to make. So the type of conclusion and claim you're making affects how many people, how big a sample size, is necessary for you to be able to make those claims.
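
One way to see why the success claim is the extraordinary one: put a confidence interval around four successes out of five. A sketch using the Wilson interval from statsmodels:

```python
from statsmodels.stats.proportion import proportion_confint

# 4 of 5 users succeeded. What does that tell us about the true rate?
low, high = proportion_confint(count=4, nobs=5, alpha=0.05, method="wilson")
print(f"95% CI: {low:.0%} to {high:.0%}")  # roughly 38% to 96%
```

With five users, the data are consistent with anything from a design that fails most people to one that works for nearly everyone, which is exactly why generalizing success from five users isn't safe, while flagging a problem that four of five users hit is.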

[15:29] So in practical application, when we're thinking about how many people we need for our research, a lot of this in the real world might boil down to looking at the data and trying to get a feel for how big the effects are and how many more people we need to research. For example, sometimes in my consulting practice we administer a survey to a whole bunch of employees who work at a company. We might ask them how they would organize certain data, or how they would react to a certain organizational scheme, a certain screen, or certain terminology. Then we put it in a survey and email it out to a whole bunch of employees. Now, we might get just a few dozen responses.

[16:29] If some metrics come very close between multiple groups, if 30 people say they prefer option A and 40 people say they prefer option B, that might not be a real difference. But if we have several hundred people give a response, then we can feel more and more confident, as that sample size gets bigger, that the amounts we're seeing are real. And of course, we can run statistics; there are ways of using descriptive statistics to see what the variance is.
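
As a sketch of the kind of quick check you could run on that hypothetical 30-versus-40 split, here's an exact binomial test with scipy:

```python
from scipy.stats import binomtest

# 40 of 70 respondents preferred option B. Distinguishable from a
# 50/50 coin flip, or just noise at this sample size?
result = binomtest(40, n=70, p=0.5)
print(f"p-value: {result.pvalue:.2f}")  # ~0.28: could easily be noise
```

Scale the same 3:4 preference ratio up to 700 respondents and the p-value collapses to well under 0.001, which is the intuition above: bigger samples let you trust smaller differences.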

[17:03] In practice, though, we're often so rushed and on such a low budget that we don't do that kind of due diligence, truthfully. So here are a few examples from actual practice. If I'm consulting and we're doing cognitive walkthroughs to make sure that a certain design is not problematic, we might go through rounds of testing with maybe half a dozen people at a time, and we try to see if big problems emerge from small data sets. If there are constraints on budget or constraints on time, that might be more pressure for us to make a decision quickly, but we're going to keep in mind what kinds of claims we can make. If we're just trying to identify the big problems, and three out of five people got confused at a certain step, that is enough information for us to go back and take a second look as we iterate on the designs.

[18:11] And if we want to be able to confidently say that we can roll this out to a company, we will probably test dozens of people, maybe 40. If we find a low rate of people having problems or hitting obstacles within the flow, that's something we'll feel much more confident about advocating as a solution across an organization. If we're administering a survey, as we sometimes do in my consulting practice, we might look for a few hundred people when we're trying to capture the attitudes or

[18:52] impressions that people have when they're exposed to certain design options or organizational schemes. If, for example, the vast majority of a few hundred people say this organizational scheme makes sense and are able to get it, that gives us more confidence, because we're seeing a high success rate after testing a lot of people. Okay, to wrap this up: as researchers and design psychologists, we need to carefully consider sample size when creating our studies. The number of participants we involve affects the reliability of our findings, and there's no single right answer for every study. Some studies require thousands of participants for statistical reasons: for example, they aim to detect small behavioral patterns, or they're trying to generalize to large or diverse populations.

[19:53] Other studies can uncover powerful insights with just a handful of people, especially when we're studying effects that are strong and therefore show up more easily in the data, or when we're trying to identify critical usability problems early in the design process. This is why the idea that you only need to test five users for usability testing is true, but only in limited circumstances. While you can certainly reveal major usability issues with just five participants, you want to build certain features into your testing process, like testing in cycles and specifically hunting for show-stopping disasters.

[20:37] You also have to limit the types of conclusions you're able to draw from such a study.

[20:43] In the end, different research questions and methods call for different sample sizes. The right number depends on exactly what you're trying to uncover, whether that's broad trends, small behavioral changes, or major usability issues. Choosing the right sample size comes down to your research goals, the type of data you need, and the claims you plan to make with your data. Ultimately, sample size is part of your research strategy. Think of it as matching your method to your goals; that's what lets us observe effects that are meaningful and credible, and guide our projects to success.