Use the hypergeometric distribution when sampling without replacement from a small, finite population and you need the exact probability of obtaining a specific number of successes.
You’ll want it when pulling cards from a deck or selecting defective items from a small batch, where each draw changes the odds for the next one.
Quick Fix Summary: Use the hypergeometric distribution when sampling without replacement from a small finite population where you care about the exact count of successes. The formula is:
P(X=k) = [C(K,k) × C(N−K,n−k)] / C(N,n)
- N = population size
- K = total successes in population
- n = sample size
- k = desired successes in sample
A discrete probability model that gives exact probabilities for the number of successes in a sample drawn without replacement from a finite population.
This distribution spits out exact probabilities for the number of successes in a sample drawn without replacement from a finite population. It only produces whole-number outcomes, and its trials are dependent—each selection changes the remaining pool.
It is named for its derivation from the hypergeometric series in mathematics, reflecting its mathematical complexity compared to simpler distributions.
The name comes from the hypergeometric series in mathematics, which appears in its derivation. The “hyper-” prefix hints at the extra layer of complexity compared to simpler distributions like the binomial.
The hypergeometric distribution models dependent trials where each draw changes the probability for subsequent draws because items are not replaced.
Think of it this way: once you draw an item, it’s gone. That changes the probability for every subsequent draw. Spreadsheet and statistics tools include built-in functions, such as Excel’s HYPGEOM.DIST, that handle this scenario automatically.
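To make the dependence between draws concrete, here is a minimal sketch using a standard 52-card deck with 4 aces. The deck is an illustrative assumption, not one of the tool examples in this article. It shows how the chance of an ace shifts after the first ace has been removed.

```python
from fractions import Fraction

# Illustrative assumption: a standard deck with 4 aces among 52 cards.
aces, deck = 4, 52

# Probability that the first card drawn is an ace.
p_first = Fraction(aces, deck)

# If the first card was an ace, one ace and one card are gone from the pool.
p_second_given_first = Fraction(aces - 1, deck - 1)

# Probability that both of the first two cards are aces.
p_both = p_first * p_second_given_first

print(p_first, p_second_given_first, p_both)
```

With replacement, the second probability would stay equal to the first, which is the binomial setting.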
It differs from the binomial distribution because it assumes sampling without replacement, making trials dependent, while the binomial assumes replacement or an effectively infinite population.
It assumes sampling without replacement, making trials dependent. The binomial distribution, on the other hand, assumes replacement (or an effectively infinite population), so probabilities stay constant across draws.
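Avoid using it when your population is large relative to the sample or when you sample with replacement; in those cases the binomial distribution is either a good approximation or the correct model.
Skip it when your population is large relative to the sample or replacement doesn’t matter. A common rule of thumb is that the binomial distribution is a close enough approximation when the sample is no more than about 5 to 10 % of the population, which is usually true when N > 10,000 and the sample is modest. Also skip it if you’re sampling with replacement and use the binomial instead.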
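A rough way to see when the binomial shortcut is safe is to compare both models at the same success proportion. The sketch below uses only the standard library; the function names and the two population sizes are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exact P(X = k) when sampling n items without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with constant success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 15, 3  # sample size and successes wanted

# Same 10 % success proportion in a small and a large population.
for N, K in [(100, 10), (100_000, 10_000)]:
    exact = hypergeom_pmf(k, N, K, n)
    approx = binom_pmf(k, n, K / N)
    print(f"N={N:>7}: exact={exact:.6f} binomial={approx:.6f} gap={abs(exact - approx):.6f}")
```

The gap shrinks as the sample becomes a smaller fraction of the population, which is why the shortcut is tied to n relative to N rather than to N alone.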
Common real-world examples include quality control, card games, and marketing sampling where exact probabilities are needed for small finite populations.
Quality control: checking a batch of 100 light bulbs for 5 defectives. Card games: calculating the odds of drawing 3 aces from a 52-card deck. Marketing: estimating how many premium subscribers you’ll get in a 1,000-person sample from a 50,000-email list.
Calculate it manually using the formula P(X=k) = [C(K,k) × C(N−K,n−k)] / C(N,n), but use software for accuracy.
Start with the formula P(X=k) = [C(K,k) × C(N−K,n−k)] / C(N,n). Figure out your population size (N), total successes (K), sample size (n), and desired successes (k). Then compute the combinations and plug them in. Honestly, this is the kind of calculation you’ll want a computer to handle.
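The manual steps can be sketched in a few lines of Python using `math.comb` from the standard library. The numbers are illustrative (100 items, 10 successes, a sample of 15, exactly 3 successes wanted), and the variable names are this sketch’s own.

```python
from math import comb

# Illustrative values, not taken from a real data set.
N, K, n, k = 100, 10, 15, 3

ways_successes = comb(K, k)         # ways to pick k of the K successes
ways_failures = comb(N - K, n - k)  # ways to pick the other n - k items from the failures
ways_total = comb(N, n)             # ways to pick any sample of size n

prob = ways_successes * ways_failures / ways_total
print(f"P(X = {k}) = {prob:.4f}")
```

Python integers do not overflow, so the combinations stay exact until the final division.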
What’s the step-by-step solution?
Here’s how to compute a hypergeometric probability in three common tools.
1. Microsoft Excel 365 (2026 build)
- Open your workbook and pick the cell for the result.
- Type this formula:
=HYPGEOM.DIST(k, n, K, N, FALSE)
Swap in your actual numbers for the placeholders.
- Hit Enter—the cell shows P(X = k).
- For cumulative probability P(X ≤ k), just change the last argument to TRUE.
2. Google Sheets
- Select your target cell and enter:
=HYPGEOM.DIST(k, n, K, N, FALSE)
- Press Enter—Google Sheets gives the same result as Excel.
3. Python (SciPy 1.14+, 2026)
- Update SciPy if needed:
pip install --upgrade scipy
- Run this code:
from scipy.stats import hypergeom
N = 100 # population
K = 10 # successes in population
n = 15 # sample size
k = 3 # successes you want
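# Argument order: successes wanted, population size, successes in population, sample size
prob = hypergeom.pmf(k, N, K, n)
print(prob)
# For the cumulative probability P(X <= k), use hypergeom.cdf(k, N, K, n)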
If this didn’t work
- Wrong parameters: Triple-check N, K, n, and k. Mixing up K and N or using percentages instead of counts is a classic mistake.
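- Large population approximation: When the sample is a small fraction of the population (a common rule of thumb is under about 5 %), you can switch to the binomial distribution as a shortcut. The approximation error grows as the sampling fraction rises, so only do this if exact precision isn’t critical (NIST Handbook).
- Cumulative vs. point: FALSE gives the probability of exactly k successes, P(X = k); TRUE gives the probability of k or fewer, P(X ≤ k). Flip the wrong switch and your answer will be way off.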
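The parameter and cumulative-flag mistakes above can be guarded against in code. This is a minimal sketch, assuming a hypothetical helper named `hypergeom_dist` whose argument order imitates the spreadsheet formula; it is not a library function.

```python
from math import comb

def hypergeom_dist(k, n, K, N, cumulative=False):
    """Hypothetical helper; argument order mirrors HYPGEOM.DIST(k, n, K, N, cumulative)."""
    for name, value in (("k", k), ("n", n), ("K", K), ("N", N)):
        # Counts must be whole numbers, which catches percentages such as 0.1.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative whole-number count, got {value!r}")
    if K > N or n > N:
        raise ValueError("K and n cannot exceed the population size N")

    def pmf(j):
        # Outside the feasible range the probability is zero.
        if j < max(0, n - (N - K)) or j > min(n, K):
            return 0.0
        return comb(K, j) * comb(N - K, n - j) / comb(N, n)

    if cumulative:
        return sum(pmf(j) for j in range(k + 1))  # P(X <= k)
    return pmf(k)                                 # P(X = k)

point = hypergeom_dist(3, 15, 10, 100)
up_to = hypergeom_dist(3, 15, 10, 100, cumulative=True)
print(f"P(X = 3) = {point:.4f}, P(X <= 3) = {up_to:.4f}")
```

Printing both values side by side makes it obvious when the wrong mode has been used.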
Prevention tips
- Document your parameters: Jot down N, K, n, k, plus the date and data source. This keeps you from recalculating with the wrong numbers later.
- Use built-in functions: Don’t try to code the math yourself—Excel’s HYPGEOM.DIST or SciPy’s hypergeom function are faster and less error-prone.
- Validate with simulation: Run a quick Monte Carlo check. Generate 10,000 samples and count how often you hit k. The empirical rate should land within about one percentage point of the theoretical probability.
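The simulation check can be sketched with the standard library alone. The parameter values and the fixed seed are illustrative assumptions; `random.sample` draws without replacement, which matches the hypergeometric setting.

```python
import random
from math import comb

# Illustrative parameters: 100 items, 10 successes, sample of 15, want exactly 3.
N, K, n, k = 100, 10, 15, 3
trials = 10_000

rng = random.Random(0)                 # fixed seed so the run is repeatable
population = [1] * K + [0] * (N - K)   # 1 marks a success, 0 a failure

# Draw n items without replacement and count the runs with exactly k successes.
hits = sum(1 for _ in range(trials) if sum(rng.sample(population, n)) == k)

empirical = hits / trials
exact = comb(K, k) * comb(N - K, n - k) / comb(N, n)
print(f"empirical={empirical:.4f} exact={exact:.4f} diff={abs(empirical - exact):.4f}")
```

A large difference usually points to swapped parameters rather than bad luck.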
In 2026, the hypergeometric distribution remains unchanged mathematically, but tools like Excel, Google Sheets, and Python make it far easier to compute.
The mathematics of the hypergeometric distribution is long established and hasn’t changed. What’s new in 2026 is how fast and easy the tools are: Excel, Google Sheets, and Python all handle the heavy lifting now.
For a deeper dive into the mathematical derivation, see the Wolfram MathWorld entry.
Use a hypergeometric distribution when you want the probability of obtaining a certain number of successes in a sample of a given size drawn without replacement.
This discrete probability distribution helps you figure out the odds of getting exactly k successes when sampling without replacement from a finite population of size N.
It is called hypergeometric because the underlying series goes “over” or “beyond” the geometric progression; the prefix comes from the ancient Greek word ὑπέρ (“hyper”).
In a geometric progression the ratio of successive terms is constant. In a hypergeometric series that ratio is a rational function of the term index, so the series extends beyond the geometric case, hence the ancient Greek prefix ὑπέρ (“hyper”).
The hypergeometric distribution tells you the probability of selecting a specific number of successes from two groups without replacing members of the groups.
In statistics, this distribution function is often employed in random sampling for statistical quality control. It models scenarios where you’re selecting items from two distinct groups without putting any back.
The hypergeometric distribution is discrete, not continuous.
Unlike the normal distribution, which is continuous, the hypergeometric distribution only produces whole-number outcomes. It’s similar to the binomial distribution in that regard.
You know it’s a hypergeometric distribution when it’s defined by three parameters: population size, event count in the population, and sample size.
For example, imagine receiving a special order shipment of 500 labels where 2% are defective. That gives you a population size of 500, an event count (defectives) of 10, and you’d use the sample size to calculate your probabilities.
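As a sketch of how the label example plays out, the snippet below assumes an inspection sample of 20 labels. That sample size is not given in the text and is chosen only for illustration.

```python
from math import comb

N = 500  # labels in the shipment
K = 10   # defective labels (2 % of 500)
n = 20   # ASSUMPTION: number of labels inspected, not stated in the article

def pmf(k):
    """P(exactly k defective labels in the inspected sample)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p_none = pmf(0)
p_at_least_one = 1 - p_none
print(f"P(no defectives) = {p_none:.4f}")
print(f"P(at least one defective) = {p_at_least_one:.4f}")
```

Changing `n` shows how quickly a larger inspection sample raises the chance of catching a defective label.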
The hypergeometric distribution has also been applied in specialized settings, such as predicting the effect of surface deterioration on electrode behavior in the presence of two competitive processes.
That application is a niche one; the distribution’s core use remains counting successes in samples drawn without replacement.
The hypergeometric distribution is associated with sampling without replacement.
That’s its key feature. When samples are drawn with replacement, the discrete random variable follows the binomial distribution instead.
One example involves calculating the probability that a sample of 10 voters drawn from 196 contains 7 of the 101 females and 3 of the 95 males.
Here, C(101,7) is the number of ways to choose 7 females from 101, while C(95,3) is the number of ways to choose 3 males from 95. The denominator is C(196,10), the number of ways to choose 10 voters from 196, so the probability is [C(101,7) × C(95,3)] / C(196,10).
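The voter example can be evaluated directly with `math.comb`. This is a small sketch whose variable names are chosen for readability only.

```python
from math import comb

females, males = 101, 95
total = females + males               # 196 voters
chosen_f, chosen_m = 7, 3
sample = chosen_f + chosen_m          # 10 voters selected

ways_females = comb(females, chosen_f)  # choose 7 of 101 females
ways_males = comb(males, chosen_m)      # choose 3 of 95 males
ways_total = comb(total, sample)        # choose any 10 of 196 voters

prob = ways_females * ways_males / ways_total
print(f"P(7 females and 3 males) = {prob:.4f}")
```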
The assumptions include: the population size N is finite and known, every item is either a success or a failure, and the sample is drawn without replacement.
To use the hypergeometric distribution properly, you need to meet these conditions: your population size N must be finite and known, each item must fall into one of only two categories (which we typically call success or failure), and items must not be returned to the pool after they are drawn. The outcome is a count of successes, so the distribution is discrete.
Some forms of the hypergeometric distribution are symmetric, for example when exactly half of the population are successes (K = N/2).
The formula also has a useful symmetry: swapping the roles of K and n leaves P(X = k) unchanged, which can simplify calculations in specific scenarios.
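Two symmetries are easy to check numerically. The sketch below uses exact fractions and illustrative parameters. It verifies that swapping K and n leaves the probabilities unchanged, and that the distribution is mirror-symmetric when half of the population are successes.

```python
from fractions import Fraction
from math import comb

def pmf(k, N, K, n):
    """Exact hypergeometric probability as a fraction."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Symmetry 1: swapping the success count K and the sample size n.
N, K, n = 100, 10, 15
swap_ok = all(pmf(k, N, K, n) == pmf(k, N, n, K) for k in range(min(K, n) + 1))

# Symmetry 2: with K = N/2, P(X = k) equals P(X = n - k).
N2, K2, n2 = 20, 10, 6
mirror_ok = all(pmf(k, N2, K2, n2) == pmf(n2 - k, N2, K2, n2) for k in range(n2 + 1))

print(swap_ok, mirror_ok)
```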
Yes, the trials in a hypergeometric distribution are dependent on each other.
Like the binomial distribution, it’s used for multiple trials where you count successes and failures. The main difference? The trials depend on each other because each selection changes the pool for the next draw.
The normal distribution is continuous, not discrete.
While the hypergeometric and binomial distributions are discrete, the normal distribution is continuous. It’s used to describe variation in continuous variables rather than whole-number outcomes.
The multivariate hypergeometric distribution extends the standard version to cases with more than two different states of individuals in a group.
This is essentially an expansion of the hypergeometric distribution where you’re dealing with more than two distinct categories or states within your population.
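For more than two categories, the probability is the product of one combination per category divided by the total number of samples. The sketch below assumes a hypothetical urn with 5 red, 4 green, and 3 blue items; the function name is illustrative, not a library API.

```python
from fractions import Fraction
from math import comb, prod

def multivariate_hypergeom_pmf(population_counts, sample_counts):
    """P(drawing exactly sample_counts[i] items of category i, without replacement)."""
    if len(population_counts) != len(sample_counts):
        raise ValueError("both lists need one entry per category")
    N = sum(population_counts)  # population size
    n = sum(sample_counts)      # sample size
    ways = prod(comb(K_i, k_i) for K_i, k_i in zip(population_counts, sample_counts))
    return Fraction(ways, comb(N, n))

# Hypothetical urn: 5 red, 4 green, 3 blue; draw 2 red, 1 green, 1 blue.
p = multivariate_hypergeom_pmf([5, 4, 3], [2, 1, 1])
print(p)  # prints 8/33
```

With exactly two categories the function reduces to the ordinary hypergeometric formula.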
The lognormal distribution is continuous, unlike the Pascal (negative binomial), binomial, and hypergeometric distributions, which are all discrete.
When choosing between distributions, remember that the Pascal, binomial, and hypergeometric are all discrete. The lognormal distribution, however, is continuous and used for continuous variables.
Edited and fact-checked by the TechFactsHub editorial team.