
What Does A High E Value Mean?


A high E value in BLAST (by common convention, 0.05 or above) indicates a weak match that could easily have arisen by chance, so the similarity may not be biologically meaningful.

What does a high E value indicate in BLAST?

A high E value (≥ 0.05) signals a weak match that could occur by random chance

Think of the E value as a "how lucky was I to find this?" score. It is the number of hits scoring at least this well that you would expect to see purely by chance when searching a database of that size. Short queries and huge databases naturally push E values higher, so don’t panic if you see a 2.0 for a 20-nucleotide match (NCBI BLAST Help). The real story comes from looking at alignment length, percent identity, and bit score together with the E value, not from the E value alone.
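To see why search-space size matters, here is a minimal sketch using the standard bit-score relation E = m × n × 2^(−S′), where m is the query length, n the total database length, and S′ the bit score. The function name and the example numbers are illustrative assumptions. BLAST itself works with effective lengths after an edge correction, so the values it reports will differ somewhat.

```python
def expected_hits(query_len: int, db_len: int, bit_score: float) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Same query and bit score, two database sizes (illustrative numbers).
small_db = expected_hits(query_len=20, db_len=10**6, bit_score=40.0)
large_db = expected_hits(query_len=20, db_len=10**9, bit_score=40.0)

print(f"1 Mb database: E = {small_db:.2e}")
print(f"1 Gb database: E = {large_db:.2e}")
```

With everything else held fixed, a database 1,000 times larger gives an E value 1,000 times higher for the same alignment.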

How do I adjust BLAST settings to handle high E values?

Raise specificity by lowering the Expect threshold—use 0.001 for nucleotides or 1e-5 for proteins

Head to “Algorithm parameters” and tighten the Expect threshold (the BLAST+ command-line default is 10). For really short reads under 50 nucleotides, you might temporarily keep it at 10 or higher to avoid missing anything important, but flag those hits for extra scrutiny later (NCBI BLAST Handbook, 2026). After the search finishes, keep only hits with E < 0.01 and actually look at the top hits before you decide what they mean.
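The post-search filtering step can be sketched in a few lines, assuming the results were saved in BLAST+ tabular format (`-outfmt 6`) with its default twelve columns. The function name `filter_hits` and the two example rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=0.01):
    """Yield hits (as dicts) whose E value is below max_evalue."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) < max_evalue:
            yield row


# Two invented rows standing in for a real results file.
example = io.StringIO(
    "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350\n"
    "q1\tsubjB\t100.0\t15\t0\t0\t4\t18\t90\t104\t2.1\t28.2\n"
)
kept = list(filter_hits(example))
print([hit["sseqid"] for hit in kept])
```

On the command line, the `-evalue` option of the BLAST+ programs sets the Expect threshold for the search itself. Filtering afterwards, as above, lets you run a permissive search once and try several cutoffs.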

Why do short sequences often return high E values?

Short sequences have fewer informative characters, making matches more likely to occur by chance

An exact 15-mer is expected to turn up about once by chance in a billion-base database (10^9 / 4^15 ≈ 0.93), which corresponds to an E value around 1.0. At that level the chance of at least one spurious hit is roughly 63%, so it is worse than a coin flip. Longer queries give you far more statistical muscle, which slashes those false positives (PMC study on BLAST statistics). If short reads are unavoidable, double-check with something solid like PCR or Sanger sequencing before you trust the results.
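The arithmetic behind the short-sequence problem can be checked with a back-of-the-envelope sketch. It assumes random DNA with equal base frequencies, counts exact matches on one strand only, and ignores edge effects, so it is a rough model and not BLAST's actual statistics. The function names are illustrative.

```python
import math


def chance_kmer_hits(k: int, db_bases: int) -> float:
    """Expected exact occurrences of one k-mer on one strand of random DNA."""
    return db_bases / 4 ** k


def prob_at_least_one(expected: float) -> float:
    """Poisson probability of one or more chance hits."""
    return 1.0 - math.exp(-expected)


for k in (15, 20, 30):
    e = chance_kmer_hits(k, 10**9)
    print(f"k={k}: expected={e:.3g}, P(>=1)={prob_at_least_one(e):.3g}")
```

Each extra base divides the expected number of chance matches by four, which is why a few more nucleotides of query length make such a large difference.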

Can I trust a high E value if the percent identity is also high?

No—always pair E value with alignment length and percent identity for reliability

Here’s the trap: near-perfect identity over just 10 base pairs might still be random noise and come with a sky-high E value. A safer rule? Accept nucleotide matches only when they’re >75% identical across >50 bp, and protein matches only when they’re >30% identical across >60% of the query (NCBI C-Search Guidelines). Use bit score and query coverage as your backup witnesses to confirm you’re not chasing ghosts.
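The rule of thumb above is easy to encode as a small check. This is only a sketch of the thresholds quoted in this article; the function and parameter names are hypothetical, and percent identity and query coverage are assumed to be given on a 0 to 100 scale.

```python
def passes_rule_of_thumb(pident: float, aln_len: int, query_cov: float,
                         is_protein: bool) -> bool:
    """Apply the article's identity/length thresholds to a single hit."""
    if is_protein:
        # Protein: >30% identity across >60% of the query.
        return pident > 30.0 and query_cov > 60.0
    # Nucleotide: >75% identity across >50 bp.
    return pident > 75.0 and aln_len > 50


# A short, perfectly identical nucleotide hit still fails the length test.
short_hit = passes_rule_of_thumb(pident=100.0, aln_len=12,
                                 query_cov=100.0, is_protein=False)
long_hit = passes_rule_of_thumb(pident=88.0, aln_len=240,
                                query_cov=95.0, is_protein=False)
print(short_hit, long_hit)
```

A check like this is a first-pass screen. Hits that pass still deserve a look at bit score and query coverage.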

What additional filters can reduce false positives from high E values?

Apply the Low-complexity filter and switch to curated databases like RefSeq

Repetitive or low-complexity sequences match many unrelated subjects, producing spurious hits whose scores look better than they deserve. Turn on the Low-complexity filter to clear out the clutter (NCBI Advanced Options). Working with metagenomic data? Swap the default nr database for RefSeq or another curated source to cut down on nonsense hits. Before you even run BLAST, clean up your raw reads: fastq-dump (v3.0.5) only converts SRA archives to FASTQ, so strip adapters and low-quality bases with a dedicated trimmer such as cutadapt, Trimmomatic, or fastp. That small step saves you headaches later.
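To get a feel for what "low complexity" means, here is a simplified stand-in based on the Shannon entropy of base composition. It is not the algorithm behind BLAST's Low-complexity filter, and the 1.0-bit cutoff and function names are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the base composition, in bits per base."""
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_low_complexity(seq: str, min_entropy: float = 1.0) -> bool:
    """Flag sequences whose composition entropy falls below the cutoff."""
    return shannon_entropy(seq) < min_entropy


for seq in ("A" * 30, "ACGT" * 8):
    print(seq[:12], round(shannon_entropy(seq), 2), looks_low_complexity(seq))
```

A composition-only measure like this misses simple repeats such as ATATAT, whose entropy is a full 1.0 bit, so treat it as a teaching aid and rely on BLAST's own filter for real searches.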

Edited and fact-checked by the TechFactsHub editorial team.
Written by David Okonkwo

David Okonkwo holds a PhD in Computer Science and has been reviewing tech products and research tools for over 8 years. He's the person his entire department calls when their software breaks, and he's surprisingly okay with that.
