COVID-19 Viral Genome Analysis Pipeline
Enabled by data from GISAID

Relative Frequency Change By Geographical Region

Last data update: Aug 11, 2022



This tool employs contemporary GISAID data to identify adequately sampled geographical regions and to determine which of those regions are experiencing a significant shift (either up or down) in relative frequency of specific variants.


Founder effects and non-representative regional sampling can impact the frequency of a variant in any given region. But a consistent pattern of change, with one variant increasing in relative frequency across a range of geographic regions, provides evidence suggesting that a variant may have a selective advantage, or be linked to a site with a selective advantage.

For a change in the relative frequency of a variant to be observed in a geographic region, three requirements must be met:

  1. Both variants must at some point be co-circulating in the region.
  2. Sampling must cover a long enough time to discern a change.
  3. There must be enough samples available for an observed difference to be statistically significant.

Specifically, for the geographic regions in the chart:

  1. We define an "onset" day as the first day on which the cumulative number of sequences is at least 15 and each form is represented at least 3 times.
  2. We define a "delay" day as two weeks after the onset day, and we require that at least 15 sequences be available from the post-delay period (i.e., from the delay day to the last day of sampling).
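The onset and delay rules above can be sketched in a few lines of Python. This is only an illustrative reading of the two rules, not the pipeline's actual code: the function name, the `(collection_date, form)` input layout, and the choice to count only the two forms being compared toward the cumulative total are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta

MIN_TOTAL = 15       # cumulative sequences required on the onset day
MIN_PER_FORM = 3     # each form must have been seen at least this often
DELAY_DAYS = 14      # the delay day is two weeks after the onset day
MIN_POST_DELAY = 15  # sequences required from the delay day onward


def find_onset_and_delay(samples, forms=("D614", "G614")):
    """samples: iterable of (collection_date, form) pairs for one region.

    Returns (onset_day, delay_day), or None if the region does not qualify.
    Hypothetical helper; only the two named forms are counted (an assumption).
    """
    by_day = {}
    for day, form in samples:
        by_day.setdefault(day, Counter())[form] += 1

    # Walk forward in time until the cumulative counts satisfy rule 1.
    cumulative = Counter()
    onset = None
    for day in sorted(by_day):
        cumulative.update(by_day[day])
        total = sum(cumulative[f] for f in forms)
        if total >= MIN_TOTAL and all(cumulative[f] >= MIN_PER_FORM for f in forms):
            onset = day
            break
    if onset is None:
        return None

    # Rule 2: enough sequences must exist from the delay day onward.
    delay = onset + timedelta(days=DELAY_DAYS)
    post_delay = sum(
        counts[f] for day, counts in by_day.items() if day >= delay for f in forms
    )
    if post_delay < MIN_POST_DELAY:
        return None
    return onset, delay
```

A region that reaches its onset day but has fewer than 15 sequences after the delay day returns `None` here, which corresponds to the region being left out of the chart.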

A two-sided Fisher's exact test compares the counts in the pre-onset period to the counts in the post-delay period, and provides a p-value against the null hypothesis that the fraction of D614 vs G614 sequences did not change. We require p<0.05 in order to include the region in the bar chart.
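For readers who want to see what the test computes, here is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2x2 table. It is an illustration under stated assumptions, not the tool's implementation: the function name and the table layout (rows are the pre-onset and post-delay periods, columns are the D614 and G614 counts) are chosen for this example, and the two-sided p-value uses the common convention of summing every table that is no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: row 1 = pre-onset counts, row 2 = post-delay counts;
    column 1 = D614, column 2 = G614.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed
    # one; the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


# Made-up counts: 12 D614 / 3 G614 before onset, 1 D614 / 14 G614 after delay.
p_value = fisher_exact_two_sided(12, 3, 1, 14)
include_region = p_value < 0.05
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead; the hand-written version is shown only to make the calculation explicit.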

Lineage definitions

This tool lists SARS-CoV-2 lineages as defined by Pangolin. The WHO Greek letter designations are in parentheses.

Correlated variants

The "Correlated variant" feature can be used to track mutations that are part of a variant lineage.

As an example, one can use just the top part of the tool, with the default “Correlated variant” setting of “Do not consider”, to explore whether the E484K mutation is increasing or decreasing in frequency at any geographic level, across all Spike backbones.

But one of the contexts in which the E484K mutation can be found is the B.1.1.7 variant Spike backbone. B.1.1.7 tends to increase in frequency once it has entered a population, and one can explore how this compares to E484K+B.1.1.7. This tool will identify all geographic locations in GISAID that have more than 10 examples of E484K+B.1.1.7, and will determine whether the fraction of E484K+B.1.1.7 is increasing or decreasing relative to other forms of B.1.1.7 over time in those populations.
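The location filter described above can be sketched as follows. This is a hypothetical illustration: the record layout `(region, lineage, mutations)`, the function name, and the returned counts are assumptions for the example, not the tool's data model.

```python
from collections import Counter

MIN_EXAMPLES = 10  # a region needs MORE than this many E484K+B.1.1.7 sequences


def eligible_regions(records, lineage="B.1.1.7", mutation="E484K"):
    """records: iterable of (region, lineage, mutations) tuples, where
    mutations is a set of Spike mutation names (hypothetical layout).

    Returns {region: (with_mutation, lineage_total)} for qualifying regions.
    """
    with_mut = Counter()
    total = Counter()
    for region, rec_lineage, mutations in records:
        if rec_lineage != lineage:
            continue  # only sequences on the chosen backbone are compared
        total[region] += 1
        if mutation in mutations:
            with_mut[region] += 1
    return {
        region: (with_mut[region], total[region])
        for region in total
        if with_mut[region] > MIN_EXAMPLES
    }
```

The pair returned for each region gives the fraction of B.1.1.7 sequences that carry E484K; tracking how that fraction changes over time is what the tool then evaluates.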

last modified: Tue Jul 26 10:02 2022

GISAID data provided on this website is subject to GISAID's Terms and Conditions
Questions or comments? Contact us at

Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration
© Copyright Triad National Security, LLC. All Rights Reserved | Disclaimer/Privacy
