COVID-19 Viral Genome Analysis Pipeline
Enabled by data from GISAID

Relative Frequency Change By Geographical Region

Last data update: Apr 15, 2021

See analysis on Spike sites: 222, 477, 614


This tool uses current GISAID data to identify adequately sampled geographical regions and to determine which of those regions are experiencing a significant shift (up or down) in the relative frequency of the G614 versus D614 variants among the sequences sampled there. D614 was the original form in the Wuhan virus, but G614 is currently the predominant form in most regions; all other changes in Spike are currently very rare.

We are working on a release that will soon allow any site to be tracked with this same strategy.

Founder effects and non-representative regional sampling can affect the frequency of a variant in any given region. However, a consistent pattern of change, with one variant increasing in relative frequency across a range of geographic regions, suggests that the variant may have a selective advantage, or may be linked to a site with a selective advantage.

For a change in the relative frequency of a variant to be observed in a geographic region, three requirements must be met:

1. Both variants must at some point co-circulate in the region.
2. Sampling must span a long enough period to discern a change.
3. There must be enough samples for an observed difference to be statistically significant.

Specifically, for the geographic regions in the chart:

1. We define the "onset" day as the first day on which the cumulative number of sequences is at least 15 and each form is represented at least 3 times.
2. We define the "delay" day as two weeks after the onset day, and we require at least 15 sequences from the post-delay period (i.e., from the delay day to the last day of sampling).
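The onset and delay rules above can be sketched as follows. This is a minimal illustration, not the tool's own code: the input format (one hypothetical `(day, d614_count, g614_count)` tuple per sampling day for a single region, sorted by day) and the function names `onset_day` and `onset_and_delay` are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical input: one (day, d614_count, g614_count) tuple per sampling
# day for a single region, sorted by day.
DailyCounts = Sequence[Tuple[date, int, int]]

MIN_TOTAL = 15        # cumulative sequences required on the onset day
MIN_PER_FORM = 3      # cumulative count required for each form (D614, G614)
DELAY = timedelta(days=14)
MIN_POST_DELAY = 15   # sequences required from the delay day onward


def onset_day(counts: DailyCounts) -> Optional[date]:
    """First day on which the cumulative counts meet both onset thresholds."""
    cum_d = cum_g = 0
    for day, n_d, n_g in counts:
        cum_d += n_d
        cum_g += n_g
        if cum_d + cum_g >= MIN_TOTAL and min(cum_d, cum_g) >= MIN_PER_FORM:
            return day
    return None


def onset_and_delay(counts: DailyCounts) -> Optional[Tuple[date, date]]:
    """Return (onset, delay) if the region qualifies, otherwise None."""
    onset = onset_day(counts)
    if onset is None:
        return None
    delay = onset + DELAY
    # Post-delay period: from the delay day through the last sampling day.
    post_total = sum(n_d + n_g for day, n_d, n_g in counts if day >= delay)
    if post_total < MIN_POST_DELAY:
        return None
    return onset, delay
```

For example, with 5 D614 and 2 G614 sequences on Mar 1, 3 and 6 on Mar 2, and 2 and 14 on Mar 20, the cumulative total first reaches 16 (8 of each form) on Mar 2, so the onset is Mar 2, the delay day is Mar 16, and the 16 sequences sampled on Mar 20 satisfy the post-delay requirement.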

A two-sided Fisher's exact test compares the counts of each form in the pre-onset period with the counts in the post-delay period, and gives a p-value for the null hypothesis that the fraction of D614 vs G614 sequences did not change. A region is included in the bar chart only if p < 0.05.
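The significance test described above can be illustrated with a self-contained two-sided Fisher's exact test on a 2x2 table (rows: pre-onset and post-delay periods; columns: D614 and G614 counts). This is a sketch using only the standard library; the function name is hypothetical, and `scipy.stats.fisher_exact` with its default two-sided alternative should give the same p-value if SciPy is available.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With the row and column margins held fixed, the top-left cell follows a
    hypergeometric distribution. The p-value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability of the table whose top-left cell is x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so tables tied with the observed one count.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)
```

A region would then pass the inclusion rule when `fisher_exact_two_sided(pre_d614, pre_g614, post_d614, post_g614) < 0.05`, where the four arguments are the hypothetical per-period counts.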

Correlated variant

The "correlated variant" feature can be used to track mutations that are part of a subclade.

For example, the GR and GH clades are sub-lineages of the G clade (the G clade carries 4 mutations, including D614G). To track changes in GR or GH frequencies, restricting the analysis to the subset of sequences that carry D614G makes it possible to explore how the GR and GH clades are changing within the context of the G614 clade.
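The subsetting step described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the tool's implementation: each sequence is represented as a hypothetical dictionary keyed by a "gene:position" label (such as "S:614" or "N:204"), and the three modes mirror the page's "include only", "exclude all", and "do not consider" choices.

```python
from typing import Dict, List, Literal, Optional

# Hypothetical representation: each sequence maps a "gene:position" label
# (e.g. "S:614", "N:204") to the amino acid observed there.
Profile = Dict[str, str]
Mode = Literal["ignore", "include", "exclude"]


def filter_by_correlated_variant(
    seqs: List[Profile],
    site: Optional[str] = None,
    aa: Optional[str] = None,
    mode: Mode = "ignore",
) -> List[Profile]:
    """Subset sequences by the amino acid at a correlated site.

    "ignore"  keeps every sequence,
    "include" keeps only sequences carrying `aa` at `site`,
    "exclude" drops every sequence carrying `aa` at `site`.
    """
    if mode == "ignore":
        return list(seqs)
    if site is None or aa is None:
        raise ValueError("site and aa are required unless mode is 'ignore'")
    if mode == "include":
        return [s for s in seqs if s.get(site) == aa]
    if mode == "exclude":
        return [s for s in seqs if s.get(site) != aa]
    raise ValueError(f"unknown mode: {mode!r}")


def fraction_with(seqs: List[Profile], site: str, aa: str) -> float:
    """Fraction of sequences carrying `aa` at `site` (0.0 if there are none)."""
    if not seqs:
        return 0.0
    return sum(s.get(site) == aa for s in seqs) / len(seqs)
```

For instance, keeping only sequences with G at S:614 and then computing `fraction_with(g614_subset, "N:204", "R")` gives the relative frequency of the GR marker within the G614 background.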

The G clade is the dominant form of the SARS-CoV-2 pandemic as of summer 2020. It carries 4 nucleotide changes relative to the Wuhan form: C241T, C3037T, C14408T, A23403G.
Note: GISAID formally defines the G clade by an ancestral state with just 3 of these base changes: C241T, C3037T, A23403G.
The C14408T mutation was one of the set of 4 mutations that expanded together to form the now-dominant G clade. A23403G encodes the D614G mutation.

The GR clade carries the four G clade base changes plus 3 contiguous base changes: G28881A, G28882A and G28883C. The GR clade includes the Spike (S) D614G mutation and the nucleocapsid (N) G204R mutation.

The GH clade carries the four G clade base changes plus the G25563T mutation. The GH clade includes the S D614G mutation and the NS3 (ORF3a) Q57H mutation.
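The clade definitions above can be expressed as set checks on a sequence's nucleotide changes relative to the Wuhan reference. This is a simplified sketch that checks only the marker mutations listed on this page; it is not GISAID's clade-assignment procedure. The function name and the `gisaid_g` switch (which selects GISAID's 3-change G definition instead of the 4-change one) are hypothetical.

```python
from typing import Iterable, Optional

# Nucleotide changes relative to the Wuhan reference, as listed above.
G_CLADE = {"C241T", "C3037T", "C14408T", "A23403G"}   # 4-change G definition
G_GISAID = {"C241T", "C3037T", "A23403G"}             # GISAID's 3-change G
GR_EXTRA = {"G28881A", "G28882A", "G28883C"}          # GR: 3 contiguous changes
GH_EXTRA = {"G25563T"}                                # GH: ORF3a Q57H


def assign_clade(mutations: Iterable[str], gisaid_g: bool = False) -> Optional[str]:
    """Return "GR", "GH", "G", or None based only on the markers above.

    GR markers are checked before GH markers, so a sequence carrying both
    sets would be reported as GR by this simplified rule.
    """
    muts = set(mutations)
    g_markers = G_GISAID if gisaid_g else G_CLADE
    if not g_markers <= muts:
        return None
    if GR_EXTRA <= muts:
        return "GR"
    if GH_EXTRA <= muts:
        return "GH"
    return "G"
```

Under the 4-change definition, a sequence carrying only the GISAID ancestral markers (without C14408T) is not assigned to G; passing `gisaid_g=True` accepts it.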

last modified: Fri Dec 11 10:50 2020

GISAID data provided on this website is subject to GISAID's Terms and Conditions

Questions or comments? Contact us at

Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration
© Copyright Triad National Security, LLC. All Rights Reserved.
