See analysis on Spike sites: 222, 477, 614
This tool employs contemporary GISAID data to identify adequately
sampled geographical regions and to determine which of those regions
are experiencing a significant shift (either up or down) in relative
frequency of G614 vs D614 variants of the sequences sampled in that
region. D614 was the original form in the Wuhan virus, but G614 is
currently the predominant form in most regions; all other changes in
Spike are currently very rare.
We are working on a release, planned for the near future, that will
allow tracking of any site using this same strategy.
Founder effects and non-representative regional sampling can impact
the frequency of a variant in any given region. But a consistent
pattern of change, with one variant increasing in relative frequency
across a range of geographic regions, provides evidence suggesting that
a variant may have a selective advantage, or be linked to a site with a selective advantage.
For a change in the relative frequency of a variant to be observed in
a geographic region, three requirements must be met:
1. Both variants must at some point be co-circulating in the region
2. Sampling must be over a long enough time to discern a change
3. There must be enough samples available for an observed difference
to be statistically significant.
Specifically, for the geographic regions in the chart:
1. We define an "onset" day as the first day for which the cumulative
number of sequences is at least 15 and each form is represented at
least 3 times.
2. We define a "delay" day as two weeks after the onset day, and we require
that there be at least 15 sequences available from the post-delay
period (i.e., from the delay day to the last day of sampling).
A two-sided Fisher's exact test compares the counts in the pre-onset period to the counts in the post-delay period, and provides a p-value against the null hypothesis that the fraction of D614 vs G614 sequences did not change. We require p<0.05 in order to include the region in the bar chart.
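The criteria above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the input format (a list of per-day D614 and G614 counts), and the choice to count the onset day itself as part of the "pre-onset" period are all assumptions made for the example. The Fisher's exact test is written out with the standard library so the sketch is self-contained.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def region_shift(daily, min_total=15, min_each=3, delay_days=14, alpha=0.05):
    """daily: list of (day, d614_count, g614_count), sorted by day (hypothetical format).

    Returns None if the region fails the sampling criteria, otherwise a dict
    with the onset day, delay day, p-value, and whether p < alpha.
    """
    cum_d = cum_g = 0
    onset = None
    for day, d, g in daily:
        cum_d += d
        cum_g += g
        if cum_d + cum_g >= min_total and min(cum_d, cum_g) >= min_each:
            onset = day
            break
    if onset is None:
        return None

    delay = onset + delay_days
    post_d = sum(d for day, d, g in daily if day >= delay)
    post_g = sum(g for day, d, g in daily if day >= delay)
    if post_d + post_g < min_total:
        return None

    # Assumption: counts up to and including the onset day form the early period.
    p = fisher_exact_two_sided(cum_d, cum_g, post_d, post_g)
    return {"onset": onset, "delay": delay, "p": p, "included": p < alpha}

example = [(0, 10, 2), (1, 5, 3), (20, 2, 18)]
result = region_shift(example)
```

In this made-up example the onset day is day 1 (20 cumulative sequences, 15 D614 and 5 G614), the delay day is day 15, and the post-delay period holds 20 sequences, so the region qualifies for testing.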
The "correlated variant" feature can be used to track mutations that are part of a subclade.
The G clade is the dominant form in the SARS-CoV-2 pandemic as of summer 2020. It carries 4 nucleotide changes relative to the Wuhan form: C241T, C3037T, C14408T, A23403G.
Note: GISAID formally defines the G clade by an ancestral state with just 3 base changes: C241T, C3037T, A23403G.
The C14408T mutation was part of the set of 4 mutations that expanded together and now make up the dominant G clade. A23403G encodes the D614G mutation.
The GR clade carries the G clade four base changes, plus 3 contiguous base changes: G28881A, G28882A, and G28883C. The GR clade includes the S D614G mutation and the N G204R mutation.
The GH clade carries the G clade four base changes, plus the G25563T mutation. The GH clade includes the S D614G mutation and the NS3 (ORF3a) Q57H mutation.
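As a rough illustration of the clade definitions above, the following Python sketch assigns a sequence to G, GR, or GH from its set of nucleotide changes relative to the Wuhan form. The function and constant names are hypothetical, the four-change definition of the G clade used on this page (not GISAID's three-change definition) is assumed, and a sequence carrying both the GR and GH markers is simply reported as GR, which is a simplification.

```python
# Marker mutations as listed in the text above.
G_CORE = {"C241T", "C3037T", "C14408T", "A23403G"}
GR_EXTRA = {"G28881A", "G28882A", "G28883C"}
GH_EXTRA = {"G25563T"}

def assign_clade(mutations):
    """Return 'GR', 'GH', 'G', or 'other' for an iterable of mutation names."""
    muts = set(mutations)
    if not G_CORE <= muts:
        return "other"   # lacks at least one of the four G clade changes
    if GR_EXTRA <= muts:
        return "GR"      # G core plus all three contiguous changes
    if GH_EXTRA <= muts:
        return "GH"      # G core plus G25563T
    return "G"

clade = assign_clade(["C241T", "C3037T", "C14408T", "A23403G", "G25563T"])
```

Here `clade` is "GH", since the sequence carries the four G clade changes plus G25563T.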
GISAID data provided on this website is subject to GISAID's Terms and Conditions