COVID-19 Viral Genome Analysis Pipeline
Enabled by data from GISAID


Isotonic Regression


Isotonic regression for tracking Pango lineages by name




Modeling the daily fraction of a SARS-CoV-2 variant as a function of time in local regions using isotonic regression

Here we extract all regional data from GISAID that have a minimum of 10 sequences representing a variant of the virus, with at least 14 days of sampling. The sampling days do not have to be contiguous. The tables show all political/geographical regions that meet these criteria, whether or not the trend is significant. The daily fraction of a variant as a function of time is modeled using isotonic regression; the null hypothesis is that the fraction does not change over time. We then test the null against the hypotheses that the fraction of the new variant is either increasing or decreasing. We randomize the data in each geographic region 400 times and refit the isotonic logistic regression to the randomized data, to evaluate whether changes in the frequency of a new mutation could be occurring by chance alone, or whether the frequency is significantly increasing (as shown in the first 3 tables and sets of plots) or decreasing (as shown in the last 3 tables and sets of plots). Because we perform 400 randomizations, the lowest p-value we can obtain is 1/400 = 0.0025. If a mutation is increasing over one time period and decreasing over another, both trends can be significant. The "# days" column is the number of days with samples available, and the time window is the number of days spanned by the sampling.
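The procedure described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. It assumes that each sampling day is summarized as a pair (variant count, total count), that the test statistic is the log-likelihood ratio between the isotonic fit and a constant fraction, and that randomization means shuffling the daily pairs across days; the function names and the seed are hypothetical. It relies on the standard result that the monotone maximum likelihood fit for binomial counts is obtained by pooling adjacent days that violate the ordering.

```python
import math
import random


def isotonic_fractions(variant, total):
    """Non-decreasing maximum likelihood fit of daily variant fractions.

    Pool-adjacent-violators on counts: neighbouring days are merged
    whenever the earlier block has a higher fraction than the later one.
    Every entry of `total` must be positive (days with samples only).
    """
    blocks = []  # each block: [variant count, total count, days pooled]
    for k, n in zip(variant, total):
        blocks.append([k, n, 1])
        # Compare fractions k1/n1 > k2/n2 by cross-multiplying integers.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            k2, n2, d2 = blocks.pop()
            k1, n1, d1 = blocks.pop()
            blocks.append([k1 + k2, n1 + n2, d1 + d2])
    fit = []
    for k, n, d in blocks:
        fit.extend([k / n] * d)
    return fit


def log_lik(variant, total, probs):
    """Binomial log-likelihood, dropping terms with a zero count."""
    ll = 0.0
    for k, n, p in zip(variant, total, probs):
        if k > 0:
            ll += k * math.log(p)
        if n - k > 0:
            ll += (n - k) * math.log(1.0 - p)
    return ll


def lr_statistic(variant, total):
    """Log-likelihood gain of the isotonic fit over a constant fraction."""
    pooled = sum(variant) / sum(total)
    iso = isotonic_fractions(variant, total)
    flat = [pooled] * len(total)
    return log_lik(variant, total, iso) - log_lik(variant, total, flat)


def permutation_p_value(variant, total, n_perm=400, seed=0):
    """Assumed randomization scheme: shuffle the daily pairs across days."""
    rng = random.Random(seed)
    observed = lr_statistic(variant, total)
    days = list(zip(variant, total))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(days)
        v, t = zip(*days)
        if lr_statistic(v, t) >= observed:
            hits += 1
    # With n_perm randomizations the smallest reportable value is 1/n_perm.
    return max(hits, 1) / n_perm
```

To test for a decreasing fraction under the same assumptions, one could pass the daily counts in reverse time order.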

The accompanying plots show the increase in the new variant over time. The dot size is proportional to the number of sequences sampled that day, and the staircase line is the maximum likelihood estimate under the constraint that the logarithm of the odds ratio is non-decreasing. The dotted line is the overall fraction of the variant over the considered time window; it provides a baseline for "no change" in the fraction of the variant.
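As a sketch of what the staircase and dotted lines represent, the fit can be written as a constrained maximum likelihood problem. The notation is introduced here for illustration and is not from the original page: sampling days are indexed i = 1, ..., m in time order, n_i is the number of sequences sampled on day i, k_i is the number of those carrying the variant, and p_i is the modeled variant fraction. The constraint is read here as monotone log odds of the variant fraction, which is an assumption about the wording above.

```latex
\[
\hat{p} = \arg\max_{p_1,\dots,p_m}
  \sum_{i=1}^{m} \bigl[ k_i \log p_i + (n_i - k_i) \log (1 - p_i) \bigr]
\quad \text{subject to} \quad
\log\frac{p_1}{1-p_1} \le \log\frac{p_2}{1-p_2} \le \dots \le \log\frac{p_m}{1-p_m},
\]
\[
\bar{p} = \frac{\sum_{i=1}^{m} k_i}{\sum_{i=1}^{m} n_i}.
\]
```

Because the log odds are a strictly increasing function of p_i, the constraint is equivalent to p_1 ≤ p_2 ≤ ... ≤ p_m, which is why the fitted line is a staircase: it is flat wherever adjacent days are pooled. The pooled fraction \bar{p} corresponds to the dotted "no change" baseline.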

This code is by Nick Hengartner and further descriptions of these analyses and plots can be found associated with Fig. 3 in:

Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus.
Korber B, Fischer WM, Gnanakaran S, Yoon H, Theiler J, Abfalterer W, Hengartner N, Giorgi EE, Bhattacharya T, Foley B, Hastie KM, Parker MD, Partridge DG, Evans CM, Freeman TM, de Silva TI*, McDanal C, Perez LG, Tang H, Moon-Walker A, Whelan SP, LaBranche CC, Saphire EO, and Montefiori DC.
*on behalf of the Sheffield COVID-19 Genomics Group
Cell, June 2020
DOI:10.1016/j.cell.2020.06.043


Lineage definitions

This tool lists SARS-CoV-2 lineages as defined by Pangolin (cov-lineages.org). The WHO Greek letter designations are in parentheses.

Correlated variants

The "Correlated variant" feature can be used to track mutations that are part of a variant lineage.

As an example, one can use just the top part of this tool, with the default “Correlated variant” setting of “Do not consider”, to explore how often the E484K mutation is increasing or decreasing at any geographic level across all Spike backbones.

But one of the contexts in which the E484K mutation can be found is the B.1.1.7 variant Spike backbone. B.1.1.7 tends to increase in frequency once it has entered a population, and one can explore how this compares to E484K+B.1.1.7. This tool will identify all geographic locations in GISAID that have more than 10 examples of E484K+B.1.1.7, and will determine whether the fraction of E484K+B.1.1.7 is increasing or decreasing relative to other forms of B.1.1.7 over time in those populations.
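The difference between the default setting and a correlated-variant analysis can be sketched as a change of denominator. The Python below is illustrative only and does not reflect how the tool stores or queries GISAID data: the record fields (`region`, `date`, `lineage`, `mutations`), the function names, and the default thresholds are all assumptions, and the background is matched here by lineage name for simplicity.

```python
from collections import defaultdict


def daily_counts(records, test_mutation, background=None):
    """Return {region: {date: [variant count, total count]}}.

    With background=None every sequence counts toward the total
    (the "Do not consider" case). With a background lineage, only
    sequences of that lineage are counted, so the fraction is
    relative to other forms of the same backbone.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for rec in records:
        if background is not None and rec["lineage"] != background:
            continue
        day = counts[rec["region"]][rec["date"]]
        day[1] += 1
        if test_mutation in rec["mutations"]:
            day[0] += 1
    return counts


def eligible_regions(counts, min_variant=10, min_days=14):
    """Keep regions with enough variant sequences and sampling days.

    Dates are assumed to be ISO strings (YYYY-MM-DD), so sorting them
    as text gives chronological order.
    """
    keep = {}
    for region, by_day in counts.items():
        n_variant = sum(k for k, _ in by_day.values())
        if n_variant >= min_variant and len(by_day) >= min_days:
            keep[region] = [tuple(by_day[d]) for d in sorted(by_day)]
    return keep
```

For example, `daily_counts(records, "E484K", background="B.1.1.7")` would count E484K only among B.1.1.7 sequences, and the resulting per-region series of (variant count, total count) pairs is the kind of input a trend test would use.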


last modified: Fri Oct 11 17:48 2024

GISAID data provided on this website is subject to GISAID's Terms and Conditions
Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Triad National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration
© Copyright Triad National Security, LLC. All Rights Reserved | Disclaimer/Privacy
