Modeling the daily fraction of a SARS-CoV-2 variant as a function of time in local regions using isotonic regression
Here we extract all regional data from GISAID that have a minimum of 10 sequences representing a variant in the virus, with at least 14 days of sampling. The sampling days do not have to be contiguous. The tables show all political/geographical regions that meet these criteria, whether or not the trend is significant. The daily fraction of a variant as a function of time is modeled using isotonic regression; the null hypothesis is that the fraction does not change over time. We then test the null against the alternative hypotheses that the fraction of the new variant is either increasing or decreasing. We randomize the data in each geographic region 400 times and refit the isotonic logistic regression to the randomized data, to evaluate whether changes in the frequency of a new mutation could be occurring by chance alone, or whether the frequency is significantly increasing (as shown in the first 3 tables and sets of plots) or decreasing (as shown in the last 3 tables and sets of plots). Because we perform 400 randomizations, the lowest p-value we can obtain is 0.0025 (1/400). If a mutation is increasing over one time period and decreasing over another, both trends can be significant. The "# days" column is the number of days with samples available, and the time window is the number of days spanned by the sampling.
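The randomization test described above can be illustrated with a minimal Python sketch. This is not the code used to produce the tables. The function names (`pava`, `trend_statistic`, `randomization_p_value`) are hypothetical. The choice of a log-likelihood ratio as the test statistic, the shuffling of variant labels across sequences while keeping each day's sample count fixed, and the flooring of the p-value at 1/400 are all assumptions made for illustration.

```python
import math
import random

def pava(values, weights):
    """Weighted pool-adjacent-violators: non-decreasing least-squares fit."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return [m for m, _, n in blocks for _ in range(n)]

def loglik(variant, total, p):
    """Binomial log-likelihood (up to a constant) of daily variant counts."""
    ll = 0.0
    for k, n, q in zip(variant, total, p):
        if k > 0:
            ll += k * math.log(q)
        if n - k > 0:
            ll += (n - k) * math.log(1.0 - q)
    return ll

def trend_statistic(variant, total, increasing=True):
    """Assumed statistic: log-likelihood gain of the monotone fit over a constant fit."""
    fracs = [k / n for k, n in zip(variant, total)]
    if increasing:
        fit = pava(fracs, total)
    else:
        # A non-increasing fit is a non-decreasing fit on the reversed series.
        fit = pava(fracs[::-1], total[::-1])[::-1]
    pooled = sum(variant) / sum(total)
    return loglik(variant, total, fit) - loglik(variant, total, [pooled] * len(total))

def randomization_p_value(variant, total, increasing=True, n_rand=400, seed=0):
    """Shuffle variant labels across sequences (assumed scheme) and refit each time."""
    rng = random.Random(seed)
    observed = trend_statistic(variant, total, increasing)
    labels = [1] * sum(variant) + [0] * (sum(total) - sum(variant))
    hits = 0
    for _ in range(n_rand):
        rng.shuffle(labels)
        shuffled, start = [], 0
        for n in total:  # keep each day's number of sequences fixed
            shuffled.append(sum(labels[start:start + n]))
            start += n
        if trend_statistic(shuffled, total, increasing) >= observed:
            hits += 1
    # Assumed convention: the p-value cannot fall below 1 / n_rand.
    return max(hits, 1) / n_rand
```

Here `variant` and `total` are per-day counts of variant sequences and of all sequences, ordered by sampling date and restricted to days with at least one sample. Calling the function once with `increasing=True` and once with `increasing=False` mirrors the two one-sided tests.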
The accompanying plots show the increase in the new variant over time. The dot size is proportional to the number of sequences sampled that day, and the staircase line is the maximum likelihood estimate under the constraint that the logarithm of the odds of the variant is non-decreasing.
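One way to write down the staircase estimate is as a constrained maximum likelihood problem. The notation is an assumption introduced here for illustration: on sampling day $t_i$ ($i = 1, \dots, m$), $n_i$ sequences were sampled, $k_i$ of them carried the variant, and $p_i$ is the fraction being estimated.

```latex
\hat{p} \;=\; \arg\max_{p_1,\dots,p_m}\;
  \sum_{i=1}^{m} \Bigl[\, k_i \log p_i + (n_i - k_i)\log(1 - p_i) \Bigr]
\quad \text{subject to} \quad
\log\frac{p_1}{1-p_1} \;\le\; \log\frac{p_2}{1-p_2} \;\le\; \cdots \;\le\; \log\frac{p_m}{1-p_m}.
```

Because the log odds is a strictly increasing function of $p_i$, the constraint is equivalent to $p_1 \le p_2 \le \cdots \le p_m$. The maximizer is piecewise constant in time, which is why the fitted line looks like a staircase, and it coincides with the weighted isotonic regression of the daily fractions $k_i / n_i$ with weights $n_i$.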
This code is by Nick Hengartner, and further descriptions of these analyses and plots can be found in association with Fig. 3 in:
Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus.
Korber B, Fischer WM, Gnanakaran S, Yoon H, Theiler J, Abfalterer W, Hengartner N, Giorgi EE, Bhattacharya T, Foley B, Hastie KM, Parker MD, Partridge DG, Evans CM, Freeman TM, de Silva TI*, McDanal C, Perez LG, Tang H, Moon-Walker A, Whelan SP, LaBranche CC, Saphire EO, and Montefiori DC.
*on behalf of the Sheffield COVID-19 Genomics Group
In press in Cell, June 2020
GISAID data provided on this website is subject to GISAID's Terms and Conditions