COVID-19 Viral Genome Analysis Pipeline
Enabled by data from GISAID

External Tools

Links to Tools, Software, and Information


Coronavirus Genetic Analysis and Genotyping Tools

The GISAID initiative promotes the rapid sharing of data from all influenza viruses and the SARS-CoV-2 coronavirus causing COVID-19. This includes genetic sequence and related clinical and epidemiological data associated with human viruses, and geographical as well as species-specific data associated with avian and other animal viruses, to help researchers understand how viruses evolve and spread during epidemics and pandemics.

Nextstrain: Genomic epidemiology of novel coronavirus - Global subsampling illustrates the phylogeny of a downsampled subset of genomes in the GISAID database.

EDGE Bioinformatics provides automated genome assembly and helps users prepare files for GenBank submission.

EDGE Bioinformatics Assay Validation provides in silico evaluation of diagnostic assays.

PANGOLIN, the SARS-CoV-2 genotyping tool, assigns SARS-CoV-2 sequences to genotypes (lineages).

A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology describes a nomenclature for SARS-CoV-2 genotypes. Andrew Rambaut, Edward C. Holmes, Verity Hill, Áine O’Toole, JT McCrone, Chris Ruis, Louis du Plessis, Oliver G. Pybus. “A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology”. bioRxiv 2020.04.17.046086; doi:

The Coronavirus Genotyping Tool uses phylogenetic methods to identify coronavirus genotypes. A paper describing the tool has been published.

The Year-letter Genetic Clade Naming for SARS-CoV-2 on describes the clades and names proposed by

Daily analyses of SARS-CoV-2 genomic data: DataMonkey applies selection-pressure analyses to virus genome data.

COMET (Context-based Modeling for Expeditious Typing) identifies subtypes and circulating recombinant forms (CRFs) of HIV-1 and HIV-2

Phylogenetic Tools

PHYML "fast and accurate heuristic for estimating maximum likelihood phylogenies. Large DNA and protein sequences data sets can be analysed under a broad range of substitution models"

RAxML is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees

MEGA a versatile, user-friendly program for phylogenetic analysis and bootstrapping (Windows)

Tree-Puzzle "a computer program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood" Windows, UNIX, Mac, VMS.

HyPhy another program for maximum-likelihood trees; very flexible and comes with its own programmable interface (Windows, UNIX, Mac)

DataMonkey detects positive and negative selection on individual sites by nonsynonymous/synonymous analysis and Single Likelihood Ancestor Counting (SLAC) analysis
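As a rough illustration of the counting idea behind this kind of analysis, the sketch below classifies single-nucleotide codon differences between two aligned, in-frame coding sequences as synonymous or nonsynonymous under the standard genetic code. It is a simplified pairwise count, not DataMonkey's SLAC implementation: SLAC reconstructs ancestral sequences on a tree and compares observed counts against the expected numbers of synonymous and nonsynonymous sites. The function name `count_differences` is hypothetical.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_differences(ref: str, query: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of codons differing at exactly one base."""
    if len(ref) != len(query) or len(ref) % 3:
        raise ValueError("sequences must be aligned, of equal length, and in frame")
    syn = nonsyn = 0
    for start in range(0, len(ref), 3):
        a = ref[start:start + 3].upper()
        b = query[start:start + 3].upper()
        if a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguity codes
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue  # identical codons, or multi-base changes that SLAC resolves on a tree
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# CTT->CTC is Leu->Leu (synonymous); AAA->AGA is Lys->Arg (nonsynonymous).
print(count_differences("ATGCTTAAA", "ATGCTCAGA"))  # (1, 1)
```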

TreeView displays treefiles; more flexible and easy to use than Drawtree/Drawgram (Windows, UNIX, Mac)

MODELTEST "helps a user to choose the model of DNA substitution that best fits his/her data, among 56 possible models" Windows, UNIX, Mac.
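Model-selection programs such as MODELTEST compare candidate substitution models using criteria like the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood. The sketch below ranks models by AIC and computes Akaike weights. The log-likelihoods and parameter counts are made-up values for illustration, not MODELTEST output; in practice each likelihood comes from fitting that model to your alignment and tree.

```python
import math


def rank_by_aic(models: dict[str, tuple[float, int]]) -> list[tuple[str, float, float]]:
    """Rank models by AIC = 2k - 2 lnL; return (name, AIC, Akaike weight), best first."""
    scored = sorted(
        ((name, 2 * k - 2 * lnl) for name, (lnl, k) in models.items()),
        key=lambda item: item[1],
    )
    best = scored[0][1]
    # Akaike weights: relative likelihood of each model, normalized over the candidate set.
    rel = [math.exp(-0.5 * (aic - best)) for _, aic in scored]
    total = sum(rel)
    return [(name, aic, r / total) for (name, aic), r in zip(scored, rel)]


# Hypothetical (log-likelihood, free substitution parameters); branch lengths not counted.
candidates = {
    "JC69": (-10250.0, 0),
    "HKY85": (-10120.0, 4),
    "GTR": (-10119.5, 8),
}
for name, aic, weight in rank_by_aic(candidates):
    print(f"{name}\tAIC={aic:.1f}\tweight={weight:.3f}")
```

In this made-up example GTR fits slightly better than HKY85, but its four extra parameters cost more than the likelihood gain, so HKY85 ranks first.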

Phylogenetic Analysis a comprehensive list of programs for creating phylogenetic trees, maintained by Joe Felsenstein at the University of Washington

BEAST Bayesian Evolutionary Analysis Sampling Trees: a cross-platform program for Bayesian MCMC analysis of sequences; can reconstruct phylogenies and test evolutionary hypotheses without conditioning on a single tree topology

MrBayes a cross-platform program for Bayesian inference of phylogeny using Markov chain Monte Carlo (MCMC) sampling; like BEAST, it samples trees from the posterior distribution rather than conditioning on a single tree topology

Edinburgh University Molecular evolution, phylogenetics and epidemiology software contains many high-quality programs, among them Se-Al (multiple sequence alignment program) and many Monte Carlo programs to simulate tree and sequence evolution

ClusterPicker, from Edinburgh University, is a tool for automatic identification of phylogenetic clusters.
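The idea behind this kind of cluster picking is, roughly, to call a clade a cluster when its bootstrap support meets one threshold and the maximum pairwise genetic distance among its tips stays under another. The sketch below applies that two-threshold rule to a toy tree. The `Node` class, the use of uncorrected p-distance, and the threshold values are assumptions for illustration; this is not ClusterPicker's code.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    name: str = ""        # tip label (empty for internal nodes)
    support: float = 0.0  # bootstrap support as a proportion
    children: list["Node"] = field(default_factory=list)


def tip_names(node: Node) -> list[str]:
    if not node.children:
        return [node.name]
    return [tip for child in node.children for tip in tip_names(child)]


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, ignoring gap positions."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites) if sites else 0.0


def pick_clusters(node: Node, seqs: dict[str, str],
                  min_support: float = 0.9, max_distance: float = 0.045) -> list[list[str]]:
    """Return the largest clades meeting both the support and distance thresholds."""
    if not node.children:
        return []  # a single tip is not reported as a cluster
    tips = tip_names(node)
    if node.support >= min_support and all(
        p_distance(seqs[a], seqs[b]) <= max_distance for a, b in combinations(tips, 2)
    ):
        return [tips]  # stop here so nested sub-clades are not reported twice
    return [c for child in node.children
            for c in pick_clusters(child, seqs, min_support, max_distance)]


seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "TTTTACGTAC", "D": "TTTTACGTAA"}
tree = Node(support=1.0, children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.5, children=[Node("C"), Node("D")]),
])
print(pick_clusters(tree, seqs, max_distance=0.15))  # [['A', 'B']]
```

The C/D clade is genetically close but fails the support threshold, so only A/B is reported.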

Multiple Sequence Alignment & Manipulation

SeaView is a full-function alignment editor that can do automatic alignments and trees and allows plug-ins. Runs on most operating systems

AliView is another nice editor that runs on modern Macs (10.9+), as well as on Linux and Windows

BioEdit is a PC-compatible full-featured alignment editor; comes with built-in alignment and tree-building programs. Old, but still runs on Windows 7; it can also be installed on Macs via Wine

Jalview is a free program for multiple sequence alignment editing, visualization and analysis

CLUSTALX is distributed as executables for DOS/Windows, Mac, and UNIX operating systems

Multiple Sequence Alignment a collection of alignment programs such as CLUSTALW, for aligning nucleic acid or amino acid sequences. Maintained by the Department of Genetics at the University of Washington, Seattle
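Progressive aligners in the CLUSTAL family begin by aligning sequence pairs. As a minimal sketch of that pairwise step, the code below implements Needleman-Wunsch global alignment with a simple linear gap penalty. The match, mismatch, and gap scores are arbitrary illustrative values; CLUSTALW itself uses weight matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "ACT"))  # (1, 'ACGT', 'AC-T')
```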

SAM: Sequence Alignment and Modeling Programs Similar to HMMER, this is a package of tools for building and using Hidden Markov Models of sequence alignments; includes tool to convert between SAM and HMMER formats

Sequence Manipulation Suite contains an array of basic tools for manipulating DNA and protein sequences
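Two of the most common basic operations in suites like this are reverse-complementing a DNA sequence and computing its GC content. A minimal sketch, assuming IUPAC nucleotide codes and ignoring ambiguous bases when computing GC content; the function names are hypothetical and not part of the Sequence Manipulation Suite.

```python
# Complement table covering the IUPAC nucleotide codes in upper and lower case.
COMPLEMENT = str.maketrans("ACGTRYKMBVDHNSWacgtrykmbvdhnsw",
                           "TGCAYRMKVBHDNSWtgcayrmkvbhdnsw")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


print(reverse_complement("ATGCRN"))  # NYGCAT
print(gc_content("ATGCNN"))          # 0.5
```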

READSEQ converts sequences to and from 15 different formats
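A format conversion of the kind READSEQ performs can be sketched for one case: reading FASTA and writing sequential PHYLIP. This assumes the FASTA records are already aligned (PHYLIP requires equal lengths) and uses the classic 10-character PHYLIP name field, so longer names are truncated and could collide. The function names are hypothetical; READSEQ itself handles many more formats.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}, using the first word of each header."""
    records: dict[str, list[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else f"seq{len(records) + 1}"
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_phylip(records: dict[str, str]) -> str:
    """Write aligned records as sequential PHYLIP with 10-character name fields."""
    lengths = {len(s) for s in records.values()}
    if len(lengths) != 1:
        raise ValueError("PHYLIP requires aligned sequences of equal length")
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records.items():
        lines.append(f"{name[:10]:<10}{seq}")  # pad or truncate names to 10 characters
    return "\n".join(lines) + "\n"


fasta = ">seqA sample 1\nACGT\nAC\n>seqB\nACGTAA\n"
print(to_phylip(parse_fasta(fasta)), end="")
```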

Coronavirus Reference Strain Sequences

Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome.

SARS coronavirus Tor2 genome, reference genome for SARS coronavirus.

Middle East respiratory syndrome coronavirus, complete genome. Reference genome for MERS coronavirus.

Other SARS-CoV-2 and COVID-19 Tools and Research Resources

World Health Organization COVID-19 Information

World Health Organization COVID-19 Situation Reports

USA Centers for Disease Control COVID-19 Information.

CoronaWhy is a globally distributed, volunteer-powered research organisation that aims to help the medical community answer key questions related to COVID-19

COVID-19 Data Portal The European Commission and EMBL’s European Bioinformatics Institute (EMBL-EBI), together with EU Member States and research partners such as ELIXIR, operate a dedicated European COVID-19 Data Platform that enables rapid collection and comprehensive sharing of research data from different sources for the European and global research communities. It includes host genes and proteins of interest to COVID-19 research.

COVID-19 Epidemiology

An interactive visualization of the exponential spread of COVID-19, a web-based tool for visualizing COVID-19 case data over time by geographic region.

NCBI SARS-CoV-2 / COVID-19 entry page for SARS-CoV-2 genomic data and PubMed searches of papers related to the COVID-19 pandemic.

COVID-19 vaccine tracker This tracker lists COVID-19 vaccine candidates currently in Phase 1-3 trials, as well as major candidates in pre-clinical stages of development and research. Information will be updated weekly.

COVID-19 Clinical Trials An online, open-access database set up by the Anticancer Fund (ACF, Brussels, Belgium) details clinical trials of COVID-19 therapeutics. At the time of writing it contained 263 clinical trials taking place worldwide, giving insight into the types of drug candidates being tested.

COVID-19 Confirmed and Forecasted Case Data

Current cases and forecasts.

CDC case counts by USA county.

Johns Hopkins-compiled USA county-level and global case counts.

New York Times-compiled USA county-level case counts (numerical data).

US data including testing rates and results, hospitalization data, and data quality scores (The Atlantic).

Interactive worldwide infection data.

Financial Times (London) worldwide fixed and interactive plots, including historical death rate comparisons.

The US Covid Atlas (UChicago).

Arlequin cross-platform Java program for population genetic analysis; takes sequence, RFLP, microsatellite, and allele frequency data; calculates measures of diversity, provides tests for linkage disequilibrium and departures from Hardy-Weinberg equilibrium, etc.
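One of the diversity measures that population-genetics programs such as Arlequin report is nucleotide diversity (π), the average proportion of differing sites between pairs of sequences. A minimal sketch, assuming gap-free aligned sequences with each sequence treated as one sampled haplotype; the function name is hypothetical, and Arlequin's own implementation offers more options.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) for gap-free aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and non-empty")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total_diffs / (len(pairs) * length)


# 4 differences in total over 3 pairs of 4 sites: about 0.333
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```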

SimPlot tool for recombination analysis by Stuart Ray at Johns Hopkins University; does both similarity plots and bootscan analysis; the most widely used tool for recombination analysis in HIV research
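The similarity-plot part of this kind of analysis can be sketched as a sliding window: for each window along an alignment, compute the fraction of identical sites between a query and a reference, and plot it against the window midpoint. The window and step sizes below are illustrative defaults, gap positions are skipped, and bootscan analysis (which builds a tree per window) is not covered; this is not SimPlot's code.

```python
def similarity_plot(query: str, reference: str,
                    window: int = 200, step: int = 20) -> list[tuple[int, float]]:
    """Return (window midpoint, fraction identical) along two aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("query and reference must come from the same alignment")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    points = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [(q, r) for q, r in zip(query[start:start + window],
                                        reference[start:start + window])
                 if q != "-" and r != "-"]
        if pairs:  # skip windows with no position that is gap-free in both sequences
            identity = sum(q == r for q, r in pairs) / len(pairs)
            points.append((start + window // 2, identity))
    return points


print(similarity_plot("AAAAAAAAAA", "AAAAATTTTT", window=4, step=2))
# [(2, 1.0), (4, 0.75), (6, 0.25), (8, 0.0)]
```

A drop in identity partway along the genome, as in this toy example, is the kind of signal that prompts a closer look for a recombination breakpoint.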

BankIt and Sequin are tools for sequence submission to GenBank (NCBI); BankIt is for a few sequences; Sequin is for larger or more complicated sets

Genetic Data Environment an integrated Linux environment for bioinformatics and evolutionary analysis

Mullins Computational Biology Tools a suite of programs developed at the University of Washington for analysis of HIV (and other) sequences, including diversity assessment tools, a method for rooting a tree in a central position, and other useful scripts and software

Genome Detective SARS-CoV-2 genome assembly from short-read data. You can submit sequences/contigs or short-read NGS data. Up to 2000 FASTA sequences may be submitted at once; the free service allows only one NGS analysis at a time.

ExPASy a large compilation of proteomics tools maintained by the Swiss Institute of Bioinformatics

Viral Genomics Analysis Software a suite of tools at the Broad Institute for next-gen sequence assembly, primer design, etc.

An interactive visualization of the exponential spread of COVID-19 at the National Institutes of Health (NIH), maintained by the National Institute of Allergy and Infectious Diseases (NIAID)

Virology Information

All the Virology on the WWW links and descriptions for many science resources and websites in virology and microbiology


