Since the onset of the COVID-19 pandemic, several severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern (VOCs) have emerged, leading to repeated increases in cases, hospitalizations, and deaths worldwide. Classification of these variants according to the PANGO nomenclature (Phylogenetic Assignment of Named Global Outbreak Lineages) shows that although they descend from a common ancestor, they are not direct descendants of one another.
The PANGO lineages corresponding to the VOCs are the Alpha variant (B.1.1.7 and Q lineages), the Beta variant (B.1.351 and descendant lineages), the Gamma variant (P.1, which is a descendant of B.1.1.28, and descendant lineages), the Delta variant (B.1.617.2 and AY lineages) and the Omicron variant (B.1.1.529 and BA lineages).
All variants were reported to have evolved from the B.1 lineage, while Alpha, Gamma and Omicron also have B.1.1 as an additional parent lineage. However, these classifications neither describe how distinct the variants are from one another nor provide insight into their genetic characteristics.
SARS-CoV-2, like all other viruses, evolves through mutation of its genome; these mutations can alter the amino acid sequences of the viral proteins. Mutations can be selected either positively or negatively based on their impact on viral fitness. Mutations in several regions, such as the N-terminal domain (NTD) and the receptor-binding domain (RBD) of the spike glycoprotein, have improved viral fitness. Although much attention has been paid to individual mutations at the amino acid level, far less has been given to changes at the nucleotide sequence level.
A new study posted to the preprint server medRxiv* hypothesized that the emergence of more immune-evasive or transmissible variants of SARS-CoV-2 was associated with increased genetic distinctiveness from the original or earlier strains.
Study: Genomic diversification of long polynucleotide fragments is a signature of new SARS-CoV-2 variants of concern. Image credit: NIAID
To test the hypothesis, the study introduced a new method that quantifies the number of distinct nucleotide n-mers (of various lengths) in VOCs to estimate the degree of viral evolution.
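As a rough illustration of what counting distinct nucleotide n-mers means, the following minimal Python sketch slides a window of length n along a sequence and collects the unique substrings. It is not the study's code: the function names, the toy sequence, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
VALID_BASES = set("ACGT")

def distinct_nmers(sequence, n):
    """Return the set of distinct length-n substrings (n-mers) of a sequence.

    Windows containing anything other than A, C, G or T (for example the
    ambiguity code N) are skipped; that filter is an assumption of this sketch.
    """
    seq = sequence.upper()
    windows = (seq[i:i + n] for i in range(len(seq) - n + 1))
    return {w for w in windows if set(w) <= VALID_BASES}

def nmer_profile(sequence, sizes=(3, 6, 9)):
    """Map each n-mer size to the number of distinct n-mers in the sequence."""
    return {n: len(distinct_nmers(sequence, n)) for n in sizes}

toy_genome = "ACGTACGTNACGT"  # made-up fragment, not real SARS-CoV-2 data
print(nmer_profile(toy_genome, sizes=(3, 6)))
```

On a real genome of roughly 30,000 nucleotides the same functions would be called with the larger n-mer sizes mentioned in the study, such as 9 or 15.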
About the study
The study calculated the number of distinctive n-mers in SARS-CoV-2 sequences from the original reference strain (PANGO lineage A) and the five VOCs (Alpha, Beta, Gamma, Delta and Omicron), all obtained from the GISAID database. In addition, the number of amino acid mutations in the sequences obtained from GISAID was determined relative to the original Wuhan-Hu-1 strain of SARS-CoV-2.
Multiple sequence alignment (MSA) was performed on the sub-sampled SARS-CoV-2 genomes to calculate phylogenetic distance. Finally, the distinctiveness of n-mers for a specific SARS-CoV-2 lineage was calculated using an alternative metric, A * (1-B).
Distribution of polynucleotide distinctiveness of SARS-CoV-2 variants of concern (VOCs). (A) Schematic illustration of polynucleotide sequence analysis. SARS-CoV-2 sequences are analyzed to generate a set of distinct n-mer polynucleotide sequences (maximum n-mer size = 240). (B) Venn diagram showing the mean of the distributions of shared and unique nucleotide 9-mers between all combinations of variants across 100,000 replicate comparisons. The Beta variant was excluded from this visualization to reduce clutter. (C) Density plot showing 9-mer sequence distinctiveness of VOCs, measured by the number of distinctive 9-mer polynucleotide sequences. (D, E) Heatmaps showing Cohen’s D and Jensen-Shannon divergence values from pairwise comparisons of the distributions shown in (C). (F) Cohen’s D for the distinctive n-mer distributions of the Alpha, Beta, Gamma, Delta and Omicron variants against the original strain for different n-mer lengths (n = 3, 6, 9, 12, 15, 18, 21, 24, 30, 45, 60, 75, 120 and 240). (G) Density plot showing a further example of genomic distinctiveness of VOCs, measured by the number of distinctive 15-mer polynucleotide sequences. Data shown in panels B-G were generated using 287,739 unique SARS-CoV-2 sequences in total, divided across the variants as shown in the legend of C. Abbreviations: μ – mean; IQR – interquartile range; VOC – variant of concern.
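The two statistics named in panels D to F can be illustrated with a short, self-contained Python sketch. It uses the standard textbook definitions (Cohen's d with a pooled standard deviation, and Jensen-Shannon divergence in base 2 over empirical frequencies); whether the study estimated them in exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled standard deviation).

    Raises ZeroDivisionError if both samples have zero variance.
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def jensen_shannon_divergence(a, b):
    """Jensen-Shannon divergence (base 2, range 0 to 1) between the empirical
    frequency distributions of two samples of discrete values."""
    counts_a, counts_b = Counter(a), Counter(b)
    jsd = 0.0
    for value in set(counts_a) | set(counts_b):
        p = counts_a[value] / len(a)
        q = counts_b[value] / len(b)
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd
```

Here each sample would be a list of per-genome distinctive n-mer counts for one variant; identical samples give a divergence of 0, and samples with no values in common give 1.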
For each genome, the researchers derived the distinctive nucleotide 9-mers (DN9s), that is, the 9-mers present in a given lineage but absent from all others. The number of DN9s tracked the time of emergence: it was highest for Omicron, which had more DN9s than every other VOC, followed by Delta, Alpha, Gamma and finally Beta.
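The idea of an n-mer being present in one lineage but absent from all others can be sketched as a set difference. The snippet below is a simplified illustration rather than the study's pipeline: the helper names and toy sequences are hypothetical, and it assumes the comparison is made against every genome supplied for the other lineages.

```python
def nmers(sequence, n):
    """Set of all length-n substrings of a sequence."""
    seq = sequence.upper()
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

def distinctive_nmers(genome, other_lineage_genomes, n=9):
    """n-mers found in `genome` but in none of the genomes from other lineages."""
    background = set()
    for other in other_lineage_genomes:
        background |= nmers(other, n)
    return nmers(genome, n) - background

# Toy sequences (made up for illustration, not real viral data)
query = "AAACCCGGG"
others = ["AAACCCTTT"]
print(len(distinctive_nmers(query, others, n=3)))
```

Counting the size of the returned set for every genome in a lineage would give a distribution of the kind plotted in the density panels described above.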
Map of SARS-CoV-2 VOC prevalence by geographic area. Geographical distribution of Alpha (B.1.1.7), Beta (B.1.351) and Gamma (P.1) variants based on sequences deposited in GISAID up to and including 14 December 2021. Each pie chart shows the proportion of Alpha, Beta or Gamma sequences deposited in the country. Note that the denominator is the number of sequences labeled as one of these three variants, rather than the total number of sequences deposited in that country. Each pie chart thus answers the following question: “Of all the genomes deposited in a given country that were assigned as Alpha, Beta or Gamma, what proportion was assigned to each of these three variants?” Delta and Omicron are omitted to better highlight the geographical distribution of Alpha, Beta and Gamma; however, Delta and Omicron are, or have previously been, widespread in the regions shown. Only countries with at least 1,000 deposited sequences are displayed. The variants depicted, which circulated at about the same time, generally became prominent in geographically separate regions.
Omicron was found to be the most highly mutated VOC, while the phylogenetic distance of Gamma from Alpha and Beta was the most notable. The results also suggest that the newly emerged SARS-CoV-2 variants were genetically distinct from the original strain and that they carried unique nucleotide sequences that account for this distinctiveness. Distinctiveness was also found to increase within a lineage over evolutionary time.
Thus, the current study provides a new method that could help researchers identify and assess the distinctiveness of any new SARS-CoV-2 variant compared with previous ones. However, further research is needed to determine whether this method can classify lineages as VOCs earlier than is currently possible, how vaccination will affect SARS-CoV-2 genomic diversity, and whether SARS-CoV-2 infection will progress toward a seasonal or endemic pattern.
The study had certain limitations. First, the number of Omicron sequences available in the GISAID database is currently low, which can lead to oversampling. Second, protein-encoding nucleotide n-mers or amino acid n-mers, other than nucleotide 9-mers, should also be considered when determining genomic diversity. Third, the method may be sensitive to the lineage composition of the complement group. Finally, further research is needed on the relationship between measures of genomic distinctiveness and both phylogenetic depth and evolutionary time.
medRxiv publishes preliminary scientific reports that are not peer-reviewed and therefore should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.