A new and unusual variant of SARS-CoV-2 has been found in Africa. First detected in Uganda as early as October 2020, the A.23.1 variant is now found in at least 26 other countries. At present, A.23.1 accounts for fewer than 2,000 of the 3.5 million sequences in the GISAID SARS-CoV-2 database. A.23.1 is not yet classified as a variant of concern or interest, but it is worth careful observation. A.23.1 contains several mutations found in variants of concern as well as six unique substitutions, four of which are in the Spike protein (Figure 1). Here we describe the potential effects of each mutation, within and external to the Spike protein, on replication, immune evasion, and pathogenesis.
The first observation is that A.23.1 does not share a common origin with any of the variants of interest or concern, including Alpha, Beta, Gamma, Delta, and Mu. All of these variants carry a triad of mutations: C241U in the 5’ untranslated region, D614G in the Spike protein, and P323L in NSP12. This reflects their common origin from the first major variant to sweep the globe. A.23.1 shares four amino acid changes with a previously described variant, A.30, which was originally detected in Angola but is thought to have originated in Tanzania. Neither of these two East African variants contains the triad found in the other variants of concern.
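The triad test described above amounts to a simple set comparison. The following is a minimal sketch in Python, not a real analysis pipeline: the label format (gene prefix plus mutation), the function names, and the contrast example `lineage_with_triad` are illustrative assumptions, and the A.23.1 entries are limited to mutations named in this article rather than a complete genotype.

```python
# The triad shared by the variants of interest and concern named in the text.
# The "GENE:MUTATION" label format is an assumption made for this sketch.
TRIAD = frozenset({"5UTR:C241U", "S:D614G", "NSP12:P323L"})


def missing_triad_members(mutations):
    """Return the triad mutations that are absent from a set of labels."""
    return sorted(TRIAD - set(mutations))


def carries_triad(mutations):
    """True only if all three triad mutations are present."""
    return not missing_triad_members(mutations)


# A.23.1 amino acid changes named in this article (not a full genotype).
a23_1 = {
    "S:R102I", "S:F157L", "S:V367F", "S:Q613H",
    "NSP6:M86I", "NSP6:L98F", "NSP6:M183I",
    "N:S202N", "ORF8:L84S", "ORF8:E92K",
}

# Hypothetical contrast case: a genotype that carries only the triad.
lineage_with_triad = set(TRIAD)

if __name__ == "__main__":
    print("A.23.1 carries triad:", carries_triad(a23_1))
    print("A.23.1 is missing:", missing_triad_members(a23_1))
    print("Contrast case carries triad:", carries_triad(lineage_with_triad))
```

Reporting which triad members are missing, rather than only a yes or no, makes it easy to spot a lineage that carries part of the triad.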
The Spike Protein (S)
Most discussions of variants focus on the receptor-binding domain, specified by the S gene, which interacts directly with the ACE2 receptor (Figure 3). The mutation V367F lies in the receptor-binding core, the base of the receptor-binding domain that stabilizes the receptor-binding motif, which in turn binds directly to ACE2.
Two nonsynonymous mutations are present in the N-terminal domain, R102I and F157L. Both of these mutations are unique to the A.23.1 variant. The replacement of arginine (R) with isoleucine (I) is a change from positive charge to neutral charge. The change from phenylalanine (F) to leucine (L) does not create a major polarity or charge shift; however, the change in amino acid structure might nonetheless affect N-terminal domain immunogenicity. We note that the dominant variant in the world today, Delta, is mutated at amino acid positions 157 and 158. Typically, mutation in the N-terminal domain reduces recognition by some therapeutic monoclonal antibodies and convalescent sera. We also note the synonymous mutation C22000U.
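The charge and polarity reasoning applied to R102I and F157L can be written as a small lookup. This is a rough sketch in Python under stated assumptions: the groupings follow a common textbook classification (histidine is treated as neutral, and the placement of glycine and proline varies between sources), and the function names are hypothetical. It says nothing about structural or immunological effects, which depend on context.

```python
import re

# Side-chain charge at roughly physiological pH (textbook simplification;
# histidine is treated as neutral here).
CHARGE = {"R": "positive", "K": "positive", "D": "negative", "E": "negative"}

# Residues commonly grouped as nonpolar; some textbooks place G and P elsewhere.
NONPOLAR = set("AVLIMFWPG")

SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_substitution(label):
    """Split a label such as 'R102I' into (original, position, replacement)."""
    match = SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"not an amino acid substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def describe(residue):
    """Return (charge, polarity) for a one-letter amino acid code."""
    charge = CHARGE.get(residue, "neutral")
    if charge != "neutral" or residue not in NONPOLAR:
        return charge, "polar"
    return charge, "nonpolar"


def compare(label):
    """Summarize the charge and polarity change caused by a substitution."""
    original, position, replacement = parse_substitution(label)
    old_charge, old_polarity = describe(original)
    new_charge, new_polarity = describe(replacement)
    return {
        "position": position,
        "charge": (old_charge, new_charge),
        "polarity": (old_polarity, new_polarity),
    }


if __name__ == "__main__":
    for label in ("R102I", "F157L"):
        print(label, compare(label))
```

Under these groupings, R102I comes out as a positive-to-neutral change, while F157L leaves both charge and polarity class unchanged, matching the description above.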
The two remaining mutations lie outside the major domains and are instead located near furin cleavage sites (Figure 4). Furin cleaves the Spike protein into two distinct subunits. The mutation commonly associated with furin cleavage, D614G, is notably absent in this variant. However, Q613H is likely to increase sensitivity to furin cleavage as well. At present, no specific function is assigned to amino acid 681, the site of the second mutation. However, this position is mutated in a number of variants of interest and concern, including Alpha, Delta, Mu, A.30, and B.1.620. We additionally note one more synonymous mutation: U24097C.
Non-Structural Protein 6 (NSP6)
The NSP6 gene of Orf1a in A.23.1 is heavily mutated, as it is in several other variants of concern. NSP6 is a transmembrane protein that is required for the formation of the double-membrane replication/transcription vesicle. As such, it plays a central role in the ability of SARS-CoV-2 to evade the innate immune response by sequestering the replication-transcription complex from RIG-I and other single- and double-stranded RNA sensors. NSP6 also directly inhibits the induction of type-I interferon. The NSP6 protein binds to TANK-binding kinase 1, inhibiting the phosphorylation of interferon regulatory factor 3 that is required for initiation of type-I interferon synthesis.
Two of the amino acid changes, M86I and M183I, are located in transmembrane regions: M86I in region S3 and M183I in region S7 (Figure 6). Both methionine (M) and isoleucine (I) are hydrophobic, uncharged amino acids. M86I is notably also found in A.30. The L98F mutation lies in what is likely to be the cytoplasmic face of the protein, between the S3 and S4 transmembrane domains. We note that NSP6 is mutated in many variants of interest and concern, including Alpha, Beta, Gamma, Lambda, Kappa, Iota, Eta, and Mu.
There are five synonymous mutations throughout Orf1ab, that is, mutations that do not result in an amino acid change. While these do not alter the encoded proteins, they may play a role in RNA transcription and replication. These include C4753U in NSP3, C8782U in NSP4, C10747U in NSP6, C16575U in NSP13, and C17745U in NSP12.
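Whether a nucleotide change is synonymous depends on the codon in which it falls. The sketch below, in Python, builds the standard genetic code and compares the amino acid encoded before and after a change. The example codons are hypothetical and chosen only for illustration, since this article does not give the codon context of the mutations listed; the function names are likewise assumptions of the sketch.

```python
# Standard genetic code, listed in UCAG order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one RNA codon (T is accepted in place of U); '*' means stop."""
    return CODON_TABLE[codon.upper().replace("T", "U")]


def is_synonymous(before, after):
    """True if both codons encode the same amino acid."""
    return translate_codon(before) == translate_codon(after)


if __name__ == "__main__":
    # Hypothetical codons, for illustration only.
    # A C-to-U change in the third position of a glycine codon is silent.
    print("GGC -> GGU synonymous:", is_synonymous("GGC", "GGU"))
    # A G-to-A change in the second position turns serine into asparagine.
    print("AGU -> AAU synonymous:", is_synonymous("AGU", "AAU"))
```

Many third-position changes are silent because of the redundancy of the genetic code, which is why a C-to-U mutation can leave the protein sequence untouched while still altering the RNA.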
The Nucleocapsid Protein (N)
There are two mutations in the N protein of A.23.1. One changes the amino acid at position 202 from serine to asparagine. The second is silent. N, like many other viral proteins, is multifunctional. In addition to forming the helical nucleocapsid, N is reported to be required for packaging the viral genome into infectious particles. Other functions of N include facilitating replication of the viral genome and synthesis of viral messenger RNAs. The N protein is also reported to suppress innate immune responses to viral infection. Amino acid 202 is located at a critical junction, connecting several active domains that modulate the RNA binding, oligomerization, and physicochemical properties (phase separation) of the N protein (Figure 8). The Tanzanian variant A.30 also carries the S202N mutation.
The Open Reading Frame Protein-8 (Orf8)
The Orf8 protein of A.23.1 is mutated at amino acid positions 84 and 92. Both mutations are predicted to alter the structure and possibly the function of the Orf8 protein. The L84S mutation replaces leucine with serine, a nonpolar-to-polar shift. The E92K mutation changes glutamic acid to lysine, a shift from a negatively charged to a positively charged residue. Orf8 interferes with immune recognition and destruction of SARS-CoV-2 infected cells.
The discovery of two distinct but distantly related variants in East Africa is concerning in and of itself. The observation that these variants arose independently of all others in the world, lacking the distinctive triad of mutations that links all other current variants, demonstrates the versatility of SARS-CoV-2 in adapting to local conditions. These findings heighten the urgency for vigorous global surveillance and early detection of new variants, focused on the sequence of the entire viral genome and not only on select regions such as the S gene.