Random variation is an essential component of all living things. It drives diversity, and it is why there are so many different species. Viruses are no exception. Most viruses are experts at changing genomes to adapt to their environment. We now have evidence that the virus that causes Covid, SARS-CoV-2, not only changes, but changes in ways that are significant. This is the first in a series of articles on how the virus changes and what that means for humanity.

In September 2020, the earliest known sample of the new coronavirus variant B.1.1.7, also known as the UK variant, was collected. Within three months, B.1.1.7 would be, at least in Britain, the dominant strain of SARS-CoV-2. Today, its prevalence there is estimated to be above 90 percent; its contagiousness, up to 75 percent greater than that of previous strains; and its viral load, higher too.

B.1.1.7, now found in more than 35 countries, wasn’t the first variant to overtake its predecessor. Nor will it be the last, as the 501.v.2 variant, now the dominant strain circulating in South Africa, is already proving. But how do variants form in the first place? The answer goes back to the building blocks of life: the genome. More specifically, it lies in single-letter differences in viral RNA that may begin as mere copying errors but, once they spread across enough copies, can increase a virus’s chances of survival. These changes are known as mutations.

To replicate its genome, and thus reproduce itself, SARS-CoV-2 relies on an enzyme called RNA-dependent RNA polymerase. As the polymerase does the mostly monotonous work of stringing together nucleotides, the chemical letters of RNA known as adenine (A), guanine (G), cytosine (C), and uracil (U), a number of mistakes inevitably creep into the copy. Many of these single-letter errors, or point mutations, will ultimately be inconsequential. The ones that matter change proteins that affect how the virus functions, such as the spike protein.
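Why are so many point mutations harmless? RNA is read three letters (one codon) at a time, and several codons often code for the same amino acid. The minimal Python sketch below illustrates this using the standard genetic code and a made-up, hypothetical 21-letter gene (`GENE`), not a real stretch of the SARS-CoV-2 genome. The helper names `translate` and `substitute` are illustrative only. Changing the third letter of a codon here leaves the protein unchanged, while changing the second letter swaps one amino acid for another.

```python
# Standard genetic code; bases ordered U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna: str) -> str:
    """Read whole codons from the start until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def substitute(rna: str, position: int, base: str) -> str:
    """Return a copy of rna with the letter at 0-based `position` swapped."""
    return rna[:position] + base + rna[position + 1:]


# Hypothetical toy gene: AUG GCU GAA UUU GGC AAA UAA (stop)
GENE = "AUGGCUGAAUUUGGCAAAUAA"

print(translate(GENE))                      # MAEFGK
print(translate(substitute(GENE, 5, "C")))  # GCU -> GCC, still alanine: MAEFGK
print(translate(substitute(GENE, 4, "A")))  # GCU -> GAU, now aspartate: MDEFGK
```

The first substitution is "silent" (synonymous); the second changes the protein (a missense mutation), which is the kind that can alter how a protein like the spike behaves.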

We can make sense of mutations using six distinct categories. The first and most common type is substitution: where an A should go, for example, a G somehow ends up instead. Unlike most RNA viruses, human coronaviruses come equipped with corrective machinery that can normally fix substitutions. This so-called proofreading mechanism is why SARS-CoV-2 has so far shown a much lower error rate than RNA viruses such as HIV and influenza. But what this virus lacks in capacity for variation, it makes up for in the sheer number of people it infects, which means that when a variant does catch on, it can spread like wildfire.
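Because a substitution swaps one letter for another without changing the genome’s length, substitutions between a reference sequence and a variant can be tallied position by position. The short Python sketch below assumes the two sequences are already aligned and contain no insertions or deletions; the function name and the toy sequences are hypothetical, and real sequence-analysis pipelines align genomes before comparing them.

```python
def substitution_sites(reference: str, variant: str) -> list[str]:
    """List differences between two aligned, equal-length sequences.

    Each entry reads like "A9G": reference letter, 1-based position, variant letter.
    """
    if len(reference) != len(variant):
        raise ValueError("substitutions alone never change a sequence's length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=1)
        if ref != alt
    ]


REFERENCE = "AUGGCUGAAUUU"  # hypothetical reference sequence
VARIANT = "AUGGCCGAGUUU"    # hypothetical variant sequence

sites = substitution_sites(REFERENCE, VARIANT)
print(sites)       # ['U6C', 'A9G']
print(len(sites))  # 2
```

The second entry, A9G, is exactly the case described above: a G sitting where an A should be.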

The second type is deletion. Deletions occur when the polymerase skips over part of its template, leaving a gap in the genetic code. The nasty thing about deletions is that the virus cannot correct them after the fact. Though more likely than a substitution to disrupt function, and just as capable of killing the virus altogether, some deletions, both small and large, still allow the virus to grow. The deletions most advantageous to the virus, and of most concern to us, are those that alter it in significant ways.

Deletions come in two kinds: in frame and out of frame. Because the genetic code is read three letters at a time, a deletion whose length is a multiple of three stays in frame: the protein loses one or more amino acids but is otherwise read correctly. A deletion of one or two letters, by contrast, shifts the reading frame, scrambling every three-letter word downstream and typically disrupting the entire protein, not unlike a misplaced letter in a run-on sentence that causes the reader to stop and stumble.
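The difference between the two kinds of deletion shows up clearly in a toy example. This minimal Python sketch uses the standard genetic code and a made-up, hypothetical 21-letter gene (not a real SARS-CoV-2 sequence); `translate` and `delete` are illustrative helper names. Removing three letters drops a single amino acid and leaves the rest intact, while removing one or two letters garbles everything after the gap and, in this case, erases the stop codon.

```python
# Standard genetic code; bases ordered U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna: str) -> str:
    """Read whole codons from the start until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete(rna: str, start: int, length: int) -> str:
    """Remove `length` letters beginning at 0-based index `start`."""
    return rna[:start] + rna[start + length:]


# Hypothetical toy gene: AUG GCU GAA UUU GGC AAA UAA (stop)
GENE = "AUGGCUGAAUUUGGCAAAUAA"

print(translate(GENE))                # MAEFGK
print(translate(delete(GENE, 6, 3)))  # in frame, GAA removed: MAFGK
print(translate(delete(GENE, 6, 1)))  # out of frame: MANLAN (stop codon lost)
print(translate(delete(GENE, 6, 2)))  # out of frame: MAIWQI
```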

The third type is insertion, a mutation that has occasionally been observed among the hundreds of thousands of genomic sequences stored in GISAID, an open-access sequence repository. Insertions are extra bits of genetic information, anywhere from a single nucleotide to an entire segment, that end up in a genome by chance.

The fourth type is inversion, which happens when a segment of the genome is flipped on its head, reversing the order of its nucleotides.

When the polymerase stutters and spits out the same bit of information over again, a duplicate is created—hence the fifth type, duplication.
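The third through fifth types can each be modeled as a simple edit to a string of letters. The Python sketch below is a toy model under stated assumptions: positions are 0-based, sequences are plain strings of A, C, G, and U, and `inversion` follows the description above by reversing the letter order only (in double-stranded genomes an inverted segment is also complemented, which this sketch ignores). All function names and the sequence are hypothetical.

```python
def insertion(seq: str, position: int, extra: str) -> str:
    """Insert `extra` letters before 0-based index `position`."""
    return seq[:position] + extra + seq[position:]


def inversion(seq: str, start: int, end: int) -> str:
    """Reverse the letter order of seq[start:end] (simplified: no complementing)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def duplication(seq: str, start: int, end: int) -> str:
    """Repeat seq[start:end] immediately after itself, like a polymerase stutter."""
    return seq[:end] + seq[start:end] + seq[end:]


SEQ = "AUGGCUGAA"  # hypothetical toy sequence

print(insertion(SEQ, 3, "CCC"))  # AUGCCCGCUGAA
print(inversion(SEQ, 3, 6))      # AUGUCGGAA
print(duplication(SEQ, 3, 6))    # AUGGCUGCUGAA
```

As with deletions, an insertion or duplication whose length is a multiple of three, like the three-letter examples here, keeps the reading frame intact.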

The sixth and final type is recombination: parts of two separate viruses literally become joined. When two different viral genomes find themselves in the same cell, their pieces can be shuffled from one to the other, settling into new forms in the process. The way SARS-CoV-2 produces its proteins encourages recombination, because its polymerase routinely jumps from one stretch of template to another to make the shorter, subgenomic RNAs that code for many of those proteins. Even a region as crucial as the gene for the spike (S) protein could be up for grabs.
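A single-crossover model captures the basic idea: the copy starts on one genome and finishes on another. This Python sketch is a simplification that assumes the two parent genomes are already aligned and equal in length and that the polymerase switches templates exactly once, at a chosen breakpoint; real recombinants can carry several breakpoints. The function name and both parent sequences are hypothetical.

```python
def recombine(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Single crossover: copy parent_a up to `breakpoint`, then parent_b after it."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes aligned, equal-length parents")
    if not 0 <= breakpoint <= len(parent_a):
        raise ValueError("breakpoint must lie within the genome")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


PARENT_A = "AUGGCUGAAUUU"  # hypothetical genome from the first infection
PARENT_B = "AUGGCCGAGUUC"  # hypothetical genome from the second infection

child = recombine(PARENT_A, PARENT_B, 6)
print(child)  # AUGGCUGAGUUC: first half from A, second half from B
```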

The existing literature on coronaviruses already tells us they can recombine, at least in bats. In fact, recombination is likely how SARS-CoV-2 came about, possibly in the ant-eating mammals known as pangolins. There are also unpublished reports of recombination among SARS-CoV-2 variants. In a dense crowd, one person might be infected by two different people at once, or reinfected while subgenomic remnants of their first infection still linger in their system. SARS-CoV-2 can also infect animals that live closely with us, so recombination events combining SARS-CoV-2 with coronaviruses that infect those animals might also occur, changing the virus in unknown and possibly dangerous ways.

Given the high prevalence of both SARS-CoV-2 and other cold-causing viruses at this time, I suspect all of the events above have either already occurred or inevitably will. At first glance, none of these mutations, taken in isolation, would appear capable of kickstarting a new variant. Considered at scale, however, they add up to a far more swarming, and certainly more startling, picture.

Take substitution. On average, in a viral genome 30,000 units long, substitution alters one unit in every 10,000 each time the genome is copied, or about three units per copy. We can’t say for sure how many viruses are teeming at any moment within one person infected with Covid-19, but we do know that every milliliter of contagious nasal fluid contains something like 10^8 (one hundred million) to 10^9 (one billion) viral particles.

One billion viruses, each carrying about three substitutions at our one-in-10,000 average, means some three billion substitutions, and that’s in just one milliliter of nasal fluid. Multiply that by the actual volume of active virus in a single body, then by the number of active Covid-19 infections in a given population, and before long you’ll find yourself well into the quintillions. In other words, the mutations that give rise to viral variation are happening all the time, at nearly inconceivable rates.
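The back-of-the-envelope estimate above can be reproduced in a few lines of Python. The inputs are only the figures quoted in the text (a genome of roughly 30,000 units, one substitution per 10,000 units copied, and 10^8 to 10^9 particles per milliliter of nasal fluid); they are rough averages, so the result is an order-of-magnitude estimate, not a measurement. Integer arithmetic is used to keep the numbers exact.

```python
# Figures quoted in the article; rough averages, not measurements.
GENOME_LENGTH = 30_000              # units (nucleotides) per genome
UNITS_PER_SUBSTITUTION = 10_000     # one substitution per 10,000 units copied
PARTICLES_PER_ML = (10**8, 10**9)   # viral particles per mL of nasal fluid (low, high)

substitutions_per_genome = GENOME_LENGTH // UNITS_PER_SUBSTITUTION

low, high = (n * substitutions_per_genome for n in PARTICLES_PER_ML)

print(f"Substitutions per genome copy: {substitutions_per_genome}")
print(f"Substitutions per mL: {low:,} to {high:,}")
# Substitutions per genome copy: 3
# Substitutions per mL: 300,000,000 to 3,000,000,000
```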

We now know the mechanisms by which viral variation arises. Why it arises is what I’ll cover in the next piece in this series.