Scientist Finds Early Virus Sequences That Had Been Mysteriously Deleted

By rooting through files stored on Google Cloud, a researcher says he recovered 13 early coronavirus sequences that had disappeared from a database last year.

By Carl Zimmer

About a year ago, genetic sequences from more than 200 virus samples from early cases of Covid-19 in Wuhan disappeared from an online scientific database.

Now, by rooting through files stored on Google Cloud, a researcher in Seattle reports that he has recovered 13 of those original sequences — intriguing new information for discerning when and how the virus may have spilled over from a bat or another animal into humans.

The new analysis, released on Tuesday, bolsters earlier suggestions that a variety of coronaviruses may have been circulating in Wuhan before the initial outbreaks linked to animal and seafood markets in December 2019.

As the Biden administration investigates the contested origins of the virus, known as SARS-CoV-2, the study neither strengthens nor discounts the hypothesis that the pathogen leaked out of a famous Wuhan lab. But it does raise questions about why original sequences were deleted, and suggests that there may be more revelations to recover from the far corners of the internet.

“This is a great piece of sleuth work for sure, and it significantly advances efforts to understand the origin of SARS-CoV-2,” said Michael Worobey, an evolutionary biologist at the University of Arizona who was not involved in the study.

Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center who wrote the new report, called the deletion of these sequences suspicious. It “seems likely that the sequences were deleted to obscure their existence,” he wrote in the paper, which has not yet been peer-reviewed or published in a scientific journal.

Dr. Bloom and Dr. Worobey belong to an outspoken group of scientists who have called for more research into how the pandemic began. In a letter published in May, they complained that there wasn’t enough information to determine whether it was more likely that a lab leak spread the coronavirus, or that it leapt to humans from contact with an infected animal outside of a lab.

The genetic sequences of viral samples hold crucial clues about how SARS-CoV-2 shifted to our species from another animal, most likely a bat. Most precious of all are sequences from early in the pandemic, because they take scientists closer to the original spillover event.

As Dr. Bloom was reviewing what genetic data had been published by various research groups, he came across a March 2020 study with a spreadsheet that included information on 241 genetic sequences collected by scientists at Wuhan University. The spreadsheet indicated that the scientists had uploaded the sequences to an online database called the Sequence Read Archive, managed by the U.S. government’s National Library of Medicine.

But when Dr. Bloom looked for the Wuhan sequences in the database earlier this month, his only result was “no item found.”
