Genomic signatures in viruses
Abstract
In an age of global pandemics, studying how viruses and their genomes evolve is of great importance. It has previously been found that the genomes of many eukaryotes and prokaryotes have specific preferences for nucleotides, dinucleotides, and codons. Such preferences are shaped by the selective pressure acting on the genomes and are referred to as specific genomic signatures. To our knowledge, the presence of such signatures has not been studied in viruses. The aim of this thesis is therefore to thoroughly investigate genomic signatures in viruses.
In the first two papers of this thesis, new algorithms for the study of genomic signatures were developed. Here, the genomic signatures were based on variable-length Markov chains of a genome. Our new algorithms are a thousand times faster than pre-existing methods and up to 600 times faster than the state of the art, while also requiring less memory. These methods enable computationally efficient analysis of genomic signatures, even on laptops.
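As an illustration of what a signature based on a variable-length Markov chain can look like, the following is a minimal Python sketch and not the algorithm developed in the thesis. It counts which nucleotide follows each context of up to `max_depth` preceding nucleotides and keeps only contexts observed at least `min_count` times. The function names, the parameters, and the count-based pruning rule are assumptions made for this example; variable-length Markov chains are commonly pruned with a statistical criterion instead.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def build_vlmc(sequence, max_depth=3, min_count=2):
    """Count next-nucleotide occurrences for every context of length 0..max_depth.

    Assumption for this sketch: contexts are pruned with a simple count
    threshold (min_count), used here as a stand-in for a statistical criterion.
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    counts[""]  # make sure the empty (order-0) context always exists
    for i, nxt in enumerate(sequence):
        for k in range(min(max_depth, i) + 1):
            window = sequence[i - k:i + 1]  # context followed by next nucleotide
            if any(ch not in ALPHABET for ch in window):
                continue  # skip ambiguous bases such as N
            counts[window[:-1]][nxt] += 1
    # Keep the empty context plus every context seen often enough.
    return {
        ctx: c for ctx, c in counts.items()
        if ctx == "" or sum(c.values()) >= min_count
    }


def next_probability(model, history, nucleotide, max_depth=3, pseudo=1.0):
    """Probability of `nucleotide` given the longest suffix of `history` in the model."""
    for k in range(min(max_depth, len(history)), -1, -1):
        context = history[len(history) - k:]
        if context in model:
            c = model[context]
            total = sum(c.values()) + pseudo * len(ALPHABET)
            return (c[nucleotide] + pseudo) / total
    raise KeyError("model has no empty context")


if __name__ == "__main__":
    model = build_vlmc("ACGTACGTACGT", max_depth=2, min_count=2)
    print(sorted(model))
    print(next_probability(model, "AC", "G", max_depth=2))
```

The set of retained contexts and their next-nucleotide distributions plays the role of the signature in this sketch. Because rarely seen contexts are dropped, the model stays short where the sequence carries little information and deeper where it carries more.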
In the subsequent two papers, we thoroughly analyzed the genomic signatures of viruses and compared such signatures to those of the viruses' hosts. The results illustrate that a majority of viruses have specific genomic signatures. In addition, in most cases, the signatures of viruses are not similar to the signatures of their hosts other than in GC content. This dissimilarity indicates that viruses' signatures are independent of their host's signature, despite viruses' dependence on their host's genetic and protein-expression machinery.
In the final paper, we illustrated an application of the genomic signatures by applying them to identify recombination events between Human alphaherpesvirus 1 and Human alphaherpesvirus 2.
We thus demonstrate that genomic signatures of variable length are an important property of virus genomes. They hint at the importance of the evolution of specific patterns of the nucleotide sequence of viruses. These patterns can likely identify even remotely related viruses in collections of unknown sequences, thus helping detect and classify novel viruses. In addition, it might be possible to use and modify the genomic signatures to, e.g., attenuate viruses to create vaccine candidates.
Parts of work
1. Gustafsson, J., Norberg, P., Qvick-Wester, J.R., Schliep, A. Fast parallel construction of variable-length Markov chains. BMC Bioinformatics 22, 487 (2021). https://doi.org/10.1186/s12859-021-04387-y
2. Gustafsson, J., Edwards, S. V., Schliep, A., Norberg, P. Estimating phylogenies from raw sequencing reads using variable-length Markov chains. Manuscript.
3. Holmudden, M.*, Gustafsson, J.*, Schliep, A., Norberg, P. Species-specific genomic signatures in viruses. Manuscript. * Denotes shared first-authorship.
4. Gustafsson, J., Schliep, A., Norberg, P. Virus-host similarities in genomic signatures. Manuscript.
5. Gustafsson, J., Schliep, A., Norberg, P. Detection of Herpes simplex type 1 and 2 recombination in clinical samples. Manuscript.
Degree
Doctor of Philosophy (Medicine)
University
University of Gothenburg. Sahlgrenska Academy
Institution
Institute of Biomedicine. Department of Infectious Diseases
Disputation
Friday, 16 June 2023, at 13:00, Lecture hall, floor 3, Guldhedsgatan 10a, Göteborg
Date of defence
2023-06-16
Date
2023-05-25
Author
Gustafsson, Joel
Keywords
Virus evolution
Bioinformatics
Markov Chains
Genomic signatures
Publication type
Doctoral thesis
ISBN
978-91-8069-269-4 (print)
978-91-8069-270-0 (PDF)
Language
eng