Genomic mutational 
heterogeneity in cancer 
Improved models and tools for driver gene 
detection
Martin Boström 
Department of Medical Biochemistry and Cell Biology 
Institute of Biomedicine 
Sahlgrenska Academy, University of Gothenburg 
Gothenburg 2022 
Cover illustration: Increased UV susceptibility due to ETS family transcription 
factor binding underlies promoter hotspot mutations in melanoma. 
By Martin Boström. DNA pattern brush for Adobe Illustrator by James 
Hedberg used with permission. 
Genomic mutational heterogeneity in cancer: Improved models and tools for 
driver gene detection 
© Martin Boström 2022 
martin.bostrom@gu.se 
ISBN 978-91-8009-578-5 (PRINT) 
ISBN 978-91-8009-579-2 (PDF) 
Printed in Borås, Sweden 2022 
Printed by Stema Specialtryck AB 
To Antonia and Rufus

ABSTRACT 
Cancer is a disease that is strongly related to evolution, as mutations that confer 
a benefit to individual cells face positive selection and eventually lead to 
tumorigenesis. As such, the search for genes that drive cancer development 
entails distinguishing positive selection from other sources of increased 
mutation rates, which requires detailed knowledge of how normal mutation 
rates vary across the genome. This thesis aims to improve that knowledge, as 
well as to provide novel methods of driver detection. 
In cutaneous melanoma, there are mutational hotspots in promoters that 
coincide with the sequence motif “TTCCG”. These hotspots could easily be 
misinterpreted as cancer drivers, but in the first paper of this thesis we show 
that they are in fact caused by increased UV damage susceptibility upon 
transcription factor binding, with some contribution from impaired DNA 
repair. 
In the second paper, we study how the UV mutational signature varies between 
different genomic regions and show that the main difference is caused by the 
level of cytosine methylation, owing to its effect on UV damage formation. 
We also improve the traditional trinucleotide mutational signature by 
incorporating longer patterns, capturing the effect of TTCCG-related promoter 
mutations. 
In the third paper, we demonstrate a novel method for driver detection that 
ignores recurrence signals, instead testing the likelihood of observing a 
particular combination of mutated tumours in a patient cohort. In addition to 
providing an orthogonal perspective on driver detection, this method is less 
sensitive to flaws in modelling some forms of mutational heterogeneity, such 
as the TTCCG hotspots. 
In summary, this thesis improves our knowledge of mutational heterogeneity 
in cancer, in addition to describing a new driver detection test that is less 
sensitive to situations where that knowledge falls short. Both of these advances 
contribute to the search for genes that drive cancer development.  
Keywords: Cancer, genomics, ultraviolet light, mutational heterogeneity 
SUMMARY (TRANSLATED FROM SWEDISH)
Cancer is a disease with a strong connection to evolution. Mutations that
benefit individual cells are subject to positive selection and thereby
contribute to tumour formation. The search for genes that drive cancer
development therefore involves distinguishing positive selection from other
causes of increased mutation frequency, which requires detailed knowledge of
how the mutation frequency normally varies across the genome. This thesis aims
to increase that knowledge and to contribute new methods for finding cancer
genes.
In cutaneous malignant melanoma, there are positions in the genome that have
an unusually high mutation frequency and that coincide with the sequence motif
“TTCCG” in active promoters. These positions could be misinterpreted as
cancer-driving, but in the first paper of this thesis we show that the
underlying cause is increased susceptibility to UV-related damage formation on
the DNA molecule upon transcription factor binding, with a smaller
contribution from impaired DNA repair.
In the second paper, we study how the mutational signature of UV light varies
between different genomic regions. We show that the largest difference is
linked to the level of cytosine methylation, owing to its effect on UV-related
DNA damage formation. We also improve the traditional trinucleotide-based
mutational signature by incorporating longer sequences, thereby capturing the
mutation-increasing effect of TTCCG-related promoter regions.
In the third paper, we demonstrate a new method for detecting cancer-driving
mutations. This method disregards mutation frequency and instead evaluates the
probability of observing different combinations of mutated tumours. The method
approaches the detection of cancer-driving mutations from a new angle and is
also less sensitive to shortcomings in the underlying mutational models, which
otherwise lead to false positives in regions such as the TTCCG-related
promoters.
In summary, this thesis contributes to increasing the knowledge of mutational
heterogeneity in cancer and introduces a new method for detecting
cancer-driving mutations. These advances further the search for genes that
drive cancer development.

LIST OF PAPERS 
This thesis is based on the following studies, referred to in the text by their 
Roman numerals. 
I. Elliott K*, Boström M*, Filges S, Lindberg M, Van den
Eynden J, Ståhlberg A, Clausen A, Larsson E. Elevated
pyrimidine dimer formation at distinct genomic bases
underlies promoter mutation hotspots in UV-exposed
cancers
PLOS Genetics 2018, 14(12)
* These authors contributed equally
II. Lindberg M, Boström M, Elliott K, Larsson E.
Intragenomic variability and extended sequence patterns
in the mutational signature of ultraviolet light
PNAS 2019, 116 (41) 20411-20417
III. Boström M, Larsson E. Mutation distribution skew in
patient cohorts provides a novel signal for positive
selection in cancer
Manuscript
PAPERS NOT INCLUDED IN THIS THESIS 
1. Kreisel K, Engqvist M, Kalm J, Thompson L, Boström M,
Navarrete C, McDonald J, Larsson E, Woodgate R, Clausen
A. DNA polymerase η contributes to genome-wide lagging
strand synthesis
Nucleic Acids Res. 2019, 47(5):2425-2435

CONTENT 
ABBREVIATIONS
1 INTRODUCTION
1.1 The Hallmarks of Cancer
1.2 Mutations in Cancer
1.2.1 Oncogenes and Tumour Suppressors
1.2.2 Non-Coding Mutations
1.3 What Causes Mutations?
1.3.1 Mutations in Cutaneous Melanoma
1.3.2 Mutational Signatures
1.4 Finding Drivers in Genomic Data
1.4.1 Genomic Mutational Heterogeneity
1.4.2 Driver Detection Methods
1.4.3 DNA Sequencing to Map Mutations
1.4.4 Damage and Repair Maps
2 AIM
3 RESULTS AND DISCUSSION
3.1 Increased UV Damage Formation Underlies Promoter Hotspot Mutations in Melanoma (Paper I)
3.1.1 UV Damage Formation
3.1.2 The Role of Repair
3.2 Variation of the UV Mutational Signature in the Genome (Paper II)
3.2.1 Trinucleotide Signature Variation
3.2.2 Extended UV Signature
3.3 Recurrence-Independent Driver Detection (Paper III)
3.3.1 Implementation of the Method
3.3.2 Detecting Drivers in Melanoma
3.3.3 Detecting Cancer Drivers in Different Cancer Types
4 CONCLUSIONS AND FUTURE PERSPECTIVES
ACKNOWLEDGEMENTS
REFERENCES
PAPERS

ABBREVIATIONS 
6-4 PP 6-4 pyrimidine pyrimidone
A Adenine 
bp Base pairs 
C Cytosine 
COSMIC Catalogue of Somatic Mutations in Cancer 
CPD Cyclobutane pyrimidine dimer 
cSCC Cutaneous squamous cell carcinoma 
Cys Cysteine 
DNA Deoxyribonucleic acid 
dN/dS Ratio between nonsynonymous and synonymous mutations 
FDR False discovery rate 
G Guanine 
ICGC International Cancer Genome Consortium 
kb Kilobases (Distance measurement in DNA – 1000 base pairs) 
MMR Mismatch repair 
NER Nucleotide excision repair 
NMF Nonnegative matrix factorisation 
PCA Principal component analysis 
PCAWG Pan-Cancer Analysis of Whole Genomes 
RNA Ribonucleic acid 
SNP Single-nucleotide polymorphism 
SNV Single-nucleotide variant 
T Thymine 
TCGA The Cancer Genome Atlas 
TF Transcription factor 
TFBS Transcription factor binding site 
TLS Translesion synthesis 
TSG Tumour suppressor gene 
TSS Transcription start site 
Tyr Tyrosine 
U Uracil 
UCEC Uterine Corpus Endometrial Carcinoma 
UV Ultraviolet (light) 
WGS Whole genome sequencing 
WXS Whole exome sequencing 
Y Ambiguous base code for a pyrimidine, i.e., C or T 
1 INTRODUCTION 
For as long as there has been life, there has been evolution – the change across 
generations of heritable characteristics. Traits that are beneficial to the 
organism confer a selective advantage and are more likely to be passed on to 
the next generation. While evident in all life, we can observe this in real time 
most easily today in prokaryotes, where rapid generation cycles allow us to see 
the evolution of traits such as antibiotic resistance (1). In our early evolutionary 
history as single-cell organisms, positively selected traits would have included 
rate of proliferation, efficient metabolism, and countless others. As time 
progressed, multicellular life arose, and with it a stronger emphasis on the 
selective advantage of cooperation between cells, such as nutrient sharing and 
signalling. Today, humans and other highly complex organisms are composed
of trillions of highly specialised cells (2), organised in different tissue types
and organs. Selective pressure at the organism level places different
requirements on individual cells in the body than those that single-cell
organisms face.
And yet, the selective pressure that acts on single cells and started our journey 
toward becoming complex multicellular organisms never went away. While 
the cellular traits that are selected for in the human population emphasise 
cooperation and organisation, individual cells face very different selective 
pressures on the timescale of cellular generations. If a cell attains a trait that 
allows it to grow and divide faster than its neighbours, it may outcompete them 
and form a mass of descendant cells with uncontrolled growth (a tumour), if 
not hindered. Further selection of competitive advantages can result in the 
spread to other tissues through metastasis, eventually disrupting the 
functioning of the organism to the point of severe disease or death. This is 
cancer – a disease that is the product of positive selection in cell populations at
the cost of the organism as a whole. 
If the force behind cancer is the selective pressure that is simultaneously active 
in all of our trillions of cells, how can complex organisms exist in the first 
place? The answer is that we have evolved highly efficient defences against 
tumorigenesis. Cell proliferation is tightly regulated, and only allowed at the 
appropriate time. There are checkpoints in the cell cycle where cells with 
damaged DNA are forced to stop dividing, or even undergo apoptosis – 
controlled cell death (3). However, since cell proliferation is required for us to 
function, the cell cycle becomes a balancing act between allowing growth 
when needed and preventing tumorigenesis. Changes in cells that disrupt that 
balance in favour of tumorigenesis are what lead to cancer. 
1.1 THE HALLMARKS OF CANCER 
There are many changes in a cell that are positively selected for and can 
contribute to tumorigenesis, but typically most can be categorised according to 
what advantage they grant – or viewed another way, what anti-cancer defence 
they help the cell overcome. In 2000, Hanahan and Weinberg published a
seminal review article describing six categories of biological capabilities (4),
later updated to ten (5), that are common in cancer cells (Figure 1). These traits
are typically attained through mutations, changes in the DNA sequence of the
cell that can result in altered phenotypes.
In normal tissues, cell proliferation is controlled through modulation of 
growth-promoting signals. To attain the uncontrolled cell growth of cancer, 
sustaining proliferative signalling is critical. Bypassing the dependence on 
outer signals can be done in several ways. For instance, extra growth receptors 
can be expressed to achieve a stronger response to normal signalling (6), or the 
cell may cut out the middleman and produce its own growth signals (7). 
Alternatively, signalling can be sidestepped, at least in part, by constitutively 
activating the proteins downstream of the receptor (8). However, self-
sufficiency in growth signals is in itself insufficient, as negative regulation of 
growth needs to be handled by evading growth suppressors, perhaps most 
commonly simply by inactivating them. Additionally, continuous growth 
requires nourishment, necessitating sustained angiogenesis (formation of new 
blood and lymphatic vessels) to provide oxygen and nutrients, as well as 
remove waste products (9). 
When rampant cell growth or severe damage is detected in a cell, the most
drastic defensive response is to trigger apoptosis, or controlled cell death.
Resisting cell death is therefore another hallmark, achieved through various 
strategies, such as disrupting DNA damage sensors (10) or upregulating 
survival signals (11). Even with all these traits, unlimited proliferation is not 
possible without enabling replicative immortality. After a certain number of 
divisions, cells enter senescence, a state outside of the cell cycle with no further 
proliferation. Cells that manage to circumvent this eventually reach a state of 
crisis, due to the shortening of protective telomeres at the end of the 
chromosomes with each replication. This can be avoided by expressing 
telomerase, a DNA polymerase that lengthens telomeres, thereby staving off 
senescence and crisis-induced cell death (12). 
A tumour that does not spread to other tissues is called benign. It is not until it 
has accomplished tissue invasion and metastasis that we call it cancerous. 
Eventually, most cancers metastasise, and the resulting invasion of other 
tissues is what leads to the vast majority of cancer deaths (4). 
These six original hallmarks of cancer have since been joined by the 
deregulation of cellular energetics, which is an alteration of the metabolism 
of the cell to provide enough energy for sustained growth, and avoidance of 
immune destruction, by avoiding or hindering the response of the immune 
system to cancer cells. Finally, two hallmarks that facilitate attaining the others 
already listed are genome instability, where increased genomic alterations 
provide the potential for acquiring cancer-related traits, and tumour-
promoting inflammation, where the immune system can paradoxically help 
tumorigenesis by supplying growth, survival, and angiogenic factors. 
 
Figure 1. The Hallmarks of Cancer (5). Adapted from “Hallmarks of Cancer: Circle”, by 
BioRender.com (2021). Retrieved from https://app.biorender.com/biorender-templates 
1.2 MUTATIONS IN CANCER 
The traits of cancer cells may be acquired in different ways. Some come from
structural variations – genomic alterations involving segments of DNA larger
than 1 kb (13). Examples include copy number variants, such as the gain of
multiple copies of DNA segments with oncogenic function, or the deletion of
segments with protective roles (14). Viral integration of DNA can also lead to
cancer: viruses are implicated to some extent in 15–20 % of cancers, most
notably the human papilloma virus, which is detectable in nearly all cervical
cancers (15).
In this thesis, the focus will be on single nucleotide variants (SNVs), where 
one nucleotide is exchanged for another. The effects of these mutations differ 
depending on the affected position in the genome. In protein-coding sequences, 
SNVs may alter protein function (Figure 2). As each subsequent triplet of 
nucleotides (or codon) in the protein-coding sequence of a gene encodes a 
certain amino acid, mutations that change the codon can alter what amino acid 
is incorporated during protein translation. SNVs that result in a different amino 
acid being encoded are called missense mutations, and they can dramatically 
alter the function of a protein if they occur in a critical position. SNVs may 
also introduce stop codons, which bring about the premature end of translation, 
often resulting in a non-functional protein. These are called nonsense 
mutations. Finally, an SNV can occur without a change in the encoded amino 
acid due to the degeneracy of the genetic code, resulting in a synonymous 
mutation. 
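The three outcomes described above can be illustrated with a short Python sketch. It assumes the standard genetic code and uses a hypothetical helper, classify_snv, that is not part of the thesis. The example codons are chosen purely for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional compact T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(codon: str, position: int, alt: str) -> str:
    """Classify a single-base change in a codon (hypothetical helper).

    codon    -- reference codon on the coding strand, e.g. "TAT"
    position -- 0-based position within the codon that is substituted
    alt      -- the substituted base
    """
    if alt == codon[position]:
        raise ValueError("alt must differ from the reference base")
    mutated = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # a new stop codon is introduced
    if ref_aa == "*":
        return "stop-lost"  # outside the three classes discussed in the text
    return "missense"


print(classify_snv("TAT", 1, "G"))  # Tyr -> Cys: missense
print(classify_snv("TAT", 2, "A"))  # Tyr -> stop: nonsense
print(classify_snv("TAT", 2, "C"))  # Tyr -> Tyr: synonymous
```

The sketch works on a single codon only. Annotating real variants would additionally require gene models, strand information, and handling of splice sites, none of which is attempted here.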
Figure 2. SNVs in coding sequences can have different effects on the encoded protein depending 
on the original and mutated codon. Created with BioRender.com.
Another class of mutation that commonly disrupts protein translation is the
indel, where an insertion or a deletion of one or several nucleotides in a
coding sequence can change which nucleotides belong to which codons in the
rest of the gene. These “frameshift” mutations normally result in a completely
non-functional protein.
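The effect of an indel on the reading frame can be seen in a minimal Python sketch. The sequence is made up, and codons is a hypothetical helper that simply splits a coding sequence into triplets.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


reference = "ATGGCCTACGGATAA"                   # made-up coding sequence
deleted_one = reference[:4] + reference[5:]     # delete a single base
deleted_three = reference[:3] + reference[6:]   # delete one whole codon

print(codons(reference))      # ['ATG', 'GCC', 'TAC', 'GGA', 'TAA']
print(codons(deleted_one))    # ['ATG', 'GCT', 'ACG', 'GAT']
print(codons(deleted_three))  # ['ATG', 'TAC', 'GGA', 'TAA']
```

Deleting a single base changes every downstream codon and, in this example, removes the stop codon from the frame, whereas deleting three bases removes one codon and leaves the rest of the frame intact.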
Cancer-related mutations can occur in either germline or somatic cells. 
Germline mutations occur in reproductive cells and can therefore be passed on 
to offspring, leading to hereditary cancer risk. Examples include inherited 
mutations in the BRCA1 and BRCA2 genes, which predispose to breast cancer 
(16). Essentially, a cancer-associated germline mutation can lower the bar for 
tumorigenesis by providing every cell in the body with an unfortunate head 
start. Mutations in somatic cells, by contrast, are not inherited. Through 
positive selection, cells with cancer-associated somatic mutations proliferate 
more than normal, resulting in a larger pool of cells where another beneficial 
somatic mutation can set off another wave of clonal expansion. In most 
cancers, tumour cells descend from a single cell with a cancer-associated 
somatic mutation (17). 
1.2.1 ONCOGENES AND TUMOUR SUPPRESSORS 
The kinds of somatic coding mutations that are selected for in tumour cells 
depend on the gene’s role in tumorigenesis. Genes that contribute to cellular 
growth are called oncogenes, and those that prevent it, thereby guarding 
against tumorigenesis, are called tumour suppressor genes (TSGs). Strictly 
speaking, an oncogene is the mutated form of a proto-oncogene, where the 
latter is the normal variant that merely has the potential to become oncogenic 
after attaining a tumorigenic mutation. 
The mutations that are selected for in a TSG tend to disrupt protein
function, since positive selection favours disabling or impairing the role of
the protein. Nonsense and frameshift mutations are therefore common, because
they often result in completely non-functional proteins. The most frequently
mutated TSG is TP53 (18), dubbed “the guardian of the genome”, which is
mutated in more than 50 % of cancers (19). p53, the protein encoded by TP53,
is heavily involved in arresting growth
and inducing senescence or apoptosis in cells that exhibit DNA damage, 
shortened telomeres, or excessive activation of growth pathways, to name a 
few of its functions (20, 21). 
Since oncogenes contribute to cancer growth, their mutations tend to activate 
them in some way. This can happen through mutations that reduce sensitivity 
to negative feedback, or that enhance the function or expression of the protein. 
The RAS genes are among the most important oncogenes in cancer, with
approximately 30 % of tumours containing some kind of Ras-activating
mutation (22). Various forms of Ras take part in signalling pathways that
induce proliferation, such as the PI3K and Ras-Raf-MAPK pathways, where
they propagate the signalling cascade by activating downstream proteins
(23). Several sites in Ras proteins are mutational hotspots in different cancer
types, because mutations at these sites render Ras constitutively active,
causing constant activation of downstream proteins (8).
While the categorisation of genes as oncogenes or TSGs can be helpful, genes 
can have both roles at once, as exemplified by TP53 (24). Some missense 
mutations in p53 not only hinder its tumour-suppressing capabilities, but also 
give it oncogenic function by allowing it to bind to and hinder its tumour-
suppressing homologues p63 and p73, thereby promoting tumour invasion and 
metastasis (25). 
1.2.2 NON-CODING MUTATIONS 
Mutations that contribute to cancer are not limited to DNA sequences that 
encode proteins. In 2013, two seminal papers demonstrated that point 
mutations in the promoter region of the TERT gene formed new transcription 
factor binding sites (TFBSs), thereby increasing its expression (26, 27) (Figure 
3). TERT encodes the reverse transcriptase subunit of telomerase, meaning its 
increased expression helps enable replicative immortality, as discussed in 
section 1.1. This discovery started a search for additional non-coding
driver mutations, yet despite the promising start with the TERT mutations, few
other cancer-driving non-coding mutations have been found (28).
Since expression level changes of cancer genes are often seen in cancer, there 
are many elements in non-coding DNA that could plausibly host tumorigenic 
mutations, such as enhancers (29) and regulatory RNAs (30, 31). Some 
technical aspects could partially explain why so few non-coding cancer 
mutations in these elements have been found, such as the smaller amount of 
available whole-genome sequencing (WGS) data compared to whole-exome 
sequencing (WXS), and the reduced coverage often seen when sequencing 
regulatory regions (28, 32). On a biological level, the greater robustness of
regulatory sequences to point mutations, compared with coding sequences,
could provide another explanation (33).
While the relative lack of high-recurrence non-coding mutations is unlikely to 
change, the field is young compared to the analysis of coding sequences, and 
new discoveries are therefore still to be expected as new techniques and 
datasets become available (28). 
 
Figure 3. TERT promoter mutations create new binding sites for ETS transcription factors, 
increasing expression and thereby enabling telomere extension. Created with BioRender.com 
1.3 WHAT CAUSES MUTATIONS? 
The processes behind mutation formation differ greatly depending on the type 
of mutation. The causes of gene duplication, for instance, are not the same as 
those of point mutations, which are what we will focus on here. Mutations can 
be either induced, meaning they are caused by environmental factors, or 
spontaneous, i.e., resulting from normal or defective processes in the cell. 
Often, the initial mutagenic event is environmentally caused damage to the
DNA, while cellular processes convert that damage into the actual mutation
(34). Well-known examples of environmental mutagens include ultraviolet (UV)
light and tobacco smoke (35). Major internal sources of mutations are errors
during DNA replication and repair. These can arise simply through
incorporation of the wrong nucleotide, as the error rate of fully functional
DNA polymerases is not zero. Various factors can increase the error rate of 
DNA replication and repair, such as hereditary DNA repair defects and damage 
to the DNA due to external mutagens (34, 36). Other internal mutagenic factors 
include DNA damage from oxidative stress (37) and spontaneous nucleotide 
changes from chemical processes (38).  
1.3.1 MUTATIONS IN CUTANEOUS MELANOMA 
As an example of how several processes can be involved in mutagenesis, we 
will examine how UV-induced DNA damage causes mutations. UV-induced 
damage primarily causes C>T transitions and is the main source of mutations 
in cutaneous melanoma, a cancer type that is of particular interest in this thesis. 
Normally, pyrimidines base-pair with purines (C with G and T with A) on the 
opposite strand. When UV light hits the DNA molecule, the absorbed energy 
can lead to the formation of bonds between adjacent pyrimidines (C or T) on 
the same strand (Figure 4). The most notable photoproducts formed this way 
are cyclobutane pyrimidine dimers (CPDs) and 6-4 pyrimidine pyrimidones 
(6-4 PPs), with CPDs being the most numerous. These bulky lesions can be 
repaired by nucleotide excision repair (NER), where the damaged DNA is 
removed, and the resulting gap is filled using the complement strand as a 
template. However, if replication of the affected DNA is attempted before
repair is finished (or if NER is defective, as in patients with xeroderma
pigmentosum (39)), the replication fork can stall, potentially leading to lethal
double-strand breaks. To avoid this, the cell can attempt to continue replication
past the DNA damage using specialised DNA polymerases that perform translesion
synthesis (TLS). It is in this step that most UV-induced mutations arise,
as described by two different models (40) (Figure 5).
 
Figure 4. Regular base-paired bases, with CPD formation following UV exposure. Created with 
BioRender.com 
In the first model, an error-prone TLS polymerase inserts an adenine opposite 
the lesion, resulting in C>T mutations. Adenine appears to be the base most
frequently inserted by the error-prone polymerase, and for some polymerases it
is also the base from which the DNA strand is most easily extended opposite a
lesion. The second model relies on the fact that cytosine can spontaneously
deaminate into uracil, a process that is far more likely for cytosines within
CPDs than for regularly base-paired cytosines. Uracil is the RNA equivalent of
thymine and, like thymine, base-pairs with adenine rather than guanine. During
TLS, polymerase η correctly pairs an
adenine with the uracil, resulting in a C>T mutation after replication. Through 
the same process, CC>TT mutations can occur when a CPD is formed between 
two cytosines that both deaminate (40). 
Figure 5. UV-induced CPD formation can cause C>T mutations through error-prone TLS, or 
through spontaneous cytosine deamination followed by error-free TLS with polη. Created with 
BioRender.com 
Mutagenesis is further affected by the presence or absence of a G following a 
dipyrimidine with a C in the second position (i.e., YCG, where Y indicates a 
pyrimidine). CpG sites are highly methylated in the genome, and methylated
cytosines form CPDs more readily than non-methylated cytosines under exposure
to UVB (41, 42), the wavelength range of sunlight responsible for most of its
mutations (40). This is an example of how neighbouring bases can affect the
mutagenicity of a genomic site.
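The sequence contexts discussed here can be located with a few lines of Python. This is a minimal sketch with made-up input. The helper names are hypothetical, and only the given strand is inspected, so a complete scan would repeat the search on the reverse complement.

```python
PYRIMIDINES = {"C", "T"}


def dipyrimidine_cytosines(seq: str) -> list[int]:
    """0-based positions of cytosines adjacent to another pyrimidine on this strand."""
    hits = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        left = i > 0 and seq[i - 1] in PYRIMIDINES
        right = i + 1 < len(seq) and seq[i + 1] in PYRIMIDINES
        if left or right:
            hits.append(i)
    return hits


def ycg_cytosines(seq: str) -> list[int]:
    """0-based positions of the C in YCG trinucleotides (Y = C or T)."""
    return [
        i for i in range(1, len(seq) - 1)
        if seq[i - 1] in PYRIMIDINES and seq[i] == "C" and seq[i + 1] == "G"
    ]


seq = "AGTTCCGGACGA"  # made-up sequence containing the TTCCG motif
print(dipyrimidine_cytosines(seq))  # [4, 5]
print(ycg_cytosines(seq))           # [5]
```

In this example the cytosine at position 9 sits in a CpG but is preceded by a purine, so on this strand it is neither part of a dipyrimidine nor of a YCG trinucleotide.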
1.3.2 MUTATIONAL SIGNATURES 
The highly specific mutation types generated by UV light, along with the effect 
of neighbouring bases, form a kind of fingerprint or signature by which the
origin of the mutations can be discerned. While the UV signature tends to be 
quite dominant in cutaneous melanoma, other cancer types are more accurately 
characterised as mosaics of different mutational processes, presenting the 
problem of how to separate those processes in a cohort of tumours in which 
they may be active to different degrees. In 2012, Nik-Zainal et al. published a 
landmark paper proposing a method for extracting signatures using 
nonnegative matrix factorisation (NMF) and applied it to a cohort of 21 breast 
cancer genomes (43). In this method, mutations are classified according to their 
substitution type (C>A, C>G, C>T, T>A, T>C, or T>G) and the immediate 
neighbouring bases, for a total of 96 combinations. Through NMF, different 
trinucleotide signatures could be separated in the cohort, and the degree to 
which the causative mutational processes were active in the different tumours 
could be discerned. 
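The 96-category classification can be sketched in Python as follows. The sketch assumes the common convention of describing each mutation from the strand on which the mutated base is a pyrimidine. The channel labels, helper names, and example mutations are illustrative and are not taken from the cited work.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# All 96 channels, written as 5' base, [ref>alt], 3' base, e.g. "T[C>T]G".
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def channel(trinucleotide: str, alt: str) -> str:
    """Map a mutation to its pyrimidine-centred trinucleotide channel.

    trinucleotide -- reference bases at the mutated position and its two
                     neighbours, on the reference strand
    alt           -- the mutated base, on the same strand
    """
    if trinucleotide[1] in "AG":  # purine reference: describe the other strand
        trinucleotide = reverse_complement(trinucleotide)
        alt = reverse_complement(alt)
    five, ref, three = trinucleotide
    return f"{five}[{ref}>{alt}]{three}"


def catalogue(mutations) -> list[int]:
    """Count mutations per channel, in the fixed order of CHANNELS."""
    counts = Counter(channel(tri, alt) for tri, alt in mutations)
    return [counts[c] for c in CHANNELS]


# Made-up mutations for one tumour: (reference trinucleotide, mutated base).
tumour = [("TCG", "T"), ("CGA", "A"), ("TCC", "T"), ("ATA", "G")]
vector = catalogue(tumour)
print(len(CHANNELS), sum(vector))  # 96 4
```

Stacking such vectors column-wise for a cohort of N tumours gives a 96 × N count matrix, which NMF then approximates as the product of a non-negative signature matrix and a non-negative matrix of per-tumour exposures.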
Following up on this discovery, another paper was published applying the 
method to more genomes in different cancer types, establishing a catalogue of 
signatures that are present in tumours (44). In this catalogue, the dominant
signature in cutaneous melanoma (signature 7) matched previously known facts
about UV mutagenesis well, consisting almost exclusively of C>T mutations at
dipyrimidines (Figure 6). The preference for YCG trinucleotides is also
captured by the signature, albeit slightly obscured because the traditional
representation of the signatures is not normalised by the frequency with which
the trinucleotides occur in the genome. The CG dinucleotide is quite rare in
most of the human genome, because methylated cytosines are prone to
spontaneous deamination, which causes C>T mutations and has depleted CpGs over
evolutionary time (45). Promoter regions are a notable exception.
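The effect of normalising by trinucleotide frequency can be illustrated with a small Python sketch. The counts below are made up, and the helpers are hypothetical. They show only the principle of dividing mutation counts by the number of available sites per context, not the procedure used for any published signature.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pyrimidine_centred(tri: str) -> str:
    """Return the trinucleotide as read from the strand where the centre is C or T."""
    return tri if tri[1] in "CT" else tri.translate(COMPLEMENT)[::-1]


def trinucleotide_counts(seq: str) -> Counter:
    """Count pyrimidine-centred trinucleotides in a sequence (both strands folded)."""
    return Counter(
        pyrimidine_centred(seq[i:i + 3])
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= set("ACGT")  # skip N and other ambiguity codes
    )


def normalise(mutation_counts: dict, opportunities: Counter) -> dict:
    """Mutations per available site for each context, rescaled to sum to 1."""
    rates = {
        ctx: n / opportunities[ctx]
        for ctx, n in mutation_counts.items()
        if opportunities[ctx] > 0
    }
    total = sum(rates.values())
    if total == 0:
        return rates
    return {ctx: r / total for ctx, r in rates.items()}


# Made-up numbers: equal C>T counts, but TCG sites are ten times rarer than TCC.
c_to_t = {"TCC": 50, "TCG": 50}
sites = Counter({"TCC": 1000, "TCG": 100})
print(normalise(c_to_t, sites))
```

With raw counts the two contexts look equally mutated, but after normalisation the TCG context accounts for roughly 91 % of the per-site mutation rate, which is the kind of preference that an unnormalised representation obscures.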
While the link between signature 7 and UV radiation is quite clear, many 
signatures have unknown aetiology, as signature extraction does not inherently 
provide any information about the underlying process. Other signatures with 
known aetiology include signature 4, which is related to tobacco smoking 
(Figure 6). Tobacco-induced mutations occur through bulky adducts on the 
DNA molecule that, similar to UV-induced mutations, need to be repaired with 
NER or bypassed with TLS, but other differences cause the signatures to look 
completely different, with most of the signature 4 mutations being C>A instead 
of C>T (46). 
Since the original publication of the compendium of signatures, new signatures 
have been added, and some signatures have been split into multiple 
components. The UV-caused signature 7 belongs to the latter category, having 
been split into four signatures (7a-d) in the latest release of COSMIC. There is 
some uncertainty about the processes behind each of these signatures. 
Figure 6. Signatures associated with UV radiation and tobacco smoking for comparison. Each 
bar shows the percentage of single-base substitutions belonging to the given substitution type 
and trinucleotide context for the mutational process in the human genome. The signature data 
was downloaded from cancer.sanger.ac.uk/signatures (version 2). 
1.4 FINDING DRIVERS IN GENOMIC DATA 
Discovering new cancer genes through bioinformatic methods involves 
sifting through somatic mutation data from tumour-normal pairs looking for 
mutations that are under positive selection, commonly referred to as “drivers”. 
Most tumours contain a handful of known driver mutations, but these are vastly 
outnumbered by “passenger” mutations that do not contribute to cancer (47, 
48). Separating the drivers from the passengers is a big area of research in 
cancer genomics, with the goal of gaining a better understanding of cancer 
biology and ultimately finding new drug targets. 
The most basic approach in the search for cancer genes is to look at the 
recurrence of mutations in tumours. If a mutation is beneficial to a cancer cell, 
it stands to reason that it would be encountered more frequently in patients as 
a result of positive selection. Some driver mutations in the very strongest 
cancer genes are so recurrent that this approach is feasible without further 
complications. As previously mentioned, TP53 is mutated in more than 50 % 
of tumours, providing strong support for TP53 mutations as drivers. For most 
drivers, however, detailed mutational models are necessary to determine 
whether a mutation is under positive selection or not. This is not only so that 
weakly recurrent driver mutations may be found, but also to avoid false 
positives due to normal mutation rates that are underestimated and 
misinterpreted as a sign of positive selection (Figure 7). 
  
Figure 7. Accurate mutational models are required to tell whether a highly recurrent mutation 
is caused by normal mutational processes, or whether it is under positive selection. Created with 
BioRender.com. 
A decade ago, most tools aimed at finding cancer genes assumed a flat 
background mutation rate based on the average mutation rate in the cancer 
type, with only some adjustments for the relative frequencies of different 
mutation types (49). In a paper describing the new driver detection tool 
MutSigCV, Lawrence et al. showed that the lack of sophistication in the 
mutational models in use at the time was becoming a problem, with more and 
more false positives among the putative cancer genes reported by tools as 
cohort sizes grew (49). Many olfactory receptor genes, a group not particularly 
plausible as cancer genes to begin with, were reported as significantly mutated 
in cancer types where they were not even expressed. The problem with the 
models was attributed to genomic mutational heterogeneity, or how mutational 
processes induce mutations at varying rates across the genome. In MutSigCV, 
gene expression levels and replication timing were included in the mutational 
model to account for the variance in the background mutation rate, with 
excellent results. In the example of olfactory receptors, low expression levels 
and late replication timing explained why their mutation rates were higher than 
previously expected. Since the release of MutSigCV, many more 
sources of mutation rate variation have been characterised and included in the 
mutational models of driver detection tools. 
1.4.1 GENOMIC MUTATIONAL HETEROGENEITY 
The background mutation rate is affected by phenomena that are active at 
different scales, from the megabase level all the way down to single 
nucleotides (50, 51). We have already explored how different mutational 
processes leave characteristic imprints on the genome through their 
trinucleotide signatures. Even without extracting signatures from a tumour 
cohort, a simple trinucleotide-based mutational model goes a long way toward 
modelling the mutation rate of the active mutational processes on the single-
nucleotide scale. What it misses are effects related to longer sequences, and 
variations in different genomic regions, which we will cover here. 
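As a rough illustration of such a trinucleotide-based model, the sketch below sums per-context mutation rates over a region to obtain an expected mutation count. The rate table is hypothetical, only one strand is considered, and the function names are our own. It is a simplified sketch rather than the model used by any particular tool.

```python
from collections import Counter


def trinucleotide_counts(seq):
    """Count every overlapping trinucleotide in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def expected_mutations(seq, rate_per_context):
    """Sum per-context rates over all trinucleotides in the region.
    Contexts missing from the rate table contribute zero."""
    counts = trinucleotide_counts(seq)
    return sum(n * rate_per_context.get(ctx, 0.0) for ctx, n in counts.items())


# Hypothetical per-site rates for the middle base of each context.
toy_rates = {"TCC": 0.02, "TCG": 0.05, "CCG": 0.03}
region = "CTTCCGGT"
expected = expected_mutations(region, toy_rates)
```

Because the model only looks at three bases at a time, two regions with the same trinucleotide composition receive the same expectation regardless of longer motifs or chromatin context.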
CHROMATIN 
To fit the DNA molecule inside a cell, compact packing without entanglement 
is required. To accomplish this, DNA is wound nearly twice around octamers of 
histone proteins. Each such DNA-histone complex is called a nucleosome 
(technically a nucleosome core), and they are separated from each other by 
stretches of unwound DNA. This first loose level of packing is called 
euchromatin. The nucleosomes also allow for tighter packing of DNA, known 
as heterochromatin. Which regions of DNA are in different forms of chromatin 
can change depending on when access is required, for instance during 
replication and for gene expression. 
The chromatin structure of DNA has an effect on mutational heterogeneity on 
both small and large scales, and it interacts in different ways with various 
mutational processes. On the megabase scale, heterochromatin accumulates 
more mutations than more loosely packed DNA (52), largely due to restricted 
accessibility for DNA repair mechanisms (53) (Figure 8). The same concept 
holds true on the scale of individual nucleosomes, where the DNA linking 
DNA-histone complexes together is more easily accessible to repair than the 
DNA wrapped around the histones (54). On the other hand, some mutational 
processes are also hampered by tightly packed DNA. Spontaneous 
deamination of methylated cytosines is reduced in nucleosomes, leading to 
fewer C>T mutations caused by this process (55). While the overall mutation 
rate is still generally increased in tightly packed DNA, it is important to 
remember that the mutational processes active in our genomes are affected in 
different ways by features such as chromatin structure. 
Figure 8. The effect of chromatin structures on the background mutation rate at different scales. 
Created with BioRender.com 
Even within the DNA wound around the histone proteins in a nucleosome, 
there is mutational heterogeneity. This is related to whether the major or the 
minor groove of the DNA molecule is positioned outward, resulting in a 
periodicity in mutation rate due to the way the DNA is wound around the 
histones. UV-induced CPD formation, which is otherwise unaffected by 
chromatin state, has been shown to correlate with this periodicity in a way that 
cannot be explained by sequence context, and DNA repair accessibility is also 
affected (56). 
EXPRESSION AND REPLICATION TIMING 
The level of gene expression is anticorrelated with mutation rate. This is 
strongly related to the aforementioned chromatin effects, as high expression 
requires loose packing and vice versa. There are also expression-related repair 
mechanisms at play, resulting in lower mutation rates in exons (57), and 
particularly on the transcribed strand (58, 59). The strand bias is not only 
caused by differential repair, as damage formation has been observed to 
increase on the coding strand (59, 60). 
Replication timing is similarly related to chromatin state, with early-replicating 
regions receiving fewer mutations than late-replicating regions. This has been 
attributed to differential mismatch repair (MMR), where early-replicating 
euchromatin regions are repaired more effectively (61). Another possible 
contributing factor is the depletion of the nucleotide pool towards the end of 
replication, which could result in increased mutation rates as vulnerable single-
strand DNA is exposed for longer as the replication fork slows or stalls (62). 
Replication-related strand bias has also been reported in mutational signatures 
both related and unrelated to DNA repair (60, 63, 64). 
TRANSCRIPTION FACTOR BINDING SITES 
Several studies have shown that mutation rates are increased at transcription 
factor binding sites (TFBSs) in some cancer types, most notably in melanoma 
(65-67). The main cause for this appears to be bound TFs blocking access for 
NER. This is supported by the observations that the effect is seen in active, 
but not inactive, TFBSs, indicating that TF binding is required, and that 
xeroderma pigmentosum patients, who have deficient NER, do not show the 
same pattern. Furthermore, NER maps show reduced repair at bound TFBSs 
following UV damage formation (65, 66). The fact that the phenomenon is 
most notable in melanoma is consistent with the extensive use of NER to repair 
UV-induced damage. Similarly, the bulky adducts caused by tobacco smoking, 
also repaired through NER, explain why the mutation rate at TFBSs is also 
increased in lung cancer (66). 
Differential DNA damage formation has also been reported at occupied TFBSs 
(56, 68, 69). Unlike NER, where TF binding mostly blocks access, the effect 
of a bound TF appears to increase or decrease damage formation in a manner 
that is dependent on both the mutagen and the TF. For instance, UV-induced 
CPD formation at active TFBSs is reduced in some groups of TFs and 
increased in others. 
TTCCG-RELATED PROMOTER MUTATIONS IN MELANOMA 
In cutaneous melanoma, there are a number of promoters with highly recurrent 
mutations at specific sites. Mutations in the TERT promoter are confirmed 
drivers as discussed in section 1.2.2, but the other recurrent promoter mutations 
do not appear to be under positive selection. Previous explanations for this 
phenomenon tend to focus on differential DNA repair due to TF binding (65), 
as described above. 
In 2017, Fredriksson et al. published a paper highlighting the fact that very 
nearly all of the recurrent promoter mutations in melanoma, except the TERT 
mutations, occurred in or immediately upstream of the sequence TTCCG (70) 
(Table 1), a motif matching the consensus binding sequence of the ETS family 
of transcription factors (71). Interestingly, the TTCCG-related mutations 
occurred only in UV-related cancers and sun-exposed skin, suggesting UV 
mutagenesis rather than positive selection as the cause. In further support of 
this explanation, the number of hotspot mutations in each tumour correlated 
with that tumour’s mutation burden. The mutations could even be induced in 
cells following UV exposure. 
In light of the fact that both differential repair and damage formation have been 
observed after transcription factor binding (56, 65-69), either or both could be 
the cause of the recurrent mutations. Fredriksson et al. noted that UV-exposed 
tumours lacking global NER were still mutated in the hotspots, albeit to a lesser 
degree, suggesting that differential repair does not provide the full explanation. 
Increased UV damage formation seemed likely to contribute as well, which 
will be discussed at length in paper I. 
Table 1. Sequence context of melanoma promoter mutations recurrent in at least 5 tumours. 
Mutated base in bold, with TTCCG sequence highlighted. Adapted from Fredriksson et al. (70)  
Recurrence  Gene      Sequence context
11          RPL13A    TCCGGACATTCTTCCGGTTGG
10          TERT      CCCGACCCCTCCCGGGTCCCC
7           C16orf59  AGCCACGCCCCTTCCGGGAGG
7           TERT      GCCCAGCCCCCTCCGGGCCCT
5           ASXL2     CGCCCCCGCCCTTCCGGTCTC
5           PDCD11    CAAATCCCGCCCTTCCGATTC
5           FTH1      GAGCCCGCTCCTTCCGGTGGG
5           FTH1      CGAGCCCGCTCCTTCCGGTGG
5           FUBP3     CCGGCTTTCCCTTCCGCCGGA
5           ALYREF    CGCGTGAGGCCTTCCGGTGCC
5           RNF185    AAATTAACCTCTTCCGGTTGG
5           MRPS31    CCCGCCCTCTCTTCCGCTTCC
5           DPH3      AGGACTAGCCCTTCCGGCGCA
5           RPL18A    GAGGGCGGGTCTTCCGGTAGT
5           C16orf59  GAGCCACGCCCCTTCCGGGAG
5           DERL1     CGAAACTTCCCCTTCCGGCGA
5           TERT      CTCCCGGGTCCCCGGCCCAGC
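
Checking a sequence context for the TTCCG motif amounts to a simple string scan, sketched below for two of the sequences in Table 1. The helper names are hypothetical, and scanning the reverse complement as well is our own assumption about how one might handle motifs on the opposite strand. This is not the procedure used by Fredriksson et al.

```python
def reverse_complement(seq):
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def motif_positions(seq, motif="TTCCG"):
    """Return 0-based start positions of the motif on the given strand
    and, separately, of its reverse complement."""
    def find_all(pattern):
        return [i for i in range(len(seq) - len(pattern) + 1)
                if seq[i:i + len(pattern)] == pattern]
    return find_all(motif), find_all(reverse_complement(motif))


rpl13a = "TCCGGACATTCTTCCGGTTGG"  # RPL13A row of Table 1
tert = "CCCGACCCCTCCCGGGTCCCC"    # first TERT row of Table 1

rpl13a_hits = motif_positions(rpl13a)
tert_hits = motif_positions(tert)
```

The RPL13A context contains the motif while the TERT context does not, in line with the TERT mutations being the exception among the recurrent promoter mutations.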
 
1.4.2 DRIVER DETECTION METHODS 
Including genomic mutational heterogeneity in a mutational model allows for 
more accurate prediction of mutation rates, which is a requirement for cancer 
driver detection, as we have discussed previously. The kinds of heterogeneity 
that are incorporated vary between methods, and depend in part on the 
underlying approach to detecting positive selection. The approaches in use 
today rely on a few different concepts, which we will summarise here (Figure 
9).  
EXCESS MUTATIONS 
The most straightforward way of searching for driver mutations is by testing 
whether the number of mutations in a region exceeds expectations. A major 
difference between methods is how the background mutation rate is modelled 
(49, 72, 73). As recurrence-based approaches are sensitive to model flaws on 
both local and larger scales, incorrect mutation rate expectations could result 
in both false positives and false negatives. 
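To make the dependence on the background model concrete, the following sketch tests the same observed mutation count against two different background rates using a Poisson tail probability. The Poisson assumption, the function names and all numbers are ours and purely illustrative. The tools cited above use considerably more elaborate models.

```python
import math


def poisson_sf(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf = sum(math.exp(-expected) * expected ** k / math.factorial(k)
              for k in range(observed))
    return max(0.0, 1.0 - cdf)


def excess_mutation_pvalue(observed, region_length, n_tumours, rate_per_bp):
    """Tail probability of the observed count given a per-bp, per-tumour rate."""
    expected = region_length * n_tumours * rate_per_bp
    return poisson_sf(observed, expected)


# Hypothetical numbers: 12 mutations in a 1 kb region across 200 tumours.
p_flat = excess_mutation_pvalue(12, 1000, 200, 2e-5)   # flat, genome-wide rate
p_local = excess_mutation_pvalue(12, 1000, 200, 8e-5)  # higher local rate
```

Under the flat rate the region appears significantly mutated, whereas under a fourfold higher local rate the same count is unremarkable. An underestimated background rate therefore produces a false positive.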
dN/dS RATIO 
Both oncogenes and TSGs tend to have a higher proportion of nonsynonymous 
mutations than non-cancer genes. As discussed in section 1.2.1, oncogenes are 
often activated by missense mutations, while TSGs are made defunct through 
nonsense mutations. By analysing the ratio between nonsynonymous and 
synonymous mutations (dN/dS) in a gene, positively selected genes may be 
found (48). This approach assumes that most synonymous mutations are 
passengers. While synonymous mutations have been shown to contribute to 
cancer (74, 75), the dN/dS ratio still manages to detect positive selection of 
cancer genes (48). dN/dS methods require mutational models that accurately 
handle mutational probabilities within a gene, but the effects of megabase-
scale genomic mutational heterogeneity are reduced due to the fact that the 
count of synonymous mutations provides a local estimate of the mutation rate 
(76). Nevertheless, dNdScv, the most prominent dN/dS-based method, still 
takes large-scale genomic mutational heterogeneity into account to improve 
sensitivity (48). 
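In its simplest form, the ratio can be computed as sketched below, where each mutation count is normalised by the number of sites at which a mutation of that class could occur. This is a deliberately naive illustration with hypothetical numbers. Methods such as dNdScv replace the plain site counts with context-dependent mutational models.

```python
def dn_ds(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Ratio of nonsynonymous to synonymous mutations, each normalised by
    the number of sites where such a mutation could occur."""
    if n_syn == 0:
        raise ValueError("dN/dS is undefined without synonymous mutations")
    dn = n_nonsyn / nonsyn_sites
    ds = n_syn / syn_sites
    return dn / ds


# Hypothetical gene with 3000 nonsynonymous and 1000 synonymous sites.
neutral = dn_ds(30, 10, 3000, 1000)   # same per-site rate in both classes
selected = dn_ds(90, 10, 3000, 1000)  # threefold excess of nonsynonymous
```

A ratio close to 1 is what would be expected without selection, while a ratio well above 1 suggests positive selection. Since the synonymous count comes from the same gene, regional differences in mutation rate largely cancel out.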
POSITIONAL CLUSTERING 
Positively selected mutations often occur in functionally relevant positions in 
a protein. For instance, a mutation in an active site is more likely to alter the 
function of a protein than one in another region, resulting in clustering of 
positively selected mutations (77). This can be exploited for driver detection, 
by searching for positional clustering of mutations in genes (78-83). The kinds 
of clustering detected by these methods can be divided into three groups (84). 
Linear clustering (78) is simply the distance in the primary structure of a 
protein, i.e., how many amino acid residues separate mutations. In domain 
clustering, specific regions such as SH2 and kinase domains are analysed for 
enrichment of mutations (79, 81). Finally, 3D clustering takes the tertiary 
structure of a protein into account, and tests for clustering of mutations in 3D, 
even if the mutations are far apart in the primary structure (80, 82). 
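Linear clustering can be illustrated with a simple permutation test, sketched below. The test statistic (mean pairwise distance), the uniform placement of simulated mutations and all names are our own simplifying assumptions. Published methods use more refined statistics and background models.

```python
import random


def mean_pairwise_distance(positions):
    """Mean absolute distance between all pairs of residue positions."""
    pairs = [(a, b) for i, a in enumerate(positions) for b in positions[i + 1:]]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def linear_clustering_pvalue(positions, protein_length, n_sim=2000, seed=0):
    """Fraction of uniform random placements that are at least as tightly
    clustered as the observed residue positions (needs >= 2 positions)."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(positions)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randint(1, protein_length) for _ in positions]
        if mean_pairwise_distance(simulated) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical mutated residues in a 500-residue protein.
clustered = [100, 101, 101, 102, 103, 104]
p_clustered = linear_clustering_pvalue(clustered, protein_length=500)
```

Placing simulated mutations uniformly ignores sequence-dependent mutation rates, so a real implementation would sample positions according to a mutational model instead.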
FUNCTIONAL IMPACT 
Functional impact-based methods work by evaluating the effect that a mutation 
has, and rely on the assumption that positively selected mutations tend to skew 
more towards disruptive effects than passengers do (85-87). The exact impact 
of a mutation is generally not known, but it can be estimated through various 
methods (88-90). A simple example in the case of a nonsynonymous mutation 
in a coding region would be comparing how similar the substituted amino acid 
is in terms of polarity to the previous one, where a big change would constitute 
a larger functional impact. 
OncodriveFML is an example of a notable functional impact method that 
works on both coding and non-coding sequences, as long as a suitable 
functional impact score is provided (91). In this method, the observed average 
functional impact of mutations in a region is compared to the expected value, 
provided by mutation simulations. By simulating the same number of 
mutations as observed in the studied region and cohort of tumours, the test 
eliminates recurrence signals and provides an orthogonal approach to driver 
detection. While still vulnerable to flaws in the mutational model pertaining to 
individual positions in the region of interest, this approach elegantly sidesteps 
the issue of genomic mutational heterogeneity on larger scales. 
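The simulation idea can be sketched as follows. This is not the OncodriveFML implementation. It is a minimal illustration in which the functional impact scores, the optional sampling weights and the function name are all hypothetical.

```python
import random


def functional_impact_pvalue(observed_scores, candidate_scores, weights=None,
                             n_sim=5000, seed=0):
    """Compare the observed mean impact with the means of equally many
    mutations sampled from all possible mutations in the region."""
    rng = random.Random(seed)
    k = len(observed_scores)
    observed_mean = sum(observed_scores) / k
    hits = 0
    for _ in range(n_sim):
        # weights would encode the mutational model (e.g. trinucleotide rates)
        sample = rng.choices(candidate_scores, weights=weights, k=k)
        if sum(sample) / k >= observed_mean:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical impact scores for every possible mutation in a region.
candidates = [0.1] * 80 + [0.9] * 20
p_high = functional_impact_pvalue([0.9] * 5, candidates)
p_low = functional_impact_pvalue([0.1] * 5, candidates)
```

Since each simulation draws exactly as many mutations as were observed, the number of mutations in the region does not influence the result. Only their relative placement among the possible positions matters.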
COMBINING METHODS 
The different approaches outlined above may be combined to attack the 
problem of driver detection from multiple angles at once. As an example, 
Dietlein et al. recently published a paper describing how MutPanning 
combines recurrence and functional impact by testing the likelihood of 
observing a given number of mutations in a region, and the likelihood that 
those mutations occur in their respective trinucleotide contexts (92). The 
trinucleotide context portion is an indirect proxy for functional impact, as 
functionally important positions are not likely to have trinucleotide contexts 
that match the most mutated contexts in the signatures of the mutational 
processes active in the tumour cohort. 
Frameworks have also been established where multiple tools are run on the 
same dataset (84, 93). By combining orthogonal approaches to driver 
detection, high-confidence compendiums of cancer genes may be generated. 
  
  
Figure 9. Some different approaches in cancer driver detection. Created with BioRender.com. 
1.4.3 DNA SEQUENCING TO MAP MUTATIONS 
All driver detection methods discussed here rely on the availability of somatic 
mutation data from tumours. Such data is generated by using high-throughput 
technologies to sequence tumour DNA and aligning the resulting short reads 
to the human genome. In order to filter out germline variants, non-tumour 
DNA is also sequenced, and variants that are present in the tumour but not in 
the normal sample are presented as the results of this somatic mutation calling. 
The process is complicated by the fact that tumour DNA may be of different 
purity, meaning the tumour sample will have some amount of non-tumour 
DNA as well, which has an effect on the expected number of sequencing reads 
that will contain a variant base. Another problem is the varying mappability of 
the genome, as mutations are more difficult to detect in repetitive regions. 
The end result of these complications is that somatic mutation calling is 
imperfect, with both missed somatic mutations and misclassified germline 
mutations being issues that have to be taken into consideration in downstream 
analyses. The latter can result in false positives in driver detection, and for that 
reason variants commonly occurring in the human population (SNPs, short for 
single-nucleotide polymorphisms) are often filtered out prior to driver analysis. 
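The effect of tumour purity on the expected number of variant reads can be sketched with the standard mixture formula below. It assumes a clonal mutation, known copy numbers and an otherwise unbiased sequencing process. The function and parameter names are hypothetical.

```python
def expected_vaf(purity, mutated_copies=1, tumour_copies=2, normal_copies=2):
    """Expected variant allele fraction of a clonal somatic mutation in a
    sample that mixes tumour and non-tumour cells."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    total = purity * tumour_copies + (1.0 - purity) * normal_copies
    return purity * mutated_copies / total


def expected_variant_reads(depth, purity, **copy_numbers):
    """Expected number of reads carrying the variant at a given depth."""
    return depth * expected_vaf(purity, **copy_numbers)


# A heterozygous mutation in a diploid region at 60 % purity and 50x depth.
reads = expected_variant_reads(50, 0.6)
```

At 60 % purity, a heterozygous mutation is expected in roughly 30 % of reads rather than 50 %, which has to be accounted for when separating true somatic variants from sequencing noise.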
The area of the genome covered when sequencing depends on the technique 
used. Most of the publicly available mutation data to date comes from WXS, 
where only the exome is sequenced. This is done by isolating exonic DNA, for 
instance through array-based capture, where single-stranded DNA from the 
target regions is used to bind exonic DNA before sequencing. As the coding 
part constitutes only 1 % of the full genome, this process is cheaper than 
sequencing the whole genome, and allows for deeper sequencing. As much of 
the search for cancer drivers is focused on protein-coding genes, WXS 
mutation calls are of great use. The availability of WXS somatic mutations is 
also excellent, in no small part due to The Cancer Genome Atlas (TCGA). This 
project has characterised a massive amount of genomic data, starting with 
glioblastoma in 2008 (94) and now covering 33 cancer types. 
WGS data presents an obvious advantage in that it allows for the analysis of 
non-coding regions as well, but the larger number of mutations covered can 
also help with modelling the background mutation rate. This is important for 
driver detection, as we have discussed at length. In particular, tumour-specific 
mutational models are much more feasible with the greater coverage, and 
therefore higher number of mutations, of WGS data. High-quality WGS 
somatic mutation calls have been generated by the Pan-Cancer Analysis of 
Whole Genomes (PCAWG) consortium (47), using WGS data from both 
TCGA and the International Cancer Genome Consortium (ICGC) (95). As the 
cost of sequencing goes down, WGS is becoming more and more common 
compared to WXS. 
1.4.4 DAMAGE AND REPAIR MAPS 
When attempting to characterise genomic mutational heterogeneity, somatic 
mutation data only provides part of the picture. As discussed in sections 1.4 and 
1.4.1, mutations in tumours are the end result of DNA damage, repair, and 
selection. Detangling the relative roles of each of these processes is aided by 
mapping of not only mutations, but of damage formation and repair. There are 
many different methods available to this end. 
DAMAGE MAPS 
Of the many different damage mapping techniques available, we will briefly 
touch on Excision-seq, Damage-seq/HS-Damage-seq, and CPD-seq. In 
Excision-seq, damaged bases are excised using a base excision repair enzyme 
(96). Given a large enough amount of damage, this will split the DNA molecule 
into small double-stranded fragments that can be sequenced and mapped. The 
damaged sites will then correspond to the base just before the mapped read. If 
the amount of DNA damage is insufficient for this approach, an alternative 
version exists, where excision repair is instead used to destroy damaged DNA, 
leading to only undamaged DNA remaining to be mapped to the genome. 
In Damage-seq (97), and the improved version called HS-Damage-seq (High 
Sensitivity) (69), the fact that bulky adducts block most DNA polymerases is 
utilised. In HS-Damage-seq, fragments containing bulky lesions are partially 
copied, with the copies ending at the damaged sites. Copies of non-damaged 
fragments are discarded, allowing for amplification of relevant fragments only. 
These are then mapped to the genome, with the damaged site corresponding to 
the position right before the read. 
The most important damage mapping technique for this thesis is CPD-seq (56), 
used to map the positions of CPDs. In this method, the ends of DNA fragments 
are blocked with specific primers, followed by cleavage of the CPD site with 
T4 endonuclease V and subsequent end repair with another endonuclease. This 
leaves an unblocked end to which another primer can be ligated, and because 
of the initial blocking step, all such unblocked ends will be at CPD positions. 
Through amplification, sequencing, and mapping, the positions of CPDs can 
then be determined. As in the other techniques, the position just before the 
mapped read corresponds to the damaged site. 
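The step from a mapped read to a damage position can be sketched as below, assuming 0-based, half-open read coordinates on the reference. The exact offset, and which read of a pair carries the information, depend on the protocol and are not specified here, so the function is a hypothetical illustration only.

```python
def damaged_position(read_start, read_end, strand):
    """Given 0-based, half-open read coordinates on the reference, return
    the 0-based reference position immediately 5' of the read."""
    if strand == "+":
        return read_start - 1
    if strand == "-":
        # On the minus strand, 5' of the read lies at higher coordinates.
        return read_end
    raise ValueError("strand must be '+' or '-'")


plus_site = damaged_position(100, 150, "+")
minus_site = damaged_position(100, 150, "-")
```

For a dinucleotide lesion such as a CPD, the position returned here would mark one of the two affected bases, with the other one adjacent to it.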
REPAIR MAPS 
NER can be mapped using excision repair sequencing (XR-seq) (98, 99). In 
this technique, excised DNA fragments containing bulky lesions are captured 
using immunoprecipitation, typically with antibodies that only bind to one type 
of damage, such as CPDs. Adapters are added to the fragments, after which the 
damage is either reversed or bypassed with TLS. For instance, CPDs can be 
separated into regular pyrimidines using photolyase. After amplification of the 
resulting undamaged DNA, the fragments can be sequenced and mapped to the 
genome in order to find where NER is active and to what extent. This technique 
has been used extensively to study differential repair of UV damage (65-67), 
as discussed in section 1.4.1. While highly useful, XR-seq does not show the 
exact position of the damage being repaired. 
An indirect way of mapping repair is to make damage maps at different time 
points. By calculating the difference in remaining damage between two time 
points, the level of repair in that time can be inferred (68). One of the main 
benefits of this approach is that it captures damage as well as repair. If only 
repair mapping is performed, it is not possible to tell whether high repair 
activity is caused by large amounts of DNA damage or because of high 
accessibility for DNA repair, for instance. Another benefit is that nucleotide-
resolution repair maps are possible provided that the chosen damage mapping 
technique has that property. 
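Inferring repair from two damage maps can be sketched as a per-bin calculation, assuming that both maps have already been normalised for sequencing depth. The function name and the counts are hypothetical.

```python
def repair_fraction(damage_t0, damage_t1):
    """Fraction of the initial damage removed between two time points,
    per bin. Bins without initial damage yield None."""
    result = []
    for d0, d1 in zip(damage_t0, damage_t1):
        result.append(None if d0 == 0 else (d0 - d1) / d0)
    return result


# Hypothetical normalised damage counts in three bins.
repair = repair_fraction([100, 80, 0], [25, 72, 0])
```

Expressing repair relative to the initial damage in each bin separates efficient repair from high initial damage, which a repair map alone cannot do.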

2 AIM 
The aim of this thesis is to investigate genomic mutational heterogeneity, 
particularly in cutaneous melanoma, with a view to improving the mutational 
models required for driver detection in cancer. Commonly used trinucleotide 
models fail to capture effects related to longer sequence contexts, and do not 
account for mutational signature variability in different genomic regions – 
oversights that can result in both false positives and false negatives in the 
search for cancer genes. In three papers, we address some of these problems 
by: 
I. Investigating the cause of recurrent promoter mutations 
in UV-induced melanoma. 
 
II. Studying the sequence properties and epigenetic factors 
that contribute to variable levels of CPD formation, and 
its effect on the mutational signature of UV light. 
 
III. Developing a recurrence-independent method 
demonstrating a novel concept for driver detection, which 
is less sensitive to some forms of mutational 
heterogeneity. 
 
 

3 RESULTS AND DISCUSSION 
3.1 INCREASED UV DAMAGE FORMATION 
UNDERLIES PROMOTER HOTSPOT 
MUTATIONS IN MELANOMA (PAPER I) 
As discussed in section 1.4.1, the majority of all recurrent mutations in 
cutaneous melanoma occur near TFBSs, specifically in or immediately 
upstream of the sequence context TTCCG, a sequence matching the binding 
motif of the ETS family of TFs. The high level of recurrence could easily be 
misinterpreted as positive selection, when in fact there are elements of genomic 
mutational heterogeneity at play. Several studies point to differential DNA 
repair due to TF binding being the cause of these mutations (65, 66), while 
Fredriksson et al. suggested that the main contributor might be increased UV 
damage formation (70), which has previously been shown to be modulated by 
protein binding (56, 68, 69, 100, 101). Following up on this, we set out to 
investigate the contributions of UV damage formation (specifically CPDs) and 
DNA repair in promoter mutation hotspots in melanoma. 
We started by characterising the hotspots in terms of mutations. For this, we 
used a cohort of 221 melanoma whole genomes, combining data from TCGA 
and ICGC (102, 103). With the exception of some notable driver genes, most 
highly recurrent mutations occurred near TFBSs, and the majority of those 
within 10 bp of the TTCCG context. The most recurrently mutated site was in 
the RPL13A promoter just upstream of a TTCCG motif, with mutations in 58 
tumours, outnumbering even the cancer-driving TERT promoter mutations (26, 
27). If the TERT mutations are excluded, the fraction of melanoma promoter 
mutations that are TTCCG-related increases with the recurrence of the 
mutations, illustrating the dominance of this phenomenon in promoters. 
Similarly, the number of mutated hotspots in each tumour was 
strongly correlated with that tumour’s mutation burden, replicating the results of 
Fredriksson et al. (70). This indicates that the hotspots are passenger mutations 
resulting from a mutational process, as opposed to driver mutations, since the 
selection of a mutation occurs irrespective of the overall burden in a tumour. 
We will revisit this concept in paper III, where it is applied to driver detection.  
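The reasoning can be illustrated with a toy comparison between a burden-dependent and a burden-independent mutation. The data below are invented for illustration, and the use of a plain Pearson correlation is our own simplification.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical tumours, ordered by total mutation burden.
burden = [5_000, 20_000, 60_000, 150_000]
hotspots_mutated = [0, 2, 5, 14]  # passenger-like: scales with burden
driver_present = [1, 1, 0, 1]     # driver-like: unrelated to burden

r_hotspots = pearson(burden, hotspots_mutated)
r_driver = pearson(burden, driver_present)
```

In this toy example the hotspot count tracks the burden closely while the presence of the driver does not, mirroring the argument made above.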
3.1.1 UV DAMAGE FORMATION 
In order to study the relationship between UV damage formation and the 
hotspot mutations, we generated a map of CPD damage across the genome, 
adapting the CPD-seq technique previously used by Mao et al. in yeast (56) to 
work with Illumina sequencing. This method yields the position of one CPD 
per read pair, requiring extensive sequencing to attain high coverage. By 
comparison, sequencing where each base in the read is informative (for 
instance for mutation calling) covers two orders of magnitude more positions 
per read pair. 
To generate the map, we irradiated A375 melanoma cells with UV light, 
immediately followed by CPD-seq to ensure that there would be no time for 
DNA repair before sequencing, thus isolating the damage formation 
phenomenon. The UV exposure was performed on both cellular and naked 
DNA, as the UV damage formation in the former is affected by protein binding, 
for which naked DNA acted as a control. To this we added a sample with no 
UV exposure as a control for the method itself. We mapped 200 million CPDs 
to the genome in the cellular DNA sample, creating the most extensive map of 
CPD damage to date. 
As even this large dataset did not provide quantitative CPD data for individual 
regions, all of the TTCCG-related hotspots mutated in at least 5 tumours were 
aggregated. By centring a window on the TTCCG motif for each hotspot, we 
could compare mutations and CPD formation in the hotspots (Figure 10). The 
expected mutational peaks were accompanied by similarly strong CPD peaks, 
but only in cellular DNA. Neither the UV-exposed naked DNA nor the non-
exposed control sample showed any increase in CPD formation at the hotspot 
sites. The fact that UV damage formation was increased only in cellular DNA 
indicates that the binding of TFs to the TFBSs increases the sensitivity to UV 
light at these sites, ultimately resulting in C>T mutations. The dominant role 
of this phenomenon in explaining the promoter hotspots is further cemented 
by the strong correlation between the recurrence of mutations and the 
amount of CPD formation. It is also supported by a paper by Mao et 
al. published the same year (104). 
Figure 10. Comparison of mutations in 221 melanoma genomes with CPD formation following 
UV exposure with and without bound proteins, as well as a control sample without UV exposure, 
in aggregated hotspot positions. 
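The aggregation step can be sketched as summing a per-position signal in windows centred on each motif occurrence. The sketch below assumes a single chromosome, ignores motif orientation, and uses hypothetical names and counts. It is not the analysis code of paper I.

```python
def aggregate_profile(signal_by_position, centres, flank=10):
    """Sum a per-position signal (e.g. CPD or mutation counts) in windows
    centred on each site, giving one profile of length 2 * flank + 1."""
    profile = [0] * (2 * flank + 1)
    for centre in centres:
        for offset in range(-flank, flank + 1):
            profile[offset + flank] += signal_by_position.get(centre + offset, 0)
    return profile


# Hypothetical sparse CPD counts keyed by genomic position.
cpd_counts = {99: 3, 100: 1, 499: 4, 510: 2}
profile = aggregate_profile(cpd_counts, centres=[100, 500], flank=2)
```

Pooling many sites in this way trades positional specificity for statistical power, which is what makes a sparse damage map interpretable at single-base resolution around the motif.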
3.1.2 THE ROLE OF REPAIR 
To directly test the role of DNA repair versus damage formation, we UV-
irradiated both A375 cells with functioning NER and fibroblasts with 
homozygous mutations in several NER-related proteins and looked for 
resulting mutations in the RPL13A hotspot. The UV dose had to be limited to 
only 20 J/m² (as compared to 1000 J/m² for the CPD map), as the repair-
deficient cells were sensitive to UV light, but even at this low exposure 
mutations at the hotspot site stood out after sequencing with the ultrasensitive 
SiMSen-Seq technique (105). Notably, mutations at the hotspot site were 
observed in all of the repair-deficient cell lines, arguing against differential 
DNA repair as the main explanation of the hotspots. 
As restricted access for NER to occupied TFBSs has been demonstrated to 
cause increased mutation rates (65-67), we decided to assess whether CPD 
formation also plays a part in this phenomenon. Therefore, we compared 
mutation rates around TFBSs in our melanoma cohort with CPD formation. 
While the same increased mutation rate observed by others was apparent, no 
great changes from expectations were evident in CPD formation. Instead, the 
TFBS-centred mutation peak matched well with impaired NER, mapped with 
XR-seq. Additionally, we observed a complete lack of increased mutation rate 
around TFBSs in cutaneous squamous cell carcinomas (cSCCs) deficient in 
global NER, indicating that NER is indeed responsible for these mutations. 
Interestingly, when we filtered the TFBSs to remove TTCCG-related 
promoters, there was a marked decrease in mutations in melanoma, but not in 
the NER-deficient cSCCs, suggesting that impaired NER contributes to 
mutations in TTCCG-related promoters as well. 
In conclusion, while impaired NER does indeed appear to be the source of most 
mutations around TFBSs, including in TTCCG-related promoters, the 
extraordinary recurrence of mutations at specific positions in and near the 
TTCCG motif is caused by increased UV damage susceptibility upon TF 
binding. This phenomenon illustrates the need for mutational models that 
accommodate longer sequence patterns than the normally used trinucleotides, 
as well as how seemingly promising putative drivers may just be passengers, 
misidentified because of an insufficient understanding of genomic mutational 
heterogeneity. 
3.2 VARIATION OF THE UV MUTATIONAL 
SIGNATURE IN THE GENOME (PAPER II) 
Trinucleotide signatures are widely used in modelling mutation rates for driver 
detection, but as demonstrated in paper I, they fail to capture effects related to 
extended sequences. The TTCCG motif that is central to the UV-induced 
melanoma hotspots is problematic not only because it is longer than 
trinucleotide models can accommodate, but also because it only affects 
mutation rates in active promoters. In this paper, we set out to study how the 
UV trinucleotide signature varies in different regions of the genome, as well 
as to propose an extended trinucleotide-based mutational model that 
incorporates longer sequences. 
3.2.1 TRINUCLEOTIDE SIGNATURE VARIATION 
The mutational profile of cutaneous melanomas is dominated by the UV-
associated signature 7, making it ideal for investigating variation across the 
genome, as no deconvolution of signatures is required. For this reason, we 
selected 130 melanoma whole genomes with a high fraction of UV-related 
mutations (>80 % C>T or CC>TT in dipyrimidine contexts, and at least 
10,000 mutations) from the cohort used in paper I. Unsurprisingly, the 
mutational profile of this dataset closely resembled the UV signature. To study 
signature variation across the genome, we utilised a ChromHMM model based 
on RoadMap epigenomic data (106, 107) dividing the genome into 15 
chromatin states, such as TSSs, transcribed regions, and heterochromatin. 
While the mutational signature in these regions was mostly highly similar to 
the UV signature, principal component analysis (PCA) separated regions 
related to active transcription start sites (TSS) from the others. These regions 
had lower cosine similarity to the UV signature, with the biggest difference 
being a strongly decreased number of mutations in the TCG trinucleotide 
context. While TCG is an uncommon trinucleotide in the genome due to 
methylation-mediated C>T mutations, as discussed in section 1.3.2, it has the 
highest weight in the UV signature. Previous observations confirm that 
promoters carry fewer TCG mutations, that this number varies with methylation 
level, and that the result is a lower mutation rate in these regions (66). To 
confirm the 
connection between lower methylation levels and reduced TCG mutation rates, 
we used bisulfite sequencing data (107) to calculate the average methylation 
level of the chromatin states (low in TSS-associated regions and high in the 
rest of the genome), as well as in annotated promoters. In both cases, there was 
a clear correlation between the level of methylation and the amount of TCG 
mutations, explaining why active TSSs have fewer TCG mutations than the 
rest of the genome, and the deviation from the UV signature. 
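The cosine similarity used above to compare regional signatures with the UV 
signature can be illustrated with a minimal sketch. The four-channel profiles 
below are hypothetical toy values chosen for illustration, not data from the 
thesis; a real trinucleotide signature would have 96 channels. 

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-negative weight vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a signature without any weight cannot be compared")
    return dot / (norm_a * norm_b)

# Hypothetical toy profiles. Channel order: TCG, TCC, CCC, other.
reference = [0.40, 0.30, 0.20, 0.10]   # stand-in for the UV signature
tss_like = [0.10, 0.40, 0.30, 0.20]    # TCG weight strongly reduced
bulk_like = [0.38, 0.31, 0.21, 0.10]   # close to the reference

sim_tss = cosine_similarity(reference, tss_like)
sim_bulk = cosine_similarity(reference, bulk_like)
```

In this toy case the profile with the depleted TCG channel scores a clearly 
lower similarity than the profile that tracks the reference, which is the kind 
of separation described for the TSS-related chromatin states. 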
One plausible explanation for the relationship between TCG mutations and 
methylation levels is the documented propensity for CPD formation at 
methylated cytosines upon exposure to UVB light (41, 42). To investigate this 
possible cause, we generated a new CPD map using UVB wavelengths, as the 
CPD dataset in paper I used UVC light. The UVC CPD dataset was also 
included for comparison. In order to compare the CPD data to the mutational 
signature, we defined a similar trinucleotide signature for CPDs, including the 
CPD dinucleotide and one additional base on the 3’ side. This allowed us to 
study methylated cytosines in CPDs, by including both the CPD and 
overlapping CpG’s (YCG). Repeating the methylation level analysis 
performed for mutational signatures, we observed that CPD formation at YCG 
trinucleotides was similarly correlated with high methylation, though only for 
UVB and not UVC light (Figure 11a). In further support of methylation-related 
CPD formation as the cause of the variation in the mutational signature, a 
comparison of promoter and non-promoter regions showed significantly lower 
YCG CPD formation in promoters, again only for UVB light (Figure 11b). 
While CCG mutations did not have decreased weight in the mutational 
signature in promoters, this could simply be because of low frequency and the 
use of relative signature weights. 
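The CPD trinucleotide context described above (the CPD dinucleotide plus one 
base on the 3' side) can be sketched as follows. This is an illustrative 
sketch only: the function names are hypothetical, only the given strand is 
scanned, and every dipyrimidine is treated as a potential CPD site, which are 
simplifying assumptions rather than details taken from paper II. 

```python
PYRIMIDINES = {"C", "T"}

def cpd_context(seq, i):
    """Return the dipyrimidine at i, i+1 plus the 3' base, or None if not a CPD site."""
    tri = seq[i:i + 3].upper()
    if len(tri) < 3:
        return None  # window runs off the end of the sequence
    if tri[0] not in PYRIMIDINES or tri[1] not in PYRIMIDINES:
        return None  # positions i and i+1 must both be pyrimidines
    return tri

def is_ycg(tri):
    """True for YCG contexts, where the 3' cytosine of the CPD is part of a CpG."""
    return tri is not None and tri[0] in PYRIMIDINES and tri[1:] == "CG"

def count_cpd_contexts(seq):
    """Count every CPD trinucleotide context on the given strand."""
    counts = {}
    for i in range(len(seq) - 2):
        tri = cpd_context(seq, i)
        if tri is not None:
            counts[tri] = counts.get(tri, 0) + 1
    return counts

# Hypothetical example sequence containing the TTCCG motif.
counts = count_cpd_contexts("ATCGTTCCGA")
ycg_total = sum(n for tri, n in counts.items() if is_ycg(tri))
```

Counting YCG contexts separately in this way is what allows CPD formation at 
potentially methylated cytosines to be compared between regions with different 
methylation levels. 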
Figure 11. The effect of methylation levels on the UV signature. (a) Methylated cytosines are 
more prone to CPD formation than unmethylated cytosines when exposed to UVB, but not UVC 
light. (b) The low level of methylation at CpG’s in TSS-related regions compared to the rest of 
the genome results in fewer CPDs forming in YCG contexts. This causes a significant decrease 
in the weight of C>T mutations in the TCG context in the UV mutational signature. 
3.2.2 EXTENDED UV SIGNATURE 
In an effort to model the effect of sequence contexts longer than trinucleotides 
on mutation rates in different regions, we extended the traditional trinucleotide 
signature model to include the presence or absence of selected pentamers 
within 10 bp on either side. For each chromatin state, pentamers frequently 
co-occurring with C>T mutations were selected, and the top-ranking pentamers 
from each region were included in a regression model together with the regular 
trinucleotides. Using this model, we found both stimulating and attenuating 
effects from proximal pentamers. Most relevant to this thesis, the TTCCG 
motif was found to increase the probability of mutations, but only in TSS-
related regions. When applying this model to the TTCCG-related hotspots in 
paper I, the expected mutation rates were considerably higher than predicted 
by traditional trinucleotide signatures, thereby better modelling the mutational 
heterogeneity observed in these regions. 
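The predictors of such an extended model can be sketched as below: the 
trinucleotide at a position plus one presence/absence indicator per selected 
pentamer. The function name, the requirement that the whole pentamer lies 
within the 10 bp flanks, and the single-strand matching are assumptions made 
for illustration; the regression step itself is not shown. 

```python
def pentamer_features(seq, pos, pentamers, flank=10):
    """Build predictors for one position: its trinucleotide and pentamer indicators.

    Assumption: a pentamer counts as present only if it lies entirely inside
    the window [pos - flank, pos + flank] on the given strand.
    """
    if pos < 1 or pos > len(seq) - 2:
        raise ValueError("position needs one flanking base on each side")
    start = max(0, pos - flank)
    end = min(len(seq), pos + flank + 1)
    window = seq[start:end].upper()
    features = {"trinucleotide": seq[pos - 1:pos + 2].upper()}
    for pentamer in pentamers:
        features[pentamer] = int(pentamer in window)
    return features

# Hypothetical sequence: the TTCCG motif flanked by 12 adenines on each side.
seq = "A" * 12 + "TTCCG" + "A" * 12
near_motif = pentamer_features(seq, 15, ["TTCCG", "GGGGG"])
far_from_motif = pentamer_features(seq, 2, ["TTCCG", "GGGGG"])
```

Each feature dictionary corresponds to one row of the design matrix; fitting 
one model per chromatin state would let the same pentamer have an effect in 
TSS-related regions and none elsewhere. 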
 
3.3 RECURRENCE-INDEPENDENT DRIVER 
DETECTION (PAPER III) 
In paper I, we argued that the correlation between the number of TTCCG 
hotspot mutations in each individual tumour and that tumour’s mutation 
burden indicated that the hotspots were passenger mutations. Essentially, if the 
probability of observing a mutation in a given region is directly proportional 
to the exposure of the mutational process that would cause it, then there is no 
evidence of positive selection. This is because positive selection only comes 
into play after a mutation has been formed, taking no heed of the process that 
brought it about. As such, a mutation that is beneficial to a tumour should 
generally be under the same degree of positive selection regardless of the 
overall mutation burden of the tumour, thereby disrupting the correlation 
between mutation burden and the occurrence of the mutation in tumours.  
In this paper, we set out to use this concept in the opposite way of how it was 
used in paper I. Instead of confirming passenger status upon observing the 
correlation, we developed a test that identifies cancer drivers by searching for 
disrupted correlation. In a cohort of patients, this would manifest as a higher-
than-expected portion of the mutations in a region occurring in low-burden 
tumours. There are several potential benefits to a driver test based on this 
principle. First of all, it is an orthogonal approach to driver detection compared 
to currently used methods, and as such could help identify cancer genes that 
are missed by other approaches. Secondly, an approach that bypasses 
recurrence effects like this could prove to be less sensitive to the false positives 
that plague driver tests due to inaccuracies in the mutational model, such as the 
TTCCG hotspots. 
3.3.1 IMPLEMENTATION OF THE METHOD 
Our method uses a simple trinucleotide-based mutational model that only takes 
SNVs into account, where the probability of observing a given mutation 
depends on the frequency of that mutation type in the genome. The model is 
entirely patient-specific when using WGS data, but for WXS data it is 
calculated based on the whole cohort and then scaled to each tumour using 
its mutation burden, due to the lower coverage of this data type. Using this 
model, the probability of observing one or more mutations in a region, 
typically a gene, can be calculated for each tumour. We then simulate 
thousands of cohorts with the same number of mutated tumours as observed in 
the region of interest in the real cohort, with the probability of each tumour 
being mutated determined by our model. By comparing the likelihood of the 
combination of mutated tumours in the real cohort to the simulated ones, we 
can determine whether it is likely to be the result of positive selection (Figure 
12). 
 
Figure 12. Positive selection test concept. The probability of a region (e.g., a gene) being 
mutated is calculated for each tumour in a cohort. This is used to determine the likelihood of 
different combinations of mutated tumours. The combination in the real cohort is compared to 
simulated cohorts with the same number of mutated tumours to test for positive selection. 
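The test concept can be sketched in a few lines. This is one plausible 
reading of the description, not the published implementation: the function 
names are hypothetical, the per-tumour probabilities are assumed to be 
independent and strictly between 0 and 1, cohorts with exactly the observed 
number of mutated tumours are drawn by exact conditional sampling, and the 
p-value is taken as the fraction of simulated cohorts that are at most as 
likely as the real one. 

```python
import math
import random

def log_likelihood(probs, mutated):
    """Log-likelihood of one specific set of mutated tumours."""
    return sum(math.log(p) if i in mutated else math.log(1.0 - p)
               for i, p in enumerate(probs))

def suffix_weights(odds, k):
    """table[j][m]: summed odds products over all size-m subsets of tumours j..n-1."""
    n = len(odds)
    table = [[0.0] * (k + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        table[j][0] = 1.0
    for j in range(n - 1, -1, -1):
        for m in range(1, k + 1):
            table[j][m] = table[j + 1][m] + odds[j] * table[j + 1][m - 1]
    return table

def sample_mutated_set(odds, table, k, rng):
    """Draw a set of exactly k tumours, conditional on the total being k."""
    chosen, remaining = set(), k
    for j in range(len(odds)):
        if remaining == 0:
            break
        p_include = odds[j] * table[j + 1][remaining - 1] / table[j][remaining]
        if rng.random() < p_include:
            chosen.add(j)
            remaining -= 1
    return chosen

def selection_p_value(probs, mutated, n_sim=2000, seed=0):
    """Fraction of simulated cohorts that are at most as likely as the observed one."""
    k = len(mutated)
    if k > len(probs):
        raise ValueError("more mutated tumours than tumours in the cohort")
    rng = random.Random(seed)
    odds = [p / (1.0 - p) for p in probs]
    table = suffix_weights(odds, k)
    observed = log_likelihood(probs, mutated)
    as_unlikely = 0
    for _ in range(n_sim):
        simulated = sample_mutated_set(odds, table, k, rng)
        if log_likelihood(probs, simulated) <= observed + 1e-12:
            as_unlikely += 1
    return (as_unlikely + 1) / (n_sim + 1)

# Hypothetical cohort of 40 tumours ordered by burden, with the probability
# of the region being mutated rising from 0.01 to 0.40.
probs = [0.01 * (i + 1) for i in range(40)]
p_low = selection_p_value(probs, {0, 1, 2, 3, 4})         # low-burden tumours
p_high = selection_p_value(probs, {35, 36, 37, 38, 39})   # high-burden tumours
```

With five mutations confined to the lowest-burden tumours the observed 
combination is less likely than essentially every simulated cohort, giving a 
small p-value, whereas the same number of mutations in the highest-burden 
tumours is the most likely combination and gives no signal. The number of 
mutated tumours is fixed throughout, which is why recurrence alone does not 
influence the result. 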
3.3.2 DETECTING DRIVERS IN MELANOMA 
To evaluate the method, we used somatic SNVs from 466 melanoma whole-
exome tumour genomes from TCGA. We tested all genes with mutations in at 
least 3 tumours, resulting in 6 significant genes at a false discovery rate (FDR) 
of 5 %. At the top of this list were BRAF and NRAS, both well-known drivers 
in melanoma, and mutated in a large portion of the cohort. The other significant 
genes were PTEN, GNAQ, MAP2K1, and KIT, all known cancer genes, with 
perhaps the most interesting result being GNAQ, with only 7 mutated tumours. 
This gene is a cancer driver in uveal melanoma and has been detected in 
melanoma subtypes that are not UV-related (102, 108, 109). Upon closer 
inspection, we found that the low-burden tumours with mutations in GNAQ 
had a much lower percentage of UV-typical mutations (C>T in dipyrimidine 
contexts) than most of the tumours. The same was true for KIT and the not 
quite significant SF3B1, both of which have also been reported in non-UV-
related melanomas. This result suggests that our method might be particularly 
suitable for finding cancer drivers in genes that are atypical of the cancer type 
being analysed. 
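Selecting significant genes at an FDR of 5 % requires a multiple-testing 
correction across all tested genes. The text does not state which procedure 
was used; the sketch below assumes the common Benjamini-Hochberg adjustment, 
and the gene labels and p-values are hypothetical. 

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values from a selection test.
genes = ["gene_A", "gene_B", "gene_C", "gene_D", "gene_E"]
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(p_values)
significant = [g for g, q in zip(genes, q_values) if q <= 0.05]
```

In this toy example only the first two genes pass the 5 % threshold, even 
though four of the five raw p-values are below 0.05. 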
To test the method’s sensitivity to flaws in the mutational model, we next 
evaluated mutations in promoter regions in the cohort of 221 melanoma whole 
genomes used in paper I. In methods that do not include the TTCCG hotspot 
effect in their models, for instance through an extended trinucleotide signature 
as discussed in paper II, TTCCG-related sites would likely fill the list of 
significant results with false positives, interspersed with real cancer drivers 
such as TERT promoter mutations. However, since TTCCG hotspot mutations 
correlate with UV-induced mutation burden, our method did not attribute 
positive selection to the TTCCG-related promoters despite not modelling the 
TTCCG phenomenon, instead only identifying the TERT promoter mutations 
as drivers. To illustrate the difference, we compared the TERT promoter with 
the most recurrently mutated TTCCG-related promoter, that of RPL13A, and 
noted that TERT promoter mutations skewed toward low-burden tumours in a 
way that RPL13A promoter mutations did not (Figure 13). As a representative 
of recurrence-based methods, ActiveDriverWGS (73) found both the TERT 
promoter and the TTCCG hotspots to be significant, with the RPL13A 
promoter edging out TERT as the most significant. This demonstrates how our 
method’s independence of recurrence reduces the impact of model flaws, 
avoiding problems with some types of genomic mutational heterogeneity. In 
theory, this should be applicable to any model flaws, provided that they affect 
the tumours in a cohort equally. 
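The skew toward low-burden tumours shown in Figure 13 can be explored with a 
sliding-window comparison of observed and expected mutated tumours. The 
sketch below is illustrative only: the function name is hypothetical, the 
expected count is assumed to be the total number of mutated tumours 
distributed in proportion to the model probabilities, and windows without any 
mutated tumour are reported as None rather than given a pseudocount. 

```python
import math

def sliding_log2_ratio(burdens, probs, mutated, window=75):
    """Log2(observed/expected) mutated tumours in windows of burden-sorted tumours."""
    order = sorted(range(len(burdens)), key=lambda i: burdens[i])
    k = len(mutated)
    total_p = sum(probs)
    ratios = []
    for start in range(len(order) - window + 1):
        idx = order[start:start + window]
        observed = sum(1 for i in idx if i in mutated)
        # Assumption: expected share of the k mutated tumours follows the model.
        expected = k * sum(probs[i] for i in idx) / total_p
        ratios.append(math.log2(observed / expected) if observed > 0 else None)
    return ratios

# Hypothetical cohort of 10 tumours, probability proportional to burden,
# with all mutations in the three lowest-burden tumours.
burdens = [100 * (i + 1) for i in range(10)]
probs = [b / 10000 for b in burdens]
ratios = sliding_log2_ratio(burdens, probs, mutated={0, 1, 2}, window=5)
```

A positive ratio in the low-burden windows that declines toward the 
high-burden end is the pattern described for the TERT promoter, while a 
passenger hotspot would be expected to stay near zero throughout. 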
3.3.3 DETECTING CANCER DRIVERS IN DIFFERENT 
CANCER TYPES 
After our initial tests in melanoma, we evaluated the method’s performance in 
additional cancer types using whole-exome tumour genomes from TCGA. Out 
of all cancer types with enough mutations for inclusion (≥ 1000 genes with at 
least 3 mutated tumours), endometrioid uterine corpus carcinoma (UCEC) 
performed best, with some 50 significant genes, most of which are canonical 
cancer genes (110) or identified in other cancer gene catalogues (48, 49, 92, 
93). This was attributed to the high mutational burden of UCEC, a trait shared 
by most cancer types that our method performed well with. Of all the genes 
identified in this test, several were promising putative driver genes not 
previously touted as such, including BMF (a pro-apoptotic protein) and DOK1 
(a negative regulator of the insulin signalling pathway) in UCEC, as well as 
DCLK1 (a marker for tuft cells in intestines) in stomach cancer. 
Figure 13. Log2 ratio of the number of observed vs expected mutated tumours in 75 tumour bins 
in a sliding window, for the RPL13A and TERT promoters in melanoma. The coloured area 
indicates the least extreme 90 % of the simulated cohorts. The skew of mutations toward low-
burden tumours in the TERT promoter is evidence of positive selection and is absent in the 
RPL13A promoter. 
In conclusion, our method introduces a novel concept for driver detection that 
is orthogonal to currently used approaches, allowing it to detect some drivers 
that are missed by other tools. The method’s uncoupling from recurrence 
signals shields it from false positives caused by some forms of poorly modelled 
genomic mutational heterogeneity, such as the TTCCG hotspots. As no model 
is ever perfect, this is a valuable property that should make our method a useful 
complement to other tools. 
4 CONCLUSIONS AND FUTURE 
PERSPECTIVES 
Searching for cancer drivers in genomic data generally entails finding regions 
that are more mutated than expected. The strongest drivers are easily 
discernible without advanced mutational models, but detailed modelling of 
genomic mutational heterogeneity is necessary to find more subtle drivers, as 
well as to avoid false positives. Mutational heterogeneity exists on all scales, 
from megabase-level effects all the way down to single nucleotides, and it 
varies between different mutational processes. In this thesis, we have 
endeavoured to characterise heterogeneity in cutaneous melanoma related to 
the variable formation rate of UV-induced DNA damage, specifically TTCCG-
related mutation hotspots in promoters and the variation of the UV mutational 
signature due to methylation levels. While these advances will undoubtedly be 
useful, they cover only a fraction of the heterogeneity that mutational models 
have to contend with. In an attempt to circumvent the problem of flawed 
models, we developed a driver detection test that is not affected by recurrence 
signals, instead only testing whether the combination of mutated tumours in a 
cohort is likely while ignoring the number of mutations. This approach proved 
resilient to poorly modelled mutational heterogeneity in the form of the 
TTCCG promoter hotspots and should be similarly useful for other 
heterogeneity that affects the entire cohort equally. 
While the new driver detection method showed great promise, there are several 
avenues for improvement that are worth exploring. Firstly, the mutational 
model was quite simplistic, and could be improved. The test is robust 
concerning model flaws that affect all tumours in a cohort equally, but it is 
sensitive to differences between tumours. Better modelling of the mutation 
rates in individual tumours could potentially be achieved with tumour-specific 
expression scaling, for instance, and would be expected to improve driver 
detection results. Secondly, many tools use multiple driver detection concepts 
synergistically, and we believe our method could fit well into such approaches. 
For instance, studying how the dN/dS ratio or the functional impact of 
mutations varies across a cohort of tumours could yield more information than 
just treating the cohort as one data point. 
In conclusion, handling genomic mutational heterogeneity is of great 
importance for the discovery of cancer drivers. As the amount of available 
genomic data increases, ever better models as well as improved driver 
detection methods will be required to find drivers without being inundated with 
false positives. In this regard, we believe we have contributed by adding some 
small pieces to the puzzle.

ACKNOWLEDGEMENTS 
First and foremost, I of course want to thank you, Erik, for letting me be a 
part of the lab for the past five years. When I first applied to do my PhD 
with you, I looked at pictures of the group on the lab website and noted how 
relaxed and pleasant the atmosphere seemed to be. That impression turned out 
to be entirely correct, and I don't think it is a coincidence that your group 
in particular turned out that way. I have learned a lot about bioinformatics, 
writing, presentations, figure design, and all sorts of things from you, but 
the very best thing has been your ability to inspire enthusiasm. A meeting 
with you has always been enough to make a sluggish project feel fun again, 
and that has been invaluable. Thank you! Thanks also to my co-supervisor 
Anders, not least for the collaboration on the polη paper. 
Secondly, a big thank you to all members of the Larsson Lab, past and present. 
Kerryn, you are the glue that holds the lab together, both socially and 
scientifically. You're both a good friend and something of a bonus supervisor. 
Thank you for all the good times, and for always being willing to give me input 
on anything, be it figures, scientific writing (including this thesis!), or dog 
names. Markus, you know exactly which way every door opens, and only 
occasionally mix up M&M. I am very glad to have shared the entire PhD journey 
with a friend like you. Whichever of us gets a new job first will have to 
point out the need for an extra bioinformatician, so that we can carry on 
together at the next workplace too. Ari, 
you have helped me all throughout my PhD, from the ISP on my very first day 
all the way to the tangled mess of forms and deadlines for the dissertation. This 
would have been a lot more stressful without you, and I’m very grateful for 
your help and support. Isabella, Vinod, and Tom, on the rare occasions when 
I actually show up at the lab in person these days, it’s always great to see you. 
There will be more of that now that the thesis writing is done, I think, and I’m 
looking forward to it! Arman, Alireza, and Emma, I’m very happy that you 
were part of the lab, and I hope to see you again soon. Babak, Swaraj, 
Susanne, Joakim, and Jimmy, you were there when my PhD journey started, 
and it wouldn’t have been the same without you. Finally, Katrin, you are an 
honorary Larsson Lab member at this point, and a very welcome addition to 
all pre-pandemic lunches along with the rest of the Clausen lab. I could go on 
much more about everyone mentioned here, but in the interest of keeping this 
somewhat brief: thank you all for making the lab such a welcoming and fun 
place to work. Better lab mates I could not ask for. 
Anna and Erik (Kristiansson), thank you for the master's thesis project that 
let me try bioinformatics. It suited me so much better than everything else 
in my degree programme, and you were fantastic supervisors. It is thanks to 
you that I was able to continue with a PhD position! 
Mum and Dad, thank you for all your support during the PhD period and the 27 
years before it. You have always had time to talk when I have needed it, 
whether it concerns which coffee maker I should buy or the somewhat bigger 
life decisions. Thanks also to the rest of the family (Jonas, Karin, Anne, 
Åke, August, Astrid, and Grandma) for being there and for being the best 
family one could wish for. 
Louise and Anders, thank you for taking care of Rufus even more than usual 
during the thesis writing. It has made things enormously easier! Rufus, thank 
you for being a good dog. Sit. 
Last, but not least, Antonia. Thank you for your patience over the past 
months when I have worked evenings and weekends on the thesis and papers. 
Thank you for all the encouragement when things have been hard, not only 
ahead of the defence but throughout the whole PhD period. Getting to come 
home to you every day has made the past five years so much better. 
 
 
 
 
 
 
REFERENCES 
1. Baym M, Lieberman TD, Kelsic ED, Chait R, Gross R, Yelin I, 
et al. Spatiotemporal microbial evolution on antibiotic landscapes. Science. 
2016;353(6304):1147-51. 
2. Bianconi E, Piovesan A, Facchin F, Beraudi A, Casadei R, 
Frabetti F, et al. An estimation of the number of cells in the human body. 
Annals of Human Biology. 2013;40(6):463-71. 
3. Matthews HK, Bertoli C, De Bruin RAM. Cell cycle control in 
cancer. Nature Reviews Molecular Cell Biology. 2021. 
4. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 
2000;100(1):57-70. 
5. Hanahan D, Weinberg RA. Hallmarks of cancer: The next 
generation. Cell. 2011;144(5):646-74. 
6. Witsch E, Sela M, Yarden Y. Roles for Growth Factors in 
Cancer Progression. Physiology. 2010;25(2):85-101. 
7. Sporn MB, Todaro GJ. Autocrine secretion and malignant 
transformation of cells. N Engl J Med. 1980;303(15):878-80. 
8. Fernandez-Medarde A, Santos E. Ras in Cancer and 
Developmental Diseases. Genes & Cancer. 2011;2(3):344-58. 
9. Nishida N, Yano H, Nishida T, Kamura T, Kojiro M. 
Angiogenesis in cancer. Vascular Health and Risk Management. 
2006;2(3):213-9. 
10. Harris CC. p53 tumor suppressor gene: from the basic research 
laboratory to the clinic--an abridged historical perspective. Carcinogenesis. 
1996;17(6):1187-98. 
11. Downward J. Mechanisms and consequences of activation of 
protein kinase B/Akt. Current Opinion in Cell Biology. 1998;10(2):262-7. 
12. Okamoto K, Seimiya H. Revisiting Telomere Shortening in 
Cancer. Cells. 2019;8(2):107. 
13. Feuk L, Carson AR, Scherer SW. Structural variation in the 
human genome. Nature Reviews Genetics. 2006;7(2):85-97. 
14. Shlien A, Malkin D. Copy number variations and cancer. 
Genome Medicine. 2009;1(6):62. 
15. Chan CK, Aimagambetova G, Ukybassova T, Kongrtay K, 
Azizan A. Human Papillomavirus Infection and Cervical Cancer: 
Epidemiology, Screening, and Vaccination—Review of Current Perspectives. 
Journal of Oncology. 2019;2019:1-11. 
16. Yang X, Lippman ME. BRCA1 and BRCA2 in breast cancer. 
Breast Cancer Res Treat. 1999;54(1):1-10. 
17. Nowell PC. The clonal evolution of tumor cell populations. 
Science. 1976;194(4260):23-8. 
18. Olivier M, Hollstein M, Hainaut P. TP53 mutations in human 
cancers: origins, consequences, and clinical use. Cold Spring Harb Perspect 
Biol. 2010;2(1):a001008. 
19. Hamzehloie T, Mojarrad M, Hasanzadeh Nazarabadi M, 
Shekouhi S. The role of tumor protein 53 mutations in common human cancers 
and targeting the murine double minute 2-p53 interaction for cancer therapy. 
Iran J Med Sci. 2012;37(1):3-8. 
20. Aubrey BJ, Strasser A, Kelly GL. Tumor-Suppressor Functions 
of the TP53 Pathway. Cold Spring Harbor Perspectives in Medicine. 
2016;6(5):a026062. 
21. Itahana K, Dimri G, Campisi J. Regulation of cellular 
senescence by p53. Eur J Biochem. 2001;268(10):2784-91. 
22. Murugan AK, Grieco M, Tsuchida N. RAS mutations in human 
cancers: Roles in precision medicine. Seminars in Cancer Biology. 
2019;59:23-35. 
23. Malumbres M, Barbacid M. RAS oncogenes: the first 30 years. 
Nature Reviews Cancer. 2003;3(6):459-65. 
24. Soussi T, Wiman KG. TP53: an oncogene in disguise. Cell 
Death & Differentiation. 2015;22(8):1239-49. 
25. Pitolli C, Wang Y, Mancini M, Shi Y, Melino G, Amelio I. Do 
Mutations Turn p53 into an Oncogene? Int J Mol Sci. 2019;20(24):6241. 
26. Horn S, Figl A, Rachakonda PS, Fischer C, Sucker A, Gast A, 
et al. TERT Promoter Mutations in Familial and Sporadic Melanoma. Science. 
2013;339(6122):959-61. 
27. Huang FW, Hodis E, Xu MJ, Kryukov GV, Chin L, Garraway 
LA. Highly recurrent TERT promoter mutations in human melanoma. Science 
(New York, NY). 2013;339(6122):957-9. 
28. Elliott K, Larsson E. Non-coding driver mutations in human 
cancer. Nature Reviews Cancer. 2021;21(8):500-9. 
29. Corona RI, Seo J-H, Lin X, Hazelett DJ, Reddy J, Fonseca 
MAS, et al. Non-coding somatic mutations converge on the PAX8 pathway in 
ovarian cancer. Nature communications. 2020;11(1):2020-. 
30. Shuai S, Suzuki H, Diaz-Navarro A, Nadeu F, Kumar SA, 
Gutierrez-Fernandez A, et al. The U1 spliceosomal RNA is recurrently 
mutated in multiple cancers. Nature. 2019;574(7780):712-6. 
31. Suzuki H, Kumar SA, Shuai S, Diaz-Navarro A, Gutierrez-
Fernandez A, De Antonellis P, et al. Recurrent noncoding U1 snRNA 
mutations drive cryptic splicing in SHH medulloblastoma. Nature. 
2019;574(7780):707-11. 
32. Wang W, Wei Z, Lam TW, Wang J. Next generation 
sequencing has lower sequence coverage and poorer SNP-detection capability 
in the regulatory regions. Sci Rep. 2011;1:55. 
33. Payne JL, Wagner A. Mechanisms of mutational robustness in 
transcriptional regulation. Front Genet. 2015;6:322. 
34. Liu B, Xue Q, Tang Y, Cao J, Guengerich FP, Zhang H. 
Mechanisms of mutagenesis: DNA replication in the presence of DNA 
damage. Mutation Research/Reviews in Mutation Research. 2016;768:53-67. 
35. Wu S, Zhu W, Thompson P, Hannun YA. Evaluating intrinsic 
and non-intrinsic cancer risk factors. Nat Commun. 2018;9(1):3490. 
36. Sharma R, Lewis S, Wlodarski MW. DNA Repair Syndromes 
and Cancer: Insights Into Genetics and Phenotype Patterns. Frontiers in 
Pediatrics. 2020;8(683). 
37. Ames BN, Shigenaga MK, Hagen TM. Oxidants, antioxidants, 
and the degenerative diseases of aging. Proc Natl Acad Sci U S A. 
1993;90(17):7915-22. 
38. Duncan BK, Miller JH. Mutagenic deamination of cytosine 
residues in DNA. Nature. 1980;287(5782):560-1. 
39. Abeti R, Zeitlberger A, Peelo C, Fassihi H, Sarkany RPE, 
Lehmann AR, et al. Xeroderma pigmentosum: overview of pharmacology and 
novel therapeutic strategies for neurological symptoms. Br J Pharmacol. 
2019;176(22):4293-301. 
40. Ikehata H, Ono T. The Mechanisms of UV Mutagenesis. 
Journal of Radiation Research. 2011;52(2):115-25. 
41. Tommasi S, Denissenko MF, Pfeifer GP. Sunlight induces 
pyrimidine dimers preferentially at 5-methylcytosine bases. Cancer research. 
1997;57(21):4727-30. 
42. Drouin R, Therrien JP. UVB-induced cyclobutane pyrimidine 
dimer frequency correlates with skin cancer mutational hotspots in p53. 
Photochem Photobiol. 1997;66(5):719-26. 
43. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, 
Greenman CD, Raine K, et al. Mutational processes molding the genomes of 
21 breast cancers. Cell. 2012;149(5):979-93. 
44. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SaJR, 
Behjati S, Biankin AV, et al. Signatures of mutational processes in human 
cancer. Nature. 2013;500:415-21. 
45. Bird AP. CpG-rich islands and the function of DNA 
methylation. Nature. 1986;321(6067):209-13. 
46. Pfeifer GP, Denissenko MF, Olivier M, Tretyakova N, Hecht 
SS, Hainaut P. Tobacco smoke carcinogens, DNA damage and p53 mutations 
in smoking-associated cancers. Oncogene. 2002;21(48):7435-51. 
47. ICGC TCGA Pan-Cancer Analysis of Whole Genomes 
Consortium. Pan-cancer analysis of whole genomes. Nature. 
2020;578(7793):82-93. 
48. Martincorena I, Raine KM, Gerstung M, Dawson KJ, Haase K, 
Van Loo P, et al. Universal Patterns of Selection in Cancer and Somatic 
Tissues. Cell. 2017;171(5):1029-41.e21. 
49. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, 
Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new 
cancer-associated genes. Nature. 2013;499(7457):214-8. 
50. Poulos RC, Wong JWH. Finding cancer driver mutations in the 
era of big data research. Biophys Rev. 2019;11(1):21-9. 
51. Gonzalez-Perez A, Sabarinathan R, Lopez-Bigas N. Local 
Determinants of the Mutational Landscape of the Human Genome. Cell. 
2019;177(1):101-14. 
52. Schuster-Bockler B, Lehner B. Chromatin organization is a 
major influence on regional mutation rates in human cancer cells. Nature. 
2012;488(7412):504-7. 
53. Zheng CL, Wang NJ, Chung J, Moslehi H, Sanborn JZ, Hur JS, 
et al. Transcription Restores DNA Repair to Heterochromatin, Determining 
Regional Mutation Rates in Cancer Genomes. Cell Reports. 2014;9(4):1228-
34. 
54. Yazdi PG, Pedersen BA, Taylor JF, Khattab OS, Chen Y-H, 
Chen Y, et al. Increasing Nucleosome Occupancy Is Correlated with an 
Increasing Mutation Rate so Long as DNA Repair Machinery Is Intact. PLOS 
ONE. 2015;10(8):e0136574. 
55. Chen X, Chen Z, Chen H, Su Z, Yang J, Lin F, et al. 
Nucleosomes suppress spontaneous mutations base-specifically in eukaryotes. 
Science. 2012;335(6073):1235-8. 
56. Mao P, Smerdon MJ, Roberts SA, Wyrick JJ. Chromosomal 
landscape of UV damage formation and repair at single-nucleotide resolution. 
Proceedings of the National Academy of Sciences of the United States of 
America. 2016;113(32):9057-62. 
57. Frigola J, Sabarinathan R, Mularoni L, Muiños F, Gonzalez-
Perez A, López-Bigas N. Reduced mutation rate in exons due to differential 
mismatch repair. Nature Genetics. 2017;49:1684. 
58. Vrieling H, Venema J, van Rooyen ML, van Hoffen A, 
Menichini P, Zdzienicka MZ, et al. Strand specificity for UV-induced DNA 
repair and mutations in the Chinese hamster HPRT gene. Nucleic Acids Res. 
1991;19(9):2411-5. 
59. Mugal CF, von Grünberg H-H, Peifer M. Transcription-Induced 
Mutational Strand Bias and Its Effect on Substitution Rates in Human Genes. 
Molecular Biology and Evolution. 2008;26(1):131-42. 
60. Haradhvala NJ, Polak P, Stojanov P, Covington KR, Shinbrot 
E, Hess JM, et al. Mutational Strand Asymmetries in Cancer Genomes Reveal 
Mechanisms of DNA Damage and Repair. Cell. 2016;164(3):538-49. 
61. Supek F, Lehner B. Differential DNA mismatch repair 
underlies mutation rate variation across the human genome. Nature. 
2015;521(7550):81-4. 
62. Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov 
GV, Mirkin SM, Sunyaev SR. Human mutation rate associated with DNA 
replication timing. Nature Genetics. 2009;41(4):393-5. 
63. Tomkova M, Tomek J, Kriaucionis S, Schuster-Böckler B. 
Mutational signature distribution varies with DNA replication timing and 
strand asymmetry. Genome Biology. 2018;19(1):129. 
64. Kreisel K, Engqvist MKM, Kalm J, Thompson LJ, Boström M, 
Navarrete C, et al. DNA polymerase η contributes to genome-wide lagging 
strand synthesis. Nucleic Acids Research. 2018;47(5):2425-35. 
65. Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A, 
Lopez-Bigas N. Nucleotide excision repair is impaired by binding of 
transcription factors to DNA. Nature. 2016;532(7598):264-7. 
66. Perera D, Poulos RC, Shah A, Beck D, Pimanda JE, Wong 
JWH. Differential DNA repair underlies mutation hotspots at active promoters 
in cancer genomes. Nature. 2016;532(7598):259-63. 
67. Poulos RC, Thoms JAI, Guan YF, Unnikrishnan A, Pimanda 
JE, Wong JWH. Functional Mutations Form at CTCF-Cohesin Binding Sites 
in Melanoma Due to Uneven Nucleotide Excision Repair across the Motif. Cell 
Reports. 2016;17(11):2865-72. 
68. Frigola J, Sabarinathan R, Gonzalez-Perez A, Lopez-Bigas N. 
Variable interplay of UV-induced DNA damage and repair at transcription 
factor binding sites. Nucleic Acids Research. 2020;49(2):891-901. 
69. Hu J, Adebali O, Adar S, Sancar A. Dynamic maps of UV 
damage formation and repair for the human genome. Proceedings of the 
National Academy of Sciences. 2017:201706522-. 
70. Fredriksson NJ, Elliott K, Filges S, Van den Eynden J, 
Ståhlberg A, Larsson E. Recurrent promoter mutations in melanoma are 
defined by an extended context-specific mutational signature. PLoS Genetics. 
2017;13(5). 
71. Wei G-H, Badis G, Berger MF, Kivioja T, Palin K, Enge M, et 
al. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. 
The EMBO Journal. 2010;29(13):2147-60. 
72. Lochovsky L, Zhang J, Fu Y, Khurana E, Gerstein M. LARVA: 
an integrative framework for large-scale analysis of recurrent variants in 
noncoding annotations. Nucleic Acids Res. 2015;43(17):8123-34. 
73. Zhu H, Uusküla-Reimand L, Isaev K, Wadi L, Alizada A, Shuai 
S, et al. Candidate Cancer Driver Mutations in Distal Regulatory Elements and 
Long-Range Chromatin Interaction Networks. Molecular Cell. 
2020;77(6):1307-21.e10. 
74. Sharma Y, Miladi M, Dukare S, Boulay K, Caudron-Herger M, 
Groß M, et al. A pan-cancer analysis of synonymous mutations. Nature 
Communications. 2019;10(1). 
75. Supek F, Miñana B, Valcárcel J, Gabaldón T, Lehner B. 
Synonymous Mutations Frequently Act as Driver Mutations in Human 
Cancers. Cell. 2014;156(6):1324-35. 
76. Martincorena I, Roshan A, Gerstung M, Ellis P, Van Loo P, 
McLaren S, et al. High burden and pervasive positive selection of somatic 
mutations in normal human skin. Science. 2015;348(6237):880-6. 
77. Wagner A. Rapid detection of positive selection in genes and 
genomes through variation clusters. Genetics. 2007;176(4):2451-63. 
78. Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. 
OncodriveCLUST: exploiting the positional clustering of somatic mutations to 
identify cancer genes. Bioinformatics (Oxford, England). 2013;29(18):2238-44.
79. Reimand J, Bader GD. Systematic analysis of somatic 
mutations in phosphorylation signaling predicts novel cancer drivers. Mol Syst 
Biol. 2013;9:637. 
80. Kamburov A, Lawrence MS, Polak P, Leshchiner I, Lage K, 
Golub TR, et al. Comprehensive assessment of cancer missense mutation 
clustering in protein structures. Proc Natl Acad Sci U S A. 
2015;112(40):E5486-95. 
81. Porta-Pardo E, Godzik A. e-Driver: a novel method to identify 
protein regions driving cancer. Bioinformatics (Oxford, England). 
2014;30(21):3109-14. 
82. Tokheim C, Bhattacharya R, Niknafs N, Gygax DM, Kim R, 
Ryan M, et al. Exome-Scale Discovery of Hotspot Mutation Regions in Human 
Cancer Using 3D Protein Structure. Cancer Research. 2016;76(13):3719-31.
83. Niu B, Scott AD, Sengupta S, Bailey MH, Batra P, Ning J, et 
al. Protein-structure-guided discovery of functional mutations across 19 cancer 
types. Nat Genet. 2016;48(8):827-37. 
84. Martínez-Jiménez F, Muiños F, Sentís I, Deu-Pons J, Reyes-
Salazar I, Arnedo-Pac C, et al. A compendium of mutational cancer driver 
genes. Nature Reviews Cancer. 2020;20(10):555-72. 
85. Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, 
Kinzler KW, et al. Cancer-specific high-throughput annotation of somatic 
mutations: computational prediction of driver missense mutations. Cancer
Research. 2009;69(16):6660-7.
86. Reva B, Antipin Y, Sander C. Predicting the functional impact 
of protein mutations: application to cancer genomics. Nucleic Acids Res. 
2011;39(17):e118. 
87. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias 
reveals cancer drivers. Nucleic Acids Res. 2012;40(21):e169. 
88. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding 
non-synonymous variants on protein function using the SIFT algorithm. 
Nature Protocols. 2009;4(7):1073-81. 
89. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that 
affect protein function. Nucleic Acids Res. 2003;31(13):3812-4. 
90. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, 
Gerasimova A, Bork P, et al. A method and server for predicting damaging 
missense mutations. Nat Methods. 2010;7(4):248-9. 
91. Mularoni L, Sabarinathan R, Deu-Pons J, Gonzalez-Perez A, 
López-Bigas N. OncodriveFML: a general framework to identify coding and 
non-coding regions with cancer driver mutations. Genome Biology.
2016;17(1):128.
92. Dietlein F, Weghorn D, Taylor-Weiner A, Richters A, Reardon 
B, Liu D, et al. Identification of cancer driver genes based on nucleotide 
context. Nature Genetics. 2020. 
93. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand 
D, Weerasinghe A, et al. Comprehensive Characterization of Cancer Driver 
Genes and Mutations. Cell. 2018;173(2):371-85.e18. 
94. The Cancer Genome Atlas Network. Comprehensive genomic 
characterization defines human glioblastoma genes and core pathways. Nature. 
2008;455(7216):1061-8. 
95. The International Cancer Genome Consortium. International 
network of cancer genome projects. Nature. 2010;464:993. 
96. Bryan DS, Ransom M, Adane B, York K, Hesselberth JR. High 
resolution mapping of modified DNA nucleobases using excision repair 
enzymes. Genome Res. 2014;24(9):1534-42. 
97. Hu J, Lieb JD, Sancar A, Adar S. Cisplatin DNA damage and 
repair maps of the human genome at single-nucleotide resolution. Proc Natl 
Acad Sci U S A. 2016;113(41):11507-12. 
98. Hu J, Adar S, Selby CP, Lieb JD, Sancar A. Genome-wide 
analysis of human global and transcription-coupled excision repair of UV 
damage at single-nucleotide resolution. Genes & Development. 
2015;29(9):948-60. 
99. Hu J, Li W, Adebali O, Yang Y, Oztas O, Selby CP, et al. 
Genome-wide mapping of nucleotide excision repair with XR-seq. Nat Protoc. 
2019;14(1):248-82. 
100. Pfeifer GP, Drouin R, Riggs AD, Holmquist GP. Binding of 
transcription factors creates hot spots for UV photoproducts in vivo. Molecular
and Cellular Biology. 1992;12(4):1798-804.
101. Tornaletti S, Pfeifer GP. UV light as a footprinting agent: 
modulation of UV-induced DNA damage by transcription factors bound at the 
promoters of three human genes. Journal of Molecular Biology.
1995;249(4):714-28. 
102. Hayward NK, Wilmott JS, Waddell N, Johansson PA, Field 
MA, Nones K, et al. Whole-genome landscapes of major melanoma subtypes. 
Nature. 2017;545:175. 
103. The Cancer Genome Atlas Network. Genomic Classification of 
Cutaneous Melanoma. Cell. 2015;161(7):1681-96. 
104. Mao P, Brown AJ, Esaki S, Lockwood S, Poon GMK, Smerdon 
MJ, et al. ETS transcription factors induce a unique UV damage signature that 
drives recurrent mutagenesis in melanoma. Nature Communications. 
2018;9:2626. 
105. Ståhlberg A, Krzyzanowski PM, Jackson JB, Egyud M, Stein 
L, Godfrey TE. Simple, multiplexed, PCR-based barcoding of DNA enables 
sensitive mutation detection in liquid biopsies using sequencing. Nucleic Acids 
Research. 2016;44(11):e105.
106. Ernst J, Kellis M. ChromHMM: automating chromatin-state 
discovery and characterization. Nat Methods. 2012;9(3):215-6. 
107. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-
Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. 
Nature. 2015;518(7539):317-30. 
108. Kim C-Y, Kim DW, Kim K, Curry J, Torres-Cabala C, Patel S. 
GNAQ mutation in a patient with metastatic mucosal melanoma. BMC Cancer.
2014;14(1):516. 
109. Livingstone E, Zaremba A, Horn S, Ugurel S, Casalini B, 
Schlaak M, et al. GNAQ and GNA11 mutant nonuveal melanoma: a subtype
distinct from both cutaneous and uveal melanoma. British Journal of 
Dermatology. 2020;183(5):928-39. 
110. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, 
et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids 
Research. 2019;47(D1):D941-D7.