Inherited DNA lesions cause multiallelic mutations in nematodes
Abstract
By reanalyzing sequencing data from a large-scale C. elegans mutagenesis experiment, I’ve found evidence of multiallelic variants in many worms. Given the experimental design used in the original paper, I believe the most parsimonious explanation for these multiallelic variants is that parental DNA lesions are inherited by F1 animals, and later serve as templates for multiple rounds of DNA replication during F1 gametogenesis. Because every sequenced population comprises the offspring of a single F1, mosaicism in the F1’s gametes will generate multi-allelic variants in the sequencing reads.
Background
C. elegans mutagenesis
In 2020, Volkova et al.1 mutagenized C. elegans strains with about a dozen different DNA-damaging agents. Many of these strains harbored homozygous loss-of-function (LOF) alleles in genes related to DNA replication and repair, including genes involved in translesion synthesis (polk-1) and nucleotide excision repair (xpa-1, xpc-1, and xpf-1), among others. The authors discovered that mutagens leave behind characteristic patterns of single-base substitutions, insertions/deletions, and larger structural variants; we refer to these patterns as “mutational signatures.” They also found that mutation rates and signatures often depend on genetic background (see Figure 1).
Figure 1: agt-1 alkyltransferase LOF mutants. agt-1 mutants cannot remove rare \(O^6\)-methylguanine lesions caused by MMS, resulting in an excess of C\(\rightarrow\)T mutations absent from WT strains. (Volkova et al. 2020, Nature Communications)
Lesion segregation
Also in 2020, Aitken et al.2 mutagenized over 200 mice with a potent DNA-damaging agent called diethylnitrosamine (DEN). By analyzing the specific nucleotide changes (e.g., A\(\rightarrow\)T vs. T\(\rightarrow\)A) at single-nucleotide variants in these mutagenized mice, the authors observed a phenomenon they termed “lesion segregation,” which occurs when mutagenic lesions persist for at least one cell division following an initial burst of mutagen exposure. In that and many follow-up papers3, Aitken and colleagues discovered that persistent DNA lesions are engines of multi-allelic variation (see Figure 2).
The C. elegans germline as a mutation “bottleneck”
The experimental strategy in Volkova et al. (2020) lets us make a few predictions about the kinds of mutations we’ll observe in sequenced nematodes.
Because a single F1 – the “child” of a mutagenized P0 – is used to initiate the clonal population of worms used for sequencing, we only expect to observe mutations derived from lesions that were present in the progenitors of a single P0 sperm cell and a single P0 egg cell.
As it’s more succinctly described in the Methods section of Volkova et al. (2020):
The zygotes which lead to the F1 generation provide a single cell bottleneck where mutations of exposed male and female germ cells are fixed before being clonally amplified during C. elegans development and passed on to the next generation in a Mendelian ratio.
Results
Widespread evidence of multi-allelic mutations in mutagenized worms
I re-analyzed sequencing data from Volkova et al. (2020) and found compelling evidence for multi-allelic mutations in many mutagenized strains.
As an example, take a look at the single-nucleotide variants in Figure 3.
What might explain the presence of multiallelic variants?
Initially, I was surprised to see such robust evidence for multi-allelic mutations in these mutagenized worms. Because each sequenced population is derived from a single F1 animal, we should only observe mutations that were present in a single sperm and egg cell from a single mutagenized P0. And because sperm and egg cells are haploid, each gamete can only harbor a single allele — the zygote they form will therefore possess a maximum of two unique alleles.
As far as I can tell, there are a few possible explanations for these multi-allelic variants:
Bioinformatic artifacts (Note 1) are the most probable (and most disappointing) source of multi-allelic mutations. As I show below, however, I don’t think they are the cause.
Independent mutations/lesions at the same nucleotide (Note 2) seem extremely unlikely given the infinite sites assumption, and given the fact that we observe so many multi-allelic mutations across independent strains.
Lesions inherited by the F1 from a mutagenized P0 (Note 3), which later serve as templates for multiple rounds of DNA replication during F1 gametogenesis, are the explanation I find most parsimonious.
Very low probability of multi-allelic mutations occurring by chance
As detailed in Note 1, bioinformatic artifacts may partly explain the abundance of multi-allelic mutations. How can we rule out the possibility of these artifacts?
First, we can use a simple permutation test to demonstrate that multi-allelic mutations are more frequent than we’d expect by chance. Following the approach described in Aitken et al. (2020), I permute the sample labels associated with every mutation identified in this dataset. Then, I collate the read evidence at each site (using a randomly permuted sample’s reads instead of the true sample’s reads) and ask if there is high-quality support for multiple alleles. A random sample is highly unlikely to harbor evidence for a mutation observed in a completely independent sample (i.e., its genotype should be HOM_REF). In the permuted data, I therefore count a site as “multi-allelic” if the permuted sample’s reads support 2+ alleles.
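The label-permutation step can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the `pileups` structure, the `min_reads` threshold, and all function names are hypothetical, and the sketch assumes per-sample allele read counts have already been collected at every mutated site.

```python
import random
from typing import Dict, List, Tuple

Site = Tuple[str, int]            # (chromosome, position); hypothetical representation
AlleleCounts = Dict[str, int]     # allele -> number of supporting reads


def n_supported_alleles(counts: AlleleCounts, min_reads: int = 2) -> int:
    """Count alleles backed by at least `min_reads` reads (threshold is an assumption)."""
    return sum(1 for n in counts.values() if n >= min_reads)


def permuted_multiallelic_fraction(
    mutations: List[Tuple[str, Site]],
    pileups: Dict[str, Dict[Site, AlleleCounts]],
    min_alleles: int = 2,
    min_reads: int = 2,
    seed: int = 0,
) -> float:
    """Shuffle sample labels across mutations, then report the fraction of
    sites where the newly assigned sample's reads support `min_alleles`+ alleles."""
    rng = random.Random(seed)
    labels = [sample for sample, _ in mutations]
    # A plain shuffle can occasionally hand a site back to its true sample;
    # a stricter version could reject those assignments.
    rng.shuffle(labels)
    hits = 0
    for new_sample, (_, site) in zip(labels, mutations):
        counts = pileups.get(new_sample, {}).get(site, {})
        if n_supported_alleles(counts, min_reads) >= min_alleles:
            hits += 1
    return hits / len(mutations)
```

Repeating the shuffle with different seeds would give a null distribution to compare against the fraction observed with the true sample labels.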
Overall, the observed evidence for multi-allelic mutations is much greater than we’d expect by chance (Figure 4).
“Real” multi-allelic mutations have much greater read support than permuted expectation
Using the same permutation strategy as described above, I also calculated the number of reads supporting the “extra” allele at each multi-allelic SNV. As expected, the “real” multi-allelic mutations are supported by many more reads than the permuted sites (Figure 5).
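To make the read-support comparison concrete, here is one way the “extra” allele’s support might be tallied at a single site. The definition is an assumption on my part for illustration: the “extra” allele is taken to be the best-supported allele that is neither the reference nor the called alternate, and the function name is hypothetical.

```python
from typing import Dict


def extra_allele_support(counts: Dict[str, int], ref: str, alt: str) -> int:
    """Return the read count of the best-supported allele other than `ref`
    and `alt`, or 0 if no such allele was observed (assumed definition)."""
    others = [n for allele, n in counts.items() if allele not in (ref, alt)]
    return max(others, default=0)


# Toy example: reference C, called alternate T, plus reads supporting A.
print(extra_allele_support({"C": 20, "T": 9, "A": 6}, ref="C", alt="T"))
```

The same function can be applied to the true sample’s reads and to a permuted sample’s reads at each site, giving the two distributions being compared.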
Multi-allelic mutations don’t occur in lower-complexity sequence
Another possible explanation for these multi-allelic mutations is that tools like BWA-MEM struggle to align reads in low-complexity, repetitive sequence, introducing base mismatches in the process. To check if this might be the case, I calculated the entropy of the sequence context surrounding every detected SNV in the dataset. Overall, the sequence context surrounding multi-allelic mutations is no less complex than the context surrounding bi-allelic mutations (Figure 6).
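As a rough sketch of the complexity check, the Shannon entropy of the nucleotides in a window around each SNV could be computed as shown here. The window size, the use of single-nucleotide (rather than k-mer) frequencies, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the nucleotide composition of `seq`."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def context_entropy(chrom_seq: str, pos: int, flank: int = 10) -> float:
    """Entropy of the window spanning `flank` bases on either side of the
    0-based position `pos` (the flank size is a hypothetical choice)."""
    start = max(0, pos - flank)
    end = min(len(chrom_seq), pos + flank + 1)
    return shannon_entropy(chrom_seq[start:end])
```

A homopolymer run scores 0 bits and a window with equal base frequencies scores 2 bits, so low values flag the repetitive contexts where alignment errors are most likely.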
Could lesions be inherited from P0 to F1?
Let’s walk through how the third explanation (Note 3), inheritance of DNA lesions from the P0 by the F1, might happen. To do so, let’s first imagine how a mutagenized germ cell progenitor (i.e., a mitotic germ cell in the C. elegans gonad) might undergo both mitosis and meiosis to produce a haploid gamete (Figure 7).
Critically, if a haploid gamete harbors a lesion-containing strand, that lesion might end up in a fertilized zygote. And if it’s present in a fertilized zygote, there’s a chance it would end up in the P4 cell that gives rise to all germ cells in the F1 animal (see Figure 8).
Because chromatids independently segregate during cell division, we can easily calculate the probability that the lesion-containing strand will be present in the \(P_4\) germline progenitor cell (conditional on the lesion persisting for those cell divisions). If the zygote only possesses one lesion-containing strand (i.e., only the sperm or egg had a persistent lesion), the probability that the lesion persists until \(P_4\) is simply \(0.5^4 = 6.25\%\).
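What if both homologous chromosomes each carry one lesion-containing strand? Then the probability that \(P_1\) inherits at least one of those lesions is \(1 - 0.5^2 = 75\%\). This 75% does not simply compound over the later divisions, because a cell that inherited only one of the two lesions passes it on with probability 0.5. Since the two lesion-containing strands segregate independently, each reaches \(P_4\) with probability \(0.5^4\), so the probability that \(P_4\) contains at least one lesion is \(1 - (1 - 0.5^4)^2 \approx 12.1\%\).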
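One way to check these probabilities is to track how many lesion-containing strands remain in the germline-lineage cell after each division. The sketch below assumes that every lesion persists unrepaired, that each lesion-bearing chromatid goes to the germline daughter with probability 0.5 independently of the others, and that four divisions separate the zygote from \(P_4\). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_lesion_reaches_germline(n_lesions: int, n_divisions: int = 4) -> Fraction:
    """Probability that at least one of `n_lesions` lesion-containing strands
    is still in the germline-lineage cell after `n_divisions` divisions."""
    half = Fraction(1, 2)
    # dist[k] = probability that the current lineage cell carries k lesions
    dist = {n_lesions: Fraction(1)}
    for _ in range(n_divisions):
        new_dist = {}
        for k, p in dist.items():
            for j in range(k + 1):
                # j of the k lesion-bearing chromatids enter the germline daughter
                new_dist[j] = new_dist.get(j, Fraction(0)) + p * comb(k, j) * half**k
        dist = new_dist
    return 1 - dist.get(0, Fraction(0))


for n in (1, 2):
    p = prob_lesion_reaches_germline(n)
    print(f"{n} lesion(s) in the zygote: {p} = {float(p):.1%}")
```

Under these assumptions the result matches the closed form \(1 - (1 - 0.5^d)^k\) for \(k\) lesions and \(d\) divisions, because each lesion's fate is independent of the others.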
Precedent for inherited lesions causing mutations in C. elegans
This would not be the first evidence for persistent DNA lesions across generations in C. elegans. Recently, Wang et al. (2023)4 demonstrated that paternal exposure to ionizing radiation led to embryonic lethality in later generations of worms.
We show that paternal exposure to ionizing radiation results in genome instability in the F1 generation and transgenerational embryonic lethality. We determined that the paternal DNA damage is mainly repaired in the zygote through maternally provided error-prone polymerase theta-mediated end joining (TMEJ), which results in chromosomal aberrations. (Wang et al. 2023)
However, my re-analysis lets us compare the incidence of lesion persistence across mutagen exposure and genetic backgrounds.
