Inherited DNA lesions cause multiallelic mutations in nematodes

Published

March 27, 2026

Abstract

By reanalyzing sequencing data from a large-scale C. elegans mutagenesis experiment, I’ve found evidence of multiallelic variants in many worms. Given the experimental design used in the original paper, I believe the most parsimonious explanation for these multiallelic variants is that parental DNA lesions are inherited by F1 animals, and later serve as templates for multiple rounds of DNA replication during F1 gametogenesis. Because every sequenced population comprises the offspring of a single F1, mosaicism in the F1’s gametes will generate multi-allelic variants in the sequencing reads.

Background

C. elegans mutagenesis

In 2020, Volkova et al.1 mutagenized C. elegans strains with about a dozen different DNA-damaging agents. Many of these strains harbored homozygous loss-of-function (LOF) alleles in genes related to DNA replication and repair, including translesion synthesis (polk-1) and nucleotide excision repair (xpa-1, xpc-1, and xpf-1), among other pathways. The authors discovered that mutagens leave behind characteristic patterns of single-base substitutions, insertions/deletions, and larger structural variants; we refer to these patterns as “mutational signatures.” They also found that mutation rates and signatures often depend on genetic background (see Figure 1).

Figure 1: Mutation signatures depend on genetic background in C. elegans. Mutation signatures caused by methyl methanesulfonate (MMS) mutagenesis in C. elegans strains. Of note, the wild-type MMS mutation signature is significantly different from the MMS mutation signature in agt-1 alkyltransferase LOF mutants. agt-1 mutants cannot remove rare \(O^6\)-methylguanine lesions caused by MMS, resulting in an excess of C\(\rightarrow\)T mutations absent from WT strains. (Volkova et al. 2020, Nature Communications)

Lesion segregation

Also in 2020, Aitken et al.2 mutagenized over 200 mice with a potent DNA-damaging agent, diethylnitrosamine (DEN). By analyzing the specific nucleotide changes (e.g., A\(\rightarrow\)T vs. T\(\rightarrow\)A) at single-nucleotide variants in these mutagenized mice, the authors observed a phenomenon they termed “lesion segregation.” Lesion segregation occurs when mutagenic lesions persist for at least one cell division following an initial burst of mutagen exposure. In that paper and many follow-up papers3, Aitken and colleagues showed that persistent DNA lesions are engines of multi-allelic variation (see Figure 2).

Figure 2: Persistent DNA lesions are potent engines of multi-allelic mutations. DEN-induced lesions are shown as red triangles. As described in Anderson et al. 2024 (Nature): “As DNA lesions…can persist for multiple cell cycles, each round of replication could incorporate a different incorrectly paired nucleotide opposite a persistent lesion. Lesions rapidly removed by NER persist for fewer cell cycles, generating less multiallelic variation.”

The C. elegans germline as a mutation “bottleneck”

The experimental strategy in Volkova et al. (2020) lets us make a few predictions about the kinds of mutations we’ll observe in sequenced nematodes. The strategy comprised four steps:

  1. Mutagenize P0 worms.
    • P0s were mutagenized at different life stages depending on the mutagen of interest. For now, we’ll focus on alkylating agents (EMS, MMS, DMS) and agents that form bulky adducts (aflatoxin B1, aristolochic acid), which were applied to young adult (YA) worms.
  2. Transfer three mutagenized P0s to a single plate, let them lay eggs (which will develop into F1s), and remove adult P0s.
  3. Transfer two L4 F1s to two new plates (one per plate) and allow to proliferate.
  4. Choose a single expanded F1 population for sequencing.

Because a single F1 – the “child” of a mutagenized P0 – is used to initiate the clonal population of worms used for sequencing, we only expect to observe mutations derived from lesions that were present in the progenitors of a single P0 sperm cell and a single P0 egg cell.

As it’s more succinctly described in the Methods section of Volkova et al. (2020):

The zygotes which lead to the F1 generation provide a single cell bottleneck where mutations of exposed male and female germ cells are fixed before being clonally amplified during C. elegans development and passed on to the next generation in a Mendelian ratio.

Results

Widespread evidence of multi-allelic mutations in mutagenized worms

I re-analyzed sequencing data from Volkova et al. (2020) and found compelling evidence for multi-allelic mutations in many mutagenized strains.

As an example, take a look at the single-nucleotide variants in Figure 3.

Figure 3: Multiallelic mutation evidence in mutagenized wild-type worms. Each aligned sequencing read is represented by a light or dark gray line. Reads are colored according to their orientation with respect to the reference (light gray = forward, dark gray = reverse). At a focal position in the center of the image, the nucleotide present in each read is shown as a colored point. The top subplot shows read evidence in a control (non-mutagenized) sample and the bottom subplot shows read evidence in the mutagenized sample. Only reads with MQ = 60 and bases with BQ ≥ 20 are shown. Supplementary, secondary, and improperly paired reads are removed.

What might explain the presence of multiallelic variants?

Initially, I was surprised to see such robust evidence for multi-allelic mutations in these mutagenized worms. Because each sequenced population is derived from a single F1 animal, we should only observe mutations that were present in a single sperm and egg cell from a single mutagenized P0. And because sperm and egg cells are haploid, each gamete can only harbor a single allele, so the zygote they form will possess at most two unique alleles.

As far as I can tell, there are a few possible explanations for these multi-allelic variants:

Note 1: Bioinformatic artifacts

Multi-allelic mutations are an artifact caused by poor read alignment, collapsed segmental duplications, PCR errors, etc.

Note 2: Independent mutations at the same nucleotide

The mutagen created lesions at the exact same nucleotide in a sperm cell and an egg cell in a mutagenized P0. One lesion led to the incorporation of one “incorrect” nucleotide, and the other lesion led to the incorporation of another “incorrect” nucleotide. Or, the mutagen created a lesion in one gamete and an independent de novo SNV occurred at the exact same nucleotide in the other gamete, or during the somatic development of the F1 worm.

Note 3: Lesion persistence from P0 gamete to F1 zygote

The mutagen created a lesion in either the sperm or egg cell in a P0. The lesion was not repaired prior to meiosis II, and the lesion-containing strand was inherited by an F1 animal. The lesion persisted through multiple rounds of cell division as the F1 developed, and ultimately served as a template for multiple rounds of DNA replication during the development of the F1’s germ cell pool. In those replication events, the lesion led to the mis-incorporation of 2+ unique nucleotides.

Bioinformatic artifacts (Note 1) are the most probable (and most disappointing) source of multi-allelic mutations. As I show below, however, I don’t think they are the cause.

Independent mutations/lesions at the same nucleotide (Note 2) seem extremely unlikely given the infinite sites assumption, and given that we observe so many multi-allelic mutations across independent strains.

Very low probability of multi-allelic mutations occurring by chance

As detailed in Note 1, bioinformatic artifacts may partly explain the abundance of multi-allelic mutations. How can we rule out the possibility of these artifacts?

First, we can use a simple permutation test to demonstrate that multi-allelic mutations are more frequent than we’d expect by chance. Following the approach described in Aitken et al. (2020), I permute the sample labels associated with every mutation identified in this dataset. Then, I collate the read evidence at each site (using a randomly permuted sample’s reads instead of the true sample’s reads) and ask if there is high-quality support for multiple alleles. A random sample is highly unlikely to harbor evidence for a mutation observed in a completely independent sample (i.e., its genotype should be HOM_REF), so in the permuted data I count a site as “multi-allelic” if it has support for 2+ alleles, rather than the 3+ alleles required in the true data.
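To make the logic of this permutation test concrete, here is a minimal sketch on toy data. It is not the code used for the analysis: the `pileups` lookup, the site and sample names, and the `min_reads` threshold are all hypothetical stand-ins for per-site allele counts that would normally be tallied from filtered reads.

```python
import random
from collections import Counter

# Hypothetical toy data: one de novo SNV per sample, plus the per-base read
# counts that every sample shows at every site, keyed by (sample, site).
sites = ["chrI:1000", "chrII:2500", "chrX:400"]
true_samples = ["A", "B", "C"]
pileups = {
    ("A", "chrI:1000"): Counter({"C": 20, "T": 8, "A": 6}),  # ref + 2 alts
    ("B", "chrI:1000"): Counter({"C": 30}),
    ("C", "chrI:1000"): Counter({"C": 28}),
    ("A", "chrII:2500"): Counter({"G": 31}),
    ("B", "chrII:2500"): Counter({"G": 25, "A": 10}),        # ref + 1 alt
    ("C", "chrII:2500"): Counter({"G": 27}),
    ("A", "chrX:400"): Counter({"G": 29}),
    ("B", "chrX:400"): Counter({"G": 33, "A": 1}),           # lone error read
    ("C", "chrX:400"): Counter({"G": 18, "A": 7, "T": 5}),   # ref + 2 alts
}


def n_supported_alleles(base_counts, min_reads=2):
    """Number of distinct alleles backed by at least `min_reads` reads."""
    return sum(1 for count in base_counts.values() if count >= min_reads)


def count_multiallelic(sites, samples, pileups, min_alleles, min_reads=2):
    """Count sites at which the paired sample supports >= `min_alleles` alleles."""
    total = 0
    for site, sample in zip(sites, samples):
        base_counts = pileups.get((sample, site), Counter())
        if n_supported_alleles(base_counts, min_reads) >= min_alleles:
            total += 1
    return total


def permuted_counts(sites, samples, pileups, n_trials=25, seed=0):
    """Shuffle sample labels across sites and recount, requiring 2+ alleles."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        shuffled = list(samples)
        rng.shuffle(shuffled)  # a label may land back on its own site by chance
        results.append(count_multiallelic(sites, shuffled, pileups, min_alleles=2))
    return results


# True data: require 3+ alleles. Permuted data: require 2+ alleles.
observed = count_multiallelic(sites, true_samples, pileups, min_alleles=3)
null_counts = permuted_counts(sites, true_samples, pileups)
print(observed, null_counts)
```

With only three samples, a shuffled label often lands back on its own site and inflates the null counts; with many samples that becomes rare, but a stricter version could reject such draws.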

Overall, the observed evidence for multi-allelic mutations is much greater than we’d expect by chance (Figure 4).

Figure 4: Support for multi-allelic mutations is greater than we’d expect by chance. For every sample, I calculated the number of de novo SNVs at which we observed evidence for 3+ alleles (shown as blue dots). In each of 25 trials, I randomly permuted the sample labels associated with each SNV, and calculated the number of SNVs at which I observed evidence for 2+ alleles. Here, I plot the total number of observed multi-allelic mutations across the dataset, using either the “true” (empirical) data or the permuted data.

“Real” multi-allelic mutations have much greater read support than permuted expectation

Using the same permutation strategy as described above, I also calculated the number of reads supporting the “extra” allele at each multi-allelic SNV. As expected, the “real” multi-allelic mutations are supported by many more reads than the permuted sites (Figure 5).

Figure 5: Real multi-allelic mutations are supported by more reads. At each multi-allelic mutation, I calculated the number of reads supporting the “third” allele. In each of the permuted datasets generated above, I calculated the number of reads supporting the “second” allele.
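One way to extract the read support for the “extra” allele is to rank alleles by read count and take the third-ranked allele in the true data, or the second-ranked allele in the permuted data. The sketch below assumes that ranking rule, which the text does not spell out; `nth_allele_support` and the example counts are hypothetical.

```python
from collections import Counter


def nth_allele_support(base_counts, n):
    """Read count of the n-th best-supported allele (1-based), or 0 if absent."""
    ranked = sorted(base_counts.values(), reverse=True)
    return ranked[n - 1] if len(ranked) >= n else 0


# True sample at its own site: support for the "third" allele.
real_site = Counter({"C": 20, "T": 8, "A": 6})
real_support = nth_allele_support(real_site, 3)

# Permuted sample at an unrelated site: support for the "second" allele.
permuted_site = Counter({"G": 33, "A": 1})
permuted_support = nth_allele_support(permuted_site, 2)

print(real_support, permuted_support)
```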

Multi-allelic mutations don’t occur in lower-complexity sequence

Another possible explanation for these multi-allelic mutations is that tools like BWA-MEM struggle to align reads in low-complexity, repetitive sequence, introducing base mismatches in the process. To check if this might be the case, I calculated the entropy of the sequence context surrounding every detected SNV in the dataset. Overall, the sequence context surrounding multi-allelic mutations is no less complex than the context surrounding bi-allelic mutations (Figure 6).

Figure 6: Multi-allelic mutations don’t occur at low-complexity sequence. I calculated the sequence entropy (lower values indicating more repetitive sequence) of the 15 bases surrounding every SNV with multi-allelic support and averaged the entropy across those SNVs. In each of 100 trials, I then randomly sampled the same number of bi-allelic SNVs and calculated their average sequence entropy.
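For reference, a sequence-complexity score of this kind can be sketched as follows. This assumes single-nucleotide Shannon entropy (in bits) over a 15-base window made up of the focal base plus 7 bases on each side; the exact entropy definition and window placement used for Figure 6 may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the base composition of `seq`."""
    n = len(seq)
    counts = Counter(seq.upper())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def context_entropy(reference, pos, flank=7):
    """Entropy of the window spanning `flank` bases either side of 0-based `pos`."""
    start = max(0, pos - flank)
    end = min(len(reference), pos + flank + 1)
    return shannon_entropy(reference[start:end])


# A homopolymer run scores 0 bits; a uniform mix of all four bases scores 2 bits.
low = shannon_entropy("AAAAAAAAAAAAAAA")
high = shannon_entropy("ACGTACGTACGTACGT")
print(low, high)
```

Averaging `context_entropy` over the multi-allelic SNVs, and over repeated random draws of the same number of bi-allelic SNVs, would give the two quantities compared in the figure.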

Could lesions be inherited from P0 to F1?

Let’s walk through how the explanation in Note 3 might happen. To do so, let’s first imagine how a mutagenized germ cell progenitor (i.e., a mitotic germ cell in the C. elegans gonad) might undergo both mitosis and meiosis to produce a haploid gamete (Figure 7).

Figure 7: The fate of a mutagenized germ cell progenitor. Imagine that we’ve mutagenized a mitotic germ cell in a young adult (YA) C. elegans animal. Mutagenic lesions (red triangles) are present on both the forward and reverse strands of each diploid chromosome. Let’s assume this germ cell undergoes one mitotic division before meiosis. Two of the lesions are efficiently removed by nucleotide excision repair (or another pathway) prior to replication, but error-prone DNA polymerases incorporate incorrect nucleotides (blue and orange circles) opposite the other lesions, creating “lesion-mutation duplexes”. Because those lesions are not repaired, they are still present in the daughter cells produced by this round of mitosis (note that only one possible daughter is shown). During meiosis, the DNA is replicated once again. Mis-incorporated bases are copied to create “fully-resolved,” double-stranded mutations, but a single lesion persists and may end up in a haploid gamete.

Critically, if a haploid gamete harbors a lesion-containing strand, that lesion might end up in a fertilized zygote. And if it’s present in a fertilized zygote, there’s a chance it would end up in the P4 cell that gives rise to all germ cells in the F1 animal (see Figure 8).

Figure 8: Lesion segregation during early post-zygotic development. Here, we trace the fate of a single chromosome with a lesion-containing strand during early C. elegans development. Assuming the lesion is not repaired, there is a 50% chance the lesion-containing strand ends up in the \(P_1\) cell after the first post-zygotic cell division. If that lesion persists for three more cell divisions, and segregates into the “right” daughter cell each time, it will eventually end up in the \(P_4\) cell that gives rise to all germ cells in the adult worm. And the longer the lesion persists, the more likely it is that another incorrect nucleotide (shown as a green circle) will be mis-incorporated opposite the lesion. If that happens, the pool of gametes produced by this worm will be mosaic for both the green and blue mutations.

Because chromatids independently segregate during cell division, we can easily calculate the probability that the lesion-containing strand will be present in the \(P_4\) germline progenitor cell (conditional on the lesion persisting through those cell divisions). If the zygote only possesses one lesion-containing strand (i.e., only the sperm or egg had a persistent lesion), the probability that the strand reaches \(P_4\) is simply \(0.5^4 = 6.25\%\).

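What if both homologous chromosomes carry one lesion-containing strand? Then, the probability that \(P_1\) contains at least one of those lesions is \(1 - 0.5^2 = 75\%\). Because the two strands segregate independently, each one reaches \(P_4\) with probability \(0.5^4\), so the probability that \(P_4\) contains at least one lesion is \(1 - (1 - 0.5^4)^2 \approx 12.1\%\). (Simply raising 75% to the fourth power would overestimate this, because a cell that inherited only one of the two lesions passes it on with probability 50%, not 75%.)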
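The segregation probabilities discussed here can be checked numerically. The sketch below assumes that lesions are never repaired during the four divisions from zygote to \(P_4\), that each lesion-containing strand enters the P-lineage daughter with probability 0.5 at every division, and that strands on different chromosomes segregate independently. The function names are hypothetical.

```python
import random


def p_lesion_in_germline(n_lesion_strands, n_divisions=4, p_inherit=0.5):
    """Exact probability that at least one lesion strand reaches the P lineage."""
    p_single = p_inherit ** n_divisions
    return 1 - (1 - p_single) ** n_lesion_strands


def simulate(n_lesion_strands, n_divisions=4, n_trials=50_000, seed=1):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        remaining = n_lesion_strands
        for _ in range(n_divisions):
            # Each strand still in the P lineage stays there with probability 0.5.
            remaining = sum(rng.random() < 0.5 for _ in range(remaining))
        hits += remaining > 0
    return hits / n_trials


for n in (1, 2):
    print(n, p_lesion_in_germline(n), simulate(n))
```

With one lesion-containing strand the exact value is \(0.5^4 = 0.0625\); with two it is \(1 - (1 - 0.5^4)^2 = 31/256 \approx 0.121\).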

Precedent for inherited lesions causing mutations in C. elegans

This would not be the first evidence for persistent DNA lesions across generations in C. elegans. Recently, Wang et al. (2023)4 demonstrated that paternal exposure to ionizing radiation led to embryonic lethality in later generations of worms.

We show that paternal exposure to ionizing radiation results in genome instability in the F1 generation and transgenerational embryonic lethality. We determined that the paternal DNA damage is mainly repaired in the zygote through maternally provided error-prone polymerase theta-mediated end joining (TMEJ), which results in chromosomal aberrations. (Wang et al. 2023)

However, my re-analysis lets us compare the incidence of lesion persistence across mutagen exposures and genetic backgrounds.