Joost van der Lee1*, Merlijn Hulsenboom2, Reindert Nijland2
1 Wageningen University and Research, The Netherlands; 2
* Corresponding author: joost.vanderlee@wur.nl
Introduction
Sand nourishment (suppletion), a widely applied coastal management strategy, alters benthic habitats and may significantly affect macrobenthic communities. Monitoring these communities is essential for assessing ecological effects. Traditional biomonitoring relies on morphological identification, which is labour-intensive, time-consuming, and costly. Molecular biomonitoring methods are less intensive and are improving in reliability and taxonomic coverage, but important ecological parameters such as species abundance remain difficult to infer accurately.
An additional advantage of molecular methods is their ability to capture genetic differences between individual organisms of the same species, known as intraspecific genetic variation. This variation provides insight into population structure, connectivity, species abundance, and potential recolonization dynamics following disturbances. However, typical levels of intraspecific variation (0–5%) overlap with sequencing error rates, making it difficult to distinguish true biological variation from technical artefacts using standard similarity thresholds.
Objective and Methods
The aim is to develop a bioinformatic pipeline that distinguishes true biological variation from sequencing errors in DNA samples containing multiple species. Instead of a similarity-threshold-based approach, we focus on small genetic differences, single nucleotide polymorphisms (SNPs), which are likely to represent true biological variation.
The pipeline builds on the output of two tools: aligned sequences from the sequence read processing pipeline DECONA (Doorenspleet et al., 2023) and SNPs called by the variant caller FreeBayes (Garrison & Marth, 2012). Sequences are clustered based on shared SNP profiles, with each unique SNP combination defining a distinct genetic variant (haplotype).
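The clustering step can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function name `cluster_by_snp_profile`, the toy reads, and the SNP positions are hypothetical, and the sketch assumes reads are already aligned to a common coordinate system so that a position index refers to the same site in every read.

```python
from collections import defaultdict


def cluster_by_snp_profile(reads, snp_positions):
    """Group aligned reads by the alleles they carry at the given SNP positions.

    reads: dict mapping read ID -> aligned sequence (hypothetical toy input).
    snp_positions: 0-based alignment positions reported as SNPs.
    Returns a dict mapping each SNP profile (tuple of alleles) -> list of read IDs.
    """
    clusters = defaultdict(list)
    for read_id, sequence in reads.items():
        # The profile is the combination of alleles at all SNP sites.
        profile = tuple(sequence[pos] for pos in snp_positions)
        clusters[profile].append(read_id)
    return dict(clusters)


# Toy example: five aligned reads and two assumed SNP positions.
reads = {
    "read1": "ACGTACGT",
    "read2": "ACGTACGT",
    "read3": "ACCTACGA",
    "read4": "ACCTACGA",
    "read5": "ACGTACGA",
}
snp_positions = [2, 7]

haplotypes = cluster_by_snp_profile(reads, snp_positions)
for profile, members in haplotypes.items():
    print(profile, members)
```

In this toy input, each unique allele combination at the two sites forms its own cluster, so bases outside the SNP positions (where sequencing errors are assumed to dominate) do not split clusters.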
To validate the pipeline, benthic organism samples were collected from the coast of Ameland. DNA was extracted and a 313 base-pair fragment of a standard genetic marker gene (COI) was sequenced using Oxford Nanopore long-read sequencing technology.
The validity of the pipeline was assessed using technical replicates, database alignment for taxonomic verification, and comparison with independently measured species abundances.
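One way to use technical replicates for validation is to retain only haplotypes that recur across replicates. The sketch below assumes that rule for illustration; the function name `replicated_haplotypes`, the `min_replicates` parameter, and the toy profiles are hypothetical and are not taken from the pipeline itself.

```python
from collections import Counter


def replicated_haplotypes(replicates, min_replicates=2):
    """Return haplotypes observed in at least `min_replicates` replicates.

    replicates: list of collections of haplotypes (here, SNP-profile tuples),
    one collection per technical replicate.
    """
    seen = Counter()
    for haplotypes in replicates:
        # Count each haplotype at most once per replicate.
        seen.update(set(haplotypes))
    return {hap for hap, n in seen.items() if n >= min_replicates}


# Toy example: three replicates of the same sample (assumed values).
replicate_1 = {("G", "T"), ("C", "A"), ("G", "A")}
replicate_2 = {("G", "T"), ("C", "A")}
replicate_3 = {("G", "T"), ("C", "A"), ("T", "T")}

supported = replicated_haplotypes(
    [replicate_1, replicate_2, replicate_3], min_replicates=3
)
print(sorted(supported))
```

A haplotype that appears in only one replicate is more plausibly a technical artefact than one recovered every time, which is the reasoning behind the assumed threshold.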
Results
Across the species Donax vittatus, Macoma balthica, Magelona johnstoni, Magelona mirabilis, and Spiophanes bombyx, two to ten haplotypes per species were detected. Replication and database alignment indicated that most haplotypes represent true biological variation. For the mollusc species D. vittatus and M. balthica, species abundance was known before sequencing. The number of haplotypes never exceeded species abundance: samples containing 1–5 individuals yielded 1–2 haplotypes.
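The abundance comparison amounts to a simple consistency check: a sample should not yield more haplotypes than it contains individuals. The sketch below shows that check under stated assumptions; the function name and the sample values are hypothetical illustrations, not measured data.

```python
def samples_exceeding_abundance(haplotype_counts, abundances):
    """Return IDs of samples where haplotypes outnumber individuals.

    haplotype_counts: dict mapping sample ID -> number of haplotypes detected.
    abundances: dict mapping sample ID -> independently counted individuals.
    """
    return sorted(
        sample
        for sample, n_haplotypes in haplotype_counts.items()
        if n_haplotypes > abundances[sample]
    )


# Hypothetical values for illustration only.
haplotype_counts = {"sample_A": 1, "sample_B": 2, "sample_C": 3}
abundances = {"sample_A": 1, "sample_B": 5, "sample_C": 2}

flagged = samples_exceeding_abundance(haplotype_counts, abundances)
print(flagged)
```

A flagged sample would suggest that at least one haplotype is an artefact or a contaminant; an empty list is consistent with, but does not prove, correct haplotype calls.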
This study demonstrates that SNP-based approaches can recover intraspecific genetic variation from multi-species bulk DNA samples. Future research should compare SNP-based and abundance-threshold-based methods and expand reference databases to improve accuracy.
References
Doorenspleet, K., Jansen, L., Oosterbroek, S., Kamermans, P., Bos, O., Wurz, E., Murk, A., & Nijland, R. (2023). The long and the short of it: Nanopore based eDNA metabarcoding of marine vertebrates works; sensitivity and specificity depend on amplicon lengths. bioRxiv. https://doi.org/10.1101/2021.11.26.470087
Garrison, E., & Marth, G. (2012). Haplotype-based variant detection from short-read sequencing. arXiv. https://doi.org/10.48550/arXiv.1207.3907


