MiMyCS: A Processing-in-Memory Read Mapper for Compressing Next-Gen Sequencing Datasets - Université de Rennes 1
Conference paper, Year: 2024

Abstract

As Next-Gen sequencing (NGS) technologies keep improving in accuracy and are widely deployed in human health care infrastructures, it is critical to design efficient reference-based compressors that fully leverage the capabilities of modern processors and hardware accelerators. This work proposes MiMyCS, a C++ tool that performs Mapping in Memory for Compressing Short reads. It provides lossless reference-based compression of NGS datasets such as Illumina reads. To this end, MiMyCS computes a non-exhaustive mapping against a reference genome and accelerates this step with the Processing-in-Memory architecture developed by the UPMEM company. This architecture extends the computational power of a machine by adding dual in-line memory modules in which each memory bank has its own processing unit that runs up to 16 threads. This creates a massively parallel environment, well suited to alleviating memory bottlenecks. To reduce the overall number of sequence comparisons and further accelerate the process, MiMyCS also incorporates a Bloom filter-based dispatcher that predicts against which parts of the genome reads are most likely to be mapped. We show on real whole human sequencing datasets that MiMyCS achieves a speed-up between 1.2x and 2.7x compared to Genozip, the current leading state-of-the-art compressor, while maintaining a comparable compression ratio and lowering the overall energy consumption. The code of MiMyCS is available at https://gitlab.inria.fr/pim/org.pim.srm.
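
The abstract does not detail how the Bloom filter-based dispatcher works, so the following is only a minimal sketch of the general idea, not the MiMyCS implementation. It assumes the genome is cut into fixed-size chunks, that one Bloom filter is built over the k-mers starting in each chunk, and that a read is sent to the chunk whose filter reports the most k-mer hits. All names (`KmerBloomFilter`, `build_filters`, `dispatch`) and all parameters (chunk length, k, filter size, number of hashes) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Seeded FNV-1a hash, used here only to keep the sketch deterministic.
inline std::uint64_t fnv1a(std::string_view s, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical Bloom filter over k-mers (an assumption, not MiMyCS code).
class KmerBloomFilter {
public:
    KmerBloomFilter(std::size_t bits, std::size_t hashes)
        : bits_(bits, false), hashes_(hashes) {
        assert(bits > 0 && hashes > 0);
    }

    void insert(std::string_view kmer) {
        for (std::size_t i = 0; i < hashes_; ++i) bits_[index(kmer, i)] = true;
    }

    // May report false positives, never false negatives.
    bool maybe_contains(std::string_view kmer) const {
        for (std::size_t i = 0; i < hashes_; ++i)
            if (!bits_[index(kmer, i)]) return false;
        return true;
    }

private:
    // Double hashing: the i-th probe is h1 + i * h2, with h2 forced odd.
    std::size_t index(std::string_view kmer, std::size_t i) const {
        std::uint64_t h1 = fnv1a(kmer, 0);
        std::uint64_t h2 = fnv1a(kmer, 1) | 1ULL;
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    std::size_t hashes_;
};

// Builds one filter per genome chunk; a k-mer belongs to the chunk in which
// it starts, so k-mers straddling a chunk boundary are still indexed.
std::vector<KmerBloomFilter> build_filters(std::string_view genome,
                                           std::size_t chunk_len,
                                           std::size_t k,
                                           std::size_t bits = 1u << 16,
                                           std::size_t hashes = 3) {
    assert(chunk_len > 0 && k > 0);
    std::size_t n_chunks = (genome.size() + chunk_len - 1) / chunk_len;
    std::vector<KmerBloomFilter> filters(n_chunks, KmerBloomFilter(bits, hashes));
    for (std::size_t p = 0; p + k <= genome.size(); ++p)
        filters[p / chunk_len].insert(genome.substr(p, k));
    return filters;
}

// Returns the index of the chunk with the most k-mer hits for the read,
// or -1 if no filter reports any hit (for example, a read shorter than k).
int dispatch(const std::vector<KmerBloomFilter>& filters,
             std::string_view read, std::size_t k) {
    int best = -1;
    std::size_t best_hits = 0;
    for (std::size_t c = 0; c < filters.size(); ++c) {
        std::size_t hits = 0;
        for (std::size_t p = 0; p + k <= read.size(); ++p)
            if (filters[c].maybe_contains(read.substr(p, k))) ++hits;
        if (hits > best_hits) {
            best_hits = hits;
            best = static_cast<int>(c);
        }
    }
    return best;
}
```

In this sketch, a caller would build the filters once per reference with `build_filters(genome, chunk_len, k)` and then call `dispatch(filters, read, k)` for each read, so that only the predicted chunk needs to be compared against the read. Because Bloom filters can return false positives, the prediction is a heuristic and the subsequent mapping step still has to verify the match.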
Main file
BIBM2024_paper.pdf (1.11 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04821180, version 1 (05-12-2024)

Identifiers

  • HAL Id: hal-04821180, version 1

Cite

Florestan de Moor, Meven Mognol, Charles Deltel, Erwan Drezen, Julien Legriel, et al. MiMyCS: A Processing-in-Memory Read Mapper for Compressing Next-Gen Sequencing Datasets. BIBM 2024 - IEEE International Conference on Bioinformatics and Biomedicine, Dec 2024, Lisbon, Portugal. pp.7176. ⟨hal-04821180⟩