Jun 14 – 16, 2023
AlbaNova Main Building
Europe/Stockholm timezone

Machine learning based compression for scientific data

Not scheduled
15m
Oskar Klein Auditorium FR4 (AlbaNova Main Building)

Roslagstullsbacken 21, 114 21 Stockholm
Poster, Sektionen för elementarpartikel och astropartikelfysik (Section for Elementary Particle and Astroparticle Physics)

Speakers

Alexander Ekman (Lund University (SE)), Axel Gallén (Lund University (SE))

Description

A common issue across very different fields of research and industry is the ever-increasing need for data storage. As experiments record more complex data at higher rates, the volume of recorded data is quickly outgrowing the available storage. This issue is particularly prominent in LHC experiments such as ATLAS, where within five years the required storage resources are expected to be many times larger than what is available (assuming a flat budget model and current technology trends) [1]. Since the data formats in use are already highly compressed, storage constraints could require more drastic measures such as lossy compression, in which some data accuracy is sacrificed during compression.

Building on a number of undergraduate projects [2–7], we have developed an interdisciplinary open-source tool for machine-learning-based lossy compression. The tool uses an autoencoder neural network that is trained to compress and decompress data by exploiting correlations between the different variables in the dataset. The process is lossy: the original data values and distributions cannot be reconstructed exactly. However, for variables and observables where the loss of precision is tolerable, the high compression ratio allows more data to be stored, yielding greater statistical power.
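
The compression principle described above can be illustrated with a deliberately small sketch. It is not Baler's implementation: it assumes NumPy, a hypothetical toy dataset of four variables generated as linear mixtures of two hidden quantities, and a purely linear autoencoder trained by full-batch gradient descent on the mean squared reconstruction error (a practical tool would typically use a deeper, non-linear network and a deep-learning framework). Because the four variables are correlated, a two-dimensional latent space reconstructs them with a small but non-zero error, which gives a compression ratio of 2 in this toy setting.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical toy dataset: 2000 "events", each with 4 variables that are
# linear mixtures of 2 hidden quantities plus a little noise, so the
# variables are strongly correlated.
n_events = 2000
hidden = rng.normal(size=(n_events, 2))
mixing = np.array([[1.0, 0.5, 0.0, 1.0],
                   [0.0, 1.0, 1.0, -1.0]])
data = hidden @ mixing + 0.01 * rng.normal(size=(n_events, 4))

# Standardise each variable; mean and std are needed again to decompress.
mean, std = data.mean(axis=0), data.std(axis=0)
x = (data - mean) / std

# Linear autoencoder: encoder maps 4 -> 2 (the compressed form),
# decoder maps 2 -> 4 (the reconstruction).
n_latent = 2
W_enc = 0.1 * rng.normal(size=(x.shape[1], n_latent))
W_dec = 0.1 * rng.normal(size=(n_latent, x.shape[1]))

learning_rate = 0.1
for step in range(3000):
    z = x @ W_enc                  # encode
    err = z @ W_dec - x            # decode and compare with the input
    # Gradients of the mean squared reconstruction error w.r.t. both weights,
    # computed before either weight matrix is updated.
    grad_dec = z.T @ err * (2.0 / x.size)
    grad_enc = x.T @ (err @ W_dec.T) * (2.0 / x.size)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

# "Compress": keep only the latent values (2 numbers per event instead of 4).
z = x @ W_enc
# "Decompress": run the decoder and undo the standardisation.
x_hat = z @ W_dec
reconstructed = x_hat * std + mean

mse_std = np.mean((x_hat - x) ** 2)          # small, but not zero: lossy
compression_ratio = data.size / z.size       # 4 values -> 2 values per event
print(f"compression ratio: {compression_ratio:.1f}")
print(f"reconstruction MSE (standardised units): {mse_std:.2e}")
```

In this sketch the ratio counts only the per-event latent values. Decompression also needs the decoder weights and the normalisation constants, but their size is fixed, so their relative cost shrinks as the dataset grows. The residual reconstruction error is the "loss" in lossy compression: the decompressed values are close to, but not identical with, the originals.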

The tool, called Baler, is available as an open-source project [8, 9].

[1] - https://cerncourier.com/a/time-to-adapt-for-big-data/
[2] - http://lup.lub.lu.se/student-papers/record/9049610
[3] - http://lup.lub.lu.se/student-papers/record/9012882
[4] - http://lup.lub.lu.se/student-papers/record/9004751
[5] - http://lup.lub.lu.se/student-papers/record/9075881
[6] - https://zenodo.org/record/5482611#.Y3Yysy2l3Jz
[7] - https://zenodo.org/record/4012511#.Y3Yyny2l3Jz
[8] - https://zenodo.org/record/7817467#.ZED-65FBzmE
[9] - https://github.com/baler-collaboration/baler

Primary authors

Alexander Ekman (Lund University (SE)), Axel Gallén (Lund University (SE))