
SFQ: Constructing and Querying a Succinct Representation of FASTQ Files

Bakarić, Robert; Korenčić, Damir; Hršak, Dalibor; Ristov, Strahil (2022) SFQ: Constructing and Querying a Succinct Representation of FASTQ Files. Electronics, 11 (11). ISSN 2079-9292

PDF - Published Version - article
Available under License Creative Commons Attribution.


Abstract

A large and ever-increasing quantity of high-throughput sequencing (HTS) data is stored in FASTQ files. Various data compression methods are used to mitigate storage and transmission costs, from the still prevalent general-purpose Gzip to state-of-the-art specialized methods. However, all existing methods for FASTQ file compression require a decompression stage before the HTS data can be used. This is particularly costly for random access to specific records in FASTQ files. We propose the sFASTQ format, a succinct representation of FASTQ files that can be used without decompression (i.e., the records can be retrieved and listed online) and that supports random access to individual records. The sFASTQ format can be searched on disk, which eliminates the need for any additional memory resources. The searchable sFASTQ archive is of comparable size to the corresponding Gzip file. The sFASTQ format outputs (interleaved) FASTQ records to the STDOUT stream. We provide SFQ, software for the construction and usage of the sFASTQ format that supports variable-length reads, pairing of records, and both lossless and lossy compression of quality scores.
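
To make the notion of random access to individual records concrete, the following is a minimal Python sketch that builds a byte-offset index over an uncompressed FASTQ stream and then seeks straight to one record. It is not the SFQ method or the sFASTQ format, whose internals are not described on this page. The function names `build_offset_index` and `fetch_record` and the toy reads are hypothetical, and the sketch assumes simple four-line records.

```python
import io


def build_offset_index(stream):
    """Return the byte offset of each FASTQ record in a binary stream.

    Assumption: every record is exactly four lines
    (header, sequence, '+', qualities).
    """
    offsets = []
    while True:
        pos = stream.tell()
        header = stream.readline()
        if not header:
            break  # end of stream
        for _ in range(3):
            stream.readline()  # skip sequence, separator, qualities
        offsets.append(pos)
    return offsets


def fetch_record(stream, offsets, i):
    """Seek directly to record i and return its four lines as text."""
    stream.seek(offsets[i])
    return [stream.readline().decode("ascii").rstrip("\n") for _ in range(4)]


# Toy data with variable-length reads (hypothetical content).
data = b"@r1\nACGT\n+\nIIII\n@r2\nGGCATT\n+\nIIIIII\n@r3\nTT\n+\nII\n"
stream = io.BytesIO(data)
index = build_offset_index(stream)
record = fetch_record(stream, index, 1)
print(record)
```

With a Gzip-compressed file such an index cannot be used directly, because reaching a given offset generally requires decompressing the data that precedes it. That is the cost the abstract refers to.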

Item Type: Article
Uncontrolled Keywords: bioinformatics ; FASTQ data compression ; random access
Subjects: NATURAL SCIENCES > Biology
TECHNICAL SCIENCES > Computing
TECHNICAL SCIENCES > Basic Technical Sciences
Divisions: Division of Electronics
Projects:
Project title | Project leader | Project code | Project type
Napredni deterministički i hibridni algoritmi na nizovima, sljedovima i stablima s primjenama u tehničkim znanostima i znanostima o životu | Ristov, Strahil | IP-2018-01-7317 | HRZZ
Primjena NGS metoda u procjeni genomske varijabilnosti preživača | Čubrić Čurik, Vlatka | IP-2018-01-8708 | HRZZ
Napredne metode i tehnologije u znanosti o podatcima i kooperativnim sustavima-DATACROSS | Sven Lončarić; Ivan Petrović; Tomislav Šmuc; Andrej Jokić | KK.01.1.1.01.009 | EK
Depositing User: Strahil Ristov
Date Deposited: 16 Aug 2022 11:41
URI: http://fulir.irb.hr/id/eprint/7508
DOI: 10.3390/electronics11111783
