Raptor
A fast and space-efficient pre-filter
Optionally preprocesses files for use with raptor layout and raptor build. Can continue where it left off after a crash or in multiple runs.
When to use:
The input file contains paths to the sequence data. Each line may contain multiple paths (separated by whitespace).
Supported file extensions (each may be followed by bz2, gz, or bgzf):
• embl
• fasta
• fa
• fna
• ffn
• faa
• frn
• fas
• fastq
• fq
• genbank
• gb
• gbk
• sam
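As an illustration of the input format described above, the following Python sketch splits an input file into lines of whitespace-separated paths and checks each path against the listed extensions, allowing one trailing bz2, gz, or bgzf suffix. It is not Raptor's parser: the function names are hypothetical, blank lines are assumed to be ignored, and extensions are assumed to be compared case-sensitively.

```python
from pathlib import Path

# Sequence file extensions listed in the documentation.
SEQUENCE_EXTENSIONS = {
    "embl", "fasta", "fa", "fna", "ffn", "faa", "frn", "fas",
    "fastq", "fq", "genbank", "gb", "gbk", "sam",
}
# Optional compression suffixes that may follow a sequence extension.
COMPRESSION_EXTENSIONS = {"bz2", "gz", "bgzf"}


def has_supported_extension(path: str) -> bool:
    """True if `path` ends in a sequence extension, optionally followed by one compression suffix."""
    suffixes = [s.lstrip(".") for s in Path(path).suffixes]
    if suffixes and suffixes[-1] in COMPRESSION_EXTENSIONS:
        suffixes = suffixes[:-1]  # drop e.g. ".gz" and look at what precedes it
    return bool(suffixes) and suffixes[-1] in SEQUENCE_EXTENSIONS


def parse_input_file(text: str) -> list[list[str]]:
    """Return one list of paths per non-blank line (hypothetical helper)."""
    lines = []
    for line in text.splitlines():
        paths = line.split()  # any run of whitespace separates paths
        if not paths:
            continue  # assumption: blank lines are ignored
        for p in paths:
            if not has_supported_extension(p):
                raise ValueError(f"unsupported file extension: {p}")
        lines.append(paths)
    return lines
```

For example, an input file containing `a.fa b.fq.gz` on one line and `c.sam` on the next yields two lines of paths, while a bare `archive.gz` is rejected because no sequence extension precedes the compression suffix.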
A path to the output directory. The directory will be created if it does not exist.
raptor prepare creates a minimiser.list file inside the output directory. This file lists the generated minimiser files in the same order as the input and can be used as input for raptor layout or raptor build.
Created output files for each input file:
• *.header: Contains the shape, window size, cutoff and minimiser count.
• *.minimiser: Contains binary minimiser values, one minimiser per line.
• *.in_progress: Temporary file to track progress. Deleted after finishing computation.
The parameters used in raptor prepare will be used in raptor layout and raptor build and cannot be overwritten there.
If raptor prepare aborts unexpectedly, you can rerun the same command. Files that have already been preprocessed will be skipped.
When deleting an .in_progress file, also delete the corresponding .header and .minimiser files.
The number of threads to use. Multiple files will be handled in parallel. While more threads speed up the preprocessing, the RAM usage also increases.
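The resume behaviour and the per-file parallelism described above can be sketched as follows. This is a hypothetical model, not Raptor's implementation: the `<name>.header` / `<name>.minimiser` / `<name>.in_progress` naming, the `compute` callback standing in for the actual minimiser computation, and the rule "a file is finished when both outputs exist and no marker is left" are all assumptions. Under this model, deleting a leftover .in_progress marker on its own would make a half-written file look finished, which illustrates why the corresponding .header and .minimiser files should be deleted with it.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# Hypothetical job: returns (header bytes, minimiser bytes) for one input.
Job = Callable[[], tuple[bytes, bytes]]


def is_done(out_dir: Path, name: str) -> bool:
    """Assumed rule: outputs exist and no .in_progress marker remains."""
    return (
        (out_dir / f"{name}.header").exists()
        and (out_dir / f"{name}.minimiser").exists()
        and not (out_dir / f"{name}.in_progress").exists()
    )


def prepare_one(out_dir: Path, name: str, compute: Job) -> bool:
    """Process one input unless it is already finished; return True if work was done."""
    if is_done(out_dir, name):
        return False  # skipped: finished in an earlier run
    marker = out_dir / f"{name}.in_progress"
    marker.touch()  # left behind if the process crashes below
    header, minimisers = compute()
    (out_dir / f"{name}.header").write_bytes(header)
    (out_dir / f"{name}.minimiser").write_bytes(minimisers)
    marker.unlink()  # removed only after both outputs are written
    return True


def prepare_all(out_dir: Path, jobs: dict[str, Job], threads: int) -> int:
    """Run all jobs with `threads` workers, one whole file per task; return how many were processed."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output directory if missing
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda item: prepare_one(out_dir, item[0], item[1]), jobs.items())
        return sum(results)
```

Rerunning `prepare_all` with the same jobs processes only inputs that were never started or still carry an .in_progress marker. Because each task holds one file's results in memory, memory use in this sketch grows with the number of worker threads.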
Reduce the number of threads if raptor prepare fails due to RAM restrictions.
By default, runtime and memory statistics are printed to stderr at the end. This flag disables this behaviour.
See Choosing window and k-mer size.
The k-mer size is also used by raptor build and hence should be chosen carefully. The k-mer size cannot be changed afterwards.
See Choosing window and k-mer size.
The window size is also used by raptor build and hence should be chosen carefully. The window size cannot be changed afterwards.
Only store k-mers with at least (>=) x occurrences.
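To show what the k-mer size, window size and occurrence cutoff control, here is a simplified (w, k)-minimiser sketch: every k-mer is 2-bit encoded, each window of w bases contributes its smallest k-mer value, and values seen fewer than `cutoff` times across all sequences are dropped. Raptor's actual minimiser definition may differ (for example in the hash function or the handling of reverse complements), counting one occurrence per run of consecutive windows is an assumption, and only the bases A, C, G and T are accepted.

```python
from collections import Counter

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_values(seq: str, k: int) -> list[int]:
    """2-bit encode every k-mer of `seq` (ACGT only; other characters raise KeyError)."""
    mask = (1 << (2 * k)) - 1  # keep only the last k bases
    values, current = [], 0
    for i, base in enumerate(seq):
        current = ((current << 2) | ENCODE[base]) & mask
        if i >= k - 1:
            values.append(current)
    return values


def minimisers(seq: str, k: int, w: int) -> list[int]:
    """Smallest k-mer value in each window of w bases; consecutive repeats are reported once."""
    if w < k:
        raise ValueError("window size must be >= k-mer size")
    kmers = kmer_values(seq, k)
    per_window = w - k + 1  # number of k-mers inside one window
    result = []
    for start in range(len(kmers) - per_window + 1):
        m = min(kmers[start:start + per_window])
        if not result or result[-1] != m:
            result.append(m)
    return result


def minimisers_with_cutoff(seqs: list[str], k: int, w: int, cutoff: int) -> set[int]:
    """Keep minimiser values occurring at least `cutoff` times (>=) across all sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(minimisers(seq, k, w))
    return {value for value, n in counts.items() if n >= cutoff}
```

With w equal to k every k-mer is its own minimiser; larger windows select fewer values, which is why the window size trades index size against sensitivity. The per-window `min` here costs O(w); a monotone deque reduces this to amortised O(1) per window.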
Apply cutoffs from Mantis (Pandey et al., 2018).
File size | Cutoff |
---|---|
≤ 300 MiB | 1 |
≤ 500 MiB | 3 |
≤ 1 GiB | 10 |
≤ 3 GiB | 20 |
> 3 GiB | 50 |
File sizes are based on gzipped FASTQ files. Compression reduces the file size by a factor of around 3. FASTA files are approximately half the size of FASTQ files.
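The cutoff table and the size adjustments above can be combined into a small lookup, sketched below in Python. One reading of the two conversion factors is assumed: an uncompressed file's size is divided by 3 to estimate its gzipped size, and a FASTA file's size is doubled to estimate the equivalent FASTQ size. The function names and the `compressed`/`fasta` flags are hypothetical, and Raptor's own computation may differ in detail.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

# (inclusive upper bound, cutoff) pairs from the table; sizes refer to gzipped FASTQ.
CUTOFF_TABLE = [(300 * MiB, 1), (500 * MiB, 3), (1 * GiB, 10), (3 * GiB, 20)]


def normalised_size(file_size: int, compressed: bool, fasta: bool) -> float:
    """Estimate the gzipped-FASTQ-equivalent size in bytes (assumed factors 3 and 2)."""
    size = float(file_size)
    if not compressed:
        size /= 3  # uncompressed files are ~3 times larger than gzipped ones
    if fasta:
        size *= 2  # FASTA files are ~half the size of FASTQ files
    return size


def mantis_cutoff(file_size: int, compressed: bool = True, fasta: bool = False) -> int:
    """Look up the k-mer count cutoff for a file of `file_size` bytes."""
    size = normalised_size(file_size, compressed, fasta)
    for upper_bound, cutoff in CUTOFF_TABLE:
        if size <= upper_bound:
            return cutoff
    return 50  # larger than 3 GiB
```

For example, under these assumptions an uncompressed 900 MiB FASTQ file counts as about 300 MiB and gets a cutoff of 1, while a gzipped 400 MiB FASTA file counts as about 800 MiB and gets a cutoff of 10.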