The Last aligner is reportedly many times faster than DIAMOND on small problems, and several times faster on very large datasets.
For example, when aligning 60 million DNA reads of length 101 against 110 million reference proteins, Last is just over 3 times as fast as DIAMOND (287 minutes vs 900 minutes wall clock, running on a Linux server with 32 cores and 512GB of memory).
However, there are a couple of issues that currently make Last less suitable for use together with MEGAN:
- It can’t read compressed input files, so such files have to be uncompressed before the run and then compressed again afterwards.
Given the number and size of fastq files that most projects involve, this is annoying.
- The output is not sorted by query. For efficiency, when importing a file, MEGAN streams through the file once and assumes that all matches for a given query appear consecutively. So, if you import data directly from Last, the same read may be counted multiple times. (Perhaps it is possible to change this by setting appropriate command line options?) A tool called maf-sort.sh for sorting MAF file entries exists, but it appears to sort by subject sequence, not by query sequence…
- Last does not provide a compact format for describing the details of alignments. For the above-mentioned dataset, the Last output file in MAF format is 420GB in size. In contrast, the DAA file produced by DIAMOND is only 17GB. It would take MEGAN too long to analyze the output of Last, whereas it takes MEGAN 500 minutes to perform complete taxonomic and functional analysis and indexing of all reads and alignments in the file produced by DIAMOND.
In summary, Last is faster than DIAMOND (and reportedly more sensitive), but at present does not produce output that can easily be imported into MEGAN.
MEGAN (V6.6.8+) is able to import Last’s MAF format, but with the caveat that instances of the same read occurring in different parts of the input file are treated as different reads.