I recently tried a newer version of MEGAN6 Community Edition (MEGAN 6.7.6) to parse the BlastTAB output from Diamond's BlastX mode, run against NCBI nr from February 2017.
For one of the data sets, the taxonomy algorithm malfunctioned without reporting an error. The reported number of reads corresponded to the number of unique reads in the BlastTAB output (= reads with hits), but the sum of all assigned reads and the unassigned reads was not even close to that number. The problem must have occurred during the Naive LCA algorithm, because the "Total reads" and "Alignments" reported after the parsing phase drastically exceeded the "Total reads" and "Alignments" reported after the Naive LCA phase, as stated in the Messages box.
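For anyone who wants to reproduce the comparison, here is a minimal sketch of how the number of unique reads and alignments in a BlastTAB file can be counted independently of MEGAN. It assumes the read ID is the first tab-separated column, as in the default 12-column tabular layout; the function name is only illustrative.

```python
from collections import Counter


def count_reads_and_alignments(lines):
    """Return (unique reads, total alignments) for BlastTAB lines.

    Assumption: the query (read) ID is the first tab-separated column.
    """
    per_read = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        read_id = line.split("\t", 1)[0]
        per_read[read_id] += 1
    return len(per_read), sum(per_read.values())


if __name__ == "__main__":
    import sys

    # Usage: python count_reads.py <file.tabular>
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            reads, alignments = count_reads_and_alignments(handle)
        print(f"reads with hits: {reads}, alignments: {alignments}")
```

These two numbers can then be compared with the "Total reads" and "Alignments" values shown in the Messages box after each phase.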
After a specific point, reads in the file got dropped and were not represented anywhere in the taxonomy tree. The specific point is reproducible and is included within the attached segment of the BlastX/Diamond output. The last properly parsed read is 63H9Q:00896:01124. The same file could be parsed properly with MEGAN 6.5.8. Additional details about the parameters and conditions of parsing are included below. Do you have any ideas about the problem?
MEGAN_taxonomyproblem_40000lines.tabular (2.8 MB)
The parameters were:
- MinScore 100,
- MaxExpected 0.01,
- Min Percent Identity 0,
- Top Percent 10,
- Min Support Percent 0 (off),
- Min Support 313,
- Naive LCA algorithm
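To make the filter settings above concrete, here is a rough sketch of how I understand the per-alignment thresholds to act on a BlastTAB file. This is not MEGAN's code. It assumes the default 12-column layout (percent identity in column 3, e-value in column 11, bit score in column 12) and assumes that Top Percent keeps alignments whose bit score lies within that percentage of the best score for the same read. Min Support and the LCA step itself are not covered, and the function name is made up for illustration.

```python
from collections import defaultdict


def filter_hits(lines, min_score=100.0, max_expected=0.01,
                min_percent_identity=0.0, top_percent=10.0):
    """Return {read_id: [(subject_id, bitscore), ...]} after filtering.

    Assumed column layout: 0 = read ID, 1 = subject ID,
    2 = percent identity, 10 = e-value, 11 = bit score.
    """
    hits = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        # Per-alignment thresholds (MinScore, MaxExpected, Min Percent Identity)
        if (bitscore >= min_score and evalue <= max_expected
                and identity >= min_percent_identity):
            hits[fields[0]].append((fields[1], bitscore))

    kept = {}
    for read_id, read_hits in hits.items():
        # Assumed Top Percent rule: keep hits within top_percent of the best score
        best = max(score for _, score in read_hits)
        cutoff = best * (1.0 - top_percent / 100.0)
        kept[read_id] = [hit for hit in read_hits if hit[1] >= cutoff]
    return kept
```

Counting the reads that survive such a filter gives a rough expectation for how many reads should end up assigned or unassigned after the LCA step.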
The database used for Diamond was newer than Nov2016, but I used the prot_acc2tax-Nov2016 mapping file (this did not bother MEGAN 6.5.8).