Taxonomy export produces duplicates


I’m not sure if this is a bug or the expected behaviour.

When I export a taxonomy from an rma6 file to text (readnames to taxonid) I get duplicates. I’m guessing it exports each read at each node it belongs to in the tree instead of only the most specific node. I have tested the command line export tool rma2info (v. 6.9.2; v. 6.10.5 does not export anything, see another bug report of mine) and the GUI (v. 6.10.5). In the GUI case I had the export dialog set to “assigned”.

I’m attaching an excerpt which is the result of grep -C 3 seq00000001.

seq00000001.duplicates.tsv (875 Bytes)


I tried reproducing this, but my read names all come out the same.
Can you check whether the name seq00000001 appears more than once in the MEGAN analysis?
That is, look at the corresponding nodes using the inspector viewer and search for the name. My guess is that it appears more than once.
I also ran rma2info and couldn’t replicate the problem…
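
For anyone who wants to run the same check on the exported text file rather than in the inspector, here is a minimal sketch. It assumes the export is a two-column, tab-separated file (read name, then taxon ID); the function name find_multi_assigned_reads is made up for illustration and is not part of MEGAN.

```python
import csv
import sys
from collections import defaultdict


def find_multi_assigned_reads(lines):
    """Return {read_name: sorted taxon IDs} for reads listed with more than one taxon ID.

    Assumption: each line is 'read_name<TAB>taxon_id'. Blank or short lines are skipped.
    """
    taxa_by_read = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue
        taxa_by_read[row[0]].add(row[1])
    # Keep only reads that occur with at least two distinct taxon IDs.
    return {read: sorted(taxa) for read, taxa in taxa_by_read.items() if len(taxa) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_export.py exported_reads.tsv
    with open(sys.argv[1], newline="") as handle:
        for read, taxa in find_multi_assigned_reads(handle).items():
            print(read, ",".join(taxa), sep="\t")
```

A read that is printed here was exported with several different taxon IDs, which would match a read name that appears at more than one node. Identical repeated lines are not reported, since they collapse to a single taxon ID.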

You’re right, and I should have seen it: the exported reads are not assigned to several levels of the same lineage, but to nodes on different branches of the taxonomy. seq00000001 was indeed assigned to multiple nodes.

The rma file was created with blast2rma from a sorted LAST maf file. After creating a new rma6 file from the GUI using the same maf file, the export works fine. Moreover, when I try to replicate this using blast2rma from 6.10.5 (I used 6.9.x earlier), the duplicates are gone.

Sorry for this.