readName_to_taxonRank


#1

Hi,

Could you maybe include this format in the export option in another update?

Right now there are readName to taxId, taxonName, and taxonPath exports, which give similar output. I want to compare the MEGAN output with the results of other pipelines, which have the following format: taxonRank, taxId, count.
I can do the direct comparison with just the taxId, but I am missing the link from taxId to taxonRank that would let me format it into something more human-readable.

Or can you point me to the database (file) which contains taxonRank and taxId relationships, then I could just parse it out myself?

Unfortunately, readName_to_taxonName is not specific enough, because taxa at several ranks can have the same name and they all get collapsed into a single entry.

Thanks!


#2

Did you look at taxonPathPercent?
This reports the taxon path and each taxon is prefixed with e.g. c__ for class, g__ for genus etc.
You could remove all but the last taxon, strip the percentages, and then replace g__ with Genus, etc., to get the desired output. If that doesn’t work for you, I will look into adding your desired format.
D


#3

Wait, I only had readName-taxonPath, let me check taxonPathPercent…
Ok, taxonPath still has the same format, without any prefixes.

Yup, I checked it out, but I didn’t see any prefix for the rank.

e.g.
“root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Elizabethkingia;Elizabethkingia anophelis;”
“root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Cupriavidus;Cupriavidus taiwanensis;”

I also thought about trying to extract the information from the path, but not all paths contain the same number of taxonomic levels, because some group names simply have no rank, as in this example. I could define e.g. the 5th name in the path (counting “root”) as the class, which works on the second line but doesn’t apply to the first.
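
To make the problem concrete, here is a minimal sketch that takes the fifth field of each of the two example paths (counting root as the first). The helper names split_path and name_at are made up for this illustration and are not part of MEGAN.

```python
# The two example paths quoted above (typographic quotes removed).
PATHS = [
    "root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;"
    "Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;"
    "Elizabethkingia;Elizabethkingia anophelis;",
    "root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;"
    "Burkholderiales;Burkholderiaceae;Cupriavidus;Cupriavidus taiwanensis;",
]


def split_path(path):
    """Split a semicolon-delimited taxon path, dropping the empty trailing field."""
    return [name for name in path.split(";") if name]


def name_at(path, position):
    """Return the name at a 1-based position in the path (root is position 1)."""
    return split_path(path)[position - 1]


if __name__ == "__main__":
    for path in PATHS:
        # Print the path depth and whatever sits at position 5.
        print(len(split_path(path)), name_at(path, 5))
```

Position 5 is the class (Betaproteobacteria) in the second path, but an unranked group (Bacteroidetes/Chlorobi group) in the first, so a fixed position cannot be used to infer the rank.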


#4

Dino, using

Choose data to export: readName_to_taxonPathPercent

the output should look like this:

100;g__Nitrosomonas; 100;s__Nitrosomonas eutropha; 100;
2946744.1 d__Bacteria; 100;p__Proteobacteria; 100;c__Betaproteobacteria; 100;o__Nitrosomonadales; 100;f__Nitrosomonadaceae; 100;g__Nitrosomonas; 100;s__Nitrosomonas eutropha; 100;
2961762.1 d__Bacteria; 100;p__Proteobacteria; 100;c__Betaproteobacteria; 100;o__Nitrosomonadales; 100;f__Nitrosomonadaceae; 100;g__Nitrosomonas; 100;s__Nitrosomonas eutropha; 100;
2961895.1 d__Bacteria; 100;p__Proteobacteria; 100;c__Betaproteobacteria; 100;o__Nitrosomonadales; 100;f__Nitrosomonadaceae; 100;g__Nitrosomonas; 100;s__Nitrosomonas eutropha; 100;

Perhaps you are using an old version of MEGAN? This update to the output format happened earlier this year.
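
For what it's worth, the post-processing suggested earlier (keep only the last taxon, drop the percentages, expand the prefix) could be sketched roughly like this. Some assumptions: the read name is separated from the path by whitespace (most likely a tab in the real file), and the prefix letters follow the usual d/k/p/c/o/f/g/s convention. Only c__ and g__ are confirmed in this thread, and the function names are made up for the example.

```python
import re
from collections import Counter

# Assumed prefix-to-rank mapping; only c__ (class) and g__ (genus) are
# confirmed in this thread, the rest follow the common convention.
RANK_BY_PREFIX = {
    "d": "Domain", "k": "Kingdom", "p": "Phylum", "c": "Class",
    "o": "Order", "f": "Family", "g": "Genus", "s": "Species",
}

# A ranked taxon field looks like "g__Nitrosomonas".
TAXON_FIELD = re.compile(r"^([a-z])__(.+)$")


def last_ranked_taxon(line):
    """Return (read_name, rank, taxon_name) for one export line, or None.

    Splits only on the first run of whitespace, because taxon names such as
    "Nitrosomonas eutropha" contain spaces themselves.
    """
    parts = line.strip().split(None, 1)
    if len(parts) < 2:
        return None  # read name only, no taxon path
    read_name, path = parts
    last = None
    for field in path.split(";"):
        match = TAXON_FIELD.match(field.strip())
        if match:  # percentage fields such as " 100" do not match
            last = match
    if last is None:
        return None
    prefix, name = last.groups()
    return read_name, RANK_BY_PREFIX.get(prefix, prefix), name


def count_by_rank_and_name(lines):
    """Count reads per (rank, taxon_name) over all parsable lines."""
    counts = Counter()
    for line in lines:
        parsed = last_ranked_taxon(line)
        if parsed is not None:
            counts[parsed[1:]] += 1
    return counts
```

Note that this counts by rank and name rather than by taxId, since the path export does not contain ids.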


#5

Also, consider using taxon ids, as they are unique.
You can look up the name and rank of a taxon id in a file contained in data.jar: resources/files/ncbi.map
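
Since a .jar file is an ordinary zip archive, the map can be read without unpacking it first. The following is only a sketch: it assumes the file is tab-separated with the taxon id in the first column, the name in the second, and the rank in the last column. That column layout is an assumption and should be checked against the actual file (the rank may be stored as a numeric code rather than as a word). The function name load_taxon_table is made up for the example.

```python
import io
import zipfile

# Location of the map inside data.jar, as given in the post above.
MAP_ENTRY = "resources/files/ncbi.map"


def load_taxon_table(jar_path, id_col=0, name_col=1, rank_col=-1):
    """Read ncbi.map from data.jar into {tax_id: (name, rank)}.

    The column positions are ASSUMPTIONS about the file layout; adjust
    id_col, name_col and rank_col after inspecting the real file.
    """
    table = {}
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(MAP_ENTRY) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                fields = line.rstrip("\r\n").split("\t")
                tax_id = fields[id_col].strip()
                # Skip blank, comment, or otherwise non-numeric lines.
                if len(fields) < 2 or not tax_id.lstrip("-").isdigit():
                    continue
                table[int(tax_id)] = (fields[name_col], fields[rank_col])
    return table
```

With the table loaded, something like `name, rank = table[tax_id]` would give the missing connection from a taxId to its rank.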


#6

I am using version 6.6.0.
Ah, alright, I looked at taxonPath_to_percent instead of readName_to_taxonPathPercent. Yes, the latter has the prefixes!

Thanks for the ncbi.map location; this should make it so much easier.