Taxonomy Mapping Problems


#1

I use DIAMOND to BLAST NGS reads against a database of virus RefSeq sequences. I then process the output file to add taxIDs to the BLAST hits (I wrote something to do that) and import it into MEGAN, which on the whole works great.

However, there are quite a few reads in the Not Assigned bin, even though the hits are in the taxonomy.

My main problem, though, is that I have two reads that hit Bat hepevirus (taxon ID 1216472).

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1216472

Bat hepevirus is in the MEGAN taxonomy; I checked when MEGAN loads up initially.

But these two seqs get assigned to the Viruses node, not the Bat hepevirus node.

The BLAST hits look like this:

M01569:148:000000000-AH5LR:1:2118:7100:18591/1 gi|400354813|ref|YP_006576507.1|tax|1216472| 78.9 57 12 0 1 171 1390 1446 3.4e-24 109.4 1216472
M01569:148:000000000-AH5LR:1:2118:7100:18591/2 gi|400354813|ref|YP_006576507.1|tax|1216472| 79.7 59 12 0 178 2 1388 1446 2.9e-26 116.3 1216472

Using the Inspector on Viruses to view the seqs gives:
Bat hepevirus; score=116.0

gi|400354813|ref|YP_006576507.1|tax|1216472|
Score = 116, Expect = 3e-26
M01569:148:000000000-AH5LR:1:2118:7100:18591/2 gi|400354813|ref|YP_006576507.1|tax|1216472| 79.7 59 12 0 178 2 1388 1446 2.9e-26 116.3 1216472

Bat hepevirus; score=109.0

gi|400354813|ref|YP_006576507.1|tax|1216472|
Score = 109, Expect = 3e-24
M01569:148:000000000-AH5LR:1:2118:7100:18591/1 gi|400354813|ref|YP_006576507.1|tax|1216472| 78.9 57 12 0 1 171 1390 1446 3.4e-24 109.4 1216472

Any ideas what's wrong? Is the taxon missing from the MEGAN taxonomy?

There are relatively few seqs assigned to the Viruses node (3 in total, 2 of which are these), but I also get a load in Not Assigned which I don't think should be there.


#2

The taxon is present in the taxonomy. What are minSupport and minSupportPercent set to? Try setting minSupport=1 and minSupportPercent=0.


#3

Many thanks, that solved it. minSupport was already set to 1, but minSupportPercent was at the default of 0.01. I have only just started using MEGAN and it looks great.
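
For anyone else hitting this, here is a rough sketch of why a 0.01% setting can push two reads up to the Viruses node. It assumes that minSupportPercent is a percentage of all assigned reads, that the larger of the two settings is the effective threshold, and that reads on a taxon below the threshold move up to an ancestor. The function names and the total of 50,000 assigned reads are made up for illustration; this is not MEGAN's actual code.

```python
def effective_min_support(total_assigned, min_support=1, min_support_percent=0.01):
    """Read count a taxon needs to keep its own reads (assumed rule, not MEGAN's code)."""
    # Convert the percentage into a number of reads.
    from_percent = total_assigned * min_support_percent / 100.0
    # Assumption: whichever setting demands more reads wins.
    return max(min_support, from_percent)


def keeps_own_node(reads_on_taxon, total_assigned, min_support=1, min_support_percent=0.01):
    """True if the taxon has enough reads to stay on its own node."""
    threshold = effective_min_support(total_assigned, min_support, min_support_percent)
    return reads_on_taxon >= threshold


total_assigned = 50_000  # hypothetical sample size, not taken from my data

# 0.01% of 50,000 is 5 reads, so 2 reads are not enough.
print(keeps_own_node(2, total_assigned, min_support=1, min_support_percent=0.01))  # False

# With the percentage switched off, minSupport=1 is all that is required.
print(keeps_own_node(2, total_assigned, min_support=1, min_support_percent=0))  # True
```

Under these assumptions, any sample with more than 20,000 assigned reads would need at least 3 reads on a taxon at 0.01%, which matches what I saw with my two Bat hepevirus reads.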


#4

I noticed that you added the taxon id as an additional column, as well…
MEGAN 6 doesn’t support that feature and that is why MEGAN doesn’t correctly set the format to BlastTab in the Import Blast dialog. I will update the manual accordingly.


#5

I need to remove that. I put the taxon ID into the seq header as well, with the |tax| keyword, but forgot to remove it from the end of the line. MEGAN seems to work fine and ignore it, but I shall remove it to be sure.
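
A minimal sketch of that cleanup, assuming tab-separated BLAST tabular output with the 12 standard columns and the taxon ID appended as a 13th. The function name `normalize_hit` is hypothetical and this is not the script I actually used. It drops the extra column and, if the subject ID has no |tax| keyword yet, moves the taxon ID into it.

```python
STANDARD_COLS = 12  # qseqid, sseqid, pident, ..., evalue, bitscore


def normalize_hit(line):
    """Return a 12-column hit line with the taxon ID carried in the subject ID."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < STANDARD_COLS:
        # Not a complete tabular hit; leave it alone.
        return line.rstrip("\n")
    extra = fields[STANDARD_COLS:]    # anything after bitscore, e.g. the taxon ID
    fields = fields[:STANDARD_COLS]   # keep only the standard columns
    subject = fields[1]
    if "|tax|" not in subject and extra:
        # Assumption: the first extra column holds the taxon ID.
        if not subject.endswith("|"):
            subject += "|"
        fields[1] = f"{subject}tax|{extra[0]}|"
    return "\t".join(fields)


hit = "\t".join([
    "M01569:148:000000000-AH5LR:1:2118:7100:18591/1",
    "gi|400354813|ref|YP_006576507.1|tax|1216472|",
    "78.9", "57", "12", "0", "1", "171", "1390", "1446", "3.4e-24", "109.4",
    "1216472",  # the trailing taxon ID column to be dropped
])
print(normalize_hit(hit))
```

Running it over a whole file is just a loop over the lines, writing each result to a new file before importing that into MEGAN.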


#6

Hi @rjorton, I'm having a similar problem to yours, also with blasting NGS reads vs. RefSeq viral proteins (trying to replicate the Norman et al., 2015 analyses).

Could you potentially share the script you used to add the taxIDs to the diamond blast hits? Thank you!