Reads with excellent hits against a single taxon are being placed in the "Not assigned" node

I have another question. Given the following result:

imagen

Shouldn’t MEGAN classify it as Eukaryota according to LCA (I’ve set it at top percent 10)? Because it’s assigning it to Vigna unguiculata (Viridiplantae).

I realised what’s going on: I checked this read with Inspector and, for some reason, MEGAN considers it has only 1 match (instead of 5)… this is why it’s being assigned to Vigna unguiculata.
The same thing is happening to more than 130.000 reads that were assigned by MEGAN to this same taxonomic group (Viridiplantae), because MEGAN considers they have only 1 hit when they actually have 5.
Nevertheless, MEGAN is not doing this with other reads from the same sample… Very strange…
Do you know what the problem might be and how I can solve it?

I would need more info to answer this. It would be easiest if you could give me a small example file and I will take a look at what is going on.

I’m attaching small test files for 15 of those reads:
fasta,
blastn result,
and the rma6 file (I processed them with megan to see if the same thing happened with a small file, and it does: megan only recognises 1 of the 5 hits for each of these reads).
test_15vigna.fasta (9.4 KB)
test_15vigna_blastn.txt (219.3 KB)
test_15vigna_blastn_Oct2019db.rma6 (11.9 KB)

That you very much for sharing the example. This has uncovered a bug in the processing of blast alignments that occurs when the same query has multiple alignments to the same subject in short read mode. The parser is supposed to skip the other alignments to the same reference, instead, it incorrectly skips the all other alignments for the given query, including to other references.
I will fix this and post a new version later this week.

Now assignment works as intended:

Fantastic that you were able to identify the bug! Thank you for letting me know.

Please update to version V6_19_5, which contains the fix

Will do so! Can´t wait! Thank you for advising me. Best!

I installed v6_19_5 and processed the whole sample with it. For some reason the number of reads in the root is much smaller (33961 when it should be 285185). I’ve tried it both in Windows and Linux and the same thing happens.

This is a screenshot of the result with v6_19_5:

This is a screenshot of the result with v6_18_5:

Another thing that is happening with v6_19_5 is that in some cases “extra” hits appear. The blastn search for this sample was set to retrieve a maximum of 5 hits but, as you can see below as an example, HJK3X6U01A26AN appears to have 10 hits appear (of which only the first 5 are the real ones):

In the image you sent previously I also noticed this, but I thought that maybe you’d blasted those 15 sequences again and set other parameters, and so I didn’t mention anything.

Sorry Chris, thank you for pointing out that this completely broken. I have removed 6.19.5 from the download website. I have fixed the bug and am currently uploading a new release 6.19.6.

Great! Thank you! I’ll try it and let you know.

I tried it and now the number of reads in the root is correct and the number of hits assigned to the reads is also correct :).

But I found a couple of other issues:

  1. There are still quite a few miss-assignments to Vigna unguiculata because even though now the number of matches for each read is correct, some appear with a question mark “?” (as far as I could see, those that correspond to Lepidoptera) and for this reason are not taken into account.
    Here is a partial image; all the “?” I checked mostly corresponded to Spodoptera frugiperda, but also Mamestra configurata, Bombyx mori and Trichoplusia ni:

Again, this only happens when Vigna unguiculata is involved, the rest of the species I checked were correctly assigned (no question marks in the hits).

  1. Many reads in which all the hits correspond to Lepidoptera are now “not assigned” (again, all the “?” I checked mostly corresponded to Spodoptera frugiperda):

I’ve tried with nucl_acc2tax-Jul2019.abin, megan-nucl-Oct2019.db and megan-nucl-map-Jul2020.db, and the same thing happens, even though the number of reads assigned to “viridiplantae” or “not assigned” varies slightly.

Unfortunately, in your images I could only see one actual accession, NJHR01000047. Looking this one up on the NCBI website (https://www.ncbi.nlm.nih.gov/nuccore/NJHR01000047) establishes that it is a scaffold accession. MEGAN’s current DNA mapping files contain genome level sequences only, not necessary all scaffolds.

I will look into providing a more complete mapping database.

BTW, you can easily query the mapping db files using sqlite3, for example, this is how I established that accession A00001 is mapped to a taxon (namely 10641), whereas NJHR01000047 does not have a mapping (which is indicated by a ? in MEGAN):

sqlite3 megan-nucl-Oct2019.db
SQLite version 3.26.0 2018-12-01 12:34:55
Enter “.help” for usage hints.
sqlite> select Taxonomy from mappings where Accession=‘A00001’;
10641
sqlite> select Taxonomy from mappings where Accession=‘NJHR01000047’;
sqlite>

You’re absolutely right, I checked various other accessions and found they’re not in the mapping db, which is why Megan can’t classify them, as you said.
The great thing is Megan is working perfectly! Thank you so much for all your efforts and help!

Datenschutzerklärung