Latest Megan version blastx XML import failed (LR mode)

Hi Daniel,
I suspect there are some bugs in the BLAST import module in SR and LR mode that result in zero output in the tree. The blastx XML input should be fine: it has 1,808 alignments and contains all the standard information. I’ve also checked the older version 6.17.0 (built 7 Aug 2019) with the separate taxonomy/functional DBs, and it works without any problems!

Here’s the log:

Executing: show window=ImportBlast;
Executing: import blastFile=‘E:\4201.xml’ fastaFile=‘E:\4201.fa’ meganFile=‘E:\4201-1.rma6’ useCompression=false format=BlastXML mode=BlastX minScore=50.0 maxExpected=0.01 minPercentIdentity=0.0 topPercent=10.0 minSupportPercent=0.0 minSupport=1 lcaAlgorithm=longReads lcaCoveragePercent=80.0 minPercentReadToCover=0.0 minPercentReferenceToCover=0.0 minComplexity=0.0 useIdentityFilter=false readAssignmentMode=readCount fNames= longReads=true;
Executing: ‘import’‘blastFile’’=’‘E:\4201.xml’‘fastaFile’’=’‘E:\4201.fa’‘meganFile’’=’‘E:\4201-1.rma6’‘useCompression’’=’‘false’‘format’’=’‘BlastXML’‘mode’’=’‘BlastX’‘minScore’’=’‘50.0’‘maxExpected’’=’‘0.01’‘minPercentIdentity’’=’‘0.0’‘topPercent’’=’‘10.0’‘minSupportPercent’’=’‘0.0’‘minSupport’’=’‘1’‘lcaAlgorithm’’=’‘longReads’‘lcaCoveragePercent’’=’‘80.0’‘minPercentReadToCover’’=’‘0.0’‘minPercentReferenceToCover’’=’‘0.0’‘minComplexity’’=’‘0.0’‘useIdentityFilter’’=’‘false’‘readAssignmentMode’’=’‘readCount’‘fNames’’=’‘longReads’’=’‘true’;
Classifications: Taxonomy
Annotating RMA6 file using FAST mode (accession database and first accession per line)
Parsing file: E:\4201.xml
Total reads: 2,844
Alignments: 1,808
Initializing binning…
Using ‘Interval-Union-LCA’ algorithm (80.0 %) for binning: Taxonomy
Binning reads…
Total reads: 2,844
With hits: 480
Alignments: 1,808
Assig. Taxonomy: 0
Min-supp. changes: 0
Numb. Tax. classes: 2
Class. Taxonomy: 2
Info: Command completed (15s): ‘import’‘blastFile’’=’‘E:\4201.xml’‘fastaFile’’=’‘E:\4201.fa’'m…
Induced tree has 3 of 2,175,510 nodes
Induced tree has 3 of 2,175,510 nodes

Thank you: Balázs

Dear Balázs,

Could you please send me a small example file? I will look into this.
Daniel

Dear Daniel,

I’ve sent that file via e-mail!
Thank you & kind regards!

Balázs

I can’t find it in my inbox. Did you use daniel.huson@uni-tuebingen.de?

Dear Daniel,

Sorry, I sent it to the general megan@… box, so I’ve now resent it directly to you. Please check again.

Thank you:

Balázs

Sorry for the delay, I have taken a look at your file.

This is what an alignment looks like:

Is this the result of alignment against the NCBI-nr database? If it is not (and it doesn’t look like it is), then that would explain why MEGAN can’t map the reference sequences to taxa or functions.

Please consider using alignment against NCBI-nr (or a subset of that database). Perhaps use DIAMOND and output format 100.
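
A quick way to check what the reference headers in a BLAST XML file look like is to list the `Hit_id`, `Hit_accession` and `Hit_def` fields for each query. The following is only a minimal sketch using the Python standard library; the sample XML, the accession and the function name `list_hits` are made up for illustration and are not taken from the file discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed example in the layout of NCBI BLAST XML output.
# The accession and description are illustrative, not real data.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>contig_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>ref|WP_000000001.1|</Hit_id>
          <Hit_def>hypothetical protein [Example organism]</Hit_def>
          <Hit_accession>WP_000000001</Hit_accession>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def list_hits(xml_text):
    """Return (query, hit_id, hit_accession, hit_def) for every hit."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            rows.append((
                query,
                hit.findtext("Hit_id"),
                hit.findtext("Hit_accession"),
                hit.findtext("Hit_def"),
            ))
    return rows


for row in list_hits(SAMPLE_XML):
    print(row)
```

If the accession field is empty or does not look like an NCBI accession, that would be consistent with the references not coming from NCBI-nr.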

Hi Daniel,

I’m using the official NCBI blastx with a subset of the NR database to produce the standard BLAST XML output format (as you can see in the attachment). When I import that XML (and the de novo query contigs as “reads”) into the latest Megan versions with the unified “megan-map-Oct2019.db” mapping file, the output is empty. All of the contigs go to the no hits/not assigned bins.
We need to use blastx (and other remote homology search tools with BLAST-compatible XML output) because we’re digging for dark matter, and the DIAMOND aligner is not designed for this.
As I wrote, earlier versions of Megan (e.g. 6.17.0, built 7 Aug 2019, with the separate taxonomy/functional DBs) perform perfectly without any problems. With that older version I can import the XML/FASTA contigs smoothly and the taxonomic binning is pretty nice (as you can see in the figure below):

Thanks for any idea & help!

Bests: Balázs

Thank you for being so persistent… I have finally identified the bug and will upload a new release later today in which parsing of XML files and the use of a mapping DB file should play nicely together.
(The problem was that my mapping-db based parser only looks at the first word in a header line and I wasn’t putting the HitDef record at the beginning of the header line.)
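
The effect described above can be illustrated with a small sketch. This is not MEGAN’s actual code: the function names, the header strings, the accession and the taxon ID are all hypothetical, and the only assumption taken from the explanation is that the lookup uses just the first word of the header line.

```python
# Hypothetical mapping from accession to taxon ID (illustrative values only).
ACCESSION_TO_TAXON = {"WP_000000001": 562}


def first_word(header):
    """Return the first whitespace-separated word of a header line."""
    parts = header.split()
    return parts[0] if parts else ""


def lookup_taxon(header, mapping):
    """Look up a taxon using only the first word of the header."""
    return mapping.get(first_word(header))


# Identifier not at the start of the line: the lookup finds nothing.
misplaced = "hypothetical protein [Example organism] WP_000000001"
# Identifier at the start of the line: the lookup succeeds.
leading = "WP_000000001 hypothetical protein [Example organism]"

print(lookup_taxon(misplaced, ACCESSION_TO_TAXON))
print(lookup_taxon(leading, ACCESSION_TO_TAXON))
```

With a first-word lookup like this, every reference whose identifier is not at the start of the header stays unmapped, which matches the symptom in the log (480 reads with hits, but 0 assigned to Taxonomy).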

Thank you very much! It’ll be an enormous help for us in keeping up to date and efficient!

Best regards: B

Let me know whether the new version 6_18_6 does indeed fix the problem.

Problem solved! The latest version works pretty nicely! Thanks!
