Meganizer DAA file ArrayIndexOutOfBoundsException Error

Hi.

I have been following the protocol from Arumugam et al. 2019 for frameshift correction of reads and am having trouble meganizing my DAA files. I am receiving an out-of-bounds error message and no taxonomy mapping. I’m just wondering whether this is a bug or an error on my part.

Thanks for any help you can give on this!

Diamond was run as follows, with a version of the nr database downloaded this week:

diamond blastx --range-culling --top 10 -F 15 --outfmt 100 -c1 -b12 -t /dev/shm -p 44 --query inputfile -d nr --out output_file

I have tried to meganize the resulting DAA file using both the command-line and GUI versions of the meganizer, with megan-map-Oct2019.db (unzipped) as the mapping database.
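
For reference, my command-line attempt looked roughly like the following (the file names are placeholders and the exact flags may have differed slightly between runs):

~/megan/tools/daa-meganizer -i output_file.daa --mapDB megan-map-Oct2019.db --longReads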

I am hitting an "ArrayIndexOutOfBoundsException" error with both the extended and fast annotation modes.

Logs from the GUI are as follows:

Fast mode:

Meganizing file: input.daa
Annotating DAA file using FAST mode (accession database and first accession per line)
Initializing binning…
Using ‘Interval-Union-LCA’ algorithm (51.0 %) for binning: Taxonomy
Using Multi-Gene Best-Hit algorithm for binning: SEED
Using Multi-Gene Best-Hit algorithm for binning: EGGNOG
Using Multi-Gene Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
ArrayIndexOutOfBoundsException: Index 27 out of bounds for length 27
Total reads: 0
With hits: 0
Alignments: 0
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 0
Assig. INTERPRO2GO: 0
Class. Taxonomy: 0
Class. SEED: 0
Class. EGGNOG: 0
Class. INTERPRO2GO: 0
Loading MEGAN File: input.daa

Extended mode:

Meganizing file: input.daa
Annotating DAA file using EXTENDED mode
Error: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 14
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 14

Are you sure the DAA file is complete? A good way to check: does DIAMOND complain when you attempt to use the view command to extract the alignments in a different format?
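
For example, something along these lines (file names are placeholders):

diamond view -a sample.daa -o sample.view --outfmt 6

If DIAMOND errors out or the output is truncated here, the DAA file was most likely not written completely.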

I am running into a similar error using the same approach (DIAMOND with long reads).

Binning reads…
Binning reads Analyzing alignments
Caught:
java.lang.ArrayIndexOutOfBoundsException: Index 27 out of bounds for length 27
at megan/megan.daa.io.DAAMatchRecord.parseTranscript(DAAMatchRecord.java:131)
at megan/megan.daa.io.DAAMatchRecord.parseBuffer(DAAMatchRecord.java:93)
at megan/megan.daa.io.DAAParser.readQueryAndMatches(DAAParser.java:294)
at megan/megan.daa.connector.ReadBlockGetterDAA.getReadBlock(ReadBlockGetterDAA.java:125)
at megan/megan.data.AllReadsIterator.next(AllReadsIterator.java:75)
at megan/megan.data.AllReadsIterator.next(AllReadsIterator.java:30)
at megan/megan.algorithms.DataProcessor.apply(DataProcessor.java:213)
at megan/megan.core.Document.processReadHits(Document.java:536)
at megan/megan.daa.Meganize.apply(Meganize.java:106)
at megan/megan.tools.DAAMeganizer.run(DAAMeganizer.java:250)
at megan/megan.tools.DAAMeganizer.main(DAAMeganizer.java:63)

Using the following for diamond:

diamond blastx --range-culling --top 10 -F 15 --outfmt 100 -b8 -c2 --frameshift 15 --query sample.fasta --db /data/diamondDB/nr.dmnd --out sample.fasta.daa

and then daa-meganizer:

~/megan/tools/daa-meganizer -i sample.fasta.daa --mapDB /data/diamondDB/megan-map-Oct2019.db --longReads --lcaAlgorithm longReads --lcaCoveragePercent 51 --readAssignmentMode alignedBases -t 16

Hello,
I am running into the same error when trying to meganize my DAA file, produced following the steps reported in Arumugam et al. 2019. My situation is exactly the same as akwatson's. Therefore, as you suggested, I checked whether the DAA file was complete using the "view" command in DIAMOND, and it does indeed complain.
Do you know how I could solve this problem, or what can cause a DAA file to be incomplete?

Thank you so much.
Ginevra


Thanks for the response, so sorry I missed it.

I have run diamond view. I am running into out-of-memory errors while loading subject identifiers for all but one of the output files.

This is using: diamond view -a input.daa -o output.view --outfmt 6

I am running it on a high-memory HPC node that should have around 200 GB of RAM available.

I can try to reproduce the issue with a smaller dataset instead. Or have I incorrectly specified the output format?

The one file that did work seems strangely formatted for --outfmt 6. The output is only 25 lines long, and on the second line the second field contains hundreds of subject IDs for hits that are not separated at all.

E.g.

query1 WP_121438119.1 100.0 5163 0 0 2269304 2284792 1 5163 0.0e+00 10326.8
query1 WP_121438119.1WP_158420174.1WP_129865494.1(etc, there are several hundred here)

Thanks!
Andrew

So it appears that DIAMOND didn’t terminate cleanly. Benjamin Buchfink maintains a separate community page for DIAMOND-specific problems here: http://www.diamondsearch.org

Thanks Daniel. I appreciate you helping diagnose the issue.

For the benefit of others following this thread, I have opened an issue on the DIAMOND GitHub page; it looks like a reproducible problem.

After a bug fix to diamond view, the .daa files appear OK. Any suggestions on how to diagnose this issue?

If you could give me access to a small DAA file that triggers the issue, then I will debug it.

No need to send me a file, I have reproduced the problem and will work on fixing it.


Great, thanks very much!

Release 6.19.1 of MEGAN fixes this issue.

Benjamin Buchfink and I figured out what the problem is: there was a change in DIAMOND (build 134) that broke MEGAN’s DAA parser.
I have updated MEGAN’s DAA parser to be compatible with DAA files created by all releases of DIAMOND (both pre- and post-build 134).
(The change in DIAMOND also makes the DIAMOND view command incompatible between different builds of DIAMOND when the long-reads option was used.)


Thank you for your efforts on this fix. I really appreciate how quickly you were able to find a solution.

Appears to be working on my end with the latest release of MEGAN.

Echoing the above comments, thanks for the fast fix! Everything is running smoothly now, and a quick look suggests the frame-shift correction is working very nicely for my dataset! Thanks again.
