Running blast2rma in parallel reports an error


#1

Hello,

I’m trying to run multiple blast2rma commands in Linux in order to process multiple libraries.
When I run the first command, everything is fine and there is no error message.
But when I open another window and run a second or third command, the same error about acc2seed-May2016XX.abin appears.

Does that mean I can only process one file at a time?

I notice that most of the time a single blast2rma command uses only one core. Is there a way to run it on multiple cores to improve performance?

ubuntu@ip-172-31-19-37:~$ /home/ubuntu/megan/tools/blast2rma -i /home/ubuntu/JJZ/h0702.blastp.gz -f BlastText -bm BlastP -r /home/ubuntu/JJZ/h0702fastqjoin.join-un1.fastq -o /home/ubuntu/JJZ/ -a2eggnog /home/ubuntu/JJZ/acc2eggnog-Oct2016X.abin -a2interpro2go /home/ubuntu/JJZ/acc2interpro-Nov2016XX.abin -a2seed /home/ubuntu/JJZ/acc2seed-May2016XX.abin
Version MEGAN Community Edition (version 6.10.6, built 20 Dec 2017)
Copyright © 2017 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Functional classifications to use: EGGNOG, INTERPRO2GO, SEED
Loading ncbi.map: 1,601,128
Loading ncbi.tre: 1,601,131
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,985
Opening file: /home/ubuntu/JJZ/acc2eggnog-Oct2016X.abin
Loading interpro2go.map: 11,294
Loading interpro2go.tre: 26,787
Opening file: /home/ubuntu/JJZ/acc2interpro-Nov2016XX.abin
Loading seed.map: 13,662
Loading seed.tre: 21,084
Caught:
java.io.FileNotFoundException: /home/ubuntu/JJZ/acc2seed-May2016XX.abin (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileReader.<init>(FileReader.java:72)
at jloda.util.FileInputIterator.<init>(FileInputIterator.java:98)
at jloda.util.FileInputIterator.<init>(FileInputIterator.java:55)
at megan.classification.data.Accession2IdMap.<init>(Accession2IdMap.java:45)
at megan.classification.data.Accession2IdMapFactory.create(Accession2IdMapFactory.java:50)
at megan.classification.IdMapper.loadMappingFile(IdMapper.java:149)
at megan.tools.BLAST2RMA6.run(BLAST2RMA6.java:259)
at megan.tools.BLAST2RMA6.main(BLAST2RMA6.java:63)
Processing BlastText file: /home/ubuntu/JJZ/h0702.blastp.gz
Output file: /home/ubuntu/JJZ/h0702.rma6
Classifications: Taxonomy,SEED,EGGNOG,INTERPRO2GO
Parsing file: /home/ubuntu/JJZ/h0702.blastp.gz


#2

From the results, it looks like SEED binning is still working?

Output file: /home/ubuntu/JJZ/h0702.rma6
Classifications: Taxonomy,SEED,EGGNOG,INTERPRO2GO
Parsing file: /home/ubuntu/JJZ/h0702.blastp.gz
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (2578.3s)
Total reads: 488,319
Alignments: 1,383,750
100% (0.0s)
Binning reads: Initializing…
Initializing binning…
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
Binning reads: Analyzing alignments
Total reads: 488,319
With hits: 209,284
Alignments: 1,383,750
Assig. Taxonomy: 208,766
Assig. SEED: 176,384
Assig. EGGNOG: 4,523
Assig. INTERPRO2GO: 19,333
MinSupport set to: 244
Binning reads: Applying min-support & disabled filter to Taxonomy…
Min-supp. changes: 4,904
Binning reads: Writing classification tables
Numb. Tax. classes: 290
Numb. SEED classes: 328
Numb. EGG. classes: 890
Numb. INT. classes: 1,184
Binning reads: Syncing
Class. Taxonomy: 290
Class. SEED: 328
Class. EGGNOG: 890
Class. INTERPRO2GO: 1,184
100% (44.8s)
Total time: 2630s
Peak memory: 4.7 of 8.7G


#3

To my knowledge, there is no parallelization for blast2rma.

You can, however, launch multiple processes of blast2rma if you have multiple samples to save time.

I’ve written a script that uses GNU Parallel to do this in an automated fashion:

for i in Wang-2013-modernplaque/*.fastq; do b=${i%.fastq}.blastn.gz; echo ./0_Software/MEGAN6-8-20/tools/blast2rma -r $i -i $b -o . -v -ms 44 -me 0.01 -f BlastText -supp 0.1 -alg weighted -lcp 80; done | parallel -j24 --delay 6

-j24 = 24 jobs = 24 cores, so adjust this to suit. --delay 6 is used because Java has a heart attack when trying to launch multiple processes simultaneously.
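
If GNU Parallel is not installed, roughly the same idea (several blast2rma processes at once, each started a few seconds apart) can be sketched with plain shell job control. This is only a sketch under assumptions: the variable names BLAST2RMA, READS_DIR, MAX_JOBS and START_DELAY and the function run_all are made up for illustration, the per-sample .log files are my own addition, and only the -r, -i, -o and -f options from the command above are passed, so add the remaining options you need. It also runs samples in batches rather than keeping every slot busy, which is simpler but less efficient than parallel.

```shell
#!/bin/sh
# Sketch: run blast2rma on several samples in batches, with a staggered start.
# All variable names below are illustrative, not MEGAN or GNU Parallel options.
BLAST2RMA="${BLAST2RMA:-./0_Software/MEGAN6-8-20/tools/blast2rma}"
READS_DIR="${READS_DIR:-Wang-2013-modernplaque}"
MAX_JOBS="${MAX_JOBS:-4}"        # processes per batch
START_DELAY="${START_DELAY:-6}"  # seconds between launches within a batch

run_all() {
    launched=0
    for reads in "$READS_DIR"/*.fastq; do
        # The glob stays literal when nothing matches, so skip non-files.
        [ -e "$reads" ] || continue
        blast="${reads%.fastq}.blastn.gz"
        if [ ! -f "$blast" ]; then
            echo "skipping $reads: $blast not found" >&2
            continue
        fi
        # Start one process in the background; keep its messages per sample.
        "$BLAST2RMA" -r "$reads" -i "$blast" -o . -f BlastText \
            > "${reads%.fastq}.log" 2>&1 &
        launched=$((launched + 1))
        if [ $((launched % MAX_JOBS)) -eq 0 ]; then
            wait                  # batch is full: wait for all of it to finish
        else
            sleep "$START_DELAY"  # stagger the next launch
        fi
    done
    wait                          # wait for the last, possibly partial, batch
}

run_all
```

Run it from the directory that contains the sample folder, e.g. MAX_JOBS=8 sh run_blast2rma.sh (the script name is arbitrary), and check the .log file next to each .fastq for errors such as a missing mapping file.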

Hope this helps,
Raphael


#4

I will look into parallelization…
However, if you do blastp with DIAMOND, then meganizing the DIAMOND output DAA file is the fastest option.