Joining multiple .m8 files or merging multiple .daa files into one .mx file

Hi there!

I would like to analyze my metagenomics data using your tools (DIAMOND and MEGAN).
I have 8 samples containing a total of ~235 million reads, each ~230 bp long.
I plan to split each sample into 4 smaller chunks to reduce DIAMOND processing time.

(1) If I set the -o parameter for .m8 output, will I be able to join the 4 .m8 outputs created for each sample into one large .m8 file?

(2) If I set the -a parameter instead, will I be able to join the 4 .daa outputs for each sample into one large .daa file?

(3) Still using the -a parameter, will I be able to merge the 4 .daa outputs during the conversion to BLAST tabular format? That is, can I convert 4 .daa files into one BLAST tabular file?

(4) Is there a way to join multiple DIAMOND outputs into one MEGAN input file using MEGAN?

If the answer is yes to any of these questions, could you please share how? Any help will be greatly appreciated.

Many thanks in advance,
Nsa

It doesn’t make sense to split the files; this won’t speed up DIAMOND.

(1) yes

(2) no

(3) no

(4) Using the Import Blast dialog, you can select multiple input files that give rise to one output rma6 file.
However, I do not recommend this. Rather, don’t split the 8 samples into smaller chunks; run them as is.
Then use daa-meganizer (or the equivalent File menu item in MEGAN) to meganize the 8 .daa files. This will be the fastest route.

Thank you for getting back to me, Daniel.
I am running my analysis on a cluster and it’s nearly impossible to request all the resources I need to run DIAMOND on my files as they are. I went ahead and split the files so that I can spread out the jobs without requesting any resources and it worked.

Also, I have been able to join the .m8 outputs and they work in MEGAN!
Cheers,
Nsa

I will look into writing a DAA merger program…

That will be really useful.
Thank you!

Daniel,
Has the DAA merger program been added to MEGAN6 yet?

Hi Daniel,

I created many meganized .daa files and I also wondered whether the merge script is on the way?

Thanks a lot for the nice program!

Dear Sebastian,

I believe that we do have a DAA-Merger program and I will look into providing it with the next MEGAN release.

Dear Daniel,

That sounds great!

Was there ever a solution for merging .daa files? I’m running into the same issue: I have a large dataset, and the only practical solution for the alignments is running on a cluster in small chunks. Thanks.

We do have a DAA-merger program. I will look into adding it to the MEGAN tools.

Hello,
I am working with large long-read datasets (each FASTA file has 1-2 million reads of 10-15 kb length). I am also considering splitting FASTA files for blastx alignment with DIAMOND. Currently, I am running some benchmarks to investigate whether there is a major improvement in time.

Ultimately, I would like to use the daa-meganizer tool, which sounds like the best option for import into MEGAN. Is the DAA-merger program currently available in MEGAN or elsewhere? I downloaded the latest release and did not see this program in the tools bin. This would be extremely helpful for my ongoing work.

Thanks,
Dan

Our alignment program Ella comes with a daa-merger program. However, the daa-merger program is painfully slow at present and requires more work. If you want to try it out, then please download and install Ella, available here: https://software-ab.informatik.uni-tuebingen.de/download/ella/welcome.html

If you wanted to join multiple .m8 files, then something like this should work:

cat *m8 |sort >a.m8

For use with MEGAN, the key thing is that alignments for the same read must appear together, hence the sort.
If you have paired reads in which both members of the pair have exactly the same name, then you should concatenate and sort all first reads together, then all second reads together, and then concatenate both files.
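
A slightly more defensive version of the same idea, as a rough sketch only. The function names merge_m8 and check_grouped are made up for illustration; they are not part of DIAMOND or MEGAN. The sketch assumes tab-separated .m8 files with the read name in the first column and a sort that supports the -s (stable) option, as GNU and BSD sort do. Sorting on the first column only keeps each read's hits in their original order, and writing the result to a name you pass in explicitly makes it easier to avoid the output file being matched by the input pattern on a second run.

```shell
# merge_m8 OUT IN...: concatenate tabular (.m8) files so that all
# alignments for the same read end up next to each other in OUT.
merge_m8() {
    out=$1; shift
    tab=$(printf '\t')
    # LC_ALL=C: plain byte-wise comparison, independent of the locale.
    # -t/-k1,1: compare the read name (first column) only.
    # -s: stable sort, so hits of one read keep their original order.
    cat "$@" | LC_ALL=C sort -s -t "$tab" -k1,1 > "$out"
}

# check_grouped FILE: succeeds only if every read name forms one
# contiguous block of lines in FILE.
check_grouped() {
    awk -F'\t' '$1 != prev { if ($1 in seen) exit 1; seen[$1] = 1; prev = $1 }' "$1"
}

# Example call (file names are hypothetical):
#   merge_m8 sample1.merged.m8 chunks/sample1_part*.m8 && check_grouped sample1.merged.m8
```

For paired reads with identical names, the same function could be called once on the first-read outputs and once on the second-read outputs, and the two results concatenated with cat, as described above.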

Hi Daniel,
Thank you for the suggestion. I won’t be working with paired reads so this will be a little more straightforward. In general, I am considering which formats are best for merging multiple outputs from a given alignment program. My workflows will require splitting input FASTA files for processing. I’ve posted in detail about this here: "Does sam2rma work for converting SAM protein alignments?"
