To process many large metagenomic shotgun samples, proceed as follows.
Let us assume that you have just received a hard disk containing a collection of fastq.gz files, each representing one sample from your study.
Put these files into a directory called 00fastq.
Create a second directory called 10daa. This will contain DAA files generated by DIAMOND.
Create a third directory called 20rma. This will contain MEGAN RMA files.
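The three directories can also be set up with a short script. This is a minimal POSIX sh sketch: the function name setup_layout and the idea of passing the hard disk's mount point as the first argument are illustrative assumptions, not part of MEGAN or DIAMOND. It copies rather than moves the files, so the delivered disk is left untouched:

```shell
#!/bin/sh
# Set up the 00fastq / 10daa / 20rma layout described above.
# The source directory (e.g. the mount point of the delivered hard disk)
# is a hypothetical argument; it defaults to the current directory.
set -eu

setup_layout() {
    src_dir=${1:-.}
    mkdir -p 00fastq 10daa 20rma
    for fq in "$src_dir"/*.fastq.gz; do
        # If there are no matching files the glob stays literal; skip it.
        [ -e "$fq" ] || continue
        target="00fastq/$(basename "$fq")"
        # Copy rather than move, and never overwrite a file already in place.
        [ -e "$target" ] || cp "$fq" "$target"
    done
}

setup_layout "${1:-.}"
```

For example: sh setup_layout.sh /media/disk (the script name and mount point are placeholders).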
For each file reads.fastq.gz in your 00fastq directory, run DIAMOND as follows:
diamond blastx --query 00fastq/reads.fastq.gz --db nr --daa 10daa/reads.daa
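With many samples it is convenient to run this command in a loop. The following is a minimal POSIX sh sketch, assuming diamond is on your PATH and nr.dmnd is in the current working directory; the function names daa_path_for and run_diamond_all are hypothetical. It uses exactly the blastx options shown above and skips samples whose DAA file already exists, so an interrupted batch can be restarted:

```shell
#!/bin/sh
# Run DIAMOND blastx on every fastq.gz file in 00fastq, one sample after
# another, writing one DAA file per sample into 10daa.
# Assumes: diamond is on the PATH and nr.dmnd is in the current directory.
# set -e stops the whole batch at the first failing sample.
set -eu

# Map 00fastq/<sample>.fastq.gz to 10daa/<sample>.daa
daa_path_for() {
    echo "10daa/$(basename "$1" .fastq.gz).daa"
}

run_diamond_all() {
    mkdir -p 10daa
    for fq in 00fastq/*.fastq.gz; do
        # If 00fastq is empty the glob stays unexpanded; skip it.
        [ -e "$fq" ] || continue
        daa=$(daa_path_for "$fq")
        # Skip samples that already have a non-empty DAA file.
        if [ -s "$daa" ]; then
            echo "skipping $fq: $daa already exists" >&2
            continue
        fi
        diamond blastx --query "$fq" --db nr --daa "$daa"
    done
}

run_diamond_all
```

If a DIAMOND run is killed part-way through, delete the incomplete DAA file before restarting, because the loop only checks whether a non-empty output file exists. The samples are processed one after another; on a compute cluster you would more likely submit one job per sample.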
When this has completed, for each file reads.daa in your 10daa directory, run daa2rma as follows:
daa2rma -i 10daa/reads.daa -o 20rma/reads.rma -g2t gi_taxid.bin -g2kegg gi2kegg.bin -fun KEGG
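The conversion step can be looped in the same way. Again this is a minimal POSIX sh sketch: it assumes the megan/tools directory has been added to your PATH and that gi_taxid.bin and gi2kegg.bin are in the current directory. The daa2rma options are those of the command above; the function names rma_path_for and run_daa2rma_all are hypothetical:

```shell
#!/bin/sh
# Convert every DAA file in 10daa into an RMA file in 20rma using daa2rma.
# Assumes: megan/tools is on the PATH and the mapping files below are in
# the current directory (adjust the paths otherwise).
set -eu

G2T=gi_taxid.bin      # GI-to-taxon mapping file
G2KEGG=gi2kegg.bin    # GI-to-KEGG mapping file

# Map 10daa/<sample>.daa to 20rma/<sample>.rma
rma_path_for() {
    echo "20rma/$(basename "$1" .daa).rma"
}

run_daa2rma_all() {
    mkdir -p 20rma
    for daa in 10daa/*.daa; do
        # No DAA files yet: the glob stays unexpanded; skip it.
        [ -e "$daa" ] || continue
        rma=$(rma_path_for "$daa")
        # Skip samples that have already been converted.
        if [ -s "$rma" ]; then
            echo "skipping $daa: $rma already exists" >&2
            continue
        fi
        daa2rma -i "$daa" -o "$rma" -g2t "$G2T" -g2kegg "$G2KEGG" -fun KEGG
    done
}

run_daa2rma_all
```

Because the DIAMOND and daa2rma stages are separate loops, the conversion can be re-run (for example with additional mapping files) without repeating the expensive DIAMOND searches, provided the old RMA files are removed first.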
For this to work:
- you need to install DIAMOND
- you need to download the nr database from ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz (there is no need to decompress it; DIAMOND can read the gzipped file as is)
- you need to build a DIAMOND index for nr as follows:
diamond makedb --in nr.gz --db nr
This generates a new file called nr.dmnd, which is the database that --db nr refers to in the blastx command above.
- you need to install MEGAN6; the daa2rma command is located in the megan/tools directory of the installation.
- you need to download the GI-to-NCBI-taxonomy mapping file gi_taxid.bin and the GI-to-KEGG mapping file gi2kegg.bin from the MEGAN download webpage.
- other mapping files are required if you also want to compute SEED, COG or other classifications.
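Because a batch over many large samples can take a long time, it may be worth checking these prerequisites before starting. The sketch below is POSIX sh; the function names are hypothetical and the file names are those used in the commands above. It reports each missing command or file, and check_prereqs returns a non-zero status if anything is missing, so it can be used to guard the other steps:

```shell
#!/bin/sh
# Check the prerequisites listed above before starting a long batch run.
# File names match those used in the commands above; adjust if yours differ.
set -u

missing=0

need_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing command: $1" >&2
        missing=$((missing + 1))
    fi
}

need_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty file: $1" >&2
        missing=$((missing + 1))
    fi
}

check_prereqs() {
    missing=0
    need_cmd diamond
    need_cmd daa2rma          # lives in megan/tools; add that directory to PATH
    need_file nr.dmnd         # built with: diamond makedb --in nr.gz --db nr
    need_file gi_taxid.bin
    need_file gi2kegg.bin
    [ "$missing" -eq 0 ]
}

if check_prereqs; then
    echo "all prerequisites found"
else
    echo "$missing prerequisite(s) missing" >&2
fi
```

Only the KEGG mapping is checked here, matching the daa2rma command above; if you add SEED, COG or other classifications, extend check_prereqs with the corresponding mapping files.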
You may also consider using MeganServer to serve the files contained in your 20rma directory.