Problem exporting reads-to-taxa data from 3-string CSV input


#1

Hello,

I’m currently using MEGAN 6.7.0 and working with 3-string .csv files of metabarcoding data (so I am providing the program with read names, BLAST matches and bitscores).

I think the LCA algorithm is returning good results (they seem biologically reasonable), and I am able to export the number of reads assigned to each taxon from the program.

What I would now like to export is a file showing which reads were assigned to which taxa.
The options Export -> Reads and Export -> Matches are grayed out for me (even when the whole tree is selected).

I would like to export something like “TaxonName_to_Read”, but that option does not appear in my list of options for Export -> CSV Format.

Here is a small sample of my data:

Contig 483,GBVU419713|Austrostipa_sp._CHSL2012|rbcL|JQ933232,420.31
Contig 483,GRASS67607|Hesperostipa_comata|rbcL,420.31
Contig 483,GRASS67607|Hesperostipa_comata|rbcLa,420.31
Contig 483,GRASS74407|Achnatherum_hymenoides|rbcLa,420.31
Contig 483,GRASS74407|Achnatherum_hymenoides|rbcL,420.31
Contig 483,T19_Ausostipa_sp_rbcL,420.31
Contig 470,SDH118614|Medicago_polymorpha|ITS2,422.156
Contig 470,SDH338915|Medicago_polymorpha|ITS2,422.156
Contig 274,TDEF00812|Acacia_holosericea|rbcL,420.31
Contig 274,GBVW92913|Acacia_suma|rbcL|JX195516,420.31
Contig 274,GBVU199013|Inga_stipularis|rbcL|JQ626223,420.31
Contig 274,TDEF00112|Acacia_auriculiformis|rbcL,420.31
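If the per-read assignments have to be computed outside MEGAN for now, the first step is simply grouping these rows by read. Below is a minimal Python sketch, assuming every row has exactly three comma-separated fields (read name, reference, bitscore) and that no field itself contains a comma. The function name is hypothetical, and the `top_percent` filter is an assumption modeled loosely on MEGAN's Top Percent idea, not MEGAN's own code.

```python
import csv
from collections import defaultdict


def parse_read_ref_score(lines, top_percent=10.0):
    """Group 'read,reference,bitscore' rows by read name (hypothetical helper).

    Keeps only matches whose bitscore lies within top_percent of the best
    score for that read (an assumed filter, similar in spirit to MEGAN's
    Top Percent setting).
    """
    matches = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        read, ref, score = (field.strip() for field in row)
        matches[read].append((ref, float(score)))

    filtered = {}
    for read, hits in matches.items():
        best = max(score for _, score in hits)
        cutoff = best * (1.0 - top_percent / 100.0)
        filtered[read] = [(ref, s) for ref, s in hits if s >= cutoff]
    return filtered


sample = [
    "Contig 470,SDH118614|Medicago_polymorpha|ITS2,422.156",
    "Contig 470,SDH338915|Medicago_polymorpha|ITS2,422.156",
    "Contig 274,TDEF00812|Acacia_holosericea|rbcL,420.31",
    "Contig 274,GBVW92913|Acacia_suma|rbcL|JX195516,420.31",
]
for read, refs in parse_read_ref_score(sample).items():
    print(read, len(refs))
```

Because the references use `|` rather than commas as internal separators, the standard `csv` reader splits each row cleanly into three fields.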

After the LCA algorithm has run and the tree has been created, I would like to export a file with the results of the LCA for each read, something like:

Contig 483,Poaceae
Contig 470,Medicago
Contig 274,Mimosidae
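Each line of that desired output is what a naive LCA produces: take the lineage of every matched taxon and keep the deepest rank they all share. A minimal sketch, assuming the lineages have already been looked up. The `LINEAGES` table is hypothetical and heavily abbreviated (a real run would take full lineages from the NCBI taxonomy), and none of MEGAN's LCA parameters are applied here.

```python
def naive_lca(lineages):
    """Return the deepest taxon shared by every lineage (each listed root first)."""
    common = None
    # zip stops at the shortest lineage, so only shared depths are compared
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this depth
        common = ranks[0]
    return common


# Hypothetical, abbreviated lineages for illustration only.
LINEAGES = {
    "Austrostipa": ["Viridiplantae", "Poaceae", "Austrostipa"],
    "Hesperostipa": ["Viridiplantae", "Poaceae", "Hesperostipa"],
    "Achnatherum": ["Viridiplantae", "Poaceae", "Achnatherum"],
}


def assign_read(genera):
    """Naive LCA over the lineages of all genera matched by one read."""
    return naive_lca([LINEAGES[g] for g in genera])


print("Contig 483," + assign_read(["Austrostipa", "Hesperostipa", "Achnatherum"]))
# prints "Contig 483,Poaceae"
```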

Any help with this would be greatly appreciated.

Best Regards,
Tim


#2

Hi Tim

That is currently not implemented: your reads are analyzed and then summarized on the fly during import, but they are not stored, so they cannot be exported.

I will look into changing this.
At a minimum, it will be possible to write the assignments to a file immediately.
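Writing each assignment out as soon as it is computed does not require keeping all reads in memory, provided the rows for one read are consecutive, as they are in the sample in the first post. A sketch of that pattern under that assumption; `stream_assignments` and the `best_hit` placeholder are hypothetical names, and any per-read classifier (for example a naive LCA) could be passed in as `assign`.

```python
import csv
import io
import itertools


def stream_assignments(lines, assign, out):
    """Write 'read,taxon' as soon as each read's block of rows is complete.

    Assumes all rows for one read are consecutive; only the current read's
    matches are held in memory. `assign` maps a list of
    (reference, bitscore) pairs to a taxon name.
    """
    writer = csv.writer(out, lineterminator="\n")
    rows = (row for row in csv.reader(lines) if len(row) == 3)
    for read, group in itertools.groupby(rows, key=lambda row: row[0]):
        hits = [(ref, float(score)) for _, ref, score in group]
        writer.writerow([read, assign(hits)])


def best_hit(hits):
    """Hypothetical placeholder classifier: the highest-scoring reference."""
    return max(hits, key=lambda hit: hit[1])[0]


buffer = io.StringIO()  # in practice, a file opened for writing
stream_assignments(["r1,refA,100", "r1,refB,90", "r2,refC,50"], best_hit, buffer)
print(buffer.getvalue(), end="")
```

`itertools.groupby` only merges adjacent rows, which is why the input must already be grouped by read; if it is not, sort it by read name first.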

Another point: it looks like you are processing contig data.
MEGAN now has a “long read” mode that uses a “multi-gene” LCA algorithm for taxonomic assignment. The naive LCA that you are currently using is only suitable when each read covers a single gene.

However, the long read mode is currently not available for CSV import, as it requires additional information about the alignments (their start and end coordinates on the read).
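To see why start and end coordinates matter, consider a contig whose alignments fall in two separate regions. A naive LCA pools every match and can end up at a very high rank, whereas grouping overlapping alignments into segments first keeps each region's signal separate. This sketch only illustrates that idea and is not MEGAN's long-read LCA; the alignments and abbreviated lineages are invented for the example.

```python
def lca(lineages):
    """Deepest taxon shared by all lineages (each listed root first)."""
    common = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        common = ranks[0]
    return common


def segment_lca(alignments):
    """Merge overlapping (start, end, lineage) alignments into segments,
    then compute a separate LCA for each segment."""
    segments = []
    for start, end, lineage in sorted(alignments, key=lambda a: a[0]):
        if segments and start <= segments[-1]["end"]:
            seg = segments[-1]  # overlaps the current segment: extend it
            seg["end"] = max(seg["end"], end)
            seg["lineages"].append(lineage)
        else:
            segments.append({"start": start, "end": end, "lineages": [lineage]})
    return [(s["start"], s["end"], lca(s["lineages"])) for s in segments]


# Invented two-region contig with hypothetical, abbreviated lineages.
alignments = [
    (1, 500, ["Viridiplantae", "Poaceae", "Austrostipa"]),
    (20, 480, ["Viridiplantae", "Poaceae", "Hesperostipa"]),
    (800, 1200, ["Viridiplantae", "Fabaceae", "Medicago"]),
]
print(lca([a[2] for a in alignments]))  # pooled naive LCA: Viridiplantae
print(segment_lca(alignments))  # [(1, 500, 'Poaceae'), (800, 1200, 'Medicago')]
```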
I will look into making the long read mode available during CSV import.
Daniel


#3

Hi Daniel,

Thanks very much for your reply. I might be able to work around the problem by importing my data into MEGAN in a different format.

The data I’m using does cover only one gene: the contigs are assembled from Illumina sequencing reads, but they are all rbcL genes amplified with the same primers.

Thanks again for your help. MEGAN is a great program, and I’ve found it very useful for exploring my data!

Regards,
Tim