We have a large project with 75 samples run through DIAMOND that we want to use in a comparative analysis. One issue is that the DAA files are very large: they come from deeply sequenced, host-filtered reads aligned with the ‘--top 5’ flag and with no compression. The files have also been meganized.
We would like to filter these data further before combining them and running the analysis in MEGAN; do you have any recommendations? As a note, ‘diamond view’ looks like it could be used to filter at least the original (pre-meganized) DIAMOND output, since it supports ‘--max-target-seqs’ and ‘--top’, but it doesn’t appear to currently support DAA output.