Filtering on Query coverage


#1

Hello everyone,
I would like to apply a filter on query coverage. Is there a simple way to do this filtering on the output of MALT, for example?
Best
Cédric


#2

The latest release of MEGAN has a “Min Percent Read to Cover” item in the Advanced tab of the Options->LCA Parameters dialog. You could perform binning with the threshold set to, say, 90%. Then only reads with at least 90% coverage will be assigned to taxa, and the others will go to the Unassigned node.
Then select all nodes except the Unassigned node and “Extract Reads” for those nodes.
Then you will have the reads that have 90% or more coverage.

Note that the “Min Percent Read to Cover” item behaves slightly differently depending on whether you are in long read mode or not. In short read mode (the default), read coverage is based on the longest alignment found for a read. So, in theory, if a read has two short alignments side by side, the coverage will be based on one of them, not on their union. This is for performance reasons.
In long read mode, coverage reflects the number of bases covered by any of the read’s alignments, which takes longer to calculate, but is necessary, of course, for long reads.
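
To make the difference between the two modes concrete, here is a minimal Python sketch of both coverage calculations as described above. It is not MEGAN's actual code; the function names and the (start, end) interval representation are assumptions made for illustration only.

```python
from typing import List, Tuple

# Assumed representation: 1-based inclusive (start, end) positions on the read.
Interval = Tuple[int, int]


def longest_alignment_coverage(read_len: int, alignments: List[Interval]) -> float:
    """Percent of the read covered by its single longest alignment
    (the short read mode behaviour described in the post)."""
    if read_len <= 0 or not alignments:
        return 0.0
    longest = max(abs(end - start) + 1 for start, end in alignments)
    return 100.0 * longest / read_len


def union_coverage(read_len: int, alignments: List[Interval]) -> float:
    """Percent of the read covered by the union of all alignments
    (the long read mode behaviour described in the post)."""
    if read_len <= 0 or not alignments:
        return 0.0
    # Normalise reversed coordinates, then sort by start position.
    spans = sorted((min(a, b), max(a, b)) for a, b in alignments)
    covered = 0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start > cur_end:
            # Gap found: close the current merged span and start a new one.
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged span.
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start + 1
    return 100.0 * covered / read_len


# A 100 bp read with two side-by-side alignments of 45 bp each.
alignments = [(1, 45), (46, 90)]
short_mode = longest_alignment_coverage(100, alignments)
long_mode = union_coverage(100, alignments)
```

With a 90% threshold, this read would fail under the longest-alignment rule and pass under the union rule, which is the side-by-side case mentioned above.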


#3

Yes, I used the Advanced tab with “min percent read to cover” at 0.1 or 0.5.


#4

Hi Daniel,
I am not sure how to use the advanced option “Min Percent Read to Cover” item in the Options-> LCA Parameters dialog, Advanced tab. (MEGAN 6.8.5 window).

I only get assigned reads when the min coverage threshold is 0.1.

If I set this parameter to 0.3, no reads are assigned, although some should be (see figure 2).
1/ Does the MALT output need to be in a particular format? (I used the MALT default parameters.)
2/ From what size should a read be considered long?

Below are the parameters used:

Sequence for which a hit should be kept with a min coverage of 0.5:

thank you for your help

Best

Cédric


#5

Your screenshot shows the Parameters tab, not the Advanced tab. Are you using the Advanced tab?


#6

Yes, I used the Advanced tab with “min percent read to cover” at 0.1 or 0.5.


#7

I have the same issue. I have tried the GUI on Windows and Ubuntu, tried the command line, and varied the parameter from 1 to 95. If I do not include the Min Percent Read to Cover parameter, everything runs fine, but including it at any level results in all reads going to “Not Assigned”. I have checked, and the vast majority of my reads have over 90% query coverage. Here is some output without and with the parameter (at 2%). Any help is greatly appreciated. The two “taxa” in the output of the second run are “No blast hits” and “Not assigned”.

blast2rma -i 12s.Mblast.xml -f BlastXML -bm BlastN -r 12s.fasta -o 12s.Mblast.MPI95.MS150.TOP0.01.MRC90.rma6 -ms 150 -me 0.001 -mpi 95 -top 0.01 -supp 0 -v -ram readMagnitude -a2t /media/sf_Documents/WORK/CIBIO/STATS_AND_CODE/megan/nucl_acc2tax-Mar2018.abin
Blast2RMA - Computes MEGAN RMA files from BLAST (or similar) files
Options:
Input
--in: 12s.Mblast.xml
--format: BlastXML
--blastMode: BlastN
--reads: 12s.fasta
Output
--out: 12s.Mblast.MPI95.MS150.TOP0.01.MRC90.rma6
--useCompression: true
Reads
--paired: false
--pairedSuffixLength: 0
--pairedReadsInOneFile: false
Parameters
--longReads: false
--maxMatchesPerRead: 100
--classify: true
--minScore: 150.0
--maxExpected: 0.001
--minPercentIdentity: 95.0
--topPercent: 0.01
--minSupportPercent: 0.0
--minSupport: 0
--minPercentReadCover: 0.0
--minPercentReferenceCover: 0.0
--lcaAlgorithm: naive
--lcaCoveragePercent: 100.0
--readAssignmentMode: readMagnitude
Classification support:
--parseTaxonNames: true
--acc2taxa: /media/sf_Documents/WORK/CIBIO/STATS_AND_CODE/megan/nucl_acc2tax-Mar2018.abin
Other:
--firstWordIsAccession: true
--accessionTags: gb| ref|
--verbose: true
Version MEGAN Community Edition (version 6.11.7, built 11 Jun 2018)
Copyright © 2018 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Loading ncbi.map: 1,703,606
Loading ncbi.tre: 1,703,609
Opening file: /media/sf_Documents/WORK/CIBIO/STATS_AND_CODE/megan/nucl_acc2tax-Mar2018.abin
Processing BlastXML file: 12s.Mblast.xml
Output file: 12s.Mblast.MPI95.MS150.TOP0.01.MRC90.rma6
Classifications: Taxonomy
Parsing file: 12s.Mblast.xml
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (37.4s)
Total reads: 1,297
Alignments: 113,005
100% (0.0s)
Binning reads: Initializing…
Initializing binning…
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Binning reads…
Binning reads: Analyzing alignments
Total reads: 1,297
With hits: 1,205
Alignments: 113,005
Assig. Taxonomy: 1,063
Binning reads: Writing classification tables
Numb. Tax. classes: 134
Binning reads: Syncing
Class. Taxonomy: 134
100% (93.3s)
Total time: 145s
Peak memory: 2.2 of 4.1G

blast2rma -i 12s.Mblast.xml -f BlastXML -bm BlastN -r 12s.fasta -o 12s.Mblast.MPI95.MS150.TOP0.01.MRC90.rma6 -ms 150 -me 0.001 -mpi 95 -top 0.01 -supp 0 -mrc 2 -v -ram readMagnitude -a2t /media/sf_Documents/WORK/CIBIO/STATS_AND_CODE/megan/nucl_acc2tax-Mar2018.abin
Blast2RMA - Computes MEGAN RMA files from BLAST (or similar) files
Options:
Input
--in: 12s.Mblast.xml
--format: BlastXML
--blastMode: BlastN
--reads: 12s.fasta
Output
--out: 12s.Mblast.MPI95.MS150.TOP0.01.MRC90.rma6
--useCompression: true
Reads
--paired: false
--pairedSuffixLength: 0
--pairedReadsInOneFile: false
Parameters
--longReads: false
--maxMatchesPerRead: 100
--classify: true
--minScore: 150.0
--maxExpected: 0.001
--minPercentIdentity: 95.0
--topPercent: 0.01
--minSupportPercent: 0.0
--minSupport: 0
--minPercentReadCover: 2.0
--minPercentReferenceCover: 0.0
--lcaAlgorithm: naive
--lcaCoveragePercent: 100.0
--readAssignmentMode: readMagnitude
Classification support:
--parseTaxonNames: true
--acc2taxa: /media/sf_Documents/WORK/CIBIO/STATS_AND_CODE/megan/nucl_acc2tax-Mar2018.abin
Other:
--firstWordIsAccession: true
--accessionTags: gb| ref|
--verbose: true
Version MEGAN Community Edition (version 6.11.7, built 11 Jun 2018)
Copyright © 2018 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Loading ncbi.map: 1,703,606
Loading ncbi.tre: 1,703,609
Opening file: /media/sf_Documents/WORK/CIBIO/STATS_AND_CODE/megan/nucl_acc2tax-Mar2018.abin
Processing BlastXML file: 12s.Mblast.xml
Output file: 12s.Mblast.MPI95.MS150.TOP0.01.MRC90.rma6
Classifications: Taxonomy
Parsing file: 12s.Mblast.xml
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (30.0s)
Total reads: 1,297
Alignments: 113,005
100% (0.0s)
Binning reads: Initializing…
Initializing binning…
Minimum percentage of read to be covered: 2.0%
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Binning reads…
Binning reads: Analyzing alignments
Total reads: 1,297
Low covered: 1,297
With hits: 1,205
Alignments: 113,005
Assig. Taxonomy: 0
Binning reads: Writing classification tables
Numb. Tax. classes: 2
Binning reads: Syncing
Class. Taxonomy: 2
100% (56.9s)
Total time: 124s
Peak memory: 2.1 of 4.1G
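
For anyone who wants to check query coverage independently of MEGAN, a rough Python sketch along the following lines could be used on a BlastXML file. It relies on the standard BLAST XML element names (`Iteration_query-def`, `Iteration_query-len`, `Hsp_query-from`, `Hsp_query-to`) and bases coverage on the longest HSP per read, mirroring the short read mode behaviour described earlier in the thread. The function names and the embedded sample are hypothetical, and this is not how MEGAN itself computes the value.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Tiny made-up sample in BLAST XML layout, for illustration only.
SAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_query-def>read1</Iteration_query-def>
  <Iteration_query-len>100</Iteration_query-len>
  <Iteration_hits><Hit><Hit_hsps>
    <Hsp><Hsp_query-from>1</Hsp_query-from><Hsp_query-to>95</Hsp_query-to></Hsp>
    <Hsp><Hsp_query-from>10</Hsp_query-from><Hsp_query-to>40</Hsp_query-to></Hsp>
  </Hit_hsps></Hit></Iteration_hits>
</Iteration>
<Iteration>
  <Iteration_query-def>read2</Iteration_query-def>
  <Iteration_query-len>200</Iteration_query-len>
  <Iteration_hits><Hit><Hit_hsps>
    <Hsp><Hsp_query-from>1</Hsp_query-from><Hsp_query-to>50</Hsp_query-to></Hsp>
  </Hit_hsps></Hit></Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""


def query_coverage(xml_text: str) -> Dict[str, float]:
    """Map each query to the percent of its length covered by its longest HSP."""
    root = ET.fromstring(xml_text)
    coverage: Dict[str, float] = {}
    for iteration in root.iter("Iteration"):
        query_id = iteration.findtext("Iteration_query-def")
        query_len = int(iteration.findtext("Iteration_query-len"))
        longest = 0
        for hsp in iteration.iter("Hsp"):
            q_from = int(hsp.findtext("Hsp_query-from"))
            q_to = int(hsp.findtext("Hsp_query-to"))
            longest = max(longest, abs(q_to - q_from) + 1)
        coverage[query_id] = 100.0 * longest / query_len if query_len else 0.0
    return coverage


def reads_passing(coverage: Dict[str, float], min_percent: float) -> List[str]:
    """Return the query IDs whose coverage meets the threshold (in percent)."""
    return [query for query, pct in coverage.items() if pct >= min_percent]


cov = query_coverage(SAMPLE)
kept = reads_passing(cov, 90.0)
```

For a real file, the text could be read from disk first (or parsed with `ET.parse`). Very large BLAST outputs would probably need a streaming parser such as `ET.iterparse` instead of loading everything at once.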


#8

Thanks for the bug report. I have found the bug, and the min read cover item will work as intended in the next release (6.12.1).