Set a Minimum Support for WGS metagenomic data


#1

Dear Daniel
I would like to know the ideal minimum support threshold for WGS metagenomic data. What happens if I process my data without setting the minimum support (i.e. by keeping it at “Zero”)?


#2

The min support determines the number of reads that must be assigned to a taxon (and/or any of its descendant nodes) so that the taxon appears in the taxonomic tree representing the dataset.
If you set this threshold to 0, then any node that attracts at least one read will appear in the output.
Whether this is what you want depends very much on the use case. If you want to avoid false negatives, then setting the threshold to a very small number is a good idea. However, if you want to avoid false positives (which are often the bigger problem), then you should set this to a bigger value.
Note that reads assigned to a node that doesn’t meet the threshold are not “lost”; rather, they are moved up the taxonomy until a node is reached that meets the threshold.

To understand the latter, imagine your threshold is 10 and you have four different species of the same genus that each receive only 3 reads. Then none of them meets the threshold. However, their common genus node then has 4x3=12 reads, which is above the threshold of 10, so the genus node will appear with a count of 12.
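
The example above can be sketched in a few lines of Python. This is only an illustration of the idea described in this post, not the program's actual code: the function name `apply_min_support`, the taxon names, and the dictionary-based tree are all made up for the example, and the genus is assumed to have no reads of its own.

```python
def apply_min_support(parent, assigned, min_support):
    """Push reads from nodes below the threshold up to their parent.

    parent:   dict mapping each node to its parent (the root maps to None)
    assigned: dict mapping a node to the number of reads assigned directly to it
    Both structures are hypothetical; they only illustrate the idea.
    """
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    counts = {node: assigned.get(node, 0) for node in parent}
    order = sorted(parent, key=depth, reverse=True)  # deepest nodes first

    # Support of a node = its own reads plus the reads of all its descendants.
    support = dict(counts)
    for node in order:
        if parent[node] is not None:
            support[parent[node]] += support[node]

    # A node whose support is below the threshold hands its reads to its parent.
    for node in order:
        p = parent[node]
        if p is not None and support[node] < min_support:
            counts[p] += counts[node]
            counts[node] = 0

    # Only nodes that still hold reads appear in the result.
    return {node: c for node, c in counts.items() if c > 0}


# Four species of one genus, 3 reads each (made-up names).
parent = {"root": None, "GenusA": "root",
          "sp1": "GenusA", "sp2": "GenusA", "sp3": "GenusA", "sp4": "GenusA"}
assigned = {"sp1": 3, "sp2": 3, "sp3": 3, "sp4": 3}

print(apply_min_support(parent, assigned, 10))  # genus keeps all 12 reads
print(apply_min_support(parent, assigned, 0))   # every species appears
```

With a threshold of 10 the four species disappear and the genus is reported with 12 reads; with a threshold of 0 each species is reported with its 3 reads.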