I’m trying out the MEGAN6 weighted LCA (wLCA) algorithm for the first time, having previously used MEGAN5’s ‘LCA of an X percent’. We currently cluster our reads into OTU sequences, which we then BLAST and analyze in MEGAN, so each OTU sequence represents a certain number of reads in the dataset. To use the weighted LCA algorithm, would you recommend taking these read counts into account, perhaps by setting Read Assignment Mode to ‘readMagnitude’ and using the correct “weight=XX” terminology in our OTU headers? At the moment I’m a little concerned that we are getting misassignments (see below) where OTUs have ‘good’ hits to multiple species. In the case below, MEGAN5 would annotate the OTU to G. parvulum, while MEGAN6 with wLCA annotates it to ‘A. sp. AORF-2015’. Within MEGAN6 there is one other OTU in this dataset annotated to G. parvulum and there are 6 OTUs annotated to ‘A. sp. AORF-2015’, so I’m assuming that is why MEGAN6 wLCA chooses ‘A. sp. AORF-2015’?
I didn’t see any mention in the MEGAN6 manual of how the ‘readMagnitude’ parameter affects LCA placement; if there is a further description online, please point me to it. I’m assuming it would count each OTU sequence as that many reads when calculating read placement.
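For reference, this is roughly how I would add the read counts to the OTU headers before BLASTing. It is only a sketch: the exact token MEGAN expects (here a space followed by `weight=N`) is my assumption and would need checking against the manual, `add_weights` is a hypothetical helper of my own, and the OTU ids and counts are made up for illustration.

```python
def add_weights(fasta_lines, counts, tag="weight"):
    """Append ' tag=N' to each FASTA header whose OTU id appears in counts.

    The 'weight=N' token format is an assumption, not confirmed MEGAN syntax.
    """
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The OTU id is taken to be the first whitespace-delimited word.
            parts = line[1:].split()
            otu_id = parts[0] if parts else ""
            if otu_id in counts:
                line = f"{line} {tag}={counts[otu_id]}"
        out.append(line)
    return out


# Illustrative input only: ids, sequences, and counts are invented.
example = [">OTU_1 cluster_size_info", "ACGT", ">OTU_2", "GGCC"]
annotated = add_weights(example, {"OTU_1": 125})
```

Headers without a known count are left untouched, so an OTU missing from the count table would presumably fall back to whatever weight MEGAN uses by default.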
I’m also curious about your opinion on how this affects the accuracy of taxon assignment. Should it matter more, in a presence/absence manner, that a species is detected by a unique sequence (so that more unique OTUs assigned to a taxon make it more likely to be truly present)? Or should the abundance of each unique sequence be taken into account, so that one very abundant unique sequence makes it more likely that other sequences are annotated as that taxon? Please let me know if I’m misunderstanding how the weighted LCA works.