I just took a look at how the distance calculation is implemented. The news is mixed.
Most of the distance calculations do indeed use summarized counts.
However, the JensenShannonDivergence uses summarized counts for leaves and assigned counts for all non-leaf nodes.
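For readers unfamiliar with the measure, here is a minimal sketch of a Jensen-Shannon divergence between two samples. It assumes each sample is a vector of read counts over the same selected nodes, that the counts are normalized to proportions, and that base-2 logarithms are used with no square root applied. These are assumptions about the textbook definition, not a description of MEGAN's internal code; the function names are hypothetical.

```python
import math

def _normalize(counts):
    # convert raw read counts into proportions that sum to 1
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads on the selected nodes")
    return [c / total for c in counts]

def _kl(p, m):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0
    # (m_i >= p_i / 2 > 0 whenever p_i > 0, so there is no division by zero)
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def jensen_shannon_divergence(counts_a, counts_b):
    """Hypothetical helper: JSD between two count profiles over the same nodes."""
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must cover the same nodes")
    p = _normalize(counts_a)
    q = _normalize(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

print(jensen_shannon_divergence([10, 0], [0, 10]))  # 1.0 (no shared nodes)
print(jensen_shannon_divergence([4, 6], [2, 3]))    # 0.0 (same proportions)
```

Which count (assigned or summarized) goes into each vector position is exactly the choice discussed in this thread; the divergence itself only sees the numbers it is given.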
I think that the latter way of calculating distances is more useful: Usually, people select leaves and use them as the basis of distance calculations, in which case either method of computing distances will produce the same result.
To provide more flexibility, in the next release of MEGAN, the distance calculation will be modified as follows:
- if a selected node is a leaf, then the summarized value associated with the node is used
- if a selected node is not a leaf, then the assigned value is used
If you want to enforce the use of the summarized values for a given node, then you need to collapse that node so that it is a leaf. If you want to enforce the use of the assigned values for a given node, then ensure that it is not a leaf by uncollapsing it. (If it cannot be uncollapsed, then it has the property that assigned=summarized.)
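The selection rule and the effect of collapsing can be sketched as follows. This assumes a simple tree where each node stores its own assigned count, the summarized count is the assigned count plus the summarized counts of all descendants, and a node counts as a leaf if it has no children or is collapsed. The `Node` class and `count_for_distance` function are hypothetical illustrations, not MEGAN's API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # hypothetical tree node, not a MEGAN class
    name: str
    assigned: int  # reads assigned directly to this node
    children: List["Node"] = field(default_factory=list)
    collapsed: bool = False

    @property
    def summarized(self) -> int:
        # assigned count plus the summarized counts of all descendants
        return self.assigned + sum(c.summarized for c in self.children)

    def is_leaf(self) -> bool:
        # a collapsed node is displayed as a leaf, as is a node with no children
        return self.collapsed or not self.children

def count_for_distance(node: Node) -> int:
    # planned rule: summarized value for leaves, assigned value otherwise
    return node.summarized if node.is_leaf() else node.assigned

genus = Node("Genus", 5, [Node("Species A", 10), Node("Species B", 3)])
print(count_for_distance(genus))  # 5: not a leaf, so the assigned value is used
genus.collapsed = True
print(count_for_distance(genus))  # 18: collapsed leaf, so 5 + 10 + 3 is used
```

A node with no children can never be uncollapsed, and in this sketch its summarized and assigned counts are equal, so either rule gives the same number for it.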
Again, this change will not affect the results obtained by most users because, by default, MEGAN selects leaves for this calculation, in which case the old and the new calculations both use summarized values and produce the same results.
I’ll upload the new version later this week.
BTW, if you use the Export CSV option to export taxa to read counts and choose the “Assigned” option, then this feature exports summarized counts for leaves and assigned counts for all other nodes. This matches the new way of calculating distances that will be implemented in the next release.