RPK Calculation Bug


Hi Daniel

When a single read has multiple hits, does the RPK option consider only the gene length of the top hit, or is there a way to average the gene length across all the hits?



The top hit that has an assignment to the given class is used.


Hi Daniel

Thanks for the clarification. There seems to be something off with how MEGAN currently does this computation. I dug a bit deeper, and it seems that the result isn't being multiplied by the number of reads after the normalisation.

Case 1: I inspected a protein family (SEED) which had 1 read assigned to it. The RPK value is correctly exported in this case.

Case 2: I inspected another protein family (SEED), which had multiple reads assigned to it. In this case, MEGAN picks only 1 read, the one with the best hit, and does the RPK calculation for that read alone. It should additionally have multiplied the result by the total number of reads assigned to the family.
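
To make the two cases concrete, here is a minimal Python sketch of the arithmetic I would expect. It assumes RPK means reads per kilobase of the reference gene length taken from the top hit. The function names and the numbers are hypothetical, chosen only for illustration, and are not taken from MEGAN's code or output.

```python
def rpk_single_read(gene_length_bp: int) -> float:
    """RPK for exactly one read, given the gene length (in bp) of its top hit."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    # One read divided by the gene length expressed in kilobases.
    return 1.0 / (gene_length_bp / 1000.0)


def rpk_expected(read_count: int, gene_length_bp: int) -> float:
    """RPK I would expect for a class with read_count reads (assumed formula)."""
    if read_count < 0:
        raise ValueError("read count must be non-negative")
    # The per-read value scaled by how many reads were assigned.
    return read_count * rpk_single_read(gene_length_bp)


# Hypothetical example: a gene of 2000 bp.
case1 = rpk_expected(1, 2000)          # Case 1: one read assigned
case2_expected = rpk_expected(5, 2000)  # Case 2: five reads assigned
case2_observed = rpk_single_read(2000)  # what the export appears to contain
```

With these made-up numbers, Case 1 comes out the same either way, which would explain why it looks fine, while in Case 2 the value I see corresponds to `case2_observed` rather than `case2_expected`.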