Difference in "normalised" read counts between MEGAN and manual calculation

Hi Daniel,

I tried to normalize the "absolute" values manually using the equation |C|/|S|*m. I found minor differences between the MEGAN-normalized read counts and my calculation. Please refer to the table below:

In total five samples were used for testing.

  1. sample1 (smallest) <> 2,180,466
  2. sample2 <> 2,661,067
  3. sample3 <> 2,679,666
  4. tested-sample <> 2,710,131
  5. sample5 <> 3,113,893

MEGAN (version 6.17)

[read counts obtained from MEGAN]
absolute # <> normalised #
109466 <> 87918
39620 <> 31821
22673 <> 18210
14636 <> 11755
4348 <> 3493
24998 <> 20077
23862 <> 19165
7418 <> 5958

Calculated normalized values using the equation |C|/|S|*m:

(109466 /2710131) * 2180466 = 88072.08624
(39620/2710131)*2180466 = 31876.7111
(22673/2710131)*2180466 = 18241.814
(14636/2710131)*2180466 = 11775.55638
(4348/2710131)*2180466 = 3498.231697
(24998/2710131)*2180466 = 20112.41858
(23862/2710131)*2180466 = 19198.43716
(7418/2710131)*2180466 = 5968.234299
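For reference, the calculation above can be sketched in Python. This is just a restatement of the formula |C|/|S|*m (|C| = reads assigned to a class, |S| = total reads in the tested sample, m = total reads in the smallest sample); the totals and counts are taken from the tables above.

```python
def normalize(count, sample_total, smallest_total):
    """Scale a class's read count to the depth of the smallest sample: |C|/|S| * m."""
    return count / sample_total * smallest_total

sample_total = 2_710_131    # total reads in tested-sample (|S|)
smallest_total = 2_180_466  # total reads in sample1, the smallest (m)

# Absolute read counts from the MEGAN table above (|C| values).
absolute = [109466, 39620, 22673, 14636, 4348, 24998, 23862, 7418]

for c in absolute:
    print(f"{c:>7} -> {normalize(c, sample_total, smallest_total):.4f}")
```

Running this reproduces the manually calculated values (e.g. 109466 -> 88072.0862), which differ from the MEGAN-reported numbers by more than simple rounding.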

As you can see, there is a difference between the manually calculated and the MEGAN-reported normalized read counts. Can you tell me if I missed anything, or why there is this difference?

Thanks in advance,

Prem

There are indeed minor differences due to the way that MEGAN does book-keeping.

Oh I see, thanks. Then do you think the values that I calculated using the formula are correct, and that I can use them for my analysis?

Thanks

I took another look at the code and at your numbers. The discrepancies are larger than what I would expect. Basically, I would only expect to see differences due to rounding. I just tried this out on a collection of files to confirm this.

When doing normalization, MEGAN reports the following line in the message window:

Normalizing to: N reads per sample.

What was N for your data? Was it exactly 2,180,466?

Also, did you use the “ignore unassigned option”?
