How can I obtain the genome length covered by reads?


#1

Hello,
To normalize my results, I want to obtain the length of the gene or genome that is covered at a depth of at least 1x (that is, the number of bases of a reference taxon that are covered by at least one read). But when I export taxonId to length, it seems to export the sum of the contig lengths, even though they sometimes partially overlap. For example, suppose that each base is one character, and in one taxon I have:

case 1:
GGGGGGGGGGG Reference genome (Length =11)
–RR____RRR (Length read1=2, length read2=3)
RR____ RR (Length read3=2, length read4=2)

taxonId to length = 9, and I would like to obtain 7

case 2:
GGGGGGGGG Reference genome (Length =11)
____ RRR (Length read1=3)
___RRR (Length read2=3)
______RRR (Length read3=3)

Again, taxonId to length = 9, and I would like to obtain 5

For example, in the first case the covered portion of the reference genome is longer than in case 2, but exporting taxonId to length gives me the same value for both. Is it possible to calculate or obtain this information in some way?
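
To be concrete about the number I am after, here is a minimal sketch in Python. It assumes I could somehow get the start and end position of every read on the reference, which is exactly the part I do not know how to export. The function name `covered_length` and the coordinates below are hypothetical; I only picked them so that the totals match my two cases (read lengths sum to 9, covered lengths are 7 and 5).

```python
def covered_length(intervals):
    """Count reference bases covered by at least one read.

    intervals: iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping intervals are merged so shared bases are counted once.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # No overlap with the running interval: close it and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if this read reaches further.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered


# Hypothetical read coordinates, chosen only to reproduce the totals above.
reads_case1 = [(2, 3), (8, 10), (1, 2), (7, 8)]
reads_case2 = [(5, 7), (4, 6), (6, 8)]

for name, reads in (("case 1", reads_case1), ("case 2", reads_case2)):
    summed = sum(end - start + 1 for start, end in reads)
    print(name, "sum of read lengths:", summed, "covered bases:", covered_length(reads))
```

So the sum of lengths is the same in both cases, but the merged (covered) length is what I would like to use for normalization.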

Best regards,

Blanca