I’ve got three questions. I searched the forum but could not find a similar problem, so I hope it is OK to ask here.
I am using MEGAN Community Edition (6.12.6, built 15 October 2018) to analyze the rma6 files produced by the MALT tool (version 0.4.1, built 24 May 2018). I am trying to filter the SAM files by reference sequence and read name so that I can do some further ancient DNA analysis (like authenticating the deamination patterns).
To do this, I selected a species node (let’s say X), clicked the Extract Reads button, and created a file (I had the Include Summarized Reads box checked; otherwise I got 0 reads). Let’s say 25000 DNA reads were extracted.
Then I filtered the SAM files by reference (the accession name and taxid of species X) and by the names of the reads I had extracted, and merged the filtered SAM files into one BAM file. In the end, I realized that for some of the SAM files I did not get the expected number of reads (let’s say the total dropped to 12500 for species X). So:
First question: Why did I get a different number of reads? What could be the problem here? Do you have any idea?
Second question: I don’t understand the Include Summarized Reads box used when extracting DNA reads. When we extract the DNA reads that are assigned to a species, we extract the reads only for this node, right?
Third question: In the alignment viewer there is the message “List of 151 reference sequences for X”. Some of them belong to species X and some belong to taxonomically related references of species X. So, when we extract DNA sequences, are we extracting all the DNA reads that are related to species X, or only the DNA sequences that are assigned to species X? This confused me.
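
To make the filtering step concrete, here is a minimal Python sketch of what it amounts to. It is only an illustration: the read names, the reference names (`REF_X`, `REF_RELATED`) and the toy SAM lines are made up, and it assumes the RNAME column of the SAM file matches the reference name exactly, which may not hold for real MALT output. It keeps an alignment only if both the read name and the reference are on the wanted lists, then counts how many distinct reads survive.

```python
def filter_sam(lines, read_names, ref_names):
    """Keep header lines plus alignments whose QNAME and RNAME are both wanted."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # SAM header line, always kept
            kept.append(line)
            continue
        fields = line.split("\t")
        qname, rname = fields[0], fields[2]  # SAM columns 1 (QNAME) and 3 (RNAME)
        if qname in read_names and rname in ref_names:
            kept.append(line)
    return kept


def count_reads(sam_lines):
    """Count distinct read names among alignment (non-header) lines."""
    return len({l.split("\t")[0] for l in sam_lines if not l.startswith("@")})


# Toy data with hypothetical names: three extracted reads,
# one of which (read3) aligns only to a related reference.
sam = [
    "@SQ\tSN:REF_X\tLN:1000",
    "@SQ\tSN:REF_RELATED\tLN:1000",
    "read1\t0\tREF_X\t10\t255\t5M\t*\t0\t0\tACGTA\t*",
    "read2\t0\tREF_X\t20\t255\t5M\t*\t0\t0\tACGTA\t*",
    "read2\t0\tREF_RELATED\t20\t255\t5M\t*\t0\t0\tACGTA\t*",
    "read3\t0\tREF_RELATED\t30\t255\t5M\t*\t0\t0\tACGTA\t*",
]
extracted = {"read1", "read2", "read3"}
filtered = filter_sam(sam, extracted, {"REF_X"})
print(len(extracted), "extracted;", count_reads(filtered), "left after filtering")
```

In this toy case the count drops from 3 to 2 because one extracted read has no alignment to the chosen reference. Whether something similar explains the drop from 25000 to 12500 in my data is exactly what I am unsure about.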