Import csv - lost Tetracladium genus


#1

Hi everyone,

While importing a CSV taxonomy table, I noticed a problem with one specific taxon: Tetracladium (genus). The other taxa (~250) import fine.

When the lines in the CSV file look like this:

#ID,s1,s2,s3,s4,s5,s6,s7,s8
Fungi;Ascomycota;Leotiomycetes;Helotiales;Helotiaceae;Tetracladium;Tetracladium_marchalianum
or
Fungi;Ascomycota;Leotiomycetes;Helotiales;Helotiaceae;Tetracladium;Tetracladium marchalianum
or
Fungi;Ascomycota;Leotiomycetes;Helotiales;Helotiaceae;Tetracladium;Tetracladium
or
Fungi;Ascomycota;Leotiomycetes;Helotiales;Helotiaceae;Tetracladium

Megan6 imports this only down to the family level (Helotiaceae), and the final tree looks like this:

  •    Fungi: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
        Dikarya: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
          Ascomycota: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
            saccharomyceta: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
              Pezizomycotina: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
                leotiomyceta: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
                  sordariomyceta: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
                    Leotiomycetes: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
                      Helotiales: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
                        Helotiaceae: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
    

If I understand correctly, the taxon should in that case be read from the rightmost entry: Megan should pick up Tetracladium and then assign the upper levels, as it does with the rest of my taxa. Where is the problem?

However, if I remove everything before the genus or species level, for example leaving only the species:

#ID,MZ1,MZ2,MZ3,MZ4,MZ5,MZ6,MZ7,MZ8
Tetracladium marchalianum,51,0,0,544,0,17,143,59

everything is OK; the Megan6 assignment and tree are correct:

  •     Fungi: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
        Dikarya: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
          Ascomycota: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
            mitosporic Ascomycota: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
              Tetracladium: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
                Tetracladium marchalianum: 51.0,0.0,0.0,544.0,0.0,17.0,143.0,59.0
    

This causes a huge loss of data (summed up, Tetracladium accounts for about 20% of some of my samples). After importing into Megan6, it looks as if Tetracladium is completely absent, which may lead to misinterpretation of the results. Most importantly, I was lucky to spot this problem, but in many cases this kind of issue may go unnoticed.

Does anyone have an idea?
Best wishes :slight_smile:

Version MEGAN Community Edition (version 6.10.6, built 20 Dec 2017)


#2

The problem is that your path does not coincide with the NCBI taxonomy.

This is how NCBI classifies Tetracladium:

root; cellular organisms; Eukaryota; Opisthokonta; Fungi; Dikarya; Ascomycota; Ascomycota incertae sedis; Tetracladium;

This is how NCBI classifies Helotiaceae:

root; cellular organisms; Eukaryota; Opisthokonta; Fungi; Dikarya; Ascomycota; saccharomyceta; Pezizomycotina; leotiomyceta; sordariomyceta; Leotiomycetes; Helotiales; Helotiaceae;

Your path:

Fungi;Ascomycota;Leotiomycetes;Helotiales;Helotiaceae;Tetracladium

MEGAN matches your path against the NCBI classification starting at the root and follows it as far as it can into the NCBI tree. That is why the assignment ends at Helotiaceae, which is the last point of agreement between your path and the NCBI classification.
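To make the root-down matching concrete, here is a toy sketch in Python. It is not MEGAN's actual code: the tree contains only the two NCBI lineages quoted above, and the names `match_path` and `is_descendant` are made up for illustration. It also assumes that a path may skip intermediate NCBI nodes (such as Dikarya), so each name only has to be a descendant of the previously matched node.

```python
# Toy model of root-down path matching (an illustration, not MEGAN's implementation).
# The tree holds only the two NCBI lineages quoted in this post.
LINEAGES = [
    ["root", "cellular organisms", "Eukaryota", "Opisthokonta", "Fungi",
     "Dikarya", "Ascomycota", "Ascomycota incertae sedis", "Tetracladium"],
    ["root", "cellular organisms", "Eukaryota", "Opisthokonta", "Fungi",
     "Dikarya", "Ascomycota", "saccharomyceta", "Pezizomycotina",
     "leotiomyceta", "sordariomyceta", "Leotiomycetes", "Helotiales",
     "Helotiaceae"],
]

# child -> parent map built from the lineages
PARENT = {}
for lineage in LINEAGES:
    for parent, child in zip(lineage, lineage[1:]):
        PARENT[child] = parent


def is_descendant(name, ancestor):
    """Return True if `ancestor` lies on the way from `name` up to the root."""
    node = name
    while node in PARENT:
        node = PARENT[node]
        if node == ancestor:
            return True
    return False


def match_path(path, sep=";"):
    """Walk the path from the root and stop at the last name that still agrees."""
    current = "root"
    for name in path.split(sep):
        name = name.strip()
        if name in PARENT and is_descendant(name, current):
            current = name
        else:
            break  # first disagreement with the tree: stop here
    return current


print(match_path("Fungi;Ascomycota;Leotiomycetes;Helotiales;Helotiaceae;Tetracladium"))
# prints: Helotiaceae
```

In this toy tree Tetracladium sits under "Ascomycota incertae sedis", not under Helotiaceae, so the walk stops one step before it. A bare `Tetracladium` with no path matches directly.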

The alternative would be to assign by name, which means removing the path. To be safe, since some taxon names are not unique, use taxon IDs instead.
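If you want to strip the paths automatically, something like the following Python sketch could work. It assumes that each data row has the path in the first column followed by the counts, that `;` separates the path levels, and that lines starting with `#` are headers. The function name `strip_paths` and the example row (a path from the first post combined with its counts) are only for illustration.

```python
import csv
import io


def strip_paths(csv_text, sep=";"):
    """Replace the path in the first column by its last (rightmost) name."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        # Leave header lines (starting with '#') and empty rows untouched.
        if row and not row[0].startswith("#"):
            segments = [s.strip() for s in row[0].split(sep) if s.strip()]
            if segments:
                row[0] = segments[-1]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input row: path plus counts in one line.
example = (
    "#ID,MZ1,MZ2,MZ3,MZ4,MZ5,MZ6,MZ7,MZ8\n"
    "Fungi;Ascomycota;Leotiomycetes;Helotiales;Helotiaceae;Tetracladium;"
    "Tetracladium marchalianum,51,0,0,544,0,17,143,59\n"
)
print(strip_paths(example), end="")
# prints:
# #ID,MZ1,MZ2,MZ3,MZ4,MZ5,MZ6,MZ7,MZ8
# Tetracladium marchalianum,51,0,0,544,0,17,143,59
```

This only covers the assign-by-name route. Names written with underscores would still need to be converted to the NCBI spelling, and mapping names to taxon IDs needs a lookup against the NCBI taxonomy, which is not shown here.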
D


#3

Thank you @Daniel, it’s clear to me now :slight_smile: