I think I am making progress.
The ncbi.tre file (Newick format) shows the tree using NCBI taxonomy IDs as node labels.
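To check that I'm reading the tree correctly, here is a minimal Python sketch that pulls every node label out of a Newick string. It assumes that the labels in ncbi.tre are plain, unquoted taxonomy IDs, optionally followed by a `:branch_length` suffix. `newick_labels` is a hypothetical helper, not part of any tool, and it does not handle quoted labels or `[comments]`.

```python
import re

# Tokens are either a single Newick delimiter or a run of non-delimiter text.
_TOKEN = re.compile(r"[(),;]|[^(),;]+")


def newick_labels(newick: str) -> list[str]:
    """Return node labels (leaves and internal nodes) in order of appearance.

    Hypothetical helper: assumes unquoted labels, optionally followed by
    ':branch_length'. Quoted labels and [comments] are not handled.
    """
    labels = []
    for token in _TOKEN.findall(newick):
        if token in ("(", ")", ",", ";"):
            continue
        # Drop any ':branch_length' suffix and surrounding whitespace.
        label = token.split(":", 1)[0].strip()
        if label:
            labels.append(label)
    return labels


print(newick_labels("((562:0.1,9606:0.2)2,10239)1;"))
# ['562', '9606', '2', '10239', '1']
```

The tree string in the example is made up; internal nodes carry labels after their closing parenthesis, which is how a taxonomy-ID tree can label every rank and not just the leaves.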
The ncbi.map file maps each taxonomic name to its taxonomy ID.
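Here is how I imagine the name-to-ID lookup working. This is a sketch under an unverified assumption: that each line of ncbi.map is tab-separated, with the taxonomy ID in the first column and the name in the second, and any further columns ignored. I have not confirmed the real column layout, and `load_name_to_id` is just an illustrative name.

```python
def load_name_to_id(lines):
    """Build a taxonomic-name -> taxonomy-ID dictionary from map-file lines.

    ASSUMPTION (unverified): each line is '<taxid><TAB><name>', possibly with
    extra tab-separated columns, which are ignored.
    """
    name_to_id = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].strip().isdigit():
            continue  # skip blank, header, or malformed lines
        name_to_id[fields[1].strip()] = int(fields[0])
    return name_to_id


example = ["562\tEscherichia coli\n", "9606\tHomo sapiens\n"]
print(load_name_to_id(example))  # {'Escherichia coli': 562, 'Homo sapiens': 9606}
```

In practice the lines would come from `open("ncbi.map")`, but passing a list keeps the sketch testable without the real file.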
It looks like the .lvl files specify levels for filtering. Can the values be anything, or are there limits? I notice you use the range 0-100 and cluster the values at the top and bottom of that range. I guess this file may be optional.
But how is the link made between the sequence ID and the taxonomic name (found in square brackets)? My DIAMOND output files only contain the protein ID. Do you actually query the NCBI database to get the taxonomic name?
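If the link is the bracketed name in the database FASTA headers (the NCBI nr convention, e.g. a made-up header like `>PROT_0001 hypothetical protein [Escherichia coli]`), then it would have to be recovered from the database file itself, because the search output only carries the subject ID. A minimal sketch, assuming the protein ID is the first whitespace-delimited token of the header and the name is the last `[...]` group at the end of the line. `header_to_id_and_name` and `index_fasta` are hypothetical helpers.

```python
import re

# The last '[...]' group at the end of a header, e.g. '... [Escherichia coli]'.
_TAXON_IN_BRACKETS = re.compile(r"\[([^\[\]]+)\]\s*$")


def header_to_id_and_name(header):
    """Split a FASTA header into (protein ID, bracketed taxon name or None).

    ASSUMPTION: the ID is the first whitespace-delimited token after '>'.
    """
    text = header.lstrip(">").strip()
    tokens = text.split(None, 1)
    protein_id = tokens[0] if tokens else ""
    match = _TAXON_IN_BRACKETS.search(text)
    return protein_id, (match.group(1).strip() if match else None)


def index_fasta(lines):
    """Map protein ID -> taxon name for every header that carries a [name]."""
    index = {}
    for line in lines:
        if line.startswith(">"):
            protein_id, name = header_to_id_and_name(line)
            if protein_id and name:
                index[protein_id] = name
    return index


fasta = [">PROT_0001 hypothetical protein [Escherichia coli]\n", "MKTAYIAK\n",
         ">PROT_0002 no taxon given\n", "MSTNPKPQ\n"]
print(index_fasta(fasta))  # {'PROT_0001': 'Escherichia coli'}
```

Headers without a bracketed name are simply skipped, so those proteins would stay unassigned rather than being guessed.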
Here is the scenario: I have a database in FASTA format. Can I just put the taxonomic name in square brackets in each sequence description, build the indices, run the search (DIAMOND or BLAST), and then load the search results together with the appropriate .tre and .map files? I think I'm missing a link.
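Putting the pieces together, this is the chain I think is needed: DIAMOND tabular output (`--outfmt 6`, whose default columns put the subject ID in column 2 and the bit score in column 12) gives a protein ID, the database headers turn that into a bracketed name, and the .map file turns the name into a taxonomy ID. The sketch below assumes those two lookup dictionaries already exist. It uses a simple best-hit assignment purely for illustration, not a lowest-common-ancestor (LCA) assignment, and `best_hit_taxa` is a hypothetical name.

```python
def best_hit_taxa(diamond_lines, id_to_name, name_to_taxid):
    """Assign each query the taxonomy ID of its highest-bit-score hit.

    Assumes DIAMOND '--outfmt 6' default columns: column 2 (index 1) is
    sseqid and column 12 (index 11) is bitscore. Best hit only, no LCA.
    Returns None for a query whose subject ID or name cannot be resolved.
    """
    best = {}  # query ID -> (bit score, subject ID)
    for line in diamond_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a default 12-column line
        query, subject, score = fields[0], fields[1], float(fields[11])
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)

    taxa = {}
    for query, (_, subject) in best.items():
        name = id_to_name.get(subject)  # protein ID -> bracketed name
        taxa[query] = name_to_taxid.get(name) if name else None
    return taxa


hits = [
    "read1\tPROT_0001\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n",
    "read1\tPROT_0002\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-40\t150\n",
    "read2\tPROT_9999\t70.0\t100\t30\t0\t1\t300\t1\t100\t1e-30\t120\n",
]
id_to_name = {"PROT_0001": "Escherichia coli", "PROT_0002": "Homo sapiens"}
name_to_taxid = {"Escherichia coli": 562, "Homo sapiens": 9606}
print(best_hit_taxa(hits, id_to_name, name_to_taxid))
# {'read1': 562, 'read2': None}
```

The hit lines and protein IDs are invented; `read2` shows what happens when a subject ID has no bracketed name in the database, which is exactly the gap the question is about.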