Taxon Letter Codes in Soil Taxonomy

Andrew Brown

2021-04-18

Four “levels” of the hierarchy in Soil Taxonomy are represented by letter codes: Soil Order, Suborder, Great Group, and Subgroup.

In the SoilTaxonomy package the level argument is used by some functions to specify output for a target level within the hierarchy. Other functions determine level by comparison against known taxa or codes. This vignette covers the basics of how taxon letter code conversion to and from taxonomic names is implemented.

library(SoilTaxonomy)

taxon_to_taxon_code

taxon_to_taxon_code converts a taxon name (Soil Order, Suborder, Great Group, or Subgroup) to a letter code that corresponds to the logical position of that taxon in the Keys to Soil Taxonomy.

Gelisols are the first Soil Order to key out, so they are given the letter code “A”.

taxon_to_taxon_code("gelisols")
#> gelisols 
#>      "A"

The number of letters in a taxon code corresponds to the level of that taxon. Histels are the first Suborder to key out in the Gelisols key (A), so they are given the two-letter code “AA”.

taxon_to_taxon_code("histels")
#> histels 
#>    "AA"

For each “step” in each key, the letter codes are “incremented” by one.

Glacistels are the second Great Group to key out in the Histels key (AA), so they have the three-letter code “AAB”.

taxon_to_taxon_code("glacistels")
#> glacistels 
#>      "AAB"

Typic subgroups, by convention, are the last subgroup to key out in a Great Group.

taxon_to_taxon_code("typic glacistels")
#> typic glacistels 
#>           "AABC"

Since Typic Glacistels have code "AABC", we can infer that the Glacistels key contains three subgroups, with codes "AABA", "AABB" and "AABC".
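The inference above can be sketched in base R. The helper codes_through is hypothetical (it is not part of the SoilTaxonomy package) and assumes a single code whose last letter is uppercase:

```r
# Hypothetical helper, not part of SoilTaxonomy: list every code in a key
# from the first step ("...A") through the supplied code.
# Assumes one code whose final letter is uppercase.
codes_through <- function(code) {
  n      <- nchar(code)
  parent <- substr(code, 1, n - 1)  # code of the enclosing key, e.g. "AAB"
  last   <- substr(code, n, n)      # final letter, e.g. "C"
  paste0(parent, LETTERS[seq_len(match(last, LETTERS))])
}

codes_through("AABC")
#> [1] "AABA" "AABB" "AABC"
```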

The same pattern holds for Great Groups with many more subgroups. When a Great Group has more than 26 subgroups, a fifth, lowercase letter is appended to “extend” the code so that it can be incremented beyond 26.

The Haploxerolls key needs this extension: its Typic subgroup has code "IFFZh".

taxon_to_taxon_code("typic haploxerolls")
#> typic haploxerolls 
#>            "IFFZh"

From this code we infer that the Haploxerolls key has \(26+8=34\) subgroups corresponding to the range from IFFA to IFFZ plus IFFZa to IFFZh.
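The arithmetic can be checked with a small base R sketch. The helper position_in_key is hypothetical, and it assumes that a lowercase extension letter only ever follows "Z", as in "IFFZh":

```r
# Hypothetical helper, not part of SoilTaxonomy: position of a code within
# its own key. Assumes a lowercase extension letter always follows "Z".
position_in_key <- function(code) {
  n       <- nchar(code)
  last    <- substr(code, n, n)
  has_ext <- grepl("[a-z]$", code)
  # No extension: position of the final uppercase letter (A = 1, ..., Z = 26).
  # Extension: 26 uppercase steps plus the position of the lowercase letter.
  ifelse(has_ext, 26L + match(last, letters), match(last, LETTERS))
}

position_in_key(c("AABC", "IFFZ", "IFFZh"))
#> [1]  3 26 34
```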

taxon_code_to_taxon

We can use a vector of letter codes to do the inverse operation with taxon_code_to_taxon.

Above we determined the Glacistels Key contains three taxa with codes "AABA", "AABB" and "AABC". Let’s convert those codes to taxon names.

taxon_code_to_taxon(c("AABA", "AABB", "AABC"))
#>                AABA                AABB                AABC 
#>  "Hemic Glacistels" "Sapric Glacistels"  "Typic Glacistels"
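Conceptually, the two conversions are inverse lookups against a table of names and codes. The sketch below uses a toy table built only from codes shown in this vignette. The names toy_codes, name_to_code and code_to_name are hypothetical, and the package's own lookup may be implemented differently:

```r
# Toy lookup table limited to codes that appear in this vignette.
toy_codes <- c("Gelisols"          = "A",
               "Histels"           = "AA",
               "Glacistels"        = "AAB",
               "Hemic Glacistels"  = "AABA",
               "Sapric Glacistels" = "AABB",
               "Typic Glacistels"  = "AABC")

# Name -> code, ignoring case (hypothetical helper)
name_to_code <- function(x) {
  unname(toy_codes[match(tolower(x), tolower(names(toy_codes)))])
}

# Code -> name (hypothetical helper)
code_to_name <- function(x) {
  names(toy_codes)[match(x, toy_codes)]
}

name_to_code(c("gelisols", "typic glacistels"))
#> [1] "A"    "AABC"
code_to_name(c("AABA", "AABB", "AABC"))
#> [1] "Hemic Glacistels"  "Sapric Glacistels" "Typic Glacistels"
```

Names or codes missing from the toy table return NA in this sketch.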

taxon_to_level

We can infer from the length of the four-letter codes that all of the above are subgroup-level taxa. taxon_to_level confirms this.

taxon_to_level(c("Hemic Glacistels","Sapric Glacistels","Typic Glacistels"))
#> [1] "subgroup" "subgroup" "subgroup"
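The relationship between code length and level can be sketched in base R. The helper level_from_code is hypothetical. The level names "order" and "suborder" are assumed here by analogy with the "greatgroup" and "subgroup" values used in this vignette, and a trailing lowercase extension letter is treated as part of the subgroup code:

```r
# Hypothetical helper, not part of SoilTaxonomy: infer the level of a taxon
# from its letter code. "order" and "suborder" are assumed level names.
level_from_code <- function(code) {
  level_names <- c("order", "suborder", "greatgroup", "subgroup")
  # A trailing lowercase letter (e.g. "IFFZh") only extends a subgroup code
  base <- sub("[a-z]$", "", code)
  n <- nchar(base)
  n[n < 1 | n > 4] <- NA  # anything else has no level in this sketch
  level_names[n]
}

level_from_code(c("A", "AA", "AAB", "AABC", "IFFZh"))
#> [1] "order"      "suborder"   "greatgroup" "subgroup"   "subgroup"
```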

taxon_to_level can also identify a fifth (lower-level) family tier (level="family"). Soil family differentiae are not handled in the Order to Subgroup keys. Family names are formed by prepending comma-separated class names to the subgroup name. The classes used in a family name are determined by specific keys and apply variably depending on the subgroup-level taxonomy.

For instance, the soil family "Fine, mixed, semiactive, mesic Ultic Haploxeralfs" includes a particle-size class ("fine"), a mineralogy class ("mixed"), a cation exchange capacity (CEC) activity class ("semiactive") and a temperature class ("mesic").

taxon_to_level("Fine, mixed, semiactive, mesic Ultic Haploxeralfs")
#> [1] "family"

getTaxonAtLevel

A wrapper method around taxon letter code functionality is getTaxonAtLevel.

Say that you have the family-level taxon above and you want to determine its taxonomy at a higher (less detailed) level. getTaxonAtLevel(level="greatgroup") works out what to remove (the family- and subgroup-level modifiers) and returns the Great Group.

getTaxonAtLevel("Fine, mixed, semiactive, mesic Ultic Haploxeralfs", level = "greatgroup")
#> Fine, mixed, semiactive, mesic Ultic Haploxeralfs 
#>                                    "haploxeralfs"

If you request a more-detailed taxonomic level than what you start with, you will get an NA result.

For example, requesting the subgroup of a suborder-level taxon name ("Folists") is undefined.

getTaxonAtLevel("Folists", level = "subgroup")
#> Folists 
#>      NA
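In terms of letter codes, both behaviors follow from code length: a higher level is a shorter prefix of the code, and a more detailed level cannot be derived from a shorter code. The sketch below operates on codes rather than names. The helper code_at_level is hypothetical, and the level names "order" and "suborder" are assumptions:

```r
# Hypothetical helper, not part of SoilTaxonomy: truncate a letter code to a
# target level, or return NA when the code is less detailed than the target.
code_at_level <- function(code, level) {
  level_names <- c("order", "suborder", "greatgroup", "subgroup")
  target <- match(level, level_names)
  stopifnot(length(target) == 1, !is.na(target))
  base <- sub("[a-z]$", "", code)  # ignore a lowercase extension letter
  # At the subgroup level keep the full code, including any extension
  out <- if (target == 4L) code else substr(base, 1, target)
  out[nchar(base) < target] <- NA_character_
  out
}

code_at_level("JDGR", "greatgroup")  # Ultic Haploxeralfs -> Haploxeralfs
#> [1] "JDG"
code_at_level("BA", "subgroup")      # Folists has no subgroup-level code
#> [1] NA
```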

getParentTaxa

Another wrapper method around taxon letter code functionality is getParentTaxa. This function will enumerate the tiers above a particular taxon.

getParentTaxa("Fine, mixed, semiactive, mesic Ultic Haploxeralfs")
#> $`Fine, mixed, semiactive, mesic Ultic Haploxeralfs`
#>                    J                   JD                  JDG 
#>           "Alfisols"            "Xeralfs"       "Haploxeralfs" 
#>                 JDGR 
#> "Ultic Haploxeralfs"

Alternatively, you can specify the code argument instead of taxon.

getParentTaxa(code = "BAB")
#> $BAB
#>           B          BA 
#> "Histosols"   "Folists"

Converting the internally used taxon codes to taxon names can be disabled with convert = FALSE, which returns the parent codes themselves. This may be useful for certain applications.

getParentTaxa(code = c("BAA","BAB"), convert = FALSE)
#> $BAA
#> [1] "B"  "BA"
#> 
#> $BAB
#> [1] "B"  "BA"

decompose_taxon_code

For more general cases decompose_taxon_code might be useful. This is a function used by many of the above methods that returns a nested list result containing the letter code hierarchy.

decompose_taxon_code(c("BAA","BAB"))
#> $BAA
#> $BAA[[1]]
#> [1] "B"
#> 
#> $BAA[[2]]
#> [1] "BA"
#> 
#> $BAA[[3]]
#> [1] "BAA"
#> 
#> 
#> $BAB
#> $BAB[[1]]
#> [1] "B"
#> 
#> $BAB[[2]]
#> [1] "BA"
#> 
#> $BAB[[3]]
#> [1] "BAB"
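The nested result above is the set of successively longer prefixes of each code. A minimal base R sketch of that idea follows. The helper decompose_sketch is hypothetical and is not the package's implementation. Keeping a lowercase extension letter attached to the final element is an assumption:

```r
# Hypothetical helper, not the package implementation: split each code into
# its order, suborder, great group and subgroup prefixes.
decompose_sketch <- function(codes) {
  out <- lapply(codes, function(code) {
    base  <- sub("[a-z]$", "", code)  # uppercase part of the code
    n     <- nchar(base)
    parts <- as.list(substring(base, 1, seq_len(n)))
    parts[[n]] <- code                # keep any extension on the last element
    parts
  })
  names(out) <- codes
  out
}

unlist(decompose_sketch("BAB")[["BAB"]])
#> [1] "B"   "BA"  "BAB"
```

Dropping the last element of each list leaves the parent codes, which matches the getParentTaxa(convert = FALSE) output shown earlier.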

preceding_taxon_codes and relative_taxon_code_position

Another pair of functions, preceding_taxon_codes and relative_taxon_code_position, can be useful for comparing relative positions within Keys, or for counting the number of “steps” that it takes to reach a particular taxon.

preceding_taxon_codes returns a list of vectors containing all preceding codes.

For example, the AA suborder keys out before AB, and within the AB key, ABA and ABB precede ABC.

preceding_taxon_codes("ABC")
#> $ABC
#> [1] "AA"  "ABA" "ABB"
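The result above can be reproduced with a short base R sketch: at each level of the code, list the siblings that key out earlier. The helper preceding_sketch is hypothetical, handles a single uppercase-only code, and is not the package's implementation:

```r
# Hypothetical helper, not the package implementation: codes that key out
# before `code` at each level. Assumes one code with uppercase letters only.
preceding_sketch <- function(code) {
  out <- character(0)
  for (i in seq_len(nchar(code))) {
    prefix <- substr(code, 1, i - 1)               # key being walked through
    idx    <- match(substr(code, i, i), LETTERS)   # step reached in that key
    # Guard needed: paste0() with a zero-length argument would return prefix
    if (idx > 1) {
      out <- c(out, paste0(prefix, LETTERS[seq_len(idx - 1)]))
    }
  }
  out
}

preceding_sketch("ABC")
#> [1] "AA"  "ABA" "ABB"
```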

relative_taxon_code_position returns the number of taxa that key out before a taxon, plus \(1\), which gives the position of the taxon itself.

relative_taxon_code_position(c("A","AA","AAA","AAAA",
                               "AB","AAB","ABA","ABC",
                               "B","BA","BAA","BAB",
                               "BBA","BBB","BBC"))
#>    A   AA  AAA AAAA   AB  AAB  ABA  ABC    B   BA  BAA  BAB  BBA  BBB  BBC 
#>    1    1    1    1    2    2    2    4    2    2    2    3    3    4    5
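These positions are consistent with a simple rule: each letter contributes its alphabet position minus one, and one is added for the taxon itself. For example, "ABC" gives \((1-1)+(2-1)+(3-1)+1=4\). The sketch below applies that rule in base R. The helper position_sketch is hypothetical, assumes uppercase-only codes, and is not the package's implementation:

```r
# Hypothetical helper, not the package implementation: number of preceding
# steps across all levels of a code, plus one. Uppercase-only codes assumed.
position_sketch <- function(codes) {
  vapply(codes, function(code) {
    chars <- strsplit(code, "")[[1]]
    1L + sum(match(chars, LETTERS) - 1L)
  }, integer(1))
}

position_sketch(c("ABA", "ABC", "BAB"))
#> ABA ABC BAB 
#>   2   4   3
```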