OneZoom Beta: Popularity service

Overview

As part of OneZoom, we have developed methods to assign a phylogenetically informed popularity score to every species on the OneZoom tree of life (i.e. nearly all described species of life on Earth). The details of how it works are published on bioRxiv: "Dynamic visualisation of million-tip trees: the OneZoom project", Yan Wong, James Rosindell (2020), bioRxiv 2020.10.14.323055; doi: 10.1101/2020.10.14.323055.

Image from our publication showing the top twenty most popular species

We calculate our popularity index using our mappings from Open Tree of Life taxonomy IDs to Wikidata Qids. From the Qid, we can find the equivalent Wikipedia page for that taxon in any given language. Since the English-language Wikipedia is the most visited, we default to calculating popularity on the basis of activity on en.wikipedia.org. Note that this may give English-language-specific weights to the popularity measures. Given the predominance of visits from the US, we might also expect North American taxa to be ranked particularly highly.

The popularity value for a particular taxon is based on a summary of the average number of monthly Wikipedia page visits over a set period of time (truncated so as to remove spikes), combined with the size of the Wikipedia page. This base or raw popularity is then percolated up and down the tree, so that species gain some fraction of the popularity of their subspecies plus the popularity of their parent taxa. We call this the phylogenetic popularity, or simply popularity for short. Using the evolutionary tree, or phylogeny, means that popularity should be less influenced by the level of knowledge of a taxon (for example, if few people visit the page for a particular beetle species, but instead consult a page for a higher level in the taxonomic hierarchy). It also means that all species that can be placed on the tree should get some popularity measure, even if they do not have a Wikipedia page.
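To make the two stages described above more concrete, here is a toy sketch. It is not the published OneZoom method: dropping only the highest months, combining views and page size with a geometric mean, and the `fraction` weighting are all assumptions made for illustration, and every function name is hypothetical. See the bioRxiv paper for the actual formulae.

```python
from statistics import mean


def truncated_mean(monthly_views, trim=1):
    """Mean of monthly views after dropping the `trim` highest months.

    ASSUMPTION: spikes are removed by discarding the largest values only.
    """
    ordered = sorted(monthly_views)
    kept = ordered[:-trim] if 0 < trim < len(ordered) else ordered
    return mean(kept)


def raw_popularity(monthly_views, page_bytes):
    """ASSUMPTION: combine page views and page size with a geometric mean."""
    return (truncated_mean(monthly_views) * page_bytes) ** 0.5


def phylogenetic_popularity(parent, raw, fraction=0.5):
    """Percolate raw scores up and down a tree.

    `parent` maps each node to its parent (None for the root).
    ASSUMPTION: a node's score is its own raw score, plus the raw scores of
    all its ancestors, plus `fraction` of the raw scores of its descendants.
    Nodes missing from `raw` (e.g. no Wikipedia page) count as zero.
    """
    above = {node: 0.0 for node in parent}
    below = {node: 0.0 for node in parent}
    for node in parent:
        ancestor = parent[node]
        while ancestor is not None:
            above[node] += raw.get(ancestor, 0.0)
            below[ancestor] += raw.get(node, 0.0)
            ancestor = parent[ancestor]
    return {n: raw.get(n, 0.0) + above[n] + fraction * below[n] for n in parent}
```

Even in this toy version, a species with no page of its own still receives a non-zero score from its ancestors, which is the property the text highlights.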

A major issue with any such popularity statistic is that the value is subject to change as the OneZoom tree topology changes, and as Wikipedia visits and page size data change. For this reason, we recommend that you consider only the relative popularity values, i.e. the ranked order. It is also not necessarily clear what it means to compare the popularity of taxa at different levels of the taxonomic hierarchy. However, comparing species popularity values is reasonable, so to aid comparison, species (but not higher-level taxa) are ranked from 1 to n_taxa, where n_taxa is the maximum number of species on the OneZoom tree (the current value is also returned by each API call, as the tot_spp property described under "Return values").

Note that we do not normally include subspecies, such as Canis lupus familiaris (the domestic dog) on the OneZoom tree: only the full species, Canis lupus (the grey wolf, including the domestic dog, dingo, Indian wolf, etc.), will be displayed on the tree or returned using this API. Although there is an Open Tree identifier for Canis lupus familiaris, it will not be recognised if passed to this API. Additionally, OneZoom does not store higher level names for monospecific taxa, i.e. for a species like the tuatara, Sphenodon punctatus, the OTT for the species (35890) will be recognised, but searching for the genus Sphenodon (OTT 35886) will not return any results, nor will the higher level monospecific order Rhynchocephalia (OTT 35876). In a similar vein, where multiple taxonomic names describe the same set of species on the OneZoom tree, only the highest distinguishing level is used. For example, lancelets are classified in OneZoom as Cephalochordata (OTT 176555): the lower level family Branchiostomidae (OTT 176551), in which all living cephalochordates also fall, is not recognised in OneZoom or in this API.

Services available

Popularity services are accessed via URLs that return JSON data for any set of passed-in Open Tree Taxonomy identifiers (you cannot pass in scientific names). All services require you to specify an API key (e.g. by appending key=MY_API_KEY to the URL). This allows us to keep track of the volume of requests we get from different private users; no other information is stored. If you have not been given an API key, you can use the public key (0), which is restricted to 5 taxa per request, and 100 results for each taxon requested.

URL: `popularity/list`

An API to get a list of taxa ranked by popularity, e.g. https://beta.onezoom.org/popularity/list?key=0&otts=770315 (see examples below for more).

Parameters

Required

key: Your OneZoom API key (use the public key 0 if you have not been given one).
otts: A comma-separated list of Open Tree Taxonomy identifiers (one or more positive integers).

Optional

max: An integer giving the maximum number of taxa to return. By default this is equivalent to the number of OTTs passed in: if expand_taxa is true, you may wish to set this to a higher value. There is an upper limit on the maximum number of taxa you can ask for, which is determined by your API key.
expand_taxa: Set to True or False (1 or 0 also work). If True, include all species which are descendants of the given OTTs (i.e. higher level taxa are "expanded" into their constituent species, and the returned taxa are all at the species level). You can mix OTTs for species and higher taxa in a single call: only higher taxa will be expanded. If the parameter is False or not given, taxa are not expanded and data is simply returned for the OTTs that have been passed in (note that higher level taxa will not contain popularity_rank information, which is only valid for species).
spread_taxa_evenly: Set to True or False (1 or 0 also work). If True, divide the maximum number of taxa to return (e.g. as given by the max parameter) so that each of the N taxa passed in is limited to its top (max÷N) most popular species. Results are then combined and re-sorted, meaning that the final list of taxa returned may well contain fewer species than max. This only really makes sense when expand_taxa is set to 1.
names: Set to True or False (1 or 0 also work). If True, return a scientific (Latin) name for each returned taxon, if one is available. Names are usually (but not necessarily) the same as the scientific name listed on the Open Tree of Life.
include_raw: Set to True or False (1 or 0 also work). If True, also return the raw (non-phylogenetically informed) score for each taxon.
sort: If set to rank or raw, do not order species by phylogenetic popularity score then OTT id (the default), but use an alternative order instead. If raw, order by the raw (non-phylogenetic) score, which may be zero or absent for some taxa, such as those with no Wikipedia page (the raw score is then also returned, in the same way as when setting include_raw=True). If rank, order by species rank (lowest first, with higher level taxa at start), then by plain popularity score (highest first), and finally by OTT id. Since only species (not higher taxa) have a rank, this necessarily puts all species first.

Optional for developers / debugging

db_seconds: Set to True or False (1 or 0 also work). If True, return a db_seconds property giving the number of seconds taken to make the underlying database calls. Defaults to False.
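As a minimal sketch of how the parameters above fit together, the helper below assembles a `popularity/list` URL. The function name `popularity_list_url` and its keyword-argument style are hypothetical conveniences, not part of the OneZoom API; only the parameter names and the base URL come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://beta.onezoom.org/popularity/list"

# Boolean switches documented on this page; sent as 1 or 0.
BOOLEAN_FLAGS = {"expand_taxa", "spread_taxa_evenly", "names", "include_raw", "db_seconds"}


def popularity_list_url(otts, key="0", max_taxa=None, **options):
    """Build a popularity/list URL (hypothetical helper, for illustration)."""
    params = {"key": key, "otts": ",".join(str(int(ott)) for ott in otts)}
    if max_taxa is not None:
        params["max"] = int(max_taxa)
    for name, value in options.items():
        if name in BOOLEAN_FLAGS:
            params[name] = 1 if value else 0
        elif name == "sort":
            if value not in ("rank", "raw"):
                raise ValueError("sort must be 'rank' or 'raw'")
            params[name] = value
        else:
            raise ValueError(f"unknown parameter: {name}")
    # safe="," keeps the comma-separated OTT list readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


# The 10 most popular species of life, with names (OTT 93302 = biota).
print(popularity_list_url([93302], max_taxa=10, expand_taxa=True, names=True))
```

Rejecting unknown parameter names locally is a design choice of this sketch; how the service itself treats unrecognised parameters is not described here.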

Non-monophyletic groups, like "reptiles", can be queried by combining the OTTs for all their constituent parts into a single call (see examples below). However, such a combined query cannot then be used with the spread_taxa_evenly parameter to find the most popular species in each of several groups, because spread_taxa_evenly divides results between the individual OTTs passed in, not between groups of OTTs. So, for example, to find the 5 most popular reptiles and the 5 most popular plants, you must make 2 separate API calls: one with all the identifiers defining the reptile clades, and another with the identifier for the plant clade.
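The one-call-per-group approach can be sketched as follows. The helper name `urls_per_group` is hypothetical. Because this page lists no OTT for plants, the example pairs the reptile OTTs given in the examples section with Mammalia instead.

```python
from urllib.parse import urlencode


def urls_per_group(groups, per_group, key="0"):
    """Return one popularity/list URL per named group of OTTs.

    Each group is queried separately so that `max` applies to the group as
    a whole, rather than being divided between its constituent OTTs.
    """
    urls = {}
    for label, otts in groups.items():
        query = urlencode(
            {
                "key": key,
                "otts": ",".join(str(ott) for ott in otts),
                "expand_taxa": 1,
                "max": per_group,
            },
            safe=",",
        )
        urls[label] = "https://beta.onezoom.org/popularity/list?" + query
    return urls


groups = {
    "reptiles": [639666, 195672, 35881],  # Testudines, Crocodylia, Lepidosauria
    "mammals": [244265],                  # Mammalia
}
for label, url in urls_per_group(groups, per_group=5).items():
    print(label, url)
```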

Return values

In the absence of error, the returned JSON object contains the following properties:

header: An object containing the names of each column, mapped to an index into the "data" array (below). At a minimum the columns include "ott", "popularity", and "popularity_rank". Depending on the parameters passed in, they may also include "name". To get e.g. the popularity rank of the Nth species, you can thus do data[N][header.popularity_rank].
data: A list of arrays, one per valid OTT returned. Each array in the list contains a set of values as described in the header property. The list of arrays is sorted according to the sort parameter. Taxa above the species level have no popularity rank.
tot_spp: The total number of species in the OneZoom popularity list. This gives the maximum possible value for "popularity_rank". Quantiles for ranks can be obtained by dividing the rank by this number.
n_taxa: The total number of taxa that this call would have listed, were it not limited to returning only a certain number.
max_taxa_in: The maximum number of OTTs this API key is allowed to send in the comma-separated list.
max_taxa_out: The maximum number of taxa this API key is allowed to return.
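The sketch below shows one way to read these properties, using the header to look up column positions and tot_spp to turn a rank into a quantile. The `sample` object is made-up placeholder data in the documented shape, not real API output, and representing a missing rank as null (`None` in Python) is an assumption. The function names are hypothetical.

```python
# Made-up placeholder response in the documented shape (NOT real API output).
sample = {
    "header": {"ott": 0, "popularity": 1, "popularity_rank": 2, "name": 3},
    "data": [
        [111, 2000.5, 250, "Placeholder species"],
        [222, 1500.0, None, "Placeholder higher taxon"],  # ASSUMPTION: no rank -> null
    ],
    "tot_spp": 1000,
    "n_taxa": 2,
    "max_taxa_in": 5,
    "max_taxa_out": 500,
}


def rows_as_dicts(response):
    """Convert each data row into a dict keyed by the column names in header."""
    header = response["header"]
    return [{column: row[index] for column, index in header.items()}
            for row in response["data"]]


def rank_quantile(rank, tot_spp):
    """Rank divided by the total number of species; None if the taxon has no rank."""
    return None if rank is None else rank / tot_spp


for row in rows_as_dicts(sample):
    quantile = rank_quantile(row["popularity_rank"], sample["tot_spp"])
    print(row["ott"], row["name"], quantile)
```

Going through the header rather than hard-coding column positions keeps the code working when optional columns such as "name" are present or absent.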

Examples

The 10 most popular species of life, with names: https://beta.onezoom.org/popularity/list?expand_taxa=1&key=0&max=10&names=True&otts=93302 (NB: OTT 93302 = biota = all life)
The 15 most popular species of insect, with names: https://beta.onezoom.org/popularity/list?expand_taxa=1&key=0&max=15&names=1&otts=1062253 (NB: OTT 1062253 = Insecta)
The 50 most popular "reptiles" (turtles; crocodiles; lizards, snakes & tuatara; but not birds): https://beta.onezoom.org/popularity/list?expand_taxa=1&key=0&max=50&otts=639666,195672,35881 (NB: OTT 639666 = Testudines, 195672 = Crocodylia, 35881 = Lepidosauria)
The 30 most popular mammals and birds combined (i.e. 15 mammals and 15 birds), with names: https://beta.onezoom.org/popularity/list?expand_taxa=1&key=0&max=30&names=1&otts=244265,81461&spread_taxa_evenly=1 (NB: OTT 244265 = Mammalia, 81461 = Aves (birds))
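Any of the example URLs above can be fetched with Python's standard library, as in this minimal sketch. The function name `fetch_popularity` and its injectable `urlopen` argument (which allows testing without network access) are choices made here, not part of the OneZoom API. The format of error responses is not described on this page, so none is handled.

```python
import json
import urllib.request


def fetch_popularity(url, urlopen=urllib.request.urlopen, timeout=30):
    """Fetch a popularity URL and return the decoded JSON object."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires network access), e.g. the 15 most popular insects:
# result = fetch_popularity(
#     "https://beta.onezoom.org/popularity/list"
#     "?expand_taxa=1&key=0&max=15&names=1&otts=1062253"
# )
# name_col = result["header"]["name"]
# print([row[name_col] for row in result["data"]])
```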

Please remember to cite the OneZoom popularity index if you use it in your own project or publication. Use of the OneZoom popularity index is subject to our terms of use.