Data mashups

The main reason for developing this library is to let you connect data from your own work, or from other databases, to the tips of the Open Tree trees. Here is one example of how that might work, hooking up rfishbase to the Open Tree data:

Red Fish Blue Fish

We'll start by finding trees in the Open Tree data store that focus on everyone's favourite perches, the cichlids. We use studies_find_studies and get_study_meta to turn up the tree id, then fetch the tree with get_study_tree and read the resulting string into memory using phytools' read.newick.

NOTE: In future releases get_study_tree will return a phylo object by default:

ot_cichlids <- studies_find_studies(property="ot:focalCladeOTTTaxonName", 
                                    value="Cichlidae")

cichlid_summary <- get_study_meta("2655")
names(cichlid_summary$nexml$treesById$trees2655$treeById)
## [1] "tree6182" "tree6181"
tr_string <- get_study_tree(study="2655", tree="tree6182", "newick")
tr <- phytools::read.newick(text=tr_string)
## Warning: NAs introduced by coercion
plot(tr)

plot of chunk find a tree

That's a lot of tips! I wonder how many have data in FishBase? We can make searching the FishBase data easier by first pulling out only the cichlids (note that we also have to strip the single quotes from the taxon names):

library(rfishbase)
data(fishbase)

fb_cichlids <- fish.data[which_fish("Cichlidae", "Family")]
tr$tip.label <- gsub("'", "", tr$tip.label)
fb_ot_intersect <- tr$tip.label %in% sapply(fb_cichlids, "[[", "ScientificName")
tr_fb <- ape::drop.tip(tr, tr$tip.label[!fb_ot_intersect])
plot(tr_fb)
ape::axisPhylo()

plot of chunk fishbase
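The name-matching step above can be tried on its own with a few made-up labels. This is only a minimal sketch in base R: `toy_tips` and `toy_fb_names` are hypothetical stand-ins for `tr$tip.label` and the FishBase scientific names, not values taken from the real tree.

```r
# Hypothetical stand-ins for tr$tip.label and the FishBase names.
toy_tips <- c("'Cichla ocellaris'", "Astronotus ocellatus", "'Made up species'")
toy_fb_names <- c("Astronotus ocellatus", "Cichla ocellaris", "Oreochromis niloticus")

# Strip the single quotes that wrap some tip labels.
toy_clean <- gsub("'", "", toy_tips)

# TRUE where a cleaned tip label has an exact match among the FishBase names.
toy_in_fb <- toy_clean %in% toy_fb_names

# Tips that would be dropped from the tree, and a count of matches.
toy_clean[!toy_in_fb]
table(toy_in_fb)
```

Because `%in%` only finds exact matches, any remaining difference in spelling, spacing or synonymy will silently drop a tip, so it is worth looking at the unmatched labels before pruning.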

That's a bit more manageable. Let's follow up on Seuss (1960) and ask about the evolution of red and blue coloration in fish:

to_fb_intersect <- sapply(fb_cichlids, "[[", "ScientificName") %in% tr_fb$tip.label

intersect_cichlids <- fb_cichlids[to_fb_intersect]
names(intersect_cichlids) <- sapply(intersect_cichlids, "[[", "ScientificName")

grep_diagnosis <- function(x){
    grepl(x, sapply(intersect_cichlids[ tr_fb$tip.label ], "[[", "diagnostic"))
}
red_fish <- grep_diagnosis('red')
blue_fish <- grep_diagnosis('blue')

cols <- rep("black", length(blue_fish))
cols[red_fish] <- "red"
cols[blue_fish] <- "blue"

plot(tr_fb, tip.color=cols)
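Two properties of the colouring code above are easy to miss, and a toy example makes them visible. This is a minimal sketch in base R: the `toy_diagnosis` strings are invented for illustration and are not real FishBase entries.

```r
# Invented diagnosis strings standing in for the FishBase "diagnostic" field.
toy_diagnosis <- c(a = "body silver with dark bars",
                   b = "fins red",
                   c = "blue head, red tail",
                   d = "reddish flanks")

# grepl() does substring matching, so "red" also matches "reddish".
toy_red <- grepl("red", toy_diagnosis)
toy_blue <- grepl("blue", toy_diagnosis)

# Same assignment order as above: red first, then blue.
toy_cols <- rep("black", length(toy_diagnosis))
toy_cols[toy_red] <- "red"
toy_cols[toy_blue] <- "blue"
names(toy_cols) <- names(toy_diagnosis)
toy_cols
```

Species `c` mentions both colours and ends up "blue" because the blue assignment runs last, and species `d` is scored "red" on the strength of "reddish". Neither matters much for a playful example, but both are worth remembering before treating the colours as a real trait.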

plot of chunk redfish

We can even go a little further and do a (very boring) stochastic character map to understand the evolution of this trait:

names(cols) <- tr_fb$tip.label
res <- phytools::make.simmap(tr_fb, cols)
## 
## Warning: some elements of Q not numerically distinct from 0; setting to 1e-08 
## 
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##        black       blue       red
## black -8.930  3.561e+00  5.37e+00
## blue   3.561 -3.561e+00  1.00e-08
## red    5.370  1.000e-08 -5.37e+00
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  black   blue    red 
## 0.3333 0.3333 0.3333
## Done.
described <- phytools::describe.simmap(res)
plot(described)

plot of chunk char_mapping
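The transition matrix printed by make.simmap above can be inspected a little more closely. The sketch below simply retypes the printed (rounded) values into a base R matrix; the object names `toy_states` and `toy_Q` are hypothetical, and nothing here re-runs the phytools fit.

```r
# The rounded Q values as printed in the make.simmap output.
toy_states <- c("black", "blue", "red")
toy_Q <- matrix(c(-8.930,  3.561, 5.370,
                   3.561, -3.561, 1e-08,
                   5.370,  1e-08, -5.370),
                nrow = 3, byrow = TRUE,
                dimnames = list(toy_states, toy_states))

# Rows of a continuous-time Markov rate matrix sum to zero
# (here only approximately, because the printed values are rounded).
rowSums(toy_Q)

# The printed matrix is symmetric: each forward rate equals its reverse rate.
isSymmetric(toy_Q)

# The direct blue <-> red rate sits at the 1e-08 floor mentioned in the warning.
toy_Q["blue", "red"]
```

Read this way, the fitted model has essentially no direct changes between blue and red: in the sampled histories, any such switch has to pass through "black".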