Do you know how many species (other than humans) there are in the world? Trick question. We actually don’t know the answer. Estimates vary, but some put the number anywhere between a few million and 100 million species. So why should we care?
Monitoring biodiversity has many useful applications. What if I told you that there is a species of mosquito called the Asian Tiger Mosquito that is invasive, has high biting potential, and is known to carry disease[i]? This mosquito is also expected to significantly expand its territory within the northeastern United States if temperatures continue to increase as predicted[ii]. Before you write this off as another sensationalist news bite, consider the great benefit of having this knowledge: scientists have (1) identified this species and (2) predicted its expansion before it has happened, so we can do something about it. So what about all of the other species we don’t know about?
And what if I told you that scientists are predicting one third of the world’s species will be extinct by 2100? As Dr. Paul Hebert of the University of Guelph put it, “…imagine if astronomers predicted ‘last light’ for a third of the luminescent objects in the universe within a human lifetime.” Wouldn’t you want some record of what those species were before they went extinct? Wouldn’t you want to know how their loss will affect your local environment, the places you visit, or the food you eat? Or better yet, wouldn’t you want to know how you might be able to save some of these species?
So far I’ve posed a lot of questions, but I would like to discuss some possible answers that combine biology, technology, and SAP’s love of Big Data. The International Barcode of Life Project (iBOL) has been working for almost 10 years on building a DNA-based barcode identification system for all multi-cellular life. Led by the Biodiversity Institute of Ontario at the University of Guelph, this project includes more than 25 countries. With over 2.2 million barcodes to date, it is estimated that, when complete, the barcode library for the entire animal kingdom will be 50 times its current size[iii]. iBOL represents a very systematic way of defining and tracking species. Even better, this information is accessible online via the Barcode of Life Data (BOLD) Systems.
Now I’ve mentioned biology and some technology, so what about Big Data and SAP? The real question is: why would SAP be interested in bioinformatics? Well, we already know that SAP HANA can help life science companies offering genome analytic services process data in minutes. Did I also mention that BOLD data and future sequencing data are really, really big? In fact, one sample can yield anywhere between 350 MB and 8 GB of data. The SAP fit seems obvious.
How many people can say that they get to collaborate with leading experts in bioinformatics to help revolutionize how humanity interacts with biodiversity? Mosquito monitoring is really only the tip of the iceberg of what this data can do for the human race, and I’m proud that our Emerging Technologies team in SAP Waterloo is a part of this greater global initiative.