Over the course of my last few blogs I have discussed and looked at how SAP HANA interacts with Hadoop and the Big Data world. Needless to say, I was very excited when Databricks announced its partnership with SAP on July 1st. You can read the Databricks press release here. This completes the trilogy, as SAP HANA now supports all three approaches to SQL on Hadoop: batch scale, disk optimized, and now Spark in-memory. This means that SAP can work with Hadoop and NoSQL and meet the varied needs of customers as they adopt and begin to utilize these technologies in their data fabric. Integration with Hadoop just got better, and this is great news for SAP and Hadoop. Databricks has created a Spark distribution, based on Apache Spark 1.0, that is integrated with SAP HANA. You can download it here.
Someone at the recent Hadoop Summit North America show asked me if I felt that in-memory on Hadoop was a threat to SAP and SAP HANA. My response then, as now, is no: it is not a threat but a complementary piece of the puzzle that makes things better. There are times when you must move your data around and times when you will want to access it in situ, and SAP HANA's smart data access lets you do just that. The only downside is the latency of the target system. You get instant results and speed from SAP HANA, but if the results include data residing in another system, the query will be only as fast as the response from that other system. Now, with Spark providing in-memory access for HDFS and NoSQL, the responses back from the Hadoop systems will be faster. Less latency means better performance overall, and the faster the response, the more HANA's smart data access will be used to join data optimally stored across the data fabric.

The follow-up question was: will in-memory Hadoop not replace or mitigate the need for HANA? To which I noted no, it cannot. Would you really put your vital, mission-critical data in Hadoop, which at this point does not offer ACID (Atomicity, Consistency, Isolation, Durability) based persistence to guarantee that database transactions are processed reliably? The Hadoop community may be working toward this, but it is a long way off. Hadoop is a great place to store contextual data that can be combined with enterprise data to enrich it, and now Spark can provide faster access to that data. Besides, SAP HANA is much more than just an in-memory data store: it is a platform providing an application server, web server, powerful libraries, and processing engines, all where the data resides. So I see in-memory Hadoop as complementary and enabling, not a threat.
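To make the latency point concrete, here is a minimal sketch in plain Python (no SAP or Spark APIs, and the millisecond figures are made up for illustration): in a federated query, the join can only complete once the slowest participating system has responded, so speeding up the Hadoop side with in-memory Spark lowers end-to-end response time.

```python
def federated_query_time(local_ms, remote_ms_list):
    """Response time for a query joining local data with remote sources.

    The local scan and the remote scans can run in parallel, so the
    federated join is gated by whichever source answers last.
    """
    return max(local_ms, *remote_ms_list)

# Hypothetical numbers: HANA answers its part in 50 ms, while a
# disk-based SQL-on-Hadoop engine takes ~2000 ms for its part.
print(federated_query_time(50, [2000]))  # gated by the Hadoop side: 2000

# Same query, but Spark serves the HDFS data from memory in ~200 ms.
print(federated_query_time(50, [200]))   # 200
```

The numbers are arbitrary, but the shape of the function is the argument: shrinking the remote term directly shrinks the overall query time, which is why faster Hadoop access makes smart data access more attractive, not less.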
There is great power in bringing the compute to the data rather than doing it the other way around; this is why SAP HANA was conceived and built. And in the evolving IT landscape, an optimized data fabric that needs to be woven together, SAP HANA's smart data access can help achieve this. The integration of SAP HANA and Spark gives developers and data scientists a simpler process: start from a single system, integrate data stored throughout the data fabric, and bring the result sets back. SAP HANA provides Spark developers rich analytics capabilities that do not currently exist in Spark (Text Analytics, OLAP, Geospatial, OLTP), and they can use HANA to do all the compute and return the results (not move the data) to Spark. They also get access to the apps built on SAP HANA and can use its smart data access capability to reach data in situ across a variety of data stores in the data fabric. This integration means that those developing on SAP HANA can now enrich enterprise apps and analytics with insights derived from contextual data stored on HDFS and NoSQL systems, along with data in other data stores. And keeping with the open, agnostic SAP approach that I have discussed and expounded previously, the Spark integration accessing HDFS can work with any Hadoop distribution, so SAP continues to be agnostic and open around Hadoop. Though, as I have often pointed out before, we do have a reseller agreement with Hortonworks.
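To give a feel for what "access data in situ" looks like in practice, here is a heavily hedged sketch of the general shape of smart data access: register a remote source, expose a remote table as a virtual table, then join it as if it were local so only the result set travels. These are illustrative SQL strings assembled in Python, not working syntax; the adapter name, host, and all table names (SPARK_SRC, WEB_LOGS, SALES) are hypothetical, and the exact DDL varies by HANA release, so consult the SAP HANA documentation for your version.

```python
# 1. Register the remote Spark/Hadoop endpoint as a source HANA can reach.
#    Adapter name and connection string are placeholders.
create_source = (
    'CREATE REMOTE SOURCE "SPARK_SRC" ADAPTER "sparksql" '
    "CONFIGURATION 'server=sparkhost;port=10000'"
)

# 2. Expose a remote HDFS-backed table as a local virtual table.
create_vtable = (
    'CREATE VIRTUAL TABLE "WEB_LOGS" '
    'AT "SPARK_SRC"."<NULL>"."default"."web_logs"'
)

# 3. Join enterprise data with the contextual data where it lives;
#    the remote rows are not bulk-copied into HANA.
federated_join = (
    "SELECT s.customer_id, COUNT(*) AS clicks "
    'FROM "SALES" s JOIN "WEB_LOGS" l ON s.customer_id = l.customer_id '
    "GROUP BY s.customer_id"
)

for stmt in (create_source, create_vtable, federated_join):
    print(stmt)
```

The design point is step 3: once the virtual table exists, the developer writes an ordinary join and lets the optimizer decide what to push down to the remote system, which is exactly the "single system" experience described above.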
This integration is a beautiful synergy and a nice step forward as we weave together the data fabric of your enterprise.