Solving Big Data with SAP HANA and Hadoop

Balaji Krishna

Posted by Balaji Krishna on August 27, 2012


It’s clear that mushrooming data growth, coupled with the declining relative cost of technology capable of managing and leveraging that information, has led many organizations to initiate (or consider initiating) Big Data analytics programs.

Large distributed retail networks generate huge amounts of rich data over time – sales, tender, money movements and inventory are the raw fuel for analytics. The challenge is processing that volume of information as close to real time as possible for the benefit of the business.

With the advent of Big Data we are seeing a lot of customers get excited about NoSQL (“Not Only SQL”) databases like Cassandra, MongoDB, etc. One of the key value propositions of these new kinds of databases is their ability to run on commodity hardware, plus the flexibility of not being restricted to SQL programming.

With Hadoop, users can load and retrieve data using general-purpose programming languages like Java, which opens the platform up to a much wider developer audience. Key advantages of using Hadoop are:

  • Data is distributed across several nodes, so computation can run on the local node that holds the data.
  • Partial failures are easy to handle.
  • MapReduce tasks use commonly known programming models.
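The MapReduce programming model behind these advantages can be illustrated with a minimal, single-process sketch. This is a hypothetical word-count example, not Hadoop code: a real job would be written against the Hadoop Java API and run distributed across a cluster, but the map → shuffle → reduce flow is the same.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(records):
    return reduce_phase(shuffle(map_phase(records)))
```

For example, `word_count(["big data", "Big insight"])` yields `{"big": 2, "data": 1, "insight": 1}`. In Hadoop, each phase would run in parallel on the nodes where the data already lives.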

Although most of what Hadoop offers is positive, there are scenarios it cannot address by default, such as delivering real-time reporting. Hadoop’s strength is batch reporting, where delta loads into Enterprise Data Warehouses run only once or twice a day.

These are scenarios where a solution like SAP HANA, with its real-time data replication support using SLT or Sybase replication (on the roadmap), can deliver value to companies that need to address analytic needs as soon as new data is loaded through their transaction systems. Some of the scenarios that support this use case are:

  • Track the effect of marketing initiatives close to real-time.
  • Inventory levels and margins are tracked much faster, enabling profit maximizing decisions.
  • Fraud detection, especially around the use of credit cards, can be more reactive.

Customers like T-Mobile and Honeywell have taken advantage of these real-time analytics scenarios to get ahead of their competition by delivering a much more agile experience to their end customers.

MKI is one customer using this powerful combination of SAP HANA and Hadoop to reduce the time it takes to perform genome analysis when treating cancer patients, comparing genome data between healthy individuals and affected patients.

By using SAP HANA as the mission-critical, reliable genome data platform, MKI can deliver advanced medical treatment faster in the following scenarios:

  • Case analysis: comprises Fragment Extraction, High-Speed Entry, and Genome DNA Extraction. First, all cases are collected and the data is preprocessed using Hadoop (fragment extraction). Then HANA is used for fast data analysis to find patterns in the genome fragments and uncover the relationship between the genome and the case.
  • Data consumption: with the genome fragment library and its relationships to cases in place, a doctor can collect a patient’s genome and send it to the system for fragment comparison. Based on the knowledge library, the doctor can recommend the most appropriate treatment for the patient.
  • Case study: each new clinical case is sent to researchers for further study, which improves the accuracy of the knowledge library.
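To make the fragment-matching idea above concrete, here is a deliberately simplified, hypothetical sketch: split a genome sequence into fixed-length fragments (the kind of preprocessing Hadoop handles at scale), then score a patient’s fragments against a reference sequence (the kind of comparison HANA would accelerate in memory). MKI’s actual pipeline is far more sophisticated; none of these function names come from their system.

```python
def extract_fragments(sequence, k=4):
    # Slide a window of length k over the sequence to produce
    # the set of overlapping fragments (k-mers).
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def overlap_score(patient_seq, reference_seq, k=4):
    # Fraction of the patient's fragments that also appear in the
    # reference sequence; higher means a closer match.
    patient = extract_fragments(patient_seq, k)
    reference = extract_fragments(reference_seq, k)
    return len(patient & reference) / len(patient) if patient else 0.0
```

At scale, fragment extraction is an embarrassingly parallel map step, while the scoring lookups against a large fragment library benefit from an in-memory store.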

Another use case where Hadoop and SAP HANA were used in conjunction to create value based on the criticality of data comes from Cognylitics. This application enables financial institutions to monitor the risk and performance of their portfolios, streamline operations, and help ensure regulatory compliance in real time, enhancing profitability.

This solution stores consumers’ mortgage information in Hadoop, applies complex predictive algorithms to this data to identify the mortgages most likely to become delinquent, and runs these algorithms (Business Function Libraries) inside the predictive engine of SAP HANA. The ability to combine customers’ corporate data with unstructured data coming from social media adds a level of intelligence to the resulting analytics that was previously out of reach.
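The idea of blending structured mortgage attributes with an unstructured social-media signal can be sketched as follows. This is a toy illustration with invented weights and keywords, not SAP’s Business Function Library: in the real solution, fitted predictive models run inside HANA’s predictive engine against data staged from Hadoop.

```python
# Crude unstructured signal: count of negative keywords in public posts.
NEGATIVE_WORDS = {"layoff", "foreclosure", "bankruptcy", "default"}

def social_risk(posts):
    words = " ".join(posts).lower().split()
    return sum(1 for w in words if w in NEGATIVE_WORDS)

def delinquency_score(loan_to_value, missed_payments, posts):
    # Toy linear score with made-up weights; a real model would be
    # trained on historical delinquency outcomes.
    return 0.5 * loan_to_value + 0.3 * missed_payments + 0.2 * social_risk(posts)

def most_at_risk(portfolio):
    # portfolio: list of (loan_id, loan_to_value, missed_payments, posts)
    return max(portfolio, key=lambda m: delinquency_score(*m[1:]))[0]
```

Even in this toy form, the structure mirrors the blog’s point: the structured features come from corporate systems, while the social-media term would be impossible to compute without bringing unstructured data into the same analysis.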

Some of the technologies that enable this combination of SAP HANA and Hadoop to work effectively are:

  • SAP BusinessObjects Data Services 4.1 – provides the ability to load data from Hadoop into SAP HANA.
  • SAP BusinessObjects BI4.x – can use the Information Design Tool to connect to SAP HANA and Hadoop and create a multi-source universe (Semantic Layer) and report using any of the BI4.x clients.

Building on this success, companies like Cisco and EMC have partnered with SAP to create a big data real-time analytics package that will enable companies to tame the huge amounts of data in their enterprise and make informed decisions from it.
