SAP Cloud Platform Big Data Services became part of the SAP portfolio a year ago with the acquisition of Altiscale, a leading provider of Big Data as a service. Since then, we’ve met with many organizations that want to learn more about our offering and get a handle on their Big Data challenges. Most of these discussions focused on strategies for bridging new types of data and existing enterprise data, and we’ve repeatedly found that integration with the SAP HANA platform can greatly simplify this use case for customers.
Pairing an analytic data warehouse with a data lake isn’t an entirely new idea. Altiscale came of age as a single-product company, providing organizations with a full-service Hadoop and Spark platform in the cloud. Customers benefited from an integrated approach to Big Data that included infrastructure, software, operations, and support in an easily consumable subscription, but they often had to couple the Altiscale platform with a data warehouse or data mart from a second provider to complete their reporting and analytics solutions. Now, as part of SAP, we have a more potent combination to offer our users: SAP HANA and SAP Cloud Platform Big Data Services.
Using Big Data Services as a Data Refinery
When used together with SAP HANA, SAP Cloud Platform Big Data Services act as a data refinery, cleansing and transforming raw data before it is surfaced through SAP HANA for further, more detailed analysis. Big Data Services’ large-scale data storage and computing infrastructure, based on Hadoop, makes the platform the natural location to ingest, refine, and subsequently store terabytes of less structured external data, such as clickstreams, logs, IoT data, text, images, and video. Analyzing the refined data in SAP HANA lets users leverage a proven in-memory data platform and exploit its high performance and concurrency for low-latency analytics.
Figure 1: SAP Cloud Platform Big Data Services as a Data Refinery for external data and SAP HANA as a highly interactive analytic platform
In the course of sales engagements, customer workshops, and various events, most of the data management professionals we met agreed that seamless integration between SAP HANA and Big Data Services would be of significant benefit to their organizations. Using SAP HANA as the central entry point for most users of the Big Data system allows data from both the in-memory and Hadoop tiers to be exposed through familiar tools and applications, thereby making new, external data sources accessible to non-technical users, and reusing existing investments in the SAP HANA ecosystem.
As a complement to SAP HANA, SAP Cloud Platform Big Data Services possess certain qualities that make them ideal for use as a data refinery, particularly on high-volume, unstructured data types that enterprises struggle to harness today. Big Data Services can act as an elastic cloud ETL engine for refinery jobs, transparently growing and contracting customers’ Big Data clusters in response to workload. Big Data Services also offer organizations the ability to take advantage of open-source software, like Spark, and its machine learning and streaming capabilities, for their data science and data engineering requirements.
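To make the refinery idea concrete, here is a minimal, purely illustrative sketch in plain Python. It is not SAP or Altiscale code, and in practice a job like this would more likely run on Spark over data in Hadoop. The function name `refine`, the JSON event layout, and the field names `user` and `action` are all assumptions made up for this example. It shows the general pattern: drop malformed raw records, normalize what remains, and reduce it to a small structured result that could then be loaded into or queried from an analytic database.

```python
import json
from collections import Counter


def refine(raw_lines):
    """Cleanse raw clickstream lines and count page views per user.

    Hypothetical example: the event layout (JSON with "user" and
    "action" fields) is an assumption, not an SAP-defined format.
    """
    views = Counter()
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop records that are not valid JSON
        if not isinstance(event, dict):
            continue  # drop JSON values that are not objects
        user = event.get("user")
        if not isinstance(user, str) or not user.strip():
            continue  # drop records without a usable user id
        if event.get("action") != "view":
            continue  # keep only page-view events
        views[user.strip().lower()] += 1  # normalize the user id
    # Return a small, structured, sorted result set.
    return sorted(views.items())


raw = [
    '{"user": "Alice", "action": "view", "page": "/home"}',
    '{"user": "alice ", "action": "view", "page": "/pricing"}',
    "not json at all",
    '{"user": "bob", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "view", "page": "/docs"}',
]
refined = refine(raw)
print(refined)
```

The refined output is far smaller than the raw input, which is the point of the pattern: only the compact, structured result needs to reach the in-memory tier.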
Using Big Data Services for Data Aging
An additional benefit users get from pairing SAP HANA and SAP Cloud Platform Big Data Services is the option to relocate older data (or lower-value data) into their Hadoop environments. Big Data Services are well-suited to storing large volumes of infrequently accessed data, while still providing users the ability to query that data, when needed, through SAP HANA’s data virtualization capabilities, called SAP HANA smart data access.
Figure 2: Data Refinery architecture, including enterprise data and data aging
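The data aging pattern can be sketched in a few lines of plain Python. This is an illustrative model only, not how SAP HANA, smart data access, or Big Data Services implement tiering. The function names `split_by_age` and `query_all`, the `order_date` field, and the 365-day threshold are hypothetical choices for this example. The sketch splits rows into a "hot" tier and a "cold" tier by age, then answers a query across both tiers, which is conceptually what a virtualized view over the two systems provides.

```python
from datetime import date, timedelta


def split_by_age(rows, today, max_age_days):
    """Split rows into (hot, cold) tiers by the age of "order_date".

    Hypothetical helper: rows newer than the cutoff stay "hot"
    (in-memory tier); older rows become "cold" (Hadoop tier).
    """
    cutoff = today - timedelta(days=max_age_days)
    hot = [r for r in rows if r["order_date"] >= cutoff]
    cold = [r for r in rows if r["order_date"] < cutoff]
    return hot, cold


def query_all(hot, cold, predicate):
    """Answer one query over both tiers, as a virtualized view would."""
    return [r for r in hot + cold if predicate(r)]


rows = [
    {"id": 1, "order_date": date(2017, 8, 15), "amount": 120},
    {"id": 2, "order_date": date(2016, 10, 1), "amount": 75},
    {"id": 3, "order_date": date(2015, 1, 10), "amount": 300},
]

hot, cold = split_by_age(rows, today=date(2017, 9, 1), max_age_days=365)
large_orders = query_all(hot, cold, lambda r: r["amount"] >= 100)

print([r["id"] for r in hot])
print([r["id"] for r in cold])
print([r["id"] for r in large_orders])
```

The caller of `query_all` does not need to know which tier a row lives in, which mirrors the benefit described above: aged data stays queryable without being kept in memory.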
Managed SAP HANA Connectivity Available on Big Data Services Starting in Q4
Connectivity between SAP HANA and SAP Cloud Platform Big Data Services is achieved through the use of SAP HANA smart data integration (SDI) and smart data access (SDA). SDI enables data movement between Big Data Services and SAP HANA. While SDI allows smaller subsets of refined data to be brought into SAP HANA for querying, SDA allows data in Big Data Services to be queried remotely from SAP HANA through the use of virtual tables and without moving the data into SAP HANA.
The use of SDI and SDA requires specific components, the Data Provisioning Agent and Spark Controller, to be running on the Big Data Services environment. Starting in Q4 2017, Big Data Services will install, configure, and manage these components, providing a reliable, always-on connection between Big Data Services and SAP HANA, as part of its managed SAP HANA connectivity offering.
Users will have the option of using either access path—data movement or data virtualization—to query data in Big Data Services from SAP HANA. Users may also employ Data Lifecycle Manager (DLM) to define data tiering strategies between SAP HANA and Big Data Services. Using SAP HANA in conjunction with Big Data Services has the added benefit that applications written for SAP HANA can now be extended, without modification, to use data from Big Data Services as well.
SAP Data Hub Support for Big Data Services
SAP Data Hub was recently launched to help organizations manage complex enterprise data landscapes and will support SAP Cloud Platform Big Data Services. Customers will be able to use SAP Data Hub to orchestrate the ingestion of external and enterprise data into Big Data Services and SAP HANA, and coordinate data engineering and data science jobs on Big Data Services as part of their data pipelines.
Making Organizations More Productive with Big Data
We expect the combination of SAP HANA and SAP Cloud Platform Big Data Services to be a powerful driver of Big Data adoption in the enterprise. With fully managed integration between the two products, SAP can offer the simplicity of a unified Big Data solution, comprising category-leading in-memory database and Big Data-as-a-service components, from a single provider. Where users struggle to bring new types of data into an existing data warehouse, or have to deal with the complexities of operationalizing a Hadoop data lake and a data warehouse from different vendors, the combination of SAP HANA and Big Data Services can make organizations significantly more productive with their data assets.
To learn more about how SAP can help you succeed with your Big Data initiatives, visit us at SAP TechEd, where you can experience our Big Data solution in a hands-on workshop, or at booth 401 at the Strata Data Conference, or find us online at SAP Cloud Platform Big Data Services.