In the world of Big Data, one of today's challenges is the MapReduce framework, which is designed primarily for batch processing and is not well suited to real-time, interactive use cases such as ad-hoc SQL queries and machine learning. Big Data vendors are addressing this through several initiatives, such as Cloudera Impala, Hortonworks Stinger and Pivotal HAWQ, that aim to make SQL on Hadoop faster than Hive running on MapReduce.
Apache Spark is another open source project that is gaining momentum as a way to overcome the same limitations of MapReduce. It does so by allowing data read from HDFS to be kept in memory and reused across operations. It provides high-level APIs in Scala, Java and Python. Spark is a fast, general-purpose cluster computing system that supports a rich set of tools, including Shark (Hive on Spark), Spark SQL, MLlib for machine learning, Spark Streaming, and GraphX for graph processing.
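The benefit of in-memory access is easiest to see in an iterative job that scans the same input many times. The following plain-Python sketch is only an analogy, not Spark code: it counts how often a toy file-backed source (the `CountingSource` class and both helper functions are hypothetical) is read from disk when every pass re-reads the input, MapReduce-style, versus when the data is loaded once and reused from memory, which is roughly the role `cache()` plays on a Spark RDD.

```python
import os
import tempfile


class CountingSource:
    """Toy file-backed data source that counts how often it is scanned from disk."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0

    def read(self):
        # Every call goes back to the file and parses it again.
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f if line.strip()]


def iterate_from_disk(source, passes):
    """MapReduce-style analogy: each pass re-reads the input from disk."""
    totals = []
    for i in range(passes):
        data = source.read()
        totals.append(sum(x * (i + 1) for x in data))
    return totals


def iterate_cached(source, passes):
    """Spark-style analogy: read once, keep the data in memory, reuse it every pass."""
    data = source.read()  # loaded a single time, playing the role of rdd.cache()
    totals = []
    for i in range(passes):
        totals.append(sum(x * (i + 1) for x in data))
    return totals


if __name__ == "__main__":
    # Write the numbers 1..10 to a temporary file to act as "HDFS" input.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(str(n) for n in range(1, 11)))
        path = f.name
    try:
        disk = CountingSource(path)
        mem = CountingSource(path)
        assert iterate_from_disk(disk, 5) == iterate_cached(mem, 5)
        print("disk reads without caching:", disk.disk_reads)  # prints 5
        print("disk reads with caching:", mem.disk_reads)  # prints 1
    finally:
        os.remove(path)
```

Both functions return the same results; only the number of disk scans differs (five versus one for five passes).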
SAP HANA is expanding its Big Data solution by providing integration with Apache Spark through SAP HANA smart data access, which lets SAP HANA expose tables in a remote source as virtual tables that can be queried with SQL without first copying the data into SAP HANA. This provides an enterprise-wide, in-memory data fabric architecture for delivering real-time, interactive analysis and applications across corporate application data and content stored in HDFS. Developers and data scientists can easily build the right tools, using in-memory techniques, to gain immediate Big Data insights into customers, suppliers, partners, and products. Combining SAP HANA and Spark dramatically simplifies the integration of mission-critical applications and analytics with contextual data from Hadoop.
This integration of SAP HANA with Apache Spark benefits customers and SAP HANA startups by enabling high-performance decision making on in-memory business data in SAP HANA, enriched with in-memory Hadoop objects. Because it brings the simplicity and power of the SQL data-management model to Hadoop data, developers need less training to work with that data.
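The idea of enriching business data with Hadoop data through plain SQL can be pictured with a small, self-contained analogy. The sketch below does not use SAP HANA, Spark or smart data access; it uses Python's built-in sqlite3 module, where a separately attached in-memory database (named `hadoop` here, a hypothetical name) stands in for the Spark/Hadoop data that smart data access would expose as a virtual table. The table names, columns and rows are made up for illustration.

```python
import sqlite3


def build_federated_connection():
    # "main" plays the role of the in-memory business data in SAP HANA.
    conn = sqlite3.connect(":memory:")
    # A second in-memory database stands in for the remote Hadoop/Spark source.
    # This is only an analogy for a smart data access virtual table.
    # SQLite forbids ATTACH inside an open transaction, so attach first.
    conn.execute("ATTACH DATABASE ':memory:' AS hadoop")

    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Acme", "EMEA"), (2, "Globex", "APJ"), (3, "Initech", "AMER")],
    )

    # Hypothetical clickstream data that would normally live in HDFS.
    conn.execute("CREATE TABLE hadoop.web_clicks (customer_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO hadoop.web_clicks VALUES (?, ?)",
        [(1, "/pricing"), (1, "/docs"), (2, "/pricing"), (1, "/support")],
    )
    conn.commit()
    return conn


# One SQL statement joins business data with the "Hadoop" data.
ENRICHED_QUERY = """
SELECT c.name, c.region, COUNT(w.page) AS clicks
FROM customers AS c
LEFT JOIN hadoop.web_clicks AS w ON w.customer_id = c.id
GROUP BY c.id, c.name, c.region
ORDER BY clicks DESC, c.name
"""

if __name__ == "__main__":
    conn = build_federated_connection()
    for row in conn.execute(ENRICHED_QUERY):
        print(row)
    conn.close()
```

The design point is that the enrichment is expressed as an ordinary SQL join, so no additional programming model is needed to combine the two sources.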
Learn more about the future of SAP HANA and Apache Spark at Spark Summit 2014.
To try out SAP's Spark distribution, please refer to the following link.