SAP, with the help of our partners BMMSoft, HP, Intel, NetApp, Red Hat, and MD&Profy, set the Guinness World Record for the largest data warehouse at 12.1 petabytes (PB). It is an exciting milestone in SAP’s big data strategy and proof of our industry-leading technologies. View the press release here.
We started the project to see how we could cost-effectively handle big data, while pushing the boundaries of fast, ad hoc, and scalable analytics. The scale has turned out to be impressive.
The Guinness World Record Largest Data Warehouse was created in the SAP/Intel shared lab in Santa Clara, California. The data warehouse holds 12.1PB of data and runs on 25 HP ProLiant DL580 G7 servers with Intel processors, on a Red Hat® Enterprise Linux® 6.4 x86-64 operating system, using SAP HANA and SAP IQ 16 with BMMsoft Federated EDMT® 9. The server environment is connected to a SAN composed of 20 NetApp E5460 storage arrays through HP 8 Gb/s Fibre switches.
To help wrap your head around how massive 12+ petabytes of data really is, consider these real-world examples. 12+ PB could store the entire printed collection of the US Library of Congress (10 terabytes) 1,200+ times over, or the entire printed content of all academic research libraries (2 petabytes) 6 times over. From a business big data and analytics perspective, this means you have the capacity and performance to never archive critical business data. Imagine storing and analyzing 10 years of social media data in real time. The potential impact on business and decision making is startling. This unprecedented scale of analysis enables a whole new set of applications and capabilities. In devising this test database, we wanted you to feel secure that our technologies could handle your largest database and data warehouse problems with ease.
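For readers who want to check the comparison, here is a minimal sketch of the arithmetic. It assumes decimal units (1 PB = 1,000 TB) and takes the 10 TB and 2 PB figures quoted above at face value; the variable and function names are illustrative only.

```python
# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000

warehouse_pb = 12.1           # size of the record-setting warehouse
library_of_congress_tb = 10   # printed collection, figure quoted in the text
research_libraries_pb = 2     # all academic research libraries, figure quoted in the text


def times_over(capacity_pb: float, item_pb: float) -> float:
    """Return how many copies of an item of size item_pb fit in capacity_pb."""
    return capacity_pb / item_pb


loc_copies = times_over(warehouse_pb, library_of_congress_tb / TB_PER_PB)
research_copies = times_over(warehouse_pb, research_libraries_pb)

print(f"Library of Congress copies: {loc_copies:,.0f}")
print(f"Research library copies:    {research_copies:.2f}")
```

Under these assumptions, the warehouse holds roughly 1,210 copies of the Library of Congress print collection and about 6 copies of all academic research library content.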
We used SAP HANA Platform and SAP IQ together for a variety of reasons. First, the industry-revolutionizing SAP HANA Platform addresses the changes in the data warehousing world. HANA Platform offers the option to simplify the centralized data warehousing architecture by moving toward a highly optimized in-memory data fabric based on intelligent query federation. Second, SAP IQ’s columnar database can scale to massive petabyte capacity while still delivering the analytic performance needed to solve demanding database problems in real time. The combination of the two technologies means SAP stores the data in the right places for optimizing query sets and data resolution, while ensuring that query results are delivered within business SLAs and reducing the cost and complexity of what one might otherwise consider an unmanageable real-time data warehouse at 12PB.
The data warehouse held approximately 50% structured and 50% unstructured data, at a compression ratio of approximately 4 to 1. This modeled the complex data warehouses facing businesses today. BMMsoft Federated EDMT was used for query federation as well as optimized data loading.
So what is next for SAP in delivering the best performance and best economy for enterprise-grade big data analytics at ANY scale? Plenty. This is only a portion of our data warehouse transformation roadmap, which moves toward using SAP HANA Platform with extended tables so that cost savings are realized, real-time queries can be resolved through HANA, and predictive, trending, and compliance queries across infrequently accessed data can be answered quickly and easily, no matter where the data resides.
Testing continues on load times (our previous world record was 34.1 terabytes per hour last April!) and other performance measures that will demonstrate the capability and performance of the warehouse. These details will be published in the coming months. The SAP HANA Platform with SAP IQ is the clear leader in big data warehousing and analytics. It is a powerful architecture both for the largest data warehouses and for augmenting less structured data systems such as Hadoop. For now, delivering a 12.1PB data warehouse is an exciting step in SAP’s reinvention of its big data roadmap and strategy.
To watch the replay of the March 5th SAP HANA Launch event, please click here.