If you listened to Vishal Sikka’s keynote presentation at SAP TechEd Las Vegas, or if you saw my previous blog on this topic, you already know something about the incredible results we have achieved with a petabyte of raw data in SAP HANA. Well, I promised details would come, and here they are!
In Intel’s co-location facility in Santa Clara, California, we installed 100 IBM servers in a single SAP HANA cluster, and loaded 10 years’ worth of Sales & Distribution data at a rate of 330 million transactions per day. This data model is similar to what many of our SAP NetWeaver BW customers use. In our test case, that worked out to 1.2 trillion rows of data in a single fact table, partitioned across the entire cluster. Complex BI queries against this data set mostly ran in less than a second! And all of this was achieved with no secondary indexes, no materialized views, and no aggregates. In fact, many of these queries, such as those involving sliding time window comparisons, can’t have aggregates built to speed up the results, and so are virtually impossible to support with acceptable performance on traditional, disk-based databases today.
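A quick back-of-the-envelope check of the numbers above, using only the figures stated in this post (330 million transactions per day, 10 years of data, 100 servers):

```python
# Sanity-check the data volume described above (figures from the post).
TXNS_PER_DAY = 330_000_000
DAYS = 365 * 10  # 10 years of daily data

total_rows = TXNS_PER_DAY * DAYS
print(f"{total_rows:,} total rows")       # 1,204,500,000,000 - about 1.2 trillion

# The single fact table was partitioned across the 100-node cluster.
rows_per_node = total_rows // 100
print(f"{rows_per_node:,} rows per node")  # 12,045,000,000 - about 12 billion
```

So each node holds roughly 12 billion fact-table rows, which the cluster scans without any precomputed helper structures.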
As soon as the data is loaded, users can start asking whatever questions of the data they want, and the answers come back with tremendous speed. No need to ask the question ahead of time so that DBAs or developers can build structures to speed up the results. No waiting for re-indexing or caching. Just breakthrough performance and scalability, and this is the proof. And, thanks to improvements by the SAP HANA development team, these results are almost identical to, even a little faster than, the 16-node testing performed with 100 billion rows / 100TB of raw data back in April. The same performance with a better than ten-fold increase in data volume – that’s true linear scalability.
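To illustrate why the sliding time window comparisons mentioned above resist pre-aggregation, here is a minimal Python sketch (not HANA code, and the data set is invented for illustration): the window endpoints are chosen at query time, so no fixed materialized aggregate can cover every possible window, and each query ultimately needs a scan of the raw rows.

```python
from datetime import date, timedelta
import random

# Toy fact table: (transaction_date, amount) rows spread over two years.
random.seed(0)
start = date(2010, 1, 1)
rows = [(start + timedelta(days=random.randrange(730)),
         random.uniform(10, 100))
        for _ in range(10_000)]

def window_total(rows, lo, hi):
    """Sum amounts for lo <= date < hi - a scan of the raw rows."""
    return sum(amt for d, amt in rows if lo <= d < hi)

# Compare an arbitrary 30-day window against the same window one year
# earlier. Both endpoints are picked ad hoc at query time, so there is
# no aggregate table that could have been built for them in advance.
lo = date(2011, 3, 5)
hi = lo + timedelta(days=30)
this_year = window_total(rows, lo, hi)
last_year = window_total(rows, lo - timedelta(days=365),
                               hi - timedelta(days=365))
print(f"year-over-year growth: {this_year / last_year:.2f}x")
```

On a disk-based database, every such ad-hoc window means another full scan; an in-memory columnar store like HANA can afford that scan interactively, which is the point of the benchmark.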
Here is a summary of the results:
Watch Wes Mukai, VP of Systems Engineering, and Boris Gelman, VP of Development, talk about the cluster, the testing, and the results:
And stay tuned for more amazing results, as our cluster grows from 100 nodes to 250, using servers from both IBM and HP.