I’ve had a problem lately where customers have been coming up to me and saying “Another vendor has told me X about SAP HANA – is it true?”. My take is that IBM and Oracle in particular (for this is where I suspect the questions are coming from) see HANA as a major threat in the market, which is very flattering to SAP.
Still, I have been keeping track of these questions from customers, and now I thought it was time to make a list and explain what I believe the real answers to be. Where I wasn’t sure, I pulled up a SAP HANA system and tested it myself by hand.
1) IBM BLU is faster than SAP HANA
There will always be speed plays, and I hear IBM have done a number of internal benchmarks comparing HANA vs BLU. I hear 10x faster is the main story, and from what I can see the reason for this is that the benchmark wasn’t in any way optimized for HANA.
To get HANA to work great, you do have to do some redesign of an existing data mart. We do this using HANA Analytic and Calculation Views, which unlock the true power of HANA. These don’t change the underlying data structures, nor do they materialize anything, but they provide the HANA OLAP engine with the information it needs to perform really fast.
I’ve done a little test which is very similar to other benchmarks – a star schema with 14bn rows in the main fact table. On my test appliance, which has 160 cores and 2TB RAM, this uses 206GB of RAM. Here are some of the results:
– Full aggregation of all sales in 3 seconds. Yes, that’s SUM(AMOUNT) on 14bn rows with no restriction.
– Anything aggregating less than 1/4 of the overall data is sub second.
– Complex joins to dimension tables add very little overhead, even with WHERE clauses on the dimension tables.
– Testing with hundreds or thousands of concurrent benchmark users, under a mixed BI workload
From what I’ve been told about the IBM benchmarks being shown to customers, a well-designed HANA data mart should be about 100x faster than BLU, rather than 10x slower. I’d love to offer IBM the chance to run their benchmarks on HANA any time!
However as you will see later on, it’s not about speed – it’s about simplicity and business outcomes. Speed plays are so last 2012!
2) HANA doesn’t handle concurrency
Like most SIMD-accelerated databases, SAP HANA can use the whole appliance for a single query. Now remember that you can aggregate all 14bn rows in 2-3 seconds, which means well over 1,000 queries per hour over the whole big data set – aggregating every transaction every time. If you filter that down, you can expect much faster query response.
There is a caveat here because if you are using all the power of HANA in a hugely complex query which performs billions of calculations, takes 5-10 seconds and uses all the CPUs, then you can’t expect that to scale to hundreds of concurrent users running that same complex query. HANA has to obey the laws of physics!
For example, if you are filtering by region, product or supplier then you are doing fewer aggregations, which means you will get many more queries per hour. Suppose on average your filters mean you only need to aggregate 1% of the transactions for each query – you can expect nearly 100x more throughput (it’s a bit less, because scanning still has some cost).
If you load HANA up with more users than it has capacity for (I often run 1,000-10,000 concurrent queries for testing) then HANA slows down linearly. Kick off 10 queries that take 2 seconds each and they will take 20 seconds in total. Increase the number of cores on your HANA appliance and you will linearly decrease the response time: double the cores and a 2-second query will take 1 second, or around 3,600 queries per hour for that scenario.
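To make the arithmetic above concrete, here is a toy model of that saturation behavior: once the appliance is fully busy, response time degrades linearly with the number of concurrent queries, and adding cores linearly speeds it back up. All the numbers are illustrative, not measurements.

```python
# Toy model of linear degradation under saturation: response time grows
# linearly with concurrent queries, and shrinks linearly with extra cores.

def response_time(base_seconds, concurrent_queries, core_multiplier=1.0):
    """Total time when `concurrent_queries` identical queries, each taking
    `base_seconds` on an idle appliance, saturate the machine."""
    return base_seconds * concurrent_queries / core_multiplier

print(response_time(2, 10))                    # 10 queries x 2s -> 20.0s
print(response_time(2, 1, core_multiplier=2))  # double the cores -> 1.0s
```

This is only the behavior at full saturation; below saturation, queries run in parallel at their base response time.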
The other thing worth knowing about HANA is that you consistently get around 2bn row scans/second/core and 16m aggregations/second/core. If you look at your design, this tells you how many cores you need for a specific amount of concurrency.
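Those two rules of thumb are enough for a back-of-envelope sizing calculation. The sketch below uses the per-core throughput figures from the text; the workload numbers in the example call are made up for illustration.

```python
# Back-of-envelope HANA core-count sizing based on the rules of thumb in
# the text. The workload figures in the example are invented.

SCANS_PER_SEC_PER_CORE = 2_000_000_000  # ~2bn row scans/second/core
AGGS_PER_SEC_PER_CORE = 16_000_000      # ~16m aggregations/second/core

def cores_needed(rows_scanned, rows_aggregated, target_seconds, concurrent_queries):
    """Estimate cores required so `concurrent_queries` identical queries
    each finish within `target_seconds`."""
    scan_cores = rows_scanned / (SCANS_PER_SEC_PER_CORE * target_seconds)
    agg_cores = rows_aggregated / (AGGS_PER_SEC_PER_CORE * target_seconds)
    return (scan_cores + agg_cores) * concurrent_queries

# e.g. 100 concurrent queries, each scanning 14bn rows, aggregating 20m,
# with a 3-second response-time target:
print(round(cores_needed(14_000_000_000, 20_000_000, 3, 100)))  # 275
```

This ignores caching, filtering and partition pruning, so treat it as an upper bound rather than a sizing guide.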
3) HANA isn’t Enterprise Ready
You have to judge this yourself and if you’re an architect, you can download the HANA Technical Operations Manual. What I can tell you is that HANA has most of the enterprise stuff you’d expect:
– High Availability
– Fault Tolerance either active-active, active-passive, in-memory, on-disk, with storage replication or synchronous/asynchronous application replication
– Support for asymmetric DR systems
– Backup and Recovery (including on secondary node)
– Row level security and support for standards like X.509 and SAML
– Table distribution and partition management tools
– Snapshot scheduling and restart on snapshot
– Support for Virtualization
– Scalability to up to 56TB of raw HANA in certified configurations, and more if you use Open Certification
When evaluating a software platform, you need to take many things into account, and enterprise readiness is one of them. Most of the questions I get around enterprise readiness have been answered with HANA 1.0 SP06 but as I said – make your own evaluation.
4) HANA has to decompress data
This totally depends on your application design. As I said earlier, if you use Analytic Views, HANA never needs to decompress data to prune partitions or process predicates; it decompresses and aggregates in the CPU cache only when required. For operations like COUNT, or for data pruning, no decompression is needed at all.
This is why you get the outstanding performance with HANA.
There are situations however where HANA will materialize data sets into the calculation engine and if you design your application badly, you can end up with excessive memory consumption when you run queries. If you see this in your application – big increases and decreases of memory usage in the HANA appliance – then you have got your design wrong and you need some help.
5) HANA only runs on “prior generation” processors and is a step behind Intel
Intel has several major product lines, and in its server processor lines they have E3, E5 and E7 CPUs. Currently, the E3 and E5 processor lines are using a new core “Ivy Bridge” – which is slightly more modern than the E7 core “Westmere”.
Despite this, if you look at Intel’s literature, you will see that the E7 is the top-of-the-line CPU and HANA absolutely runs on the fastest, most modern version of this. It will be replaced in Q1 2014 with the E7-8870 v2, which will be based on Ivy Bridge. HANA only supports the E7 CPU and there are several reasons for this:
– E7 supports up to 80 cores in one system. E5 supports up to 32 and E3 up to 8.
– E7 provides several capabilities to protect RAM against failure that do not exist in E5 and E3 CPUs.
– E7 is a more mature platform and more stable.
The new E7 v2 will support up to 240 cores and 12TB RAM, and we expect appliances to stay the same physical size but be 4-5x faster than the current fastest 80-core server.
As an aside, the “latest” IBM POWER7+ CPUs are not available in their high-end servers either. This is pretty typical behavior for server manufacturers: the latest and greatest tech goes into consumer, then low-end, then enterprise hardware. More worrying for IBM customers is that even on the latest roadmap, there is no published or confirmed timeline for POWER8.
6) HANA hardware is specifically defined and certified
SAP requires that HANA hardware be certified so that the system works correctly, and this is why HANA is delivered as an appliance. However, if you are a HANA customer you are permitted to build and certify your own appliance under the SAP HANA Open Platform Initiative. This is useful if you have standard server, network and storage providers that you wish to use.
In addition, you can run HANA on AWS or CloudShare with instances as small as 16GB, and in SAP’s HANA Enterprise Cloud.
SAP won’t let you run HANA on any old hardware because in-memory computing does have some requirements. And yes, it has to be installed by a trained professional. It’s important to get the installation of HANA right.
7) HANA can only pre-load entire columns and runs out of memory all the time
I couldn’t find this clearly defined in the documentation, so I ran a definitive set of tests and I can tell you exactly how HANA behaves. First, HANA loads data into RAM at the granularity of partitions within columns. So if you define partitions in HANA (the Business Suite does this automatically, and you should too, for several reasons) then HANA will load only those column partitions that are required – e.g. by year, customer group etc. That means historic data won’t be loaded if it’s not queried.
Second, HANA dumps column partitions out of memory on a least-recently-used basis once it reaches about 80% of RAM used. In a HANA system you should size so that all your hot data fits into 50% of main memory, so this should never be a problem.
Third, you can load and unload data manually, down to the granularity of a specific column or table. This is mostly for the purpose of dumping data out of memory – for instance in a data warehouse where you load data into HANA, process it, and then no longer need it. SAP BW uses this extensively, for example for objects which are marked as “inactive”.
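The least-recently-used unloading described above can be sketched as a toy model: partitions are evicted oldest-first once memory crosses the threshold, and touching a partition keeps it resident. This is an illustration of the concept, not HANA’s actual implementation; all sizes are invented.

```python
# Toy model of LRU partition unloading: evict the least-recently-used
# column partition once loaded data crosses ~80% of capacity.

from collections import OrderedDict

class PartitionCache:
    def __init__(self, capacity_gb, threshold=0.8):
        self.limit = capacity_gb * threshold  # start evicting at ~80%
        self.loaded = OrderedDict()           # partition name -> size in GB

    def touch(self, partition, size_gb):
        """A query touches a partition: load it if needed, mark it recent."""
        if partition in self.loaded:
            self.loaded.move_to_end(partition)
            return
        self.loaded[partition] = size_gb
        while sum(self.loaded.values()) > self.limit:
            self.loaded.popitem(last=False)   # evict oldest-used first

cache = PartitionCache(capacity_gb=100)       # evict above 80GB
for name in ["sales_2011", "sales_2012", "sales_2013"]:
    cache.touch(name, 20)                     # 60GB loaded
cache.touch("sales_2011", 20)                 # re-query: 2011 is recent again
cache.touch("sales_2014", 20)                 # 80GB, at the threshold
cache.touch("sales_2010", 20)                 # 100GB -> evict LRU (sales_2012)
print(list(cache.loaded))
```

Note how `sales_2012` is the one evicted, even though `sales_2011` was loaded first – re-querying a partition resets its position in the eviction order.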
8) HANA can’t use all its CPUs
If your application isn’t using all the CPUs then you have done something wrong with the design. Most likely you are also seeing heavy RAM consumption, which means you are using the calculation engine excessively and moving a lot of data around in RAM. Change your design and you will fix both problems.
9) IBM BLU can replace SAP HANA
There are some scenarios where BLU will be really neat – IBM DB2 install-base customers running a data mart or data warehouse, for example. The migration to BLU will be straightforward: they will be able to convert row to column tables easily and get benefits with very little effort. SAP BW customers will also benefit once support for DB2 10.5 with BLU is released, if they are die-hard DB2 customers. But if you are a DB2 OLTP customer then you don’t have the option of moving to columnar tables on DB2 – they aren’t designed for fast-moving tables.
But you need to be on the very latest versions of SAP to run on DB2 10.5, according to SAP Note 1835822: “The problems are caused by incompatible changes in DB2 for LUW version 10.5 or higher”. And none of the SAP Business Suite is expected to be supported on BLU – this isn’t on a roadmap as far as I’m aware. This means DB2 customers can’t take advantage of HANA Live for Operational Reporting.
Why is HANA actually different to IBM BLU, Netezza, Teradata and Oracle Exadata?
What I find interesting about the Vendor FUD is that those vendors compare HANA to their own products as a database platform. Quite often SAP lets them do this, and it devalues SAP’s biggest asset every time. Here are a few of my thoughts on how HANA is truly different:
1) HANA is an application platform, not a database
Most of the work I do now relates to using many of the capabilities that come with the HANA platform – all included in one simple installer. Here are a few examples of the capabilities:
– SAP HANA XS Application Server. We build in-memory and transactional web apps directly in the HANA appliance, which search text indexes on big data and run high-volume analytics and transactions.
– SAP HANA Predictive Analytics Library. The PAL is built into SAP HANA and allows us to run predictive analytics against data stored in HANA, based on real-time data. We use this for things like correlation analysis and we build these into web applications in XS.
– Event Stream Processing. The SAP ESP will be built into HANA soon, but for now it is a bolt-on which streams data into HANA in real-time from sources like Point of Sale.
– Smart Data Access. With HANA we can build a logical model which accesses and aggregates information in Hadoop, Sybase or Teradata databases into one application.
To do what you can do in one HANA appliance, you would need IBM DB2 BLU, IBM SPSS, IBM WebSphere ESB, IBM WebSphere Application Server, IBM WebSphere Portal and IBM Streams or WebSphere MQ. Ouch.
2) SAP HANA works for OLTP and OLAP combined (OLTAP), today
The SAP Business Suite including ERP, CRM and SCM run on SAP HANA. Included in this is HANA Live for Operational Reporting, meaning that you can do away with using a separate Data Warehouse for operational reporting.
IBM BLU will be supported for SAP BW later this year, but it is entirely unsuited to OLTP workloads like ERP, because it only runs on one system, with no high availability. I think BLU will run SAP BW pretty well, but IBM has moved too little, too late: SAP customers are already moving to HANA, and then they won’t need BW for Operational Reporting, only for an Enterprise Data Warehouse.
With HANA, what SAP did was build a column store, which runs really fast for analytics, and put a row-style buffer in front of it as a write target. Periodically, it merges the two, and this means you get great transactional performance as well as great analytic performance. HANA also has a dedicated row store, but this is only used for transient tables like queues, and for configuration tables. It is never used for master data or transactional data.
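The buffer-plus-merge pattern above can be sketched in a few lines: inserts land in a cheap write-optimized buffer, reads see the buffer and the columnar main store combined, and a periodic merge folds the buffer into the main store. This is a toy illustration of the general technique, not HANA’s actual storage engine.

```python
# Toy sketch of a write buffer ("delta") in front of a column store
# ("main"): cheap inserts, reads over both, periodic merge.

class HybridTable:
    def __init__(self):
        self.main = {}    # column name -> list of values (read-optimized)
        self.delta = []   # list of row dicts (write-optimized buffer)

    def insert(self, row):
        self.delta.append(row)  # cheap append, no column rewrite

    def merge(self):
        """Fold the delta buffer into the columnar main store."""
        for row in self.delta:
            for col, val in row.items():
                self.main.setdefault(col, []).append(val)
        self.delta = []

    def column_sum(self, col):
        """Aggregates scan main + delta, so results are always current."""
        return sum(self.main.get(col, [])) + sum(r[col] for r in self.delta)

t = HybridTable()
t.insert({"amount": 100})
t.insert({"amount": 250})
print(t.column_sum("amount"))  # 350 - visible before any merge
t.merge()
t.insert({"amount": 50})
print(t.column_sum("amount"))  # 400 - merged main + fresh delta
```

The key property is that queries never wait for a merge: new rows are visible immediately, and the merge just moves them into the faster columnar representation.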
With IBM BLU, there is no such buffer, which means you have to choose between the column store and the row store. You can’t use the column store for OLTP applications, because without a write buffer individual insert performance is poor. And if you use the row store, analytics are slow, because BLU’s performance comes from the column store.
So with BLU, you presumably have to build your transactional app in the row store and your analytics app in the column store, and transfer data between them – presumably using an ETL tool like Informatica? This makes BLU complex and expensive.
My conclusion brings me back to where I started: IBM and Oracle have woken up to the fact that SAP HANA is a real threat to their core business, and they are spreading Vendor FUD to win business.
I hope this piece goes some way to clear up the FUD, and if you think I got anything wrong here then please challenge me and I’ll respond, update or make corrections as necessary.