After watching some videos and reading some blogs about the latest in-memory column-store databases, I feel compelled to give you a preview of the next version of financials in the SAP Business Suite for HANA. This is very timely, as we are on the eve of SAP TechEd Amsterdam, where technologists will gather to see the latest in SAP HANA.
As mentioned in an earlier blog, it took seven years to bring OLTP and OLAP processing back into one single system with the SAP HANA in-memory database. In contrast to older columnar databases or the new approaches of SAP’s competitors, all data processed in the hot partition of the database is brought into and kept in main memory as a single copy. What does that mean exactly? It means:
- Columns of a table are loaded only once they are referenced in projections or joins, whether in SQL statements or in HANA-internal libraries such as stored procedures and business functions. Unused or unreferenced columns stay on SSD or disk. After every system restart, this on-demand loading starts again.
- From time to time, HANA purges tables that are no longer in use in order to perform housekeeping.
- Configuration data or small lookup tables can remain in the row store. If they become part of frequent join operations, it is better to move them to the columnar store for faster joins. Compression is mainly dictionary-based, but other methods such as run-length encoding are also used.
- The tables in the columnar store can be regarded as fully indexed relational tables. Any attribute can serve as an index, and an index scan runs at 3.5 MB per millisecond per core. With the upcoming “Haswell” processors from Intel, this scan speed is likely to exceed 5 MB per millisecond per core.
- All longer-running operations are automatically parallelized according to the number of available threads and the current workload.
- All operations on attributes are executed on the compressed format. Result sets are materialized at the latest possible point in time.
- Incoming new data is stored in a separate section called the delta store, which is automatically merged into the main store from time to time.
- Since HANA supports the concept of ‘insert only’, there is also a storage section for historical data.
- Result sets of large queries can be cached and reused by subsequent queries. Changes to the involved tables (inserts or updates) are automatically reflected in the new result set. This way we can monitor accumulated data while new data is streaming in.
- The hot standby is a replica of the primary system and can execute all read-only queries in parallel. The merge process flushes the result caches in which a merged table is involved.
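To make the compression point above concrete, here is a minimal sketch of the two schemes mentioned, dictionary encoding and run-length encoding. This is an illustration of the general techniques, not HANA's actual implementation; all names and data are made up.

```python
def dictionary_encode(column):
    """Replace each value with a small integer id into a sorted dictionary."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[v] for v in column]

def run_length_encode(ids):
    """Collapse runs of identical ids into (id, run_length) pairs."""
    runs = []
    for v in ids:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

# A country-code column with few distinct values compresses well:
column = ["DE", "DE", "DE", "US", "US", "DE", "FR", "FR"]
dictionary, ids = dictionary_encode(column)
print(dictionary)              # ['DE', 'FR', 'US']
print(ids)                     # [0, 0, 0, 2, 2, 0, 1, 1]
print(run_length_encode(ids))  # [(0, 3), (2, 2), (0, 1), (1, 2)]
```

Because scans and comparisons can run on the small integer ids, operations on the compressed format (as the list above notes) avoid decompressing the column first.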
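The delta-store mechanism described in the list can also be sketched in a few lines: writes go to a cheap append-only delta, reads see the union of both stores, and a periodic merge folds the delta into the read-optimized main store. Again a simplified illustration under my own naming, not HANA's implementation.

```python
class ColumnStore:
    def __init__(self):
        self.main = []    # read-optimized store, kept sorted
        self.delta = []   # write-optimized append-only buffer

    def insert(self, value):
        self.delta.append(value)       # cheap append, no re-sorting

    def scan(self):
        return self.main + self.delta  # queries see both stores

    def merge(self):
        # Periodic housekeeping: fold the delta into the main store.
        self.main = sorted(self.main + self.delta)
        self.delta = []

store = ColumnStore()
for v in [5, 3, 9]:
    store.insert(v)
store.merge()
store.insert(1)
print(store.scan())   # [3, 5, 9, 1]  (merged main, then unmerged delta)
store.merge()
print(store.scan())   # [1, 3, 5, 9]
```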
These are the HANA features the latest version of financials uses. The results are staggering, to say the least. We have talked many times about the unbelievable performance gains and the huge simplification of the code. The new algorithms that became possible will change the way we look at financials. But all this is not my point today.
SAP is now running all of its larger enterprise systems on HANA, and they are running extremely well. I predicted in an earlier blog that we can expect a dramatic reduction in the data footprint. Already we need only 15% of the disk storage compared to the previous database system, and this is before we activate the new data model, in which all materialized aggregates are replaced by SQL statements that aggregate on the fly, all redundant tables become projections, and most database indices are dropped. When all these actions are taken, the SAP financial system will retain a footprint of less than 2%. How can this happen? The old data model was created to overcome the relatively slow speed of collecting all data entries for an account, for example with redundant data structures already sorted by account number. By the way, these numbers do not include the reduction of our business warehouse, where we kept copies of the financial data, mostly in the form of a star schema. Furthermore, data from Finance / ERP is replicated into other systems besides BW (e.g. CRM for customer orders, PLM for product data, SCM for supplier, product and customer data); this redundancy can be removed, since the data can be read directly from the ERP tables and need not be replicated. We also kept historic data in the main database for reporting convenience.
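The idea of replacing materialized aggregates with aggregation on the fly can be shown with plain SQL. The sketch below uses SQLite as a stand-in database, and the table and column names are invented for illustration; the point is only that a balance is a GROUP BY over the single entry table rather than a redundant, separately maintained totals table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?)",
    [("4000", 100.0), ("4000", 250.0), ("5000", 80.0)],
)

# No totals table to keep in sync on every posting: the balance is
# recomputed from the entry table whenever it is asked for.
balances = conn.execute(
    "SELECT account, SUM(amount) FROM line_items"
    " GROUP BY account ORDER BY account"
).fetchall()
print(balances)   # [('4000', 350.0), ('5000', 80.0)]
```

On a row store such a scan was slow, which is why the old data model pre-aggregated; a columnar scan at memory speed makes the redundant structure unnecessary.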
To summarize, we will see the following reductions:
– 2x for dropping redundant tables and aggregates
– 2x for splitting the remaining data into hot and cold partitions
– 10x for compression of the main transactional data entry table
– some % due to simplifications in the database, e.g. dropped indices
– further % savings from less replication of redundant data into other systems (e.g. BW, CRM)
Since HANA runs the entire OLTP and OLAP workload for financials (the reports in BW are coming back and run directly on the entry data), we can anticipate that the ERP system shrinks by another 50%, from once 7 TB to 400 GB. As mentioned earlier, of these 400 GB only the really active data, e.g. data accessed at least once a week, should be kept in main memory. “Should” means that, like any other database, HANA can work with less than sufficient main memory at the price of performance degradation.
The hot data partition includes all data objects that are necessary to conduct the ongoing business, to close the books of the current fiscal year, and to have the last fiscal year for comparison. The data in the cold partition occupies less main memory because it can be prefetched on request. To process cold data, the hot data partition will also be accessed. The same report can run on hot data only, or on hot and cold data.
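The hot/cold split above can be sketched as a query that always reads the in-memory hot partition and pulls in the cold partition only on request. Fiscal years, account numbers, and function names here are my own illustrative assumptions.

```python
# Hot partition: current fiscal year plus the previous one, in memory.
hot_partition = [(2013, "4000", 100.0), (2012, "4000", 40.0)]
# Cold partition: older years, fetched on request.
cold_partition = [(2011, "4000", 70.0), (2010, "4000", 30.0)]

def query_total(account, include_cold=False):
    """Sum postings for an account; the same report runs hot-only or hot+cold."""
    rows = list(hot_partition)          # hot data is always accessed
    if include_cold:
        rows += cold_partition          # cold data joins in only when asked for
    return sum(amount for year, acct, amount in rows if acct == account)

print(query_total("4000"))                     # 140.0 (hot only)
print(query_total("4000", include_cold=True))  # 240.0 (hot and cold)
```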
Now it becomes clear why SAP recommends keeping all active data in main memory: because it is so cheap. SAP runs its whole ERP system on an x86-based server with 64 cores and 4 TB of main memory. Soon the hot-standby server (only CPU plus storage, no disk) will take over read-only requests. It is not my job to talk about the cost savings, but they are huge. Hardware for production, development, test, archive and remote backup, lower DBA costs, much faster database backups, no batch job scheduling: everything becomes leaner and faster.
It is by far the biggest improvement in the history of SAP’s enterprise systems. And as promised earlier, SAP will now work together with its customers on the redesign of the user interaction. Again, features like the superfast “enterprise search” or “c’est bon”, the navigation through business object instances, or simply the unbelievable response times HANA offers will help establish a completely different user experience. No wonder the number of cloud and hosting providers for this enterprise software is growing rapidly.
Massive Simplification: Case of SAP Financials on HANA