I’m writing this blog today to respond to Martin Lauer’s fantastic question in the thread on Microsoft’s announcements. His question is: “I’m wondering if a database that has been optimized for both OLTP and OLAP can be as fast in each of these areas as a database that has been optimized for either OLTP or OLAP?”
Let’s tease the problem apart…
The OLTP + BI workload question can be abbreviated into five components that can be evaluated separately:
- An OLTP database: a data structure optimized for OLTP… for insert, update, and delete transactions with a small read component;
- A BI database: a data structure optimized for BI queries… a read optimized system;
- An ETL process: to move data from the OLTP structure to the BI structure;
- A transaction: an ACID-compliant, OLTP CREATE, REPLACE, UPDATE, and/or DELETE with SELECT “query”;
- A report: a BI query;
If you have only one database structure you can optimize for only one query; e.g. the OLTP query is fast against the OLTP structure but slow against the BI structure. This is the point of Martin’s question. If you have two structures you have to Extract, Transform, and Load the data between the two at some cost. There is cost in keeping a replica of the data, cost is developing, administering, and executing the ETL process. In addition there is a lost opportunity cost hidden in the latency of the data. You cannot see the current state of the business by querying the BI data as some data has not yet moved across.
OLTP performance is normally paramount; so the perfect system would not compromise that performance or compromise it only a little. Let’s look at the HANA approach to this at a high level. HANA provides a single view of a table to an application or a user, but underneath it includes an OLTP optimized part, a BI optimized part, and a mechanism for moving data. When a transaction hits the system inserts, updates, and deletes are processed in the OLTP part with no performance penalty. The read component accesses the read-optimized internal structure with no performance penalty. Note that reading a single column in a column store, which is the key for the transaction, is roughly equivalent to reading an index structure on top of a standard disk-based DBMS. Except the column is always in-memory which means I/O is never required. This actually provides the HANA system with an advantage over a disk-based system. Disk I/O is 120+ times slower than memory access so even an index is unlikely to beat in-memory. See here for some numbers you should know and see here for a detailed look at the steps required to execute a transaction on a system like Exadata.
After the transaction is committed into the internal OLTP-optimized part, a process starts that moves the data to the BI optimized part. This is called a delta merge as the OLTP portion holds all of the changes in the data set.
When a BI query starts it can limit the scan to only partitions in the BI optimized part, or if real-time data is required it can scan both parts. The small portion of the scan that accesses the OLTP portion is sub-optimal when compared to the scan of the BI part, but still faster than any disk-based system as the data is all in-memory.
The result is that HANA performs OLTP writes as fast or faster than disk-based systems. The design optimizes the read portion in-memory. The delta merge process is automatic and very inexpensive compared to any two-system alternative. BI queries that do not require real-time data run without impact and BI queries that require real-time data minimize the impact to a slightly slower read of only changed data.
The bottom line: HANA provides a single table object that supports both BI and OLTP simultaneously. Even if we compared HANA to a split-configuration where both sides of the configuration were in-memory databases, the ability to move data automatically from the OLTP side to the BI side and to query real-time data would set HANA apart.
HANA will perform OLTP like an OLTP DBMS and query against the same instance of the data much faster than the best BI DBMS. You can do both if you design for it… you probably cannot do both if you try to do OLTP on a system designed for BI… or BI on a database designed for OLTP.
VN:F [1.9.22_1171]OLTP + BI in a Single HANA Instance,