A recent post here nicely summarized the architectural advantages of a column store for analytics and described the trade-off inherent in choosing a column orientation. In short, there is overhead in taking a row as input and breaking it into columns while retaining the information required to recompose the data as rows at any time. This overhead makes a row orientation significantly more effective for OLTP.
The author used HANA as the model for an in-memory column store and the product his company has developed as the model for an in-memory row-store.
But there are two concepts that were missed, I think.
First, HANA supports a columnar table type that allows rows to be inserted, updated, and deleted as rows. This provides all of the architectural advantages of an in-memory row store for OLTP workloads. The hybrid notion means that after a row is transacted into the transient queue it moves to a columnar format in the background, where it is available for analytics… and once moved, the transient row is removed. An analytic query against such a table runs against an MVCC snapshot that exposes the majority of the data as columns and then picks up whatever rows were still in the transient queue when the query started.
Note that this means the compromise with the hybrid table is on the analytic side. Analytics gets 99% of the benefit of a column store (assuming that 99% of the data has been converted to column orientation), and there is no compromise on the OLTP side. In a mixed workload, where analytics and OLTP are pounding the same table, we might expect analytics to suffer more, as more data will still be in row orientation… and transactions will interfere somewhat with HANA's ability to fully optimize processor caches for columnar analytics… but the transactions should fly.
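The mechanics above can be sketched in a few lines of Python. This is a toy illustration of the general delta-plus-main pattern, not HANA's actual internals: the class and method names are my own, and real systems add MVCC snapshots, compression, and concurrent merges that this sketch omits.

```python
class HybridTable:
    """Toy hybrid table: a row-oriented transient queue (delta) feeding
    a column-oriented main store. Names are illustrative, not HANA's."""

    def __init__(self, columns):
        self.columns = columns
        self.delta = []                       # row-oriented: list of tuples
        self.main = {c: [] for c in columns}  # column-oriented: one list per column

    def insert(self, row):
        """OLTP path: the row lands as a row -- no decomposition cost up front."""
        self.delta.append(tuple(row))

    def merge(self):
        """Background step: move transient rows into the columnar main store."""
        for row in self.delta:
            for col, value in zip(self.columns, row):
                self.main[col].append(value)
        self.delta.clear()                    # transient rows removed once moved

    def scan(self, column):
        """Analytic path: columnar main store plus any rows still in the queue."""
        idx = self.columns.index(column)
        return self.main[column] + [row[idx] for row in self.delta]

t = HybridTable(["id", "amount"])
t.insert((1, 100))
t.insert((2, 250))
t.merge()                      # both rows are now columnar
t.insert((3, 75))              # still in the transient queue
print(sum(t.scan("amount")))   # scan covers main + delta: 425
```

The point of the sketch is the asymmetry the post describes: `insert` pays no columnar overhead, while `scan` pays a small penalty for whatever is still row-oriented in the delta.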
This leads to my second point. For lots of good reasons HANA, Oracle, DB2, SQL Server, and ASE are general-purpose relational database systems. They solve for a wide range of query types and are not tightly tuned for OLTP alone. You can certainly build extremely powerful OLTP-tuned systems; they have been around for ages: Tandem NonStop and IBM TPF come to mind. But if you implement one of these, you have to replicate the data to satisfy operational reporting requirements, and that replication introduces significant latency into your business processes.
Oracle, DB2, SQL Server, and ASE effectively satisfy some operational reporting requirements, plus OLTP, against a single table instance. But the existence of separate data warehouse systems and operational data stores (ODS) is evidence that they cannot satisfy much more without a replica, and without significant latency. HANA changes the game here with its hybrid tables.
The author mentioned above might have done better to point out that his OLTP system is much more effective than the older OLTP players. He might point out that there is a spot for a high-volume OLTP product that does not carry the overhead of a general-purpose query engine. But the real question is: if you add the TCO of a specialized high-performance OLTP database, plus the TCO of an ODS, plus the TCO of a specialized data warehouse infrastructure, how does that compare to the TCO of a single HANA instance serving OLTP, ODS, and hot-data DW, plus something like Hadoop for colder data… with the HANA Smart Data Access facility making it all seem as one?