I wrote a piece last week about Hasso Plattner:
seven years ago we analyzed at the hpi in potsdam the database call profile for oltp and olap systems. once we decided to move to insert only, disallowed materialized updates, disallowed performance enhancing redundant tables (e.g. with a different sort sequence), eliminated select *, replaced joining by looping over select single there was not much of a difference in the db call profile any more. in oltp systems we find a significant amount of applications, which are of the analytical type. if we take into account that many oltp type applications had to be moved to a data warehouse just for performance reasons (not to slow down data entry) and that modern planning/forecasting/predicting algorithms became realtime applications there is no significant difference between an oltp or an olap system with regards to db call profiles at all.
So thanks, Hasso, for inspiring this post. Let’s head into the past, first, and then the future.
Today’s relational databases all stem from research done by Edgar Codd and others in the early 1970s, and the Oracle and IBM databases were released in the late 1970s. The SAP R/2 ERP software was released for the IBM database soon after. By the time I started working in IT in the mid-90s, data volumes had grown to the point where asking complicated reporting questions of the Oracle or IBM database had become slow.
What’s more, businesses were relying more and more on management information, and these questions could tie up operational finance and logistics systems for hours on end, impairing other business functions. Most management questions are based on aggregated information (Sales by Region, Year on Year, for example), so we built separate systems, transferred data from the operational system to the reporting system, and aggregated the information overnight so that simple management questions could be answered easily. This was the decision we made at the time.
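That overnight pre-aggregation pattern can be sketched in a few lines of Python with sqlite3; the sales table and figures here are invented purely for illustration:

```python
import sqlite3

# Hypothetical operational data, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EMEA", 2012, 100.0), ("EMEA", 2013, 120.0),
     ("APAC", 2012, 80.0),  ("APAC", 2013, 95.0)],
)

# The overnight batch job: pre-aggregate into a summary table so that
# management questions never have to touch the operational tables.
db.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT region, year, SUM(amount) AS total "
    "FROM sales GROUP BY region, year"
)

# A management question (Sales by Region, Year on Year) then reads
# only the small summary table.
rows = db.execute(
    "SELECT region, year, total FROM sales_summary ORDER BY region, year"
).fetchall()
print(rows)
```

The point of the pattern is exactly what made it painful: the summary is only as fresh as the last batch run.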
Then, we categorized the workloads: OnLine Transaction Processing (OLTP) for the operational system, and OnLine Analytical Processing (OLAP) for the reporting system. Oracle and IBM were most pleased, because they could sell more product. Consultancies were even more pleased, because they made a truckload of money implementing more complex environments. Support personnel did well managing more complex systems.
The Future is Here, but Unevenly Distributed
If you look at what Oracle, IBM and Microsoft have done for the last 35 years, it has been to capitalize on this database market. Times have been good, and it is very unusual for an incumbent to disrupt its own business. You can see this most clearly in IBM’s earnings, where software revenue is flat even given 70% growth in cloud, which means the core database business is in decline. Oracle is more protected because so many cloud companies depend on it, and for Microsoft, database is a small slice of overall revenue.
Plus, we must remember that the OLTP/OLAP divide was entirely of our own making. And the single biggest shift in business in the last 20 years has been the move to a much faster-moving world. News travels at the speed of Twitter. Apple has to react within hours to forecast reductions covering millions of iPhone 5C shipments. And in the midst of this, we have databases with man-made restrictions that make a faster-reacting business very hard to achieve.
And so the database market is disrupting itself. In my opinion the end-game isn’t here yet: as with most change born of necessity, the solution emerges organically. But I see three major strands:
1) The re-convergence of structured and un-structured stores
NoSQL is a very fashionable concept right now: throw the information into a store, and worry about how to get it out later. But NoSQL is very inefficient when it comes to processing that information – by my research, MongoDB and Hadoop are around 30x less efficient in their use of CPU and RAM than they could be. The solution is to store the information more efficiently and process it more efficiently, which requires more structure to be defined in advance, plus an aggregation framework to process it. MongoDB is part-way along this journey already, and Apache Spark is promising.
But one thing is for sure, there is a trade-off between how much structure you define, and storage and aggregation efficiency.
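The trade-off can be made concrete with a toy comparison in Python (invented readings, not a benchmark): the same records stored as schemaless JSON documents versus a pre-declared columnar layout.

```python
import json
import array

# 1,000 invented sensor readings, for illustration only.
readings = [{"sensor": i % 10, "value": float(i)} for i in range(1000)]

# Document store: every record carries its own field names.
doc_bytes = sum(len(json.dumps(r).encode()) for r in readings)

# Columnar store: field names are declared once up front, and the
# values are packed into typed arrays, one per column.
sensors = array.array("i", (r["sensor"] for r in readings))
values = array.array("d", (r["value"] for r in readings))
col_bytes = sensors.itemsize * len(sensors) + values.itemsize * len(values)

print(doc_bytes, col_bytes)  # the columnar form is noticeably smaller

# Aggregation benefits too: summing a packed column is a tight loop
# over contiguous values rather than a walk over parsed documents.
total = sum(values)
```

Declaring the schema up front is precisely the structure that NoSQL stores defer, and it is what buys the storage and aggregation efficiency.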
2) The shift from database, to application development platform
I was sitting with a customer last week, and I said to him: “If you think of SAP HANA as a replacement database, then you will find it is good but very expensive. If you think of it as a development platform, then you will think it is amazing and very cost-effective.” He said, “Now you’re speaking my language”, and we moved on to sketch out a few ideas on how to revolutionize their business.
When consultancies built business applications, consultants would start with requirements and translate them into an architecture. Then, part-way through the project, they would bolt reporting requirements on top.
Take the example of TSG Hoffenheim. They collect 200,000 pieces of information a second from players – including spatial and movement sensors and physical sensors, and they use this to make their team play better. Designing a solution like this requires a holistic approach to application design, and an application development platform.
If you were to use a traditional RDBMS, you would be in deep trouble. You would need a stream processor to handle the data in real time, a row-based OLTP RDBMS to capture the information fast enough, and ETL software to move it periodically (every 15 minutes, typically) into a separate OLAP analytic database. Then you would need integration software and a reporting layer on top of that.
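A rough sketch of that multi-tier pipeline, with sqlite3 standing in for both tiers; the player-event schema and the etl() batch job are invented for illustration:

```python
import sqlite3

# Hypothetical two-tier setup: a row store for capture (OLTP) and a
# separate store for analytics (OLAP), glued together by batch ETL.
oltp = sqlite3.connect(":memory:")
olap = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE events (player TEXT, speed REAL)")
olap.execute("CREATE TABLE player_stats (player TEXT, avg_speed REAL)")

def capture(player, speed):
    """Fast path: the operational system only ever appends."""
    oltp.execute("INSERT INTO events VALUES (?, ?)", (player, speed))

def etl():
    """The periodic (say, every 15 minutes) batch job: aggregate and copy."""
    rows = oltp.execute(
        "SELECT player, AVG(speed) FROM events GROUP BY player").fetchall()
    olap.execute("DELETE FROM player_stats")
    olap.executemany("INSERT INTO player_stats VALUES (?, ?)", rows)

capture("p1", 8.0); capture("p1", 10.0); capture("p2", 6.0)
etl()  # until this runs, the analytic side is stale
stats = dict(olap.execute("SELECT * FROM player_stats").fetchall())
print(stats)
```

Every arrow in the pipeline is another system to build, license and operate, and the analytic answers are always at least one ETL cycle behind reality.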
If you use either SAP HANA or MongoDB, you just write directly from the sensors into the application platform, and write the API for consumption in an application inside the application platform. With SAP HANA, you can write the app inside the same platform too and there are enough simple choices to build apps with MongoDB. The main difference with MongoDB is you will need 30x the CPUs and RAM for the same outcomes, but if you look at point (1), this will converge.
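For contrast, a minimal single-platform sketch (plain Python and sqlite3, not the HANA or MongoDB APIs): sensors write straight into one store, and the application-facing API reads the same live data with no ETL step.

```python
import sqlite3

# One store serves both capture and consumption in this sketch.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE events (player TEXT, speed REAL)")

def ingest(player, speed):
    """Sensors write directly into the platform."""
    store.execute("INSERT INTO events VALUES (?, ?)", (player, speed))

def avg_speed(player):
    """The application API, reading live operational data."""
    return store.execute(
        "SELECT AVG(speed) FROM events WHERE player = ?", (player,)
    ).fetchone()[0]

ingest("p1", 8.0); ingest("p1", 10.0)
print(avg_speed("p1"))  # up to date the moment the write lands
```

The simplification is structural: there is no second copy of the data to keep in sync, so there is no staleness window to manage.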
3) Moving the Code to the Data
This was a term coined by one of my customers during a recent SAP HANA engagement and I love it. In a traditional database system, you use the database to get stuff out, push it along a wire, process it, and send it somewhere else.
Hadoop and MongoDB use the functional programming paradigm Map/Reduce to write software that runs directly against the data, in parallel. SAP HANA does it differently: C++ or SQL stored procedures run against the in-memory data and are converted into parallel operations by the HANA compiler. HANA takes this a level further by offering specific engine capabilities against the data stored in-memory: Text, Search, Predictive, Business Function, Spatial and (soon) Graph operations, all running against the data in memory.
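The Map/Reduce idea can be illustrated in a few lines of Python; a thread pool stands in for a cluster’s parallel workers here, and the data and chunk size are invented:

```python
from functools import reduce
from concurrent.futures import ThreadPoolExecutor

# In-memory data, partitioned into chunks as a cluster would shard it.
data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

def map_phase(chunk):
    # The code is shipped to the data: each worker aggregates its
    # own chunk locally, producing a small partial result.
    return sum(x * x for x in chunk)

# Map runs in parallel across the chunks...
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_phase, chunks))

# ...and Reduce combines the partial results into the final answer.
total = reduce(lambda a, b: a + b, partials)
print(total)
```

Only the small partial sums cross between workers and the reducer; the bulk of the data never moves, which is the whole point of moving the code to the data.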
And again, if you look at the direction Hadoop and MongoDB are taking, they are optimizing these operations for aggregation, as HANA already does. This allows us to write applications we couldn’t have imagined five years ago.
The Database is Dead
And we come to where I started: the database is dead.
I should more accurately say that the database is dying: technologies take an awfully long time to die. Many SAP (and other) customers will be running existing RDBMS systems for 20-30 years into the future, just as many people run 30-year-old systems today. But these will be IT systems that businesses keep alive whilst they innovate (and spend money) elsewhere.
The real question is: will any of the existing incumbents have a place in that future? Hadoop and MongoDB are converging fast on it, as is SAP, in my opinion, though from a very different direction. SAP has a different challenge: its roots, internally and across its ecosystem, are embedded in the RDBMS market.
Salesforce has been ahead so far, but I question whether Salesforce can innovate in the right direction over the next 5 years, given they are based on the Oracle RDBMS platform. Workday is interesting, because their walled-garden development platform looks to solve many of the problems that will shackle Salesforce, but that’s a conversation for another day.
In the meantime, the offerings from Oracle, IBM and Microsoft promise databases with better analytic performance, which completely misses the simplification that Hadoop, MongoDB and SAP HANA provide. Oracle, IBM and Microsoft remain stuck with the OLTP/OLAP split: row store for OLTP, column store for OLAP. Whilst they build a faster database, the market has moved on.