The Impact of Aggregates

For over 45 years the world has built enterprise systems with the help of aggregates. The idea rests on the assumption that we can improve the response times of most business applications by pre-building aggregates and maintaining them transactionally as materialized tables. This technique is used in financial systems as well as in sales and logistics systems. The aggregates can be defined in SQL and managed by the database, or handled completely by the application code. They are so popular that every database textbook starts with examples of aggregation and every database benchmark includes them to test the database capabilities.

These aggregates create some specific issues for relational databases. While inserting a new row into a database table is fairly straightforward, updating an aggregation table is not only more expensive (read before update, then rewrite) but also requires a lock mechanism in the database. These locks can lead to a deadlock in the application, the a-b, b-a situation in which one transaction holds a and waits for b while another holds b and waits for a. SAP solves the problem by single-threading all update transactions of the same class.
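The difference between the two write paths can be sketched in a few lines of Python. This is only an illustration, not how any particular database or SAP system implements it: the class names, the in-memory lists and the single `threading.Lock` standing in for a database lock are all assumptions made for the example.

```python
import threading
from collections import defaultdict


class InsertOnlyLedger:
    """Hypothetical ledger that stores line items only."""

    def __init__(self):
        self.line_items = []

    def post(self, account, amount):
        # Write path: a single append, nothing to read or lock.
        self.line_items.append((account, amount))

    def total(self, account):
        # The total is computed from the line items when it is asked for.
        return sum(amt for acc, amt in self.line_items if acc == account)


class AggregatingLedger:
    """Hypothetical ledger that also maintains a total per account."""

    def __init__(self):
        self.line_items = []
        self.totals = defaultdict(float)
        self._lock = threading.Lock()  # stands in for the database lock

    def post(self, account, amount):
        self.line_items.append((account, amount))
        with self._lock:  # writers to the aggregate are serialized here
            current = self.totals[account]           # read before update
            self.totals[account] = current + amount  # rewrite

    def total(self, account):
        return self.totals[account]


plain = InsertOnlyLedger()
aggregated = AggregatingLedger()
for ledger in (plain, aggregated):
    ledger.post("4000", 100.0)
    ledger.post("4000", 50.0)
```

Both ledgers report the same total for account "4000". The aggregating one pays for it on every write with a read, a rewrite and a lock, while the insert-only one pays at read time.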

But there is a much bigger issue with this concept. The assumption that we can anticipate the right pre-aggregations for the majority of applications without creating a transactional bottleneck is completely wrong. We knew that, so we maintained only a few aggregates transactionally and postponed the creation of the others to an asynchronous process in a data warehouse. There we went even further and built multidimensional cubes for so-called slicing and dicing.

For the aggregation we either use structures like account, product, region, organization, etc. in the coding blocks, or we define hierarchical structures and map them onto the transactional line items. Any change in the ‘roll up’ means we have to reconstruct the aggregation table, which instantly leads to downtime with synchronized changes in the application code. Just think about changing the org structure for the next period while the current one is still in progress. And now ask yourself why there is a transactional system, a data warehouse and a host of data marts.
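A small sketch makes the cost of a changed roll-up concrete. The cost centers, org units and amounts below are invented for illustration, and `build_rollup` is a hypothetical helper, not part of any SAP product.

```python
from collections import defaultdict

# Hypothetical line items: (cost_center, amount)
line_items = [("CC1", 100.0), ("CC2", 40.0), ("CC3", 60.0)]


def build_rollup(items, hierarchy):
    """Materialize totals per org unit; this reads every line item."""
    totals = defaultdict(float)
    for cost_center, amount in items:
        totals[hierarchy[cost_center]] += amount
    return dict(totals)


# Org structure of the current period and of the next one (CC2 moves).
old_org = {"CC1": "Sales", "CC2": "Sales", "CC3": "Service"}
new_org = {"CC1": "Sales", "CC2": "Service", "CC3": "Service"}

stored = build_rollup(line_items, old_org)   # {'Sales': 140.0, 'Service': 60.0}
rebuilt = build_rollup(line_items, new_org)  # {'Sales': 100.0, 'Service': 100.0}
```

The stored totals cannot be corrected from themselves: once CC2 moves, the only way to get the new numbers is to go back to the line items and build the table again. If both periods have to be reported at the same time, both versions must exist side by side.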

Not only did SAP carefully define and maintain these aggregates, it also duplicated the transactional line items with the appropriate sorting sequence to allow for a drill-down in real time. In all applications, the management of the aggregates is the most critical part. Thanks to the database’s ability to guarantee the correct execution of an insert/update transaction, we have lived with this architecture for many decades.

And now we have to realize that we only did transactional pre-aggregation for performance reasons. If we assume that the database response time is almost zero, so that we can run all business applications like reporting, analytics, planning, predicting, etc. directly on the lowest level of granularity, i.e. the transactional line items, then all of the above is no longer necessary. SAP’s new HANA database comes close to that almost-zero response time, and so we dropped the pre-aggregation and removed any redundant copy of transactional line items. The result is a dramatically simplified set of programs, more flexibility for the organization, an unheard-of reduction in the data footprint, a simplified data model and a new product release strategy (continuous update) in the cloud.
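What running directly on the line items looks like can be sketched with an ordinary SQL GROUP BY. The example uses Python's built-in sqlite3 module purely as a stand-in: SQLite is a row store and says nothing about HANA's performance, and the table, columns and the `totals_by` helper are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line_items (account TEXT, region TEXT, amount REAL);
    INSERT INTO line_items VALUES
        ('4000', 'EMEA', 100.0),
        ('4000', 'APJ',   50.0),
        ('5000', 'EMEA',  25.0);
""")


def totals_by(dimension):
    """Aggregate at query time along a chosen column; no totals are stored."""
    if dimension not in ("account", "region"):  # whitelist, since it is interpolated
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (
        f"SELECT {dimension}, SUM(amount) FROM line_items "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


by_account = totals_by("account")  # [('4000', 150.0), ('5000', 25.0)]
by_region = totals_by("region")    # [('APJ', 50.0), ('EMEA', 125.0)]
```

The only write is the insert of the line item. Every aggregation, along any dimension, is just another query, so a new reporting view needs no new table and no change to the posting logic.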

Not only does the new system break every speed record in read-only applications and provide nearly unlimited capacity via replicated data nodes for the increased usage (better speed leads to more usage), it also accelerates transactional processing, because most of the activities like maintaining aggregates, inserting redundant line items or triggering asynchronous update processes are simply gone. And don’t forget that the typical database indices are gone as well: all attributes in a columnar store work as an index. And there is no single-threading of update tasks any more. This is how an in-memory database with columnar store outperforms traditional row store databases even in the transactional business.

And then there is another breakthrough: result sets or intermediate result sets can be cached and kept for a day (hot store) or a year (cold store) and dynamically mixed with the recently entered data (delta store). The database handles the process. Large comparisons of data or simulations of business scenarios are possible in sub-second response times – a new world.
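The general idea of mixing a kept result with recently entered data can be sketched as follows. This is a simplified illustration only: the names `main_store`, `delta_store` and `cached_totals` are assumptions, and the sketch makes no claim about how the database actually organizes its hot, cold and delta stores.

```python
from collections import defaultdict


def aggregate(items):
    """Sum amounts per account over a list of (account, amount) line items."""
    totals = defaultdict(int)
    for account, amount in items:
        totals[account] += amount
    return dict(totals)


main_store = [("4000", 100), ("5000", 25)]  # older, settled line items
cached_totals = aggregate(main_store)       # computed once and kept

delta_store = [("4000", 50), ("6000", 10)]  # recently entered line items


def current_totals():
    """Combine the cached result with a fresh aggregate over the delta only."""
    merged = dict(cached_totals)
    for account, amount in aggregate(delta_store).items():
        merged[account] = merged.get(account, 0) + amount
    return merged


result = current_totals()  # {'4000': 150, '5000': 25, '6000': 10}
```

Only the small delta is scanned at query time, yet the answer is the same as a full scan over all line items. Unlike a transactionally maintained aggregate, the cached result is never updated in place by a posting.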

A few years ago I predicted that in-memory databases with columnar store would replace traditional databases with row store, and you can see what is happening in the marketplace. Everybody has followed the trend. Now I predict that in the future all enterprise applications will be built in an aggregation-free and redundancy-free manner.
