As outlined in my previous blog posts, SAP HANA allows us to calculate aggregates on the fly, based on line items. Aggregate tables, which were previously used to provide fast access to totals, have therefore been removed in SAP Simple Finance and SAP S/4HANA in favor of compatibility views that calculate totals on the fly.
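To make this concrete, here is a minimal sketch of the principle, using Python's sqlite3 module in place of SAP HANA; the table and column names are made up for illustration. A materialized totals table is replaced by a view that aggregates line items on demand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE line_items (   -- hypothetical line-item table
        account TEXT,
        amount  REAL
    )
""")
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?)",
    [("4000", 100.0), ("4000", -30.0), ("5000", 50.0)],
)

# Instead of maintaining a materialized totals table, expose a view
# that computes the totals on the fly from the line items.
conn.execute("""
    CREATE VIEW account_totals AS
    SELECT account, SUM(amount) AS balance
    FROM line_items
    GROUP BY account
""")

print(dict(conn.execute("SELECT * FROM account_totals")))
# {'4000': 70.0, '5000': 50.0}
```

Any consumer that previously read the totals table can query the view instead; the totals can never go out of sync with the line items because they are derived from them at query time.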
One question that has been raised in this context is how to handle archived items. How are aggregates calculated if line items have been archived? How does archiving work in SAP S/4HANA without materialized aggregates? And, is it necessary to de-archive items? The short answer to the latter question is: no.
For detailed answers, it is key to understand the concept of data tiering. Data tiering is the partitioning of data into multiple storage classes (see Hasso Plattner and Bernd Leukert, The In-Memory Revolution, p. 29). It can be achieved with traditional archiving, as known from enterprise applications in the past, or by classifying data into actual and historical data and then aging the historical data.
Traditional Archiving in Enterprise Applications
Traditional archiving removes data that is no longer needed for any day-to-day activities from the database. The archived data is kept outside of the database on secondary storage media for legal reasons, often in a different format. Archiving reduces the database footprint. As the data is no longer accessible for the enterprise application running on top of the database, only data that will neither be used in business transactions nor for analytical queries anymore can be archived in this manner.
Archiving tends to be a complex process. It results in separate data storage for archived data and, thus, a more complex system landscape, possibly even using different technology and hardware. Analytical evaluations, which often explore a question over a longer time period, can no longer access archived data. An aggregation on the fly cannot consider archived line items. Hence, totals for archived items (e.g., account balances) need to be stored separately.
Data Aging After Classification of Actual and Historical Data
Thanks to SAP HANA and its partitioning capabilities, it is now possible to separate data tiers into different storage classes while keeping all data accessible in the database. The approach presented here combines two strategies: first, classifying data as actual or historical based on application knowledge, and second, aging the historical data into hot and cold tiers. This ensures that only historical data that is not accessed frequently is moved to slower storage.
Data aging can help circumvent the disadvantages of traditional archiving: data is separated into different partitions based on data access statistics, but all of it remains accessible in the database.
- Hot data is data that has been frequently used (accessed and/or written) over a defined time period. SAP HANA keeps hot partitions of a table in main memory.
- Warm (and cold) data is data that is used less frequently and is by default not loaded into memory, to reduce the memory footprint. It is stored on disk by default but remains part of the database. As a consequence, the data is still accessible via SQL queries, but access is not as fast as for hot data.
All data, independent of aging status, is kept in the database and is accessible by queries, but access times vary. Data aging thus has the benefit that there is no technology break between different data tiers, as the database system handles the different storage locations (memory, SSD, disk) transparently.
While access statistics provide a first indication of whether data is needed by daily operations, they need to be preceded by checks based on the business status of items, which classify data into actual and historical. This ensures that data that is known from its status to be accessed in the future is kept as hot data in main memory for fast access. For example, open accounting document line items remain relevant even if they have not been accessed for a defined period of time. Similarly, all line items of the current fiscal year are needed at the latest for reporting and closing activities.
This classification of data into actual and historical data can only be done based on business rules, knowing the semantics of the application. Actual data is necessary to conduct business or needed for legal and managerial documentation. Historical data will not be necessary to conduct business in the future and cannot be changed anymore. Business rules for actual/historical classification of financial line items depend on the age and status of the items: historical line items are line items from previous fiscal years that are not open or, more generally speaking, not part of ongoing processes.
The classification into actual or historical data happens at the application level, as it is based on the business semantics and needs application knowledge. The database then determines the hot or cold status for data aging of historical documents based on access statistics at the database level. Besides keeping all frequently used actual data in main memory, this two-step approach ensures that no data known to be needed in the future (such as open items) is shifted out of the main partition and that no historical data still actively used (for example, for analysis purposes) is moved to slower storage either.
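The two-step approach can be sketched as follows. This is a simplified illustration with assumed field names and thresholds; the actual rules in SAP S/4HANA are more elaborate:

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    fiscal_year: int
    is_open: bool           # still part of an ongoing process?
    days_since_access: int  # database-level access statistic

CURRENT_FISCAL_YEAR = 2015  # assumed for the example

def classify(item: LineItem) -> str:
    # Step 1 (application level): business rules decide actual vs. historical.
    # Open items and items of the current fiscal year stay actual,
    # regardless of how often they have been accessed.
    if item.is_open or item.fiscal_year >= CURRENT_FISCAL_YEAR:
        return "actual"
    # Step 2 (database level): access statistics decide hot vs. cold,
    # but only for data already classified as historical.
    if item.days_since_access < 365:
        return "historical/hot"
    return "historical/cold"

print(classify(LineItem(2010, is_open=True,  days_since_access=900)))  # actual
print(classify(LineItem(2010, is_open=False, days_since_access=900)))  # historical/cold
```

The key point encoded here is the precedence: the business-status check runs first, so an open item from an old fiscal year can never be demoted by access statistics alone.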
Data Tiering with SAP Simple Finance
In Simple Finance, actual data as determined by archiving checks always stays hot. Only historical data that has not been modified for a defined period of time is moved to colder partitions: a data aging value per record ("temperature") is set to the posting date, and SAP HANA regularly moves records with a corresponding data aging value to separate table partitions.
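A rough sketch of such an aging run, with hypothetical record fields and an assumed retention period; in practice this is handled by the Data Aging framework and SAP HANA itself:

```python
import datetime

# Each record carries a data aging value ("temperature"): None while the
# record is current, the posting date once it has been aged.
records = [
    {"doc": 1, "posting_date": datetime.date(2012, 3, 1),
     "last_modified": datetime.date(2012, 3, 1), "temperature": None},
    {"doc": 2, "posting_date": datetime.date(2015, 1, 15),
     "last_modified": datetime.date(2015, 1, 15), "temperature": None},
]

RETENTION = datetime.timedelta(days=730)  # assumed "defined period of time"
today = datetime.date(2015, 6, 1)

# The aging run stamps unmodified historical records with their posting
# date; the database later moves stamped records to colder partitions.
for rec in records:
    if rec["temperature"] is None and today - rec["last_modified"] > RETENTION:
        rec["temperature"] = rec["posting_date"]

print([rec["temperature"] for rec in records])
# [datetime.date(2012, 3, 1), None]
```

Because the temperature value equals the posting date, records naturally end up partitioned by time, which is what later allows queries with a posting-date lower bound to skip cold partitions.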
A data tiering concept that considers actual versus historical classification of data yields benefits due to the different access behavior that enterprise applications exhibit for the different classes, resulting in different requirements. As defined above for financial data, only actual data will and can be changed. The historical partitions do not require sophisticated hardware and algorithms to ensure consistency in case of updates, such as a delta store. They can use advanced compression optimized for read-only data and a backup is only required after aging. Furthermore, typical queries already contain the information whether access to colder partitions may be needed, as items are distributed according to known criteria discernible from queries such as fiscal year or clearing status.
Keeping Track of Totals
In conclusion, data aging is the preferable data tiering approach and the mechanism used for data after the adoption of SAP Simple Finance, as it reduces the main-memory footprint while keeping data accessible and reducing complexity. Nevertheless, many enterprise systems have used archiving in the past. So how does the removal of materialized aggregates work without requiring companies to de-archive data they archived in the past?
When calculating aggregates on the fly in the presence of data tiering, the two options outlined here raise two separate questions discussed in the following:
How does the on-the-fly calculation access the information from items that have been archived in the past and are no longer available in the database?
SAP HANA, in particular the partitioning and data aging concepts, provide the opportunity to keep more data in the database, separated in partitions of different age, making archiving to separate storage media obsolete. Companies that do not want to de-archive their already archived data do not need to do so, as Simple Finance includes measures to keep information on totals for data that is no longer in the database when switching to Simple Finance.
As part of the switch to Simple Finance, this information that would otherwise be lost is stored in specific delta tables. Note that this does not violate the principle of redundancy, as the source archived line items are no longer part of the database and the totals cannot be reconstructed otherwise from data available in the database.
The delta table is filled once during the switch-over to the new data model with the difference between the old materialized totals table and the calculated totals from the line items in the system. The compatibility views that replace the materialized aggregates read from both the line items, calculating the totals for non-archived items on the fly, and from the delta table that keeps track of the totals for archived items.
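Continuing the SQLite analogy from above (again with made-up names), the compatibility view can combine the on-the-fly totals over the remaining line items with the one-time delta snapshot for archived items:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line_items (account TEXT, amount REAL);   -- non-archived items
    CREATE TABLE totals_delta (account TEXT, amount REAL); -- filled once at switch-over

    -- Line items still in the database:
    INSERT INTO line_items VALUES ('4000', 70.0), ('5000', 50.0);
    -- Difference between the old materialized totals and the totals
    -- computable from the remaining line items (i.e. the archived part):
    INSERT INTO totals_delta VALUES ('4000', 30.0);

    -- Compatibility view: on-the-fly totals plus the archived delta.
    CREATE VIEW account_totals AS
    SELECT account, SUM(amount) AS balance FROM (
        SELECT account, amount FROM line_items
        UNION ALL
        SELECT account, amount FROM totals_delta
    ) GROUP BY account;
""")

print(dict(conn.execute("SELECT * FROM account_totals")))
# {'4000': 100.0, '5000': 50.0}
```

The delta table is written exactly once, during the switch-over, and never updated afterwards; all subsequent changes flow through the line items and are picked up by the view automatically.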
How does on-the-fly calculation work for the data aging concept that keeps data in separate partitions, possibly on disk?
As aged data is still part of the database, the usual caching strategies of SAP HANA also apply to aged partitions of data. This is particularly effective because historical data does not change, so the database system needs to cache aggregate values only once and update them only in case of reorganizations. The aggregate values are still calculated from the underlying line items, but the result of the calculation is cached by the database, transparently for any application.
The result of an aggregate query spanning several tiers of data can then be calculated on the fly by combining the result computed from the line items in the hot partition with the cached results of the colder tiers.
Furthermore, queries can be designed in a way that ensures the database only accesses hot partitions residing in main memory. As the posting date is the main factor influencing aging besides the business status, deriving a lower limit for the posting date where possible ensures that cold partitions are only accessed if their data is relevant to the query at hand.
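A sketch of this idea against the hypothetical schema used above: when the caller only needs data from a given fiscal year onwards, the query gains a posting-date lower bound, which lets the database prune cold (aged) partitions:

```python
def balance_query(account, fiscal_year=None):
    """Build a totals query; a fiscal-year restriction yields a posting-date
    lower bound that lets the database skip cold (aged) partitions."""
    sql = "SELECT SUM(amount) FROM line_items WHERE account = ?"
    params = (account,)
    if fiscal_year is not None:
        # Assuming, for illustration, that the fiscal year starts on January 1st:
        sql += " AND posting_date >= ?"
        params += (f"{fiscal_year}-01-01",)
    return sql, params

print(balance_query("4000", 2015))
# ('SELECT SUM(amount) FROM line_items WHERE account = ? AND posting_date >= ?',
#  ('4000', '2015-01-01'))
```

Since aged records are partitioned by their posting date (their "temperature"), the added predicate tells the database up front that no partition older than the bound can contribute to the result.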
In summary, SAP Simple Finance uses data aging based on access statistics on top of an application-specific classification into actual and historical data. For historically archived documents, delta tables keep track of totals.