We have seen how bringing as much data into HANA, SAP’s in-memory platform, as real-time decision support requires, from any source, is revolutionizing interactive analysis on live data. HANA scales linearly across hundreds of nodes to analyze as much data as needed within a window of opportunity. Hasso’s demo of 4 billion records of retail data (start approx. at 1:01:30) at SAPPHIRE and Vishal’s previous announcement on 1 PB of raw data in main memory already show the scalability of HANA.
However, for information lifecycle reasons (e.g. archives, historical data, clickstream logs covering years of web data, detailed machine logs, or corporate data retention policies such as retaining all data for 7+ years for legal purposes), customers often use cold storage strategies. The challenge has been to integrate such a cold storage area smoothly and transparently to the outside, so that all of HANA’s functions can work on that data too if required.
In the keynote at SAP TechEd Amsterdam, a dashboard demo was shown (start approx. at 41:00) that ran on top of a BW system with HANA as the underlying DB, leveraging a new feature inside HANA: extended tables. These are tables that logically sit in HANA and can be used as if they were normal HANA tables. Physically, however, they sit in a Sybase IQ server that is closely tied to that HANA system. This provides an area for “cold data” (data that is not frequently used but still important, e.g. in the corporate memory of an EDW) at an attractive price point, at the expense of slightly decreased performance. The BW-HANA-IQ system holds ½ PB of data in total. This blog describes some of the background of that demo and of the feature it exposed.
Data warehouses typically have areas of cold data:
Data sitting in such cold areas (in real-world scenarios, typically 40-60% of the data volume of a data warehouse falls into this category) does not need to occupy main memory or other resources in HANA. It makes sense to provide an area within HANA, without any functional restrictions, that matches the usage profile of that type of data. This is what is referred to as extended storage. Technically, this is done by leveraging the infrastructure of Sybase IQ; this is not visible to the user.
So what is an extended table? In simple words: it is a table definition sitting in the HANA catalog but actually pointing to a table in a connected Sybase IQ server. The latter acts as extended storage for HANA. An extended table is similar to a virtual table, but there is more to it, as it is more tightly integrated into HANA than a plain virtual table, e.g.
Fig. 1: The extended table concept in HANA
Fig. 2: Creating an extended table from HANA Studio; BW creates them automatically (see fig. 3).
The demo runs on a BW-on-HANA system with an IQ system connected as extended storage. The IQ system holds ½ PB of raw data (≈ CSV file data). In the demo, a simple dashboard was shown that was built with Design Studio. The dashboard runs unchanged on both an iPad and a desktop browser. It uses a BW query that sits on top of a BW composite provider. The latter comprises 167 write-optimized DSOs, one for each fiscal period between January 2000 and November 2013. All write-optimized DSOs except the one for November 2013 have been created with the extended table property in BW (see figure 3). The November 2013 DSO “lives” in HANA, i.e. in in-memory storage. Each write-optimized DSO holds approx. 2 billion rows, translating into approx. 320 billion rows in total for that composite provider.
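The period count and row volume quoted above can be sanity-checked with a little arithmetic. The following is only a sketch in Python: the helper name `fiscal_periods` is made up for illustration, and it assumes that each fiscal period corresponds to one calendar month. With exactly 2 billion rows per DSO, the total comes to 334 billion, which is in the same range as the approx. 320 billion stated above, given that the per-DSO figure is itself approximate.

```python
from datetime import date

def fiscal_periods(start: date, end: date) -> list[tuple[int, int]]:
    """Return (year, month) pairs from start to end, both inclusive.

    Assumption: one fiscal period equals one calendar month.
    """
    periods = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        periods.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return periods

periods = fiscal_periods(date(2000, 1, 1), date(2013, 11, 1))

# Only the most recent period (November 2013) is kept in HANA memory;
# every other DSO carries the extended table property and sits in IQ.
hot = [periods[-1]]
cold = periods[:-1]

ROWS_PER_DSO = 2_000_000_000  # "approx. 2 billion" taken as exactly 2e9
total_rows = len(periods) * ROWS_PER_DSO

print(f"DSOs in total:      {len(periods)}")
print(f"DSOs in IQ (cold):  {len(cold)}")
print(f"DSOs in HANA (hot): {len(hot)}")
print(f"Rows in total:      {total_rows:,}")
```

In other words, 166 of the 167 DSOs, and therefore the overwhelming share of the rows, sit in extended storage rather than in main memory.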
Fig. 3: The extended table property for a write-optimized DSO in BW.
The dashboard can be seen in figures 4 and 5. The initial access (fig. 4) reads data only from the write-optimized DSO that sits completely in-memory (November 2013). Bars in the dashboard indicate how much data has been selected from each storage (HANA and IQ); in the first access, it is approx. 2.8 million rows in HANA and 0 rows in IQ. The second drill-down (fig. 5) incorporates the data from Nov 2012 to Oct 2013 and thus accesses the write-optimized DSOs in IQ as well: approx. 288 million rows are selected from there. While the first drill-down takes less than 1 second, the second step takes about 9 seconds.
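To illustrate why the two drill steps touch different storages, here is a purely conceptual sketch in Python. It is not how BW or HANA route queries internally; the names `HOT_PERIODS` and `split_by_storage` are hypothetical, and the only facts taken from the demo are that the November 2013 DSO lives in HANA while all other periods live in IQ.

```python
# Assumption for illustration: only November 2013 is held in HANA memory.
HOT_PERIODS = {(2013, 11)}

def split_by_storage(requested):
    """Partition requested (year, month) periods into HANA and IQ lists."""
    in_hana = [p for p in requested if p in HOT_PERIODS]
    in_iq = [p for p in requested if p not in HOT_PERIODS]
    return in_hana, in_iq

# Drill step 1: current period only.
step1 = [(2013, 11)]

# Drill step 2: the additional periods Nov 2012 to Oct 2013.
step2 = [(2012, 11), (2012, 12)] + [(2013, m) for m in range(1, 11)]

for name, requested in (("step 1", step1), ("step 2", step2)):
    in_hana, in_iq = split_by_storage(requested)
    print(f"{name}: {len(in_hana)} period(s) in HANA, {len(in_iq)} in IQ")
```

Under these assumptions, the first step is served entirely from memory, while the twelve additional periods of the second step all fall into extended storage, which is consistent with the longer response time observed there.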
Fig. 4: Result of drill step 1 in the demo dashboard.
Fig. 5: Result of drill step 2 in the demo dashboard.
The extended table feature is technically available with the following product versions:
Please check OSS note 1983178 for details, e.g. the initial pilot shipment.
So, what are the benefits? Fundamentally, this feature allows BW-on-HANA to handle PB-scale data volumes at an attractive price point. This is particularly important for the cold data areas of an EDW, such as staging (acquisition) layers or corporate memories.