We have seen how bringing as much data in HANA, SAP’s in-memory platform, as is needed for real-time decision support from any source is revolutionizing interactive analysis on live-data. HANA scales linearly across hundreds of nodes to analyze as much data as needed within a window of opportunity. Hasso’s demo of 4 billion records of retail data (start approx. at 1:01:30) at SAPPHIRE and Vishal’s previous announcement on 1 PB of raw data in main memory already show the scalability of HANA.
However, for information lifecycle reasons (e.g. archives, historical data, clickstream logs for years of web data or detailed machine logs, corporate data retention policies such as retaining all data for 7+ years for legal purposes) customers often use cold storage strategies and the challenge has been to smoothly and unrecognizably (to the outside) integrate such a cold storage area thereby allowing all of HANA’s functions to work on that data too if required.
In Amsterdam’s SAP Teched keynote, a dashboard demo was shown (start approx. at 41:00) that worked on top of a BW system with HANA as the underlying DB thereby leveraging a new, innovative feature inside HANA, namely extended tables. The latter are tables that logically sit in HANA and can be used as if they were normal HANA tables. However, they physically sit in a Sybase IQ server that is closely tied to that HANA system. This allows to provide an area for “cold data” – i.e. not frequently used but important data, e.g. in the corporate memory of an EDW – at an attractive price point at the expense of slightly decreased performance. The BW-HANA-IQ system holds ½ PB of data in total. This blog describes a little bit the background of that demo and the exposed feature.
Purpose: Cold Data Areas
Data warehouses typically have areas of cold data:
- The acquisition or landing area receives the data as it arrives from the source systems. It is accessed once to be refined and harmonized but then “waits” there to be deleted or to be archived. The latter takes place only after some days, weeks or months, i.e. whenever it is guaranteed that the data has been successfully incorporated (harmonized, refined, transformed) into higher (e.g. reporting) layers.
- Similarly, a corporate memory area captures the complete history of the loaded data. It is used as a source for reconstructions without the need to access the sources again. For instance, there are internet companies that keep 10-15 years of clickstream history in their corporate memory.
Data sitting in such cold areas – in real-world scenarios typically 40-60% of data volume of a data warehouse fall into this category – do not need to occupy main memory or other resources in HANA. It makes sense to provide an area within HANA and w/o any functional restrictions that match the usage profile of that type of data. This is what is referred to as extended storage. Technically, this is done by leveraging infrastructure of Sybase IQ; to the user, this is not visible.
So what is an extended table? In simple words: it is a table definition sitting in the HANA catalog but actually pointing to a table in a connected Sybase IQ server. The latter acts as an extended storage to HANA. An extended table is similar to a virtual table but there is more to it as it is more tightely integrated into HANA than just a virtual table, e.g.
- optimized data transfer between HANA and IQ – e.g. type conversions
- data processing is pushed to IQ
- monitoring in HANA Studio
- joint backup & recovery across HANA and IQ – i.e. as if it were one homogenous DB instance
Fig. 1: The extended table concept in HANA
Fig. 2: Creating an extended table from HANA Studio; BW creates them automatically (see fig. 3).
The Demo Scenario
The demo runs on a BW-on-HANA system with an IQ system connected as extended storage. The IQ system holds ½ PB of raw data (≈ CSV file data). In the demo, a simple dashboard was shown that was built with Design Studio. The dashboard runs unchanged on both, an iPad and a desktop browser. It uses a BW query that sits on top of a BW composite provider. The latter comprises 167 write-optimized DSOs, one for each fiscal period between January 2000 and November 2013. All write-optimized DSOs have been created with the extended table property in BW – see figure 3 – but the one for November 2013. The latter “lives” in HANA, i.e. in in-memory storage. Each write-optimized DSO holds approx. 2 billion rows, translating into approx. 320 billion rows in total for that composite provider.
Fig. 3: The extended table property for a write-optimized DSO in BW.
The dashboard can be seen in figures 4 and 5. The initial access (fig. 4) reads data only from the write-optimized DSO that sits completely in-memory (November 2013). There are bars that indicate how much data has been selected in each storage (HANA and IQ); in the first access, it is approx. 2.8 million rows in HANA and 0 rows in IQ. The second drill-down (fig. 5) incorporates the data from Nov 2012 to Oct 2013 and, thus, accesses the write-optimized DSO in IQ as well: approx. 288 million rows are selected from there. While the first drill-down takes less than 1 second, the second step takes about 9 seconds.
Fig. 4: Result of drill step 1 in the demo dashboard.
Fig. 5: Result of drill step 2 in the demo dashboard.
The extended table feature is technically available with the following product versions:
- BW 7.4 SP5 – Dec 2013
- HANA SP7 – Dec 2013
- IQ 16.0 SP02 – end of Nov 2013
Please check OSS note 1983178 for details, e.g. the initial pilot shipment.
So, what are the benefits? Fundamentally, this feature allows BW-on-HANA to handle PB-scale big data volumes at an attractive price point. This is specifically important in order to cater for cold data areas – e.g. of an EDW like staging (acquisition) or corporate memories.
There is obviously a number of fair questions on how this compares to the nearline storage (NLS) feature or active / not-active data concept. However, I will leave that for a separate blog.
This blog has been cross-published here. You can follow me on Twitter @tfxz.