Having discussed the system configurations for large sERP systems in my last blog, I want to continue today by explaining why HANA is an ideal platform for applications in the cloud. Instead of reacting to the ranting of some scared competitors, I would like to go step by step through some of the unique features of HANA that make it the best currently available choice for software as a service. This applies to sERP on HEC, the business network applications, the customer and partner applications on HCP, and the many new services from startups. This is my personal view as an academic and not an official position of SAP. As such it may not be complete; it is based on my research experience at the HPI in Potsdam, Germany.
Applications offered as a service promise instant availability, easy configuration, elasticity when growing from 5 to 500 users, continuous improvement, connectivity to many other cloud services, guaranteed service levels, and hassle-free operation at lower cost. The service provider has to cope with these challenges by building the applications in the right way, really focusing on ease of use, and choosing the hardware and system software carefully, always with the above-mentioned goals in mind. The services in the SAP Cloud range from marketplace (business network) applications like Ariba, Fieldglass, and Concur to enterprise-specific applications like sERP, B1, ByD, or SuccessFactors. The underlying database has to handle large data sets as well as many smaller ones at low cost and with good performance. HANA is an in-memory database with a mostly columnar store. Together with new developments in hardware, where large amounts of memory can be shared by many CPUs with up to 18 cores each (Intel Haswell), it offers some very interesting features which are especially helpful in the cloud.
- The columnar store architecture allows massive parallelism to be exploited. Database transactions can be split into multiple threads running on different cores in parallel. The number of parallel threads varies automatically with the data volume and the current system load, and the resource provisioning for a given system can easily be changed. This parallelism enables larger analytical or simulation runs even in a multi-user environment.
- The dictionary-based compression (>5x) reduces the data footprint for all types of applications. All activities such as initial data load, data backup, system restart, or data reorganization benefit from this, and disk capacity can be reduced accordingly.
- Data values are encoded as integers, which speeds up database-internal processing. As a consequence, the higher throughput allows the cloud application to serve more users for the same hardware cost.
- Table columns that are not populated take up no space. This feature is important for generic applications serving a variety of industries and locations.
- New attributes can be added on the fly without disrupting the cloud services. In general, developers on the HANA platform experience faster development and test cycles, which is paramount for the permanent improvement of a cloud service.
- Every column of a table can serve as an index. The scan speed is tremendously fast because of the compression and the integer encoding. The number of database indices is basically reduced to primary indices and a few secondary ones, so the database management effort in HANA is much lower than with conventional databases. Since data resides either in the row store or the columnar store, no further administration has to take place, which greatly reduces the DBA workload.
- Via federation (SDA), HANA can access data stored in databases outside the system it runs on. Common data such as countries, cities, measurements, weather information, historical data, maps, etc. can be shared by many applications in multiple systems.
- The speed of HANA makes older concepts like transactional aggregation, data cube roll-ups, and materialized projections or views superfluous, and thus accelerates data entry transactions. sERP sees another data footprint reduction by a factor of 2 and a significant reduction in the size and complexity of its code. Other redundant data structures like star schemas are no longer necessary, which again reduces the inner complexity of applications while allowing for more flexibility. Algorithms replace complex data structures; they can be changed at any time or exist in different versions in parallel. This opens up completely new ways of designing and maintaining applications and simplifies the whole data model.
- Transactional ERP systems can be split into an actual part (data that changes all the time) and a historical part (data that changes only periodically). HANA uses different strategies for keeping this data in memory: for historical data a scale-out approach, if the data volume requires multiple physical chassis; for the actual data (ca. 20-30%) a scale-up approach via SMP across multiple chassis, should the data volume require it. The footprint of the actual data is 20-40x smaller than the original footprint on any conventional database.
- For extremely high availability, HANA replicates the actual data of enterprise systems and uses the replica(s) for read-only requests. Only the actual data of an sERP system is part of the daily backup routine; since the historical part cannot change for long periods of time, no backup of it is required. This alone contributes significant cost savings in operation.
- HANA incorporates text and geospatial capabilities. Completely new applications become possible and, together with HANA SDA, make the cloud service offerings more attractive.
- Sophisticated libraries for business functions or for planning, forecasting, and simulation algorithms help simplify the application code.
- The combination of OLTP and OLAP in one single (columnar) database on one single image changes the way we will build enterprise applications in the future. In the case of sERP that means many of the reports, planning activities, and optimization algorithms currently running in separate systems can come back and share the transactional tables directly.
- HANA is fully compatible with Oracle, IBM DB2, MS SQL Server, and SAP ASE; only stored procedures have to be translated or rewritten, for legal reasons.
- Multi-tenancy through the database, for smaller application deployments or generic applications of the business network type, will come soon, although the benefits in storage savings and operating costs are less visible in an ‘all in memory’ architecture. Virtualization will be used for test, development, and smaller systems; its effect on large systems is not yet fully understood.
- With HANA, updates make up only a minimal percentage of all database activities, and the remaining ones are done in insert-only mode, which nearly eliminates the need for database locks completely. All applications modifying tables benefit from this.
- An extremely fast version of HANA with a very small footprint, for deployment in data marts or on frontend clients, will ship very soon. Data is exchanged in compressed format. BI tools can use this version in addition to direct access to HANA.
- The architecture of the data store supports the caching of intermediate results while new data input is added automatically. This is very helpful for closing procedures or for a rapid sequence of varying queries against the same tables. Result sets from historical data remain unchanged by definition for long periods of time; only the data from the actual partitions has to be recalculated. Analyzing point-of-sale, sensor, process, or maintenance data at high speed becomes possible this way.
- Long-running batch programs are nearly completely eliminated, reducing the management effort for both the user and the service provider. The tasks run in real time as normal transactions.
- A separate data warehouse running only OLAP-type applications is still valuable and well supported by HANA.
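To make the dictionary compression and integer encoding from the list above concrete, here is a minimal sketch in Python. It is purely illustrative (not HANA's actual implementation): each distinct string is stored once in a dictionary, and the column itself holds only small integer codes, so predicate scans compare integers instead of strings.

```python
# Illustrative sketch of dictionary encoding for a columnar store.
# Function names and structures are my own, not HANA internals.

def dict_encode(values):
    """Return (dictionary, codes) for a list of column values."""
    dictionary = sorted(set(values))            # each distinct value stored once
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]        # the compressed column: integers only
    return dictionary, codes

def scan_equals(dictionary, codes, value):
    """Predicate scan: one dictionary lookup, then an integer-only scan."""
    try:
        code = dictionary.index(value)
    except ValueError:
        return []                               # value not in column at all
    return [row for row, c in enumerate(codes) if c == code]

countries = ["DE", "US", "DE", "FR", "US", "DE"]
d, codes = dict_encode(countries)
print(d)                            # ['DE', 'FR', 'US']
print(codes)                        # [0, 2, 0, 1, 2, 0]
print(scan_equals(d, codes, "DE"))  # [0, 2, 5]
```

Because the column is a plain integer sequence, it also behaves like an index: any column can be scanned this way at high speed, which is the effect described in the list above.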
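The insert-only mode mentioned above can be sketched in a few lines. In this toy model (my own, assumed structure), an "update" appends a new row version with a higher sequence number instead of overwriting in place, which is why readers never have to wait on row locks:

```python
# Toy model of insert-only updates; not HANA's storage engine.
import itertools

class InsertOnlyTable:
    def __init__(self):
        self.rows = []                 # append-only storage
        self._seq = itertools.count()  # monotonically increasing version number

    def upsert(self, key, value):
        # A change is just another insert with a higher sequence number.
        self.rows.append((next(self._seq), key, value))

    def current(self, key):
        # The visible value is the latest version for the key.
        versions = [(seq, v) for seq, k, v in self.rows if k == key]
        return max(versions)[1] if versions else None

t = InsertOnlyTable()
t.upsert("order-1", {"status": "open"})
t.upsert("order-1", {"status": "shipped"})  # no in-place update, no lock
print(t.current("order-1"))                 # {'status': 'shipped'}
print(len(t.rows))                          # 2 -- full history retained
```

A side effect visible even in this sketch is that the complete change history is retained, which is what allows the split into actual and historical partitions described earlier.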
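The cached-intermediate-results idea over actual and historical partitions can also be shown with a hypothetical aggregate. The partition names and numbers here are assumptions for illustration; the principle is from the list above: the aggregate over the immutable historical partition is computed once, and each query only re-aggregates the small, changing actual partition.

```python
# Sketch of caching an intermediate result over the historical partition.
# Partition contents are invented example data.

historical = [120, 80, 45, 200]    # closed periods: never change
actual = [10, 5]                   # current period: receives inserts

historical_sum = sum(historical)   # computed once and cached

def total_sales():
    # Per-query work touches only the small actual partition.
    return historical_sum + sum(actual)

print(total_sales())               # 460
actual.append(15)                  # new data arrives
print(total_sales())               # 475 -- cached part is still valid
```

This is why closing procedures and rapid query sequences over the same tables benefit: the expensive part of the result set never has to be recomputed until a period is closed and data moves from actual to historical.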
All these features make HANA a very attractive database for cloud-based applications of any type. It is well understood that the user of a cloud-based service doesn't care about the technical deployment, but will clearly experience the performance and ease of operation. After the transition of the SuccessFactors and Ariba applications to the HANA platform, the number of queries grew threefold, which gives us an indication of how much speed determines ease of use. An in-memory database will soon be standard for cloud-based applications, which gives cloud applications using HANA now a huge competitive advantage over databases storing data in both row and columnar format simultaneously. To take advantage of all the HANA features, the backend of the existing ERP applications had to be partially rewritten, partially by moving application logic into stored procedures. Since sERP is based only on HANA, this makes perfect sense. A large part of the ERP application code deals with transactional aggregation and the creation of redundant materialized views; all of this code could be dropped in sERP on HANA. The majority of the read-only part of the code in sERP is new and is getting a new UI as well, one that takes the speed of HANA into account wherever possible.
One of the breakthrough advantages of HANA lies in the possibility of running OLTP- and OLAP-type applications in one system. Even in data entry transactions, more and more complex queries play a vital part in the application. But we have to understand that OLTP applications mostly run in single-threaded mode, while OLAP applications achieve much of their performance improvement via parallelism.
Using the same database for both types of applications is one thing; running them on one database instance is another. In this case we have to manage the mixed workload. One way is to give the OLTP applications a higher priority and to restrict the parallelism of the OLAP applications during times of high system load. A much better way is to use the hot-standby replica for most of the read-only transactions, which make up more than 90% of the total workload. Both the primary system and the replica work on the identical permanent storage system (SSD or disk). Now it becomes even more important to split the data into a partition where changes can take place (inserts, updates, or deletes) – the actual data – and a partition where no changes are allowed any more – the historical data. Only the first one has to be replicated for high-availability reasons, which shows how important the data footprint reduction for the actual data is. As a rule of thumb, we should restrict the parallelism on the primary system to 25% of the available threads, while on the read-only replica we can use all the available parallelism to run a single OLAP application as fast as possible. The amount of parallelism depends on the size of the tables queried. Since there are nearly no I/O waits any more, classic multiprogramming optimization doesn't work that well. The mixed workload with partial parallelism is new territory for system optimization, and SAP will monitor the behaviour in the cloud carefully and share tuning experience with on-premise installations.
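The 25% rule of thumb above can be sketched as follows. The policy (cap OLAP parallelism on the primary to a quarter of the hardware threads, let the replica use them all) is from the text; the thread-pool mechanics and the tiny scan workload are my own illustrative assumptions:

```python
# Sketch of capping OLAP parallelism on the primary system.
import os
from concurrent.futures import ThreadPoolExecutor

hw_threads = os.cpu_count() or 8
primary_workers = max(1, hw_threads // 4)  # 25% of threads: keep OLTP responsive
replica_workers = hw_threads               # read-only replica runs OLAP flat out

def scan_partition(part):
    return sum(part)                       # stand-in for a parallel column scan

partitions = [[1, 2, 3], [4, 5], [6], [7, 8, 9]]

# On the primary, the pool size limits how many partitions scan in parallel.
with ThreadPoolExecutor(max_workers=primary_workers) as pool:
    partials = list(pool.map(scan_partition, partitions))

print(sum(partials))                       # 45
```

The same query on the replica would simply use a pool of `replica_workers`, trading OLTP responsiveness (irrelevant there) for maximum single-query speed.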