HANA Meets “The SIMs”; SIMD, SIMplicity and SIMulation

Posted by Henry Cook on March 19, 2014

Principal Consultant

We know that the design of HANA marks a fundamental departure from traditional information system design, starting with a clean sheet of paper and designing for modern multi-core CPUs. This provides performance that is simply startling, far in excess of what we see from traditional systems. This in turn provides the opportunity to radically simplify our systems and to provide greatly enhanced personal productivity for our users and developers.

[Image: HANA SIMD Cache Vertical.jpg]

SIMD

Single Instruction Multiple Data, or SIMD, processing is one of the key concepts used to provide this enhanced performance and productivity, and many of the special capabilities of HANA flow from it. This sounds complicated but is in fact quite straightforward. A while ago we found we could no longer make our computers faster simply by raising clock speeds, but we could continue to double the number of transistors on a chip every eighteen months or so. Simply put, SIMD, together with the other features of modern chips that support it, explains what we did with all those extra transistors.

The accompanying figure depicts a modern CPU. A CPU chip has many cores. The chip also has special, very fast cache memory that keeps the fast registers in each core fed with data. Each core has its own Level 1 and Level 2 cache, and all the cores share a Level 3 cache on the chip.

As we remarked above, we can do all of this because we’ve been able to double the number of transistors on a chip every eighteen months or so for many years, as per Moore’s Law. The latest generation of chips has over 2.6 billion transistors. We’ve deployed all those extra transistors to implement our multiple cores and memory caches. We’ve also used them to widen our registers, from 8 bits to 16 bits, then 32 bits, then 64 and finally 128 or 256 bits.

But suppose we are computing with 100,000,000 32-bit values. Originally we’d take each of those values individually, load it into a register, do something to it, then store it back somewhere. We’d do this 100,000,000 times, once for each of our single scalar values. But if we’ve got a 128-bit register available then, by adding a bit of circuitry (more transistors, which we have plenty of) and some special instructions, we can load those values 4 at a time, operate on them 4 at a time and then store them back 4 at a time. That is exactly what we do with SIMD: we operate on multiple data values at once with a single computer instruction, hence the name.
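To make the "4 at a time" idea concrete, here is a minimal sketch in Python. It is not how HANA or the CPU does it. It only emulates a 128-bit register with one big integer, so that a single addition acts on four 32-bit lanes at once. All names (`pack`, `unpack`, `simd_add`, `add_columns`) are made up for this illustration.

```python
LANES = 4                        # four values per 128-bit "register"
LANE_BITS = 32                   # each value is 32 bits wide
LANE_MASK = (1 << LANE_BITS) - 1

def pack(values):
    """Place up to four 32-bit values side by side in one 128-bit integer."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word

def unpack(word):
    """Split a 128-bit integer back into its four 32-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

# Masks selecting the top bit and the low 31 bits of every lane.
HIGH = pack([0x80000000] * LANES)
LOW = pack([0x7FFFFFFF] * LANES)

def simd_add(a, b):
    """Add all four lanes with one wide addition.

    The low 31 bits of each lane are added together, then the top bits are
    folded in with XOR, so a carry never spills into the neighbouring lane
    (each lane wraps around on overflow, as 32-bit hardware would).
    """
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)

def add_columns(xs, ys):
    """Add two equal-length columns, four values per step."""
    out = []
    for i in range(0, len(xs), LANES):
        chunk_x, chunk_y = xs[i:i + LANES], ys[i:i + LANES]
        lanes = unpack(simd_add(pack(chunk_x), pack(chunk_y)))
        out.extend(lanes[:len(chunk_x)])   # drop padding lanes in the tail
    return out

# The text's example: 100,000,000 values, processed 4 per instruction.
scalar_steps = 100_000_000
simd_steps = scalar_steps // LANES   # 25,000,000 wide operations
```

With 128-bit registers and 32-bit values the loop runs a quarter as many times. On real hardware the same effect comes from dedicated SIMD instructions rather than big-integer arithmetic.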

Our computer chip is ticking away at 3.4 billion ticks a second, and even if an instruction takes a few ticks, how do we keep the registers constantly fed with data? We don’t have time to go all the way out to DRAM, as that would take too long, so that’s where the on-chip caches come in. If we can keep them filled with data then we can keep the registers fed. This is what gives us our phenomenal speed, for example 3.2 billion item scans / core / second.

Similarly, by designing our aggregation algorithms to make use of the local caches we can achieve 10 million aggregations / core / second. Bear in mind that we can have many CPUs in the system, each with many cores, and we can spread our computation over them, running many parts of it in parallel, so this technique multiplies the benefits of using Massively Parallel Processing (MPP).
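A quick back-of-the-envelope calculation shows what these figures imply. The clock rate and scan rate are the ones quoted above, and the 100,000,000 values come from the earlier example. The core count of 16 and the assumption of perfectly linear scaling across cores are mine, purely for illustration.

```python
CLOCK_TICKS_PER_SECOND = 3_400_000_000      # "3.4 billion ticks a second"
SCANS_PER_CORE_PER_SECOND = 3_200_000_000   # "3.2 billion item scans / core / second"
VALUES_TO_SCAN = 100_000_000                # the 32-bit values from the SIMD example
HYPOTHETICAL_CORES = 16                     # assumption, not a figure from the text

def seconds_to_scan(items, rate_per_core, cores=1):
    """Idealised scan time, assuming the work splits evenly across cores."""
    return items / (rate_per_core * cores)

# Clock ticks available per scanned item on a single core.
ticks_per_item = CLOCK_TICKS_PER_SECOND / SCANS_PER_CORE_PER_SECOND

one_core = seconds_to_scan(VALUES_TO_SCAN, SCANS_PER_CORE_PER_SECOND)
many_cores = seconds_to_scan(VALUES_TO_SCAN, SCANS_PER_CORE_PER_SECOND,
                             cores=HYPOTHETICAL_CORES)

print(f"ticks per scanned item: {ticks_per_item}")
print(f"one core:  {one_core * 1000:.2f} ms")
print(f"{HYPOTHETICAL_CORES} cores: {many_cores * 1000:.2f} ms")
```

Roughly one clock tick per item scanned is only plausible if several items are handled per instruction and the data is already sitting in cache, which is exactly the point of the last few paragraphs.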

But to do this successfully we also need to load and compute multiple values at a time, and this is where our column store and dictionary compression come in. Our data values are tightly packed in columns and represented as binary data, ideal for taking advantage of SIMD processing. (By the way, this is the main reason that HANA uses column stores, not just to get good compression, though that is a bonus and adds its own advantages.) This brings great efficiency to scans, filtering, joins and aggregation, the exact work that databases do so much of.
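The idea behind a dictionary-compressed column can be sketched in a few lines of Python. This is a simplified illustration, not HANA's actual storage format. The function names and the sample country codes are invented for the example.

```python
from array import array
from bisect import bisect_left

def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    # A typed array keeps the codes tightly packed as fixed-width binary integers.
    codes = array("I", (code_of[value] for value in values))
    return dictionary, codes

def scan_equals(dictionary, codes, wanted):
    """Return the row numbers whose value equals `wanted`.

    The value is looked up once in the dictionary; the scan itself then
    compares only fixed-width integers, which is the kind of work that
    suits SIMD instructions.
    """
    pos = bisect_left(dictionary, wanted)
    if pos == len(dictionary) or dictionary[pos] != wanted:
        return []                      # value does not occur in the column
    return [row for row, code in enumerate(codes) if code == pos]

def count_by_value(dictionary, codes):
    """Aggregate by counting codes, then translate the codes back to values."""
    counts = [0] * len(dictionary)
    for code in codes:
        counts[code] += 1
    return dict(zip(dictionary, counts))

# Hypothetical column of country codes.
column = ["DE", "US", "DE", "FR", "US", "DE"]
dictionary, codes = dictionary_encode(column)
```

Here `dictionary` holds each distinct value once and `codes` holds one small integer per row, so the scan and the aggregation never touch the original strings until the final result is produced.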

Simplification

The point is that by building a system from the ground up to take full advantage of modern CPU architectures, using column stores, dictionary compression, cache awareness, SIMD and MPP, we can process data thousands of times faster than traditional systems.

This means, in turn, that we can dispense with pre-aggregation and indexing, radically simplifying system design. The kind of radical simplification we can achieve is described in a blog by Hasso, where he outlines how 95% plus of the footprint (that is, cost and complexity) of a major application can be removed.
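As a small illustration of what dispensing with pre-aggregation means for a design, compare the two approaches below. This is a toy sketch in Python with made-up account numbers and amounts. It is not taken from any SAP application or schema.

```python
from collections import defaultdict

# Hypothetical line items: (account, amount).
line_items = [("4000", 120.0), ("4000", 80.0), ("5000", 45.5), ("4000", -20.0)]

def totals_on_the_fly(items):
    """Compute totals from the line items whenever they are needed."""
    totals = defaultdict(float)
    for account, amount in items:
        totals[account] += amount
    return dict(totals)

class PreAggregatedLedger:
    """Traditional design: keep a redundant totals table alongside the items."""

    def __init__(self):
        self.items = []
        self.totals = defaultdict(float)   # second copy that must stay in sync

    def post(self, account, amount):
        # Every posting has to update both structures, and any code path
        # that forgets the second update leaves the totals wrong.
        self.items.append((account, amount))
        self.totals[account] += amount

ledger = PreAggregatedLedger()
for account, amount in line_items:
    ledger.post(account, amount)
```

Both approaches give the same answer. The difference is that the on-the-fly version has only one copy of the data to store and maintain, which becomes practical once the scan itself is fast enough.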

It’s worth reading Hasso’s blog through a few times for the implications to properly sink in. The implications for simplification, cost savings and productivity are very significant. Also, as you read it, don’t just think of the specifics of SAP’s Finance app. Think of it as an example of the radical simplification that can be effected for any application, whether it belongs to SAP or is a custom development of your own.

Simulation

This major simplification and performance boost can make it possible to perform simulations that would otherwise be ruled out, either by performance or by cost. Take a look at SAP Executive Keynote Las Vegas 2013: Dr. Vishal Sikka – YouTube to see a demonstration of a manufacturing application enabled by HANA. Watch for the moment where the application shows that there will be a problem with production so that orders cannot be met, but also displays the results of four simulations it has run of four different ways in which the problem can be resolved! It presents these to the user so that s/he can choose which one to go with.

[Image: Blog The Sims MRP Diagram 01.jpg]

Then imagine the effort that would go into generating those four options manually on a traditional system, rather than having them presented immediately, as a set, and all within the same transaction. What this shows is that running complex simulations to show the impact of a transaction will become a normal part of transactional working. This is prohibitive in performance and cost terms for most systems right now, yet with a HANA-enabled system it is both natural and highly productive.

As an aside, this shows massive compute power being deployed at very low cost. The system is running on commodity Intel hardware. This is a major advantage of in-memory systems: extremely low-cost processing. Sometimes in-memory systems are perceived as being expensive, because memory is more expensive than disk on a “per Gigabyte” basis. But if we look at this from the point of view of cost of processing and not cost of storage, we see that in-memory processing enables us to perform useful processing much, much more cost-effectively than any disk-based system can. We should remember that when we buy a computer system it is not just to store data but, more often, to perform useful work. An in-memory system like HANA can potentially do hundreds or thousands of times more useful work for a comparable price, and this is clearly a bargain. Prediction: the regular use of simulations in interactive applications will become commonplace, not just because they are possible in performance terms but also because it is now very affordable to incorporate them.
