For all the good things customers expect from big data, they are also worried about the potential chaos. The value of big data needs to be delivered without chaos – and with the absolute minimum of disruption. That is a core engineering principle my colleagues and I at SAP hold close to our hearts when we make design decisions. Chaos can come from all the usual suspects – data quality, project management failures, hype, and so on. Those problems are not new, and because they are not specific to big data, customers and SAP already know how to deal with them. Big data might amplify them, but at least the existing solutions can be extended to cover the amplified problems.
There is another type of chaos the big data world needs to deal with – and it is one of those things that is genuinely both a blessing and a curse: the speed of innovation on this topic at SAP and other commercial vendors, as well as in the open source community. Customers therefore need to be very careful about which technologies they bet on. Even if technology decisions can be reversed somehow, it is not always easy to change direction drastically midway through a big data initiative.
One way to negate the impact of chaos is to start with applications rather than focus on a platform. Applications shield customers from technology changes that happen behind the scenes. This is especially true of big data apps deployed in a cloud model (loosely defined to include all kinds of cloud, including hosting). If the apps are on premises, there might still be a need to apply patches and so on. But in general, apps are a good way to get value with minimum chaos.
Apps by themselves have some limitations – their generally “one size fits all” nature might not fit your specific needs. And if you need to extend an app, you need a platform you can safely bet on to remain stable, yet one that lets you take full advantage of all the innovation happening in the world of big data. That is essentially what we are trying to address with the SAP HANA Data Platform.
I am not going to say anything more about real time – that is a given with HANA. And I expect everyone else to soon treat real time as a minimum requirement (or at least to come as close to real time as their technology lets them). SAP saw the death of batch a few years ago, and now it is clear others have taken notice too. This is good for customers – and it is good for SAP. Competition is a good driver of rapid innovation.
If you develop an application on the SAP HANA Data Platform, you can of course make use of all the innovations readily available in HANA – the MPP architecture, the assorted libraries for predictive analysis and business functions, the planning engine, and so on. But that is not all – the platform also lets you use the data and processing of other SAP systems like Sybase IQ, other DB vendors like Teradata, and open source projects like Hadoop. The developer does not need to know the inner workings of any of those systems – the developer can see the data in other systems as a table in HANA. Such integrations will keep getting more sophisticated with each HANA SP.
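To make the “data in other systems looks like a table in HANA” idea concrete, here is a minimal sketch of how a developer might generate the federation DDL. It assumes HANA’s Smart Data Access feature, where a remote table is exposed as a local virtual table; the adapter, source, and schema names below are purely illustrative assumptions, and the exact DDL keywords should be checked against the HANA documentation for your SP level.

```python
# Sketch: federation from the developer's point of view. A remote table
# (e.g. in Hadoop/Hive or Teradata) is exposed as a HANA "virtual table",
# after which ordinary SQL queries it like any local table.
# All identifiers here (HIVE_SRC, WEBLOGS, BIGDATA) are hypothetical.

def virtual_table_ddl(source: str, remote_db: str, remote_schema: str,
                      remote_table: str, local_schema: str) -> str:
    """Build a CREATE VIRTUAL TABLE statement for a remote table."""
    return (
        f'CREATE VIRTUAL TABLE "{local_schema}"."V_{remote_table}" '
        f'AT "{source}"."{remote_db}"."{remote_schema}"."{remote_table}"'
    )

ddl = virtual_table_ddl("HIVE_SRC", "HIVE", "default", "WEBLOGS", "BIGDATA")

# Once the virtual table exists, the application queries it with plain SQL,
# with no knowledge of the remote system's inner workings:
query = 'SELECT COUNT(*) FROM "BIGDATA"."V_WEBLOGS"'
```

The point of the sketch is the shape of the abstraction: the remote system appears only in the one-time DDL, so swapping the federated system later touches the DDL, not the application queries.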
This is not just a one-time development advantage – in the long term, it provides immense flexibility for customers. For example:
1. You can make use of the continuous innovations in the federated systems, in most cases without impacting the HANA application
2. You can swap a federated system for something else as technology progresses, without impacting the application too much
3. You get a consistent security model across all the data
and so on.
This is not new – the same philosophy is what led BW, HANA Live, and data marts to share data without physically moving it around.
The vast majority of big data vendors today are focused on analytical applications. This is true of open source initiatives like Hadoop too. However, real business applications need full transactional abilities to go with deep analytical capabilities. The SAP HANA Data Platform lets you do that effectively, given that it was designed from the ground up for mixed workloads. Building mixed-workload applications natively on HANA, while using data residing both within and outside HANA, is far more efficient in both value and TCO terms than trying to tie the pieces together in an additional application layer.
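A small, runnable illustration of what “mixed workload” means in practice: transactional writes and an analytical aggregate against the same store, with no extract/load hop in between. SQLite in memory stands in here purely for demonstration – on HANA the same pattern runs as SQL against the in-memory column store – and the table and values are invented for the example.

```python
import sqlite3

# One store serves both sides of the workload.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# Transactional side: individual order inserts (OLTP-style), committed atomically.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("EMEA", 120.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("APJ", 80.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("EMEA", 40.0))

# Analytical side: aggregate over the live transactional data – no ETL step.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
# rows == [('APJ', 80.0), ('EMEA', 160.0)]
```

The efficiency argument in the paragraph above is visible even in this toy: the analytical query sees every committed transaction immediately, because there is no second system to synchronize.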
If for some good reason you do choose to build an app in a non-native fashion, you can of course use JDBC/ODBC and the like. Our intention is to give developers as much flexibility as possible while safeguarding the stability of a platform that runs mission-critical apps.
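For the non-native route, a sketch of what the ODBC path might look like from, say, a Python app. The driver name and connection-string keywords below are typical of HANA ODBC setups but should be treated as assumptions to verify against your installed driver; host, port, and credentials are placeholders.

```python
def hana_odbc_conn_str(host: str, port: int, user: str, password: str) -> str:
    # "HDBODBC" is the usual HANA ODBC driver name; the keyword set here
    # is an example and may differ by driver version (an assumption).
    return (
        f"DRIVER={{HDBODBC}};SERVERNODE={host}:{port};"
        f"UID={user};PWD={password}"
    )

conn_str = hana_odbc_conn_str("hanahost", 30015, "APPUSER", "secret")

# With an ODBC bridge such as pyodbc installed, a non-native app would then
# connect and query through the standard DB-API, e.g.:
#   import pyodbc
#   conn = pyodbc.connect(conn_str)
#   cursor = conn.execute("SELECT * FROM SALES")
```

The actual connect call is left commented out since it needs a live HANA system; the point is that a non-native app talks to the platform through standard database interfaces rather than anything proprietary.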
SAP TechEd Las Vegas is just a few weeks away. We will have plenty of experts from SAP there to explain the inner workings of the platform and the big data applications. Until then, we can continue the discussion here via comments, or over email.