Why do Data Warehouse release cycles often exceed several months, and what can you do about it?
Time to market, the time it takes for a requirement to reach production, is notoriously long for a DW. In the previous post we talked about developing the HANA SQL DW using a Git repository and containers, describing how the Web IDE for HANA enables agile DW development. That was one part of the answer. In this post we cover deployment, or in other words, “making the DW available for productive use”.
This blog series focuses on the agile DW, where the HANA SQL DW is used to develop and deploy iteratively. Therefore, we assume a wide definition of deployment, which ties into developing and testing activities, and includes deployment to testing environments.
The road to production can be unreliable and effort intensive
What are the most common issues in deploying a DW to production, which prevent iterative deployments? Well, I didn’t do any statistical analysis, but these are the main ones I came across:
- High effort and long duration of regression testing before deployment. Because so many data flows in a DW touch other data flows, a few simple changes might require you to test all functionality to make sure nothing got broken. Those are usually manual tests, funnily dubbed “compare and stare”, and you could add “fall asleep” to that.
- Unreliable test quality, with the reason being the previous point, and not being able to cover all test cases well enough.
- High effort in creating a “build”. In making the deploy package, you must choose what to put into it, and what not. The effort is higher the more you have a “best-of-breed” solution where different tools have different repositories. How do you know which tables belong to which ETL flows?
- Unreliable productive deployments. I’m not talking about bugs, just whether the deployment succeeds at all. What if the source table structures you assumed in your development/QA system turn out to differ from those in production? Will deployment succeed? Will you fix and retry? The risk here is more effort, longer downtime, or rescheduling.
Now you can guess the consequence of the high effort involved in preparing a productive deployment, which might still fail and require the DW team to cancel their plans for the weekend. You don’t want to do this too often! This is one big reason why release cycles often exceed three or four months, and that in turn is why the time to market of new requirements is so long. This is where DW teams are between a rock and a hard place: the choice is high frequency and unreliable deploys, or low frequency and somewhat unreliable deploys.
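The structure mismatch described in the last bullet above is the kind of failure you can detect before a deployment starts. Here is a minimal sketch of such a check. It uses Python's built-in sqlite3 module as a stand-in for the production database, and the table and column names are hypothetical; against HANA you would read the column list from the system catalog instead of the SQLite-specific PRAGMA used here.

```python
import sqlite3

def column_names(conn, table):
    """Return the column names of a table, in definition order."""
    # PRAGMA table_info is SQLite-specific; row[1] holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def schema_drift(expected, actual):
    """Return (columns missing from the actual table, columns not expected)."""
    missing = [c for c in expected if c not in actual]
    unexpected = [c for c in actual if c not in expected]
    return missing, unexpected

# Stand-in for a production source table (hypothetical names).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (id INTEGER, name TEXT, country_code TEXT)")

# Columns that the development/QA system assumed.
expected_columns = ["id", "name", "country"]

missing, unexpected = schema_drift(expected_columns, column_names(prod, "customers"))
print("missing:", missing, "unexpected:", unexpected)
```

Run as the first step of a deployment, a non-empty result would stop the pipeline before anything is changed in production.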
Sit back and enjoy your iterative deployment
So how do you make high-quality, high-frequency productive deployments happen? This is what you can do:
- Automate regression testing. While developing new functionality, this should include writing new tests. Then you can run the whole collection of tests every night, or whenever a developer finishes a part of his work.
- Test a full deploy often, to detect issues early. Did you know you can test deploys to a production system, without changing or impacting what runs there productively?
- Organize development in “branches”, and run automated tests for each branch, starting with the feature branches. The outcome of these tests tells you if a feature is ready to go to the next stage. A small example of this is illustrated in Figure 1.
You will need to invest in setting up the automation tooling for this, and you will need to spend incremental effort in writing tests. But you can run these then as often as you want, without manual intervention, and that’s where the investment pays back.
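To give an idea of what replacing “compare and stare” with an automated regression test can look like, here is a minimal sketch. It is not the HANA SQL DW test tooling: it uses Python's built-in sqlite3 module as a stand-in for a test container, and the table, view, and check names are all hypothetical. Each check is a query plus the rows it is expected to return.

```python
import sqlite3

def run_regression_checks(conn, checks):
    """Run each (name, sql, expected_rows) check and return the names that failed."""
    failures = []
    for name, sql, expected_rows in checks:
        actual_rows = conn.execute(sql).fetchall()
        if actual_rows != expected_rows:
            failures.append(name)
    return failures

# In-memory stand-in for a DW test container (a real setup would connect to HANA).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('EMEA', 100), ('EMEA', 50), ('APJ', 70);
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# Each check pairs a query with the exact rows it should return.
checks = [
    ("totals per region",
     "SELECT region, total FROM sales_by_region ORDER BY region",
     [("APJ", 70), ("EMEA", 150)]),
    ("no negative amounts",
     "SELECT COUNT(*) FROM sales WHERE amount < 0",
     [(0,)]),
]

failed = run_regression_checks(conn, checks)
print("failed checks:", failed)
```

A nightly job or a Git-triggered pipeline could run a collection like this and fail the build whenever the list of failed checks is not empty.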
Figure 1: branch strategy example (without release branch)
Automation requires a toolset that allows for it. The HANA SQL DW leverages the Web IDE, which certainly does:
- Git is used as its repository. As nearly all 3rd party automation software supports Git, you can easily hook it up to automate your test and deployment pipeline, triggered by selected Git events such as a code push for a feature.
- Web IDE has design time artifacts for ALL runtime artifacts needed to build a DW, including data definitions, and they all go into Git. No messy automation scenarios where you still have to take care of data definitions manually.
- The XSA platform on which Web IDE runs provides additional tooling for automating the build process, which again can be steered by any 3rd party automation tool.
So, the HANA SQL DW provides you with integrated tooling to build and automate a DW, assuming you run a 3rd party automation tool. Many companies run these already, and often they already tie in to the Git repository. You might need to integrate with 3rd party DW tooling as well. If these tools support the Git repository, and they can be steered by pipeline tooling, this should be fine. If they don’t, you might want to look for other tooling, as having multiple different repositories, each with its own way of working, makes automation and life cycle management a lot more complex.
Figure 2: simplified example of test and deploy automation
The above illustration shows an example of the deployment steps you can automate using the HANA SQL DW and a 3rd party automation tool. You can see that any change, be it a change by a developer in his own isolated code, or adding a developed feature to a central branch, kicks off a test deployment.
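The idea that every change kicks off a test deployment, and that only central branches move on to later stages, can be sketched as a small mapping from branch to pipeline steps. This is an illustration only: the branch names and step names below are hypothetical and are not taken from the First Guidance document or from any particular automation tool.

```python
def pipeline_steps(branch):
    """Map a pushed branch to the ordered steps a pipeline would run for it."""
    # Every branch gets a build, a test deployment, and the automated tests.
    steps = ["build", "deploy_to_test_container", "run_automated_tests"]
    if branch.startswith("feature/"):
        return steps
    if branch == "develop":
        return steps + ["deploy_to_qa"]
    if branch == "master":
        return steps + ["deploy_to_qa", "deploy_to_production"]
    raise ValueError(f"no pipeline defined for branch {branch!r}")

for branch in ["feature/new-kpi", "develop", "master"]:
    print(branch, "->", pipeline_steps(branch))
```

In a real setup this decision usually lives in the configuration of the automation tool rather than in application code, but the logic is the same: a step only runs if all steps before it succeeded.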
We offer a First Guidance document for Git branching and automation integration that is specific to the HANA SQL DW, integrated with a 3rd party automation toolset. You can easily re-use the config and code for other automation software. This document is based on a best practice blog for automation on the XSA platform. The latter blog is not specific to DW, and there are some code differences, but it offers a good read outside of the DW context, and uses another automation tool as an example that you might like.
If you want to run an agile DW and reduce time to market, then you should make sure your DW tooling allows for automation of tests and deployments. Once it does, you can develop and deploy iteratively, because your tools allow you to repeat several steps without manual involvement. The SAP HANA SQL DW allows you to do this. We didn’t go into detail on the testing topic; that’s maybe for another blog post.
For more info, visit the SAP HANA SQL Data Warehousing landing page.