In the first blog of this series, “Data Fragmentation: Individual Efficiency vs the Big Picture,” David Chaiken, the CTO of SAP Cloud Platform Big Data Services, spoke to us about the challenges data fragmentation poses to enterprises and their customers. David gave us insight into the core causes of fragmentation and the obstacles it creates both internally and externally. He discussed the most effective ways to handle the phenomenon and proposed how to resolve the limits imposed on businesses and their customers trying to get the most value from data.
David Chaiken also described the virtues of automation for improving data standardization and business analytics: manual processing of data runs the risk of increased lag time and incorrect data. In this blog, David talks to us about best practices for applying automation to big data and how it increases the value extracted.
What are four tips for automating big data?
Here are four principles that have proven successful when automating big data.
- Understand the business use cases for the system. What features are required to achieve the revenue, profitability, cost, or other important metrics? What features are not required for the automated system?
- Think about data quality from the outset. Answer questions like these: How do you know that the data is correct? Does the automated system have the right number of records flowing in? Does it have the right number of records flowing out? Are the business metrics of downstream systems within an acceptable range?
- Think about business continuity. What happens if the system stops functioning for one minute, one hour, one day, or one week? What will you do when there is a malfunction?
- Prepare for change. What happens if your upstream data sources change without notice? What happens when customer requirements change? How will you introduce metadata (especially schema) changes into an operational pipeline?
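The data-quality principle above can be sketched in code. Here is a minimal, hypothetical example of automated checks on record counts flowing in and out of a pipeline stage and on a downstream business metric; the function names, thresholds, and numbers are illustrative assumptions, not part of any SAP product.

```python
def validate_stage(records_in, records_out, max_drop_rate=0.01):
    """Flag a pipeline stage that silently drops too many records.

    Returns True when the fraction of records lost between input and
    output stays within max_drop_rate (1% by default).
    """
    if records_in == 0:
        raise ValueError("no records flowed in; the upstream source may be down")
    drop_rate = (records_in - records_out) / records_in
    return drop_rate <= max_drop_rate


def metric_in_range(value, low, high):
    """Check that a downstream business metric stays within an acceptable range."""
    return low <= value <= high


# 1,000,000 records in, 999,500 out: a 0.05% drop, within tolerance.
print(validate_stage(1_000_000, 999_500))   # True
# A daily metric that falls outside its expected band should raise an alert.
print(metric_in_range(42_000, 50_000, 90_000))  # False
```

Checks like these run on every pipeline execution, so a bad load is caught by the system rather than by a human eyeballing row counts.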
Where can automation help with data complexity?
In environments where immediate access to data is paramount, replacing manual processes with automation is essential. Every time a human being is required to manipulate data within the pipeline, the risk of delayed and erroneous data increases. Timely access to data doesn’t always confer a huge advantage, but when it does, automation is needed at every stage of the process.
Automation also enhances the productivity and creativity of the humans involved in the process. When data engineers, data scientists, and business analysts can stop dealing with the block-and-tackle of day-to-day ETL (Extract, Transform, Load) jobs, they have a lot more time to think about whether the data makes sense and how to find value in it. The less drudgery involved in the job, the more time for the kind of artistry that makes our jobs interesting, rewarding, and profitable.
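To make the idea concrete, here is a minimal sketch of an automated ETL step that replaces a manual hand-off: it extracts rows, transforms them, loads them, and balances record counts at each boundary so no one has to inspect the data by hand. The source, transform, and sink are hypothetical stand-ins, not a real SAP interface.

```python
def extract():
    # Stand-in for reading from an upstream source (file, queue, or table).
    return [{"user_id": "1", "amount": "19.99"},
            {"user_id": "2", "amount": "5.00"}]


def transform(rows):
    # Normalize types; malformed rows are counted, never silently dropped.
    clean, rejected = [], 0
    for row in rows:
        try:
            clean.append({"user_id": int(row["user_id"]),
                          "amount": float(row["amount"])})
        except (KeyError, ValueError):
            rejected += 1
    return clean, rejected


def load(rows, sink):
    # Stand-in for writing to a downstream store.
    sink.extend(rows)
    return len(rows)


def run_pipeline(sink):
    rows = extract()
    clean, rejected = transform(rows)
    loaded = load(clean, sink)
    # Automated sanity check: counts must balance across the pipeline.
    assert len(rows) == loaded + rejected
    return loaded, rejected


sink = []
loaded, rejected = run_pipeline(sink)
```

Because the count check is part of the job itself, a scheduler can run it unattended and page someone only when the invariant breaks.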
Automation neutralizes the operator error of manual processing, expedites data through the pipeline, and frees up time to derive more value from analytical initiatives. There are many factors to consider when automating big data, but the pros outweigh the cons.
The SAP Cloud Platform Big Data Services blog team would like to thank David Chaiken for taking the time to speak with us and share his views on the challenges of data fragmentation confronting today’s enterprises and the benefits of automation. If you missed the first installment of this blog series, check it out here.