The SAP blog team recently sat down with David Chaiken, the CTO of SAP Cloud Platform Big Data Services (BDS), to talk about the challenges that data fragmentation poses to enterprise customers. David was the CTO of Altiscale, a fully managed Big Data-as-a-Service provider that SAP acquired in September 2016. David previously served as the Chief Architect of Yahoo!, where he worked on several projects related to Hadoop and big data.
Data fragmentation is a critical challenge facing companies today. It occurs when conceptually related data sets originate in different places in a company. The general problem with data fragmentation is that it limits the ability of a company to benefit from business intelligence across the enterprise. This issue is particularly complicated in the case of large, unstructured data sets. We went to David Chaiken to get his viewpoint on the fragmentation-related challenges that enterprises are struggling to overcome, and on what can be done to improve an enterprise’s ability to discover value in the flood of data generated by the connected world.
This is the first post in a two-part series. The second part will focus on using data automation to overcome fragmentation challenges.
What are the biggest challenges around data fragmentation and how can businesses resolve them?
The biggest challenge with data fragmentation is that a business’s data naturally gets siloed when employees are trying to get their jobs done efficiently, because it’s almost always faster to avoid dependencies on other groups. By building an application-specific (or business-unit-specific) data store with its own master data, schemas, metrics, and performance guarantees, individual employees can control the way that their data is processed. They can get their work done much more quickly than if they needed to work across business organizations.
The problem is that the road to Hell is paved with good intentions. Although individual employees and business units find themselves working faster, it’s much harder to leverage the data across the corporation. For example, if the CEO is considering investment proposals from two different business units, the proposals may use different mathematical formulas for computing profitability, revenue, and other important metrics. This kind of issue makes it very hard to do apples-to-apples comparisons when making corporate-level decisions, and can be very frustrating to top management.
Similarly frustrating, from the customer’s point of view, is when they get lost in the shuffle of different business units. For companies with multiple properties or business units, data silos can make it very hard to get a 360-degree profile of each customer. Imagine a banking support call that ends with a customer saying something like, “Yes, I know that I only have a $2,500 limit on this credit card, but I also have a $350,000 investment account! Do you really want to lose $350,000 of business due to our disagreement over a $10 service charge?”
What are the best ways to handle fragmented data?
Understanding and implementing the business cases for dealing with fragmented data is key. All too often, the IT department starts a grand unification data-lake project that will take years to deliver business value (if it ever delivers value). We have found that it is much more effective to deliver business value as each organization does the work to adopt corporate-wide data infrastructure and standards. Look for specific, important business metrics that can be improved by unifying data from different sources. Will an ecommerce portal get higher revenue if customers see better offers? Can manufacturing bottlenecks be eliminated by providing product demand information to raw materials vendors? Can you find a new source of upsell opportunities for your sales force? Pick a use case, get it done, demonstrate success, and iterate.
How can metadata help with data fragmentation and automation?
Fragmented data can be used holistically, but only to the extent that it can be aggregated and joined across different sources. Mathematically speaking, that’s only possible to the extent that the metadata (schema, lineage, and metrics) of the data is standardized across different sources. For example, let’s say that a company has one division that sells vitamins and another division that sells painkillers. Simply consolidating the revenue from these two divisions might be very misleading if the profitability of vitamins is really low, while the profitability of painkillers is really high. Getting the metrics right is important for understanding the two businesses and how they might relate to each other. Creating metadata that results in a consistent business view is hard work, because it involves a deep understanding of both the data and the business, plus the iron will required to get different organizations to conform to unifying standards.
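To make the vitamins and painkillers example concrete, here is a minimal Python sketch. The revenue and cost figures, the `DivisionResult` class, and the `consolidate` helper are all hypothetical and chosen only for illustration; the one assumption that matters is that both divisions compute profit the same way (revenue minus cost).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DivisionResult:
    """Hypothetical per-division figures, using one shared metric definition."""
    name: str
    revenue: float
    cost: float

    @property
    def profit(self) -> float:
        # Shared definition: profit is revenue minus cost.
        return self.revenue - self.cost

    @property
    def margin(self) -> float:
        # Shared definition: margin is profit as a fraction of revenue.
        return self.profit / self.revenue


def consolidate(divisions):
    """Roll up divisions into corporate totals using the same definitions."""
    total_revenue = sum(d.revenue for d in divisions)
    total_profit = sum(d.profit for d in divisions)
    return {
        "revenue": total_revenue,
        "profit": total_profit,
        "margin": total_profit / total_revenue,
    }


# Made-up numbers: equal revenue, very different profitability.
vitamins = DivisionResult("vitamins", revenue=1_000_000.0, cost=950_000.0)
painkillers = DivisionResult("painkillers", revenue=1_000_000.0, cost=400_000.0)
summary = consolidate([vitamins, painkillers])
```

With these made-up figures, the two divisions look identical by revenue, yet vitamins earn a 5% margin and painkillers a 60% margin, and the blended 32.5% describes neither business. The roll-up is only meaningful because both divisions share one definition of profit.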
Manual processes are, of course, the eternal enemy of data standardization and effective business analytics. Every time a human being needs to touch data to move it from one place to another, modify the data, or join different data sources, the risk of delayed and incorrect data increases. Automated data movement, extraction, transformation, joins, and distribution are critical to getting value out of the data. We’ve worked with sales organizations in the online advertising space that live or die by their daily reports — not to mention advertising account managers who watch hourly reports like hawks. It’s not always the case that timely access to data confers huge advantages, but every step of the data pipeline needs to be automated when it does. Metadata is an important part of the specification of the automated systems: without the metadata, it’s hard (if not impossible) for the outputs of the automated system to be correct.
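One way to picture metadata acting as part of the specification of an automated system is a check that runs before any record enters a pipeline step. The sketch below is an assumption-laden illustration, not a description of any SAP product: the schema, the field names, and the `validate` function are hypothetical.

```python
# Hypothetical shared metadata: field name -> (expected Python type, expected unit).
# A unit of None means the field carries no unit.
REVENUE_SCHEMA = {
    "customer_id": (str, None),
    "revenue": (float, "USD"),
    "report_date": (str, None),
}


def validate(record, units, schema=REVENUE_SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, (expected_type, expected_unit) in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        if not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
        if expected_unit is not None and units.get(field) != expected_unit:
            problems.append(f"wrong unit for field: {field}")
    return problems


# A record that conforms to the shared schema.
good = {"customer_id": "C-1", "revenue": 1200.0, "report_date": "2017-01-31"}
good_problems = validate(good, units={"revenue": "USD"})

# A record from a silo that stores revenue as text, in a different currency.
bad = {"customer_id": "C-2", "revenue": "1200", "report_date": "2017-01-31"}
bad_problems = validate(bad, units={"revenue": "EUR"})
```

Here the conforming record yields no problems, while the siloed record is flagged twice, once for its type and once for its unit. An automated pipeline could use a check like this to reject or quarantine nonconforming data rather than silently joining it.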
Learn more about how automation can help address the challenge of data fragmentation in the next blog interview with David, “Big Data Automation: Bringing the Pieces Together.”