What is Data Unification?

Data unification gives a software system the ability to collect, digest, de-duplicate and export millions of data points from multiple sources.

Data collection no longer presents a challenge for businesses. Departments that focus on collection, storage and extraction of data are as common as accounting and human resources.

The issue now has become what to do with that data.

Having millions of different data points on issues such as customer demographics, retail sales and marketing spend is all well and good. But leveraging massive amounts of data in a way that helps achieve strategic business goals is another matter.

The issue becomes more complicated because data is often “siloed” – in other words, stored in separate databases or in systems that limit access to a specific set of users. That prevents merging the data to mine for useful insights into past business or to build predictive models.

In these cases, data unification provides a solution.

Before understanding why data unification is needed, it’s important to understand what it can do.

Essentially, it’s an approach that gives a software system the ability to collect, digest, “de-duplicate” and export millions of data points from multiple sources. Both human programmers and machine learning play key roles.

Humans are needed to write the programs that direct software on how to collect, match and merge data. This is an involved process, sometimes taking as long as six months to accomplish. Machines then can take these directives and use them to collect and interpret data from multiple sources, merging it into a cohesive data set.
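As a rough illustration of that collect-match-merge step, here is a minimal sketch of merging records from two sources and de-duplicating them on a normalized key. All field names and data here are hypothetical, and a real system would use far more sophisticated matching.

```python
def normalize(record):
    """Lower-case and trim the email field used as the match key."""
    return record["email"].strip().lower()

def unify(*sources):
    """Merge records from several sources, keeping one record per match key."""
    unified = {}
    for source in sources:
        for record in source:
            key = normalize(record)
            # Later sources fill in fields the earlier occurrence lacked;
            # empty values are ignored so they don't overwrite real data.
            unified.setdefault(key, {}).update(
                {k: v for k, v in record.items() if v}
            )
    return list(unified.values())

# Two sources hold the same customer with slightly different formatting.
crm = [{"email": "Ann@example.com", "name": "Ann Lee", "phone": ""}]
sales = [{"email": "ann@example.com ", "name": "Ann Lee", "phone": "555-0100"}]

records = unify(crm, sales)
```

One record survives, with the phone number filled in from the second source.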

Why is Data Unification Needed?

A good way to illustrate the usefulness of data unification is an example from General Electric.

The global conglomerate has 80 different procurement systems that don’t share information, so applying data unification holds a potentially large benefit for the company.

For example, a procurement officer buying supplies cannot see beyond her own database for information on past prices. With data unification of all the company’s procurement systems, she can see prices for the latest deals struck with the supplier and then attempt to get the best price for her purchase.

It will take years for General Electric to completely unify its many databases. But ultimately, tying the information together offers huge cost benefits; the company projects savings of as much as $1 billion a year.

The numbers won’t be as large for most companies, and certainly not for small businesses. But the benefits are the same in kind.

Rules For Data Unification

For those considering data unification, here are a few rules to keep in mind.

Identify Best Use

Not technical, but perhaps the most important step. Business leaders need to determine how unification of data can further strategic goals and then follow that path. Otherwise, as with any project, the lack of a concrete objective will lead to wasted time and effort.


Automate the Process

To the extent the technology allows, automate the unification process once people have written the code that “teaches” the system which data to collect and unify.

Most databases use ETL (extract, transform, load) systems, which rely on matching terms to extract and combine data. To cover millions of records from multiple sources, however, humans would need to write code creating rules for every kind of transaction. That’s simply not workable.
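To see why hand-written matching rules don’t scale, consider a toy sketch of the kind of exact-term rules such a system relies on. The supplier names and IDs below are hypothetical.

```python
# Each rule must be written by hand, so covering every spelling variant
# across millions of records from many sources quickly becomes unworkable.
MATCH_RULES = {
    "Acme Corp": "SUPPLIER-001",
    "Acme Corporation": "SUPPLIER-001",  # every variant needs its own rule
}

def match_supplier(name):
    """Return a canonical supplier ID, or None if no rule covers the name."""
    return MATCH_RULES.get(name)

matched = match_supplier("Acme Corp")      # covered by a hand-written rule
unmatched = match_supplier("ACME Corp.")   # a spelling nobody anticipated
```

The second lookup falls through: no one wrote a rule for that spelling, and at scale someone always forgets one.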

In data unification, machine learning must play the largest role in transforming and merging data. The investment in such systems can eventually pay off and dig companies out of the data hole that many have created over the past two decades of data collection – a situation where plenty of data is available, but not in a form that maximizes its potential.


Collaborate With Subject Experts

Machine learning is powerful, but people still need to talk. Data experts writing the rules for unification must collaborate with subject-matter experts before completing the code. In the General Electric example above, a code writer may not know whether suppliers with similar names are one supplier or two, but a procurement officer will know right away. The project cannot simply be handed off to data scientists; collaboration with experts in specific areas of the company is essential.
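As an illustration of that supplier-name problem, here is a sketch using Python’s standard `difflib` to score name similarity; the names are hypothetical. The score can flag a borderline pair, but only a domain expert can settle whether it is one supplier or two.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two supplier names, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two names that software alone cannot confidently merge or keep separate.
score = similarity("General Supply Co.", "General Supply Company")

# A high-but-imperfect score only says "probably the same"; the final call
# on borderline pairs is routed to a procurement officer, not made in code.
needs_review = 0.8 < score < 1.0
```

In practice, pairs above a high threshold might be merged automatically, pairs below a low threshold kept apart, and the middle band queued for human review.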

Machine Learning Advantage

Traditional ETL systems typically extract and store data from single sources, such as retail sales or marketing spend. A schema is created beforehand, and programs are written to map each data source to the schema. Data is then collected and cleaned (duplicate records eliminated, for example).
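The schema-first workflow described above can be sketched as follows. The target schema, field names, and mappings here are all hypothetical: the point is that the schema is fixed up front and each source needs its own hand-written mapping into it.

```python
# The target schema is decided before any data arrives.
TARGET_SCHEMA = ("date", "region", "amount")

def map_retail_sales(row):
    """Map one retail-sales row into the predefined target schema."""
    return {"date": row["sale_date"], "region": row["store_region"],
            "amount": row["total"]}

def map_marketing_spend(row):
    """Marketing spend needs its own mapping: one mapping per source."""
    return {"date": row["month"], "region": row["market"],
            "amount": row["budget"]}

loaded = [
    map_retail_sales({"sale_date": "2024-01-05", "store_region": "EU",
                      "total": 120.0}),
    map_marketing_spend({"month": "2024-01", "market": "EU",
                         "budget": 300.0}),
]
```

This works for a handful of known sources, but every new source or changed field requires another hand-written mapping, which is why the schema-first method breaks down at unification scale.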

This schema-first method will not work in data unification. The amount of data is too vast, and things become more complex when combining internal data with external data.

While humans are needed to set up the system and augment it when necessary, machine learning is required to handle the large datasets from multiple sources. Fortunately, the technology has evolved to the point where machines can do the job with remarkable speed and precision.

More Examples

Despite the cost and time commitment, many companies besides General Electric have made the move to data unification.

Toyota Motors Europe, for example, is connecting customer data from hundreds of distributors across Europe. The company had separate databases for more than 10 million customers, but now has a unified database across the continent that allows for better service as customers move through the customer journey. The ultimate goal for Toyota is customer satisfaction and loyalty, leading to further sales.

Information company Thomson Reuters also used machine learning to merge data from hundreds of different business acquisitions and the merger of Thomson and Reuters, which were separate companies until 2008. Machine learning sped up the process by months and reduced manual effort by 40%.
