Bias is an inevitable, inherent part of analysis. From framing analysis through misleading questions to stopping analysis when the desired results are reached, there are many ways to skew how you look at data.
With the growth of Big Data, we have more to analyze. We’re using these analyses to make important decisions in business and other industries, and because Big Data will drive more critical decisions in the future, it is important to reduce bias wherever we can. Here are six best practices for reducing data bias:
Ask the Right Questions
Starting with the right questions is essential in reducing bias. Don’t just ask the questions you are expecting positive answers to – that’s confirmation bias. Instead, consider what questions will impact the bottom line. As Bernard Marr says in his 2015 Forbes article, “A lot of data can generate lots of answers to things that don’t really matter; instead companies should be focusing on the big unanswered questions in their business and tackling them with data.”
Your questions should be specific and actionable, according to the Harvard Business Review.
Make Sure It’s Good, Clean Data
Sometimes called the “janitorial work” of big data, cleaning up and wrangling the data can be a huge part of the job. Making sure you have the right data, cleaned up correctly, before starting your analysis has always been important. This concept is not new. Charles Babbage, often described as a father of the computer, said:
“…I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.”
Once you’ve confirmed that you are working with solid data, there are many ways to clean it up. One approach to dirty data is to think about it categorically: if you consider how the business will use the data at every step of the development process, the data will be structured in a way that’s easier to analyze. Another approach is data unification, in which a software system collects, digests, de-duplicates and exports data points from multiple sources.
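As a rough illustration of the unification idea, the sketch below merges records from two sources, de-duplicates them and exports the result. It is a minimal example under stated assumptions, not a description of any particular product: the function names (normalize, unify, export_csv), the field names and the sample records are all hypothetical.

```python
import csv
import io


def normalize(record):
    """Build a comparison key: trim whitespace and ignore letter case."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def unify(*sources):
    """Merge records from several sources, keeping the first copy of each."""
    seen = set()
    unified = []
    for source in sources:
        for record in source:
            key = normalize(record)
            if key not in seen:
                seen.add(key)
                unified.append(record)
    return unified


def export_csv(records):
    """Write the unified records out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "email"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample data from two systems that describe the same customers.
crm = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
billing = [
    {"name": "ada lovelace ", "email": "ADA@example.com"},  # same person, formatted differently
    {"name": "Grace Hopper", "email": "grace@example.com"},
]

customers = unify(crm, billing)
print(len(customers))  # 3 unique customers out of 4 raw records
print(export_csv(customers))
```

Note that the rule inside normalize is itself an analytical decision. A stricter or looser definition of “the same record” would change the customer count, so it is worth documenting the rule alongside the results.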
However you choose to clean up your data, be aware that your methods can affect the results. Make sure outside agendas don’t impact your steps toward clean data, especially when dealing with unstructured data.
Review and Audit Regularly
To keep your analysis objective, it is essential to have your efforts reviewed both internally and by external, independent auditors. In its 2017 report, Gartner suggests using independent auditors to examine whether you have bias. It also recommends establishing a board of standards for analytics and models, and regularly reviewing your analytical processes to determine whether they meet those standards.
Unify Frames of Reference
As MIT Sloan Management Review contends in its 2015 article, sometimes it’s necessary to have a “Data Dictator”: someone who will pay strict attention to where data comes from and what it’s called. Different starting points can lead to data being shaped to fit biases, so organizations should create a shared understanding of where the numbers come from and a common vocabulary for describing different data points.
Look for Contradictions
As mentioned above, it’s important to avoid confirmation bias in analysis. Being skeptical of your results and looking for the contradictions can help reduce bias. In a post for Medium, Angela Bassa writes that data is always biased, measurements always have errors, and people always make assumptions. So, it’s important not to just trust the data or how it was collected:
“Skepticism is not a free pass to disregard data you disagree with. It’s a tool to ensure that the conclusions derived from data are reliable and do, in fact, reflect reality.”
Consider how your opponent would counter your point, what your response would be, and any assumptions you’re making in the process of your analysis.
Be Careful with Visualization
How you present the data is also an important part of analysis. Visualizing data in a misleading way, whether to support your own ideas or to obscure the main points, creates bias not for you but for whoever you are presenting the analysis to.
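One common way a chart can mislead is a bar chart whose value axis does not start at zero. This example is an illustration chosen here, not one taken from the sources above. The sketch below uses simple arithmetic to show how much taller one bar looks than another depending on where the axis starts. The function name bar_height_ratio and the figures are hypothetical.

```python
def bar_height_ratio(a, b, axis_start=0.0):
    """Return how many times taller bar b is drawn than bar a
    when the value axis begins at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (b - axis_start) / (a - axis_start)


# Hypothetical figures: a 4% increase from one year to the next.
last_year, this_year = 100.0, 104.0

honest = bar_height_ratio(last_year, this_year)              # axis starts at 0
truncated = bar_height_ratio(last_year, this_year, 98.0)     # axis starts at 98

print(f"Axis from 0:  second bar is {honest:.2f}x as tall")
print(f"Axis from 98: second bar is {truncated:.2f}x as tall")
```

With the axis starting at zero, the second bar is drawn 1.04 times as tall as the first, which matches the data. With the axis starting at 98, it is drawn three times as tall, although the underlying change is still 4%.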