Big Data: The Method Behind the Matching

Behind the scenes, dating apps are all doing essentially the same thing – using big data to match people with potential lifelong partners.

Millions of single Americans are pouring their hearts out onto the internet in search of their soulmate.

For many, the traditional dating scene consists of awkward encounters, broken promises and lonely nights, which is why the concept of online dating was created – to make it easier to find the perfect mate.

Revenue for the online dating industry has now surpassed $3 billion, according to market research firm IBISWorld, a billion-dollar increase from its 2014 analysis. That’s because one in 10 Americans has used an online dating site or a mobile dating app.

The major players include,, and eHarmony – all of which promise long-lasting relationships.

Then there are niche sites like (for Christians seeking other single people), (for single people over 50 looking for a serious relationship), and (for Jewish singles) – all of these offer a unique value proposition to a specific group of consumers.

But the leader among mobile dating apps is Tinder, and no other app comes close to its market share. Hinge, Zoosk, OkCupid, Bumble, Happn, JSwipe and The League all compete with Tinder, but the Los Angeles-based giant boasts 50 million users who make close to 1 billion swipes and 12 million matches per day, according to a study from UC Berkeley.

While each of these sites or apps offers different features and value propositions to different audiences, behind the scenes they’re all doing essentially the same thing – using big data to match people with potential lifelong partners.

Generating, Collecting and Analyzing Big Data

The online dating ecosystem is generating massive amounts of data every day., for example, has estimated that it has harvested more than 70,000 gigabytes of customer data.

Each site or app differs slightly in how it gathers data from consumers. Most sites gather the bulk of their information using a questionnaire, which usually asks about a user’s likes, dislikes, interests or hobbies. The number of questions asked can depend on what service the user has selected, but some sites ask as many as 400 in the hope that volume will yield better results.

The main problem that arises with questionnaires is – people lie. They try to present themselves in what they believe to be a better light by providing information that isn’t always completely accurate. According to an article from the BBC, men typically lie about their age, height and income, while women tend to lie about their age, weight and body build.

In some cases, people may provide inaccurate information unintentionally. For example, a user may believe that they watch action movies most of the time, but an analysis of their Netflix history might provide a more accurate picture of their movie-watching habits.

In some rare cases, it’s possible to manipulate the site not only through lies, but also through coding and a bit of natural language processing, as was the case with data scientist Chris McKinlay.

Whether deception is intentional or unintentional, inaccurate information is a problem for sites because it often leads to incorrect matches. To solve this, dating agencies are exploring other ways to supplement user-provided data with information gathered from other sources.

With the user’s permission, dating services are beginning to access large amounts of data from sources including a user’s browsing history, TV streaming habits, and even online shopping history from websites like Amazon.

The process, known as collaborative filtering, looks at this data, but rather than using it to supplement the information gathered from questionnaires, it uses it to recommend partners the same way websites like Amazon or Netflix suggest products or movies: based on what customers with similar preferences also liked.

Gathering data about consumers in this way reduces the human error that is a major weakness of online dating sites.
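
The collaborative filtering idea described above can be illustrated with a minimal sketch. It assumes each user has a set of profiles they have liked and measures similarity between users by how much those sets overlap (Jaccard similarity, one of several possible measures). The user names, profile IDs and the `recommend` function are hypothetical; no dating service's actual algorithm is shown here.

```python
from collections import Counter

# Hypothetical data: which profiles each user has liked.
likes = {
    "ana": {"p1", "p2", "p3"},
    "ben": {"p1", "p2", "p4"},
    "cat": {"p5"},
}

def jaccard(a, b):
    """Overlap between two sets of liked profiles, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, likes, top_n=3):
    """Suggest profiles liked by similar users that `user` has not liked yet."""
    scores = Counter()
    for other, other_likes in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], other_likes)
        if similarity == 0.0:
            continue  # users with nothing in common contribute nothing
        for profile in other_likes - likes[user]:
            scores[profile] += similarity
    return [profile for profile, _ in scores.most_common(top_n)]

print(recommend("ana", likes))
```

Here "ana" and "ben" share two liked profiles, so the one profile that only "ben" has liked is suggested to "ana", much as a shopping site suggests what similar customers also bought.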

Big Data and Online Dating in Action

After consumer data has been gathered and analyzed, it is typically stored in some sort of relational database management system (RDBMS) and/or NoSQL database, mechanisms used for the storage and retrieval of data. It is then processed by a variety of algorithms that predict the best match.

Each dating site or app has created its own set of algorithms to match users with potential partners. Let’s take a closer look at some of the techniques used by three of the biggest players in the industry:

Match first asks users to fill out a questionnaire consisting of 15 to 100 questions. Points are allocated to them based on pre-defined parameters such as religion, income, education, age, hair color, etc. Users are then matched with people on the site who have a similar number of points.
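
As a rough sketch of this points-based approach, the example below totals points from questionnaire answers and pairs users whose totals are close. The point values, the `max_gap` threshold and the function names are assumptions made for illustration; Match’s real parameters are not described in this article.

```python
# Hypothetical point values per questionnaire answer.
POINTS = {
    "education": {"high school": 1, "college": 2, "graduate": 3},
    "religion": {"none": 0, "casual": 1, "devout": 2},
}

def score(answers):
    """Total the points for one user's questionnaire answers."""
    return sum(POINTS[question][answer] for question, answer in answers.items())

def closest_matches(user, profiles, max_gap=1):
    """Return other users whose point totals are within `max_gap` of `user`'s."""
    target = score(profiles[user])
    return sorted(
        name
        for name, answers in profiles.items()
        if name != user and abs(score(answers) - target) <= max_gap
    )

profiles = {
    "ana": {"education": "graduate", "religion": "casual"},   # 4 points
    "ben": {"education": "college", "religion": "devout"},    # 4 points
    "cat": {"education": "high school", "religion": "none"},  # 1 point
}

print(closest_matches("ana", profiles))
```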

Match then uses advanced analytics to identify any discrepancies between what people actually do on the site and the information they provided in their questionnaire. If discrepancies are found, the matchmaking algorithms adjust the results for compatible matches based on the user’s behavior.
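
One simple way to picture this kind of adjustment: compare a stated preference with observed behavior and widen the preference when most of the behavior falls outside it. The sketch below does this for an age range. The function name, the 50% threshold and the choice of age as the attribute are all assumptions for illustration, not a description of Match’s analytics.

```python
def adjust_age_range(stated, contacted_ages, threshold=0.5):
    """Widen a stated (min, max) age range if most contacted profiles fall outside it."""
    if not contacted_ages:
        return stated  # no behavior to compare against
    low, high = stated
    outside = [age for age in contacted_ages if not low <= age <= high]
    if len(outside) / len(contacted_ages) <= threshold:
        return stated  # behavior is consistent with the questionnaire
    # Behavior contradicts the questionnaire: cover the ages actually contacted.
    return (min(low, min(contacted_ages)), max(high, max(contacted_ages)))

# The user said 25 to 30 but mostly contacts people in their thirties.
print(adjust_age_range((25, 30), [31, 33, 35, 28]))
```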

Going even further to remove any risk associated with determining the accuracy of online dating data, Match has started using facial recognition technology to pinpoint the specific category of matches that users prefer and then highlight the features that users tend to be more attracted to.

Data scientists at Match believe that even if people are not so specific about a person’s height, hair color, weight or race, they have a type of facial shape in mind when it comes to their ideal partner. Using facial feature analysis, Match aims to find a person’s “type,” so they can be paired up with people that fit into that specific category.

With over 21 million members on the site, Match boasts that of couples who have met on a dating site, 30% have met on Match, and 42% of their matches result in dates while 35% result in relationships that last three months or longer.


The eHarmony system matches men and women based on “29 dimensions of compatibility.” Each dataset on the site is made up of 4,000 gigabytes of data, not including photos the user has uploaded. This includes the answers to a questionnaire, which can include as many as 400 questions, and the behavior of users on the site such as how many pictures they upload, how often they log in and what kind of profiles they frequently visit.

The system is built on MongoDB, an open-source, cross-platform database, which allows matches to be made in less than 12 hours. Multiple algorithms provide machine learning power capable of processing over one billion matches each day.

With more than 15.5 million members, eHarmony claims its matching technique is responsible for 5% of all marriages in the U.S. Additionally, only 3.86% of eHarmony marriages result in divorce, which is the lowest percentage of all dating sites.


Mobile dating apps like Tinder tend to be less involved than more traditional online dating sites. Tinder is known as a casual dating app that requires users to make split-second decisions about whether they like a potential match.

Users on the mobile app look at vague previews of another user’s profile, and can swipe right if they believe them to be a match. If the potential match also swipes right, then a match is made and both users are notified.
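
The mutual-swipe rule described above is easy to sketch: record each right swipe and check whether the other person has already swiped right in return. The `SwipeTracker` class and its method names are hypothetical, and a real app would persist this data rather than keep it in memory.

```python
class SwipeTracker:
    """Tracks right swipes and detects when two users have liked each other."""

    def __init__(self):
        self.right_swipes = set()  # (swiper, target) pairs

    def swipe_right(self, swiper, target):
        """Record a right swipe; return True if it completes a mutual match."""
        self.right_swipes.add((swiper, target))
        return (target, swiper) in self.right_swipes

tracker = SwipeTracker()
first = tracker.swipe_right("ana", "ben")   # no match yet: ben has not swiped
second = tracker.swipe_right("ben", "ana")  # both have swiped right: a match
print(first, second)
```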

Currently, Tinder is using software from a startup called Interana, which was released in 2014. Essentially, the software allows Tinder employees to run queries over all kinds of data while decreasing query time from hours to less than a second. Before adopting Interana, Tinder had been using legacy analytics software, which was completely overwhelmed by the company’s meteoric growth.

Today, Interana is everywhere inside Tinder, troubleshooting network connectivity issues and measuring the effectiveness of social media partnerships.

Recently, Tinder experienced a problem with too many of its users swiping right to maximize their chances of finding a match. Dan Gould, vice president of technology at Tinder, explained that doing this “decreases the value of the swipe.”

To correct this problem, Tinder set a limit on the number of right swipes a user could make in a day. The company then carefully watched the users who were swiping right the most to see whether they would get upset and leave, stop using the app for a while, or simply adapt to the new rule. Ultimately, these users corrected their swiping habits and continued using the app.
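
A daily cap like this can be sketched as a per-user counter that resets each day. The `DailySwipeLimiter` class and the limit of 100 are assumptions for illustration; the article does not state the actual limit Tinder chose or how it was implemented.

```python
from collections import defaultdict
from datetime import date

class DailySwipeLimiter:
    """Allows each user a fixed number of right swipes per calendar day."""

    def __init__(self, daily_limit=100):  # hypothetical limit
        self.daily_limit = daily_limit
        self.counts = defaultdict(int)  # (user, day) -> right swipes used

    def allow_right_swipe(self, user, today=None):
        """Return True and count the swipe if the user is under today's limit."""
        today = today or date.today()
        key = (user, today)
        if self.counts[key] >= self.daily_limit:
            return False
        self.counts[key] += 1
        return True

limiter = DailySwipeLimiter(daily_limit=2)
day = date(2024, 1, 1)
print([limiter.allow_right_swipe("ana", day) for _ in range(3)])
```

Because the counter is keyed by both user and date, a new day starts from zero without any explicit reset step.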

