But what about data on what people actually look for in those online profiles? That information isn't as hard to come by as you might think, and the best source for it, it turns out, is the online dating sites themselves.
“You cannot trust what people say. You have to watch what they do,” Amarnath Thombre, senior vice president of strategy and analytics at Match.com, told the audience during a session at last month's Predictive Analytics World conference.
Drawing from experience with millions of singles in 24 countries since its launch in 1995, the site has compiled a wealth of data about what factors tend to attract people to one another. From this information, it can predict the likelihood of similar attractions in the future.
Factors as simple as height and a desire for kids can be important in predicting whether members are likely to connect, said Thombre, estimating that 95 percent of relationships can be predicted by analyzing as few as 10 characteristics on each profile.
Some characteristics, such as smoking and politics, are polarizing, and here interesting patterns emerge. For example, Match.com’s statistics show that more people identifying themselves as Republicans in their profiles are willing to connect with people identifying themselves as Democrats than the reverse.
Thombre also offered another interesting and amusing statistic. For the most part, he said, members with accounts on Twitter, the micro-blogging platform famous for allowing messages of no more than 140 characters, have shorter relationships.
But is this data really measuring the frequency of love connections? Well, not really. In fact, the site counts any instance in which members exchange more than three emails as an “event” of significance.
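To make that definition concrete, here is a minimal Python sketch of how such an “event” might be flagged. It is an illustration only, not Match.com's actual method: the function name, the (sender, recipient) log format, and the choice to count emails in both directions together are all assumptions.

```python
from collections import Counter

def significant_pairs(messages, threshold=3):
    """Return the member pairs who exchanged more than `threshold` emails.

    `messages` is an iterable of (sender, recipient) tuples. This log
    format is hypothetical; emails in either direction count toward the
    same pair (an assumption, since the source does not specify).
    """
    counts = Counter(frozenset((s, r)) for s, r in messages if s != r)
    return {pair for pair, n in counts.items() if n > threshold}

# Made-up sample log: ann and bob trade four emails, ann and cat only two.
log = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "bob"), ("bob", "ann"),
    ("ann", "cat"), ("cat", "ann"),
]
events = significant_pairs(log)
```

With this sample log, only the ann and bob pair crosses the threshold, which shows how far an “event” is from a confirmed relationship.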
Still, with an estimated one in every five relationships now starting online, dating sites are collecting volumes of information on factors influencing attraction that have never before been available. We’ve blogged previously, for example, about Stanford University studies that used data mining of social sites to predict such things as the individuals most likely to connect to each other on Facebook and which members on other sites were likely to be “friends” or “enemies.” (See: Getting Friendly With Facebook Analytics and The Pluses & Minuses of Social Analytics.)
Ignoring for a moment the issue of privacy, on which all social sites, including Match.com, have exhaustive policies, how might this data be used by data miners in the future, and what might it help us predict?