What can we learn about ourselves from the things we ask online? US data scientist Seth Stephens‑Davidowitz analysed anonymous Google search results, uncovering disturbing truths about our desires, beliefs and prejudices, reports the Guardian:
Everybody lies. People lie about how many drinks they had on the way home. They lie about how often they go to the gym, how much those new shoes cost, whether they read that book. They call in sick when they’re not. They say they’ll be in touch when they won’t. They say it’s not about you when it is. They say they love you when they don’t. They say they’re happy while in the dumps. They say they like women when they really like men. People lie to friends. They lie to bosses. They lie to kids. They lie to parents. They lie to doctors. They lie to husbands. They lie to wives. They lie to themselves. And they damn sure lie to surveys. Here’s my brief survey for you:
Have you ever cheated in an exam?
Have you ever fantasised about killing someone?
Were you tempted to lie?
Many people underreport embarrassing behaviours and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias. An important paper in 1950 provided powerful evidence of how surveys can fall victim to such bias. Researchers collected data, from official sources, on the residents of Denver: what percentage of them voted, gave to charity, and owned a library card. They then surveyed the residents to see if the percentages would match. The results were, at the time, shocking. What the residents reported to the surveys was very different from the data the researchers had gathered. Even though nobody gave their names, people, in large numbers, exaggerated their voter registration status, voting behaviour, and charitable giving.
Has anything changed in 65 years? In the age of the internet, not owning a library card is no longer embarrassing. But, while what’s embarrassing or desirable may have changed, people’s tendency to deceive pollsters remains strong. A recent survey asked University of Maryland graduates various questions about their college experience. The answers were compared with official records. People consistently gave wrong information, in ways that made them look good. Fewer than 2% reported that they graduated with lower than a 2.5 GPA (grade point average). In reality, about 11% did. And 44% said they had donated to the university in the past year. In reality, about 28% did.
Then there’s that odd habit we sometimes have of lying to ourselves. Lying to oneself may explain why so many people say they are above average. How big is this problem? More than 40% of one company’s engineers said they are in the top 5%. More than 90% of college professors say they do above-average work. One-quarter of high school seniors think they are in the top 1% in their ability to get along with other people. If you are deluding yourself, you can’t be honest in a survey.
The more impersonal the conditions, the more honest people will be. For eliciting truthful answers, internet surveys are better than phone surveys, which are better than in-person surveys. People will admit more if they are alone than if others are in the room with them. However, on sensitive topics, every survey method will elicit substantial misreporting. People have no incentive to tell surveys the truth.
How, therefore, can we learn what our fellow humans are really thinking and doing? Big data. Certain online sources get people to admit things they would not admit anywhere else. They serve as a digital truth serum. Think of Google searches. Remember the conditions that make people more honest. Online? Check. Alone? Check. No person administering a survey? Check.
The power in Google data is that people tell the giant search engine things they might not tell anyone else. Google was invented so that people could learn about the world, not so researchers could learn about people, but it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing.
I have spent the past four years analysing anonymous Google data. The revelations have kept coming. Mental illness, human sexuality, abortion, religion, health. Not exactly small topics, and this dataset, which didn’t exist a couple of decades ago, offered surprising new perspectives on all of them. I am now convinced that Google searches are the most important dataset ever collected on the human psyche.
The Truth About Sex
How many American men are gay? This is a regular question in sexuality research. Yet it has been among the toughest questions for social scientists to answer. Psychologists no longer believe Alfred Kinsey’s famous estimate – based on surveys that oversampled prisoners and prostitutes – that 10% of American men are gay. Representative surveys now tell us about 2% to 3% are. But sexual preference has long been among the subjects upon which people have tended to lie. I think I can use big data to give a better answer to this question than we have ever had.
First, more on that survey data. Surveys tell us there are far more gay men in tolerant states than in intolerant states. For example, according to a Gallup survey, the proportion of the population that is gay is almost twice as high in Rhode Island, the state with the highest support for gay marriage, as in Mississippi, the state with the lowest support for gay marriage. There are two likely explanations for this. First, gay men born in intolerant states may move to tolerant states. Second, gay men in intolerant states may not divulge that they are gay. Some insight into explanation number one – gay mobility – can be gleaned from another big data source: Facebook, which allows users to list what gender they are interested in. About 2.5% of male Facebook users who list a gender of interest say they are interested in men; that corresponds roughly with what the surveys indicate.
And Facebook too shows big differences in the gay population in states with high versus low tolerance: Facebook has the gay population more than twice as high in Rhode Island as in Mississippi. Facebook also can provide information on how people move around. I was able to code the home town of a sample of openly gay Facebook users. This allowed me to directly estimate how many gay men move out of intolerant states into more tolerant parts of the country. The answer? There is clearly some mobility – from Oklahoma City to San Francisco, for example. But I estimate that men moving to someplace more open-minded can explain less than half of the difference in the openly gay population in tolerant versus intolerant states.
If mobility cannot fully explain why some states have so many more openly gay men, the closet must be playing a big role. Which brings us back to Google, with which so many people have proved willing to share so much.
Countrywide, I estimate – using data from Google searches and Google AdWords – that about 5% of male porn searches are for gay-male porn. Overall, there are more gay porn searches in tolerant states compared with intolerant states. In Mississippi, I estimate that 4.8% of male porn searches are for gay porn, far higher than the numbers suggested by either surveys or Facebook and reasonably close to the 5.2% of pornography searches that are for gay porn in Rhode Island.
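To see how small the search gap is next to the survey gap, the two state figures can be set side by side. This is only a rough sketch: the two search shares are the ones quoted above, while the value 2.0 is a rounded stand-in for "almost twice as high" and is an assumption, since no exact survey ratio is given.

```python
# Shares of male porn searches that are for gay porn, as quoted in the text.
rhode_island_share = 0.052
mississippi_share = 0.048

# Ratio of the two shares (1.0 would mean no difference between the states).
search_ratio = rhode_island_share / mississippi_share

# Assumption: "almost twice as high" in the survey data is rounded to 2.0 here
# purely for comparison; the text gives no exact survey ratio.
survey_ratio_rounded = 2.0

print(f"search ratio, Rhode Island vs Mississippi: {search_ratio:.2f}")
print(f"survey ratio (rounded stand-in): {survey_ratio_rounded:.2f}")
```

On these numbers the search data puts Rhode Island only a few per cent above Mississippi, where the surveys suggest a gap of nearly double.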
So how many American men are gay? This measure of pornography searches by men – roughly 5% are same-sex – seems a reasonable estimate of the true size of the gay population in the United States. Five per cent of American men being gay is an estimate, of course. Some men are bisexual; some – especially when young – are not sure what they are. Obviously, you can’t count this as precisely as you might the number of people who vote or attend a movie. But one consequence of my estimate is clear: an awful lot of men in the United States, particularly in intolerant states, are still in the closet. They don’t reveal their sexual preferences on Facebook. They don’t admit it on surveys. And, in many cases, they may even be married to women.
It turns out that wives suspect their husbands of being gay rather frequently. They demonstrate that suspicion in the surprisingly common search: “Is my husband gay?” The word “gay” is 10% more likely to complete searches that begin “Is my husband...” than the second-place word, “cheating”. It is eight times more common than “an alcoholic” and 10 times more common than “depressed”.
Most tellingly perhaps, searches questioning a husband’s sexuality are far more prevalent in the least tolerant regions. The states with the highest percentage of women asking this question are South Carolina and Louisiana. In fact, in 21 of the 25 states where this question is most frequently asked, support for gay marriage is lower than the national average.
Closets are not just repositories of fantasies. When it comes to sex, people keep many secrets – about how much they are having, for example. Americans report using far more condoms than are sold every year. You might therefore think this means they are just saying they use condoms more often during sex than they actually do. The evidence suggests they also exaggerate how frequently they are having sex to begin with. About 11% of women between the ages of 15 and 44 say they are sexually active, not currently pregnant, and not using contraception. Even with relatively conservative assumptions about how many times they are having sex, scientists would expect 10% of them to become pregnant every month. But this would already be more than the total number of pregnancies in the United States (which is one in 113 women of childbearing age).
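The arithmetic behind that comparison can be checked in a few lines. This is a back-of-the-envelope sketch using only the figures quoted above; it assumes the "one in 113" figure is a monthly rate, so that it can be compared with the expected monthly pregnancies, and the variable names are illustrative.

```python
# Figures quoted in the text.
share_unprotected = 0.11        # women 15-44: sexually active, not pregnant, no contraception
monthly_conception_rate = 0.10  # expected share of that group becoming pregnant each month

# Assumption: "one in 113 women of childbearing age" is a monthly rate,
# matching the monthly expectation above.
observed_monthly_rate = 1 / 113

# Share of all women of childbearing age who should become pregnant each month
# if the survey answers were accurate, counting this one group alone.
implied_monthly_rate = share_unprotected * monthly_conception_rate

def as_percent(x: float) -> str:
    """Format a fraction as a percentage with two decimals."""
    return f"{100 * x:.2f}%"

print("implied by survey answers:", as_percent(implied_monthly_rate))
print("observed pregnancies:     ", as_percent(observed_monthly_rate))
print("implied / observed:       ", round(implied_monthly_rate / observed_monthly_rate, 2))
```

Under that assumption, the group reporting unprotected sex would by itself account for more pregnancies than occur among all women of childbearing age, which is the mismatch the paragraph describes.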
In our sex-obsessed culture it can be hard to admit that you are just not having that much. But if you’re looking for understanding or advice, you have, once again, an incentive to tell Google. On Google, there are 16 times more complaints about a spouse not wanting sex than about a married partner not being willing to talk. There are five-and-a-half times more complaints about an unmarried partner not wanting sex than an unmarried partner refusing to text back.
And Google searches suggest a surprising culprit for many of these sexless relationships. There are twice as many complaints that a boyfriend won’t have sex as that a girlfriend won’t have sex. By far, the number one search complaint about a boyfriend is “My boyfriend won’t have sex with me.” (Google searches are not broken down by gender, but since the previous analysis said that 95% of men are straight, we can guess that not many “boyfriend” searches are coming from men.)
How should we interpret this? Does this really imply that boyfriends withhold sex more than girlfriends? Not necessarily. As mentioned earlier, Google searches can be biased in favour of stuff people are uptight talking about. Men may feel more comfortable telling their friends about their girlfriend’s lack of sexual interest than women feel telling their friends about their boyfriend’s. Still, even if the Google data does not imply that boyfriends are really twice as likely to avoid sex as girlfriends, it does suggest that boyfriends avoiding sex is more common than people let on.
Google data also suggests a reason people may be avoiding sex so frequently: enormous anxiety, with much of it misplaced. Start with men’s anxieties. It isn’t news that men worry about how well endowed they are, but the degree of this worry is rather profound. Men Google more questions about their sexual organ than any other body part: more than about their lungs, liver, feet, ears, nose, throat, and brain combined. Men conduct more searches for how to make their penises bigger than how to tune a guitar, make an omelette, or change a tyre. Men’s top Googled concern about steroids isn’t whether they may damage their health but whether taking them might diminish the size of their penis. Men’s top Googled question related to how their body or mind would change as they aged was whether their penis would get smaller.
Do women care about penis size? Rarely, according to Google searches. For every search women make about a partner’s phallus, men make roughly 170 searches about their own. True, on the rare occasions women do express concerns about a partner’s penis, it is frequently about its size, but not necessarily that it’s small. More than 40% of complaints about a partner’s penis size say that it’s too big. “Pain” is the most Googled word used in searches with the phrase “___ during sex.” Yet only 1% of men’s searches looking to change their penis size are seeking information on how to make it smaller.
Men’s second most common sex question is how to make their sexual encounters longer. Once again, the insecurities of men do not appear to match the concerns of women. There are roughly the same number of searches asking how to make a boyfriend climax more quickly as climax more slowly. In fact, the most common concern women have related to a boyfriend’s orgasm isn’t about when it happened but why it isn’t happening at all.
We don’t often talk about body image issues when it comes to men. And while it’s true that overall interest in personal appearance skews female, it’s not as lopsided as stereotypes would suggest. According to my analysis of Google AdWords, which measures the websites people visit, interest in beauty and fitness is 42% male, weight loss is 33% male, and cosmetic surgery is 39% male. Among all searches with “how to” related to breasts, about 20% ask how to get rid of man breasts.
The Truth About Hate and Prejudice
Sex and romance are hardly the only topics cloaked in shame and, therefore, not the only topics about which people keep secrets. Many people are, for good reason, inclined to keep their prejudices to themselves. I suppose you could call it progress that many people today feel they will be judged if they admit they judge other people based on their ethnicity, sexual orientation, or religion. But many Americans still do. You can see this on Google, where users sometimes ask questions such as “Why are black people rude?” or “Why are Jews evil?”
A few patterns among these stereotypes stand out. For example, African Americans are the only group that faces a “rude” stereotype. Nearly every group is a victim of a “stupid” stereotype; the only two that are not: Jews and Muslims. The “evil” stereotype is applied to Jews, Muslims, and gay people but not black people, Mexicans, Asians, and Christians. Muslims are the only group stereotyped as terrorists. When a Muslim American plays into this stereotype, the response can be instantaneous and vicious. Google search data can give us a minute-by-minute peek into such eruptions of hate-fuelled rage.
Consider what happened shortly after the mass shooting in San Bernardino, California, on 2 December, 2015. That morning, Rizwan Farook and Tashfeen Malik entered a meeting of Farook’s co-workers armed with semi-automatic pistols and semi-automatic rifles and murdered 14 people. That evening, minutes after the media first reported one of the shooters’ Muslim-sounding names, a disturbing number of Californians decided what they wanted to do with Muslims: kill them. The top Google search in California with the word “Muslims” in it at the time was “kill Muslims”. And overall, Americans searched for the phrase “kill Muslims” with about the same frequency that they searched for “martini recipe” and “migraine symptoms”.
In the days following the San Bernardino attack, for every American concerned with “Islamophobia”, another was searching for “kill Muslims”. While hate searches were approximately 20% of all searches about Muslims before the attack, more than half of all search volume about Muslims became hateful in the hours that followed it. And this minute-by-minute search data can tell us how difficult it can be to calm this rage.
Four days after the shooting, President Obama gave a prime-time address to the country. He wanted to reassure Americans that the government could both stop terrorism and, perhaps more importantly, quiet this dangerous Islamophobia. Obama appealed to our better angels, speaking of the importance of inclusion and tolerance. The rhetoric was powerful and moving. The Los Angeles Times praised Obama for “[warning] against allowing fear to cloud our judgment”. The New York Times called the speech both “tough” and “calming”. The website ThinkProgress praised it as “a necessary tool of good governance, geared towards saving the lives of Muslim Americans”. Obama’s speech, in other words, was judged a major success. But was it?
Google search data suggests otherwise. Together with Evan Soltas, then at Princeton, I examined the data. In his speech, the president said: “It is the responsibility of all Americans – of every faith – to reject discrimination.” But searches calling Muslims “terrorists”, “bad”, “violent”, and “evil” doubled during and shortly after the speech. President Obama also said: “It is our responsibility to reject religious tests on who we admit into this country.” But negative searches about Syrian refugees, a mostly Muslim group then desperately looking for a safe haven, rose 60%, while searches asking how to help Syrian refugees dropped 35%. Obama asked Americans to “not forget that freedom is more powerful than fear”. Yet searches for “kill Muslims” tripled during his speech. In fact, just about every negative search we could think to test regarding Muslims shot up during and after Obama’s speech, and just about every positive search we could think to test declined.
In other words, Obama seemed to say all the right things. But new data from the internet, offering digital truth serum, suggested that the speech actually backfired in its main goal. Instead of calming the angry mob, as everybody thought he was doing, the internet data tells us that Obama actually inflamed it. Sometimes we need internet data to correct our instinct to pat ourselves on the back.
So what should Obama have said to quell this particular form of hatred currently so virulent in America? We’ll circle back to that later. First we’re going to take a look at an age-old vein of prejudice in the United States, the form of hate that in fact stands out above the rest, the one that has been the most destructive and the topic of the research that began this book. In my work with Google search data, the single most telling fact I have found regarding hate on the internet is the popularity of the word “nigger”.
Either singular or in its plural form, the word is included in 7m American searches every year. (Again, the word used in rap songs is almost always “nigga”, not “nigger”, so there’s no significant impact from hip-hop lyrics to account for.) Searches for “nigger jokes” are 17 times more common than searches for “kike jokes”, “gook jokes”, “spic jokes”, “chink jokes”, and “fag jokes” combined. When are these searches most common? Whenever African Americans are in the news. Among the periods when such searches were highest was the immediate aftermath of Hurricane Katrina in 2005, when television and newspapers showed images of desperate black people in New Orleans struggling for their survival. They also shot up during Obama’s first election. And searches rose on average about 30% on Martin Luther King Jr Day.
The frightening ubiquity of this racial slur throws into doubt some current understandings of racism. Any theory of racism has to explain a big puzzle in America. On the one hand, the overwhelming majority of black Americans think they suffer from prejudice – and they have ample evidence of discrimination in police stops, job interviews, and jury decisions. On the other hand, very few white Americans will admit to being racist. The dominant explanation among political scientists recently has been that this is due, in large part, to widespread implicit prejudice. White Americans may mean well, this theory goes, but they have a subconscious bias, which influences their treatment of black Americans.
Academics invented an ingenious way to test for such a bias. It is called the implicit association test. The tests have consistently shown that it takes most people milliseconds longer to associate black faces with positive words, such as “good”, than with negative words, such as “awful”. For white faces, the pattern is reversed. The extra time it takes is evidence of someone’s implicit prejudice – a prejudice the person may not even be aware of.
There is, though, an alternative explanation for the discrimination that African Americans feel and whites deny: hidden explicit racism. Suppose there is a reasonably widespread conscious racism of which people are very much aware but to which they won’t confess – certainly not in a survey. That’s what the search data seems to be saying. There is nothing implicit about searching for “nigger jokes”. And it’s hard to imagine that Americans are Googling the word “nigger” with the same frequency as “migraine” and “economist” without explicit racism having a major impact on African Americans. Prior to the Google data, we didn’t have a convincing measure of this virulent animus. Now we do. We are, therefore, in a position to see what it explains. It explains why Obama’s vote totals in 2008 and 2012 were depressed in many regions. It also correlates with the black-white wage gap, as a team of economists recently reported. The areas that I had found make the most racist searches underpay black people.
And then there is the phenomenon of Donald Trump’s candidacy. When Nate Silver, the polling guru, looked for the geographic variable that correlated most strongly with support in the 2016 Republican primary for Trump, he found it in the map of racism I had developed. To be provocative and to encourage more research in this area, let me put forth the following conjecture, ready to be tested by scholars across a range of fields. The primary explanation for discrimination against African Americans today is not the fact that the people who agree to participate in lab experiments make subconscious associations between negative words and black people; it is the fact that millions of white Americans continue to do things like search for “nigger jokes”.
The Truth About Girls
The discrimination black people regularly experience in the United States appears to be fuelled more widely by explicit, if hidden, hostility. But, for other groups, subconscious prejudice may have a more fundamental impact. For example, I was able to use Google searches to find evidence of implicit prejudice against another segment of the population: young girls. And who, might you ask, would be harbouring bias against girls? Their parents.
It’s hardly surprising that parents of young children are often excited by the thought that their kids might be gifted. In fact, of all Google searches starting “Is my two-year-old…,” the most common next word is “gifted”. But this question is not asked equally about boys and girls. Parents are two-and-a-half times more likely to ask “Is my son gifted?” than “Is my daughter gifted?” Parents show a similar bias when using other phrases related to intelligence that they may shy away from saying aloud, like “Is my son a genius?”
Are parents picking up on legitimate differences between young girls and boys? Perhaps young boys are more likely than young girls to use big words or show objective signs of giftedness? Nope. If anything, it’s the opposite. At young ages, girls have consistently been shown to have larger vocabularies and use more complex sentences. In American schools, girls are 9% more likely than boys to be in gifted programmes. Despite all this, parents looking around the dinner table appear to see more gifted boys than girls. In fact, on every search term related to intelligence I tested, including those indicating its absence, parents were more likely to be inquiring about their sons rather than their daughters. There are also more searches for “is my son behind” or “stupid” than comparable searches for daughters. But searches with negative words like “behind” and “stupid” are less specifically skewed toward sons than searches with positive words, such as “gifted” or “genius”.
What then are parents’ overriding concerns regarding their daughters? Primarily, anything related to appearance. Consider questions about a child’s weight. Parents Google “Is my daughter overweight?” roughly twice as frequently as they Google “Is my son overweight?” Parents are about twice as likely to ask how to get their daughters to lose weight as they are to ask how to get their sons to do the same. Just as with giftedness, this gender bias is not grounded in reality. About 28% of girls are overweight, while 35% of boys are. Even though scales measure more overweight boys than girls, parents see – or worry about – overweight girls much more frequently than overweight boys. Parents are also one-and-a-half times more likely to ask whether their daughter is beautiful than whether their son is handsome.
Liberal readers may imagine that these biases are more common in conservative parts of the country, but I didn’t find any evidence of that. In fact, I did not find a significant relationship between any of these biases and the political or cultural makeup of a state. It would seem this bias against girls is more widespread and deeply ingrained than we’d care to believe.
Can We Handle the Truth?
I can’t pretend there isn’t a darkness in some of this data. It has revealed the continued existence of millions of closeted gay men; widespread animus against African Americans; and an outbreak of violent Islamophobic rage that only got worse when the president appealed for tolerance. Not exactly cheery stuff. If people consistently tell us what they think we want to hear, we will generally be told things that are more comforting than the truth. Digital truth serum, on average, will show us that the world is worse than we have thought.
But there are at least three ways this knowledge can improve our lives. First, there can be comfort in knowing you are not alone in your insecurities and embarrassing behaviour. Google searches can help show you are not alone. When you were young, a teacher may have told you that if you have a question you should raise your hand and ask it, because if you’re confused, others are too. If you were anything like me, you ignored your teacher and sat there silently, afraid to open your mouth. Your questions were too dumb, you thought; everyone else’s were more profound. The anonymous, aggregate Google data can tell us once and for all how right our teachers were. Plenty of basic, sub-profound questions lurk in other minds, too.
The second benefit of digital truth serum is that it alerts us to people who are suffering. The Human Rights Campaign has asked me to work with them in helping educate men in certain states about the possibility of coming out of the closet. They are looking to use the anonymous and aggregate Google search data to help them decide where best to target their resources.
The final – and, I think, most powerful – value in this data is its ability to lead us from problems to solutions. With more understanding, we might find ways to reduce the world’s supply of nasty attitudes. Let’s return to Obama’s speech about Islamophobia. Recall that every time he argued that people should respect Muslims more, the people he was trying to reach became more enraged. Google searches, however, reveal that there was one line that did trigger the type of response Obama might have wanted. He said: “Muslim Americans are our friends and our neighbours, our co-workers, our sports heroes and, yes, they are our men and women in uniform, who are willing to die in defence of our country.”
After this line, for the first time in more than a year, the top Googled noun after “Muslim” was not “terrorists”, “extremists”, or “refugees”. It was “athletes”, followed by “soldiers”. And, in fact, “athletes” kept the top spot for a full day afterwards. When we lecture angry people, the search data implies that their fury can grow. But subtly provoking people’s curiosity, giving new information, and offering new images of the group that is stoking their rage may turn their thoughts in different, more positive directions.
Two months after that speech, Obama gave another televised speech on Islamophobia, this time at a mosque. Perhaps someone in the president’s office had read Soltas’s and my Times column, which discussed what had worked and what hadn’t, for the content of this speech was noticeably different.
Obama spent little time insisting on the value of tolerance. Instead, he focused overwhelmingly on provoking people’s curiosity and changing their perceptions of Muslim Americans. Many of the slaves from Africa were Muslim, Obama told us; Thomas Jefferson and John Adams had their own copies of the Koran; a Muslim American designed skyscrapers in Chicago. Obama again spoke of Muslim athletes and armed service members, but also talked of Muslim police officers and firefighters, teachers and doctors. And my analysis of the Google searches suggests this speech was more successful than the previous one. Many of the hateful, rageful searches against Muslims dropped in the hours afterwards.
There are other potential ways to use search data to learn what causes, or reduces, hate. For example, we might look at how racist searches change after a black quarterback is drafted in a city, or how sexist searches change after a woman is elected to office. Learning of our subconscious prejudices can also be useful. We might all make an extra effort to delight in little girls’ minds and show less concern with their appearance. Google search data and other wellsprings of truth on the internet give us an unprecedented look into the darkest corners of the human psyche. This is at times, I admit, difficult to face. But it can also be empowering. We can use the data to fight the darkness. Collecting rich data on the world’s problems is the first step toward fixing them.
Q&A with Seth Stephens-Davidowitz
What’s your background?
I’d describe myself as a data scientist, but my PhD is in economics. When I was doing my PhD, in 2012, I found this tool called Google Trends that tells you what people are searching, and where, and I became obsessed with it. I know that when people first see Google data, they say “Oh this is weird, this isn’t perfect data”, but I knew that perfect data didn’t exist. The traditional data sets left a lot to be desired.
What would your search records reveal about you?
They could definitely tell I’m a hypochondriac because I’m waking up in the middle of the night doing Google searches about my health. There are definitely things about me that you could figure out. When making claims about a topic, it’s better to do it on aggregate, but I think you can figure out a lot, if not everything, about an individual by what they’re searching on Google.
You worked at Google?
For about a year and a half. I was on the economics team and also the quantitative marketing team. Some was analysis of advertising, which I got bored of, which is one of the reasons I stopped working there.
Did working there give you an understanding that helped this book?
Yeah, I think it did. All this data I’m talking about is public. But from meeting the people who know more about this data than anyone in the world, I’m much more confident that it means what I think it means.
Does it change your view of human nature? Are we darker and stranger creatures than you realised?
Yeah. I think I had a dark view of human nature to begin with, and I think now it’s gotten even darker. I think the degree to which people are self-absorbed is pretty shocking.
When Trump became president, all my friends said how anxious they were, they couldn’t sleep because they’re so concerned about immigrants and the Muslim ban. But from the data you can see that in liberal parts of the country there wasn’t a rise in anxiety when Trump was elected. When people were waking up at 3am in a cold sweat, their searches were about their job, their health, their relationship – they’re not concerned about the Muslim ban or global warming.
Was the Google search data telling you that Trump was going to win?
I did see that Trump was going to win. You saw clearly that African American turnout was going to be way down, because in cities with 95% black people there was a collapse in searches for voting information. That was a big reason Hillary Clinton did so much worse than the polls suggested.
What’s next?
I want to keep on exploring this, whether in academia, journalism or more books. It’s such an exciting area: what people are really like, how the world really works. I may just research sex for the next few months. One thing I’ve learned from this book, people are more interested in sex than I thought they were.