What Stays in Vegas


by Adam Tanner


  “The first situation in which Iran will use big data to put some people in prison, it will have a huge backlash on companies like Amazon or Facebook or whichever company will be unlucky to have their data used against human rights,” Kosinski says. Figuring out personality patterns from data is not difficult, he says. In fact, a high school student could write a Facebook “gaydar” application in an evening to out gay Facebook friends, perhaps with disastrous consequences. It could prompt suicides or other tragedies.6

  It’s one thing for an academic to unmask intimate patterns from Internet postings. But would a company actually seek to use such information for profit? Of course. Jim Adler, the former chief privacy officer at Intelius, says data brokers should be able to publish anything that people can see in public.7 Such a standard, in Adler’s view, opens the way to recording when people walk into gay bars, cancer facilities, or Alcoholics Anonymous clinics. Mass urbanization has created an expectation of privacy that did not exist before, Adler says. But the Internet is returning standards to those of the small towns where people knew many details about one another.8 “I really don’t think we are violating people’s privacy. I feel that there is an era of innovation that we are going through that is shrinking the world and putting us in public where we thought we were in private,” he says.

  Knowing someone’s sexual orientation could prove valuable for Las Vegas casinos advertising drag shows or gay bars, for example. But targeted ads could also offend. Ads in gay publications or on Internet sites visited by people with such interests—the theme of Chapter 13, on Internet advertising—may prove a more effective and less potentially offensive approach.9

  * * *

  Likes are just one of many ways to discern unexpected private details from Facebook profiles. The same year Stillwell set up myPersonality, a Massachusetts Institute of Technology master’s degree student and an undergraduate senior wanted to see just how much they could infer about a person’s sexual orientation even if the person did not disclose that information in public. Without Facebook’s permission, Behram Mistree and Carter Jernigan used a computer program to harvest Facebook profiles of 6,077 MIT students. The automated process took several weeks. They noted sexual orientation for people who stated a preference in Facebook’s “Interested In” tab, where one can list men, women, or both.

  In their sample they found that a typical straight male had 0.7 gay male friends, whereas those who declared themselves gay males had 4.6 gay friends on average.10 They built a logistic regression model and found that a male user who did not state a sexual preference was likely gay if more than 1.89 percent of his friends identified themselves as gay. They checked their finding against students whose true sexual orientation they already knew.11
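  For readers curious about the mechanics, the decision rule described above can be sketched in a few lines of Python. This is only an illustration: the students fit a logistic regression to their MIT sample, while the snippet below reproduces just the reported 1.89 percent cutoff, and all field names and profile data are hypothetical rather than taken from their code.

    THRESHOLD = 0.0189  # the 1.89 percent cutoff the students reported

    def share_of_out_gay_friends(friends):
        """Fraction of a user's friends who publicly identify as gay men."""
        if not friends:
            return 0.0
        out = [f for f in friends
               if f.get("gender") == "male" and f.get("interested_in") == "men"]
        return len(out) / len(friends)

    def infer_orientation(user, friends):
        """Apply the threshold rule to a male user who lists no preference."""
        if user.get("interested_in"):       # the user discloses it himself
            return user["interested_in"]
        if share_of_out_gay_friends(friends) > THRESHOLD:
            return "likely gay"
        return "no prediction"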

  “It’s not so much that you are inadvertently disclosing things that you hadn’t wanted to,” Mistree said years later.12 “It’s actually that the locus of control for describing personal information about yourself using these social networks has moved from you to others. It’s not your decision anymore. It’s the decision of your neighbor. It’s the decision of your basketball coach, all these people.” Jernigan added, “It’s not about what they post about you, it is what they post on themselves that then reflects on you.”13

  The same kinds of techniques that reveal intimate information from Facebook can help outsiders figure out who you are when you have not identified yourself. The vast proliferation of personal data as well as advances in computing power have made it harder to maintain anonymity. That’s because some parts of a person’s data can be matched against another dataset about them with more identifying details. It is as if several city maps had been ripped into pieces. An individual piece might not show enough to recognize the place, but a few pieces together would. In 1997 Latanya Sweeney, who in 2014 served as chief technologist at the Federal Trade Commission (FTC), showed just how easy it is to identify someone with a few simple clues—even for a graduate student, as she was at the time at MIT.

  One May morning in 1996, Massachusetts Governor William Weld attended a graduation ceremony at Bentley University, outside Boston, to receive an honorary degree. The event brought attention to a school often overshadowed by better-known area institutions such as Harvard, MIT, and Boston University. Shortly after receiving his honorary law doctorate, Weld collapsed and lost consciousness for about a minute. An ambulance took him to Deaconess-Waltham Hospital. The graduation ceremony proceeded, with the crowd pausing for a moment of silence and prayer for the governor. At the hospital, doctors announced that they had conducted an electrocardiogram, a chest X-ray, and blood tests on the fifty-year-old Weld. They concluded he had suffered nothing more serious than the flu. He recovered quickly.

  The following year, Sweeney wanted to see if she could identify medical patients from anonymous records. The Massachusetts Group Insurance Commission (GIC), a state body that looks at health-care costs and treatment, released hospital exit records on state employees to researchers but without the patients’ names. “I remembered Weld had collapsed and that’s why I thought, ‘Can I find Weld in the records?’” says Sweeney, who later became a Harvard professor.

  She bought a copy of the voter rolls for Cambridge, the city where Weld lived. Those records contained the name, birth date, gender, and ZIP code for each resident. Only three men in the Cambridge area shared Weld’s birth date, and he was the only one with that birth date in his ZIP code. Using that limited information, she pinpointed his hospital records.14 Her study, published the following year, showed that just knowing someone’s date of birth, gender, and postal code provided enough information to identify up to 87 percent of the US population. “You don’t need very much information to reidentify people,” she says.
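  The attack itself requires nothing more than matching records on those three fields. A minimal sketch, assuming both datasets are simple lists of dictionaries with hypothetical field names, might look like this:

    def link(hospital_records, voter_roll):
        """Re-identify 'anonymous' hospital records that match exactly one voter."""
        matches = {}
        for record in hospital_records:
            key = (record["birth_date"], record["gender"], record["zip"])
            candidates = [v for v in voter_roll
                          if (v["birth_date"], v["gender"], v["zip"]) == key]
            if len(candidates) == 1:        # a unique match pins down the patient
                matches[candidates[0]["name"]] = record
        return matches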

  In 2013 Sweeney, working with a research assistant and two students, tried to unlock the names of participants in an especially ambitious medical research study. That effort, called the Personal Genome Project (PGP), aimed to spark new discoveries. George Church, a professor of genetics at Harvard Medical School, says that advances in data and in medicine make it impossible to guarantee anonymity for most medical experiments. When he set up the Personal Genome Project, he made no privacy promises. In the interest of advancing knowledge of human health and disease, he posts the data for all volunteers on the Internet for any researcher to study. He does not list names, but many participants share intimate details: abortions, depression, sexual ailments, and prescription drugs are listed along with their DNA sequence.

  Before accepting new volunteers, Church requires that they take an online exam about privacy risks. They must intimately know the details of the twenty-four-page consent form. “The Personal Genome Project is a new form of public genomics research and, as a result, it is impossible to accurately predict all of the possible risks and discomforts that you might experience,” it says. Having someone identify participants is one of the listed risks. The exam does not pose a simple generic question such as “Do you understand the risks?” It lists twenty questions, and Church requires a perfect score. Potential volunteers can take the test as many times as they need until they pass. One person took the test ninety times before getting the required perfect score.15

  Of course, almost no one reads privacy policies because they are so dull and dense. One study found that it would take between eight and twelve minutes to read a typical website privacy statement. The study’s authors estimated that it would take a person between 181 and 304 hours a year to read all the privacy statements he or she encountered in that time—a month or more of working hours.16

  People likely read privacy policies about medical experiments more carefully, but Church says most studies are disingenuous in describing privacy risks. “This is one of the ways people get in over their heads in terms of personal data being exposed,” he says. In fact, many surveys can expose personally identifiable information even if they say they are anonymous.

  As of 2014, more than three thousand people had volunteered their data to the Personal Genome Project. Church would like to recruit up to one hundred thousand people, but he needs additional funding (it costs about $4,000 per person to take the DNA test and cover related administrative costs).

  Every year the project hosts a conference where scientists and participants meet for two days of formal lectures as well as informal discussions. For the 2013 event in Boston, Sweeney set up a table in the hallway with her assistant to demonstrate that she could unmask the identity of many participants. Ahead of the conference she programmed her computers to collect publicly posted data on 1,130 of the volunteers. Of this number, 579 provided ZIP code, date of birth, and gender—the key information her 1997 study had shown could be used to identify large swaths of the US population. By cross-referencing the three pieces of information against voter registration records or other public documents, Sweeney identified 241 people, or 42 percent of that group.17 The Personal Genome Project confirmed that she had the names right 84 percent of the time, or 97 percent when adding nicknames and other variations on the first name.18

  Participants at the conference reacted to Sweeney’s findings largely by saying they expected one day to be identified. Gabriel Dean, who works at a telephone company, signed up after hearing about the Personal Genome Project on National Public Radio. He checked first with his siblings because he realized what he gave away about himself could reflect on them. As open as he is about his medical data, he remains concerned about revealing information on social networks, so he does not maintain profiles on Facebook or LinkedIn.

  Throughout the two-day conference, study participants stopped by the table where Sweeney walked them through her website aboutmyinfo.org to demonstrate how easily she could identify them. She asked people to enter their ZIP code, date of birth, and gender into the site, which in turn told users if they were unique and thus identifiable. One woman came up and asked in a somewhat feisty tone why she should care. Sweeney responded that, for example, a life insurance company could theoretically deny writing a policy based on personal data. The woman turned pale. “I was just denied life insurance,” she said. The Harvard professor quickly replied that someone could be denied life insurance for many reasons and that it was far from clear that anyone had actually seen her medical data. But the woman did seem to recognize the potential danger from having such intimate medical details out there.
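  In essence, the site counts how many people share a given combination of those three fields. A rough sketch of that check, assuming a hypothetical population table, conveys the idea:

    from collections import Counter

    def uniqueness_report(population, zip_code, birth_date, gender):
        """How many people share this ZIP code / birth date / gender combination?"""
        counts = Counter((p["zip"], p["birth_date"], p["gender"]) for p in population)
        k = counts.get((zip_code, birth_date, gender), 0)
        if k <= 1:
            return "unique in this population -- likely identifiable"
        return f"shared with {k - 1} other people"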

  Harvard professor Latanya Sweeney talks with her research assistant Sean Hooley at her office. Source: Author photo.

  Many attending the conference embraced a let-the-world-know ethos. Steven Pinker, a well-known experimental psychologist and author of the 2011 book The Better Angels of Our Nature, stepped forward as one of the first ten volunteers in the study. He posts his genome and a 1996 scan of his brain on his website and insists even that amount of information does not reveal much about him as a person.19 “There just isn’t going to be an ‘honesty gene’ or anything else that would be nearly as informative as a person’s behavior, which, after all, reflects the effect of all three billion base pairs and their interactions together with chance, environmental effects, and personal history,” he says. “As for the medical records, I just don’t think anyone is particularly interested in my back pain.”

  Sweeney’s goal in publishing such findings is not to humiliate people by outing them. She believes that researchers with access to medical data on millions of patients may be able to find new cures for diseases or different patterns of effective treatment. Yet she wants to encourage people to find a better balance between sharing data and preserving some privacy. For example, people could list just their year of birth rather than their full birth date, and only the first three digits of their ZIP code rather than five or nine. “Vulnerabilities exist, but there are solutions too,” she says. “If they change those demographics, they can thwart that attack without losing research value.”
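  The kind of generalization she describes is simple to apply before records are released. A minimal sketch, assuming hypothetical field names and ISO-style dates, shows the idea:

    def generalize(record):
        """Coarsen quasi-identifiers: keep only birth year and the first three ZIP digits."""
        generalized = dict(record)
        generalized["birth_date"] = record["birth_date"][:4]   # "1945-07-31" -> "1945"
        generalized["zip"] = record["zip"][:3]                  # "02138" -> "021"
        return generalized

    # Example with a made-up record:
    # generalize({"birth_date": "1945-07-31", "zip": "02138", "diagnosis": "flu"})
    # -> {"birth_date": "1945", "zip": "021", "diagnosis": "flu"}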

  Does someone need Sweeney’s training, an advanced degree in computer science, to reidentify people from the Personal Genome Project? Apparently not. To test Sweeney’s findings, I tried to find three participants who had especially detailed and lengthy medical histories. One profile listed an abortion, anal itching, constipation, marijuana use, urinary tract infection, and many other ailments. She gave her weight as 160 pounds and said she took medication for high blood pressure. I went to the site of a commercial data broker and entered the birth date of a woman in a certain ZIP code. Instantly two names came up for that birth date in that ZIP code, only one of which was a woman.

  She turned out to be a professor and well-known scholar. She was surprised when I contacted her out of the blue. “I certainly did pay attention to the caveats about ‘personal identification’ when I signed up for the PGP, but didn’t realize it would be so ridiculously easy to track down an individual,” she said. “It doesn’t worry me over-much, perhaps because I’m at an age where I’m not all that concerned ‘what people might think’ of various aspects of my history. I can imagine, though, that would not always have been the case.”

  Although she did not object to my using her case to illuminate the problem, I checked back several times. She works for a faith-based institution that strongly disapproves of abortion. The school staff handbook warns that engaging in conduct detrimental to the reputation of the institution could lead to dismissal. She said she was confident nothing would happen to her after serving as a tenured professor for decades. “It certainly isn’t anything I hide (in fact, I use it as an example in class). That said, it might have made a difference many years ago, especially given that some former administrators were much more conservative than those we have today,” she said. But in the end she did not want her name published—the details were just too intimate.

  Another woman, sixty-eight, admitted on her Personal Genome Project survey to using cocaine and marijuana in the past and gave a long list of her ailments and the medications that she took. She also said she had suffered from child abuse from 1946 to 1963. For her birth date and ZIP code combination four or five female names appeared (one name was Lee, which could have been either gender). Yet the volunteer had also uploaded a genetic test to her profile that included her name. It took a little searching to find the phone number and email address for the woman, who had been involved in her high school’s fiftieth reunion. She had left her web address details on a school alumni website, which in turn led to her email address.

  The third volunteer was a seventy-two-year-old man whose profile listed alcoholism, bed-wetting to age twelve, bipolar disorder, cocaine use, depression, and many other ailments. Two men shared his birth date in his Santa Barbara, California, ZIP code. One appeared to have moved. Searching the name of the second man led to a LinkedIn page saying he had gone to Harvard and had worked as a scientist. Fred Gamble confirmed he did indeed participate in the Personal Genome Project. He expressed surprise that someone had identified him, but not concern. He was retired and had become an active gardener. “Mine is detailed and there is some stuff in it that a younger person might not want broadcast, but I’m seventy-two and I don’t really care,” he told me.20

  People volunteer for the Personal Genome Project because they are more open about their personal data in the first place. But not everyone wants to share such intimate data so freely. As a black woman who grew up in the South in the 1960s, Sweeney is sensitive to the potential for discrimination based on personal identity. She is also concerned about the vulnerability of anonymized medical data made public, which includes hospital discharge records released by most states. Such records exclude a patient’s name, address, and Social Security number but still contain identifying clues. Insurance companies, labs, pharmacies, and various middlemen also have wide access to claims data related to medical conditions.

  Selling deidentified data has become a multibillion-dollar business, even if such practices are largely hidden from the public. For example, when you fill a prescription, the pharmacy sells details about that transaction, earning about a penny. American pharmacies fill more than 2.5 million prescriptions every day, so over time those pennies add up. Nearly all of the country’s sixty thousand pharmacies send out details of each transaction to companies that compile and analyze the information to resell to others. The data include age and gender of the patient; the doctor’s name, address, and contact details; and details about the prescription.21

  Despite assurances from the health-care industry, some privacy advocates say the trade in personal medical data will eventually harm people through reidentification. One prominent medical privacy advocate is Deborah Peel, a Freudian psychoanalyst. The first week Peel opened her practice in 1977, a patient startled her with an unusual question.

  “If I pay you cash, will you keep my medical records private?”

  At medical school Peel thought that mental health records could not be released without the patient’s explicit permission. Yet she learned that records did get to employers who either fired people or demoted them. She agreed to keep records off the insurance rolls for cash. Over the years, she became ever more concerned about patient privacy. In 2004 she stopped taking new patients and set up the Patient Privacy Rights Foundation, based in Austin, Texas. “It’s really hard not to come off as kind of a wing nut or separatist or I don’t know what. But I’m just a doctor who’s watched this for thirty-five years,” she says. “With all this data out there, it’s going to be the greatest source of job discrimination we’ve ever seen in this country, and it’s going to start very early with your kid.”

 
