by Luke Dormehl
Are You a Darth Vader or a Luke Skywalker?
Berk’s algorithm is a black box, meaning that its insides are complex, mysterious and unknowable. What goes in is a dataset of x’s, containing information on an individual’s background, as well as demographic variables like their age and gender. What comes out is a set of y’s, representing the risk they are calculated as posing. The challenge, Berk says, is to find the best formula to allow x to predict y. “When a new person comes through the door, we want to find out whether they are high risk or low risk,” he explains. “You enter that person’s ID, which is sent out to the various databases. The information in those databases is brought back to the local computer, which figures out what the risk is. That information is then passed along to the decision-maker.”
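In machine-learning terms, what Berk describes is a supervised classification pipeline: gather a person’s x’s from linked databases, feed them to a trained model, and hand the resulting y to a decision-maker. The sketch below illustrates that general shape only; the feature names, the toy data, the stand-in random-forest model and the fetch_record lookup are all invented for illustration, since Berk’s actual system is, by design, a black box.

```python
# A hypothetical sketch of the x -> y pipeline described above. All feature
# names and data are invented, and a generic random forest stands in for
# Berk's proprietary model, which is by design a black box.
from sklearn.ensemble import RandomForestClassifier

# Toy records: [age, prior_arrests, age_at_first_offense, missed_work_assignments]
X_train = [
    [24, 3, 14, 5],
    [45, 1, 30, 0],
    [19, 6, 13, 7],
    [52, 0, 40, 1],
]
y_train = [1, 0, 1, 0]  # 1 = later "failure" (e.g., reoffending), 0 = no failure

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

def forecast_risk(person_id, fetch_record):
    """Pull a person's record from the linked databases and return a risk label."""
    x = fetch_record(person_id)            # gather the x's from external databases
    prob = model.predict_proba([x])[0][1]  # predicted probability of the bad outcome, y
    return "high risk" if prob >= 0.5 else "low risk"

# Example: a stand-in database lookup for one new arrival.
print(forecast_risk("A-1021", lambda pid: [26, 4, 15, 6]))
```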
Berk makes no apology for the opacity of his system. “It frees me up,” he explains. “I get to try different black boxes and pick the one that forecasts best.” What he doesn’t care about is causal models. “I make no claims whatsoever that what I’m doing explains why it is that individuals fail,” he says. “I’m not trying to develop a cause-and-effect rendering of whatever it is that’s going on. I just want to forecast accurately.”
For this reason, Berk will place more emphasis on what he personally considers to be a strong predictor—such as a prisoner missing work assignments—than he will on full psychological evaluations. If it turns out that liver spots can tell him something about a person’s risk of committing future crime, they become a part of his model, with no questions asked. “I don’t have to understand why [liver] spots work,” Berk says. “If the computer finds things I’m unaware of, I don’t care what they are, just so long as they forecast. I’m not trying to explain.”
The majority of Berk’s metrics for predicting future dangerousness are somewhat more obvious than liver spots. Men are more likely to commit violent crimes than women, while younger men are more likely to behave violently than older men. A man’s odds of committing a violent crime are at their highest when he is in his mid-twenties. From there, the odds steadily decrease until the age of 40, after which they plummet to almost zero. It is for this reason that violent crimes committed early in life are more predictive of future crime than those committed later on.
“People assume that if someone murdered, then they will murder in the future,” Berk has noted. “But what really matters is what that person did as a young individual. If they committed armed robbery at age 14, that’s a good predictor. If they committed the same crime at age 30, that doesn’t predict very much.”
There is, of course, the question of potential errors. Berk claims that his forecasting system can predict with 75 percent accuracy whether a person released on parole will be involved in a homicide at some point in the future. That is certainly an impressive number, but one that still means he will be wrong in as many as one of every four cases. “No matter how good this forecasting is going to be we are always going to make mistakes,” Berk acknowledges. “Everyone appreciates this, although that doesn’t make it any less painful.” The margin for error in crime prediction is something of a theme in the film Minority Report, where the “minority report” alluded to in the title refers to the suppressed fact that the three precogs used to predict crimes occasionally disagree on their predictions. “Are you saying I’ve [arrested] innocent people?” asks Tom Cruise’s police chief when he discovers that this vital piece of information has been kept from him. “I’m saying that every so often, those accused of a PreCrime might, just might, have an alternate future,” comes the answer. It is easy to see why news of such minority reports might have been kept from the public. In order for PreCrime to function, there can be no suggestion of fallibility. Who wants a justice system that instills doubt, no matter how effective it might be?
To Berk, mistakes come in one of two forms: false positives and false negatives. A false positive (which he refers to as a “Luke Skywalker”) is a person incorrectly identified as high risk. A false negative (a “Darth Vader”) is a high-risk individual who is not recognized as such. How to weigh these two errors against each other, Berk says, is a political question rather than a statistical one. Is it worse to falsely accuse a Luke Skywalker, or to fail to find a Darth Vader? “In theory, false positives and false negatives are treated as equally serious errors,” he says. “But in practice, this turns out not to be true. If you work with criminal justice officials, or talk to stakeholders or citizens, some mistakes are worse than others. In general it is worse to fail to identify a high-risk individual, than it is to falsely represent someone as if they were a high-risk individual.” That is how Berk’s algorithm is weighted. In many criminal justice settings—particularly when it comes to violent crime—decision-makers are likely to accept weaker evidence if this means avoiding letting a potential Darth Vader slip through the cracks. The price is that more Luke Skywalkers will be falsely accused.
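To make the asymmetry concrete, the following is a minimal sketch of cost-sensitive classification, a standard way of implementing the kind of weighting Berk describes. The 5:1 cost ratio, the function names and the example probability are illustrative assumptions, not details of Berk’s system; the point is simply that raising the cost of a missed Darth Vader lowers the evidence threshold for flagging someone.

```python
# A minimal sketch of cost-sensitive classification. The 5:1 ratio below is
# an illustrative assumption: one missed "Darth Vader" is treated as five
# times as costly as one falsely accused "Luke Skywalker."
COST_FN = 5.0  # cost of a false negative: releasing a truly high-risk person
COST_FP = 1.0  # cost of a false positive: flagging a truly low-risk person

def classify(prob_high_risk):
    # Flag "high risk" whenever the expected cost of releasing exceeds the
    # expected cost of flagging, i.e. when p * COST_FN > (1 - p) * COST_FP.
    threshold = COST_FP / (COST_FP + COST_FN)  # 1/6, roughly 0.17, not 0.5
    return "high risk" if prob_high_risk > threshold else "low risk"

# With these costs, even a 20 percent predicted chance of violence is
# enough to flag someone: weaker evidence is accepted, by design.
print(classify(0.20))  # -> high risk
```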
Soon, Berk says, the availability of data sources will expand even further. Rather than relying only on the official records routinely available in criminal-justice files, Berk and his colleagues will be able to use custom data sources to predict future criminality. GPS ankle bracelets, for instance, can record how people spend their free time—and algorithms can then compare this to a database of criminals to see whether a person happens to share similar pastimes with an Al Capone or a Ted Bundy. “Are they at home watching TV, or are they spending [their free time] on a particular street corner, which historically has been used for drug transactions?” Berk asked in a 2012 talk for Chicago Ideas Week. “Knowing whether someone is out at 2 A.M. on a Saturday morning versus 10 A.M. will, we think, help us forecast better.”
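One plausible way to operationalize such a comparison, sketched below on invented data, is to reduce GPS traces to a profile of where a person is and when, then score how closely that profile resembles profiles built from past offenders. Nothing here reflects Berk’s actual implementation; the place names and pings are hypothetical.

```python
# An illustrative sketch, with invented data: summarize GPS pings as a
# where-and-when frequency profile, then score its overlap with profiles
# drawn from past offenders. This is not Berk's actual implementation.
from collections import Counter
from math import sqrt

def profile(gps_trace):
    """Turn (hour, place) GPS pings into a normalized frequency profile."""
    counts = Counter(gps_trace)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def similarity(p, q):
    """Cosine similarity between two profiles (1.0 means identical habits)."""
    dot = sum(weight * q.get(key, 0.0) for key, weight in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Hour-of-day and place for each ping; 2 A.M. on a known drug corner
# weighs differently from 10 A.M. at home once the profiles are compared.
subject = profile([(2, "corner_5th_and_main"), (2, "corner_5th_and_main"), (10, "home")])
offenders = profile([(2, "corner_5th_and_main"), (3, "corner_5th_and_main")])
print(similarity(subject, offenders))  # ~0.63: substantial overlap
```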
Given time, it might even prove possible to begin building up profiles of people before they commit a crime in the first place: one step closer to the authoritarian world of PreCrime imagined by Philip K. Dick. As fraught with legal and ethical dilemmas as this is, a person who grows up in a high-crime area, has a family history of drug abuse, or has a sibling or parent already in prison could be identified by Berk’s algorithm—despite not yet having an arrest history.
Another area in which such technology will likely be used is the school system. “Schools want to know whether the students they have are going to create problems,” Berk says. “There’s some initial work being done using school data for school kids who have no criminal record, to determine which are high risk for dropping out of school, for being truant, for getting in fights, for vandalism, and so on.” In 2013, he was working with children aged between 8 and 10, using predictive modeling to estimate how likely it is that they will commit a felony later in life. For now, however, this remains the stuff of the future. As Berk himself acknowledges, “That’s just a gleam in our eye. We’re just starting to do that.”
Delete All the Lawyers
There is a line in Shakespeare’s Henry VI, Part 2 that is likely a favorite of anyone who has ever had adverse dealings with those in the legal profession. Adding his two cents to a plan to stage a social revolt in England, the character of Dick the Butcher pipes up with a suggestion he sees as all but guaranteeing utopia. “The first thing we do,” he says, “let’s kill all the lawyers.”
More than four centuries after Shakespeare’s play was first performed, lawyers might not yet be dead, but The Formula may be rendering an increasing number of them irrelevant. Consider the area of legal discovery, for instance. Legal discovery refers to the pretrial phase of a lawsuit, in which each party obtains and sorts through material that may lead to admissible evidence in court. In previous years, discovery was mainly carried out by junior lawyers, dispatched by law firms to comb through large quantities of possible evidence by hand. This task was both good for law firms and bad for clients, who inevitably found themselves on the receiving end of costly fees. In 1978, five television studios became entangled in an antitrust lawsuit filed against broadcasting giant CBS. To examine the 6 million documents deemed to be relevant to the case, the studios hired a team of lawyers who worked for a period of several months. When their bill eventually came in, it was for an amount in excess of $2.2 million—close to $8 million in today’s money.16
Thanks to advances in artificial intelligence, today discovery can be carried out using data-mining “e-discovery” tools, in addition to machine-learning processes such as predictive coding. Predictive coding allows human lawyers to manually review a small percentage of the available documents, and in doing so to “teach” the computer to distinguish between relevant and irrelevant information. Algorithms can then process the bulk of the information in less than a third of the time it would take even the most competent of human teams to carry out the same task. Such systems have repeatedly shown the ability to outperform both junior lawyers and paralegals in terms of precision. After all, computers don’t get headaches.
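A minimal sketch of the idea, using a generic text classifier and invented documents (the real systems are proprietary and considerably more sophisticated): lawyers hand-label a small seed set, the model learns from it, and the algorithm then grades the remaining corpus so that only high-scoring documents reach human review.

```python
# A minimal sketch of predictive coding, with invented documents. Lawyers
# label a small seed set; a generic text classifier then scores the rest
# of the corpus so only likely-relevant documents reach human review.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs = [
    "draft licensing agreement attached for review",  # labeled by a lawyer
    "lunch on friday? the usual place",
    "termination clause in the vendor agreement needs counsel sign-off",
    "fantasy football picks for this week",
]
seed_labels = [1, 0, 1, 0]  # 1 = relevant to the lawsuit, 0 = irrelevant

vectorizer = TfidfVectorizer()
classifier = LogisticRegression()
classifier.fit(vectorizer.fit_transform(seed_docs), seed_labels)

# The remaining documents (millions, in a real case) are scored automatically.
corpus = [
    "revised indemnification terms for the licensing agreement",
    "team bowling night photos",
]
scores = classifier.predict_proba(vectorizer.transform(corpus))[:, 1]
for doc, score in zip(corpus, scores):
    print(f"{score:.2f}  {doc}")
```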
Often e-discovery requires nothing more complex than basic indexing or the imposition of simple legal classifications. It can be used to go further, though. Algorithms can extract relevant concepts (such as pulling out all documents pertaining to social protest in the Middle East) and are becoming increasingly adept at searching for specific ideas, regardless of the wording used. Algorithms can even search for the absence of particular terms, or look for the kind of underlying patterns that would likely elude the attention of human lawyers. Better yet, not only can this work be carried out faster than human lawyers could manage it, but at a fraction of the cost as well. One of the world’s leading e-discovery firms, Palo Alto’s Blackstone Electronic Discovery, is regularly able to analyze 1.5 million documents for less than $100,000.
Located in the heart of Silicon Valley, Blackstone boasts a client list that includes Netflix, Adobe and the United States Department of Justice. In 2012, the company worked on the Apple v. Samsung patent case, described by Fortune magazine as the “(Patent) trial of the century.”17 Blackstone’s founder is an MBA named John Kelly, who started the company in 2003 in response to what he saw as the obvious direction the legal profession was headed in. “The amount of data we have to deal with today is absolutely exploding,” Kelly says. “Twenty years ago a typical case might have involved ten boxes of hard copy. Today it’s easy to pull 100GB of soft copy just from local servers, desktops and smartphones. That’s the equivalent of between 1 and 2 million pages right there.”
Part of the explanation for the exponential increase in data comes down to the ease with which information can now be stored. Rather than giant, space-consuming filing cabinets of physical documents, modern companies increasingly store their data in the form of digital files. In a world of cloud-based storage solutions there is literally no excuse for throwing anything away. In the Apple v. Samsung case, Samsung found itself verbally chastised by the presiding judge after admitting that all corporate e-mails carried an expiry date, which caused them to be automatically deleted every two weeks. As Fast Company pointed out, “Samsung lost [the case] anyway, but this [infraction] . . . might have sealed the outcome.”18
Where previously the field of discovery was about finding enough data to build a case, the e-discovery process now focuses on how much information can be safely ignored. As Kelly points out, “In the digital world, it’s more about figuring out how to limit the scope of our investigations.” This is another application in which algorithms come into their own. Kelly acknowledges that the efficiency with which this limiting process can be carried out doesn’t always endear his company to those working in more traditional law firms. “Some firms might see a case and think of it as a $5 million opportunity,” he says. “A company like Blackstone then comes in and for $100,000 can take the number of relevant documents down to just 500 e-mails in the blink of an eye.”
If there’s one thing Kelly admits to worrying about, it is the new generation of junior lawyers, whose livelihood is being threatened by automation. “One of the questions our work provokes is what happens to that cadre of folks just out of law school, who don’t have clients yet, who aren’t rainmakers—what are they going to do?” Kelly says, with a twinge of genuine pain in his voice. “In the old days there was tons of stuff around for them. It might not always have been exciting work, but at least it was available. Now guys like us can do a lot of that work just by using the right algorithm.”
Divorce by Algorithm
Business-management guru Clayton Christensen identifies two types of new technology: “sustaining” and “disruptive” innovations.19 A sustaining technology supports or enhances the way a business or market already operates. A disruptive technology, on the other hand, fundamentally alters the way in which a particular sector functions. An example of the former is the advent of computerized accounting systems, while the arrival of digital cameras (which famously led to the downfall of Kodak) represents the latter. Tools like e-discovery algorithms are disruptors. And they are far from the only way in which the legal profession is being irreversibly altered by the arrival of The Formula.
In his classic book The Selfish Gene, evolutionary biologist Richard Dawkins describes the way in which legal cases have “evolved” to become as inefficient as possible—thereby enabling lawyers, working with one another in “elaborately coded cooperation,” to jointly milk their clients’ bank accounts for as long as possible. This is, in its own way, an algorithm too, albeit one diametrically opposed to a computer-based algorithm designed to produce efficient results in as few steps as possible.20
With the legal system weighted so heavily in favor of those practicing it,21 it is no surprise that many lawyers criticize the use of disruptive technologies in law, worried about the detrimental effects they are likely to have on lawyers’ earning power. Many of these critics attack what they view as the “commoditization” of law, unfavorably comparing “routinized” legal work to the kind of “bespoke” work you would receive from a human lawyer. (Think about the difference between off-the-rack and hand-tailored clothing in both quality and price.)
But while this criticism makes sense if you are a lawyer carrying out what you feel to be bespoke work, it also heavily downplays the number of legal tasks that a bot can perform as well as, if not better than, a person. One area The Formula is revolutionizing is contract drafting, thanks to the rise of automated document-assembly systems like the snappily named LegalZoom. Founded by two corporate-law refugees in 2001, LegalZoom has since served more than 2 million customers, in the process becoming a better-known brand name within the United States than any law firm. Charging as little as $69 for wills and $99 for articles of incorporation, LegalZoom uses algorithms for its high-volume, low-cost business of providing basic consumer and business documents: doing for the legal profession what Craigslist did for the newspaper industry’s profitable classified-ad business.22 Another area is trademark analysis, where the Finnish start-up Onomatics has created an algorithm capable of generating instant reports showing how far apart two different trademarks might be: an area notorious for its subjectivity.
A similar technology is Wevorce, a Mountain View, California, start-up located several miles from Google’s corporate headquarters. If Internet dating’s frictionless approach to coupling—discussed in the last chapter—promises to take the pain out of love, then Wevorce makes the same promise about divorce, offering a divorce service mediated by algorithm. Not only does Wevorce provide a standardized service, based on a system that identifies 18 different archetypal patterns in divorcing couples—it can even advise on what stage in the grieving process a user’s former partner is likely to be experiencing at any given moment. By asking couples to think rationally (even computationally) about separation, Wevorce claims that it can “[change] divorce for the better.”23 “Because the software keeps the process structured, it’s less scary for divorcing couples and more efficient for attorneys, which leads to overall lower costs,” says CEO Michelle Crosby.24
The Invisible Law Enforcer
Many technology commentators have remarked on the major shift in our understanding of the term “transparency” over the past 40 years. In the early days of personal computing, a transparent computer was one on which you could “lift the lid” and tinker with its inner workings. Jump forward to the present day and transparency instead denotes something altogether more opaque: namely, that we can make something work without understanding how it works. To quote MIT psychoanalyst Sherry Turkle, as we have moved from a modernist culture of calculation to a postmodern one of simulation, computers increasingly ask to be taken at “interface value.”25 This idea of interface value was perfectly encapsulated in a 2011 interview with Alex Kipman, one of Microsoft’s engineers on the Kinect motion-sensing device. Speaking to a reporter from the New York Times, Kipman proudly explained that we are increasingly headed toward “a world where technology more fundamentally understands you, so you don’t have to understand it.”26