Hello World


by Hannah Fry


  The more platforms we use to see what’s popular – bestseller lists, Amazon rankings, Rotten Tomatoes scores, Spotify charts – the bigger the impact that social proof will have. The effect is amplified further when there are millions of options being hurled at us, plus marketing, celebrity, media hype and critical acclaim all demanding our attention.

  All this means that sometimes terrible music can make it to the top. That’s not just me being cynical. During the 1990s, two British music producers – fully aware of this fact – were rumoured to have made a bet on who could get the worst song possible into the charts. Supposedly, the result of the wager was a girl group called Vanilla, whose debut song, ‘No way no way, mah na mah na’, was based on the famous Muppets ditty. It featured a performance that could only generously be described as singing, artwork that looked like it had been made in Microsoft Paint and a promotional video that had a good claim to being the worst ever shown.5 But Vanilla had some powerful allies. Thanks to a few magazine features and an appearance on BBC’s Top of the Pops, the song still managed to get to number 14 in the charts.fn2

  Admittedly, the band’s success was short-lived. By their second single, their popularity was already waning. They never released a third. All of which does seem to suggest that social proof isn’t the only factor at play – as indeed a follow-up experiment from the Music Lab team showed.

  The set-up to their second study was largely the same as the first. But this time, to test how far the perception of popularity became a self-fulfilling prophecy, the researchers added a twist. Once the charts had had the chance to stabilize in each world, they paused the experiment and flipped the billboard upside down. New visitors to the music player saw the chart topper listed at the bottom, while the flops at the bottom took on the appearance of the crème de la crème at the top.

  Almost immediately, the total number of downloads by visitors dropped. Once the songs at the top weren’t appealing, people lost interest in the music on the website overall. The sharpest declines were in downloads for the turkeys, now at the top of the charts. Meanwhile, the good tracks, now languishing at the bottom, did worse than they had at the top, but still better than the weak songs had done when they occupied the bottom of the list. If the scientists had let the experiment run on long enough, the very best songs would have recovered their popularity. Conclusion: the market isn’t locked into a particular state. Both luck and quality have a role to play.6

  Back in reality – where there is only one world’s worth of data to go on – there’s a straightforward interpretation of the findings from the Music Lab experiments. Quality matters, and it’s not the same thing as popularity. That the very best songs recovered their popularity shows that some music is just inherently ‘better’. At one end of the spectrum, a sensational song by a fantastic artist should (at least in theory) be destined for success. But the catch is that the reverse doesn’t necessarily hold. Just because something is successful, that doesn’t mean it’s of a high quality.

  Quite how you define quality is another matter altogether, which we’ll come on to in a moment. But for some, quality itself isn’t necessarily important. If you’re a record label, or a film producer, or a publishing house, the million-dollar question is: can you spot the guaranteed successes in advance? Can an algorithm pick out the hits?

  Hunting the hits

  Investing in movies is a risky business. Few films make money, most barely break even, and flops are part of the territory.7 The stakes are high: when the costs of making a movie run into the tens or hundreds of millions, failure to predict the demand for the product can be catastrophically expensive.

  That was a lesson learned the hard way by Disney with its film John Carter, released in 2012. The studio sank $350 million into making the movie, determined that it should sit alongside the likes of Toy Story and Finding Nemo as their next big franchise. Haven’t seen it? Me neither. The film failed to capture the public’s imagination and wound up making a loss of $200 million, resulting in the resignation of the head of Walt Disney Studios.8

  The great and the good of Hollywood have always accepted that you just can’t accurately predict the commercial success of a movie. It’s the land of the gut-feel. Gambling on films that might bomb at the box office is just part of the job. In 1978, Jack Valenti, president and CEO of the Motion Picture Association of America, put it this way: ‘No one can tell you how a movie is going to do in the marketplace. Not until the film opens in a darkened theatre and sparks fly up between the screen and the audience.’9 Five years later, in 1983, William Goldman – the writer behind The Princess Bride and Butch Cassidy and the Sundance Kid – put it more succinctly: ‘Nobody knows anything.’10

  But, as we’ve seen throughout this book, modern algorithms are routinely capable of predicting the seemingly unpredictable. Why should films be any different? You can measure the success of a movie, in revenue and in critical reception. You can measure all sorts of factors about the structure and features of a film: starring cast, genre, budget, running time, plot features and so on. So why not apply these same techniques to try and find the gems? To uncover which films are destined to triumph at the box office?

  This has been the ambition of a number of recent scientific studies that aim to tap into the vast, rich depths of information collected and curated by websites like the Internet Movie Database (IMDb) or Rotten Tomatoes. And – perhaps unsurprisingly – there are a number of intriguing insights hidden within the data.

  Take the study conducted by Sameet Sreenivasan in 2013.11 He realized that, by asking users to tag films with plot keywords, IMDb had created a staggeringly detailed catalogue of descriptors that could show how our taste in films has evolved over time. By the time of his study, IMDb had over 2 million films in its catalogue, spanning more than a century, each with multiple plot tags. Some keywords were high-level descriptions of the movie, like ‘organized-crime’ or ‘father-son-relationship’; others would be location-based, like ‘manhattan-new-york-city’, or about specific plot points, like ‘held-at-gunpoint’ or ‘tied-to-a-chair’.

  On their own, the keywords showed that our interest in certain plot elements tends to come in bursts; think Second World War films or movies that tackle the subject of abortion. There’ll be a spate of releases on a similar topic in quick succession, and then a lull for a while. When considered together, the tags allowed Sreenivasan to come up with a score for the novelty of each film at the time of its release – a number between zero and one – that could be compared against box-office success.

  If a particular plot point or key feature – like female nudity or organized crime – was a familiar aspect of earlier films, the keyword would earn the movie a low novelty score. But any original plot characteristics – like the introduction of martial arts in action films in the 1970s, say – would earn a high novelty score when the characteristic first appeared on the screen.
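
  Sreenivasan’s paper defines the measure precisely; purely as an illustration of the general idea, here is a minimal Python sketch that scores each film by how unfamiliar its keywords were at the time of release. The scoring rule (one minus the average share of earlier films using each keyword) and the toy catalogue are my simplifications, not the paper’s actual formula or data.

```python
from collections import Counter

def novelty_scores(films):
    """Assign each film a rough novelty score between 0 and 1.

    `films` is a list of (title, year, keywords) tuples sorted by release
    year. Keywords rarely seen in earlier films push the score towards 1;
    well-worn keywords pull it towards 0. This is a simplified stand-in
    for Sreenivasan's measure, not his exact formula.
    """
    seen = Counter()   # how many earlier films used each keyword
    earlier = 0        # number of films processed so far
    scores = {}

    for title, year, keywords in films:
        if earlier == 0 or not keywords:
            scores[title] = 1.0  # nothing earlier to compare against
        else:
            # familiarity of a keyword = share of earlier films that used it
            familiarity = sum(seen[k] / earlier for k in keywords) / len(keywords)
            scores[title] = 1.0 - familiarity
        for k in keywords:
            seen[k] += 1
        earlier += 1

    return scores

# Toy usage with made-up films and tags
films = [
    ("Film A", 1970, ["organized-crime", "manhattan-new-york-city"]),
    ("Film B", 1972, ["organized-crime", "father-son-relationship"]),
    ("Film C", 1974, ["martial-arts", "held-at-gunpoint"]),
]
print(novelty_scores(films))
```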

  As it turns out, we have a complicated relationship with novelty. On average, the higher the novelty score a film had, the better it did at the box office. But only up to a point. Push past that threshold and a precipice awaits: revenue fell off a cliff for anything that scored over 0.8. Sreenivasan’s study showed what social scientists had long suspected: we’re put off by the banal, but we also hate the radically unfamiliar. The very best films sit in a narrow sweet spot between ‘new’ and ‘not too new’.

  The novelty score might be a useful way to help studios avoid backing absolute stinkers, but it’s not much help if you want to know the fate of an individual film. For that, the work of a European team of researchers may be more useful. They discovered a connection between the number of edits made to a film’s Wikipedia page in the month leading up to its cinematic release and the eventual box-office takings.12 The edits were often made by people unconnected to the release – just typical movie fans contributing information to the page. More edits implied more buzz around a release, which in turn led to higher takings at the box office.

  Their model had modest predictive power overall: out of 312 films in the study, they correctly forecast the revenue of 70 movies with an accuracy of 70 per cent or over. But the better a film did, and the more edits were made to the Wikipedia page, the more data the team had to go on and the more precise the predictions they made. The box-office takings of six high-earning films were correctly forecast to 99 per cent accuracy.
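
  The researchers combined several Wikipedia activity signals in a regression model; the sketch below shows only the general shape of such a model, with invented placeholder numbers rather than the study’s data, feature set or coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder training data: one row per already-released film.
# Columns: edits in the month before release, distinct editors, page views.
# All numbers are invented for illustration.
X_train = np.array([
    [120, 45, 50_000],
    [300, 90, 200_000],
    [40, 10, 8_000],
    [500, 150, 400_000],
])
y_train = np.array([30e6, 110e6, 5e6, 250e6])  # box-office takings, dollars

model = LinearRegression().fit(X_train, y_train)

# Forecast an upcoming film from its pre-release Wikipedia activity.
upcoming = np.array([[220, 70, 120_000]])
print(f"Forecast takings: ${model.predict(upcoming)[0]:,.0f}")
```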

  These studies are intellectually interesting, but a model that works only a month before a film’s release isn’t much use for investors. How about tackling the question head-on instead: take all the factors that are known earlier in the process – the genre, the celebrity status of the leading actors, the age guidance rating (PG, 12, etc.) – and use a machine-learning algorithm to predict whether a film will be a hit?

  One famous study from 2005 did just that, using a neural network to try to predict the performance of films long before their release in the cinema.13 To make things as simple as possible, the authors did away with trying to forecast the revenue exactly, and instead tried to classify movies into one of nine categories, ranging from total flop to box-office smash hit. Unfortunately, even with that step to simplify the problem, the results left a lot to be desired. The neural network outperformed any statistical techniques that had been tried before, but still managed to classify the performance of a movie correctly only 36.9 per cent of the time on average. It was a little better in the top category – those earning over $200 million – correctly identifying those real blockbusters 47.3 per cent of the time. But investors beware. Around 10 per cent of the films picked out by the algorithm as destined to be hits went on to earn less than $20 million – which by Hollywood’s standards is a pitiful amount.
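
  For a sense of what such a classifier involves, here is a minimal sketch along the same lines using scikit-learn’s MLPClassifier. The feature set, the placeholder films and the nine-band labelling are illustrative assumptions of mine, not the 2005 study’s actual inputs or architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# One row per film, using only information available before release.
# Features (all illustrative): budget in $m, star-power score (0-10),
# sequel flag, opening screens in thousands, age-rating code (0=G ... 4=R).
X = np.array([
    [200.0, 9, 1, 4.0, 2],
    [15.0, 2, 0, 0.8, 4],
    [90.0, 6, 0, 3.0, 2],
    [40.0, 4, 0, 1.5, 3],
])
# Labels: revenue band 0 (total flop) up to 8 ($200m+ blockbuster),
# nine bands in all. These example labels are invented.
y = np.array([8, 1, 5, 3])

clf = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                    max_iter=2000, random_state=0)
clf.fit(X, y)

new_film = np.array([[150.0, 8, 1, 3.5, 2]])
print("Predicted revenue band:", clf.predict(new_film)[0])
```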

  Other studies since have tried to improve on these predictions, but none has yet made a significant leap forward. All the evidence points in a single direction: until you have data on the early audience reaction, popularity is largely unpredictable. When it comes to picking the hits from the pile, Goldman was right. Nobody knows anything.

  Quantifying quality

  So predicting popularity is tricky. There’s no easy way to prise apart what we all like from why we like it. And that poses rather a problem for algorithms in the creative realm. Because if you can’t use popularity to tell you what’s ‘good’ then how can you measure quality?

  This is important: if we want algorithms to have any kind of autonomy within the arts – either to create new works, or to give us meaningful insights into the art we create ourselves – we’re going to need some kind of measure of quality to go on. There has to be an objective way to point the algorithm in the right direction, a ‘ground truth’ that it can refer back to. An artistic analogue of ‘this cluster of cells is cancerous’ or ‘the defendant went on to commit a crime’. Without it, making progress is tricky. We can’t design an algorithm to compose or find a ‘good’ song if we can’t define what we mean by ‘good’.

  Unfortunately, in trying to find an objective measure of quality, we come up against a deeply contentious philosophical question that dates back as far as Plato. One that has been the subject of debate for more than two millennia. How do you judge the aesthetic value of art?

  Some philosophers – like Gottfried Leibniz – argue that if there are objects that we can all agree on as beautiful, say Michelangelo’s David or Mozart’s Lacrimosa, then there should be some definable, measurable, essence of beauty that makes one piece of art objectively better than another.

  But on the other hand, it’s rather rare for everyone to agree. Other philosophers, such as David Hume, argue that beauty is in the eye of the beholder. Consider the work of Andy Warhol, for instance, which offers a powerful aesthetic experience to some, while others find it artistically indistinguishable from a tin of soup.

  Others still, Immanuel Kant among them, have said the truth is something in between. That our judgements of beauty are not wholly subjective, nor can they be entirely objective. They are sensory, emotional and intellectual all at once – and, crucially, can change over time depending on the state of mind of the observer.

  There is certainly some evidence to support this idea. Fans of Banksy might remember how he set up a stall in Central Park, New York, in 2013, anonymously selling original black-and-white spray-painted canvases for $60 each. The stall was tucked away in a row of others selling the usual touristy stuff, so the price tag must have seemed expensive to those passing by. It was several hours before someone decided to buy one. In total, the day’s takings were $420.14 But a year later, in an auction house in London, another buyer would deem the aesthetic value of the very same artwork great enough to tempt them to spend £68,000 (around $115,000 at the time) on a single canvas.15

  Admittedly, Banksy isn’t popular with everyone. (Charlie Brooker – creator of Black Mirror – once described him as ‘a guffhead [whose] work looks dazzlingly clever to idiots’.)16 So you might argue this story is merely evidence of the fact that Banksy’s work doesn’t have inherent quality. It’s just popular hype (and social proof) that drives those eye-wateringly high prices. But our fickle aesthetic judgement has also been observed in respect of art forms that are of undeniably high quality.

  My favourite example comes from an experiment conducted by the Washington Post in 2007.17 The paper asked the internationally renowned violinist Joshua Bell to add an extra concert to his schedule of sold-out symphony halls. Armed with his $3.5 million Stradivarius violin, Bell pitched up at the top of an escalator in a metro station in Washington DC during morning rush hour, put a hat on the ground to collect donations and performed for 43 minutes. As the Washington Post put it, here was one of ‘the finest classical musicians in the world, playing some of the most elegant music ever written on one of the most valuable violins ever made’. The result? Seven people stopped to listen for a while. Over a thousand more walked straight past. By the end of his performance, Bell had collected a measly $32.17 in his hat.

  What we consider ‘good’ also changes. The appetite for certain types of classical music has been remarkably resilient to the passing of time, but the same can’t be said for other art forms. Armand Leroi, a professor of evolutionary biology at Imperial College London, has studied the evolution of pop music, and found clear evidence of our changing tastes in the analysis. ‘There’s an intrinsic boredom threshold in the population. There’s just a tension that builds as people need something new.’18

  By way of an example, consider the drum machines and synthesizers that became fashionable in late-1980s pop – so fashionable that the diversity of music in the charts plummeted. ‘Everything sounds like early Madonna or something by Duran Duran,’ Leroi explains. ‘And so maybe you say, “OK. We’ve reached the pinnacle of pop. That’s where it is. The ultimate format has been found.”’ Except, of course, it hadn’t. Shortly afterwards, the musical diversity of the charts exploded again with the arrival of hip hop. Was there something special about hip hop that caused the change? I asked Leroi. ‘I don’t think so. It could have been something else, but it just happened to be hip hop. To which the American consumer responded and said, “Well, this is something new, give us more of it.”’

  The point is this. Even if there are some objective criteria that make one artwork better than another, as long as context plays a role in our aesthetic appreciation of art, it’s not possible to create a tangible measure for aesthetic quality that works in all places at all times. Whatever statistical techniques, or artificial intelligence tricks, or machine-learning algorithms you deploy, trying to use numbers to latch on to the essence of artistic excellence is like clutching at smoke with your hands.

  But an algorithm needs something to go on. So, once you take away popularity and inherent quality, you’re left with the only thing that can be quantified: a metric for similarity to whatever has gone before.

  There’s still a great deal that can be done using measures of similarity. When it comes to building a recommendation engine, like the ones found in Netflix and Spotify, similarity is arguably the ideal measure. Both companies have a way to help users discover new films and songs, and, as subscription services, both have an incentive to accurately predict what users will enjoy. They can’t base their algorithms on what’s popular, or users would just get bombarded with suggestions for Justin Bieber and Peppa Pig The Movie. Nor can they base them on any kind of proxy for quality, such as critical reviews, because if they did the home page would be swamped by arthouse snooze-fests, when all people actually want to do is kick off their shoes after a long day at work and lose themselves in a crappy thriller or stare at Ryan Gosling for two hours.

  Similarity, by contrast, allows the algorithm to put the focus squarely on the individual’s preferences. What do they listen to, what do they watch, what do they return to time and time again? From there, you can use IMDb or Wikipedia or music blogs or magazine articles to pull out a series of keywords for each song or artist or movie. Do that for the entire catalogue, and then it’s a simple step to find and recommend other songs and films with similar tags. Then, in addition, you can find other users who liked similar films and songs, see what other songs and films they enjoyed and recommend those to your user.
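
  Neither Spotify nor Netflix publishes its recommendation code, but the content-based step just described – tag each item, then suggest unseen items whose tags overlap with what a user already likes – can be sketched in a few lines. The catalogue, tags and similarity measure (Jaccard overlap) below are placeholders of my choosing.

```python
# A tiny placeholder catalogue: item -> set of descriptive tags.
catalogue = {
    "Film A": {"thriller", "heist", "1970s"},
    "Film B": {"thriller", "spy", "cold-war"},
    "Film C": {"romance", "musical"},
    "Film D": {"heist", "comedy"},
}

def jaccard(a, b):
    """Similarity of two tag sets: size of overlap over size of union."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def recommend(liked, catalogue, top_n=3):
    """Rank unseen items by average tag similarity to the user's liked items."""
    scores = {}
    for item, tags in catalogue.items():
        if item in liked:
            continue
        scores[item] = sum(jaccard(tags, catalogue[i]) for i in liked) / len(liked)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# A user who liked Film A gets the items whose tags overlap most with it.
print(recommend({"Film A"}, catalogue))
```

  The second step mentioned above, looking at what users with similar histories enjoyed, works on the same principle, applied to user histories rather than tag sets.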

  At no point is Spotify or Netflix trying to deliver the perfect song or film. They have little interest in perfection. Spotify Discover doesn’t promise to hunt out the one band on earth that is destined to align wholly and flawlessly with your taste and mood. The recommendation algorithms merely offer you songs and films that are good enough to insure you against disappointment. They’re giving you an inoffensive way of passing the time. Every now and then they will come up with something that you absolutely love, but it’s a bit like cold reading in that sense. You only need a strike every now and then to feel the serendipity of discovering new music. The engines don’t need to be right all the time.

 
