Hello World
Mercedes 125–6
microprocessors x
Millgarth 145, 146
Mills, Tamara 101–2, 103
MIT Technology Review 101
modern inventions 2
Moses, Robert 1
movies see films
music 176–80
choosing 176–8
diversity of charts 186
emotion and 189
genetic algorithms 191–2
hip hop 186
piano experiment 188–90
algorithm 188, 189–91
popularity 177, 178
quality 179, 180
terrible, success of 178–9
Music Lab 176–7, 179, 180
Musk, Elon 138
MyHeritage 110
National Geographic Genographic project 110
National Highway Traffic Safety Administration 135
Navlab 117
Netflix 8, 188
random forests 59
neural networks 85–6, 95, 119, 202, 219–20n11
driverless cars 117–18
in facial recognition 166–7
predicting performances of films 183
New England Journal of Medicine 94
New York City subway crime 147–50
anti-social behaviour 149
fare evasion 149
hotspots 148, 149
New York Police Department (NYPD) 172
New York Times 116
Newman, Paul 127–8, 130
NHS (National Health Service)
computer virus in hospitals 105
data security record 105
fax machines 103
linking of healthcare records 102–3
paper records 103
prioritization of non-smokers for operations 106
nuclear war 18–19
Nun Study 90–2
obesity 106
OK Cupid 9
Ontario 169–70
OpenWorm project 13
Operation Lynx 145–7
fingerprints 145
overruling algorithms
correctly 19–20
incorrectly 20–1
Oxbotica 127
Palantir Technologies 31
Paris Auto Show (2016) 124–5
parole 54–5
Burgess’s forecasting power 55–6
violation of 55–6
passport officers 161, 164
PathAI 82
pathologists 82
vs algorithms 88
breast cancer research on corpses 92–3
correct diagnoses 83
differences of opinion 83–4
diagnosing cancerous tumours 90
sensitivity and 88
specificity and 88
pathology 79, 82
and biology 82–3
patterns in data 79–81, 103, 108
payday lenders 35
personality traits 39
advertising and 40–1
inferred by algorithm 40
research on 39–40
Petrov, Stanislav 18–19
piano experiment 188–90
pigeons 79–80
Pomerleau, Dean 118–19
popularity 177, 178, 179, 183–4
power 5–24
blind faith in algorithms 13–16
overruling algorithms 19–21
struggle between humans and algorithms 20–4
trusting algorithms 16–19
power of veto 19
Pratt, Gill 137
precision in justice 53
prediction
accuracy of 66, 67, 68
algorithms vs humans 22, 59–61, 62–5
Burgess 55–6
of crime
burglary 150–1
HunchLab algorithm 157–8
PredPol algorithm 152–7, 158
risk factor 152
Strategic Subject List algorithm 158
decision trees 56–8
dementia 90–2
development of abnormalities 87, 95
homicide 62
of personality 39–42
of popularity 177, 178, 179, 180, 183–4
powers of 92–6
of pregnancy 29–30
re-offending criminals 55–6
recidivism 62, 63–4, 65
of successful films 180–1, 182–3, 183
superiority of algorithms 22
see also Clinical vs Statistical Prediction (Meehl); neural networks
predictive text 190–1
PredPol (PREDictive POLicing) 152–7, 158, 228–9n27
assessing locations at risk 153–4
cops on the dots 155–6
fall in crime 156
feedback loop 156–7
vs humans, test 153–4
target hardening 154–5
pregnancy prediction 29–30
prescriptive sentencing systems 53, 54
prioritization algorithms 8
prisons
cost of incarceration 61
Illinois 55, 56
reduction in population 61
privacy 170, 172
false sense of 47
issues 25
medical records 105–7
overriding of 107
sale of data 36–9
probabilistic inference 124, 127
probability 8
ProPublica 65–8, 70
quality 179, 180
‘good’
changing nature of 184
defining 184
quantifying 184–8
difficulty of 184
Washington Post experiment 185–6
racial groups
COMPAS algorithm 65–6
rates of arrest 68
radar 119–20
RAND Corporation 158
random forests technique 56–9
rape 141, 142
re-offending 54
prediction of 55–6
social types of inmates 55, 56
recidivism 56, 62, 201
rates 61
risk scores 63–4, 65
regulation of algorithms 173
rehabilitation 55
relationships 9
Republican voters 41
Rhode Island 61
Rio de Janeiro–Galeão International Airport 132
risk scores 63–4, 65
Robinson, Nicholas 49, 50, 50–1, 77
imprisonment 51
Rossmo, Kim 142–3
algorithm 145–7
assessment of 146
bomb factories 147
buffer zone 144
distance decay 144
flexibility of 146
stagnant water pools 146–7
Operation Lynx 145–7
Rotten Tomatoes website 181
Royal Free NHS Trust 222–3n48
contract with DeepMind 104–5
access to full medical histories 104–5
outrage at 104
Rubin’s vase 211n13
rule-based algorithms 10, 11, 85
Rutherford, Adam 110
Safari browser 47
Sainsbury’s 27
Salganik, Matthew 176–7, 178
Schmidt, Eric 28
School Sisters of Notre Dame 90, 91
Science magazine 15
Scunthorpe 2
search engines 14–15
experiment 14–15
Kadoodle 15–16
Semmelweis, Ignaz 81
sensitivity, principle of 87, 87–8
sensors 120
sentencing
algorithms for 62–4
COMPAS 63, 64
considerations for 62–3
consistency in 51
length of 62–3
influencing 73
Weber’s Law 74–5
mitigating factors in 53
prescriptive systems 53, 54
serial offenders 144, 145
serial rapists 141–2
Sesame Credit 45–6, 168
sexual attacks 141–2
shoplifters 170
shopping habits 28, 29, 31
similarity 187
Slash X (bar) 113, 114, 115
smallpox inoculation 81
Snowden, David 90–2
social proof 177–8, 179
Sorensen, Alan 178
Soviet Union
detection of enemy missiles 18
protecting air space 18
retaliatory action 19
specificity, principle of 87, 87–8
speech recognition algorithms 9
Spotify 176, 188
Spotify Discover 188
Sreenivasan, Sameet 181–2
Stammer, Neil 172
Stanford University 39–40
STAT website 100
statistics 143
computational 12
modern 107
NYPD 172
Stilgoe, Jack 128–9, 130
Strategic Subject List 158
subway crime see New York City subway crime
supermarkets 26–8
superstores 28–31
Supreme Court of Wisconsin 64, 217n38
swine flu 101–2
Talley, Steve 159, 162, 163–4, 171, 230n47
Target 28–31
analysing unusual data patterns 28–9
expectant mothers 28–9
algorithm 29, 30
coupons 29
justification of policy 30
teenage pregnancy incident 29–30
target hardening 154–5
teenage pregnancy 29–30
Tencent YouTu Lab algorithm 169
Tesco 26–8
Clubcard 26, 27
customers
buying behaviour 26–7
knowledge about 27
loyalty of 26
vouchers 27
online shopping 27–8
‘My Favourites’ feature 27–8
removal of revealing items 28
Tesla 134, 135
autopilot system 138
full autonomy 138
full self-driving hardware 138
Thiel, Peter 31
thinking, ways of 72
Timberlake, Justin 175–6
Timberlake, Justin (artist) 175–6
Tolstoy, Leo 194
TomTom sat-nav 13–14
Toyota 137, 210n13
chauffeur mode 139
guardian mode 139
trolley problem 125–6
true positives 67
Trump election campaign 41, 44
trust 17–18
tumours 90, 93–4
Twain, Mark 193
Twitter 36, 37, 40
filtering 10
Uber
driverless cars 135
human intervention 135
uberPOOL 10
United Kingdom (UK)
database of facial images 168
facial recognition algorithms 161
genetic tests for Huntington’s disease 110
United States of America (USA)
database of facial images 168
facial recognition algorithms 161
life insurance stipulations 109
linking of healthcare records 103
University of California 152
University of Cambridge
research on personality traits 39–40
and advertising 40–1
algorithm 40
personality predictions 40
and Twitter 40
University of Oregon 188–90
University of Texas M. D. Anderson Cancer Center 99–100
University of Washington 168
unmanned vehicles see driverless cars
URLs 37, 38
US National Academy of Sciences 171
Valenti, Jack 181
Vanilla (band) 178–9
The Verge 138
Volvo 128
Autonomous Emergency Braking system 139
Volvo XC90 139–40
voting 39–43
Walmart 171
Walt Disney 180
Warhol, Andy 185
Washington Post 185–6
Waterhouse, Heidi 35
Watson (IBM computer) 101, 106, 201
Bayes’ theorem 122
contesting Jeopardy 98–9
medical genius 99
diagnosis of leukaemia 100
eradication of cancer 99
grand promises 99
motor neurone disease 100
termination of contract 99–100
patterns in data 103
Watts, Duncan 176–7
Waymo 129–30
Waze 23
Weber’s Law 74–5
whistleblowers 42
Williams, Pharrell 192–3
Windows XP 105
Wired 134
World Fair (1939) 116
Xing.com 37
Zaghba, Youssef 172
Zilly, Paul 63–4, 65
Zuckerberg, Mark 2, 25
ZX Spectrum ix
About the Author
Hannah Fry is an Associate Professor in the mathematics of cities at University College London. In her day job she uses mathematical models to study patterns in human behaviour, and has worked with governments, police forces, health analysts and supermarkets. Her TED talks have amassed millions of views and she has fronted television documentaries for the BBC and PBS; she also hosts the long-running science podcast The Curious Cases of Rutherford & Fry with the BBC.
Also by Hannah Fry
The Mathematics of Love
(with Dr Thomas Oléron Evans)
The Indisputable Existence of Santa Claus: the Mathematics of Christmas
TRANSWORLD PUBLISHERS
61–63 Uxbridge Road, London W5 5SA
www.penguin.co.uk
Transworld is part of the Penguin Random House group of companies whose addresses can be found at global.penguinrandomhouse.com
First published in Great Britain in 2018 by Doubleday, an imprint of Transworld Publishers
Copyright © Hannah Fry Limited 2018
Cover design by Geoffrey Dahl
Hannah Fry has asserted her right under the Copyright, Designs and Patents Act 1988 to be identified as the author of this work.
Every effort has been made to obtain the necessary permissions with reference to copyright material, both illustrative and quoted. We apologize for any omissions in this respect and will be pleased to make the appropriate acknowledgements in any future edition.
A CIP catalogue record for this book is available from the British Library.
Version 1.0
Epub ISBN 9781473544710
ISBNs 9780857525246 (hb)
9780857525253 (tpb)
This ebook is copyright material and must not be copied, reproduced, transferred, distributed, leased, licensed or publicly performed or used in any way except as specifically permitted in writing by the publishers, as allowed under the terms and conditions under which it was purchased or as strictly permitted by applicable copyright law. Any unauthorized distribution or use of this text may be a direct infringement of the author’s and publisher’s rights and those responsible may be liable in law accordingly.
1 3 5 7 9 10 8 6 4 2
Power
fn1 This is paraphrased from a comment made by the computer scientist and machine-learning pioneer Andrew Ng in a talk he gave in 2015. See Tech Events, ‘GPU Technology Conference 2015 day 3: What’s Next in Deep Learning’, YouTube, 20 Nov. 2015, https://www.youtube.com/watch?v=qP9TOX8T-kI.
fn2 Simulating the brain of a worm is precisely the goal of the international science project OpenWorm. They’re hoping to artificially reproduce the network of 302 neurons found within the brain of the C. elegans worm. To put that into perspective, we humans have around 100,000,000,000 neurons. See OpenWorm website: http://openworm.org/.
fn3 Intriguingly, a rare exception to the superiority of algorithmic performance comes from a selection of studies conducted in the late 1950s and 1960s into the ‘diagnosis’ (their words, not mine) of homosexuality. In those examples, the human judgement made far better predictions, outperforming anything the algorithm could manage – suggesting there are some things so intrinsically human that data and mathematical formulae will always struggle to describe them.
Data
fn1 Adverts aren’t the only reason for cookies. They’re also used by websites to see if you’re logged in or not (to know if it’s safe to send through any sensitive information) and to see if you’re a returning visitor to a page (to trigger a price hike on an airline website, for instance, or email you a discount code on an online clothing store).
fn2 That plugin, ironically called ‘The Web of Trust’, set out all this information clearly in black and white as part of the terms and conditions.
fn3 That particular combination seems to imply that I’d post more stuff if I didn’t get so worried about how it’d go down.
Justice
fn1 Fun fact: ‘parole’ comes from the French parole, meaning ‘voice, spoken words’. It originated in its current form in the 1700s, when prisoners would be released if they gave their word that they would not return to crime: https://www.etymonline.com/word/parole.
fn2 An outcome like this can happen even if you’re not explicitly using gender as a factor within the algorithm. As long as the prediction is based on factors that correlate with one group more than another (like a defendant’s history of violent crime), this kind of unfairness can arise.
fn3 A ball at 10p would mean the bat was £1.10, making £1.20 in total.
Medicine
fn1 More on Bayes in the ‘Cars’ chapter.
fn2 You can’t actually tell if someone is Viking or not, as my good friend the geneticist Adam Rutherford has informed me at length. I mostly put this in to wind him up. To understand the actual science behind why, read his book A Brief History of Everyone Who Ever Lived: The Stories in Our Genes (London: Weidenfeld & Nicolson, 2016).
Cars
fn1 Watson, the IBM machine discussed in the ‘Medicine’ chapter, makes extensive use of so-called Bayesian inference. See https://www.ibm.com/developerworks/library/os-ind-watson/.
fn2 The eventual winner of the 2005 race, a team from Stanford University, was described rather neatly by the Stanford University mathematician Persi Diaconis: ‘Every bolt of that car was Bayesian.’
fn3 A number of different versions of the scenario have appeared across the press, from the New York Times to the Mail on Sunday: What if the pedestrian was a 90-year-old granny? What if it was a small child? What if the car contained a Nobel Prize winner? All have the same dilemma at the core.
fn4 There are things you can do to tackle the issues that arise from limited practice. For instance, since the Air France crash, there is now an emphasis on training new pilots to fly the plane when autopilot fails, and on prompting all pilots to regularly switch autopilot off to maintain their skills.