The Crowd and the Cosmos: Adventures in the Zooniverse

by Lintott, Chris


  ing it to the rest of us). Mike Walmsley has just finished working

  on a specialized neural network that can find the faint structures

  around galaxies which indicate a past merger. Looking for these

  faint tails of stars is important if we want to understand how

  normal galaxies react to collisions—it’s sort of the opposite tech-

  nique to finding the bulgeless galaxies I discussed in Chapter 4.

  Bigger collisions (those with more massive galaxies) leave more

  debris, so at least in theory there’s also the chance of reconstruct-

  ing the crash that led to stars being scattered out of the main gal-

  axy itself, if only we can find them. The trouble is there are very

  few surveys where experts have done the painstaking work of

  sorting through the images themselves.

  Nonetheless the results, despite the handicap of a small train-

  ing set, are pretty good. The network is indeed capable of finding

  galaxies which show signs of a merger. It’s not perfect, matching

  expert classifications 80 per cent of the time, but that’s a huge

  advance on where we were before. In the old days of 2007 or so,

  we’d have set up a citizen science project to gather more training

  data and to try and improve this figure. A few years ago we might

  have looked at how to combine human and machine classifica-

  tion, like we did in the supernova project. But in this machine-

  optimistic scenario, another year or so’s work will break the back

  of the problem, and we can expect the robots to win before too

  long. If neural networks really can be adapted to deal with such

  Three PaThs 233

  small training sets, then we won’t need large numbers of classifi-

  cations from volunteers.

  Progress might come from more of Mike’s work, which uses a

  new kind of neural network introduced to us by a colleague in

  computer science, Yarin Gal. This network not only classifies

  things, but can tell us how certain it is about its classifications.

  It’s thus producing data which is of the same kind as that pro-

  duced, collectively, by Galaxy Zoo volunteers. By the time you

  read this, we’ll be running it alongside the main project, and

  incorporating its results into our decisions about galaxies.

  Another major area of research in machine learning is in

  finding unusual objects. Actually, that’s not quite true. Finding

  unusual objects—the images in the original Galaxy Zoo data set

  of nearly a million galaxies that look least like the others, for

  example—is not a hugely difficult problem. As I wrote earlier, the

  difficult bit is finding unusual objects which are actually interest-

  ing. It’s one thing to pick out the images where the camera mal-

  functioned, where a bright star overwhelmed the chip or where

  someone turned a light on by mistake, but quite another to find

  the peas and the Voorwerp among that pile of images which are

  occasionally visually interesting but mostly scientifically junk.

  Still, progress is being made. Techniques which use ‘clustering’—

  sorting similar images into piles—look promising. If you end up

  with many piles with a few images in, it’s not a huge amount of

  effort to decide which of these outliers are truly interesting.

  Future surveys might do this as a matter of course, with their

  professional astronomers presented with a few representative

  objects from each class for consideration.
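
  To make the idea of sorting images into piles concrete, here is a minimal sketch. It assumes each image has already been reduced to a pair of numbers (the feature values below are invented for illustration), and it uses a simple greedy grouping rule that I have picked for brevity, not whatever method a real survey would use. The names leader_cluster and small_piles are hypothetical.

```python
import math

def leader_cluster(points, radius):
    """Greedy 'leader' clustering: each point joins the first pile whose
    leader (its first member) lies within `radius`; otherwise it starts
    a new pile of its own."""
    piles = []
    for p in points:
        for pile in piles:
            if math.dist(p, pile[0]) <= radius:
                pile.append(p)
                break
        else:
            # No existing pile was close enough, so start a new one.
            piles.append([p])
    return piles

def small_piles(piles, max_size):
    """Return the piles with at most `max_size` members: the candidates
    worth a human look."""
    return [pile for pile in piles if len(pile) <= max_size]

# Invented two-number summaries of eight images (assumption, not real data).
features = [
    (0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.1, -0.1),  # one common type
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.1),               # another common type
    (9.0, 0.5),                                       # a lone oddity
]

piles = leader_cluster(features, radius=1.0)
outliers = small_piles(piles, max_size=1)
print(len(piles), outliers)
```

  The sketch only separates the rare from the common. Deciding whether the lone image is a Voorwerp or a camera fault is still left to whoever inspects the small piles.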

  Perhaps this focus on the unusual is in any case wrong-headed.

  If astrophysics is heading for a future where we produce truly

  enormous data sets then we might have no choice but, like

  Dr Strangelove, to stop worrying and learn to love the algorithm.

  Maybe we can get more insight from things that occur often than

  from the odd weird exception. Particle physicists at the LHC are,

  for the most part, already living in this future; as mentioned in

  Chapter 1, if some completely unexpected cascade of particles

  happens in this most massive and sophisticated of experiments,

  it will be discarded by a system looking for specific triggers. The

  LHC detectors simply couldn’t operate any other way without

  being completely overwhelmed by noise.

  Cosmologists, too, seeking to discover Type Ia supernovae so

  as to measure the effect of dark energy in the acceleration of the

  Universe’s expansion (see Chapter 6) may not mind if explosions

  that don’t fit the expected pattern are discarded. If you can find

  enough supernovae of the right type, you may even get better

  results by assembling a nice, well-behaved group rather than

  including anything odd. For predictable science, where we’re

  testing well-defined hypotheses—something that would fit well

  into the science fair I described in Chapter 1—trusting the

  machines and hoping we end up in this future might well be a

  sensible way to go.

  A second possible future is one in which, though machine

  learning continues to improve, we never really break free from

  the tyranny of the training set. The techniques that are driving

  the artificial intelligence revolution simply are, like an easily distracted student, dependent on being walked through example

  after example after example.

  There are some ways of dealing with this. Techniques like

  transfer learning, where a neural network or other solution is

  trained on one survey before most of its guts are used to con-

  struct a new network capable of dealing with a different data set,

  do make things easier. A network trained to recognize animals in

  the Serengeti will do pretty well when deployed on images of

  wildlife in the US; though the species are different, the layers of

  the network that identify the animal amid the background will

  be shared between the two problems.*

  For a project like the LSST survey, where there are a thousand

  different scientific investigations that all need access to the same, consistent data set and where rare objects matter, it’s less clear

  what the solution is. After all, finding unusual and unexpected

  objects is part of the reason we build telescopes like this; when-

  ever we’ve done something fundamentally new, in this case

  monitoring such a large area of sky this frequently with such a

  powerful instrument, we have found new things.

  And if LSST is going to challenge machine learning, then once

  the data from the radio astronomers’ new toy, the SKA, starts

  flooding in then we’ll really be in trouble. In this scenario, the

  problems faced (and caused) by scientists in general and by

  astronomers in particular are odd enough that whatever Silicon

  Valley gets up to we’ll need help ourselves.† This means that well

  into the next decade, we’ll need plenty of classifications from

  humans and their expert pattern recognition systems. Indeed,

  looking at what’s coming, the existing effort across all Zooniverse

  projects won’t be enough to cope.

  We need to get smarter if, in this reality, we’re going to pre-

  serve a space for citizen science. Probably the easiest way to do

  this is to recruit more volunteers to help. (Despite this being a

  vision of the future I’m making up, let’s assume that even in this

  universe it’s not the case that millions of people have read this far

  * This sort of work is being led for the Zooniverse by Lucy Fortson’s group at the University of Minnesota.

  † This isn’t completely unrealistic; there aren’t too many cases where the most important things are the rarest objects, or where such precisely accurate classifications are required. If Facebook identifies the wrong friend in a photo, it’s at worst slightly embarrassing, and is unlikely to lead you to predict the wrong future for the Universe.

  so as to be inspired to rush to the keyboard and contribute). I’m

  sure there’s more we could do,* but to really tackle the bulk of

  LSST let alone SKA data we’ll need an enormous increase in the

  amount of effort available.

  The answer may be staring us in the face. If human beings are

  game-playing creatures, then maybe we should build games

  rather than citizen science projects. Indeed, the first moves in

  this direction have already been made. Eyewire is a project run by

  researchers at MIT, who want volunteers to help map the com-

  plex structure of neurons in the brain. Volunteers see slices of the

  complex tangle of cells and are asked to separate the structures

  visible in the images from the background; additional help and

  complication is provided by the fact that these are in fact three-

  dimensional objects. It sounds complicated, but the team have

  provided an engaging and interesting interface that has attracted

  tens of thousands of volunteers to help, producing results that, in

  a preliminary study, were impressively accurate.

  Eyewire participants also chat to each other, and to helpful

  chatbots which offer advice, in real time while they’re classifying.

  It’s a much less isolated experience than our Zooniverse projects,

  where the act of classification is performed in sacred solitude so

  as to prevent groupthink (as we’ve seen, discussion and collabo-

  ration through our forums happens after the initial classification

  is recorded). Eyewire volunteers also score points for their par-

  ticipation, and an ever-growing set of challenges and competi-

  tions aims to make the game more engaging, and to bring

  classifiers back for more.

  A recent email newsletter sent to me and the worldwide net-

  work of my fellow Eyewire volunteers gives you the idea. During

  * Have you considered buying a copy of this book for a friend? Or three? Or for everyone you know?

  the summer of 2018, alongside the real thing in Russia there was

  an Eyewire World Cup. Participants representing a country had

  their effort counted towards their team’s total, and could win

  ‘buckets of points, six new badges and speciality swag [they’d]

  only be able to get if [they] participate’.

  These are the techniques of modern software development

  and game design, being used here to drive people towards taking

  part in a scientific project. I’m an enthusiastic participant in the

  project, so please don’t think that I consider the idea of point col-

  lecting and competitions beneath me. The reality is quite the

  opposite; their techniques work especially well on me!*

  Others have gone further, and made a game of the science

  itself. Probably the best known of these projects is an old one,

  predating even Galaxy Zoo. Fold.it asked volunteers to investi-

  gate the three-dimensional structures of proteins. In many cases,

  we know the basic chemistry of these important biological mol-

  ecules in the sense of being able to write down what connects to

  what. However, secondary effects as the atoms bond together

  will cause the protein to twist and buckle in a way that is cur-

  rently very hard to predict; it’s impossible to calculate, and any

  automated search for a likely solution runs the risk of getting

  stuck in a local minimum, a possible solution that looks plausi-

  ble (technically, it’s likely better than any solution that is similar to it) but which has not been tested sufficiently to find out

  whether it is overall the best.
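
  The trap of a local minimum can be shown with a toy example. This is only a sketch under loud assumptions: the 'energy landscape' is a made-up list of eleven numbers, not a protein energy function, and the search rule (always step to the lower-energy neighbour) is the simplest automated search I could write down, not the method used by Fold.it or by any real folding code. The name greedy_descent is hypothetical.

```python
# Made-up energies at positions 0..10 (an assumption for illustration).
# There is a shallow dip at position 2 and a deeper one at position 8.
ENERGIES = [5, 3, 2, 3, 5, 4, 2, 0, -3, 0, 4]

def greedy_descent(energies, start):
    """Step to the lowest-energy neighbour until neither neighbour
    is lower than the current position, then stop."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(energies)]
        best = min(neighbours, key=lambda n: energies[n])
        if energies[best] >= energies[x]:
            return x  # no downhill step left: a minimum, perhaps only a local one
        x = best

stuck = greedy_descent(ENERGIES, start=0)   # starts on the left-hand slope
lucky = greedy_descent(ENERGIES, start=5)   # starts on the right-hand slope
overall_best = min(range(len(ENERGIES)), key=lambda n: ENERGIES[n])
print(stuck, lucky, overall_best)
```

  Started on the left, the search settles in the shallow dip, which is better than either of its neighbours but not the best overall. Only a search that begins on the other side of the ridge finds the deeper dip.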

  Exploring a vast range of possibilities to find a good solution

  to a problem like this is another type of task that humans have

  evolved to be good at, just like the more basic pattern recognition

  * I am, in fact, a sucker for this sort of thing. I have an enormous pile of coffee shop loyalty cards from places I will never again visit, and have used the Foursquare app to check in everywhere I’ve been since 2011.

  that we in Zooniverse have been using all this time. Once a struc-

  ture is proposed, it is easy to calculate its energy, based on the

  interactions between the various components. The game is to

  look for the lowest energy structure, as we trust nature to have

  found a way to fold proteins efficiently. All this effort is import-

  ant because it is the three-dimensional shape of a protein that

  determines how it interacts with other molecules, particularly in

  the complex and not fully understood dance that is molecular

  biochemistry.

  The results from Fold.it have been great, with players often

  outperforming the best computer science efforts at attacking the

  same problem in large competitions and challenges designed to

  test protein-folding methods. Sometimes the best players turn

  out to be those with some sort of relevant expertise, but more

  often the game finds people who turn out to have an instinct for

  how to play. Because the ‘rules’—things like the angle at which

  hydrogen atoms can be placed—are encoded in the game itself,

  Fold.it players don’t need to know any chemistry at all.

  It’s a neat solution, and the game is actually quite fun to play,

  even if I can’t get past the first few levels. I’ve never been patient with puzzles, but it seems I’m not that typical. A few years ago,

  when I visited the Fold.it team at the University of Washington,

  they told me that at any one time a few people are deep enough

  into the game that they’re providing real and useful results, while

  most players are still learning. If the number of useful players

  drops too far, the team will run competitions or advertise to

  encourage a new cohort of Fold.it players to work their way

  deeper into the system. The entire structure of the game is a con-

  veyor belt designed to carry the best players onward to the point

  where they’re working on scientifically useful data.

  It would be possible to play Fold.it without realizing it had a

  scientific purpose at all, though I doubt anyone does so. Other

  teams have gone even further, disguising citizen science projects

  within existing games. Probably the most ambitious example is a

  Swiss project that created a mission within the science fiction-

  themed online multiplayer game, Eve Online. Players of the game

  can choose to review data from Kepler in the hope of finding a planet, but also in order to receive rewards in the form of the

  game’s internal, online currency. The experience is noticeably a

  bit odd, but in essentials indistinguishable from the experience

  of completing one of the other missions within the game world

  itself.

  With millions of people taking part in such games, here, per-

  haps, is the crowd we need in order to cope with the data sets of the future. In this imagined future, projects like those hosted on the

  Zooniverse will become both more ubiquitous and almost com-

  pletely invisible. In fact, the more invisible they become the better, as the more seamlessly they can be integrated into the games

  we’re playing anyway the more people will take part. Instead of

  having to make the choice to participate in science, something

  which many people find intimidating, it will just happen.

  Will this work? Maybe. Half a million people took part in the

  Eve Online planet hunting experiment, though I haven’t seen any

  discoveries come from it yet. That’s not too surprising, as these

  things take time, but it will be the acid test of whether the project has succeeded. (A similar effort, which involved more than

  300,000 players in the task of labelling features in high-resolution

  images of cells, has recently produced a paper which shows that

  the technique works, at least in this one case.) Even our modest

  experiments with gamification in the original Old Weather

  project (described in Chapter 4) seemed to work well. All we did

  was give people a rank when they started transcribing records

  from a ship, and yet it seems to have encouraged some people to

  work very hard indeed. One ‘ship’ in the project was, I’m pretty

  sure, a building—a training facility given, as is normal in naval

  tradition, a ship’s name. Despite the fact that it didn’t go any-

  where, people dutifully worked their way through the log book.

  (I haven’t followed this up, because the implications of being able

  to inspire people to work their way through the log of a building

  pretending to be a ship scare me a little.) With the help of games

  designers, maybe we can hide enough tasks that citizen science

  even at the scale needed for these big surveys will become pos-

  sible, and all without anyone knowing they are participating.

  This second future reality is efficient, and science gets done,

  but I’m not sure I like it. Actually, I’m certain that I don’t. I’ve

 
