here. They’re an exceptionally thoughtful and careful bunch, and
Phil Marshall in particular—one of the three leading scientists
alongside Aprajita Verma and Anupreeta More—is one of the
nicest people you could ever hope to meet. As a result, the idea of
labelling volunteers, in all their human complexity, with a rating
derived from nothing more than a few clicks on a website was
anathema to Phil. In all the team’s papers, therefore, they set up a
system where they represent each volunteer by an ‘agent’. An
agent is a representation of the volunteer, but necessarily an
imperfect one, as the agent knows only about the volunteer’s
behaviour within the project. We can then label the agent, know-
ing they are a poor reflection of their human counterpart. I’m
less fastidious, and am happy to trust that you know I’m not
really reducing people in all their glorious complexity to their
performance in one project.
This sort of analysis is useful for checking on the progress of
the project; looking at the distribution of skill one sees that the
average volunteer is pretty good. Both the highly skilled and the more confused are found among those who contribute only a few classifications, but those who go on to contribute tens of thousands are all highly skilled. This data alone doesn’t tell you whether people
are learning as they go, so that their skill inevitably improves
over time, or whether those who are struggling are simply giving
up, but it does show that we’re not wasting people’s time.
The real power comes when we move beyond this simple,
single value. The SpaceWarps model sets up what’s known as a
confusion matrix for each volunteer, keeping track of four key
numbers. For each contributor, we estimate first the probability
that they will say that there is a lens when there is indeed one
there; second, the probability that they will say there is a lens
when there isn’t; third, the probability that they’ll say there is
nothing there when there isn’t; and finally (deep breath) the
probability that they’ll say there’s nothing there when there is
indeed a lens.
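
In code, the bookkeeping is simple. What follows is a minimal sketch in Python, not the SpaceWarps team’s actual software; the class and function names, and the choice to start each count at one, are my own assumptions. It shows how an agent’s four probabilities can be estimated from the simulated images whose answers we already know, and how a single classification then nudges our belief about a real image.

class Agent:
    """Tracks a volunteer's answers on simulated images whose truth is known."""

    def __init__(self):
        # counts[truth][answer], with truth and answer each 'lens' or 'not'.
        # Starting every count at 1 treats a brand-new agent as no better
        # than a coin flip until evidence accumulates.
        self.counts = {"lens": {"lens": 1, "not": 1},
                       "not": {"lens": 1, "not": 1}}

    def record(self, truth, answer):
        """Update the tallies after a training image with a known answer."""
        self.counts[truth][answer] += 1

    def p_answer_given_truth(self, answer, truth):
        """One of the four key numbers: the estimated probability that this
        volunteer gives `answer` when the image is truly `truth`."""
        return self.counts[truth][answer] / sum(self.counts[truth].values())


def updated_lens_probability(prior, agent, answer):
    """Bayes' rule: combine the prior probability that an image contains a
    lens with one volunteer's answer, weighted by their reliability."""
    p_if_lens = agent.p_answer_given_truth(answer, "lens")
    p_if_not = agent.p_answer_given_truth(answer, "not")
    return (p_if_lens * prior) / (p_if_lens * prior + p_if_not * (1 - prior))

A volunteer who has spotted nine out of ten simulated lenses will pull the probability of a real image upwards when they click on it; a volunteer who guesses at random will leave it exactly where it was.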
Armed with this information, we can find ways to get more
knowledge out of the system. There are four kinds of volunteer
to consider. There are those who are always, or nearly always,
right; the SpaceWarps team called these ‘astute’ volunteers,
and they are very welcome in any project. There are also those
who are always wrong, who miss lenses when they’re there
and who see them when they’re not. These people are just as
useful—someone who is wrong all the time provides just as much
information as someone who is right all the time, as long as you
know that they’re wrong.* So because we’re able to use the simulations to measure how people are doing, we can increase the amount of useful information we can get from the project.
* You may find this a useful strategy for life in general.
There are two more categories of people. There are optimists,
who see a lens where there isn’t one but are reliable when they say
there’s nothing there, and pessimists, who miss lenses but are
accurate when they do identify one. Once we’ve spotted someone’s
proclivities, we can work out how seriously to take their opinions,
but we can also start to play games with who gets to see what.
Before we throw away an image, confident that there’s nothing there, perhaps we should make sure to show it to an optimist,
just in case. If we think we’ve found a lens, then we should show it
to a pessimist—if even they reckon there’s something there, then
our confidence should grow sky high that we’re on to something.
Playing with task assignment in this way promises much more effi-
cient classification, and more science produced more quickly.
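
To make the routing idea concrete, here is a continuation of the earlier sketch. The function names, the 0.8 skill threshold, and the rule for picking a reviewer are my own illustrative assumptions, chosen to mirror the logic just described rather than the scheme SpaceWarps actually runs.

def categorise(agent, threshold=0.8):
    """Sort an agent into the kinds of volunteer described above.
    The 0.8 threshold is an illustrative choice."""
    hit = agent.p_answer_given_truth("lens", "lens")   # right when a lens is there
    reject = agent.p_answer_given_truth("not", "not")  # right when nothing is there
    if hit >= threshold and reject >= threshold:
        return "astute"
    if hit < threshold and reject < threshold:
        return "reliably wrong"   # just as informative, once you know it
    # Optimists see lenses everywhere but can be trusted when they say 'nothing';
    # pessimists miss lenses but can be trusted when they do report one.
    return "optimist" if reject < threshold else "pessimist"


def choose_reviewer(image_probability, agents):
    """Send likely-empty images to an optimist before retiring them, and
    likely lenses to a pessimist before promoting them."""
    wanted = "optimist" if image_probability < 0.5 else "pessimist"
    suitable = [a for a in agents if categorise(a) == wanted]
    return suitable[0] if suitable else agents[0]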
The only trouble is that this gets complicated fast. With tens of
thousands of people participating in even a small project, and
hundreds of thousands of images to view, the number of possible
solutions is unbelievably large. Even when we consider that our
choice is restricted by the fact that not everyone is online at the
same time, complex mathematics is required to work out what a
sensible path is. Work by Edwin Simpson of Oxford University’s
Department of Engineering showed quickly that clever task assign-
ment could produce results of the same accuracy with nearly one-
tenth of the classifications, an enormous acceleration and one that
is especially welcome when looking for the rarest of objects.
SpaceWarps is among the most sophisticated Zooniverse projects in how it treats its data, and in offering a faster route to science it seemed to provide a template we could apply to all of the other fields we’re working on. Plenty of work on this sort of
task assignment has been done by researchers in a field of com-
puter science known as human–computer interaction, typically
using Amazon’s Mechanical Turk system to connect researchers
with those who will complete tasks for small payments.
Yet things aren’t so easy with citizen scientists who are them-
selves volunteers, and a simple experiment with a project we ran
called Snapshot Serengeti shows why. Whenever I lecture on the
Zooniverse, one of the most common questions is whether we
really need humans given all the progress in machine learning. I’ve
hopefully dealt with this already, but the disease seems especially
acute around projects like this one, which uses motion-sensitive cameras to monitor wildlife in the Serengeti National Park.
The images the cameras produce are wonderful, beautiful, and
varied. Some would easily grace the cover of National Geographic,
while others are more quirky. The team’s favourite comes from a
camera programmed to take three photos in quick succession
once triggered. The first of this particular sequence shows a
hyena staring at the camera as the flash goes off. The second
shows the same hyena skulking innocently in the background,
but the third shows some sharp canines and the inside of the
hyena’s mouth. Apparently getting chewed by the local wildlife
was a common end for the project’s cameras (not a problem
Penguin Watch faced in Antarctica), and elephants using camera
stands as scratching posts didn’t help either.
Despite the immense variety in what the project’s cameras capture, there is something about the task of identifying animals in images that seems to convince people they can quickly write a script or produce an off-the-shelf machine-learning solution that will solve the problem. It turns out it’s harder than it looks. While we’ll share our data with anyone who wants it, no one has yet come up with a completely robust solution. I have a soft spot for the attempts of a team we worked with at the Fraunhofer Institute in Munich (home to the
inventors of the MP3, the format which encodes music on your
phone and other digital devices) who developed an especial dis-
like of ostriches, which thanks to their bendy necks and bandy
legs turn out to be able to twist into a computer-defying set of
shapes.
Nonetheless, some tasks are definitely easier than others.
Wildebeest are common enough to trigger complaints from
regular classifiers, and so building up a suitable training set for
them will be easier than, for example, doing so for the small,
skunk-like zorillas which appear in one in every three million
images (Figure 22). Easiest of all, though, is to identify the images
with precisely zero animals in them at all.* Almost three-quarters
of the data consisted of such images; either a camera would mal-
function, and take image after image of nothing until its memory
card was used up, or waving grass would do a good enough
impression of a passing lion that it too would be captured.
Figure 22 A rare image of a zorilla as captured by the Snapshot Serengeti cameras.
* Notice I do not, as I would have done once upon a time, call these ‘blank’ images. I was cured of that when speaking to a room full of plant scientists. Pointing at an image of a tree and grassland, I confidently told them there was ‘nothing there’ and saw the audience rise up as one. Apparently they call it plant blindness.
We know that volunteers care about getting science done, and
we hate wasting their time, so removing these animal-free images
was an obvious thing to do. What happened next was surprising.
As volunteers saw more and more images with animals in, the
total number of classifications the project received dropped.
People might like contributing to science, but in trying to make
it faster for them to do so we’d done something that made the
experience less pleasing, and we weren’t quite sure what it was.
One theory suggested that there was a total amount of work that
people would be willing to invest in the project. It’s faster and easier to say that there is nothing in an image than it is to distinguish a
Thomson’s from a Grant’s gazelle, and so maybe by giving them more to do we were using up people’s effort faster. I don’t think that’s the right explanation; we know that, all else being equal, encountering an animal in Snapshot Serengeti made people more, not less, likely to keep classifying, and so it seems to me that we would have been at least as likely to encourage as to discourage people from classifying.
Instead, I think we’d changed how exciting the project seemed
to people. Whereas before they’d seen nothing, then nothing,
then nothing again, nothing again, and then suddenly a zebra,
now they endured the apparent tedium of zebra followed by
zebra followed by wildebeest followed by yet another zebra.
While almost all the research on how to assign tasks for effi-
ciency uses paid subjects, who can be assumed to stay put regard-
less, our volunteers are free to walk away at any point. By trying
to make things better, we made their experience worse. The
choice between getting more science done and providing ‘fun’
online is stark, even with such a simple experiment.
This, of course, won’t be a surprise to any game designers who
are reading. Since the first computer games bleeped their way
into our collective consciousness in the 1970s and 1980s, players
have been participating in an enormous collective experiment to
find what will keep us clicking. While almost all games pay atten-
tion to this, it’s most obvious in simple phone games that occupy
so many commutes, most of which are optimized to produce
just the right level of micro-excitement to keep us clicking. I
don’t mean to sound snobbish about this, not least because I’m
currently about 500 levels into something called Two Dots. We, as
humans, are just wired to respond in this way, and we behave as
if we’re playing games even when it’s not deliberate. In the early
days of Galaxy Zoo, lots of people told us that their experience of
classifying galaxies was like eating crisps; you don’t mean to have
just one more, but you do, again and again and again, rewarded
with the next image each time you click on a galaxy.
The implications of this seem obvious. For all that Zooniverse
projects are scientific projects, they are also experienced as
games. We could, perhaps, make them much more popular by
manipulating the data so that a suitable fraction of animal-free
images were served without worrying about whether such classi-
fications were useful. We might take the most spectacular images
and make them appear more frequently, even if we already know
what they show, making further classification redundant. If
manipulating the data this way makes for more classifications
and hence more science overall, then perhaps there’s no harm.
Plenty of projects have taken this route, and walked much fur-
ther along it than we have. It feels like an obvious choice. If people like playing games, and are willing to contribute their time to do
science, then a game that lets you contribute to science feels like
the best of both worlds. But this feels like a step too far for me.
Our participants take part because they want to contribute to
science; it feels wrong to feed them images that we don’t need
help with. This kind of dilemma will only become more acute
once machines start picking up more of the slack, and we start
deciding what is really worth sending to classifiers.
At the end of the book, I want to use these ideas to talk about
where citizen science is going. First, though, I need to tell you
about what has clearly become the real strength of public
participation in Zooniverse projects—the ability to find the truly
unexpected, and to uncover stories of objects which would
otherwise remain forever hidden from view.
7
SERENDIPITY
In SpaceWarps and other projects, it’s clear that people, unlike machines, cope well with the unexpected. The example of the
red-lensed galaxy shows that nicely, but it’s a risky argument. As
training sets become larger, it’s going to become harder to sur-
prise a machine, and so taking this to its logical conclusion one’s
left with a vision of the citizen scientists of the next decade being chased from task to task as machines improve. Hunting the rarest of objects might still be a useful occupation, but the oppor-
tunities to make a real contribution will become scarce. Given all
the good that comes from projects that offer everyone a chance
to help science, that would be a shame.
It’s premature, I believe, to declare Zooniverse-style citizen
science a passing phase. There is a more interesting future in
store—one in which the line between the work done by ama-
teurs and professionals, and between the amateurs and the pro-
fessionals themselves, blurs still further. Evidence for this future
is found in stories from many projects, in discussions that spring
up around the unusual and the unexpected. I could give many
examples, but let me tell you about two that I was personally
close to. They’re stories of old-fashioned science, in which pro-
fessional and citizen astronomers used a bucketload of ingenuity
to work out the solutions to new mysteries. These stories involve
groups of people from a variety of backgrounds and with myriad
life experiences.
The first story dates back to the crazy first year of running
Galaxy Zoo. The forum had quickly become a busy place, with
posts about anything from astrophysical techniques to tea, but
among the creativity of that community there was plenty of chat
about what people were seeing on the site. A Dutch school-
teacher named Hanny van Arkel was the first to point to a blue
blob that appeared near an otherwise unremarkable galaxy in
one of the images.
The galaxy had a catalogue number—IC2497—and Hanny
named the blob the ‘Voorwerp’ (Plate 11). When, a little later, the
Galaxy Zoo team found her posts, I think we all assumed that it
was a Dutch technical term. It turns out to mean ‘object’, or
‘thingy’, but ‘Hanny’s Voorwerp’ is now the official name of the
blob, endorsed by several major journals. To be honest, at the
start the most interesting thing about the Voorwerp was prob-
ably the amusing story of the name, but Hanny wanted to know
what it was.
If I’d come across the blob while sorting through images
myself, I think I’d have ignored it, placing it to one side while getting on with more straightforward tasks, if I’d even noticed it at all. Yet Hanny, the citizen scientist, was captivated by the discovery and pressed us to find out more. It was an early lesson that experts aren’t always right; not only do highly trained
professionals occasionally make silly mistakes, they also can’t
always be trusted to focus on what is truly interesting.
There are plenty of examples scattered across the scientific lit-
erature. Take the work of a group led by Trafton Drew at Harvard
Medical School, for example, presented in the journal Psychological Science with an arresting title straight from a horror film: ‘The
invisible gorilla strikes again’. (Astronomers need to have more