The One World Schoolhouse: Education Reimagined

by Salman Khan


  Other educators, it should be pointed out, share my skepticism regarding the quick but shallow adoption of new classroom technologies. Duke University professor Cathy N. Davidson has written that “if you change the technology but not the method of learning, then you are throwing good money after bad practice…. [The iPad] is not a classroom learning tool unless you restructure the classroom…. The metrics, the methods, the goals and the assessments all need to change.”13

  Let’s think a moment about those methods and those metrics. The dominant method in our traditional classrooms is still the broadcast lecture; one of the most cited metrics in our public debates is class size. But there’s a disconnect between those things. If a teacher’s main job is lecturing, what does it really matter how many students are in the room? Whatever the class size, how customized can instruction be when kids sit passively, taking notes, and the great majority of the teacher’s time and energy is devoted to lesson plans, grading papers, and paperwork?

  The promise of technology is to liberate teachers from those largely mechanical chores so that they have more time for human interactions. In many standard classrooms, teachers are so overburdened with mundane tasks that they are lucky to carve out 10 or 20 percent of class time to actually be with students—face-to-face, one-on-one, talking and listening. Imagine what could happen if that figure went to 90 or 100 percent of class time. The student-to-time-with-the-teacher ratio would improve by a factor of five or ten. And this is the metric we should care about.

  Does all this sound utopian? Purely theoretical? It’s neither. In actual fact, this liberated style of teaching is already being deployed in the real world. In the next part of our book, we will examine how this came to be and how it seems to be working.

  PART 3

  Into the Real World

  Theory versus Practice

  If complaining about the status quo is easy, theorizing about how things ought to be is not much harder. Academic papers pile up, advocating this or that approach—more grading, less grading; more testing, less testing. In education as in every other field, there are fads and fashions. Looking at it positively, these fads sometimes point the way to true innovation. But other times they prove to be overly generalized dead ends, costly in terms of both money and wasted time.

  As an example of this, consider the hypothesis that people have different “learning styles.” Around thirty years ago, it was proposed that some people are primarily “verbal learners” while others are mainly “visual learners.” On the face of it, this seemed a reasonable idea. Some people, after all, seem better with names than with faces, and vice versa. Confronted with a user’s manual for some new device, some people will read the text while others will go straight to the diagrams. Ergo, visual learning versus verbal learning. This seemingly commonsensical observation gained favor and thereby “created a thriving commercial market amongst researchers, educators, and the general public.”1 Separate exercises and even textbooks were devised for each purported learning style. Shiny new teacher’s guides were printed up and put on sale to willing school districts. As many as seventy-one different learning styles had been suggested.

  There were only two problems with the “learning styles” theory. The first was that it really didn’t hold water. In 2009, a report published in Psychological Science in the Public Interest reviewed the major studies that had suggested that people have different learning styles. The great majority of studies didn’t meet the minimum standards to be scientifically valid. The few that did seem valid—that rigorously examined whether instructing people in their purported learning style really improved their results—seemed to contradict the thesis. Teaching according to “learning styles” had no discernible effect.

  The second problem was that given the very laborious chores of designing research studies, compiling sufficient data, analyzing the data, and publishing the results, it took thirty years to find this out. Who knows how much money and time—both teachers’ and students’—was squandered during that three-decade experiment.

  While thirty years seems egregious, some significant time lag is probably unavoidable when it comes to testing new approaches, and at the very least this should make us cautious when a promising learning theory comes along—especially if it purports to be a universal theory. The human brain is so complex that we should never become dogmatic about a particular approach being the best way for everyone.

  In medicine, I can give a real pill of a certain drug to one group of patients and a sugar pill—the placebo—to another group. After a few months or years of this, I can then see whether the group taking the real pill had a statistically significant improvement in their health versus the placebo group. If this happened, I can generalize that the particular drug would be appropriate for patients like those in the test groups. What I can’t do is overgeneralize. I can’t posit that the same drug would necessarily work for different populations of patients, still less patients with different diseases.

  In fields like education, however, this tendency to overgeneralize is a constant danger.

  Say I want to figure out the best way to make educational materials, maybe science videos. My theory is that videos that show a dialog between a student and a professor will be more effective than just the lecturer alone. I get two sets of videos produced that cover the same topic—say Newton’s laws—in both styles. I then randomly assign students to watch either set of videos and give them an assessment. Say I find that the students who watched the dialog version perform significantly better, enough of a difference that it would be unlikely due to chance alone. I therefore publish a paper titled “Dialog More Effective Than Lectures When Teaching Science Through Videos.”
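
  (To make the design concrete, here is a minimal sketch, in Python, of the kind of randomized comparison just described. The assessment scores are invented purely for illustration, and scipy's two-sample t-test stands in for whatever significance test a real study would use.)

    from scipy import stats

    # Students were randomly assigned to watch either the dialog videos or the
    # lecture videos; these are invented assessment scores (0-100) for each group.
    dialog_scores = [78, 85, 91, 74, 88, 82, 79, 90, 86, 77,
                     84, 89, 81, 92, 75, 87, 83, 80, 88, 85]
    lecture_scores = [72, 70, 81, 68, 77, 74, 69, 80, 73, 71,
                      76, 79, 67, 75, 78, 72, 74, 70, 77, 73]

    # A two-sample t-test estimates how likely a gap this large is to arise from
    # chance alone; a small p-value is what "significantly better" means here.
    t_stat, p_value = stats.ttest_ind(dialog_scores, lecture_scores)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")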

  Now, would it be appropriate to make this generalization? Assuming the same professor was in both videos, maybe he in particular is more effective at dialog than lectures. Maybe another professor might have been better in the lecture style. Maybe the professor was uniformly mediocre in both, but the dialog videos had the benefit of a student with the knack for asking the right questions and summarizing the professor’s words. Maybe getting that student to make pure lectures would be even better because they would be unfettered by the professor. Perhaps the results would have been different had the topic been relativity or if the lecture videos didn’t show the professor’s face or if a different type of assessment were used.

  The point is that the only conclusion that can responsibly be made from this experiment is that the particular videos that happened to be made in the dialog style performed better than the particular videos made in the lecture style for that particular topic and according to that particular assessment. It says nothing about whether in general all science videos should be in the dialog style.

  Now, if you are properly skeptical of everything I am saying, a thought should be nagging you right now: Sal has been writing this entire book about ways to improve education, and now he is saying that it is irresponsible to make sweeping statements about the best way to educate. The difference is in how the arguments are made and how general the statements are. I am arguing for a particular set of practices that are already showing results with many students and can be tested and refined with many others; I’m not arguing for a generalized theory.

  I am not saying that “science” has proven that any self-paced videos and exercises coupled with any in-class projects will be better than any 300-person lecture. In fact, I think that statement is outright false. What I am saying is that although we are in the early stages of this adventure, we are seeing compelling evidence—both anecdotal and statistical—that particular types of practices with videos and software seem to be resonating with particular students and teachers. I really don’t know if it is the absolute best way to reach every student—frankly, there probably are students who might do better in the more passive Prussian model. What we want to do is to use the traction and data we have to continue to repeatedly refine and test our particular content and software and make it as effective as possible for as many people as possible.

  My personal philosophy is to do what makes sense and not try to confirm a dogmatic bias with pseudoscience. It is grounded in using data to iteratively refine an educational experience without attempting to make sweeping statements about how the unimaginably complex human mind always works. Use video-based lectures for certain contexts; use live dialogs, when possible, for others. Use projects when appropriate and traditional problem sets when appropriate. Focus both on what students need to prove to the world through assessments and on what students actually need to know in the real world. Focus on the pure and the thought-provoking as well as on the practical. Why restrict oneself to one or the other? The old answer was that there wasn’t enough time to do both. Thanks to technology, that excuse no longer applies. Nor does education need to be hostage to any dogmatic theory. We can now craft more particular and individual solutions than ever before, thanks to the availability of data from millions of students on a daily basis.

  This is not theory and this is not the future. It’s happening in the real world and it’s happening now.

  The Khan Academy Software

  Let’s do a quick rewind to 2004 to revisit how all this began.

  Back then I still had my day job at the hedge fund. The Khan Academy, as well as the YouTube videos that have come to be its most visible feature, was far off in the future. I was just a guy who did a little private tutoring by telephone.

  Right from the start, I was troubled, even shocked, to realize that most of my tutees—even though they were generally motivated and “successful” students—had only a very shaky grasp of core material, especially in math. There were many basic concepts that they sort of half understood. They might, for example, be able to describe what a prime number was (a whole number greater than 1 that is divisible only by itself and 1), but not explain how that concept related to the more general idea of least common multiples. In brief, the formulas were there, the rote stuff had been memorized, but the connections were missing. The intuitive leaps had not been made. Why not? Chances are that the material had been gone over too quickly and shallowly in class, with related concepts ghettoized by their artificial division into units. The bottom line was that kids didn’t really know math; they knew certain words and processes that described math.

  This half-understanding had consequences that showed up very quickly during the one-to-one tutoring sessions. In response to even the simplest questions, students tended to give very tentative answers—answers that sounded like guesses even when they weren’t. It seemed to me there were two reasons for this lack of assertiveness. The first was that because the students’ grasp of core material stopped short of true conceptual understanding, they were seldom quite sure exactly what was being asked or which conceptual tool should be used to solve the problem. To offer a rough analogy, it was as if they’d been taught, in two different lessons, how to use a hammer and how to use a screwdriver. Told to hammer, they could hammer. Told to put in a screw, they could use a screwdriver. But told to build a shelf, they’d be paralyzed even though it was just a combination of concepts that they should have learned.

  The second issue was simple confidence. The kids gave wishy-washy answers because they knew deep down that they were bluffing. This, of course, was not their fault; their previous education had been of the Swiss cheese sort and had left them teetering on an inadequate foundation.

  In terms of the live tutoring sessions, these deficiencies in core understanding became a big headache. Identifying and remediating each student’s particular gaps would have been hugely time-consuming, and would have left little time or energy to move on to more advanced concepts. The process, I imagine, would also have been painful and humiliating for the student. Okay, tell me what else you don’t know.

  So with the goal of creating a time-efficient way to help repair my tutees’ educational gaps, I wrote some very simple software to generate math problems. To be sure, this early software was pretty basic. All it did was spit out random problems on various topics such as adding and subtracting negative numbers or working with simple exponents. Students could work on as many of these problems as they needed to, until they felt they had a concept nailed. If they didn’t know how to do a particular problem, the software would show steps for coming to the right answers.
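
  (For readers who like to see such things spelled out, here is a toy sketch, in Python, of the sort of generator just described: random problems on adding and subtracting negative numbers, with the steps shown after a wrong answer. It is an illustration only, not the original software.)

    import random

    def make_problem():
        """Generate one random problem on adding or subtracting negative numbers."""
        a = random.randint(-10, 10)
        b = random.randint(-10, 10)
        if random.choice(["+", "-"]) == "+":
            steps = (f"Adding {b} to {a} moves you {abs(b)} step(s) "
                     f"{'right' if b >= 0 else 'left'} on the number line, giving {a + b}.")
            return f"{a} + ({b})", a + b, steps
        steps = f"Subtracting {b} is the same as adding {-b}, so {a} + ({-b}) = {a - b}."
        return f"{a} - ({b})", a - b, steps

    question, answer, steps = make_problem()
    reply = input(f"What is {question}? ")
    if reply.strip() == str(answer):
        print("Correct!")
    else:
        print(f"Not quite -- the answer is {answer}.")
        print(steps)  # show the steps for coming to the right answer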

  But the primitive problem-generating software still left a number of things unaddressed. My tutees could work as many exercises as they chose to, but I, the tutor, had no real information on the process. So I added a database that allowed me to track how many problems each student got right or wrong, how much time they spent, even the time of day when they were working. At first I thought of this as a mere convenience, an efficient way of keeping tabs. Only gradually did the full potential usefulness of this feedback system occur to me; by expanding and refining the feedback I could begin to understand not only what my students were learning but how they were learning. In terms of real-world results, this struck me as important.
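
  (Again purely as a sketch: the kind of attempt log just described, written here with Python's built-in sqlite3 module. The table and column names are invented for the example.)

    import sqlite3
    from datetime import datetime

    conn = sqlite3.connect("tutoring.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS attempts (
            student   TEXT,
            topic     TEXT,
            correct   INTEGER,  -- 1 for right, 0 for wrong
            seconds   REAL,     -- time spent on the problem
            attempted TEXT      -- timestamp, so time of day can be studied too
        )
    """)

    def record_attempt(student, topic, correct, seconds):
        conn.execute(
            "INSERT INTO attempts VALUES (?, ?, ?, ?, ?)",
            (student, topic, int(correct), seconds, datetime.now().isoformat()),
        )
        conn.commit()

    # Example: one student gets an exponents problem right in 42 seconds.
    record_attempt("student_a", "simple exponents", True, 42.0)

    # How many problems has each student gotten right and wrong so far?
    query = ("SELECT student, SUM(correct) AS num_right, "
             "COUNT(*) - SUM(correct) AS num_wrong FROM attempts GROUP BY student")
    for row in conn.execute(query):
        print(row)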

  For example, did students spend more time on problems they got right or on problems they got wrong? Did they grind their way through to solutions (by logical steps), or see answers in a flash (by pattern recognition)? Were the mistakes just carelessness or the result of an inability to complete a strand of connections? What happened when a student truly “got” a concept? Did this happen gradually by seeing a repetition of examples, or in a sudden Aha moment? What happened when students did a bunch of problems focusing on one concept rather than a mixed hodgepodge of problems focused on many types of concepts?

  Working with my small roster of tutees, I was fascinated by the variations in the data on these sorts of questions about the how of learning. As we shall see, this accumulated data would over time become a valuable resource for teachers, administrators, and educational researchers.

  In the meantime, however, I had more immediate difficulties to solve. As my number of students grew, I came closer and closer to hitting a wall that millions of teachers have hit before me when attempting to personalize instruction. How could I manage twenty or thirty students working on different subjects, at different grade levels, each at his or her own pace? How could I keep track of who needed what and who was ready to advance to more challenging material?

  Fortunately, this kind of information management is exactly what computers are good at. So the next step in the refinement of the software was to devise a hierarchy or web of concepts—the “knowledge map” we’ve already seen—so that the system itself could advise students what to work on next. Once they’d mastered the addition and subtraction of fractions, for example, they could move on to simple linear equations. Having the software hand out the “assignments” left me free to do the essentially human parts of the job—the actual mentoring and tutoring.
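
  (A bare-bones illustration of the idea, once more in Python: each concept lists its prerequisites, and the system suggests whatever a student is ready for but has not yet mastered. The concepts and links shown are invented, not the actual map.)

    # Each concept maps to the concepts that should be mastered first.
    prerequisites = {
        "basic addition": [],
        "basic subtraction": [],
        "addition and subtraction of fractions": ["basic addition", "basic subtraction"],
        "simple linear equations": ["addition and subtraction of fractions"],
    }

    def suggest_next(mastered):
        """Concepts not yet mastered whose prerequisites are all mastered."""
        return [concept for concept, prereqs in prerequisites.items()
                if concept not in mastered and all(p in mastered for p in prereqs)]

    print(suggest_next({"basic addition", "basic subtraction"}))
    # -> ['addition and subtraction of fractions']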

  But this raised an absolutely crucial question: How could I determine when a student was ready to advance? How would I define “mastery” of a given concept? This proved to be a philosophical as well as a practical question.

  One possibility was to use the traditional percentage of right answers that most exams defined as “passing.” But this just didn’t feel right. In a traditional classroom, you could pass with 70 percent—which meant there was almost one-third of the material that you didn’t know. I could arbitrarily raise my own passing grade to 80 or 85 or 90 percent, but this seemed rather lazy and beside the point. As we’ve seen, even a 95 percent grasp of basic concepts led to difficulties later on, so why settle for that?

  The issue, I eventually realized, came down not to some numerical target but to a much more human consideration: expectations. What level of application and understanding should we expect from our students? In turn, what sort of messages are we sending by way of our expectations and the standards they imply? My gut feeling was that in general the expectations of teachers and educators are far too low, and, further, that there is something condescending and contagious in this attitude. Kids come to doubt their own abilities when they sense that the bar is being set so low. Or they develop the corrosive and limiting belief that good enough is good enough.

  I eventually formed the conviction that my cousins—and all students—needed higher expectations to be placed on them. Eighty or 90 percent is okay, but I wanted them to work on things until they could get ten right answers in a row. That may sound radical or overidealistic or just too difficult, but I would argue that it was the only simple standard that was truly respectful of both the subject matter and the students. (We have refined the scoring details a good bit since then, but the basic philosophy hasn’t changed.) It’s demanding, yes. But it doesn’t set students up to fail; it sets them up to succeed—because they can keep trying until they reach this high standard.

  I happen to believe that every student, given the tools and the help that he or she needs, can reach this level of proficiency in basic math and science. I also believe it is a disservice to allow students to advance without this level of proficiency, because they’ll fall on their faces sometime later.

  With those core beliefs in place, I still had the practical question of how to cultivate and measure 100 percent proficiency. Typically, I had no grand theory about this; I just decided to try the heuristic of ten-in-a-row. My reasoning was that if students could correctly solve ten consecutive problems on a given subject, it was a good indication that they truly understood the underlying concept. Lucky guesses would fall short, as would mere “plug-ins.” Admittedly, ten was an arbitrary number of solutions to shoot for; I might have gone with eight or twelve or whatever, and different concepts probably require a different number. But insisting on a particular number of right answers gave students something to aim at. If they fell short, they could always go back and review. If they needed more problems to try, the software would create them.
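
  (The rule itself is simple enough to write down in a few lines, sketched here in Python: a streak counter that resets to zero on any wrong answer, with mastery declared only after ten consecutive correct answers.)

    STREAK_REQUIRED = 10  # the ten-in-a-row heuristic; the exact number is admittedly arbitrary

    def update_streak(streak, was_correct):
        """Return the new streak after one attempt: grow it, or reset it to zero."""
        return streak + 1 if was_correct else 0

    def is_mastered(streak):
        return streak >= STREAK_REQUIRED

    # Example: nine right answers, one slip, then ten in a row.
    streak = 0
    for was_correct in [True] * 9 + [False] + [True] * 10:
        streak = update_streak(streak, was_correct)
    print(is_mastered(streak))  # True -- only the final run of ten counts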

 
