If the MyLifeBits project is any indicator, the answer is: quite a ways.
MyLifeBits is an ongoing project involving Gordon Bell and others at Microsoft Research. Inspired in part by Vannevar Bush’s 1945 article, “As We May Think,” which described an electronic memory extender that Bush called the memex, the MyLifeBits team set out to put as much of Bell’s life in digital form as possible. They focused initially on the simple task of digitizing his legacy materials, such as his past and current writings, photos, and CD collection.
Bell, for those unfamiliar with him, is one of the grand old men of computing. He joined the then-new and now vanished Digital Equipment Corporation (DEC) in 1960, worked on many projects there, including the early multiprocessor PDP-6 system, and was the father of DEC’s very influential and highly successful VAX minicomputer architecture. He worked at several multiprocessor-computer companies, co-founded The Computer Museum, and ultimately joined Microsoft Research, where he still works today. (Disclosure: One company that Bell co-founded, Encore Computer Corp., purchased a company I co-owned. I met Bell but never really worked with him; my loss.)
When Bell and the MyLifeBits team began the project somewhere around 2000, they estimated that a terabyte of storage would be enough to hold the readings and writings of a typical 80-year human life. At the time, a terabyte was an imposing quantity of storage, though disk price and capacity trends were making it an increasingly approachable figure even for those without IT departments.
Even now, if you don’t follow disk storage, a terabyte may sound like a large and expensive amount of storage, but it’s not; many of us could afford it, and companies routinely fill many times that much online space. You can buy 300GB disks for a little over a hundred bucks, so hitting a raw terabyte of capacity will set you back less than $500. With that much storage, you’d almost certainly want some integrated redundancy, but the chassis and controllers necessary to provide those features aren’t expensive; vendors sell pre-packaged terabyte storage devices in the $1.2K to $2K range. So, though it would obviously be a luxury, the storage itself is not an obstacle. (The PC sitting at the end of my desk, one I built and plan to make my primary system, has a terabyte of useful space in a disk array with built-in redundancy.)
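For readers who like to check the arithmetic, here is a minimal sketch of the raw-terabyte estimate. The $110 price is my assumption standing in for "a little over a hundred bucks," and the function names are purely illustrative; plug in whatever figures you find at your own retailer.

```python
import math


def disks_needed(target_gb, disk_gb):
    """Return the number of whole disks required to reach target_gb."""
    return math.ceil(target_gb / disk_gb)


def raw_cost(target_gb, disk_gb, price_per_disk):
    """Return the cost of the disks alone (no chassis, controller, or redundancy)."""
    return disks_needed(target_gb, disk_gb) * price_per_disk


if __name__ == "__main__":
    # Assumed figures: 1,000GB target, 300GB disks, $110 per disk.
    count = disks_needed(1000, 300)
    cost = raw_cost(1000, 300, 110)
    print(f"{count} disks, ${cost} for raw capacity")
```

You can't buy a third of a disk, so the count rounds up to four drives, which lands comfortably under the $500 mark at the assumed price.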
The future of storage, of course, looks even more promising, and the improvements are, like those in all the key technologies I cited, arriving with exponentially increasing speed. In the short gap between the first and second complete drafts of this column, Seagate announced a 500GB disk drive that uses perpendicular recording, a technique that’s been heading toward mass commercialization for some time. A Seagate spokesperson said that we can expect that capacity to grow by a factor of five over the next three to five years.
Meanwhile, memory vendors are exploring a variety of new technologies in their quest for ever faster, larger, and less volatile non-moving storage. Nantero, a start-up company targeting the potentially huge market for non-volatile RAM (NRAM), recently announced that it had created the basis of a 10G-bit NRAM storage array. This development is nowhere near a product, and there’s no guarantee that their particular approach will work, but it’s a good bet that some company will be creating multigigabyte NRAM modules within a decade.
The MyLifeBits team was definitely right to conclude that storage will not be a problem.
The challenge became filling that storage. Scanning all the paper materials and capturing all the already digital content, such as songs on CDs and existing digital photos, was a matter of labor and software. Bell had a decided advantage over most of us in these efforts, because he was working with a technical team that provided the labor, but any of us with either the time and expertise to do it ourselves or the money to pay someone else to handle it could take the same steps.
While putting existing content online, they also started capturing the content Bell was using and generating in his daily life. Photos were easy; Bell used digital cameras. Saving instant messages is no harder than turning on some straightforward software logging tools. You can record phone calls with digital recorders or, even better, by using VoIP (Voice over Internet Protocol) phone software on your PC and capturing the calls directly.
Bell and his team did it, and many of us could do it, too, were we so inclined.
Put all that together, and you end up with a large surrogate memory, one that Bell has said he’s come to depend on rather heavily.
As you might expect, the MyLifeBits team quickly realized that they could reasonably and inexpensively assemble a great deal more than a terabyte of storage, that demanding more storage, even a great deal more, was no problem, and that they had many more ideas for filling that electronic space. Thus, the MyLifeBits project evolved its mission to become one in which they would capture everything possible, not just existing paper, photo, and video content.
Today, they’re dealing with an ever-expanding realm of digital materials. Bell is wearing a small camera under his hat and recording conversations and meetings as they happen. He’s using a GPS that’s constantly tracking and recording his location. A BodyBugg armband full of sensors is monitoring and capturing data about his body, including the number of calories he’s burning throughout the day. Logging software working with the graphical user interface on his system is recording everything he’s doing on the computers he uses.
Most of this is manageable for the rest of us. Even recording video full-time is possible, albeit with huge storage requirements, because a decent digital video camera costs less than a grand. Want to log the TV shows you watch? Set up a media center PC and record them. Movies you see at theaters are harder to copy, but by waiting until their DVD versions hit the stores or they appear on cable TV you can get the data, admittedly with a time delay (and possibly in violation of copyright laws via software the entertainment industry would prefer you not use).
The constantly accelerating technology trends make data capture ever simpler, of course, with cameras shrinking in size and improving in quality, more and more content becoming instantly available digitally, and so on.
The more data you store, of course, the larger an information organization problem you face. The MyLifeBits team ran into this issue big-time.
They began by using a simple PC file system and some file naming conventions, but over time this approach was not, as you might imagine, up to the task. They simply had too much data.
Today, they use a software system built around a SQL Server database. They store both the raw content (text, pictures, email, whatever) and some associated attributes and comments (“metadata” in geek-speak). They also store links among the items in the database. For example, a photo of a meeting might link to the transcript of that meeting and to entries for all the participants. The combination of the database’s searching power, decent metadata, and links between data items makes the stored information quite powerful, though still not as easy to access as they’d like. The more metadata and the more associations between items they can get Bell to make, the more useful the data becomes.
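To make the items-plus-links idea concrete, here is a minimal sketch of how such a store might look. It is not the MyLifeBits schema, which I haven't seen; it uses SQLite from Python's standard library as a stand-in for SQL Server, and every table, column, and function name in it is my own invention for illustration.

```python
import sqlite3


def build_demo_store():
    """Create an in-memory database holding a few items and the links among them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (
            id      INTEGER PRIMARY KEY,
            kind    TEXT NOT NULL,   -- photo, transcript, person, ...
            title   TEXT NOT NULL,
            comment TEXT             -- free-form metadata
        );
        CREATE TABLE links (
            src INTEGER NOT NULL REFERENCES items(id),
            dst INTEGER NOT NULL REFERENCES items(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO items (id, kind, title, comment) VALUES (?, ?, ?, ?)",
        [
            (1, "photo", "Meeting photo", "Tuesday staff meeting"),
            (2, "transcript", "Meeting transcript", None),
            (3, "person", "Participant A", None),
            (4, "person", "Participant B", None),
        ],
    )
    # The photo links to the transcript and to each participant.
    conn.executemany(
        "INSERT INTO links (src, dst) VALUES (?, ?)",
        [(1, 2), (1, 3), (1, 4)],
    )
    conn.commit()
    return conn


def linked_items(conn, item_id):
    """Return (kind, title) for every item linked to item_id, in either direction."""
    return conn.execute(
        """
        SELECT i.kind, i.title FROM links l JOIN items i ON i.id = l.dst
        WHERE l.src = ?
        UNION
        SELECT i.kind, i.title FROM links l JOIN items i ON i.id = l.src
        WHERE l.dst = ?
        ORDER BY 1, 2
        """,
        (item_id, item_id),
    ).fetchall()


if __name__ == "__main__":
    store = build_demo_store()
    print(linked_items(store, 1))  # everything attached to the photo
    print(linked_items(store, 3))  # everything attached to Participant A
```

Treating links as two-way is a design choice in this sketch: starting from the photo you reach the participants, and starting from a participant you get back to the photo.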
The power of the links, by the way, is much greater than might be initially apparent. For one thing, the links bring the database closer to the way our brains seem to operate than it would be without them. Trying to remember the name of that guy you met briefly in a meeting last Tuesday? Look up either the entries for the day or for the meeting, whichever works best for you. Searching for an article you read on a Web site while on a business trip in London? Start with the GPS location of the hotel you were in and follow the position-based links. A strong set of links goes a long way toward mimicking the multi-way associational memory store our brains provide.
Of course, the more Bell has to add metadata and create links, the more time the system is demanding from him, turning it from a servant to a master. The MyLifeBits team works constantly to find ways to generate links automatically and to make comments and other metadata as easy as possible for Bell to supply.
If you follow the news at all, the problem of finding associations in vast quantities of data will sound very familiar; it’s certainly high on the NSA’s to-do list. Once again, technology developments, this time in data mining, will help our cause, and have the potential to hurt us, of course, as the life-web of data we weave becomes a commodity anyone can search; there’s a dark side to everything.
But is it valuable?
A database of this magnitude and highly personal nature is interesting (and raises some scary issues; more on that below), but you have to wonder if it’s useful. In an article on this project in Communications of the ACM, Bell and his co-authors commented on how valuable the information store had become:
“Having a surrogate memory creates a freeing, uplifting, and secure feeling, similar to having an assistant with a perfect memory.”
The value of this stored information has proven to be so high, in fact, that Bell has changed the way he works and lives. If he has a spare moment, for example, he might well very briefly visit a Web page on the off chance that he might later want to have its contents in his “memory.”
The more valuable something is, of course, the more we feel its loss when it goes missing. A hard drive crash resulted in the loss of four months of captured Web pages, something that Bell has commented he felt as an emotional blow.
As interesting, successful, and wide-reaching as the MyLifeBits data has been, its team has noted that the list of things they’d like to do is still growing. From content, such as paper books Bell reads, that they’ve chosen not to capture for copyright reasons, to limitations of the current software, the system’s flaws and areas of potential improvement are many. For example, if they’re not already doing it, they could indulge in speculative recording, in which software and/or human agents analyze Bell’s existing stored information and add more data they think he might find interesting. They’ve commented that they always end up regretting not the data they capture, but rather the information they don’t.
They’re also acutely aware that they’re only touching the edges of the possible value of the information. They’ve noted, for example, that the body statistics might be very useful over time in spotting health trends and issues and the reasons underlying both.
One of the difficulties they face in figuring out what to record is a phenomenon well known to many of us, and certainly to writers: you frequently can’t know the value of a piece of information until well after you’ve obtained it. Traffic data for the north side of town is not interesting or useful when you live and work on the south side, until, of course, you have to run an errand in the other direction. Tidbits of all sorts of apparently useless readings and experiences turn up in my fiction all the time. It’s the nature of the way our brains work, so it’s only reasonable that the same should be true of our digital memory extenders.
The MyLifeBits team also understands that the amount of value Bell gets from the stored information depends a great deal on how easy it is for him to search that information quickly. Making it easy to search by any of the available types of paths and links is obviously key, but that’s just a beginning. They’re grappling with data visualization alternatives as they try to find the best ways to present different types of information. They’ve found, for example, that a screen saver that throws up semi-random selections of photos and short video clips has proven useful both as a way to refresh Bell’s (physical) memory and as a means for making it easy and even fun for Bell (and other viewers) to add more metadata.
These folks are by no means alone in these efforts, of course; researchers all over the world are grappling with the challenge of making the ever-growing store of digital information more useful and easier to use. I already mentioned the NSA, but it is hardly the only organization obsessed with data mining; every large company that’s ever gotten a taste of your credit cards would like to know more about you.
Transactional data is also but one of the types of information researchers are working on mining. At an Intel Developer Forum last year, for example, I saw a demonstration of technology from the Diamond project, a collaboration between Intel Research Pittsburgh and Carnegie Mellon University. The Diamond project focuses on ways to make it easier for users to search large databases of images, such as photos or medical images. In the demo I watched, the user wanted to find a photo of a particular speaker from the previous year’s conference, but none of the photos had any metadata or even labels or dates associated with them. Because he was seeking a person, the user first told the software to search for images with faces in them. Because the speaker always wore a blue shirt when presenting, the user next instructed the software to search for blue. In only a few steps, he found the photo he wanted. Sure, the demo was carefully orchestrated, but the underlying algorithms and software at work were quite impressive nonetheless.
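The narrowing-down approach in that demo can be sketched in a few lines. This is not the Diamond project's software or algorithms, which I know only from the demonstration; it is a toy in which each filter looks at the image content rather than at labels. The `face_detected` flag is a stand-in for a real face detector, and the pixel lists, threshold, and function names are all hypothetical.

```python
def face_filter(image):
    """Keep images in which the (stand-in) face detector found a face."""
    return bool(image["face_detected"])


def blue_filter(image, min_fraction=0.3):
    """Keep images in which at least min_fraction of pixels are predominantly blue."""
    pixels = image["pixels"]
    if not pixels:
        return False
    blue = sum(1 for r, g, b in pixels if b > r and b > g)
    return blue / len(pixels) >= min_fraction


def refine(images, filters):
    """Apply each filter in turn, keeping only the images that pass all of them."""
    remaining = list(images)
    for keep in filters:
        remaining = [img for img in remaining if keep(img)]
    return remaining


if __name__ == "__main__":
    # Tiny made-up "photos": a name, a face flag, and four RGB pixels each.
    photos = [
        {"name": "speaker.jpg", "face_detected": True,
         "pixels": [(20, 30, 200), (25, 40, 210), (200, 180, 170), (30, 30, 190)]},
        {"name": "audience.jpg", "face_detected": True,
         "pixels": [(200, 50, 40), (180, 170, 160), (90, 90, 80), (210, 60, 50)]},
        {"name": "sky.jpg", "face_detected": False,
         "pixels": [(10, 20, 220)] * 4},
    ]
    matches = refine(photos, [face_filter, blue_filter])
    print([img["name"] for img in matches])
```

Each step throws away most of the remaining candidates, which is why a couple of crude filters can get a user to the right photo without any labels at all.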
The more advances we make in search technology, the more useful the information becomes.
Issues abound
Of course, the issues this type of stored information raises are both many and profound.
Privacy is an obvious and unavoidable concern with any such effort. Store your life online, and you’d better either tightly control who can see the resulting database or abandon any hope of privacy. You would also quite reasonably want fine-grain control, so that different people could see only different portions of the information.
What happens, though, when the government wants to subpoena your memory?
Speculative recording is a cool notion and one you might find very useful, but from the moment it started the database, your online memory, would contain information you’d never seen. Would that data be part of what others should consider to be your memory? Could you be reasonably blamed for forgetting it (e.g., your friend’s first recital)? Subpoenaed for witnessing it (think porn)?
The problem gets even tougher if you allow others to add information they believe you should know. I don’t even want to think about the domestic arguments that capability could cause.
If such extended memories were to become common, we’d run smack into major issues regarding the online memory rights of those people, notably children but also older people living with caregivers, under the legal control of others. If you hated it when one of your parents poked through your stuff, how would you feel if they’d been running their search algorithms over your stored phone calls, instant messages, music selections, and so on?
Everything I’ve described is possible today, and a lot of it is going on right now in the MyLifeBits project. Even within the limits of today’s technologies, this effort is blurring the line between our biological selves and what we might reasonably think of as our intelligences and memories. What Bush described in 1945 as a vision of the far future is now, like so many speculations, happening, at least on a small scale, in this project. More importantly, none of it is beyond the reach of a moderately wealthy person. Even the wealth is only necessary if you want a support team to do the work for you; the hardware and software would set you back less than the price of a low-end car. As the rate of improvement of the supporting technologies continues to increase, the cost will only lessen as the capabilities grow.
Jim Baen’s Universe Page 78