The University of Chicago Magazine: April 2004

Father of the Grid
By Amy M. Braverman
Photography by Dan Dry

Computer scientist Ian Foster has developed the software to take shared computing to a global level.

In a bare Research Institutes Building room with white cinder-block walls, Ian Foster sits at a red table holding his laptop, blinds shut to block the window’s glare, eyes glazed behind wire-rimmed glasses. “I might not be too articulate today,” the Arthur Holly Compton distinguished service professor of computer science warns. “I’m on two hours’ sleep.” The previous night a West Coast student’s paper was due at midnight, Pacific time, and then, awake anyway, he worked online with some European colleagues. And because the “father of grid computing” is also—with wife Angela Smyth, MD’00, a Hospitals psychiatry fellow—the father of a five- and a six-year-old, he rarely gets to sleep in.

So when asked to predict how grid computing will change everyday life in five, ten, or 15 years, he thinks for a moment but comes up short. “I’m not feeling very creative right now,” he says in the quick cadence of a native New Zealander. But Foster, 45, who heads the Distributed Systems Lab at Argonne National Laboratory, clearly has had more inspired moments, persuading the federal government to invest in several multimillion-dollar grid-technology projects and convincing companies such as IBM, Hewlett-Packard, Oracle, and Sun Microsystems that grids are the answer to complex computational problems—the next major evolution of the Internet.

Just as the Internet is a tool for mass communication, grids are a tool for amplifying computer power and storage space. By linking far-flung supercomputers, servers, storage systems, and databases across existing Internet lines, grids allow more numbers to be crunched faster than ever before. Several grid projects exist today, but eventually, Foster says, a huge global grid—“the Grid,” akin to “the Internet”—will perform complex tasks such as designing semiconductors or screening thousands of potential pharmaceutical drugs in an hour rather than a year.

Though corporations recently have begun to show interest in grids, research institutions have long been a ripe testing ground, in the same way that the Internet sprouted in academia before blossoming in the commercial world. Large projects are already using the technology. The Sloan Digital Sky Survey—an effort at Chicago, Fermilab, and 11 other institutions to map a quarter of the night sky, determining more than 100 million celestial objects’ positions and absolute brightness—harnesses computer power from labs nationwide to perform in minutes scans that previously took a week. The National Digital Mammography Archive (NDMA) in the United States and eDiamond in the United Kingdom are creating digital-image libraries to hold their respective countries’ scans. With an expected 35 million U.S. mammograms a year, “at 160 megabytes per exam,” the NDMA Web site explains, “the annual volume could exceed 5.6 petabytes [a petabyte is 1 million gigabytes] a year, and the minimal daily traffic a day is expected to be 28 terabytes [a terabyte is 1,024 gigabytes]”—traffic that wouldn’t be possible without a grid. By combining computer power and storage space from multiple locations, these archives let doctors view a patient’s progress over time, compare her with other patient populations, or access diagnostic tools. A similar venture, the Biomedical Informatics Research Network, compiles brain images from different databases so researchers can compare the brains of Alzheimer’s patients, for example, to those of healthy people.
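The NDMA’s annual estimate is straightforward multiplication. The short Python sketch below reproduces it, assuming decimal units (1 gigabyte = 1,000 megabytes and 1 petabyte = 1,000,000 gigabytes, in line with the petabyte definition above); the function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the NDMA storage estimate.
# Assumes decimal units: 1 GB = 1,000 MB, 1 TB = 1,000 GB, 1 PB = 1,000,000 GB.
EXAMS_PER_YEAR = 35_000_000   # expected U.S. mammograms per year
MB_PER_EXAM = 160             # megabytes per exam, per the NDMA figure

MB_PER_GB = 1_000
GB_PER_PB = 1_000_000
TB_PER_PB = 1_000


def annual_volume_pb(exams: int, mb_per_exam: int) -> float:
    """Return the yearly storage volume in petabytes."""
    total_gb = exams * mb_per_exam / MB_PER_GB
    return total_gb / GB_PER_PB


def average_daily_tb(annual_pb: float, days: int = 365) -> float:
    """Spread the annual volume evenly over the year, in terabytes per day."""
    return annual_pb * TB_PER_PB / days


if __name__ == "__main__":
    yearly = annual_volume_pb(EXAMS_PER_YEAR, MB_PER_EXAM)
    print(f"annual volume: {yearly:.1f} PB")                       # annual volume: 5.6 PB
    print(f"new data per day: {average_daily_tb(yearly):.1f} TB")  # new data per day: 15.3 TB
```

This computes only the new image data arriving each day; it does not attempt to reproduce the NDMA’s separate 28-terabyte daily traffic estimate.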

Still another project is a grid for the Network for Earthquake Engineering Simulation (NEES). An $82 million program funded by the National Science Foundation, NEES seeks to advance earthquake-engineering research and reduce the physical havoc earthquakes create. The grid, to be completed in October, links civil engineers around the country with 15 sites containing equipment such as 4-meter-by-4-meter shake tables or tsunami simulators. Through the grid, engineers building San Francisco’s new Bay Bridge tested their design structures remotely to make sure they met the latest earthquake-resistance standards. At Argonne, a NEESgrid partner, an 18-square-inch mini shake table, used for software development and demonstration, sits in material scientist Nestor Zaluzec’s office. A researcher in, say, San Diego can activate the mini shake table, moving it quickly back and forth to agitate the 2-foot-tall plastic model sitting on it. Likewise, from his desktop Zaluzec can maneuver video cameras in places like Boulder, Colorado, or Champaign, Illinois, to watch or participate in experiments.

At Argonne even some meetings about grids are held using grids. With the Access Grid, developed by Argonne’s Futures Lab for remote group collaboration, scientists nationwide convene in a virtual conference room, from large groups such as a 2002 National Science Foundation meeting, where 28 sites popped in, Star Trek–like, on a white Argonne wall, to smaller Thursday test cruises held to keep the system bug-free. At these sessions Access Grid programmers Susanne Lefvert and Eric Olson sit at personal computers, talking with wall-projected images of scientists from other Energy Department labs, including the Princeton Plasma Physics Lab and Lawrence Berkeley National Lab.

By now the Access Grid, first used in 1999, has more than 250 research “nodes”—rooms equipped to connect—on five continents. A major automobile company and some oil and gas companies have developed their own access grids, notes Futures Lab Research Manager and computer-science doctoral student Mike Papka, SM’02, and Chicago researchers also are experimenting with the technology. Last fall Jonathan Silverstein, assistant professor of surgery and senior fellow in the joint Argonne/Chicago Computation Institute, along with Chicago anesthesiologist Stephen Small and Argonne/Chicago computer scientist Rick Stevens, won a National Institutes of Health contract to install Access Grid nodes at the U of C Hospitals. Connecting operating rooms, the emergency room, radiology, ambulances, and residents’ hand-held tablet PCs, the three-year prototype project could change the way hospitals process information. Students will watch not only real-time operating-room video feeds but also feeds from laparoscopic devices and robotic surgeons. Radiologists will beam three-dimensional X-ray scans to surgeons—minus middlemen and waiting time. “We are in all these complex environments,” Silverstein says. The grid allows medical workers literally to “share environments, eliminate hand-offs, avoid phone tag”—instead of passing messages between multiple physicians or waiting before taking the next step, “we could all meet for one moment” and relay necessary information.

Then there’s the TeraGrid. Launched in 2001 by the National Science Foundation with $53 million, the TeraGrid aims to be “the world’s largest, most comprehensive, distributed infrastructure for open scientific research,” its Web site declares. Beginning with five sites—Argonne; the University of Illinois–Urbana-Champaign; the University of California, San Diego; the California Institute of Technology; and the Pittsburgh Supercomputing Center—the project has since picked up four more partners. When it is finished in late September, TeraGrid executive director Charlie Catlett says, it will have 20 teraflops (a teraflop equals a trillion floating-point operations per second) of computing power and a petabyte of storage space. Many of its sites, the Web page says, already boast “a cross-country network backbone four times faster than the fastest research networks currently in existence.”

The TeraGrid aims to revolutionize the speed at which science operates. The multi-institutional MIMD Lattice Computation collaboration, for instance, which tests quantum chromodynamic theory and helps interpret high-energy accelerator experiments, uses more than 2 million processor hours of computer time per year—and needs more. Another project, NAMD, a parallel molecular dynamics code designed to simulate large biomolecular systems, has maxed out the fastest system available. On the TeraGrid, already used by some projects, such research can move forward.

Sharing resources—a practice known as “distributed computing”—goes back to computers’ early days. In the late 1950s and early 1960s researchers realized that the machines, then costing tens or even hundreds of thousands of dollars, needed to be more efficient. Because they spent much time idly waiting for human input, the researchers reasoned, multiple users could share them by doling out that idle capacity. Today computers are cheaper, but they’re still underutilized—“five percent usage is normal,” Foster says—which is one reason many companies connect their computers to form unified networks. In a sense grids are simply another variety of distributed computing, now used in many forms. Cluster computing, for example, links multiple PCs to replace unwieldy mainframes or supercomputers. In peer-to-peer computing, such as Napster, users who have downloaded specific software can connect to each other and share files. And there’s Internet computing, most notably SETI@home, a virtual supercomputer based at the University of California, Berkeley, that analyzes data from Puerto Rico’s Arecibo radio telescope to find signs of extraterrestrial intelligence. PC users download SETI@home’s screen-saver program, and when their computers are otherwise idle they retrieve chunks of data over the Internet, analyze them, and send the results back to a central server.
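The screen-saver model amounts to a simple loop: wait until the machine is idle, fetch a chunk of work, analyze it, and report back. The Python sketch below is a toy model under stated assumptions, not SETI@home’s actual protocol: `CentralServer`, `analyze`, and `run_volunteer` are hypothetical names, and the “analysis” merely reports the largest value in each chunk.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CentralServer:
    """Hypothetical central system that hands out work units and collects results."""
    pending: deque = field(default_factory=deque)  # queue of (unit_id, samples) pairs
    results: dict = field(default_factory=dict)

    def next_work_unit(self):
        # Hand out the oldest pending unit, or None when the queue is empty.
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, result):
        self.results[unit_id] = result


def analyze(samples):
    # Placeholder analysis (hypothetical): report the strongest value in the chunk.
    return max(samples)


def run_volunteer(server, idle_ticks):
    """Process one work unit per tick, but only on ticks when the PC is idle."""
    done = 0
    for idle in idle_ticks:
        if not idle:
            continue  # the owner is using the machine, so stay out of the way
        unit = server.next_work_unit()
        if unit is None:
            break  # nothing left to fetch
        unit_id, samples = unit
        server.submit(unit_id, analyze(samples))
        done += 1
    return done


if __name__ == "__main__":
    server = CentralServer(deque([(1, [0.2, 0.9, 0.4]), (2, [0.1, 0.3]), (3, [0.7])]))
    processed = run_volunteer(server, [True, False, True, True])  # busy on tick 2
    print(processed, server.results)  # 3 {1: 0.9, 2: 0.3, 3: 0.7}
```

The key design choice is that the volunteer’s own use always takes priority: a busy tick is simply skipped, so the donated work soaks up only the otherwise unused cycles.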

But a lot had to happen between the Grid’s earliest inklings and its current test beds. Foster, who switched from studying math and chemistry to computer science at New Zealand’s University of Canterbury before earning a doctorate in the field at London’s Imperial College, came to Argonne in 1989. Working with specialized programming languages for computational chemistry codes, he used parallel networks, similar to clusters. “High-speed networks were starting to appear,” he writes in the April 2003 Scientific American, “and it became clear that if we could integrate digital resources and activities across networks, it could transform the process of scientific work.” Indeed, research was increasingly taking place on an international scale, with scientists from different institutions trying to share data that was growing exponentially. In 1994 Foster refocused his research on distributed computing. With Steven Tuecke, today the lead software architect in Argonne’s Distributed Systems Laboratory, and Carl Kesselman, now director of the Center for Grid Technologies at the University of Southern California’s Information Sciences Institute, he began the Globus Project, a software system for international scientific collaboration. Just as standard protocols had given the Internet a common language and tools, they envisioned Globus software that would link sites into a “virtual organization,” with standardized methods to authenticate identities, authorize specific activities, and control data movement.

The concept was quickly put to use. At a 1995 supercomputing conference Rick Stevens, who also directs Argonne’s math and computer-science division, and Thomas A. DeFanti, director of the University of Illinois–Chicago’s Electronic Visualization Lab, headed a prototype project, called i-way (Information Wide Area Year), that linked 17 high-speed research networks for two weeks. Foster’s team developed the software that, he writes in Scientific American, “knitted” the sites “into a single virtual system,” so users could “log on once, locate suitable computers, reserve time, load application codes, and then monitor their execution.” Scientists performed computationally complicated simulations such as colliding neutron stars and moving cloud patterns around the planet. “It was the Woodstock of the Grid,” Larry Smarr, the conference’s program chair and now director of the California Institute for Telecommunications and Information Technology, told the New York Times last July, “everyone not sleeping for three days, running around and engaged in a kind of scientific performance art.”
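The I-WAY workflow Foster describes (log on once, locate suitable computers, reserve time, load application codes, monitor execution) can be sketched as a toy model. This is not the Globus Toolkit’s interface: every class and function here (`Site`, `Credential`, `log_on`, `locate`, `reserve_and_launch`, `monitor`) is hypothetical, and real grid software authenticates with cryptographic credentials rather than a bare user name.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    """Hypothetical token issued once at log-on and presented to every site (single sign-on)."""
    user: str


@dataclass
class Site:
    """Hypothetical grid site with a pool of free processors and a set of authorized users."""
    name: str
    free_cpus: int
    authorized: set
    jobs: dict = field(default_factory=dict)


def log_on(user):
    # Log on once; the resulting credential is reused at every site.
    return Credential(user)


def locate(sites, cred, cpus_needed):
    # Keep only sites that accept this credential and have enough idle processors.
    return [s for s in sites if cred.user in s.authorized and s.free_cpus >= cpus_needed]


def reserve_and_launch(site, cred, job_name, cpus):
    if cred.user not in site.authorized:
        raise PermissionError(f"{site.name} does not authorize {cred.user}")
    if site.free_cpus < cpus:
        raise RuntimeError(f"{site.name} lacks {cpus} free processors")
    site.free_cpus -= cpus            # reserve time on the processors
    site.jobs[job_name] = "running"   # stand-in for loading the application code
    return site.name, job_name


def monitor(sites_by_name, handle):
    # Report the current status of a launched job.
    site_name, job_name = handle
    return sites_by_name[site_name].jobs[job_name]


if __name__ == "__main__":
    sites = [
        Site("site-a", 64, {"alice"}),
        Site("site-b", 256, {"alice", "bob"}),
        Site("site-c", 512, {"bob"}),
    ]
    cred = log_on("alice")
    candidates = locate(sites, cred, cpus_needed=128)  # only site-b qualifies
    handle = reserve_and_launch(candidates[0], cred, "neutron-stars", 128)
    print(handle, monitor({s.name: s for s in sites}, handle))  # ('site-b', 'neutron-stars') running
```

The point of the single credential is that the user authenticates once, while each site still makes its own authorization decision, which is the division of labor the Globus Project’s “virtual organization” idea describes.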

The experience inspired much enthusiasm—and funding. The U.S. Defense Advanced Research Projects Agency gave the Globus Project $800,000 a year for three years. In 1997 Foster’s team unveiled the first version of the Globus Toolkit, the software that does the knitting. The National Science Foundation, NASA, and the Energy Department began grid projects, with Globus underlying them all. And while Foster and his crew have used an open-source approach to develop the technology, making the software freely available and its code open for outside programmers to read and modify, in 1998 he and his colleagues also began the Global Grid Forum, a group that meets three times a year to adopt basic language and infrastructure standards. Such standards, Foster writes in “What is the Grid?” (July 2002), allow users to collaborate “with any interested party and thus to create something more than a plethora of balkanized, incompatible, non-interoperable distributed systems.”

The Globus Toolkit, named the “most promising new technology” by R&D Magazine in 2002, a top-ten “emerging technology” by Technology Review in 2003, and given a Chicago Innovation Award last year by the Sun-Times, still needs work to perfect security and other measures. But the open-source model, much like that used to develop the Internet, has proved useful in ferreting out bugs and making improvements. When physicists overloaded one grid system by submitting tens of thousands of tasks at once, for example, the University of Wisconsin helped design applications to manage a grid’s many users. As the technology moves from research institutions, whose data is stored mostly in electronic files, to corporations, which favor databases, the UK’s e-Science program is developing ways to handle the different systems.

Without the open-source approach, Foster says, the software might not have become the de facto standard for most grid projects, and IBM, the Globus Toolkit’s sole corporate funder for the past three years, wouldn’t have taken such an active role. “Success of the Grid depends on everyone adopting it,” he says, “so it’s counterproductive to work in private.” Brokerage firm Charles Schwab uses a grid developed by IBM to give its clients real-time investment advice. IBM also has projects under way with Morgan Stanley and Hewitt Associates. For Foster—the British Computer Society’s 2002 Lovelace Medal winner and a 2003 American Association for the Advancement of Science fellow—such corporate ventures are a critical step in making grids, already a powerful scientific tool, important in everyday life, in a future when the Grid will be as common as the Internet, and as seamless. In the 1960s MIT’s Fernando Corbató, whom Foster calls “the father of time-sharing operating systems,” described shared computing as a “utility,” meaning computer access would operate like water, gas, and electricity, where a client would connect and pay according to usage. Today the Grid is envisioned similarly, and the term “utility computing” is used synonymously.
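Utility-style billing, in which a client connects and pays according to usage, reduces to metering and multiplication. A minimal Python sketch, assuming two hypothetical metered quantities (processor hours and gigabyte-months of storage) and made-up rates; no real provider’s pricing is implied.

```python
from dataclasses import dataclass

# Hypothetical rates, in dollars, chosen only for illustration.
RATE_PER_CPU_HOUR = 0.50
RATE_PER_GB_MONTH = 0.10


@dataclass
class UsageRecord:
    customer: str
    cpu_hours: float          # metered processor time
    storage_gb_months: float  # metered storage


def bill(records):
    """Total each customer's charges, the way a utility meters water or electricity."""
    totals = {}
    for r in records:
        charge = r.cpu_hours * RATE_PER_CPU_HOUR + r.storage_gb_months * RATE_PER_GB_MONTH
        totals[r.customer] = totals.get(r.customer, 0.0) + charge
    # Round to cents for presentation.
    return {name: round(amount, 2) for name, amount in totals.items()}


if __name__ == "__main__":
    usage = [
        UsageRecord("lab", cpu_hours=100, storage_gb_months=50),
        UsageRecord("firm", cpu_hours=10, storage_gb_months=0),
        UsageRecord("lab", cpu_hours=20, storage_gb_months=0),
    ]
    print(bill(usage))  # {'lab': 65.0, 'firm': 5.0}
```

The sketch mirrors the utility analogy: a customer is charged nothing for capacity it leaves idle, only for what it actually consumes.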

But when grids will become so ubiquitous remains a big question. Even on a full night’s sleep Foster—today’s father figure—hesitates to guess beyond “that’s some way out,” happily encouraging his virtual child but not wanting to impose unrealistic expectations. “It’s a process,” he says. Although large grids are running in both the United States and Europe, and Foster skipped the March Global Grid Forum meeting in Berlin to talk up grids in his homeland New Zealand, “we haven’t nailed down all the standards. There’s more to be done.” It’s a global, multi-industry path he’s forging, and if he can’t predict where the next generation will head, he’s prepared the Grid to lead the way.