Father of the Grid
By Amy M. Braverman
Photography by Dan Dry
Computer scientist Ian Foster has developed
the software to take shared computing to a global level.
In a bare Research Institutes Building room with
white, cinder-block walls, Ian Foster sits at a red table holding
his laptop, blinds shut to block the window’s glare, eyes
glazed behind wire-rimmed glasses. “I might not be too articulate
today,” the Arthur Holly Compton distinguished service professor
of computer science warns. “I’m on two hours’
sleep.” The previous night a West Coast student’s paper
was due at midnight, Pacific time, and then, awake anyway, he worked
online with some European colleagues. And because the “father
of grid computing” is also—with wife Angela Smyth, MD’00,
a Hospitals psychiatry fellow—the father of a five- and a
six-year-old, he rarely gets to sleep in.
So when asked to predict how grid computing
will change everyday life in five, ten, 15 years, he thinks for
a moment but comes up short. “I’m not feeling very creative
right now,” he says in the quick cadence of a native New Zealander.
But Foster, 45, who heads the Distributed Systems Lab at Argonne
National Laboratory, clearly has had more inspired moments, persuading
the federal government to invest in several multimillion-dollar
grid-technology projects and convincing companies such as IBM, Hewlett-Packard,
Oracle, and Sun Microsystems that grids are the answer to complex
computational problems—the next major evolution of the Internet.
Just as the Internet is a tool for mass communication,
grids are a tool for amplifying computer power and storage space.
By linking far-flung supercomputers, servers, storage systems, and
databases across existing Internet lines, grids allow more numbers
to be crunched faster than ever before. Several grid projects exist
today, but eventually, Foster says, a huge global grid—“the
Grid,” akin to “the Internet”—will perform
complex tasks such as designing semiconductors or screening thousands
of potential pharmaceutical drugs in an hour rather than a year.
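The principle can be sketched in a few lines of code. The snippet below only simulates remote sites with local threads, and the site labels and timings are hypothetical, but it shows why farming independent work units out to several machines at once shrinks the total running time.

```python
# Conceptual sketch only: farming independent work units out to several
# compute "sites" at once, the basic idea behind a grid's speedup.
# The sites here are simulated with local threads; the names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import time

SITES = ["argonne", "ncsa", "sdsc", "caltech"]  # hypothetical site labels

def run_on_site(site: str, work_unit: int) -> str:
    """Pretend to ship one independent work unit to a remote site."""
    time.sleep(0.1)  # stands in for remote computation time
    return f"work unit {work_unit} finished at {site}"

work_units = range(8)
start = time.time()
with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
    futures = [pool.submit(run_on_site, SITES[i % len(SITES)], i) for i in work_units]
    for f in futures:
        print(f.result())
print(f"elapsed: {time.time() - start:.2f}s (vs ~{0.1 * len(work_units):.1f}s serially)")
```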
Though corporations recently have begun to show
interest in grids, research institutions have long been a ripe testing
ground, in the same way that the Internet sprouted in academia before
blossoming in the commercial world. Large projects are already using
the technology. The Sloan Digital Sky Survey—an effort at
Chicago, Fermilab, and 11 other institutions to map a quarter of
the night sky, determining more than 100 million celestial objects’
positions and absolute brightness—harnesses computer power
from labs nationwide to perform, in minutes, scans that previously
took a week. The National Digital Mammography Archive (NDMA) in
the United States and eDiamond in the United Kingdom are creating
digital-image libraries to hold their respective countries’
scans. With an expected 35 million U.S. mammograms a year, “at
160 megabytes per exam,” the NDMA Web site explains, “the
annual volume could exceed 5.6 petabytes [a petabyte is 1 million
gigabytes] a year, and the minimal daily traffic is expected
to be 28 terabytes [a terabyte is 1,024 gigabytes]”—traffic
that wouldn’t be possible without a grid. By combining computer power and storage space from multiple locations, the archives let doctors view a patient’s progress over time, compare her with other patient populations, or access diagnostic tools. A similar venture, the
Biomedical Informatics Research Network, compiles brain images from
different databases so researchers can compare the brains of Alzheimer’s
patients, for example, to those of healthy people.
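A quick back-of-envelope check, using the article’s own unit glosses, confirms the annual volume the NDMA quotes; every figure below comes straight from the passage above.

```python
# Back-of-envelope check of the NDMA figures quoted above.
exams_per_year = 35_000_000      # expected U.S. mammograms per year
megabytes_per_exam = 160         # "160 megabytes per exam"

annual_megabytes = exams_per_year * megabytes_per_exam
annual_gigabytes = annual_megabytes / 1_000       # decimal units
annual_petabytes = annual_gigabytes / 1_000_000   # "a petabyte is 1 million gigabytes"

print(f"{annual_petabytes:.1f} petabytes per year")  # prints 5.6, matching the quoted figure
```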
Still another project is a grid for the Network
for Earthquake Engineering Simulation (NEES). An $82 million program
funded by the National Science Foundation, NEES seeks to advance
earthquake-engineering research and reduce the physical havoc earthquakes
create. The grid, to be completed in October, links civil engineers
around the country with 15 sites containing equipment such as 4-meter-by-4-meter
shake tables or tsunami simulators. Through the grid, engineers
building San Francisco’s new Bay Bridge tested their design
structures remotely to make sure they met the latest earthquake-resistance
standards. At Argonne, a NEESgrid partner, an 18-inch-square mini
shake table, used for software development and demonstration, sits
in material scientist Nestor Zaluzec’s office. A researcher
in, say, San Diego can activate the mini shake table, moving it
quickly back and forth to agitate the 2-foot-tall plastic model
sitting on it. Likewise, from his desktop Zaluzec can maneuver video
cameras in places like Boulder, Colorado, or Champaign, Illinois,
to watch or participate in experiments.
At Argonne even some meetings about grids are
held using grids. With the Access Grid, developed by Argonne’s
Futures Lab for remote group collaboration, scientists nationwide convene in a virtual conference room, in gatherings that range from a 2002 National Science Foundation meeting, where 28 sites popped in, Star Trek–like, on a white Argonne wall, to the smaller Thursday test cruises held to keep the system bug-free. At these
sessions Access Grid programmers Susanne Lefvert and Eric Olson
sit at personal computers, talking with wall-projected images of
scientists from other Energy Department labs, including the Princeton
Plasma Physics Lab and Lawrence Berkeley National Lab.
By now the Access Grid, first used in 1999,
has more than 250 research “nodes”—rooms equipped
to connect—on five continents. A major automobile company
and some oil and gas companies have developed their own access grids,
notes Futures Lab Research Manager and computer-science doctoral
student Mike Papka, SM’02, and Chicago researchers also are
experimenting with the technology. Last fall Jonathan Silverstein,
assistant professor of surgery and senior fellow in the joint Argonne/
Chicago Computation Institute, along with Chicago anesthesiologist
Stephen Small and Argonne/Chicago computer scientist Rick Stevens,
won a National Institutes of Health contract to install Access Grid
nodes at the U of C Hospitals. Connecting operating rooms, the emergency
room, radiology, ambulances, and residents’ hand-held tablet
PCs, the three-year prototype project could change the way hospitals
process information. Students will watch not only real-time operating-room
video feeds but also feeds from laparoscopic devices and robotic
surgeons. Radiologists will beam three-dimensional X-ray scans to
surgeons—minus middlemen and waiting time. “We are in
all these complex environments,” Silverstein says. The grid
allows medical workers literally to “share environments, eliminate
hand-offs, avoid phone tag”—instead of passing messages
between multiple physicians or waiting before taking the next step,
“we could all meet for one moment” and relay necessary
information.
Then there’s the TeraGrid. Launched in
2001 by the National Science Foundation with $53 million, the TeraGrid
aims to be “the world’s largest, most comprehensive,
distributed infrastructure for open scientific research,”
its Web site declares. Beginning with five sites—Argonne;
the University of Illinois–Urbana-Champaign; the University
of California, San Diego; the California Institute of Technology;
and the Pittsburgh Supercomputing Center—the project has since
picked up four more partners. The system, to be finished by late September, will have 20 teraflops (a teraflop equals a trillion operations per second) of computing power and a petabyte of storage space, says TeraGrid executive director Charlie Catlett. Many of its sites,
the Web page says, already boast “a cross-country network
backbone four times faster than the fastest research networks currently
in existence.”
[Photo: Foster in Argonne’s machine room.]
The TeraGrid aims to revolutionize the speed
at which science operates. The multi-institutional MIMD Lattice
Computation collaboration, for instance, which tests quantum chromodynamic
theory and helps interpret high-energy accelerator experiments,
uses more than 2 million processor hours of computer time per year—and
needs more. Another project, NAMD, a parallel molecular dynamics
code designed to simulate large biomolecular systems, has maxed
out the fastest system available. On the TeraGrid, already used
by some projects, such research can move forward.
Sharing resources—a
practice known as “distributed computing”—goes
back to computers’ early days. In the late 1950s and early
1960s researchers realized that the machines, then costing tens
or even hundreds of thousands of dollars, needed to be more efficient.
Because they spent much time idly waiting for human input, the researchers
reasoned, multiple users could share them by doling out that unemployed
power. Today computers are cheaper, but they’re still underutilized—“five
percent usage is normal,” Foster says—which is one reason
many companies connect their computers to form unified networks.
In a sense grids are simply another variety of distributed computing,
now used in many forms. Cluster computing, for example, links multiple
PCs to replace unwieldy mainframes or supercomputers. In peer-to-peer
computing, such as Napster, users who have downloaded specific software
can connect to each other and share files. And there’s Internet
computing, most notably SETI@home, a virtual supercomputer based
at the University of California, Berkeley, that analyzes data from
Puerto Rico’s Arecibo radio telescope to find signs of extraterrestrial
intelligence. PC users download SETI@home’s screen-saver program,
and when their computers are otherwise idle the program retrieves chunks of telescope data over the Internet, analyzes them, and sends the results back to a central processing system.
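Stripped to its essentials, that idle-cycle model is a simple loop: check whether the machine is free, fetch a work unit, analyze it, report back. The sketch below is purely illustrative; the server address, the idle test, and the analysis step are hypothetical stand-ins, not the actual SETI@home client or protocol.

```python
# Illustrative idle-cycle "volunteer computing" loop, in the spirit of
# SETI@home. The server URL, work format, and idle test are hypothetical;
# this is not the real SETI@home client.
import time
import random

WORK_SERVER = "https://example.org/work"  # hypothetical central server

def machine_is_idle() -> bool:
    # A real client would check CPU load or screen-saver state.
    return random.random() > 0.5

def fetch_work_unit():
    # Stand-in for downloading a chunk of telescope data from WORK_SERVER.
    return [random.gauss(0, 1) for _ in range(1024)]

def analyze(samples):
    # Stand-in for the real signal analysis: report the strongest sample.
    return max(samples)

def report_result(result):
    # Stand-in for uploading the result to the central processing system.
    print(f"would send result {result:.3f} to {WORK_SERVER}")

for _ in range(5):          # a real client loops indefinitely
    if machine_is_idle():
        unit = fetch_work_unit()
        report_result(analyze(unit))
    time.sleep(0.1)         # wait before checking again
```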
But a lot had to happen between the Grid’s
earliest inklings and its current test beds. Foster, who switched
from studying math and chemistry to computer science at New Zealand’s
University of Canterbury before earning a doctorate in the field
at London’s Imperial College, came to Argonne in 1989. Writing specialized programming languages for computational chemistry codes, he used parallel networks, similar to clusters. “High-speed networks were starting
to appear,” he writes in the April 2003 Scientific American,
“and it became clear that if we could integrate digital resources
and activities across networks, it could transform the process of
scientific work.” Indeed research was occurring more and more
on an international scale, with scientists from different institutions
trying to share data that was growing exponentially. In 1994 Foster
refocused his research on distributed computing. With Steven Tuecke,
today the lead software architect in Argonne’s Distributed
Systems Laboratory, and Carl Kesselman, now director of the Center
for Grid Technologies at the University of Southern California’s
Information Sciences Institute, he began the Globus Project, an effort to build software for international scientific collaboration. In the same way
that Internet protocols became standard for the Web, creating a
common language and tools, they envisioned Globus software that
would link sites into a “virtual organization,” with
standardized methods to authenticate identities, authorize specific
activities, and control data movement.
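In rough code terms, a virtual organization pairs a shared notion of identity with per-activity permissions and controlled data movement. The sketch below is a conceptual illustration under those assumptions; the class, method, and site names are invented for this article and are not the Globus Toolkit’s actual interfaces.

```python
# Conceptual sketch of the "virtual organization" idea described above:
# one identity, per-activity authorization, and controlled data movement
# across member sites. All names and interfaces here are hypothetical,
# not the Globus Toolkit's actual API.
from dataclasses import dataclass, field

@dataclass
class VirtualOrganization:
    name: str
    members: set = field(default_factory=set)    # trusted user identities
    rights: dict = field(default_factory=dict)   # identity -> allowed activities

    def authenticate(self, identity: str, credential: str) -> bool:
        # Stand-in for certificate-based single sign-on.
        return identity in self.members and credential == f"cert:{identity}"

    def authorize(self, identity: str, activity: str) -> bool:
        return activity in self.rights.get(identity, set())

    def transfer(self, identity: str, credential: str, src: str, dst: str):
        if not self.authenticate(identity, credential):
            raise PermissionError("unknown identity or bad credential")
        if not self.authorize(identity, "move-data"):
            raise PermissionError("identity not authorized to move data")
        print(f"{identity}: moving data {src} -> {dst}")

vo = VirtualOrganization(
    name="sky-survey-vo",
    members={"alice"},
    rights={"alice": {"move-data", "run-jobs"}},
)
vo.transfer("alice", "cert:alice", "fermilab:/raw/scan42", "argonne:/archive/scan42")
```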
The concept was quickly put to use. At a 1995
supercomputing conference Rick Stevens, who also directs Argonne’s
math and computer-science division, and Thomas A. DeFanti, director
of the University of Illinois–Chicago’s Electronic Visualization
Lab, headed a prototype project, called i-way (Information Wide
Area Year), that linked 17 high-speed research networks for two
weeks. Foster’s team developed the software that, he writes
in Scientific American, “knitted” the sites
“into a single virtual system,” so users could “log
on once, locate suitable computers, reserve time, load application
codes, and then monitor their execution.” Scientists performed
computationally complicated simulations such as colliding neutron
stars and moving cloud patterns around the planet. “It was
the Woodstock of the Grid,” Larry Smarr, the conference’s
program chair and now director of the California Institute for Telecommunications
and Information Technology, told the New York Times last
July, “everyone not sleeping for three days, running around
and engaged in a kind of scientific performance art.”
The experience inspired much enthusiasm—and
funding. The U.S. Defense Advanced Research Projects Agency gave
the Globus Project $800,000 a year for three years. In 1997 Foster’s
team unveiled the first version of the Globus Toolkit, the software
that does the knitting. The National Science Foundation, NASA, and
the Energy Department began grid projects, with Globus underlying
them all. Foster and his crew have used an open-source approach to develop the technology, making the software freely available and its code open for outside programmers to read and modify. In 1998 he and his colleagues also began the Global Grid Forum, a group
that meets three times a year to adopt basic language and infrastructure
standards. Such standards, Foster writes in “What is the Grid?”
(July 2002), allow users to collaborate “with any interested
party and thus to create something more than a plethora of balkanized,
incompatible, non-interoperable distributed systems.”
The Globus Toolkit, named the “most promising
new technology” by R&D Magazine in 2002, a top-ten
“emerging technology” by Technology Review
in 2003, and given a Chicago Innovation Award last year by the Sun-Times,
still needs work to perfect security and other measures. But the
open-source model, much like that used to develop the Internet,
has proved useful in ferreting out bugs and making improvements.
When physicists overloaded one grid system by submitting tens of
thousands of tasks at once, for example, the University of Wisconsin
helped design applications to manage a grid’s many users.
As the technology moves from research institutions, whose data is
stored mostly in electronic files, to corporations, which favor
databases, the UK’s e-Science program is developing ways to
handle the different systems.
Without the open-source approach, Foster says,
the software might not have become the de facto standard for most
grid projects, and IBM, the Globus Toolkit’s sole corporate
funder for the past three years, wouldn’t have taken such
an active role. “Success of the Grid depends on everyone adopting
it,” he says, “so it’s counterproductive to work
in private.” Brokerage firm Charles Schwab uses a grid developed
by IBM to give its clients real-time investment advice. The computer
company also has projects under way with Morgan Stanley and Hewitt
Associates. For Foster—the British Computer Society’s
2002 Lovelace Medal winner and a 2003 American Association for the
Advancement of Science fellow—such corporate ventures are
a critical step toward making grids, already a powerful scientific tool, part of everyday life, a future in which the Grid will be as common as the Internet—and as seamless. In the 1960s MIT’s Fernando
Corbato, whom Foster calls “the father of time-sharing operating
systems,” described shared computing as a “utility,”
meaning computer access would operate like water, gas, and electricity,
where a client would connect and pay according to use. Today the Grid is envisioned similarly, and the term “utility computing” is often used interchangeably with grid computing.
But when grids will become so ubiquitous remains
a big question. Even on a full night’s sleep Foster—today’s
father figure—hesitates to guess beyond “that’s
some way out,” happily encouraging his virtual child but not
wanting to impose unrealistic expectations. “It’s a
process,” he says. Although large grids are running in both
the United States and Europe, and Foster skipped the March Global
Grid Forum meeting in Berlin to talk up grids in his homeland New
Zealand, “we haven’t nailed down all the standards.
There’s more to be done.” It’s a global, multi-industry
path he’s forging, and if he can’t predict where the
next generation will head, he’s prepared the Grid to lead
the way.