
Cloudy Judgment
By Paul Boutin
Slate, April 3, 2008
Edited by Andy Ross
There's now a flood of Web-based applications that serve as simplified
versions of popular desktop software. Google Docs, the in-your-browser
competitor to Microsoft Office, is probably the best example. Still, the
more time I spend using Web-based apps like Google Docs, the more I
appreciate my desktop computer.
First, networks are flaky. Part of
what makes the Internet so powerful is that it doesn't have to maintain a
live, nonstop, real-time connection. As long as your mail gets transferred
and Web pages download within a reasonable amount of time, you don't notice
if your connection briefly goes down once in a while. If you're using that
connection to edit photos, you do notice.
Second, today's network
apps run inside another application — your Web browser. That makes them
slower, and it limits the possibilities for the apps' user interface. The
Google Docs slide-show editor has the same functionality as an early-1990s
version of Microsoft PowerPoint and has just as many bugs in the way it
formats text.
The people who build browsers need to do a better job,
too. Don't even get me started on the daily hell wherein I hit a Web site
that locks up Firefox, killing all of my browser windows. Even Microsoft
Word doesn't crash that often anymore.
In theory, Web-based apps —
also known as cloud computing — are the future of computers. That view ignores
the huge progress in personal computers that sit on your desktop, in your
lap, or in your pocket. Multi-core processors, touch screens, motion sensors
— all major computing advances, none of which are happening in the cloud.
For me, it'll be years before Photoshop Express becomes powerful
enough to replace my desktop version, or before Google Docs gets me to
uninstall Microsoft Office. One of the nice things about Word and Photoshop
is that once I fire them up and start working, I can forget all about the
Internet for a few hours.
The Google Cloud
By Stephen Baker
Business Week, December 13, 2007
Edited by Andy Ross
Google's globe-spanning network of computers blitzes through mountains of data
faster than any machine on earth. Most of this hardware isn't on the Google
campus. It's just out there, somewhere, whirring away in big refrigerated
data centers. Folks at Google call it the cloud.
In 2006, Google
launched a course at the University of Washington to introduce programming
at the scale of a cloud. Call it Google 101. It led to an ambitious
partnership with IBM to plug universities around the world into Google-like
computing clouds.
As this concept spreads, it promises to expand
Google's footprint in industry far beyond search, media, and advertising,
leading the giant into scientific research and perhaps into new businesses.
In the process Google could become, in a sense, the world's primary
computer.
Google's cloud is a network made of maybe a million cheap
servers. It stores staggering amounts of data, including numerous copies of
the World Wide Web. This makes search faster, helping ferret out answers to
billions of queries in a fraction of a second.
Cloud computing, with
Google's machinery at the very center, fits neatly into the company's grand
vision, established a decade ago by founders Sergey Brin and Larry Page: "to
organize the world's information and make it universally accessible."
For small companies and entrepreneurs, clouds mean opportunity — a
leveling of the playing field in the most data-intensive forms of computing.
To date, only a select group of cloud-wielding Internet giants has had the
resources to scoop up huge masses of information and build businesses upon
it. A handful of companies — the likes of Google, Yahoo, or Amazon —
transform the info into insights, services, and, ultimately, revenue.
This status quo is already starting to change. In the past year, Amazon
has opened up its own networks of computers to paying customers, introducing
new players, large and small, to cloud computing. In November, Yahoo opened
up a small cloud for researchers at Carnegie Mellon University. And
Microsoft has deepened its ties to communities of scientific researchers by
providing them access to its own server farms.
For clouds to reach
their potential, they should be easy to program and navigate. This should
open up growing markets for cloud search and software tools — a natural
business for Google and its competitors. Google CEO Eric E. Schmidt won't
say how much of its own capacity Google will offer to outsiders, or under
what conditions or at what prices. "Typically, we like to start with free,"
he says, adding that power users "should probably bear some of the costs."
And how big will these clouds grow? "There's no limit," Schmidt says.
Google is poised to take on a new role in the computer industry. Not so
many years ago scientists and researchers looked to national laboratories
for the cutting-edge research on computing. Now, says Daniel Frye,
vice-president of open systems development at IBM, "Google is doing the work
that 10 years ago would have gone on in a national lab."
MapReduce
is the software at the heart of Google computing. While the company's famous
search algorithms provide the intelligence for each search, MapReduce
delivers the speed and industrial heft. It divides each job into hundreds
or thousands of smaller tasks and distributes them to legions of computers. In a
fraction of a second, as each one comes back with its nugget of information,
MapReduce quickly assembles the responses into an answer. It was developed
by University of Washington alumnus Jeffrey Dean.
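To make the pattern concrete, here is a minimal single-machine sketch in Python of the map-and-reduce idea described above, counting words across a few documents. It is an illustration of the concept only, not Google's implementation, and all names in it are invented for the example.

```python
from collections import defaultdict

# A toy illustration of the MapReduce pattern: a "map" step splits the
# work into many small tasks, and a "reduce" step assembles the partial
# results into one answer. This runs on a single machine; Google's
# MapReduce distributes the same steps across thousands of servers.

def map_phase(document):
    # Emit a (word, 1) pair for every word in one chunk of input.
    return [(word, 1) for word in document.split()]

def reduce_phase(mapped_pairs):
    # Combine all partial counts into a final tally.
    totals = defaultdict(int)
    for word, count in mapped_pairs:
        totals[word] += count
    return dict(totals)

if __name__ == "__main__":
    documents = ["the cloud is big", "the web is bigger than the cloud"]
    # In a real cluster, each document would be mapped on a different machine.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    print(reduce_phase(mapped))
```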
Students rushed to
sign up for Google 101 as soon as it appeared in the U-Dub syllabus. Within
weeks the students were learning how to configure their work for Google
machines and designing ambitious Web-scale projects, from cataloguing the
edits on Wikipedia to crawling the Internet to identify spam.
Luck
descended on the Googleplex in the person of IBM Chairman Samuel J.
Palmisano. The winter day was chilly, but Palmisano and his team sat down
with Schmidt and a handful of Googlers and discussed cloud computing. It was
no secret that IBM wanted to deploy clouds to provide data and services to
business customers.
Over the next three months they worked together
at Google headquarters. The work involved integrating IBM's business
applications and Google servers, and equipping them with a host of
open-source programs. In February they unveiled the prototype for top brass
in Mountain View, California, and for others on video from IBM headquarters
in Armonk, New York.
The Google cloud got the green light. The plan
was to spread cloud computing first to a handful of U.S. universities within
a year and later to deploy it globally. The universities would develop the
clouds, creating tools and applications while producing legions of new
computer scientists.
Yahoo Research Chief Prabhakar Raghavan says
that in a sense, there are only five computers on earth: Google, Yahoo,
Microsoft, IBM, and Amazon.
Tony Hey, vice-president for external
research at Microsoft, says research clouds will function as huge virtual
laboratories, with a new generation of librarians curating troves of data,
opening them to researchers with the right credentials.
Mark Dean,
head of IBM research in Almaden, California, says that the mixture of
business and science will lead, in a few short years, to networks of clouds
that will tax our imagination: "Compared to this, the Web is tiny. We'll be
laughing at how small the Web is."
The Grid
By Jonathan Leake
The Sunday Times, April 6, 2008
Edited by Andy Ross
The scientists who pioneered the internet have now built a much faster
replacement — the grid. At speeds about 10,000 times faster than a typical
broadband connection, the grid is the latest spin-off from CERN, the
particle physics centre that created the web.
David Britton,
professor of physics at Glasgow University and a leading figure in the grid
project, believes grid technologies could revolutionise society: "With this
kind of computing power, future generations will have the ability to
collaborate and communicate in ways older people like me cannot even
imagine."
The grid will be activated at the same time as the Large
Hadron Collider (LHC) at CERN, based near Geneva. Scientists at CERN started
the grid computing project seven years ago when they realised the LHC would
generate many petabytes of data per year.
The grid has been built
with dedicated fibre optic cables and modern routing centres, meaning there
are no outdated components to slow the deluge of data. The 55,000 servers
already installed are expected to rise to 200,000 within the next two years.
Professor Tony Doyle, technical director of the grid project, said:
"We need so much processing power, there would even be an issue about
getting enough electricity to run the computers if they were all at CERN.
The only answer was a new network powerful enough to send the data instantly
to research centres in other countries."
That network is now built,
using fibre optic cables that run from CERN to 11 centres in the United
States, Canada, the Far East, Europe and around the world. From each centre,
further connections radiate out to a host of other research institutions
using existing high-speed academic networks.
Ian Bird, project
leader for CERN's high-speed computing project, said grid technology could
make the internet so fast that people would stop using desktop computers to
store information and entrust it all to the internet: "It will lead to
what's known as cloud computing, where people keep all their information
online and access it from anywhere."
Although the grid itself is
unlikely to be directly available to domestic internet users, many telecoms
providers and businesses are already introducing its pioneering
technologies, such as dynamic switching, which creates a dedicated channel
for internet users downloading large volumes of data such as films.
The LHC Computing Grid
CERN
Edited by Andy Ross
The Large Hadron Collider (LHC), currently being built at CERN near Geneva,
is the largest scientific instrument on the planet. When it begins
operations in 2007, it will produce roughly 15 petabytes of data annually,
which thousands of scientists around the world will access and analyse. The
mission of the Worldwide LHC Computing Grid (LCG) project is to build and
maintain a data storage and analysis infrastructure for the entire high
energy physics community that will use the LHC.
The data from the LHC
experiments will be distributed around the globe, according to a four-tiered
model. A primary backup will be recorded on tape at CERN, the Tier-0 centre
of LCG. After initial processing, this data will be distributed to a series
of Tier-1 centres, large computer centres with sufficient storage capacity
and with round-the-clock support for the grid.
The Tier-1 centres
will make data available to Tier-2 centres, each consisting of one or
several collaborating computing facilities, which can store sufficient data
and provide adequate computing power for specific analysis tasks. Individual
scientists will access these facilities through Tier-3 computing resources,
which can be local clusters in university departments.
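As a rough illustration of that four-tier flow, the sketch below traces the path a dataset might take from CERN down to a local cluster. The centre names are placeholders rather than real LCG sites, and the code is purely illustrative.

```python
# An illustrative sketch (not CERN code) of the four-tier model described
# above: raw data lands at Tier-0, is replicated to Tier-1 centres, served
# to Tier-2 facilities, and accessed from Tier-3 resources.
# All centre names below are placeholders.

TIERS = {
    "Tier-0": "CERN (primary tape backup)",
    "Tier-1": "large-centre-A (round-the-clock grid support)",
    "Tier-2": "university-facility-X (analysis storage and CPU)",
    "Tier-3": "local-cluster-Y (individual scientists' access)",
}

def route_dataset(dataset_id):
    """Trace the hops a dataset takes from the accelerator to a scientist."""
    return [f"{dataset_id} -> {tier}: {role}" for tier, role in TIERS.items()]

for hop in route_dataset("run-0001"):
    print(hop)
```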
When the LHC
accelerator is running optimally, access to experimental data needs to be
provided for the 5000 scientists in some 500 research institutes and
universities worldwide that are participating in the LHC experiments. In
addition, all the data needs to be available over the 15 year estimated
lifetime of the LHC. The analysis of the data, including comparison with
theoretical simulations, requires of the order of 100 000 CPUs at 2006
measures of processing power.
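For a rough sense of scale, the figures above imply the following back-of-the-envelope arithmetic (illustrative only; the real requirements are more nuanced).

```python
# Back-of-the-envelope arithmetic using the figures quoted in the text.
annual_data_pb = 15        # petabytes produced per year
lifetime_years = 15        # estimated lifetime of the LHC
cpus_required = 100_000    # order of magnitude, at 2006 processing power

total_data_pb = annual_data_pb * lifetime_years
print(f"Total data over the LHC lifetime: roughly {total_data_pb} PB")
print(f"CPUs needed for analysis: on the order of {cpus_required:,}")
```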
A globally distributed grid for data
storage and analysis provides several key benefits. The costs are more
easily handled in a distributed environment, where individual institutes and
participating national organisations can fund local computing resources and
retain responsibility for these. Also, there are fewer single points of
failure. Multiple copies of data and automatic reassigning of computational
tasks to available resources ensure load balancing and
facilitate access to the data for all the scientists involved.
There
are also some challenges. These include ensuring adequate levels of network
bandwidth between the contributing resources, maintaining coherence of
software versions installed in various locations, coping with heterogeneous
hardware, managing and protecting the data so that it is not lost or
corrupted over the lifetime of the LHC, and providing accounting mechanisms
so that different groups have fair access.
The major computing
resources for LHC data analysis are provided by the Worldwide LHC Computing
Grid Collaboration — comprising the LHC experiments, the accelerator
laboratory and the Tier-1 and Tier-2 computer centres. The computing centres
providing resources for LCG are embedded in different operational grid
organisations. The LCG project is also following developments in industry,
where leading IT companies are testing and validating cutting-edge grid
technologies using the LCG environment.
Google Data Centers
By Steven Levy
Wired, October 2012
Edited by Andy Ross
Google can build, organize, and operate a huge network of servers and
fiber-optic cables with an efficiency and speed that rocks physics. Google
has spread its infrastructure across a global archipelago of massive
buildings where a huge number of machines process and deliver the continuing
chronicle of human experience.
This network of thousands of fiber
miles and those many thousands of servers add up to the mother of all
clouds. This multibillion-dollar infrastructure allows the company to index
20 billion web pages a day. To handle more than 3 billion daily search
queries. To conduct millions of ad auctions in real time. To offer free
email storage to 425 million Gmail users. To zip millions of YouTube videos
to users every day. To deliver search results before the user has finished
typing the query. In the near future, when Google releases the wearable
computing platform called Glass, this infrastructure will power its visual
search results.
Google decided to build and operate its own data
centers more cheaply and efficiently than anyone had before. Its engineers saw that
server rooms did not have to be kept so cold. The machines throw off
prodigious amounts of heat, and traditional data centers cool them off with
giant air conditioners that require massive amounts of energy.
Google
keeps the cold aisle in front of the machines at 30 C, and the hot aisle at
the rear of the servers hits 50 C. The heat is absorbed by coils filled with
water, which is then pumped out of the building and cooled before
circulating back inside. To cool the water, the data centers employ giant
towers where the hot water trickles down through vast radiators. Google
installs a backup battery next to each server, doing away with the need for
uninterruptible power supply systems, which leak electricity and require their
own cooling systems.
These innovations help Google achieve
unprecedented energy savings. The standard measure of data center efficiency
is power usage effectiveness, where 1.0 is perfect and 2.0 says half the
power is wasted. Google gets an amazing 1.2.
Google builds much of
its own equipment. The company knows exactly what it needs and saves money
by not buying unnecessary extras. And in the early 2000s, taking advantage
of the failure of some telecom operations, Google began buying up abandoned
fiber-optic networks, paying pennies on the dollar. Now the company has
built a mighty empire of glass.
Over the years, Google has also built
a software system that allows it to manage its countless servers as if they
were one giant entity. Its developers can act like puppet masters,
dispatching thousands of computers to perform tasks as easily as running a
single machine. In 2002 its scientists created Google File System.
MapReduce, a Google system for writing cloud-based applications, was so
successful that an open source version called Hadoop has become an industry
standard. Google tackles load balancing with an automated system called
Borg. Google engineers just write their production code, and the system
distributes it across the server floor.
To ensure reliability,
Google has devised an answer involving people. The Site Reliability
Engineering team has ultimate responsibility for keeping Google and its
services running. SREs are not merely troubleshooters but engineers who are
also in charge of getting production code onto the bare metal of the
servers. Every year, the SREs run a simulated war, called disaster recovery
testing (DiRT), on the infrastructure.
Google says it has hundreds
of thousands of servers. But the magic number is basically meaningless.
Today's machines, with multicore processors and other advances, have many
times the power and utility of earlier versions. In any case, Google intends
to render its present data centers obsolete.
The Borg
By Cade Metz
Wired, March 2013
Edited by Andy Ross
The Borg is the software system at the heart of Google. Borg parcels out
work across Google's vast fleet of computer servers. It is one of the
secrets behind Google's rise to dominance on the web. Google has been
using it for about 10 years and is now building a new version called Omega.
Borg provides central control of tasks across Google's data centers.
Rather than building a separate cluster of servers for each software system,
Google defines a cluster that does several jobs at once. All this work is
divided into tiny tasks, and Borg sends these tasks wherever it can find
free computing resources. Minimizing the complexity of task allocation has let
Google reduce the size of its infrastructure by a few percent.
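The idea of parcelling tiny tasks out to wherever free capacity exists can be sketched with a toy greedy scheduler like the one below. It is a conceptual illustration only, not how Borg itself is implemented; the server and task names are made up.

```python
# A toy greedy scheduler: place each task on the first server that still
# has enough free cores, and report anything that cannot be placed.

from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    free_cores: int
    tasks: list = field(default_factory=list)

def schedule(tasks, servers):
    """Assign (task_name, cores_needed) pairs to servers with spare capacity."""
    unplaced = []
    for task_name, cores_needed in tasks:
        for server in servers:
            if server.free_cores >= cores_needed:
                server.tasks.append(task_name)
                server.free_cores -= cores_needed
                break
        else:
            unplaced.append(task_name)  # no room anywhere right now
    return unplaced

servers = [Server("rack1-node1", 4), Server("rack1-node2", 2)]
pending = schedule([("index-shard", 2), ("ad-auction", 2), ("mail-search", 3)], servers)
for s in servers:
    print(s.name, s.tasks)
print("waiting:", pending)
```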
Twitter is a much smaller company, but its engineers built a similar system
using the open-source platform Mesos. Ben Hindman founded the Mesos project
at UC Berkeley and now runs Mesos at Twitter. When he was at Berkeley,
multicore processors were new. Traditionally, a chip had one processor that
ran one task at a time. A multicore processor lets you run many tasks in
parallel. Hindman built a system that ran multiple software apps evenly
across all the cores, to balance the load and use the full power of the
chip.
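Reduced to a few lines, the same idea looks roughly like the Python sketch below, which fans independent tasks out across all available cores with a process pool. It illustrates the general multicore pattern, not Hindman's actual system.

```python
# Spread independent tasks across all available cores instead of running
# them one at a time on a single core.

from concurrent.futures import ProcessPoolExecutor
import os

def busy_work(n):
    # A stand-in for one independent task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [200_000] * 8  # eight independent tasks
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = list(pool.map(busy_work, jobs))
    print(f"ran {len(results)} tasks across {os.cpu_count()} cores")
```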
Hindman worked with a single computer, but he realized he could apply his
system to an entire data center. Traditional data management systems like
Hadoop run one massive server cluster. If you want to run another
distributed system, you set up a second cluster. But Hindman and his team
found that they could run distributed systems more efficiently if Mesos
ran many systems on a single cluster.
Google helps fund the Berkeley
lab that developed Mesos. Omega brings the Borg closer to the Mesos model.
Both cluster management systems let you run multiple distributed systems on
the same cluster of servers. They also provide an interface for software
designers to run their own apps, but the Borg interface offers lots of
controls for engineers to tweak. Omega will hide most of them and automate
more. It runs the data center like one big computer. The next step is to run
the world.

