Cloudy Judgment

By Paul Boutin
Slate, April 3, 2008

Edited by Andy Ross

There's now a flood of Web-based applications that serve as simplified versions of popular desktop software. Google Docs, the in-your-browser competitor to Microsoft Office, is probably the best example. Still, the more time I spend using Web-based apps like Google Docs, the more I appreciate my desktop computer.

First, networks are flaky. Part of what makes the Internet so powerful is that it doesn't have to maintain a live, nonstop, real-time connection. As long as your mail gets transferred and Web pages download within a reasonable amount of time, you don't notice if your connection briefly goes down once in a while. If you're using that connection to edit photos, you do notice.

Second, today's network apps run inside another application — your Web browser. That makes them slower, and it limits the possibilities for the apps' user interface. The Google Docs slide-show editor has the same functionality as an early-1990s version of Microsoft PowerPoint and has just as many bugs in the way it formats text.

The people who build browsers need to do a better job, too. Don't even get me started on the daily hell wherein I hit a Web site that locks up Firefox, killing all of my browser windows. Even Microsoft Word doesn't crash that often anymore.

In theory, Web-based apps — also known as cloud computing — are the future of computers. But that vision ignores the huge progress in the personal computers that sit on your desktop, in your lap, or in your pocket. Multi-core processors, touch screens, motion sensors — all major computing advances, none of which are happening in the cloud.

For me, it'll be years before Photoshop Express can become powerful enough to replace my desktop version, or before Google Docs gets me to uninstall Microsoft Office. One of the nice things about Word and Photoshop is that once I fire them up and start working, I can forget all about the Internet for a few hours.
 

The Google Cloud

By Stephen Baker
Business Week, December 13, 2007

Edited by Andy Ross

Google's globe-spanning network of computers blitzes through mountains of data faster than any machine on earth. Most of this hardware isn't on the Google campus. It's just out there, somewhere, whirring away in big refrigerated data centers. Folks at Google call it the cloud.

In 2006, Google launched a course at the University of Washington to introduce programming at the scale of a cloud. Call it Google 101. It led to an ambitious partnership with IBM to plug universities around the world into Google-like computing clouds.

As this concept spreads, it promises to expand Google's footprint in industry far beyond search, media, and advertising, leading the giant into scientific research and perhaps into new businesses. In the process Google could become, in a sense, the world's primary computer.

Google's cloud is a network made of maybe a million cheap servers. It stores staggering amounts of data, including numerous copies of the World Wide Web. This makes search faster, helping ferret out answers to billions of queries in a fraction of a second.

Cloud computing, with Google's machinery at the very center, fits neatly into the company's grand vision, established a decade ago by founders Sergey Brin and Larry Page: "to organize the world's information and make it universally accessible."

For small companies and entrepreneurs, clouds mean opportunity — a leveling of the playing field in the most data-intensive forms of computing. To date, only a select group of cloud-wielding Internet giants has had the resources to scoop up huge masses of information and build businesses upon it. A handful of companies — the likes of Google, Yahoo, or Amazon — transform the info into insights, services, and, ultimately, revenue.

This status quo is already starting to change. In the past year, Amazon has opened up its own networks of computers to paying customers, introducing new players, large and small, to cloud computing. In November, Yahoo opened up a small cloud for researchers at Carnegie Mellon University. And Microsoft has deepened its ties to communities of scientific researchers by providing them access to its own server farms.

For clouds to reach their potential, they should be easy to program and navigate. This should open up growing markets for cloud search and software tools — a natural business for Google and its competitors. Google CEO Eric E. Schmidt won't say how much of its own capacity Google will offer to outsiders, or under what conditions or at what prices. "Typically, we like to start with free," he says, adding that power users "should probably bear some of the costs." And how big will these clouds grow? "There's no limit," Schmidt says.

Google is poised to take on a new role in the computer industry. Not so many years ago scientists and researchers looked to national laboratories for the cutting-edge research on computing. Now, says Daniel Frye, vice-president of open systems development at IBM, "Google is doing the work that 10 years ago would have gone on in a national lab."

MapReduce is the software at the heart of Google computing. While the company's famous search algorithms provide the intelligence for each search, MapReduce delivers the speed and industrial heft. It divides each job into hundreds or thousands of smaller tasks and distributes them to legions of computers. As each one comes back with its nugget of information, MapReduce assembles the responses into an answer, all in a fraction of a second. It was developed by University of Washington alumnus Jeffrey Dean.
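
As a rough illustration of that pattern (not Google's code), the Python sketch below splits a word-counting job into small map tasks, runs them in parallel, and merges the partial counts in a reduce step. The documents and function names are invented for the example, and a local process pool stands in for Google's fleet of machines.

    from collections import Counter
    from concurrent.futures import ProcessPoolExecutor

    def map_task(document):
        """Map step: count the words in one document chunk."""
        return Counter(document.lower().split())

    def reduce_counts(partials):
        """Reduce step: merge the partial counts into a single result."""
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total

    if __name__ == "__main__":
        documents = [
            "the cloud is a network of cheap servers",
            "the grid is a network of research centres",
            "cheap servers make the cloud possible",
        ]
        # In Google's cloud the map tasks are scattered across thousands of
        # machines; here a local process pool stands in for that fleet.
        with ProcessPoolExecutor() as pool:
            partial_counts = list(pool.map(map_task, documents))
        print(reduce_counts(partial_counts).most_common(3))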

Students rushed to sign up for Google 101 as soon as it appeared in the U-Dub syllabus. Within weeks the students were learning how to configure their work for Google machines and designing ambitious Web-scale projects, from cataloguing the edits on Wikipedia to crawling the Internet to identify spam.

Luck descended on the Googleplex in the person of IBM Chairman Samuel J. Palmisano. The winter day was chilly, but Palmisano and his team sat down with Schmidt and a handful of Googlers and discussed cloud computing. It was no secret that IBM wanted to deploy clouds to provide data and services to business customers.

Over the next three months they worked together at Google headquarters. The work involved integrating IBM's business applications and Google servers, and equipping them with a host of open-source programs. In February they unveiled the prototype for top brass in Mountain View, California, and for others on video from IBM headquarters in Armonk, New York.

The Google cloud got the green light. The plan was to spread cloud computing first to a handful of U.S. universities within a year and later to deploy it globally. The universities would develop the clouds, creating tools and applications while producing legions of new computer scientists.

Yahoo Research Chief Prabhakar Raghavan says that in a sense, there are only five computers on earth: Google, Yahoo, Microsoft, IBM, and Amazon.

Tony Hey, vice-president for external research at Microsoft, says research clouds will function as huge virtual laboratories, with a new generation of librarians curating troves of data, opening them to researchers with the right credentials.

Mark Dean, head of IBM research in Almaden, California, says that the mixture of business and science will lead, in a few short years, to networks of clouds that will tax our imagination: "Compared to this, the Web is tiny. We'll be laughing at how small the Web is."
 

The Grid

By Jonathan Leake
The Sunday Times, April 6, 2008

Edited by Andy Ross

The scientists who pioneered the internet have now built a much faster replacement — the grid. At speeds about 10,000 times faster than a typical broadband connection, the grid is the latest spin-off from CERN, the particle physics centre that created the web.

David Britton, professor of physics at Glasgow University and a leading figure in the grid project, believes grid technologies could revolutionise society: "With this kind of computing power, future generations will have the ability to collaborate and communicate in ways older people like me cannot even imagine."

The grid will be activated at the same time as the Large Hadron Collider (LHC) at CERN, based near Geneva. Scientists at CERN started the grid computing project seven years ago when they realised the LHC would generate many petabytes of data per year.

The grid has been built with dedicated fibre optic cables and modern routing centres, meaning there are no outdated components to slow the deluge of data. The 55,000 servers already installed are expected to rise to 200,000 within the next two years.

Professor Tony Doyle, technical director of the grid project, said: "We need so much processing power, there would even be an issue about getting enough electricity to run the computers if they were all at CERN. The only answer was a new network powerful enough to send the data instantly to research centres in other countries."

That network is now built, using fibre optic cables that run from CERN to 11 centres in the United States, Canada, the Far East, Europe and around the world. From each centre, further connections radiate out to a host of other research institutions using existing high-speed academic networks.

Ian Bird, project leader for CERN's high-speed computing project, said grid technology could make the internet so fast that people would stop using desktop computers to store information and entrust it all to the internet: "It will lead to what's known as cloud computing, where people keep all their information online and access it from anywhere."

Although the grid itself is unlikely to be directly available to domestic internet users, many telecoms providers and businesses are already introducing its pioneering technologies, such as dynamic switching, which creates a dedicated channel for internet users downloading large volumes of data such as films.
 

The LHC Computing Grid

CERN

Edited by Andy Ross

The Large Hadron Collider (LHC), currently being built at CERN near Geneva, is the largest scientific instrument on the planet. When it begins operations in 2007, it will produce roughly 15 petabytes of data annually, which thousands of scientists around the world will access and analyse. The mission of the Worldwide LHC Computing Grid (LCG) project is to build and maintain a data storage and analysis infrastructure for the entire high energy physics community that will use the LHC.

The data from the LHC experiments will be distributed around the globe, according to a four-tiered model. A primary backup will be recorded on tape at CERN, the Tier-0 centre of LCG. After initial processing, this data will be distributed to a series of Tier-1 centres, large computer centres with sufficient storage capacity and with round-the-clock support for the grid.

The Tier-1 centres will make data available to Tier-2 centres, each consisting of one or several collaborating computing facilities, which can store sufficient data and provide adequate computing power for specific analysis tasks. Individual scientists will access these facilities through Tier-3 computing resources, which can be local clusters in university departments.
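
A schematic sketch of that four-tier flow follows, tracing one copy of a dataset from CERN down through the tiers. Apart from CERN at Tier-0, the centre names are placeholders rather than real LCG sites.

    # Apart from CERN, the centre names are placeholders, not real LCG sites.
    TIERS = {
        "Tier-0": ["CERN"],                              # primary tape backup
        "Tier-1": ["LabA", "LabB"],                      # large round-the-clock centres
        "Tier-2": ["UniX", "UniY", "UniZ"],              # analysis facilities
        "Tier-3": ["dept-cluster-1", "dept-cluster-2"],  # local university clusters
    }

    def distribution_path(dataset):
        """Trace one copy of a dataset from CERN down through the tiers."""
        path = [dataset]
        for tier in ("Tier-0", "Tier-1", "Tier-2", "Tier-3"):
            path.append(tier + ":" + TIERS[tier][0])  # first centre, for illustration
        return path

    print(" -> ".join(distribution_path("raw-collision-run-001")))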

When the LHC accelerator is running optimally, access to experimental data needs to be provided for the 5000 scientists in some 500 research institutes and universities worldwide that are participating in the LHC experiments. In addition, all the data needs to be available over the 15 year estimated lifetime of the LHC. The analysis of the data, including comparison with theoretical simulations, requires of the order of 100 000 CPUs at 2006 measures of processing power.

A globally distributed grid for data storage and analysis provides several key benefits. The costs are more easily handled in a distributed environment, where individual institutes and participating national organisations can fund local computing resources and retain responsibility for them. Also, there are fewer single points of failure. Multiple copies of the data and automatic reassignment of computational tasks to available resources ensure load balancing and facilitate access to the data for all the scientists involved.
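
The toy sketch below illustrates the reassignment idea: an analysis task tries each site that holds a replica of its dataset until it finds one that is reachable. The site names and the offline set are hypothetical.

    # Hypothetical replica catalogue: which sites hold a copy of each dataset.
    REPLICA_SITES = {"dataset-42": ["LabA", "LabB", "UniX"]}
    OFFLINE = {"LabA"}  # pretend this centre is currently unreachable

    def run_analysis(dataset):
        """Try each site holding a replica until one is available."""
        for site in REPLICA_SITES[dataset]:
            if site not in OFFLINE:
                return "analysis of " + dataset + " ran at " + site
        raise RuntimeError("no available replica for " + dataset)

    print(run_analysis("dataset-42"))  # falls through to LabB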

There are also some challenges. These include ensuring adequate levels of network bandwidth between the contributing resources, maintaining coherence of software versions installed in various locations, coping with heterogeneous hardware, managing and protecting the data so that it is not lost or corrupted over the lifetime of the LHC, and providing accounting mechanisms so that different groups have fair access.

The major computing resources for LHC data analysis are provided by the Worldwide LHC Computing Grid Collaboration — comprising the LHC experiments, the accelerator laboratory and the Tier-1 and Tier-2 computer centres. The computing centres providing resources for LCG are embedded in different operational grid organisations. The LCG project is also following developments in industry, where leading IT companies are testing and validating cutting-edge grid technologies using the LCG environment.
 

Google Data Centers

By Steven Levy
Wired, October 2012

Edited by Andy Ross

Google can build, organize, and operate a huge network of servers and fiber-optic cables with an efficiency and speed that rocks physics. Google has spread its infrastructure across a global archipelago of massive buildings where a huge number of machines process and deliver the continuing chronicle of human experience.

This network of thousands of fiber miles and those many thousands of servers add up to the mother of all clouds. This multibillion-dollar infrastructure allows the company to index 20 billion web pages a day. To handle more than 3 billion daily search queries. To conduct millions of ad auctions in real time. To offer free email storage to 425 million Gmail users. To zip millions of YouTube videos to users every day. To deliver search results before the user has finished typing the query. In the near future, when Google releases the wearable computing platform called Glass, this infrastructure will power its visual search results.

Google decided to build and operate its own data centers more cheaply and efficiently than anyone had before. Its engineers saw that server rooms did not have to be kept so cold. The machines throw off prodigious amounts of heat, and traditional data centers cool them off with giant air conditioners that require massive amounts of energy.

Google keeps the cold aisle in front of the machines at 30°C, and the hot aisle at the rear of the servers hits 50°C. The heat is absorbed by coils filled with water, which is then pumped out of the building and cooled before circulating back inside. To cool the water, the data centers employ giant towers where the hot water trickles down through vast radiators. Google installs a backup battery next to each server, doing away with the need for uninterruptible power supply systems, which leak electricity and require their own cooling systems.

These innovations help Google achieve unprecedented energy savings. The standard measure of data center efficiency is power usage effectiveness, where 1.0 is perfect and 2.0 means half the power is wasted. Google gets an amazing 1.2.
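
To make the arithmetic concrete: power usage effectiveness is total facility power divided by the power that reaches the computing equipment. The kilowatt figures in this short sketch are invented purely to show how the 2.0 and 1.2 values arise.

    def pue(total_facility_kw, it_equipment_kw):
        """Power usage effectiveness: total power over power used for computing."""
        return total_facility_kw / it_equipment_kw

    print(pue(2000, 1000))  # 2.0: half the power goes to cooling and other overhead
    print(pue(1200, 1000))  # 1.2: roughly the figure Google reports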

Google builds much of its own equipment. The company knows exactly what it needs and saves money by not buying unnecessary extras. And in the early 2000s, taking advantage of the failure of some telecom operations, Google began buying up abandoned fiber-optic networks, paying pennies on the dollar. Now the company has built a mighty empire of glass.

Over the years, Google has also built a software system that allows it to manage its countless servers as if they were one giant entity. Its developers can act like puppet masters, dispatching thousands of computers to perform tasks as easily as running a single machine. In 2002 its scientists created Google File System. MapReduce, a Google system for writing cloud-based applications, was so successful that an open source version called Hadoop has become an industry standard. Google tackles load balancing with an automated system called Borg. Google engineers just write their production code, and the system distributes it across the server floor.

To ensure reliability, Google has devised an answer involving people. The Site Reliability Engineering team has ultimate responsibility for keeping Google and its services running. SREs are not merely troubleshooters but engineers who are also in charge of getting production code onto the bare metal of the servers. Every year, the SREs run a simulated war, called disaster recovery testing (DiRT), on the infrastructure.

Google says it has hundreds of thousands of servers. But the magic number is basically meaningless. Today's machines, with multicore processors and other advances, have many times the power and utility of earlier versions. In any case, Google intends to render its present data centers obsolete.
 

The Borg

By Cade Metz
Wired, March 2013

Edited by Andy Ross

The Borg is the software system at the heart of Google. Borg parcels out work across Google's vast fleet of computer servers. It is one of the secrets behind Google's rise to dominance on the web. Google has been using it for about 10 years and is now building a new version called Omega.

Borg provides central control of tasks across Google's data centers. Rather than building a separate cluster of servers for each software system, Google defines a cluster that does several jobs at once. All this work is divided into tiny tasks, and Borg sends these tasks wherever it can find free computing resources. Streamlining task allocation in this way has let Google reduce the size of its infrastructure by a few percent.
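
The toy loop below suggests the flavour of that placement, packing tasks onto whichever machines currently have free capacity. It is a simple first-fit illustration, not Google's actual scheduling algorithm, and the machine and task names are made up.

    # First-fit placement: each task goes on the first machine with enough room.
    # Illustrative only; Borg's real algorithm is far more sophisticated.
    machines = {"m1": 8.0, "m2": 4.0, "m3": 16.0}  # free CPU cores per machine
    tasks = [("index-shard", 2.0), ("ad-auction", 1.0), ("video-encode", 6.0),
             ("mail-search", 0.5), ("web-crawl", 3.0)]

    placement = {}
    for name, cores_needed in tasks:
        for machine, free in machines.items():
            if free >= cores_needed:
                machines[machine] = free - cores_needed
                placement[name] = machine
                break
        else:
            placement[name] = "pending"  # no free slot right now
    print(placement)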

Twitter is a much smaller company, but its engineers built a similar system using the open-source platform Mesos. Ben Hindman founded the Mesos project at UC Berkeley and now runs Mesos at Twitter. When he was at Berkeley, multicore processors were new. Traditionally, a chip had one processor that ran one task at a time. A multicore processor lets you run many tasks in parallel. Hindman built a system that ran multiple software apps evenly across all the cores, to balance the load and use the full power of the chip.

Hindman had worked with a single computer, but he saw that his system could be applied to an entire data center. Traditional data management systems like Hadoop run on one massive server cluster. If you want to run another distributed system, you set up a second cluster. But Hindman and his team found that they could run distributed systems more efficiently if Mesos ran many systems on a single cluster.

Google helps fund the Berkeley lab that developed Mesos. Omega brings the Borg closer to the Mesos model. Both cluster management systems let you run multiple distributed systems on the same cluster of servers. They also provide an interface for software designers to run their own apps, but the Borg interface offers lots of controls for engineers to tweak. Omega will hide most of them and automate more. It runs the data center like one big computer. The next step is to run the world.