Der Spiegel, May 2013
Edited by Andy Ross
Big Data is the next big thing. It promises both total control and the
logical management of our future lives. An estimated 2.8 ZB of data was
created in 2012, with a predicted volume of 40 ZB by 2020. At that
exponential rate, the volume doubles roughly every two years.
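The doubling claim squares with the cited figures, as a quick check shows (a sketch; only the 2.8 ZB and 40 ZB estimates come from the article):

```python
# Project data volume forward from 2012, doubling every two years.
START_YEAR, START_ZB = 2012, 2.8

def projected_zb(year):
    """Volume in zettabytes after (year - START_YEAR) years of two-year doubling."""
    return START_ZB * 2 ** ((year - START_YEAR) / 2)

print(projected_zb(2020))  # 2.8 * 2**4 = 44.8 ZB, close to the predicted 40 ZB
```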
Google and Facebook are giants of Big
Data. But many other organizations are analyzing all this data. Memory is
cheap, so new computers can hold vast data sets in RAM and analyze them
fast. Algorithms create
order from chaos. They find hidden patterns and offer new insights and
business models. Algorithms bring vast power.
Blue Yonder is a small
young company. Managing director Uwe Weiss analyzes the data generated by
supermarket cash registers, weather services, vacation schedules, and
traffic reports. All this data flows into analysis software that learns as
it goes and finds new patterns. Blue Yonder has used its data to drive a
market research system on buying behavior. Weiss: "Big Data is currently
revamping our entire economy, and we're just at the beginning."
Data now brings hope for millions of cancer patients. In the Hasso Plattner
Institute (HPI), in Potsdam, near Berlin, a €1.5 million SAP HANA analytic
engine with a thousand cores has so much memory it can process Big Data
thousands of times faster than other machines. SAP co-founder Hasso Plattner
sponsors the institute and personally pushed the "Oncolyzer" rig. The HANA
in-memory technology has won prizes for innovation and is now the flagship
of SAP's product line.
Researchers at the University of Manchester are working
on another Big Data project to help senior citizens who live alone. Their
device looks like an ordinary carpet but is fitted with sensors that record
footsteps. It can determine whether the person is up and about,
and can analyze activities to see how they compare with the person's normal
movements. Anomalies can trigger an alarm.
The military and
intelligence communities also employ the power of data analysis. Big Data
played a key role in the hunt for Osama bin Laden, leading investigators to
Abbottabad in Pakistan.
California software company Splunk was named a
few weeks ago as one of the five most innovative companies in the world.
Governments, agencies, and businesses in almost a hundred countries are
customers, as are the Pentagon and the Department of Homeland Security.
Splunk apps analyze data supplied by all kinds of machines, including cell
phone towers, air-conditioners, web servers, and airplanes.
Hamburg-based startup Kreditech lends money via the Internet. Instead of
requiring credit information from their customers, Kreditech determines the
probability of default using a social scoring method based on fast data
analysis. The company extracts as much data as possible from its users,
including personal data from eBay and Facebook profiles and other social
networking sites. It even records how long applicants take to fill out the
questionnaire, the frequency of errors and deletions, and what kind of
computer they use. The more information it has, the higher a customer's
potential credit line.
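A social scoring model of this kind can be sketched as a simple logistic score over behavioral signals. The features and weights below are hypothetical illustrations, not Kreditech's actual model:

```python
import math

# Hypothetical behavioral features and weights (illustration only).
WEIGHTS = {
    "minutes_on_form": -0.05,        # rushing through the form raises risk
    "corrections_made": 0.10,        # frequent errors and deletions raise risk
    "profile_fields_filled": -0.08,  # a richer profile lowers risk
}
BIAS = -1.0

def default_probability(applicant):
    """Map weighted signals to a probability of default via the logistic function."""
    z = BIAS + sum(w * applicant.get(k, 0) for k, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

careful = {"minutes_on_form": 12, "corrections_made": 2, "profile_fields_filled": 20}
hasty = {"minutes_on_form": 2, "corrections_made": 9, "profile_fields_filled": 3}
print(default_probability(careful) < default_probability(hasty))  # True
```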
Kreditech is expanding rapidly in eastern
Europe and plans to launch soon in Russia. But it terminated its service in
Germany when the Federal Financial Supervisory Authority (BaFin) proposed to
examine its business model. The model generates revenue not only from
microcredit deals and interest but also from renting credit scores to other
companies. Despite all this, investors find social scoring very attractive.
Business models like Kreditech's illustrate the sensitivity of the
issues that Big Data raises. Users give up their data freely, bit by bit,
and everyone adds to this huge new data resource every day. But what happens
to a stash of credit profiles if its owners are taken over or go bust?
TomTom, a Dutch manufacturer of GPS navigation equipment, sold its data
to the Dutch government, which then passed on the data to the police. They
used it to set up speed traps in places where they were most likely to
generate revenue from speeding TomTom users. TomTom issued a public apology.
Big Data applications are especially valuable when they generate
personalized profiles. This may be appealing to retailers and some
consumers, but data privacy advocates see many Big Data concepts as Big
Brother scenarios of a completely new dimension.
Many companies say
the data they gather, store, and analyze remains anonymous. But our mobility
patterns alone can be used to identify almost all of us uniquely. The more
data is in circulation and available for analysis, the more likely it is
that anonymity becomes algorithmically impossible.
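The de-anonymization point can be made concrete with a toy mobility trace: a handful of (place, hour) observations often matches only one person. The data below is invented for illustration:

```python
from collections import defaultdict

# Toy pings: (user, cell tower, hour of day).
pings = [
    ("alice", "cell_12", 8), ("alice", "cell_40", 13), ("alice", "cell_12", 19),
    ("bob",   "cell_12", 8), ("bob",   "cell_77", 13), ("bob",   "cell_12", 19),
    ("carol", "cell_33", 9), ("carol", "cell_40", 13), ("carol", "cell_33", 18),
]

def candidates(observations):
    """Users whose trace contains every observed (place, hour) point."""
    trace = defaultdict(set)
    for user, place, hour in pings:
        trace[user].add((place, hour))
    return {u for u, pts in trace.items() if set(observations) <= pts}

# One observation leaves two candidates; a second pins down one user.
print(sorted(candidates([("cell_12", 8)])))                   # ['alice', 'bob']
print(sorted(candidates([("cell_12", 8), ("cell_40", 13)])))  # ['alice']
```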
Most people don't
want companies to store their personal data or to track their online
behavior. A proposed European data protection directive includes a "right to
be forgotten" on the web. But this may be utopian. We face an impending
tyranny of algorithms.
AR I worked in the SAP HANA development team
from 2003 to 2009.
MIT Technology Review, May 2013
Edited by Andy Ross
SAP likes Big Data. SAP is working with young companies to help them take
advantage of its revolutionary HANA in-memory Big Data platform. Some of the
most adroit users of Big Data are small startups. Fortune 1000 CIOs often
say they have a lot of data but haven't yet figured out a way to translate
it into real results.
SAP HANA was the brainchild of Hasso Plattner
and Vishal Sikka. The HANA platform takes advantage of a new generation of
columnar databases running on multicore processors. The entire system is in
RAM, and users say data queries that used to take days now run in seconds.
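The speedup has two ingredients, RAM residency and columnar layout; the latter can be illustrated in a few lines (a toy contrast, not HANA's actual design):

```python
# Row store: one record per entry; a column aggregate touches every field.
rows = [
    {"id": 1, "region": "EU", "revenue": 120.0},
    {"id": 2, "region": "US", "revenue": 300.0},
    {"id": 3, "region": "EU", "revenue": 80.0},
]

# Column store: each column is a dense array, ideal for scans,
# compression, and vectorized execution on multicore processors.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [120.0, 300.0, 80.0],
}

# Analytic query: total revenue for region EU, in both layouts.
row_total = sum(r["revenue"] for r in rows if r["region"] == "EU")
col_total = sum(rev for reg, rev in zip(columns["region"], columns["revenue"])
                if reg == "EU")
print(row_total, col_total)  # 200.0 200.0 (same answer; the column scan reads only two arrays)
```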
AR HANA was the brainchild of all of us in the
HANA team too.
Google Brains For Big Data
Wired, May 2013
Edited by Andy Ross
Stanford professor Andrew Ng joined Google's X Lab to build huge AI systems
for working on Big Data.
He ended up building the world's largest
artificial neural network (ANN). Ng's new brain watched YouTube videos for a
week and taught itself all about cats. Then it learned to recognize voices
and interpret Google StreetView images. The work moved from the X Lab to the
Google Knowledge Team. Now "deep learning" could boost Google Glass, Google
image search, and even basic web search.
Ng invited AI pioneer
Geoffrey Hinton to come to Mountain View and tinker with algorithms. Android
Jelly Bean included new algorithms for voice recognition and cut the error
rate by a quarter. Ng departed and Hinton joined Google, where he plans to
take deep learning to the next level.
Hinton thinks ANN models of
documents could boost web search like they did voice recognition. Google's
knowledge graph is a database of nearly 600 million entities; when you
search for one of them, information about it pops up to the right of your
search results. Hinton says ANNs could study the graph and then cull the
errors and refine new facts for it.
ANN research has boomed as
researchers harness the power of graphics processors (GPUs) to build bigger
ANNs that can learn fast from Big Data. With unsupervised learning
algorithms the machines can learn on their own, but for really big ANNs
Google first had to write code that would harness all the machines and still
run if some nodes failed. It takes a lot of work to train ANN models.
Training the YouTube cat model used 16,000 processor cores. But then it took just
100 cores to spot cats in videos.
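The fault-tolerance requirement can be sketched as data-parallel gradient averaging that simply skips dead nodes (a generic sketch of the technique, not Google's actual code):

```python
def average_gradients(worker_gradients):
    """Average gradients from workers that responded; None marks a failed node."""
    alive = [g for g in worker_gradients if g is not None]
    if not alive:
        raise RuntimeError("all workers failed")
    return [sum(parts) / len(alive) for parts in zip(*alive)]

# Three workers computing a 2-parameter gradient; the second has crashed.
grads = [[1.0, -4.0], None, [3.0, 0.0]]
print(average_gradients(grads))  # [2.0, -2.0]
```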
Hinton aims to test a teranode ANN.
AR I like the idea of using ANNs for document
search. It will improve result relevance as much as Google's probability-based
translation improved quality over rule-based translation.
Mark P. Mills
City Journal, July 2013
Edited by Andy Ross
What makes Big Data useful is software. When the first microprocessor was
invented in 1971, software was a $1 billion industry. Software today has
grown to a $350 billion industry. Big Data analytics will grow software to a
multi-trillion dollar industry.
Image data processing lets Facebook
track where and when vacationing is trending. Looking at billions of photos
over weeks or years and correlating them with related data sets (vacation
bookings, air traffic), tangential information (weather, interest rates,
unemployment), or orthogonal information (social or political trends), we
can associate massive data sets and unveil all manner of facts.
Asimov called the idea of using massive data sets to predict human behavior
psychohistory. The bigger the data set, he said, the more predictable the
future. With Big Data analytics, we can see beyond the apparently random
motion of a few thousand molecules of air to see the balloon they are
inside, and beyond that to the bunch of party balloons on a windy day. The
software world has moved from air molecules to weather patterns.
The new era will involve data collected from just about everything. Until now,
given the scale and complexities of commerce, industry, society, and life,
you couldn't measure everything, so you approximated by statistical sampling
and estimation. That era is almost over. Instead of estimating how many cars
are on a road, we will count each and every one in real time as well as
hundreds of related facts about each car.
Big data sets can reveal
trends that tell us what will happen without the need to know why. With
robust correlations, you don't need a theory, you just know. Observational
data can yield enormously predictive tools. The why of many things that we
observe, from entropy to evolution, has eluded physicists and philosophers.
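The predictive-but-unexplained idea reduces to correlation strength, which is easy to demonstrate with a toy Pearson coefficient (the data is the classic spurious ice-cream/drowning pairing, invented here for illustration):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

ice_cream_sales = [20, 35, 50, 70, 90]
drownings = [3, 5, 7, 9, 12]
r = pearson(ice_cream_sales, drownings)
print(round(r, 3))  # near 1.0: highly predictive, with no causal story required
```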
Big data may amplify our ability to make sense of nearly everything in the
world.
The Big Data revolution is propelled by the convergence of
three technology domains: powerful but cheap information engines, ubiquitous
wireless broadband, and smart sensors. Nearly a century ago, the air travel
revolution was enabled by the convergent maturation of powerful combustion
engines, aluminum metallurgy, and the oil industry.
Forecasts show $3 trillion in global information and communications technology
(ICT) infrastructure spending planned for the next decade. This puts Big
Data in the same league as Big Oil, projected to spend $5 trillion over the
same decade. All this is bullish for the future of the global economy.
Big Data Analysis
By Jennifer Ouellette
Quanta, October 9, 2013
Edited by Andy Ross
Since 2005, computing power has grown largely by using multiple cores and
multiple levels of memory. The new architecture is no longer a single CPU
plus RAM and a hard drive. Supercomputers are giving way to distributed data
centers and cloud computing.
These changes prompt a new approach to
big data. Many problems in big data are about managing the movement of data.
Increasingly, the data is distributed across multiple computers in a large
data center or in the cloud. Big data researchers seek to minimize how much
data is moved back and forth from slow memory to fast memory. The new
paradigm is to analyze the data in a distributed way, with each node in a
network performing a small piece of a computation. The partial solutions are
then integrated for the full result.
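The partial-solution paradigm can be sketched with an in-process "cluster" computing a global mean: each node ships only a tiny (sum, count) summary, never its raw data (a minimal sketch, not any particular framework's API):

```python
def node_summary(shard):
    """Local piece of the computation: (sum, count) for one node's data."""
    return (sum(shard), len(shard))

def combine(summaries):
    """Integrate the partial solutions into the full result (global mean)."""
    summaries = list(summaries)
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count

shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # data distributed across three nodes
print(combine(node_summary(s) for s in shards))  # 5.0, the mean of 1..9
```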
MIT physicist Seth Lloyd says
quantum computing could assist big data by searching huge unsorted data
sets. Whereas a classical computer runs with bits (0 or 1), a quantum
computer uses qubits that can be 0 and 1 at the same time, in
superpositions. Lloyd has developed a conceptual prototype for quantum RAM
(Q-RAM) plus a Q-App — "quapp" — targeted to machine learning. He thinks
his system could find patterns within data without actually looking at any
individual records, to preserve the quantum superposition.
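The kind of unsorted search Lloyd has in mind is usually framed as Grover's algorithm, which a classical program can simulate for tiny cases (a sketch of the standard algorithm, not Lloyd's Q-RAM design; for N = 4 items, one Grover iteration finds the marked item with certainty):

```python
import numpy as np

N, marked = 4, 2
state = np.full(N, 1 / np.sqrt(N))  # uniform superposition over N items

# Oracle: flip the sign of the marked item's amplitude.
state[marked] *= -1

# Diffusion: reflect every amplitude about the mean amplitude.
state = 2 * state.mean() - state

probs = state ** 2
print(int(np.argmax(probs)), float(probs[marked]))  # 2 1.0
```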
Caltech physicist Harvey Newman foresees a future for big data that relies on armies
of intelligent agents. Each agent records what is happening locally but
shares the information widely. Billions of agents would form a vast global
distributed intelligent entity.
MIT Technology Review, October 22, 2013
Edited by Andy Ross
Technology companies and government agencies have a shared interest in the
collection and rapid analysis of user data.
The analyzed data can
help solve problems like obesity, climate change, and drunk driving by
steering our behavior. Devices can ping us whenever we are about to do
something stupid, unhealthy, or unsound. This preventive logic is
coercive. The technocrats can neutralize politics by replacing the messy
stuff with data-driven administration.
Privacy is not an end in
itself but a means of realizing an ideal of democratic politics where
citizens are trusted to be more than just suppliers of information to
technocrats. In the future we are sleepwalking into, everything seems to
work but no one knows exactly why or how. Too little privacy can endanger
democracy, but so can too much privacy.
Democracies risk falling
victim to a legal regime of rights that allows citizens to pursue their own
private interests without any reference to the public. When citizens demand
their rights but are unaware of their responsibilities, the political
questions that have defined democratic life over centuries are subsumed into
legal, economic, or administrative domains. A democracy without engaged
citizens might not survive.
The balance between privacy and
transparency needs adjustment in times of rapid technological change. The
balance is a political issue, not to be settled by a combination of
theories, markets, and technologies. Computerization increasingly appears as
a means to adapt an individual to a predetermined, standardized behavior
that aims at maximum compliance with the model patient, consumer, taxpayer,
employee, or citizen.
Big data constrains how we mature politically
and socially. The invisible barbed wire of big data limits our lives to a
comfort zone that we did not choose and that we cannot rebuild or expand.
The more information we reveal about ourselves, the denser but more
invisible this barbed wire becomes. We gradually lose our understanding of
why things happen to us. But we can cut through the barbed wire. Privacy is
the resource that allows us to do that.
Think of privacy in economic
terms. By turning our data into a marketable asset, we can control who has
access to it and we can make money. To ensure a good return on my data
portfolio, I need to ensure that my data is not already available elsewhere.
But my decision to sell my data will impact other people. People who hide
their data will be considered deviants with something to hide. Data sharing
should not be delegated to an electronic agent unless we want to cleanse our
life of its political dimension.
Reducing the privacy problem to the
legal dimension is worthless if the democratic regime needed to implement
our answer unravels. We must link the future of privacy with the future of
democracy.
1 We must politicize the debate about privacy and information sharing.
2 We must learn how to sabotage the system with information boycotts.
3 We need provocative digital services to reawaken our imaginations.
The digital right to privacy is secondary.
The fate of democracy is primary.