If you’ve watched a video or two of mine,
by now, you’ll have recognized that I am a physicist. And not just any kind of physicist. I’m
a particle physicist, which means that I get all excited about quarks and leptons and big,
monster particle accelerators. I mean, this stuff is totally cool. By processing an enormous amount of data taken
by the LHC, my colleagues and I can learn a great deal about the laws that govern the
universe. But “processing the data” is not
something easily accomplished. Doing it requires tremendous computer resources and
that’s the point of today’s video. So how much data are we talking about? Just
how much information is recorded by an LHC experiment? I could make this a short video and just say
twenty petabytes a year. But if you’re not a computer wonk, you probably won’t know
what that means. So let’s get some context. Computer information is stored in a series
of ones and zeros. Eight ones and zeros is called a byte. After that, we use the metric
system to name larger and larger sets of data. A kilobyte is a thousand bytes of information,
a megabyte is a million and a gigabyte is a billion. And a gigabyte is already a lot of information.
A gigabyte is seven minutes of HD-TV. Two gigabytes is the information stored in a shelf
of books 60 feet long. And a standard DVD can hold about 5 gigabytes. However, gigabytes are small potatoes in the
LHC world. A terabyte is a trillion bytes and a petabyte is a quadrillion bytes. In
other words, a petabyte is a million gigabytes. And a petabyte is the most relevant unit for
the LHC data. So how big is a petabyte? Suppose that we represent a single byte by
a floor tile that is half a square meter. That’s a square 70 centimeters on a side
or a little over two feet square for my American viewers.
A kilobyte is then 500 square meters, which
is an eighth of an acre, or half the size of the parcel of land your house
sits on if you’re a typical American suburban homeowner with a quarter-acre lot.
A megabyte is much bigger and corresponds
to the size of the Pentagon if you include the parking lots. That’s half a square
kilometer. It’s also about the size of Vatican City.
A gigabyte is a thousand times bigger still
and is the size of Tulsa, Oklahoma, a fine town if ever there was one, and birthplace
of Route 66. It has an area of about 500 square kilometers.
A terabyte is equivalent to half a million
square kilometers. That’s about the same as the combined area of four U.S. states:
Illinois, home of Fermilab, my favorite laboratory, plus Indiana, Wisconsin and Ohio. If you’d
like to imagine a single country instead, that’s the area of Thailand.
But a petabyte is represented
by half a billion square kilometers and, for that, you need the surface of the entire Earth.
I hope this cements just how big a petabyte
is. If a byte is as big as a floor tile, a petabyte is the surface of the entire planet.
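If you'd like to check the floor-tile arithmetic yourself, here is a minimal sketch. It assumes only what the analogy states: one byte is one tile of half a square meter, and each unit is a thousand times the previous one. The names `TILE_AREA_M2`, `UNITS` and `tile_area_km2` are made up for this illustration.

```python
# Assumption from the analogy: one byte = one floor tile of 0.5 square meters.
TILE_AREA_M2 = 0.5

# Decimal (metric) prefixes, as used in the narration.
UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
}


def tile_area_km2(num_bytes):
    """Area in square kilometers covered by num_bytes tiles."""
    square_meters = num_bytes * TILE_AREA_M2
    return square_meters / 1_000_000  # 1 km^2 = 1,000,000 m^2


if __name__ == "__main__":
    for name, num_bytes in UNITS.items():
        print(f"{name:>9}: {tile_area_km2(num_bytes):,.4f} square kilometers")
```

The petabyte line should come out at half a billion square kilometers, which is roughly the surface area of the Earth.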
And remember that the LHC experiments record lots of petabytes per year, and that is a
ton of data. CERN is ready for this enormous amount of
data. Combined with the Wigner data center in Budapest, Hungary, CERN has available 150
petabytes of disk storage. That’s enough to store over a thousand years of HD movies.
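As a rough sanity check on that movie figure, here is a small sketch that reuses the earlier rule of thumb of seven minutes of HD-TV per gigabyte. The function name `hd_years` and the 365.25-day year are assumptions made for this illustration, not CERN figures.

```python
# Assumption taken from earlier in the talk: 1 gigabyte is about 7 minutes of HD video.
MINUTES_PER_GIGABYTE = 7
GIGABYTES_PER_PETABYTE = 1_000_000
MINUTES_PER_YEAR = 60 * 24 * 365.25  # assumed average year length


def hd_years(petabytes):
    """Approximate years of continuous HD video that fit in the given storage."""
    minutes = petabytes * GIGABYTES_PER_PETABYTE * MINUTES_PER_GIGABYTE
    return minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"150 petabytes is roughly {hd_years(150):,.0f} years of HD video")
```

Under these assumptions the answer lands near two thousand years, comfortably "over a thousand."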
The CERN computing facility can absorb up to 10 gigabytes a second. And each year, the
LHC experiments generate over 50 petabytes of data that is stored to tape. So I’ve just been talking about storage
capacity, but you also need computers to crunch the data. For the CMS experiment, we’re
talking about 100,000 independent CPU cores, spread across the globe in a giant network
called the Grid. The Grid consists of over sixty independent computer centers, distributed
across the world. If you try to run a computer program that analyzes LHC data, the system
scours the world for un-utilized computers and runs your program on the distant computer.
When the computer is finished, it ships back the result to you. If you’re going to be shipping data all
across the world, you need excellent connectivity. You really do need primo networks. To give you a sense of scale, if you needed
to send a petabyte of data from Europe to the US using DSL, it would take ten years.
Even using the network cable connection you might have to your house would take eight
months. However, using the state-of-the-art Transatlantic links that run at 340 billion
bits per second, we can transfer a petabyte of data in under seven hours. That’s smokin’. When you get right down to it, the discoveries
of the LHC rely crucially on computing systems around the world. And that trend will continue.
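The petabyte transfer times quoted a moment ago can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: a petabyte is taken as 10^15 bytes of 8 bits each, protocol overhead is ignored, and the DSL and home-cable speeds are not given in the talk, so the code only backs out the rates that the "ten years" and "eight months" figures would imply. The function names are hypothetical.

```python
# Assumption: 1 petabyte = 10**15 bytes, 8 bits per byte, no protocol overhead.
PETABYTE_BITS = 8 * 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length


def transfer_seconds(bits_per_second):
    """Time to move one petabyte at a steady link rate."""
    return PETABYTE_BITS / bits_per_second


def implied_rate_bps(seconds):
    """Steady link rate that would move one petabyte in the given time."""
    return PETABYTE_BITS / seconds


if __name__ == "__main__":
    # The one rate stated in the talk: 340 billion bits per second.
    hours = transfer_seconds(340e9) / 3600
    print(f"Transatlantic link: about {hours:.1f} hours")

    # Rates implied by the quoted durations (derived here, not stated in the talk).
    dsl = implied_rate_bps(10 * SECONDS_PER_YEAR)
    cable = implied_rate_bps((8 / 12) * SECONDS_PER_YEAR)
    print(f"Ten years implies about {dsl / 1e6:.0f} megabits per second")
    print(f"Eight months implies about {cable / 1e6:.0f} megabits per second")
```

The transatlantic number comes out between six and seven hours, consistent with "under seven hours."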
So the next time you feel the need to swear at the network responsiveness of your home computer,
keep in mind the problems of the LHC computer professionals. Your problems could be way