LHC Computing

If you’ve watched a video or two of mine, by now you’ll have recognized that I am a physicist. And not just any kind of physicist. I’m a particle physicist, which means that I get all excited about quarks and leptons and big, monster particle accelerators. I mean, this stuff is totally cool. By processing an enormous amount of data taken by the LHC, my colleagues and I can learn a great deal about the laws that govern the universe.

That phrase “processing the data” is not something easily accomplished. To do it requires tremendous computer resources, and that’s the point of today’s video.

So how much data are we talking about? Just how much information is recorded by an LHC experiment? I could make this a short video and just say twenty petabytes a year. But if you’re not a computer wonk, you probably won’t know what that means. So let’s get some context.

Computer information is stored in a series of ones and zeros. Eight ones and zeros is called a byte. After that, we use the metric system to name larger and larger sets of data. A kilobyte is a thousand bytes of information, a megabyte is a million, and a gigabyte is a billion. And a gigabyte is already a lot of information. A gigabyte is seven minutes of HD-TV. Two gigabytes is the information stored in a shelf of books 60 feet long. And a standard DVD can hold about 5 gigabytes.

However, gigabytes are small potatoes in the LHC world. A terabyte is a trillion bytes and a petabyte is a quadrillion bytes. In other words, a petabyte is a million gigabytes. And the petabyte is the most relevant unit for LHC data.

So how big is a petabyte? Suppose that we represent a single byte by a floor tile covering half a square meter. That’s a square 70 centimeters on a side, or a little over two feet on a side for my American viewers.
A kilobyte is then 500 square meters, which is an eighth of an acre, or half the size of the parcel of land your house sits on if you’re a typical American suburban homeowner with a quarter-acre lot.
A megabyte is much bigger and corresponds to the size of the Pentagon if you include the parking lots. That’s half a square kilometer. It’s also about the size of Vatican City.
A gigabyte is a thousand times bigger still and is the size of Tulsa, Oklahoma, a fine town if ever there was one, and birthplace of Route 66. It has an area of about 500 square kilometers.
A terabyte is equivalent to half a million square kilometers. That’s about the same as the combined area of four U.S. states: Illinois, home of Fermilab, my favorite laboratory, plus Indiana, Wisconsin, and Ohio. If you’d like to imagine a single country instead, that’s the area of Thailand.
But a petabyte is represented by half a billion square kilometers, and for that you need the surface of the entire Earth.
I hope this cements just how big a petabyte is. If a byte is as big as a floor tile, a petabyte is the surface of the entire planet.
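A few lines of Python make it easy to check the floor-tile arithmetic for yourself; every number here comes straight from the analogy above, and Earth’s surface area of roughly 5.1 × 10⁸ square kilometers is the standard textbook value:

```python
# Check the floor-tile analogy: one byte = one tile of half a square meter.
TILE_M2 = 0.5  # tile area in square meters (one byte)

units = {"kilobyte": 1e3, "megabyte": 1e6, "gigabyte": 1e9,
         "terabyte": 1e12, "petabyte": 1e15}  # bytes per unit

for name, n_bytes in units.items():
    area_km2 = n_bytes * TILE_M2 / 1e6  # convert square meters to square km
    print(f"1 {name} = {area_km2:.3g} km^2 of tiles")

# The petabyte comes out at 5e8 km^2, which is indeed close to
# Earth's total surface area of about 5.1e8 km^2.
```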
And remember that the LHC experiments record lots of petabytes per year, and that is a ton of data.

CERN is ready for this enormous amount of data. Combined with the Wigner data center in Budapest, Hungary, CERN has available 150 petabytes of disk storage. That’s enough to store over a thousand years of HD movies. The CERN computing facility can absorb up to 10 gigabytes a second. And each year, the LHC experiments generate over 50 petabytes of data that is stored to tape.

So I’ve just been talking about storage capacity, but you also need computers to crunch the data. For the CMS experiment, we’re talking about 100,000 independent CPU cores, spread across the globe in a giant network called the Grid. The Grid consists of over sixty independent computer centers, distributed across the world. If you try to run a computer program that analyzes LHC data, the system scours the world for unused computers and runs your program on a distant computer.
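The match-making idea behind the Grid can be sketched in a few lines. This is a toy model only, not the real Grid middleware, and the site names and core counts are invented for illustration:

```python
# Toy sketch of the Grid idea: find an idle computer somewhere in the
# world, run the job there, and return the result. The real Worldwide
# LHC Computing Grid uses dedicated middleware; everything below is a
# simplified stand-in for illustration.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cores: int

def submit(job, sites):
    """Run `job` at the site with the most free cores and return its result."""
    best = max(sites, key=lambda s: s.free_cores)
    if best.free_cores == 0:
        raise RuntimeError("no free cores anywhere on the Grid")
    best.free_cores -= 1      # the core is busy while the job runs...
    try:
        return job()          # ...the job "runs" at the remote site...
    finally:
        best.free_cores += 1  # ...and the core is freed afterward

sites = [Site("Site A", 0), Site("Site B", 120), Site("Site C", 45)]
print(submit(lambda: "analysis histogram", sites))  # runs at Site B
```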
When the computer is finished, it ships the result back to you.

If you’re going to be shipping data all across the world, you need excellent connectivity. You really do need primo networks. To give you a sense of scale, if you needed to send a petabyte of data from Europe to the US using DSL, it would take ten years. Even using the cable connection you might have to your house, it would take eight months. However, using the state-of-the-art transatlantic links that run at 340 billion bits per second, we can transfer a petabyte of data in under seven hours. That’s smokin’.

When you get right down to it, the discoveries of the LHC rely crucially on computing systems around the world. And that trend will continue. So the next time you feel the need to swear at the network responsiveness of your home computer, keep in mind the problems of the LHC computing professionals. Your problems could be way worse.
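The transfer times quoted above are easy to check with a little arithmetic. The 340 billion bits per second comes from the text; the DSL and home-cable speeds are round numbers I back-solved from the quoted times, so treat them as assumptions:

```python
# Time to move one petabyte at various link speeds.
PETABYTE_BITS = 8e15  # one petabyte = 1e15 bytes = 8e15 bits

def transfer_time_s(bits_per_second):
    """Seconds needed to move one petabyte at the given line rate."""
    return PETABYTE_BITS / bits_per_second

links = [("DSL (assumed ~25 Mb/s)", 25e6),
         ("home cable (assumed ~400 Mb/s)", 400e6),
         ("transatlantic link (340 Gb/s)", 340e9)]

for label, speed in links:
    t = transfer_time_s(speed)
    print(f"{label}: {t/86400:.1f} days ({t/3600:.1f} hours)")

# The 340 Gb/s link takes about 6.5 hours -- "under seven hours" indeed --
# while the assumed DSL line takes roughly ten years.
```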


  • Nite Explorer

    so how are all these bytes stored? on lol hard drives? Fermilab videos are low on information and high on tooting their horn, toot, toot $,$, you might as well send all your data to Meca, remember what Obama said Muslim science helped make America

  • MRK

    Would love to learn more about how this data is translated to information that we can interpret and what the process of data scouring is like at the LHC

  • Sean Rhoades

    Nice explanation. Putting this into perspective was very helpful. CERN's network cables must be fiber optics, is my guess.

  • International Space Station

    Not to mention everybody running BOINC on their computers 😛 
    Although it sort of melts my laptop. BUT I'M HELPING THE CAUSE so it's totally worth it. TOTALLY. WORTH. IT. BECAUSE SCIENCE.

  • Astrogirl1usa

    What a great demonstration of just how much a Petabyte is.  Thank you Don!   I love these videos, keep them coming.  🙂

  • Jess Stuart

    The kilo, mega, giga, etc prefixes for bytes are based on powers of 2, not 10 like the metric system.

    1 kB = 2^10 = 1,024
    1 MB = 2^20 = 1,048,576
    1 GB = 2^30 = 1,073,741,824
    1 TB = 2^40 = 1,099,511,627,776
    1 PB = 2^50 = 1,125,899,906,842,624

  • Ben Samuel

    I have never understood these weird analogies to try and explain big numbers. You start off with "if a byte is like a floor tile," and finally show us that a petabyte would be like the Earth's surface. But all you've really explained is the area of a trillion floor tiles.

  • Zbigniew Zdanowicz

    Kilobyte is 1024 bytes, megabyte is 1024 kilobytes, gigabyte is 1024 megabytes, so it is not natural conversion from other units like kilogram is 1000 grams and megagram is 1000 kilograms. I can understand, why this is missed, so general audience can grip on these numbers and units and I like all Fermilab videos.

  • Eclecticer T Perplexer

    Future AI grid, FERMILAB + CERN + Neutrino Research – Look for system analyzer / architect and coder Quinnn Michaels.

    What is Tyler and the Tyler Advanced Intelligence Network?
    The TAIN is a collection of personas that are/were developed for use on the future Ai Grid that is being developed by CERN using the White Rabbit network protocols and hardware.

    At the center of a collider complex is an intelligence architecture/framework/application that manages data through the use of intelligent agents who report data, maintain system stability, and accomplish general computing functions at subnanosecond speeds. …Quinn Micheals #TeamTyler #Tyler

  • The Artificial Society

    A petabyte is really just 2000 home hard drives. Thats a fair amount but not so much that you need to use facilities all over the world. In terms of cost, for that many home computers, your only talking a couple million dollars. Add the inflated pricing of servers and your talking 20 million dollars. Considering how many billions of dollars was spent on the facility, the computer aspects of it seem modest.

  • Võ Thái Sơn

    What is the difference between a physicist and a computer scientist?

    The physicist says 1 kilobyte equal to 1000 bytes
    The computer scientist says 1 kilometer equal to 1024 meters.

  • RME76048

    3:46 — stored to tape? Interesting… I recall from way back when robotic tape libraries (no, not the big 9-track reel-to-reel 6250 BPI tapes, the 3480-style carts). So, are tapes still used for mass storage? What is the cost (cash and time) for storage, retrieval, data redundancy (for errors) and the ultimate slow-down of access for data reduction versus conventional magnetic disk platter systems and SSDs? Just curious.
