Research & High Performance Computing – Computerphile

Traditionally, where people have used compute in their research, they know what to do because they’re experts in the field. A lot of our classical HPC users know exactly what they want. They know how to process their data, they know they need to use Scientific Linux, they can write software to process their data, and they can get their own research questions answered by themselves. Part of Information Services’ job is simply to provide the hardware and the grunt that they need to do that. But increasingly we’re finding that other people want to use compute in their research and don’t really know how; they haven’t got those skills. So part of my role is to link up their research problem with the skills, the software and the analysis tools that they need to answer their questions.

I was a medicinal chemist, which is a type of synthetic chemist who designs and makes new drugs. That covers all the different processes: designing a small molecule for a particular drug target, then doing some computational docking using the HPC to find out, of the million possibilities, which ten I should make and then test to see which might be the cure for cancer, asthma or Alzheimer’s disease, or just the new painkiller.

I started off my training doing a PhD in chemistry, just the simple questions: how do you make this compound, what reagents do you use? But as I progressed in my career I moved into the drug discovery space, where the chemistry is just applied as a technique. That’s just one of the tools, and the interesting and intelligent part is: this is the drug target, so what should I make to interfere with that target to have that biological effect? My interest really moved into using computers to answer those kinds of questions. What do we make? Why do we make it? Where should we direct our efforts?
Traditionally, in old-style research, it would just be somebody sat in an office thinking about a particular problem and then proposing an answer. But as the scope and possibilities increase, with more data available and more possibilities available, it really expands beyond what one person can hold in their head. Research is interdisciplinary, and we use chemists, biologists, engineers and mathematicians, so that even in a traditional subject like wet organic chemistry we now need to use computers to help analyze possibilities, data and questions. So expanding that research space into using computers is becoming increasingly important.

How I got involved in HPC and computing was doing something called docking. If we have a small drug molecule, say aspirin or salbutamol, that we think might be a good molecule for a particular drug target, we can use the computer and ask it a question: does this small drug molecule fit into this receptor protein? How good a fit is it? Where does it fit? What shape is it when it fits? We can use specialist software packages to ask that question of hundreds of thousands or even millions of small molecules. So you prepare a question with the software: how well do these five million drug molecules fit into this receptor?
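A screening question of this kind is a natural fit for a task farm, because every molecule can be scored independently of the others. The following is only a minimal sketch under stated assumptions: `mock_dock_score` is a hypothetical placeholder and not a real docking calculation, `screen` is an illustrative name, and a local thread pool stands in for the cores of a cluster. Each idle worker simply picks up the next molecule from the pool's queue.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def mock_dock_score(molecule: str, receptor: str) -> float:
    """Hypothetical stand-in for a docking score (lower is better).

    A real docking package would do the physics here; this toy function
    only produces a deterministic number so the sketch can run.
    """
    return (sum(ord(ch) for ch in molecule + receptor) % 1000) / 100.0


def screen(molecules, receptor, workers=4, keep=10):
    """Score every molecule against one receptor and keep the best few.

    The executor plays the role of the controller: whenever a worker
    finishes one molecule, it is handed the next one from the queue.
    """
    molecules = list(molecules)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda m: mock_dock_score(m, receptor), molecules))
    # Return the `keep` lowest-scoring (score, molecule) pairs, best first.
    return heapq.nsmallest(keep, zip(scores, molecules))


if __name__ == "__main__":
    library = [f"molecule-{i}" for i in range(1000)]
    for score, name in screen(library, "receptor-A"):
        print(f"{name}: {score:.2f}")
```

Threads are used here only to keep the sketch self-contained; on a real cluster the distribution across nodes is handled by the scheduler, not by the researcher's script.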
I’ll submit that question to the HPC queue, and that’s where the HPC takes over and says: okay, this guy’s got this question, and it’s actually broken down into 25 million sub-questions. It’s the HPC scheduler that then splits that job up across separate nodes and distributes it out to different processor cores. So I could have 2,000 processor cores, each one working on docking one drug molecule into that receptor. When it’s finished with that one, it will tell a master controller that it’s done and it will be allocated the next one to do. So it’s as if the HPC is acting as my research assistant, answering all those millions and millions of questions for me while I’m working in the lab, having a cup of coffee or lunch, or chatting to the boss about what the next question is.

That particular question sounds like a very, very complicated puzzle, doesn’t it?

It sounds like a really complicated puzzle, but the techniques and the software tools for asking that question, how well does that molecule fit into there, are very mature. It’s a very well-known, very well-understood subject. The problem is really the scale. I can’t have enough computing power to dock every single possible molecule into every single drug target, so it’s got to be an intelligent choice. But as computing power is ever increasing, you know, Moore’s law, more processors, more memory, I can ask more of the computer and get it to tell me more information, so I can concentrate on the chemistry-specific knowledge.

I’m imagining researchers from across the university, and certainly across the world, wanting to use computers to answer these questions. Is there one science that uses it more than others?

No, I don’t think there is, really.
I think traditionally users have come from physics, astronomy, chemistry and engineering in particular, but also increasingly from biology, genomics and the life sciences. And we’re now seeing people from the social sciences, humanities and even the arts coming in to start using computers in their research.

And are these high-performance computing systems or clusters becoming the norm? Are people expecting these as part of their research now?

Absolutely, yes. Going back years, what we needed to provide to an academic would be an office, a desk, a green board and some chalk. Now people really expect that high-performance computing facilities are available to them, and the university does that by providing HPC facilities like the one we’re looking at today, but also by renting them from cloud providers like Amazon, Microsoft Azure, Google and others as well.

Thinking of those leaders in the field, Google, Amazon, cloud computing and so on, why would you do it yourself if all those options are available?

There are a number of reasons we might want to do it ourselves. First and foremost, those companies are really good at it, but they’re really good at it because they charge for doing it, so there is a higher cost associated with renting somebody else’s kit. There are times when we need to do it on-site for security reasons. If we’re working on some very sensitive research material, maybe something in the medical field with patient data, we’ll need to guarantee to the funders and to whoever owns that data that it’s very safe and secure by keeping it on-site. Equally, we might be doing something, say with genomics research, where the quantity of data is so vast, terabytes of data per hour, that we need to analyze and process it here on-site, and the costs and the time associated with shipping that data somewhere else to process it, answering those research questions and then bringing the data back again are prohibitive. Equally, at those times where we’re using
remote data, say satellite imagery or data from the Human Genome Project, well, that data already exists off in the cloud. So it makes far more sense there for our researchers to take the question to the data and analyze that data off-site. By offering both, we can hopefully give researchers the ability to choose: this one’s better for me, or that one’s better for me.

Does it ever go horribly wrong, in that you ask a question, you’ve made a mistake, and you’ve wasted hours and hours or days of HPC time?

Absolutely, it’s happened to me more times than I care to mention. One time in particular I was docking small molecules into drug targets, only a couple of hundred thousand. There was some sort of software error and it started churning out an ever-increasing error file. The size of the error file went through 300 gigabytes and blocked the entire system. Everybody else’s jobs failed, and 20 or so people were quite angry with me that I’d just killed all of their research. But that kind of thing does happen in research, and I guess that’s the computational equivalent of the professor having an explosion in the lab and spraying stuff all over the room, which I’ve also done.

Maybe I’m in the physics lab working with Professor Moriarty and I want some computing time. How much is it going to cost me to use that kit?

That’s a good question. At the minute we don’t actually charge our researchers directly for using an HPC facility, so there’s no per-hour charge for using a computer core.

You do know this is going to be available for them all to watch, and then they’re all going to start clamoring after this?

Absolutely. If more people need more resources and we can provide them, we can look to work for them. We want our researchers to be ambitious and to try to push the boundaries, and we can’t do that by unnecessarily restricting access to kit.

If money doesn’t really come into it in that way, how do you decide who does what, and are there fights
that erupt as to who needs the compute power more than who else?

There are vigorous discussions each time we get a new HPC system. As you can see in there, it’s a large HPC: there are so many processor cores, and it can be used for such and such a period of time. There’s always a debate about how much of it anybody is allowed to use at any one time, and how long any one person’s job can go on for. It’s very much like Tetris: how can I fit these variable-width and variable-size blocks in? Is it better for me to use a thousand cores for 10 hours, or 100 cores for many thousands of hours? And how do I fit that workload in? It always causes vigorous discussion and a few disagreements. There’s always moaning that the queues are far too long, but that’s just how it is.

Where does the buck stop in terms of the decision-making? Is it left to the software, or does a human come in and say?

Every time we review the process, a group of humans will sit down and decide: okay, we think it’s fair if everybody’s allowed to run up to a thousand cores for up to four days. We sit down and make those decisions as a research community, and then the software rigorously enforces those limits for us. So you are only allowed up to so many hundreds or thousands of cores, and your job is only allowed to run for a maximum of four days. If it goes over four days, it’s stopped and the next person is given access to that resource.

In a way that could be limiting research, right?

It’s something that the researchers have got to try to fit into their research plans, in the same way that office space and laboratory space can be a limiting factor. It’s a question of how I divide my research question up on the computer: am I better using two thousand cores for a short period of time, or is it the kind of question that is limited to a thousand cores and needs to run for a longer period of time?

Do you need to know a fair bit about computing just to make those decisions, though?
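The job-shaping decision discussed above can be made concrete with a small sketch. It assumes the limits quoted in the interview (up to a thousand cores, up to four days) and, as a simplification, a perfectly parallel workload whose total core-hours stay fixed however it is split. `JobRequest`, `within_limits` and `admissible_shapes` are hypothetical names for illustration; a real scheduler applies its own site-specific policy.

```python
from dataclasses import dataclass

# Limits as quoted in the interview; real sites set their own values.
MAX_CORES = 1000
MAX_HOURS = 4 * 24  # four days of wall-clock time


@dataclass(frozen=True)
class JobRequest:
    cores: int
    hours: float

    @property
    def core_hours(self) -> float:
        """Total compute consumed: the 'area' of the Tetris block."""
        return self.cores * self.hours


def within_limits(job: JobRequest) -> bool:
    """True if the request fits both the core cap and the time cap."""
    return 0 < job.cores <= MAX_CORES and 0 < job.hours <= MAX_HOURS


def admissible_shapes(total_core_hours: float, core_options):
    """List the job shapes that deliver the same work within the limits.

    Assumes ideal scaling: halving the cores doubles the run time.
    """
    shapes = []
    for cores in core_options:
        job = JobRequest(cores, total_core_hours / cores)
        if within_limits(job):
            shapes.append(job)
    return shapes


if __name__ == "__main__":
    # 10,000 core-hours of work, tried at several widths.
    for job in admissible_shapes(10_000, [100, 250, 1000, 2000]):
        print(f"{job.cores} cores for {job.hours:.0f} hours")
```

Under these assumptions the 100-core shape is rejected because it would need 100 hours, which exceeds the four-day cap, and the 2,000-core shape is rejected because it exceeds the core cap.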
Traditionally, yes. I think there has been an expectation that you have to know a lot about how computers work to make those decisions. But as the use of computers in research moves into other areas, part of my role is to try to help people understand how computers think, as computers work in a very different way to a researcher. They’re good at answering the same question for a long period of time, in parallel. So changing from how you would do some calculations in person to how a computer would do it, lots of different things at the same time, is a step change for some people.

The equipment itself is fairly generic. These are standard blade enclosures and the storage is standard storage; we have about 240 terabytes in this. It’s all connected up by InfiniBand.


  • Jesus of suburbia

    Apparently YouTube notifications brought me here before the first like.
    I mean, what is even this timespan, sometimes it pops up a few seconds after the video is out.
    And sometimes it literally only tells me about new video 30 mins later

    And that's how you write a fancy "first" comment

  • Callum Watson

    "equivelant to the professor having an explosion in the lab, …, which i've also done" – Totally inept professor /s

  • Reckless Roges

    2:28 Hello! I'm from the Internet. I'd advise you to explain what something is before announcing its name. #domainSpecificTerms #YouAreOnTheInternet #BOINC

  • Vivid Abstractions

    Would it be possible to simulate such a computing job? From designing a simple arbitrary problem, to implementing it on the cluster and seeing the result?

  • mipmipmipmipmip

    Working in HPC is awesome. You are by definition years ahead of everybody else. The problem is this also means nobody yet understands the problems you run into 🙂

  • Phroggster

    4,000 core days seems like a pretty generous limit, at least until my job gets stuck on a single-core node for 4,000 days.

  • rageagainstthebath

    I wish to know something. I run some experiments on my Raspberry Pi server 24/7 since it's always on, unlike my main PC. I design them so after a week or two I can read a file and see what it came up with so far without interrupting. Is there a possibility for a researcher to run an HPC task for unspecified period of time and see the results in the meantime? And re-run the task with different parameters if the results went in the wrong direction?

  • Jin

    Pretty sure if Brits didn't waste time and energy on things like how they pronounce 'H' we wouldn't have won our independence. "Haych" sounds ridiculously laborious for such a small thing. It's practically two syllables.

  • dossod

    Good job on blurring out the labels on the PCs next to the professor. I've seen amateur people blurring out stuff and ending up screwing up for one frame and revealing the info.

  • Jin

    Well done. You should get into the problem of parallelization a bit more. Non-techies don't come in with already parallel algorithms so getting their research goals to run on 1000 cores can be interesting. And in general, even for the best techies, parallelization can be difficult at times.

  • Karl Young

    Back to the Computerphile we know and love – lots of images of blinking lights and big machines that go bing AND we can hear the interview ??

  • Karl Young

    Would love to hear more about any recent algorithmic development in HPC – after a lot of early hype about ways to get around the surface to volume (data to computation) it sounds from this interview that most problems can still only take advantage of HPC if they are “trivially parallelizable”, i.e. a bunch of completely independent processes that don’t share data, and that HPC in this case is just fancy job distribution book keeping.

  • irregularexpression

    I am surprised he did not talk about checkpointing — HPC jobs dump their state at regular intervals to persistent storage so even when the job crashes or gets killed abruptly, it need not start from scratch; only from the last checkpoint saved.

  • metalpachuramon

    It reminds me of that part in the Shin Godzilla movie where the Japanese found a way to halt Godzilla's cells but had to figure out the appropriate active molecules to dock with Godzilla's receptors, so they asked Germany (if I recall correctly) for unlimited use of their HPCs. I'm glad to see that the writers handled these kinds of details.

  • MasthaX

    I'd love to work as a tech in such an interesting facility. I've got loads of datacenter, computing and linux experience. Gib me job pls :).

  • Seasinator Sead

    I don't know why, but this guy is genuinely the best person I watched talking on YouTube in 2017, plus the bit of 2018 so far.

  • frognik79

    This is the reason why I hate crypto currency mining.
    All those wasted cpu cycles that could have been used for this type of research.
    Money over human life.

  • Noah Jones

    That doesn’t sound like “high performance” so much as it just sounds like “a lot”. How much does he work with scientists to perfectly optimize their programs to run as quickly as it can on the hardware?

  • Adam Hill

    Surely this drug molecule matching he is referring to will just be calculated once and then saved to a database for further reference, hopefully a public database.

  • Klittan Rosé

    In the biology field for testing receptor sites in molecules and cells, I really hope they do not just test for the intended target site, but also at the same time every other site the same molecule could attach to. That way they can find out possible side effect areas beforehand.

    Another problem could be medicine chirality. Test it in a parallel simulation before developing the drug, rather than finding out afterwards that it ALSO attached to an unintended target site somewhere else in the body. Or, as in the case of "the pill", it ends up in frogs or crocodiles and makes them female, thereby destroying whole species because of unintended consequences.

    Test 1: Will it hit target?
    Test 2: What else will it probably hit?
    Test 3: Can we hide our cryptomining operations to the tax payer and server park manager?
    Result: Retire early with billions of bits in the ba, in the vau, on your hard drive.

  • Sina Madani

    HPC and performance evaluation: two things which don't go well at all. HPC is great for researchers outside of Computer Science, but not for software developers.

  • Fiyaaah

    I work as a software engineer in the semiconductor industry and the physicist we work together with knows more about (HP)C than our whole team combined (we only write regular software ourselves). It's quite weird to be outclassed like that on your own area of expertise.

  • bloody_albatross

    OT: Funny, people seem to have real trouble pronouncing azure. I heard Escher, Asia and now ad-sure. All wrong. The German pronunciation (I'm from Austria) is also wrong, but as far as I can tell out of these still the closest (at least to my Austrian ear). Written in phonetic German it should be Azür, I think.

  • krokotube

    Think about what (if) could come out of "daap supercomputers" (off the top of my head, currently the ones in testnets are Golem, SONM, and partly iExec) in terms of scientific research, especially medicine.

  • Max

    I think you should do more videos on parallel programming. It's a very relevant topic at the moment and there's quite a lot to cover, like programming for multiple cores, GPUs, and multiple processors (as in the case of this video). Not to mention processors, threads, scheduling, and all that jazz.

  • MrRobot600

    Damn, this guy is what I dream to be. I'm completing my bachelor's in chemistry right now and just did a summer internship in high performance computing at a national laboratory. Before this, I thought I was somewhat knowledgeable about computer science, but now it feels like I just know about 1%. Found his university bio, but I would be interested to know what education, certifications or training he did in HPC apart from chemistry. I assume initial HPC knowledge could be self-taught, as there's a ton of information online about it, and then forming your own project with HPC.
