App Server Architecture – Web Development


Behind HAProxy we actually get into the meat of Reddit’s infrastructure. We had a bunch of app servers. These are physical machines running Python programs like the ones you’ve been working on in this class. When I left I think we had maybe 20 of these. It wasn’t that many. How many do you guys have now?

>>180.

>>Okay. That’s pretty significant. They went from 20 when Reddit was a big site to 180 when Reddit became a huge site. Now, these are running Python. They’re using a web framework called Pylons that I talked about previously in this unit. Do you guys still use Pylons?

>>Yes.

>>This handled just about every request. I don’t recall if we had special web servers for static content. Do you guys have special web servers now for static content?

>>We use S3.

>>S3 for static content. Oh, yes. We had just transitioned to that at some point before I left. Everything we do is on Amazon AWS. Basically you rent machines from Amazon, and one of the systems in AWS is S3. Maybe you could explain what S3 is and how it works.

>>Sure. S3 is the Simple Storage Service, and it’s basically a distributed file storage thing in “the cloud.” Amazon lets you put objects into buckets, and it’s literally just a key-value store of these files, and other people can hit a URL and grab the object. We store all of our static content like CSS and JavaScript on S3. When you’re hitting the site, you’re actually going via Akamai to S3 instead of our infrastructure.

>>So for static content, a user never even hits HAProxy or the app servers or anything at all.

>>Yes, for the most part. And we can get away with that because the content never changes.

>>Right. So I know in the early days of Reddit, and certainly in the applications we’ve been building in this class, all of the content, static or not, gets served from these app servers. As you grow, there is no reason to waste all these resources handling connections for JavaScript and CSS and images.
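One common way to make “the content never changes” hold in practice is to put a fingerprint of each file’s content into its URL, so a changed file gets a new URL and the old one can be cached indefinitely. The interview does not say Reddit did this; the following is only a minimal sketch of the idea. The host name `static.example.com` and the helper `static_url` are hypothetical.

```python
import hashlib

# Hypothetical static host (assumption); the interview only says that
# static files are served via Akamai from S3.
STATIC_HOST = "https://static.example.com"


def static_url(path: str, content: bytes) -> str:
    """Return a URL for a static asset that changes whenever its content changes."""
    # A short prefix of the SHA-256 digest is enough to tell versions apart here.
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, ext = path.rpartition(".")
    if not dot:
        # No extension: append the fingerprint to the whole name.
        return f"{STATIC_HOST}/{path}.{digest}"
    # Insert the fingerprint before the extension, e.g. reddit.<digest>.css
    return f"{STATIC_HOST}/{stem}.{digest}.{ext}"
```

A template would then call something like `static_url("css/reddit.css", css_bytes)` when rendering the page, so the app server emits the link but never serves the file itself.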
For a while we were using Nginx for the static content. I think we were just in transition when I left.

>>Some of the static was on S3. We just moved all of the CSS and stuff in October. We were using Nginx, and it got to the point where the one Nginx server doing all of the static content couldn’t handle it anymore. When we changed the content, everybody’s caches were invalidated. So we would do a deploy of new code, and there would be this static content change, and all of a sudden that Nginx would get overloaded and everybody would be getting completely unstyled Reddit.

>>Okay, I’ve seen that happen a few times. So Nginx is just a web server. Back in the early days of the internet, almost all the content online was static. It was served by these things called web servers, which basically take HTTP requests, find the file named in the URL (the file name in the URL mapped directly to a file on the web server), and serve it. That was basically almost the entire Internet. Over the last 15 years, the content online has changed from basically being 100% static to almost 100% dynamic. In fact, when I started teaching this course, in Unit 1 I was trying to think of a website that was 100% static, and I couldn’t think of one. Do you have any ideas of one off the top of your head?

>>There has actually been a resurgence of static in the form of websites that get compiled from files, and so there are actually a lot of blogs out there now that are served purely statically but are generated from files dynamically.

>>We’re going to talk about how much content Reddit precomputes as a form of this. We talked in Unit 6 about this notion of caching. You can wait to cache something until it’s needed, or you can cache it ahead of time so that the user never actually touches the database. What you’re getting at is like an extreme form of that.

>>Yeah. You only compile it when you need to, when you change it. So you build up your whole blog, I guess, with static content.
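The “compile it only when you change it” idea can be sketched as a tiny static site build. This is an illustration under assumptions, not any particular blog generator: the `render` and `build` functions, the `.txt` source format (first line is the title), and the `seen` dictionary of content hashes are all hypothetical.

```python
import hashlib
import html
from pathlib import Path


def render(text: str) -> str:
    """Turn a plain-text post into a small HTML page (first line is the title)."""
    title, _, body = text.partition("\n")
    paragraphs = "".join(
        f"<p>{html.escape(p.strip())}</p>"
        for p in body.split("\n\n")
        if p.strip()
    )
    safe_title = html.escape(title)
    return (
        f"<html><head><title>{safe_title}</title></head>"
        f"<body><h1>{safe_title}</h1>{paragraphs}</body></html>"
    )


def build(src_dir: Path, out_dir: Path, seen: dict) -> list:
    """Regenerate HTML only for posts whose source changed since the last build.

    `seen` maps source file names to the content hash used for the last build.
    Returns the names of the source files that were rebuilt.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    rebuilt = []
    for src in sorted(src_dir.glob("*.txt")):
        text = src.read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen.get(src.name) == digest:
            continue  # unchanged since the last build: keep the existing page
        (out_dir / f"{src.stem}.html").write_text(render(text), encoding="utf-8")
        seen[src.name] = digest
        rebuilt.append(src.name)
    return rebuilt
```

The output directory can then be handed to any plain web server or object store, since every request is answered from a file that was computed ahead of time.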
>>The whole thing with static content is that it’s really easy to serve, because it’s the same for everybody. Having static content in S3 or in Nginx, even when Reddit was pretty big when I left, we could get away with one web server there, because Akamai handled most of the load. Akamai is pretty clever about noticing, “This piece of content is about to expire. I’ll go ahead and fetch it in advance,” which keeps the users from all slipping through Akamai and pounding this guy.
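The behavior described here, fetching content shortly before it expires so that users never fall through to the origin server, is often called refresh-ahead caching. The sketch below illustrates the idea only; it is not how Akamai is implemented, and the class `RefreshAheadCache`, its parameters, and the `fetch` callback are hypothetical. It assumes something calls `refresh_expiring()` periodically, such as a background timer.

```python
import time


class RefreshAheadCache:
    """Cache that re-fetches entries shortly before they expire."""

    def __init__(self, fetch, ttl=60.0, refresh_window=10.0, clock=time.monotonic):
        self._fetch = fetch                    # callable: key -> value (the origin)
        self._ttl = ttl                        # seconds an entry stays fresh
        self._refresh_window = refresh_window  # refresh when this close to expiry
        self._clock = clock                    # injectable so it can be faked in tests
        self._entries = {}                     # key -> (value, expires_at)

    def _store(self, key, now):
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def get(self, key):
        """Serve from cache; go to the origin only on a miss or an expired entry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            return self._store(key, now)
        return entry[0]

    def refresh_expiring(self):
        """Re-fetch every entry that is about to expire; return the refreshed keys."""
        now = self._clock()
        refreshed = []
        for key, (_, expires_at) in list(self._entries.items()):
            if expires_at - now <= self._refresh_window:
                self._store(key, now)
                refreshed.append(key)
        return refreshed
```

As long as the periodic refresh runs more often than the refresh window, a popular entry is replaced before it expires, so user requests keep hitting the cache and the origin sees one fetch per entry per TTL instead of a burst of misses.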


  • saxxi

    There's so much difference between "teaching" and "doing", you can tell seeing by these two guys how these two worlds are massively distant

  • Richard Harris

    Would be interesting to see where Reddit are at 4 years later.

    If this course has taught me anything, it's that when you start out, just implement it without view to scaling up, as Reddit did. If your project gets popular you can invest time scaling it out later, but if you worry too much it'll never launch at all.
