Painting the bike shed
I thought I would carry the anti-stealth torch down into our development cave and write a few words about the technology we are using. Charlie asked me to be nice and I promised to try.
Charlie and I share some strong opinions about what we do not want to do in our business, such as being all league of super-secrecy assholes and slinging around NDAs. Similarly, our technical strategy can be defined as much by what we are not going to do as by what we are going to do. For example, we are not interested in re-inventing proven technology, such as writing our own fucking compilers, XML parsers, or web frameworks. No. Please go directly to FAIL. We are not trying to win a Nobel Prize in Web 2.0. Nor are we here to solve difficult academic problems. Academia will have to stagger along without our assistance for just a bit longer.
Make no mistake - our technology is going to be fucking great, but it's not our business; it's in service of our business. It's a vehicle, not a destination. If we do a good job on the tech side, my hope is that most people will not notice or think about our tech. I hope instead that people notice the great content and can focus on what they came to the site to do.
So what are we planning to use? We are going to use a dynamic language to take advantage of rapid development cycles and prototyping. We're using a variant of the battle-proven LAMP stack, which for us means Linux*, Apache, MySQL, and Perl (* the L may actually end up being FreeBSD or OpenSolaris). You can check out some other software components we're considering at ohloh.
"Why aren't you doing it in VisualBasic-on-Vicodin or Super-New-Lang++??" Because those languages suck and they don't scale. Obviously.
Regardless of what technology we choose, in the end it's going to come down to disk I/O, most often around the database. Our performance will likely be determined by how well we avoid touching the disk (through caching strategies) and by reducing resource contention through distributing the load across as many spindles as possible.
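To make the "avoid touching the disk" idea concrete, here is a minimal read-through cache sketch. It is written in Python purely for illustration (the stack described in this post is Perl), and every name in it (ReadThroughCache, load_article_from_db, the TTL values) is hypothetical rather than anything from our codebase.

```python
import time


class ReadThroughCache:
    """Tiny in-memory cache with a time-to-live, sitting in front of a slow lookup."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # called only on a miss (e.g. a database query)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries = {}         # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1         # served from memory, no disk touched
            return entry[1]
        self.misses += 1
        value = self._loader(key)  # the expensive path
        self._entries[key] = (now + self._ttl, value)
        return value


def load_article_from_db(article_id):
    # Hypothetical stand-in for a real database query.
    return {"id": article_id, "title": f"Article {article_id}"}


if __name__ == "__main__":
    cache = ReadThroughCache(load_article_from_db, ttl_seconds=30.0)
    cache.get(1)  # miss: goes to the "database"
    cache.get(1)  # hit: answered from memory
    print(cache.hits, cache.misses)
```

A real deployment would more likely put a shared cache between the app servers and the database instead of a per-process dictionary, but the shape of the logic is the same: check the cache, fall through to the slow path only on a miss, and store the result with an expiry.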
Our architecture is also based on a reliable template - lightweight caching reverse proxies in front, proxying to the heavier app servers, which in turn are driven by the databases. We will aim for a shared-nothing architecture - decouple everything, push the state out to the client, compress, minify, and cache the static content, eventually pushing it out to a CDN.
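As a rough sketch of the "compress and cache the static content" part, the snippet below gzips an asset and builds the response headers that let browsers and proxies cache it. It is Python for illustration only, the function name prepare_static_asset and the one-year max-age are assumptions, and in practice the web server or reverse proxy would normally do this work for you.

```python
import gzip
import hashlib


def prepare_static_asset(body: bytes, max_age_seconds: int = 31536000):
    """Return (headers, compressed_body) for a static asset.

    Assumes the client sent "Accept-Encoding: gzip"; a real server must check
    that before choosing this representation.
    """
    compressed = gzip.compress(body)
    # Fingerprint of the uncompressed content, used as a validator so that
    # caches can revalidate cheaply instead of re-downloading.
    fingerprint = hashlib.sha256(body).hexdigest()[:16]
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "ETag": f'"{fingerprint}"',
        "Vary": "Accept-Encoding",
    }
    return headers, compressed


if __name__ == "__main__":
    css = b"body { color: red; }\n" * 200
    headers, payload = prepare_static_asset(css)
    print(len(css), "bytes before,", len(payload), "bytes after")
    print(headers["Cache-Control"])
```

Long cache lifetimes like this only work if the asset URL changes whenever the content changes (for example, by putting a version or fingerprint in the file name), which is also what makes the eventual move to a CDN painless.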
Far more important than our choice of technology is how we execute on the following:
- Data Model - the design of the database will be a major factor as will the degree to which we de-normalize and duplicate data for performance gains.
- Algorithmic approaches to optimization - one of my favorite hobby horses. Optimization is not only about tuning the code that is the biggest bottleneck. You can spend an hour optimizing a loop iteration for a 25% speed increase (which, don't get me wrong, can be significant), or you can realize that instead of looping over an array and performing slow_ass_function() on every item, you can skip the loop entirely if condition x is met and get a 200% performance increase. Code optimizations, while important, generally lead to modest performance increases, while smart algorithmic optimizations often achieve order-of-magnitude performance increases.
- Caching at every level - at the browser, static content on the server side, database, and in the code.
- Efficient content delivery - compression, minification, and eventually CDN.
- Cost-effective content delivery - we won't come out of the gate with F5s blazing (as much as I love them and can't wait until we legitimately need them). We will conserve our cash and ramp up more gradually by starting out with software-based load balancing.
- Testing and coverage - code quality should be more quantified than "uh it works for me."
- Development process - if our development process doesn't scale, we'll be just as screwed as if our architecture didn't scale. Communication, avoiding regression, measurability, and low-friction collaboration are very important. Maintainability is also crucial - keeping the codebase under control and trying to avoid trading instant (coding or feature) gratification for technical debt.
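The "skip the loop if condition x is met" point from the list above can be sketched as follows. This is Python for illustration only, and it assumes one particular condition x: the input data carries a version number, and nothing needs recomputing if the version has not changed. The Report class and its names are hypothetical.

```python
_UNSET = object()  # sentinel meaning "nothing has been built yet"


class Report:
    """Runs a slow per-item function only when the input has actually changed."""

    def __init__(self, slow_function):
        self._slow_function = slow_function
        self._last_version = _UNSET
        self._last_result = None
        self.slow_calls = 0  # counts how often the expensive function ran

    def build(self, items, version):
        # Condition x: same data version as last time, so skip the loop entirely.
        if version == self._last_version:
            return self._last_result
        result = []
        for item in items:
            self.slow_calls += 1
            result.append(self._slow_function(item))
        self._last_version = version
        self._last_result = result
        return result


if __name__ == "__main__":
    report = Report(lambda n: n * n)   # stand-in for slow_ass_function()
    report.build([1, 2, 3], version=1)
    report.build([1, 2, 3], version=1)  # no work done the second time
    print(report.slow_calls)
```

No amount of tuning the body of the loop can beat not running it at all, which is the whole argument for looking at the algorithm before reaching for the profiler.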
Choosing a particular technology is certainly a very important decision, but I feel like it's a bit of a "color of the bike shed" question - it is just one component in a much larger strategy.
So back-seat drivers: welcome aboard! Exits are located here and here. Please argue among yourselves and once you all decide what color we should paint the bike shed, please let us know.
I can't believe you're not building it with Rails!
jk ;-)
dude i just tried one of the exits and reached the end of the interntets WTF.
great post, keep em coming
Go directly to FAIL...hhahaha you win.
I see you are having trouble finding your blogging voice. On request: try to increase the number of references to your past employer so as to make your blog even more enjoyable for me!
Peter, any resemblance to actual businesses living or dead, is entirely coincidental.