I recently drew up a diagram of how our Bugzilla site is set up, mostly for the benefit of other sysadmins trying to find its various pieces. Several folks expressed interest in sharing it with the community as an example of how we’re set up, so I cleaned it up a little, and here it is:
At first glance it looks somewhat excessive for just a Bugzilla, but the Mozilla Project lives and dies by the content of this site. Pretty much all work stops if it’s down, so it’s one of our highest-priority sites to keep operating at all times for developer support. The hardware actually required to run the site at full capacity for the number of users hitting it is a little less than half of what’s shown in the diagram.
We have the entire site set up in two different datacenters (SJC1 is our San Jose datacenter, PHX1 is our Phoenix datacenter). Because the load balancers take care of the cross-datacenter connections for the master databases, it’s actually possible to run it from both sites concurrently to split the load. But given the amount of traffic Bugzilla sends to the master databases, and the latency in connection setup over that distance, it’s a little slow from whichever datacenter isn’t currently hosting the master, so we’ve been trying to keep DNS pointed at just one of them to keep it speedy.
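To give a rough feel for why the remote datacenter feels slower, here’s a back-of-the-envelope sketch. The function name and every number in it are made up for illustration (I haven’t measured query counts or round-trip times on our site); the only point is that a per-round-trip delay gets multiplied by however many trips a page makes to the master.

```python
def page_time_ms(queries_to_master, round_trips_per_query, rtt_ms, local_work_ms):
    """Estimate page time as local work plus time spent waiting on master round trips."""
    return local_work_ms + queries_to_master * round_trips_per_query * rtt_ms


# Hypothetical inputs, NOT measurements: 20 queries per page, 2 round trips
# each (connection setup plus the query itself), 150 ms of local work.
same_dc = page_time_ms(queries_to_master=20, round_trips_per_query=2,
                       rtt_ms=0.5, local_work_ms=150)
cross_dc = page_time_ms(queries_to_master=20, round_trips_per_query=2,
                        rtt_ms=20.0, local_work_ms=150)

print(f"same datacenter:  {same_dc:.0f} ms")
print(f"cross datacenter: {cross_dc:.0f} ms")
```

With those invented inputs the model gives 170 ms locally versus 950 ms across datacenters, which is the kind of gap that makes pinning DNS to the master’s side worthwhile.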
This still works great as a hot failover, though, which got tested in action this past Sunday when we had a system board failure on the master database server in Phoenix. Failing the entire site over to San Jose took only minutes, and the tech from HP showed up to swap the system board 4 hours later. The fun part was that I had finished setting up this hot failover only about a week prior, so the timing couldn’t have been any better for that system board failure. If it had happened any sooner we might have been down for a long time waiting for the server to get fixed.
When everything is operational, we’re trying to keep it primarily hosted in Phoenix. As you can see in the diagram, the database servers in Phoenix are using solid-state disks for the database storage. The speedup on large queries compared to traditional spinning disks is just amazing. I haven’t done any actual timing to get hard numbers on that, but the difference is large enough that you can easily notice it just from using the site.
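If someone did want hard numbers, a minimal timing harness might look something like the sketch below. It’s only an assumption about how one could measure this, not something I’ve run against our servers: `time_query` and `fake_query` are hypothetical names, and the stand-in workload would need to be replaced with the same large query executed against an SSD-backed server and a spinning-disk one.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Call run_query several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_query():
    """Stand-in workload; a real test would run an actual database query here."""
    sum(i * i for i in range(100_000))


print(f"median: {time_query(fake_query):.4f} s")
```

Taking the median over a few runs keeps one slow outlier (a cold cache, say) from skewing the comparison.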