Drupal Performance White Paper - Drupal and the LAMP stack
Drupal is a scalable, flexible, open source content management system built to run on a variety of server architectures. The only hard requirement is that PHP runs on your system. You can run Linux, Windows, Mac OS X, etc., along with Apache, IIS, nginx, MariaDB, MySQL, PostgreSQL, etc., if you're willing to do a few extra things.
However, the overwhelming majority of Drupal websites use the most popular LAMP stack on the backend: Linux, Apache, MySQL and PHP. This white paper (which is a living document – I'll be updating it as time progresses) provides my thoughts on performance considerations for Drupal on a LAMP stack, but this information can be used for pretty much any system on any server, if you look at the basic principles.
- Front End Performance
- Drupal Performance
- Apache Performance
- PHP Performance
- MySQL Performance
- Linux/Server Tuning
- Disaster/Data Recovery
- Other Tuning/Expanding Horizontally
Front End Performance
HTML, CSS, and JavaScript rule the web; almost everything worth doing on the web involves these three languages at some point. There are a ton of things you can do to speed up your website simply by looking at the code your website generates and sends to the end user. In fact, you should do this before even thinking about tuning the LAMP stack.
There are a few tools you can use to measure the general performance of your site's front-end:
- YSlow for Firebug (for Firefox) - a great performance measurement tool
- Google Page Speed (for Firefox) - another in-browser performance measurement tool
- GTMetrix - online page speed analysis and load time waterfall display
All of these tools will show you how long it takes to load your page, and give suggestions for how you can improve the load time of the front end of your website.
You'll want to run tests at different times, and from as many different locations as you can (especially if your website is used by a geographically-diverse population!).
Optimizing CSS and JS
In terms of practical performance benefits, the most important step is usually minimizing the number of files transferred between your server and a user's computer. Many sites serve 20+ files to a first-time visitor; it would be better if there were only a few, because each file adds request overhead and delays the point at which everything is downloaded. Drupal's built-in CSS and JavaScript aggregation settings (on the Performance page) combine many of these files into a few.
Many modern themes for websites have images included via the site's CSS files as backgrounds, icons, etc. If this is the case, Drupal's CSS Embedded Images module is extremely helpful (in tandem with the CSS Aggregation setting). It will embed the image data in your site's CSS files, so the browser doesn't have to download all the extra images separately.
If you don't want to use the CSS Embedded Images module, another great technique is using CSS image sprites. I don't have adequate space to describe how to make them here, but I offer A List Apart's excellent article, "CSS Sprites: Image Slicing's Kiss of Death."
For other images you use on your website (like those embedded in a page or a node, or uploaded using the FileField or ImageField module), there are more optimizations you can do to make sure no extraneous data is being downloaded by your site's visitors:
- Yahoo's Smush.it tool allows you to squeeze the fat out of image files (JPEGs, GIFs, and PNG files), and make sure no extra bytes are left on a file as a result of a bad compression technique in your image editor.
- You can do a similar thing on your computer for PNG images with ImageOptim (for Mac), OptiPNG (*nix), or PNGOUTWin (for Windows).
- Drupal has an amazing module, Image Resize Filter, which compares an image's actual dimensions to the size at which it's displayed in a node/block/wherever, and creates a smaller-resolution file if the image has been resized by the user (ever loaded a site and waited a minute for a tiny graphic to load because it's actually a 5 MB JPEG?).
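For bulk optimization, the command line works well too. Here's a quick sketch (the directory is a stand-in; in practice you'd point this at your site's real files directory, and optipng/jpegoptim must be installed for the recompression step):

```shell
# Create a stand-in for a Drupal files directory (use your real
# sites/default/files path in practice).
FILES_DIR=$(mktemp -d)
touch "$FILES_DIR/photo.jpg" "$FILES_DIR/icon.png" "$FILES_DIR/notes.txt"

# Count the image files an optimizer pass would touch.
IMAGE_COUNT=$(find "$FILES_DIR" \( -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \) | wc -l | tr -d ' ')
echo "images to optimize: $IMAGE_COUNT"

# With the tools installed, a lossless recompression pass looks like:
#   find "$FILES_DIR" -name '*.png' -exec optipng -quiet {} \;
#   find "$FILES_DIR" \( -name '*.jpg' -o -name '*.jpeg' \) -exec jpegoptim --strip-all {} \;
```

Running this as a periodic cron job keeps uploaded images lean without any user intervention.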
CDNs, More Concurrent Downloads
A simple way to speed up the front-end performance of your site (besides simply limiting the number of files that need to be downloaded to make your theme look nice) is to make a faux-CDN using something like Drupal's CDN module or some other technique.
This will allow browsers to download more resources at a time (browsers typically only download 2-4 files concurrently from a single domain, like example.com). If you configure your server correctly, you can set it so that file resources and such don't have any associated cookies, and could even be served on a different server like lighttpd or nginx.
You can also employ a 'real' CDN or a separate server (many people use Amazon's cloud services) to offload files from your own server and have them served straight to your site's users. This lets your server concentrate on building pages and serving dynamic data; images, stylesheets, and scripts change infrequently and can be served off-site so a web browser can load a full page faster.
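As a sketch, a cookie-free static domain can be set up with a dedicated Apache virtual host (the domain and paths here are hypothetical, and mod_headers/mod_expires must be enabled):

```
<VirtualHost *:80>
    ServerName static.example.com
    DocumentRoot /var/www/example/sites/default/files

    # Never send cookies from this host, and let browsers cache aggressively.
    Header unset Set-Cookie
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
</VirtualHost>
```

As long as this domain is never used for the main site, requests for static files carry no cookie payload in either direction.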
Drupal Performance
One of the first things you should do on your Drupal site (or any site with a modular architecture) is re-evaluate whether you need all the modules/plugins/theme features you have installed. Is the site pretty much set in stone, design-wise? Do you still need Views UI, theme helpers, etc. clogging up page loads and caches?
Most Drupal websites could gain quite a bit of speed by simply cutting the modules in use by half. And, typically, users will be happier, especially if some of the superfluous features were slowing down their browsing! Modules like Statistics and Watchdog (database logging), while nice to have, are database performance hogs. Always check whether there's an easier or better way to achieve what a particular module does. For example, rather than enabling Statistics, use Google Analytics or another stat tracker to see how many people visit your site.
Drupal Performance Measurement
To get an understanding of what pages take longer to load than others, what particular database queries take a long time, or to keep a log/history of page performance over time, look no further than the robust Devel module. I have it installed on all my sites, though I disable it when I'm not using the development tools or performance monitoring actively.
The Views module (which I use on most every site I've ever built, because it's so darn versatile, and fairly optimized out of the box) has a built-in performance monitoring tool—if you preview your view in Views' edit mode, you can see the time it took to get the database results, and the time it took Views to render that data, below the preview of your Views output.
The Drupal handbook has more server tuning considerations to read, but I'll be covering many of those considerations in later parts of this white paper (especially server/software tuning).
Caching is Your Best Ally
Using any content management system means you have content stored in some sort of database that must be retrieved and displayed to the end user. This means there are many steps involved, all of which can be sped up to deliver better site performance. Drupal has to connect to the database. The database has to retrieve the data, and send it to PHP. PHP/Drupal has to process this data, make it look nice, then present the results to Apache. Apache sends the data to the end user.
On every single level of your website/server, you can avoid the overhead of retrieving, processing, and sending data by using different caching mechanisms. Caching basically means storing the result of some previous operation so that the operation doesn't need to be run again to get the same data. And it helps. A lot.
At the most basic level, turn on Drupal's built-in page cache on your site's Performance settings page. If you can avoid modules that don't work with Aggressive caching, turn on Aggressive caching; Normal caching will help, too. When an anonymous user visits a page on your site, Drupal then only needs to retrieve the already-rendered page from a special cache, rather than go through the whole process of building the page again.
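These settings can also be forced in settings.php via the $conf array. A sketch, assuming Drupal 6-era variable names (verify against your Drupal version):

```
// settings.php overrides -- a sketch using Drupal 6-era variable names.
$conf['cache'] = 1;            // enable the anonymous page cache
$conf['block_cache'] = 1;      // cache block output
$conf['preprocess_css'] = 1;   // aggregate CSS files
$conf['preprocess_js'] = 1;    // aggregate JS files
```

Settings forced this way can't be accidentally toggled off through the admin UI, which is handy on production sites.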
Stepping it up a notch: if your site's traffic is mostly anonymous, you can give your site hundreds or thousands of times more capacity by simply putting a cache system like Varnish or Boost in front of it. These systems don't require Drupal, PHP, or MySQL to do any work at all for a cache hit, because the rendered page is stored outside the database (Boost saves static files to disk; Varnish keeps pages in memory) and served directly to the end user.
I use Boost on most of my sites, since it works even on shared hosting—you simply add a bit of code to your website's .htaccess file (more on that later), then configure Boost to save pages to your server's filesystem. Boost then helps Apache serve those files directly rather than call on Drupal to build the page, saving your server from doing a TON of work.
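Conceptually, the rules Boost adds to .htaccess look like the following mod_rewrite sketch (illustrative only, not Boost's literal ruleset; the cache path is hypothetical):

```
# Sketch: if a cached copy of the requested page exists, serve it directly;
# otherwise fall through to Drupal's index.php.
RewriteEngine On
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/cache/%{HTTP_HOST}%{REQUEST_URI}_.html -f
RewriteRule .* cache/%{HTTP_HOST}%{REQUEST_URI}_.html [L]
```

The cookie check is the key trick: logged-in users bypass the cache and get freshly built pages, while anonymous visitors get a static file Apache can serve in microseconds.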
If you want to get started with Varnish, there are a lot of great tutorials out there, but one of the latest and most complete is this presentation from Tree House Agency, Coat Your Website in Varnish.
Going a little deeper, we can do a few things to speed up our database access by using caching mechanisms like memcached, or something supported by the Cache Router module. These modules tie into server-side caching systems that store frequently-accessed pages and database results in your server's memory, meaning the server can get this information very quickly, even for logged-in users. I'll talk a little more about the server-side use of these tools later.
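With the Memcache module installed, wiring Drupal's cache to memcached takes only a few lines in settings.php. A sketch, assuming the Drupal 6-era Memcache module layout and memcached on its default port:

```
// settings.php sketch -- paths assume the memcache module lives in
// sites/all/modules/memcache and memcached listens on 127.0.0.1:11211.
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
```

On a multi-server setup you'd list each memcached host in the memcache_servers array so the cache is shared across web heads.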
@todo - Use Views, Blocks, etc. caching mechanisms.
@todo - Pressflow
Three Drupal features that are used on most every site stand out as easy candidates for using external services or integrations instead of the built-in solution:
- Use Google Analytics (mentioned above) or another stat tracker instead of Statistics/Tracker modules.
- Use Syslog instead of Watchdog/DBLog.
- Use Apache Solr Search (or Google Custom Search) instead of the built-in search. (Note: Midwestern Mac, LLC offers very inexpensive hosted Solr search!)
Apache Performance
@todo - .htaccess rules - ETags, etc.
@todo - httpd.conf tuning - MaxClients, MaxRequestsPerChild
@todo - Alternatives to Apache - nginx, lighttpd
PHP Performance
@todo - APC opcode cache, etc.
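Until those sections are fleshed out, here's one quick win worth noting: turn off ETags and set far-future Expires headers for static assets. A .htaccess sketch (mod_expires and mod_headers must be enabled):

```
# Disable ETags and set long Expires headers for static assets.
FileETag None
<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresByType image/png  "access plus 2 weeks"
  ExpiresByType image/jpeg "access plus 2 weeks"
  ExpiresByType text/css   "access plus 2 weeks"
  ExpiresByType application/x-javascript "access plus 2 weeks"
</IfModule>
<IfModule mod_headers.c>
  Header unset ETag
</IfModule>
```

This lets returning visitors skip re-requesting unchanged files entirely, instead of making conditional requests for every asset.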
MySQL Performance
MySQL gets trashed a lot these days because people think NoSQL-style data storage is blazingly fast (comparatively) and MySQL is old school. MySQL doesn't have to be slow, though, and many millions of users are quite satisfied with its speed. There are a few things you can (and should) do to speed up database access before jumping on the hip and cool NoSQL bandwagon; they might save you a ton of time and hassle.
Tune your queries
If you do nothing else, simply monitoring query performance and tuning your individual queries will get you very far, especially on lower-traffic sites. Often, changing the structure of a query can radically speed it up. MySQL's EXPLAIN feature and the slow query log are very helpful here.
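As a sketch, the slow query log can be enabled in my.cnf like this (option names are standard in MySQL 5.1+; older versions use log-slow-queries instead, and the file path here is an assumption):

```
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 1        # log queries that take more than 1 second
```

Once it's running, review the log periodically and run EXPLAIN on the worst offenders to see whether they're using indexes or scanning whole tables.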
Additionally, while developing, enable the Devel module's query log so you can see a list of every query and the time the query took to complete at the bottom of every page on the site. This is a quick and easy way to see from where queries are coming, and how long they're taking. It also helps to see if there are a lot of duplicate queries on a page—that's one place a query cache comes in handy.
Use Indexes Properly
More to come...
Use Master-Master Replication
More to come...
Eliminate Disk I/O Contention
Most of the slowness associated with SQL queries (especially those with complex table joins that create temporary tables requiring writes to disk) comes from the relative slowness of hard disk I/O. Hard drives are thousands of times slower at reading and writing data than RAM. There are a few things you can do to greatly speed up disk access (besides simply caching things in RAM or using memcached... you have to hit the hard drive sometimes!):
- Move database files to another drive on your system.
- Use an SSD (either as a dedicated database drive, or as your system's main drive if you can only afford one; make sure you back it up or use a RAID array, though, since SSDs fail too). Solid-state drives are still slower than RAM, but they're eons faster than hard drives, especially for random reads and writes, which is a huge boon for database access!
- Use a RAID array instead of a single disk. RAID 0 or RAID 10 will give you a much faster disk system than a single drive. If you use a non-mirrored array (like RAID 0, which I don't really recommend), make sure you back up or use master-master replication so you always have a hot copy of the database in case of failure. Hard drives fail often, and fail hard.
Know Your Table Structures
One time, when I was importing a huge dataset into a new Drupal site (on a server with a super-fast SSD), I noticed that switching the tables from InnoDB to MyISAM sped things up dramatically. It turns out that InnoDB logs every row it writes to disk, and does other row-level and transaction-level accounting that slows bulk writes down considerably.
When you're just adding data here and there (like normal site operation), that's a good thing, and row-level locking is one of the nice advantages of using InnoDB tables. However, when importing huge amounts of data, you might want to consider following some of these InnoDB Performance Tuning Tips suggested by MySQL.
Tune your my.cnf file
Find your system's my.cnf file, and tune the following settings (others may need tweaking as well, depending on your circumstances) for Drupal:
- key_buffer_size (for MyISAM index cache)
- query_cache_size (usually good to cache queries for high-read environments)
- innodb_log_file_size / innodb_log_buffer_size
- sort_buffer_size, join_buffer_size, read_buffer_size
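A starting-point sketch for those settings (the values are illustrative for a modest dedicated server, not recommendations; measure with your own workload and adjust):

```
[mysqld]
key_buffer_size        = 128M   # MyISAM index cache
query_cache_size       = 32M    # cache SELECT results in read-heavy environments
innodb_log_file_size   = 64M
innodb_log_buffer_size = 8M
sort_buffer_size       = 2M     # per-connection buffers; keep these modest
join_buffer_size       = 2M
read_buffer_size       = 1M
```

Remember that the per-connection buffers are allocated for each connection, so multiply them by your maximum connection count when budgeting RAM.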
Use Memcached
Memcached is a blazing-fast in-memory caching backend that can be used to cache database results and avoid expensive, slow database queries. There's a Drupal Memcache module, and it's not too hard to set up. It's debatable whether it helps much when run on the same hardware as your database server, because of the overhead of the network connections between memcached and PHP, but it's definitely killer for multiple-server setups.
Linux/Server Tuning
@todo - SSD drives, increased RAM, RAID arrays
@todo - Processor, network interface
@todo - Munin as an essential monitoring tool (alternatives?)
Disaster/Data Recovery
What's the slowest kind of website? One that doesn't load.
It happens to everyone at some point or another—no matter how many redundant servers and failover plans you have, no matter how many data centers you're housed inside, no matter what CDN you use, you're going to have your website go down or your data become corrupted at some point.
It's a good idea to make a great data/disaster recovery plan, and practice implementing it. This is universal to all IT operations—but is especially important in web development, because many websites translate minutes of downtime or lost data into hundreds or thousands of lost dollars or mindshare.
This could actually be the first line-item in your list of critical issues to tackle for any important website. But this white paper is more about performance, and less about stability, backup, etc. I'll still write a little bit about some of my favorite tools and methods for backup and recovery in Drupal; some I use on almost every website, others on the more mission-critical sites.
@todo - Backup and Migrate module
@todo - Shell script backups, cPanel backups, drush sql-sync
@todo - Multiple data locations, multiple backups, non-electronic backups (EMP recovery)
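As a sketch of the shell-script approach (the site path, database name, and credentials are all hypothetical, and the mysqldump line is commented out so the script is safe to run as-is):

```shell
# Nightly backup sketch: dump the database and archive the files directory.
SITE_DIR="${SITE_DIR:-$(mktemp -d)}"      # stand-in for your Drupal docroot
BACKUP_DIR="${BACKUP_DIR:-$(mktemp -d)}"  # where backups are stored
STAMP=$(date +%Y-%m-%d)

# 1. Dump the database (uncomment and fill in real credentials):
#    mysqldump -u backup_user -p'secret' drupal_db | gzip > "$BACKUP_DIR/db-$STAMP.sql.gz"

# 2. Archive the site's files.
tar -czf "$BACKUP_DIR/files-$STAMP.tar.gz" -C "$SITE_DIR" .
echo "backup written: $BACKUP_DIR/files-$STAMP.tar.gz"
```

Run something like this from cron nightly, and copy the resulting archives to a second machine or off-site storage; a backup that lives only on the server it's protecting isn't much of a backup.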
Other Tuning/Expanding Horizontally
Operating in the Cloud
@todo - Amazon EC2, S3, CDNs, etc.
@todo - Hadoop, memcached clusters, MySQL clusters, etc.
For more, see other articles on Life is a Prayer.com, and be sure to check out Jeff Geerling's blog.