Drupal Performance White Paper - Drupal and the LAMP/LEMP stack
Drupal is a scalable, flexible, and open source content management system that is built to run on a variety of server architectures. The only real requirement is that PHP runs on your system. You can run Linux, Windows, Mac OS X, etc., along with Apache, IIS, nginx, MariaDB, MySQL, PostgreSQL, etc. if you're willing to do a few extra things.
However, the overwhelming majority of Drupal websites use the most popular LAMP stack on the backend: Linux, Apache, MySQL and PHP, or the 'LEMP' variation, with Nginx instead of Apache. This white paper (which is a living document – I'll be updating it as time progresses) provides my thoughts on performance considerations for Drupal on a LAMP stack, but this information can be used for pretty much any system on any server, if you look at the basic principles.
- Front End Performance
- Drupal Performance
- Apache Performance
- PHP Performance
- MySQL Performance
- Linux/Server Tuning
- Disaster/Data Recovery
- Other Tuning/Expanding Horizontally
HTML, CSS, and JavaScript: these three technologies rule the web. Almost everything worth doing on the web involves these three languages at some point. There are a ton of things you can do to speed up your website by simply looking at the code generated by your website that reaches the end user. In fact, you should do this before even thinking about looking at the LAMP stack.
There are a few tools you can use to measure the general performance of your site's front-end:
- YSlow - great baseline front-end page load benchmarking tool
- Google PageSpeed Insights - great baseline front-end page load benchmarking tool
- GTMetrix - online page speed analysis and load time waterfall display using PageSpeed and YSlow
- WebPagetest.org - like GTMetrix, but on steroids with tons more features
- Pingdom Website Speed Test - Like GTMetrix and WebPagetest.
All of these tools will show you how long it takes to load your page, and give suggestions for how you can improve the load time of the front end of your website.
You'll want to run tests at different times, and from as many different locations as you can (especially if your website is used by a geographically-diverse population).
Optimizing CSS and JS
In terms of practical performance benefits, it's usually most important to make sure as few files as possible are transferred between your server and a user's computer. Many sites serve 20+ files to a first-time visitor. It would be better if there were only a few files at most, because each file means more overhead and delay before all the files are downloaded.
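To make the idea of serving fewer files concrete, here's a minimal sketch of stylesheet aggregation in Python. It is only an illustration of the general technique, not how Drupal implements its own aggregation; the function name, the file names, and the naming scheme for the combined file are all assumptions of mine.

```python
import hashlib


def aggregate(files):
    """Combine several stylesheets into one file.

    `files` is a list of (name, content) pairs in the order they should load.
    Returns (combined_filename, combined_content). The filename includes a
    hash of the content, so it only changes when a stylesheet changes.
    """
    combined = "\n".join(
        f"/* {name} */\n{content}" for name, content in files
    )
    digest = hashlib.sha256(combined.encode("utf-8")).hexdigest()[:12]
    return f"css_{digest}.css", combined


# Hypothetical theme stylesheets: three requests become one.
sheets = [
    ("reset.css", "body { margin: 0; }"),
    ("layout.css", "#page { width: 960px; }"),
    ("colors.css", "a { color: #06c; }"),
]
filename, css = aggregate(sheets)
print(filename)
print(css)
```

Putting a content hash in the file name means browsers can cache the combined file for a long time, and they fetch a new copy only when one of the source stylesheets changes.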
Many modern themes for websites have images included via the site's CSS files as backgrounds, icons, etc. If this is the case, Drupal's CSS Embedded Images module is extremely helpful (in tandem with the CSS Aggregation setting). It will embed the image data in your site's CSS files, so the browser doesn't have to download all the extra images separately.
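The embedding technique described above boils down to replacing a url(...) reference with a base64 data: URI. Here's a rough Python sketch of that idea, not the module's actual code; the helper names and the logo.png reference are made up for illustration.

```python
import base64


def to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data: URI usable inside url(...)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def embed_in_css(css, images):
    """Replace url(name) references with inline data URIs.

    `images` maps the file name as written in the CSS to that file's bytes.
    """
    for name, data in images.items():
        css = css.replace(f"url({name})", f"url({to_data_uri(data)})")
    return css


css = ".logo { background: url(logo.png); }"
# The 8-byte PNG signature stands in for a real image file here.
fake_png = b"\x89PNG\r\n\x1a\n"
print(embed_in_css(css, {"logo.png": fake_png}))
```

Keep in mind that base64 encoding makes the image data about a third larger, so this trade is usually only worth it for small icons and backgrounds.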
If you don't want to use the CSS Embedded Images module, another great technique is using CSS image sprites or SVGs instead of having multiple JPG/PNG/GIF resources. I don't have adequate space to describe how to make image sprites or deal with SVGs here, but a few Google searches should get you up to speed.
For other images you use on your website (like those embedded in a page or a node, or uploaded using the FileField or ImageField module), there are more optimizations you can do to make sure no extraneous data is being downloaded by your site's visitors:
- Yahoo's Smush.it tool allows you to squeeze the fat out of image files (JPEGs, GIFs, and PNG files), and make sure no extra bytes are left on a file as a result of a bad compression technique in your image editor.
- You can do a similar thing on your computer for PNG images with ImageOptim (for Mac), OptiPNG (*nix), or PNGOUTWin (for Windows).
- The Image Resize Filter module compares an image's real size with the size at which it's displayed in a node/block/wherever, and makes a smaller-resolution file if the image has been resized by the user (ever loaded a site and waited a minute for a tiny graphic to load because it's actually a 5 MB JPG?).
CDNs, More Concurrent Downloads
A simple way to speed up the front-end performance of your site (besides simply limiting the number of files that need to be downloaded to make your theme look nice) is to make a faux-CDN using something like Drupal's CDN module or some other technique.
This will allow browsers to download more resources at a time (browsers typically only download 2-4 files concurrently from a single domain, like example.com). If you configure your server correctly, you can set it so that file resources and such don't have any associated cookies, and could even be served on a different server like lighttpd or nginx.
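One common way to spread files across several hostnames is to pick the host from a hash of the file path, so a given file always comes from the same host and stays cached in the browser. The sketch below shows that idea in Python. The hostnames are hypothetical, and this is not a description of how the CDN module chooses hosts.

```python
import zlib


def shard_host(path, hosts):
    """Pick a host for a file path deterministically.

    The same path always maps to the same host, so a browser never
    downloads one file twice under two different URLs.
    """
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return hosts[index]


def asset_url(path, hosts):
    """Build the full URL for a static file served from a sharded host."""
    return f"http://{shard_host(path, hosts)}{path}"


# Hypothetical cookie-free hostnames that all point at the same files.
hosts = ["static1.example.com", "static2.example.com", "static3.example.com"]
for path in ["/files/logo.png", "/files/banner.jpg", "/css/style.css"]:
    print(asset_url(path, hosts))
```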
You can also employ a 'real' CDN or a separate server (many people use Amazon or CloudFlare) to cache files closer to site end users, and have them served more directly to your site's users, without the request even needing to be sent to your webserver. This allows your server to concentrate only on building pages and serving dynamic data and resources. Static files such as images, stylesheets, and scripts change infrequently, and can be served off-site to allow a web browser to load a full page faster.
One of the first things that you should do on your Drupal site (or any site that has a modular architecture) is re-evaluate whether you need all the modules/plugins/theme features you have installed. Is this site pretty much set in stone, design-wise? Do you still need Views UI, theme helpers, etc. clogging up page loads and caches?
Most Drupal websites could gain quite a bit of speed by simply cutting down the modules in use by a third, or even a half. And, typically, users will be happier—especially if some of the superfluous features were slowing down their browsing! Watch out especially for modules like the statistics module and watchdog. While their features may be nice to have, they can be database performance hogs. You should always see if there's an easier or better way to achieve what a particular module does. For example, rather than enable Statistics, use Google Analytics on your site, or another stat tracker to see how many people visit your site. And rather than watchdog, use Syslog and view logged messages on your server directly.
Drupal Performance Measurement
To get an understanding of what pages take longer to load than others, what particular database queries take a long time, or to keep a log/history of page performance over time, look no further than the robust Devel module. I have it installed on all my sites, though I disable it when I'm not using the development tools or performance monitoring actively.
The Views module (which I use on most every site I've ever built, because it's so darn versatile, and fairly optimized out of the box) has a built-in performance monitoring tool—if you preview your view in Views' edit mode, you can see the time it took to get the database results, and the time it took Views to render that data, below the preview of your Views output.
The Drupal handbook has more server tuning considerations to read, but I'll be covering many of those considerations in later parts of this white paper (especially server/software tuning).
Caching is Your Best Ally
Using any content management system means you have content stored in some sort of database that must be retrieved and displayed to the end user. This means there are many steps involved, all of which can be sped up to deliver better site performance. Drupal has to connect to the database. The database has to retrieve the data, and send it to PHP. PHP/Drupal has to process this data, make it look nice, then present the results to Apache. Apache sends the data to the end user.
On every single level of your website/server, you can avoid the overhead of retrieving, processing, and sending data by using different caching mechanisms. Caching basically means storing the result of some previous operation so that the operation doesn't need to be run again to get the same data. And it helps. A lot.
At the most basic level, turn on Drupal's built-in page cache. If you can avoid using any modules that don't work with Aggressive caching, turn on Aggressive caching. Normal caching will help, too. When an anonymous user goes to a page on your site, Drupal needs only to retrieve the page from a special cache, already rendered, rather than go through the whole process of building the page again.
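The idea behind a page cache for anonymous visitors fits in a few lines. This is a simplified Python sketch of the concept, not Drupal's implementation; the class, its methods, and the render counter are all invented for illustration.

```python
class PageCache:
    """Serve already-rendered pages to anonymous visitors."""

    def __init__(self):
        self.pages = {}    # path -> rendered HTML
        self.renders = 0   # how many times the expensive build ran

    def render_page(self, path):
        """Stand-in for the expensive work of building a page."""
        self.renders += 1
        return f"<html><body>Content for {path}</body></html>"

    def serve(self, path, is_anonymous):
        # Logged-in users may see personalized output, so they bypass the cache.
        if not is_anonymous:
            return self.render_page(path)
        if path not in self.pages:
            self.pages[path] = self.render_page(path)
        return self.pages[path]


site = PageCache()
site.serve("/about", is_anonymous=True)   # builds the page and stores it
site.serve("/about", is_anonymous=True)   # served straight from the cache
site.serve("/about", is_anonymous=False)  # logged-in user: built again
print(site.renders)  # prints 2
```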
Stepping it up a notch, if you have mostly anonymous user traffic on your site, you can give your site hundreds or thousands of times higher performance and capacity by simply using a cache system like Varnish or Boost. These systems don't even require Drupal, PHP, or MySQL to do any work at all, because an actual file is saved and served up to the end user outside of the database.
I use Boost on most of my simple/smaller sites, and it even works on shared hosting—you simply add a bit of code to your website's .htaccess file (more on that later), then configure Boost to save pages to your server's filesystem. Boost will then help Apache to serve the files directly rather than call on Drupal to serve the files, saving your server from doing a TON of work.
If you want to get started with Varnish, there are a lot of great tutorials, but one of the latest and most complete is this presentation from Tree House Agency, Coat Your Website in Varnish.
Going a little deeper, we can do a few things to speed up caches that are normally served by MySQL by using dedicated caching mechanisms like memcached or Redis. The Drupal modules for these tools tie into server-side caching systems that store frequently-accessed pages and database results in your server's memory, meaning the server can get this information more quickly, and this even increases performance for logged-in users. I'll talk a little more about the server-side use of these tools later.
Views, Blocks, and Panels also have built-in caching mechanisms. At a minimum, you should turn on time-based caching for almost any view or panel on a site, except in specific circumstances where it wouldn't make sense (e.g. an up-to-the-moment stock ticker or something of that nature).
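Time-based caching just means keeping a rendered result for a fixed lifetime and rebuilding it once that lifetime has passed. Here's a small Python sketch of the mechanism under that assumption; the class name, the 300-second lifetime, and the injectable clock are my own choices, not anything taken from Views or Panels.

```python
import time


class TimedCache:
    """Keep each value for a fixed lifetime, then rebuild it on demand."""

    def __init__(self, lifetime_seconds, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock   # injectable so the cache can be tested without waiting
        self.store = {}      # key -> (expires_at, value)

    def get(self, key, build):
        """Return the cached value for `key`, calling `build()` if it expired."""
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = build()
        self.store[key] = (now + self.lifetime, value)
        return value


# Cache a hypothetical "recent articles" listing for five minutes.
cache = TimedCache(lifetime_seconds=300)
listing = cache.get("recent_articles", lambda: "<ul><li>Article</li></ul>")
print(listing)
```

The lifetime is the knob to tune: a longer lifetime means fewer rebuilds, but visitors may see stale output for that long.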
Three Drupal features that are used on most every site stand out as easy candidates for using external services or integrations instead of the built-in solution:
- Use Google Analytics (mentioned above) or another stat tracker instead of Statistics/Tracker modules.
- Use Syslog instead of Watchdog/DBLog. If you need a pretty front-end to your logs, consider using something like ELK to visualize and search log entries.
- Use Apache Solr Search (or Search API Solr) instead of built-in search. (Note: I also run Hosted Apache Solr for Drupal, which is an inexpensive hosted Solr search provider, if you can't install Solr on your own server).
.htaccess rules - Etags, etc.
httpd.conf tuning - max connections, max connections per child
Alternatives to Apache - nginx, lighttpd
Opcache, APC cache.
MySQL gets trashed a lot these days because people think NoSQL-style data storage is blazing fast (comparatively), and MySQL and other relational databases are dated. MySQL doesn't have to be slow, though, and many millions of users are quite satisfied with its speed and reliability. There are a few things you can (and should) do to speed up database access before jumping on the NoSQL bandwagon, or dumping MySQL for the latest version of PostgreSQL or MariaDB. It might save you a ton of time and hassle.
Tune your queries
If you do nothing else, simply monitoring query performance and tuning your individual queries will get you very far, especially on lower-traffic sites. Often, by changing the structure of a query, you'll be able to radically speed it up. MySQL's EXPLAIN feature and the slow_query_log are very helpful here.
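To show the kind of information a query plan gives you, here's a self-contained Python sketch. Note the assumptions: it uses SQLite's EXPLAIN QUERY PLAN (from Python's built-in sqlite3 module) as a stand-in, because it runs anywhere without a server. MySQL's EXPLAIN output looks different, and the table is a simplified, hypothetical version of a node table.

```python
import sqlite3


def explain_before_and_after():
    """Return the query plan for one query without and with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, created INTEGER)"
    )
    conn.executemany(
        "INSERT INTO node (type, created) VALUES (?, ?)",
        [("article" if i % 2 else "page", i) for i in range(1000)],
    )

    def plan(sql):
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        # The last column of each row is the human-readable plan detail.
        return " | ".join(row[-1] for row in rows)

    query = "SELECT nid FROM node WHERE type = 'article'"
    before = plan(query)   # no index on type yet: full table scan
    conn.execute("CREATE INDEX node_type ON node (type)")
    after = plan(query)    # the same query can now use the new index
    conn.close()
    return before, after


before, after = explain_before_and_after()
print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of the whole table and the second should mention the node_type index.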
Additionally, enable the Devel module's query log while you're developing your site so you can see a list of every query, and the time each query took to complete, at the bottom of every page on the site. This is a quick and easy way to see where queries are coming from and how long they're taking. It also helps you see whether there are a lot of duplicate queries on a page, which is one place a query cache comes in handy.
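If you export or copy a query log, tallying duplicates and slow queries is simple to do by hand. The Python sketch below assumes a log shaped as a list of (SQL text, milliseconds) pairs. That format, the sample queries, and the 50 ms threshold are assumptions for illustration, not Devel's actual output.

```python
from collections import Counter


def summarize(query_log, slow_ms=50.0):
    """Find repeated and slow queries in a list of (sql, milliseconds) pairs.

    Returns (duplicates, slow): `duplicates` maps each repeated query to its
    count, and `slow` lists queries at or over the threshold, slowest first.
    """
    counts = Counter(sql for sql, _ in query_log)
    duplicates = {sql: n for sql, n in counts.items() if n > 1}
    slow = [(sql, ms) for sql, ms in query_log if ms >= slow_ms]
    slow.sort(key=lambda item: item[1], reverse=True)
    return duplicates, slow


# Hypothetical log for a single page load.
log = [
    ("SELECT * FROM variable", 1.2),
    ("SELECT * FROM node WHERE nid = 5", 0.8),
    ("SELECT * FROM node WHERE nid = 5", 0.7),
    ("SELECT * FROM comments ORDER BY timestamp", 120.5),
]
duplicates, slow = summarize(log)
print("duplicates:", duplicates)
print("slow:", slow)
```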
Use Indexes Properly
Eliminate Disk I/O Contention
Much of the slowness associated with SQL queries (especially those with complex table joins, involving temporary table creation that requires writes to the disk) is caused by the relative slowness of hard disk I/O (even with SSD storage). Hard drives are thousands of times slower reading and writing data than RAM. There are a few things you can do to greatly speed up disk access (besides simply caching things in RAM or using memcached... you have to hit the hard drive sometimes!):
- Move database files to another drive on your system.
- Use an SSD (either as a dedicated database drive, or as your system's main drive—make sure you back it up or have it in a RAID array, though! SSDs can fail just like normal spinning drives). Solid State Disks are still slower than RAM, but they're much faster than hard drives, especially for random reads and writes—this is a huge boon for database access!
- Use a RAID array instead of a single disk. Using RAID 0 or RAID 10 will give you a much faster disk system than using a single drive. If using a non-mirrored RAID array (like RAID 0, which I don't really recommend), make sure you back up or use master-master replication so you always have a hot copy of the database in case of failure. Hard drives fail often, and at the worst possible moment.
Know Your Table Structures
One time, when I was importing a huge dataset into a new Drupal site (on a server with a super fast SSD), I noticed that switching the tables from InnoDB to MyISAM sped things up dramatically. It turns out that InnoDB actually logs every row it writes to disk, and does some other row-level and transaction-level accounting that slows things down considerably if configured incorrectly.
When you're just adding data here and there (like normal site operation), that's a good thing, and row-level locking is one of the nice advantages of using InnoDB tables. However, when importing huge amounts of data, you might want to consider following some of these InnoDB Performance Tuning Tips suggested by MySQL.
Tune your my.cnf file
Find your system's my.cnf file, and tune the following settings (others may need tweaking as well, depending on your circumstances) for Drupal:
- innodb_buffer_pool_size (if you have room, try to set this to 1.5x the size of your Drupal database, so all or most of the database reads can be performed in RAM)
- key_buffer_size (for MyISAM index cache)
- query_cache_size (usually helpful for environments with mostly reads; note that MySQL 8.0 removed the query cache, so this setting only applies to earlier versions)
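The 1.5x rule of thumb for innodb_buffer_pool_size is easy to turn into arithmetic. In this Python sketch, the 1.5 multiplier comes from the text above, but the cap at 70% of total RAM is purely my own assumption to leave room for PHP, Apache, and the operating system, so adjust it for your server.

```python
def suggest_buffer_pool_mb(database_mb, total_ram_mb, ram_fraction=0.7):
    """Suggest an innodb_buffer_pool_size in megabytes.

    Aim for 1.5x the database size, but never more than `ram_fraction`
    of total RAM (the 0.7 default is an assumption, not a MySQL rule).
    """
    wanted = database_mb * 1.5
    ceiling = total_ram_mb * ram_fraction
    return int(min(wanted, ceiling))


# A 400 MB database on a 4 GB server: the full 1.5x fits comfortably.
print(suggest_buffer_pool_mb(400, 4096))   # prints 600
# A 4 GB database on the same server: capped by available RAM instead.
print(suggest_buffer_pool_mb(4000, 4096))  # prints 2867
```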
Use Memcached or Redis
Memcached is a blazing-fast in-memory caching backend that can be used to cache database results, and avoid expensive and slow database queries. There's a Drupal module for Memcached, and it's not too hard to set up.
Alternatively, you can use Redis, which is about as reliable and fast as Memcached for most Drupal sites.
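The usual pattern with either tool is "cache-aside": check the cache first, fall back to the database on a miss, and clear the cached entry whenever the data changes. The Python sketch below uses plain dictionaries as stand-ins for both the cache server and the database, so it is only an illustration of the pattern. The class, the key format, and the node titles are all hypothetical, and the Drupal modules handle this for you.

```python
class NodeTitleStore:
    """Cache-aside lookups with invalidation on write."""

    def __init__(self, database):
        self.database = database   # stand-in for MySQL: nid -> title
        self.cache = {}            # stand-in for a Memcached/Redis client
        self.db_reads = 0          # counts how often the "database" was hit

    def load_title(self, nid):
        key = f"node_title:{nid}"
        if key in self.cache:
            return self.cache[key]          # cache hit: no database work
        self.db_reads += 1
        title = self.database[nid]          # cache miss: query the database
        self.cache[key] = title
        return title

    def save_title(self, nid, title):
        self.database[nid] = title
        # Drop the stale entry so the next read fetches the new value.
        self.cache.pop(f"node_title:{nid}", None)


store = NodeTitleStore({1: "First post", 2: "Second post"})
store.load_title(1)
store.load_title(1)
print(store.db_reads)  # prints 1
store.save_title(1, "First post (edited)")
print(store.load_title(1))  # prints First post (edited)
```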
SSD Drives, increase RAM, RAID array
Processor, Network Interface
Munin/Cacti as essential monitoring tool
What's the slowest kind of website? One that doesn't load.
It happens to everyone at some point or another—no matter how many redundant servers and failover plans you have, no matter how many data centers you're housed inside, no matter what CDN you use, you're going to have your website go down or your data become corrupted at some point.
It's a good idea to make a great data/disaster recovery plan, and practice implementing it. This is universal to all IT operations—but is especially important in web development, because many websites translate minutes of downtime or lost data into hundreds or thousands of lost dollars or mindshare.
This could actually be the first line-item in your list of critical issues to tackle for any important website. But this white paper is more about the front-end performance of the website, and less about stability, backup, etc. I'll still write a little bit about some of my favorite tools and methods for backup and recovery in Drupal—some I use on almost every website; others I use on the more mission-critical sites.
Backup and Restore module
Shell script backups, cPanel backups, drush sync
Multiple data locations, multiple backups, non-electronic backups (EMP recovery).
Operating in the Cloud
Amazon EC2, S3, CDNs, etc.
Memcached clusters, Solr clusters, MySQL clusters, etc.