The only great code is code that never has to run. Everything else is just good code. So, to start performance tuning, the best option is to first ensure that as little code is run as possible.
The first, quickest, and easiest option is to enable an opcode cache. More information on OpCode Caches can be found here.
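As a sketch, enabling Zend OpCache (bundled with PHP 5.5+) is usually a matter of a few php.ini lines; the exact values below are illustrative and depend on your PHP build and workload:

```ini
; Load and enable Zend OpCache (the extension name/path varies by build)
zend_extension=opcache.so
opcache.enable=1

; Memory reserved for cached opcodes, and how many scripts may be cached
opcache.memory_consumption=128
opcache.max_accelerated_files=10000

; How often (in seconds) to re-check scripts for changes; setting
; opcache.validate_timestamps=0 in production is faster still, at the
; cost of needing a cache reset on every deploy
opcache.revalidate_freq=60
```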
Here we see what happens when enabling Zend OpCache. The first row is our baseline, without the cache enabled.
In the middle row we see a small performance improvement and a big reduction in memory usage. The small performance gain is (probably) from the optimizations Zend OpCache performs in addition to the opcode caching.
The top row is then with both the optimizations and the opcode caching, where we see a large performance gain.
Now we take a look at APC before and after. In contrast to Zend OpCache, we see a performance degradation on the initial (middle row) request as the cache is built, in both time taken and memory used.
We then see a similar performance increase with the cached opcodes.
The second thing we can do is cache content. This is an easy option for WordPress, with a number of easy-to-install plugins to do it for us, including WP Super Cache. WP Super Cache creates static versions of your site that, depending on your setup, expire automatically when events like comments occur (in very high-load situations, for example, you may want to disable cache expiration entirely).
Content caching only works effectively when there are few write operations (which invalidate the cache) compared to reads.
You should also be caching content received by the application from third party APIs, thereby limiting both your latency and reliance on the availability of the API.
Both of these will create static HTML copies of the site to serve instead of generating the page every time it's requested, as well as compressing responses.
If you are developing your own application, most frameworks have a caching module.
Another caching option is query caching. MySQL has a general query cache which can help tremendously; for other databases, and in addition to the MySQL query cache, caching result sets in a store like memcached or Cassandra can prove very beneficial.
As with content caching, it is most effective in read-heavy scenarios. The MySQL query cache in particular should not be relied upon for performance, as it is easy to invalidate large segments of the cache with minor changes to data.
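A minimal sketch of result-set caching with the Memcached extension (hypothetical query, key, and credentials; error handling omitted):

```php
<?php
// Sketch: cache an expensive result set in memcached, keyed on the query.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$sql = 'SELECT id, title FROM posts ORDER BY published_at DESC LIMIT 10';
$key = 'recent_posts:' . md5($sql);

$posts = $memcached->get($key);
if ($posts === false) {
    // Cache miss: hit the database, then store the rows for 5 minutes.
    $pdo = new PDO('mysql:host=localhost;dbname=wordpress', 'user', 'pass');
    $posts = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    $memcached->set($key, $posts, 300);
}
```

Any write to the underlying table should also delete the relevant keys, which is exactly the bookkeeping that makes this approach work best in read-heavy scenarios.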
Query Caching may help performance when generating content caches.
As can be seen below, when we turn on query caching we see wall time reduced by 40%, although memory usage does not change in any meaningful way.
There are three types of caching options available, controlled by the query_cache_type setting:
- Setting the value to OFF will disable the cache
- Setting the value to ON will cache all selects except those that start with SELECT SQL_NO_CACHE
- Setting the value to DEMAND will only cache selects that start with SELECT SQL_CACHE
Also, you should set query_cache_size to a non-zero value; setting it to zero will disable the cache regardless of the query_cache_type setting.
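In my.cnf, that might look like the following (a sketch; note that the query cache was deprecated in MySQL 5.7 and removed entirely in 8.0):

```ini
[mysqld]
# Cache result sets on demand only (queries starting with SELECT SQL_CACHE)
query_cache_type = DEMAND

# Must be non-zero or the cache is disabled regardless of query_cache_type
query_cache_size = 64M
```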
For help in setting these, and numerous other performance related settings, check out the mysql-tuning-primer script.
The primary issue with the MySQL query cache is that it is global: any change to a table that is part of a cached result set will cause that cache entry to be invalidated. On high-write applications, this can make the cache next to useless.
As noted earlier, database queries are a common cause of slow-downs; query optimization will likely have more low-hanging fruit, and give you more benefit, than optimizing code.
Query optimization will help performance when generating content caches, as well as helping with the worst case scenario of not being able to cache.
In addition to profiling, there is another option for MySQL that can help identify slow queries, the slow query log. The slow query log will log any query that takes longer than a specified time, and optionally queries that don't use indexes.
You can enable the log with a few lines of configuration in your my.cnf.
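A sketch of the relevant my.cnf settings (using the older option names discussed below; MySQL 5.1+ prefers slow_query_log and slow_query_log_file):

```ini
[mysqld]
# Log queries slower than 1 second to this file
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 1

# Also log queries that do not use an index
log-queries-not-using-indexes
```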
If any query is slower than long_query_time (in seconds), then the query is logged to the log file log_slow_queries. The default value for this is 10 seconds, and the minimum is 1 second.
The log-queries-not-using-indexes option will additionally capture any queries that do not use indexes.
We can then examine the log using the mysqldumpslow command bundled with MySQL.
Using these options with our WordPress install and loading the homepage, mysqldumpslow shows us the queries that were logged.
First, note that it abstracts all string values to S and numbers to N; you can stop this by supplying the -a option.
Next, notice that both of these queries took 0.00s. Because they came in under the 1-second threshold, they must have been logged because they do not use indexes.
We can examine the cause of the slowdown with EXPLAIN in the MySQL console:
Here we see that possible_keys is NULL, confirming that no index is being used.
EXPLAIN is a very powerful tool for optimizing MySQL queries; more information can be found here.
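To illustrate the sort of fix EXPLAIN points to, with hypothetical table and column names (the actual WordPress query isn't reproduced here):

```sql
-- possible_keys comes back NULL: no index covers the WHERE clause
EXPLAIN SELECT option_name, option_value
FROM wp_options
WHERE autoload = 'yes';

-- Adding an index gives the optimizer a key to use; re-running
-- EXPLAIN afterwards should show it under possible_keys
ALTER TABLE wp_options ADD INDEX autoload_idx (autoload);
```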
Typically, only once you are no longer hampered by PHP itself (by using an opcode cache), have cached as much content as possible, and have optimized your queries, are you ready to start tweaking code.
Code and query optimization is also what determines how quickly you can build the other caches; the better your code performs in the worst-case scenario (no cache), the more stable your app will be, and the quicker it will be able to rebuild its caches.
Let's walk through (potentially) optimizing our WordPress install.
First, let's take a look at our slowest functions:
Surprisingly to me, the first item in our list is not MySQL (in fact, mysql_query() is fourth), but is instead the apply_filters() function.
For those who don't know the WordPress codebase: it features an event-based filter system in which both core and plugins add callbacks that perform multiple transformations on data sequentially; the apply_filters() function is where these callbacks are applied.
The first thing you might notice is that the function is called 4194 times. If we click on it to view further details, we can then order the "Parent functions" table by "Call Count" descending, revealing 778 calls to apply_filters() from the translate() function.
This is interesting, as I don't actually use any translations, given that I (and, I suspect, the majority of users) use the WordPress software in its native language: English.
So let's take a further look at what this translate() function is doing by clicking through to its details.
Here we see two interesting things: first, that of the parent functions, one (the __() function) is called 773 times. Looking at the source code for that function, we see that it is a simple wrapper for translate().
As a rule of thumb, function calls are expensive and should be avoided where practical; given that we always call __() instead of translate(), we should flip the alias, so that translate() becomes the alias (preserving backwards compatibility) and __() no longer makes an unnecessary function call.
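A minimal sketch of the flipped alias, with simplified signatures (the real WordPress functions also take a $domain argument and run the result through filters):

```php
<?php
// After the flip: the work lives in __(), the function everyone
// actually calls...
function __( $text, $domain = 'default' ) {
    // ...the actual translation lookup would happen here...
    return $text;
}

// ...and translate() becomes the thin alias, kept only for
// backwards compatibility.
function translate( $text, $domain = 'default' ) {
    return __( $text, $domain );
}
```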
Realistically though, this change isn't going to make much of a difference; it is a micro-optimization. But it does improve code readability and simplify the call graph.
Moving on from this, let's look at the Child functions:
We are now getting into the meat of this function: we see that there are three functions/methods called, 778 times each.
Ordering by inclusive wall time descending, we can see that apply_filters() is by far the most expensive call.
Looking at the code we see:
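Paraphrased from WordPress core at the time (approximate, not exact):

```php
function translate( $text, $domain = 'default' ) {
    // Fetch the translation object for this text domain, translate,
    // then pass the result through the 'gettext' filter.
    $translations = get_translations_for_domain( $domain );
    return apply_filters( 'gettext', $translations->translate( $text ), $text, $domain );
}
```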
What this code is doing is retrieving a translation object and then passing its translation of the text through apply_filters(). We can tell from the call list that $translations is an instance of the NOOP_Translations class.
Based on the name alone (NOOP), and confirmed by a helpful comment in the code, we see that our translator is in fact not doing anything!
So perhaps we can avoid this code entirely!
With a little debugging, we can see that we are using the default text domain, so let's change the code to skip the translator in that case:
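A sketch of the change, simplified from WordPress's actual translate() (the patch eventually submitted upstream guards on the translations object rather than the domain name):

```php
function translate( $text, $domain = 'default' ) {
    // Hypothetical short-circuit: the default text domain has no
    // translations loaded, so skip the no-op translator entirely.
    if ( 'default' === $domain ) {
        return apply_filters( 'gettext', $text, $text, $domain );
    }

    $translations = get_translations_for_domain( $domain );
    return apply_filters( 'gettext', $translations->translate( $text ), $text, $domain );
}
```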
Next we profile again. Be sure to run it at least twice; this ensures that all caches have been built and that it's a fair comparison!
This run is now faster! But how much and why?
We can find this out by using the XHGui compare runs feature. Going back to our original run, we click the "Compare this run" button at the top-right and choose our new run from the list.
Doing this we can see that we have reduced the number of function calls by 3%, the Inclusive Wall time by 9%, and the Inclusive CPU time by 12%!
We can then order the details table by Call Count in ascending order; this confirms that (as expected) we see a reduction in calls to NOOP_Translations::translate(). Just as importantly, we can confirm that no unexpected changes occurred.
A 9-12% performance improvement is great for about 30 minutes of work; this translates to real world performance gains, even after we've applied our opcache.
We can now repeat this procedure for other functions, and keep doing so until we find no more to optimize.
Note: This change was submitted to WordPress.org and has been updated. You can follow the discussion and see the process in practice on the WordPress Bug Tracker. It is scheduled for inclusion in WordPress 4.1.
In addition to the fantastic XHProf/XHGui, there are several other great tools available.
uprofiler is an as-yet-unreleased fork of Facebook's XHProf, with the intent of removing the CLA required by Facebook. At present the two are identical in terms of feature set, with just some renaming having taken place.
XHProf.io is an alternative UI for XHProf. XHProf.io uses MySQL for profile storage and is not as user friendly as XHGui.
Before XHProf came onto the scene, there was Xdebug. Xdebug is an active profiler, which means it should never be used in production, but it can offer some fantastic insights into your code.
Webgrind does not provide nearly the same feature set as KCachegrind, but it is a PHP web app and easy to install pretty much anywhere.
When Xdebug's output is paired with KCachegrind, you can very easily explore and find performance issues (in fact, this is my preferred view of any profiling tool!).
Profiling, and performance tuning are very complex subjects. With the right tools, and a good understanding of how to use them, we can examine and improve our code tremendously — even on codebases we don't have much experience with.
It is definitely worthwhile taking the time to explore and learn these tools.