August 21, 2008

phpBB3 Caching Strategies for 3.0

Filed under: Performance Tuning, phpBB, phpBB3 | Dave Rathbun @ 6:20 am

Through some sort of technical glitch, most of this post was missing. I’ve updated it with the rest of the content. My apologies.

A few days ago I posted Part III in my ongoing series comparing phpBB3 to phpBB-Dave (my customized phpBB2-based board) from a feature perspective. I hope I have made it clear that I am not claiming my board is technically superior, as I am quite sure that it is not. The experimentation I did with the template engines several months back proved that much, at the very least.

Part III included a review of the caching process for each board. A vanilla phpBB2 board does not offer any caching (outside of an early form of template caching). My version is caching quite a bit of information, as does phpBB3. I’ve reproduced that specific table here, as it will help clarify the points made later in this post.

One interesting thing I learned at Londonvasion is that there are team members who stop by and read my blog (at least occasionally) without leaving comments. DavidMJ, one of the current members of the phpBB3 development team, is apparently one of those folks. He caught me on IRC a few nights ago and offered to explain the caching routines in phpBB3, and I was quick to take him up on his offer. Here is an excerpt from our conversation; I found it extremely enlightening.

[22:03] DavidMJ: drathbun: hey
[22:03] drathbun: greetings
[22:04] DavidMJ: I happened to catch your blog, was wondering if you wanted to know what we mean by “arbitrary” data wrt caching
[22:04] drathbun: at some point, yes, would love to
[22:04] drathbun: I hadn’t had time to investigate yet, but if you’re inclined to share, I’m listening :)
[22:04] DavidMJ: sure
[22:04] DavidMJ: we make a distinction between caching queries and everything else
[22:05] DavidMJ: so if we want to cache a query, all we have to do is make it so that we specify a TTL, the rest of the code is unchanged
[22:06] DavidMJ: everything else falls under “arbitrary”, we provide a nice mechanism for saying “take this array/string/object and cache it for some amount of time, I will remember because I have given it a name”
[22:06] drathbun: like smilies?
[22:06] DavidMJ: yep
[22:06] DavidMJ: so caching a query is totally unnamed while data is completely named
[22:06] drathbun: aha
[22:06] DavidMJ: it allows us to also make sure that we only cache old things wrt queries
[22:07] drathbun: so as a stupid noobish question, what exactly is cached when you say query cache? the sql or the results?
[22:07] DavidMJ: as we only will cache, and recall, something we have seen before
[22:07] DavidMJ: technically, both
[22:07] drathbun: ok
[22:07] DavidMJ: we hash the sql to be able to know that _exact_ query
[22:07] drathbun: so by caching the sql, you avoid the sql build step
[22:07] DavidMJ: we store the entire results very efficiently in 3.2
[22:07] DavidMJ: we store them quite well for 3.0
[22:09] drathbun: I do some what you might consider fairly primitive caching now…
[22:09] DavidMJ: drathbun: what do you do now?
[22:11] drathbun: what I call my “primitive” cache is just a dump to a file of a series of assignment statements
[22:11] drathbun: so things that are static, or nearly so, are included as needed rather than running queries on every page
[22:11] drathbun: I figure I’ve eliminated somewhere on the order of 500,000 queries a day from my server
[22:12] DavidMJ: drathbun: effective, but not as robust as the 3.0 mechanism :)
[22:12] drathbun: I’m sure it’s not :)
[22:12] drathbun: I have a cache-loader that checks the page being processed, and loads the cache related to those pages, also does the same for language files
[22:13] DavidMJ: ah, that is a bit strange
[22:13] drathbun: so I hope that the file I/O I added for the cache is offset by the reduced file I/O for unneeded language files
[22:13] DavidMJ: what we do is we load up caches as needed
[22:13] DavidMJ: we see if we recognize a query is in the cache, if so we load it
[22:13] DavidMJ: this way, identical queries on multiple pages are cached once, loaded once
[22:13] DavidMJ: given the same TTL, etc.
[22:14] DavidMJ: it also totally hides the caching logic
[22:14] DavidMJ: another nice trick is bypassing I/O altogether
[22:14] drathbun: there are so many customized queries that can be run because of board permissions and so on, I never really applied any thought to caching queries because I figured it would be a lot of work for little benefit
[22:15] DavidMJ: 3.0 really does not need board permissions cached, it is all stored in a bitfield
[22:15] DavidMJ: the bitfield is stored per forum and is always easy to get to, the lookup is quite fast…
[22:16] DavidMJ: we have some issues when people do permission set up without using roles on huge boards
[22:16] drathbun: but in theory, with 20 different people online, couldn’t you have 20 different permission settings?
[22:16] drathbun: for the same forum?
[22:16] DavidMJ: yep
[22:16] DavidMJ: and it is stored with each user
[22:16] drathbun: aha
[22:16] DavidMJ: it is not cached anywhere, there is no need
[22:16] * drathbun sees a lightbulb
[22:16] DavidMJ: we grab the whole row anyway
[22:17] drathbun: right
[22:17] DavidMJ: so permissions are quite efficient
[22:17] drathbun: so you already know the permissions when you get the user data
[22:17] DavidMJ: yep
[22:17] DavidMJ: 3.0 is light years ahead of 2.0 wrt organization :)
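
As a rough illustration of the query caching described in the conversation, here is a minimal PHP sketch. It is not phpBB3's code: the class name QueryCacheSketch, the in-memory array used as the store, and the callable standing in for the database layer are all assumptions made for illustration. The only ideas taken from the chat are that the query is identified by a hash of its exact SQL text, that the caller supplies a TTL, and that the rest of the calling code stays the same.

```php
<?php
// Hypothetical sketch of TTL-based query caching; not phpBB3's actual code.
class QueryCacheSketch
{
    // Cached entries, keyed by a hash of the SQL text.
    private $store = array();

    // Callable that really runs the SQL and returns all rows
    // (a stand-in for a database layer, assumed for this example).
    private $runner;

    public function __construct(callable $runner)
    {
        $this->runner = $runner;
    }

    // Run $sql, caching the result rows for $ttl seconds.
    // A $ttl of 0 bypasses the cache entirely.
    public function query($sql, $ttl = 0)
    {
        if ($ttl <= 0)
        {
            return call_user_func($this->runner, $sql);
        }

        // The hash identifies this exact query text, so no name is needed.
        $key = md5($sql);
        $now = time();

        if (isset($this->store[$key]) && $this->store[$key]['expires'] > $now)
        {
            return $this->store[$key]['rows'];
        }

        $rows = call_user_func($this->runner, $sql);
        $this->store[$key] = array('expires' => $now + $ttl, 'rows' => $rows);

        return $rows;
    }
}

// Usage: the fake runner counts how often the "database" is really hit.
$hits = 0;
$db = new QueryCacheSketch(function ($sql) use (&$hits) {
    $hits++;
    return array(array('code' => ':)', 'smiley_url' => 'smile.gif'));
});

$first  = $db->query('SELECT * FROM smilies', 600);
$second = $db->query('SELECT * FROM smilies', 600);
```

In this sketch the SQL string still has to be built before it can be hashed; what the second call skips is running the query. A real board would also persist the store (to a file, for instance) rather than keep it in a per-request array.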

So first, thanks to DavidMJ for taking the time to explain the caching routine to me. At some point I will be reading some code, but I now have a much better understanding of what the phrases on the chart were intended to mean. The way user permissions are stored sounds incredibly efficient, for one thing. The idea of being able to cache and share both the SQL build output and the query results is also interesting. Clearly, the caching in 3.0 is far, far beyond what I have implemented.
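
To picture why permissions stored with the user row need no separate cache, here is a small PHP sketch. The permission names, the bit positions, and the array layout of the user row are all hypothetical and are not phpBB3's actual storage format; the sketch only shows the general idea of a per-forum bitfield that arrives with the user data and is checked with a single bitwise operation.

```php
<?php
// Hypothetical permission names and bit positions, for illustration only.
const PERMISSION_BITS = array('f_read' => 0, 'f_post' => 1, 'f_reply' => 2, 'm_edit' => 3);

// Pack a list of granted permission names into an integer bitfield.
function pack_permissions(array $granted)
{
    $bits = 0;
    foreach ($granted as $name)
    {
        $bits |= 1 << PERMISSION_BITS[$name];
    }
    return $bits;
}

// Check one permission for one forum using only the already-loaded user row.
function user_can(array $user_row, $forum_id, $name)
{
    $forum_bits = isset($user_row['permissions'][$forum_id]) ? $user_row['permissions'][$forum_id] : 0;
    return ($forum_bits & (1 << PERMISSION_BITS[$name])) !== 0;
}

// A user row as it might come back from a single query (assumed layout).
$user_row = array(
    'user_id'     => 42,
    'permissions' => array(
        3 => pack_permissions(array('f_read', 'f_post')),
        7 => pack_permissions(array('f_read')),
    ),
);

$can_post_in_3 = user_can($user_row, 3, 'f_post'); // true
$can_post_in_7 = user_can($user_row, 7, 'f_post'); // false
$can_read_in_9 = user_can($user_row, 9, 'f_read'); // false: no entry for forum 9
```

Twenty users online can each carry different bits for the same forum, and each check costs no extra query because the row was fetched anyway.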

In case you missed it, “arbitrary data” is any named piece of data (an array, string, or object) that is cached for a set amount of time; relatively static data like smilies is a typical example. That is what I had thought they meant by database query caching. By these definitions, I do not do any query caching, only arbitrary data caching. So that means the new format for the feature comparison table is this:

Caching

Feature                    phpBB2   phpBB3   phpBB-Dave
Database Query Caching     No       Yes      No
Template Caching           No       Yes      Yes
Arbitrary Data             No       Yes      Yes
Manual Cache Refreshing    No       Yes      Yes

The change doesn’t alter the way I scored this category. It just helps me understand more about how the caching routines work, and I am quite happy that DavidMJ offered to educate me.

Oh, and my favorite quote from the conversation? DavidMJ is nothing if not bold. :lol: Here is a prediction he made about 3.2 during the conversation:

DavidMJ: 3.2 will have robust and reliability guarantees beyond anything I have seen in modern forum software

The thing is, I believe that he, along with the rest of the development team, can back that statement up and deliver on that prediction. I really do.

2 Comments

  1. Like I’ve said before, steal all the code you can from phpBB 3, because it is the superior system.

    I took the ACM arbitrary caching code, but haven’t done the query caching stuff yet because I don’t need it at the moment. The cache is nice, you do something like

    if (($smilies = $cache->get('_smilies')) === false)
    {
        // run query to get smilies here, building up $smilies
        // now save cache
        $cache->put('_smilies', $smilies, 1440);
    }
    // continue processing here....

    Though in 3, smilies caching is implemented as a query cache. Either way works, I suppose.

    The neat thing about how I did permissions is that it is similar to how phpBB 3 does it, where each user row has all of that user's permissions. The difference is that I needed to store permissions for many parts of the site other than forums, so I created zones. Now each user who has some permissions in a zone has his own row with the auths serialized. Nice for caching, I think, though I’m not doing that yet, since most of my users are using the basic auth where the only check is against the user level.

    Comment by Dog Cow — August 24, 2008 @ 2:25 pm

  2. what about automatic cache function…

    $data = $cache->fetch($cache_id, $cache_ttl, $callback);

    the fetch() function automatically checks whether the entry for $cache_id is still fresh, and runs the callback function if the cache is not available

    Comment by john doe — August 31, 2011 @ 10:01 am
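
A minimal PHP sketch of the fetch() idea from the second comment follows. The class name CallbackCacheSketch and the in-memory array store are assumptions made for illustration, and fetch() as written here is hypothetical rather than an existing phpBB3 method. It simply folds the "check, rebuild, save" steps of the get/put pattern into one call.

```php
<?php
// Hypothetical cache class; fetch() here is not an existing phpBB3 method.
class CallbackCacheSketch
{
    // Cached entries, keyed by the caller-supplied name.
    private $store = array();

    // Return the cached value for $cache_id if it is still fresh;
    // otherwise run $callback, store its result for $cache_ttl seconds,
    // and return that result.
    public function fetch($cache_id, $cache_ttl, callable $callback)
    {
        $now = time();

        if (isset($this->store[$cache_id]) && $this->store[$cache_id]['expires'] > $now)
        {
            return $this->store[$cache_id]['data'];
        }

        $data = call_user_func($callback);
        $this->store[$cache_id] = array('expires' => $now + $cache_ttl, 'data' => $data);

        return $data;
    }
}

// Usage: the callback counts how many times the data is actually rebuilt.
$calls = 0;
$cache = new CallbackCacheSketch();
$load_smilies = function () use (&$calls) {
    $calls++;
    return array(':)' => 'smile.gif', ':(' => 'sad.gif');
};

$a = $cache->fetch('_smilies', 3600, $load_smilies);
$b = $cache->fetch('_smilies', 3600, $load_smilies);
```

The caller never sees whether the value came from the cache or from the callback, which is the same "hide the caching logic" property DavidMJ mentions for query caching.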
