Through some sort of technical glitch, most of this post was missing. I’ve updated it with the rest of the content. My apologies.
A few days ago I posted Part III in my ongoing series comparing phpBB3 to phpBB-Dave (my customized phpBB2-based board) from a feature perspective. I hope I have made it clear that I am not claiming my board is technically superior; I am quite sure it is not. The experimentation I did with the template engines several months back proved that much, at the very least.
Part III included a review of the caching process for each board. A vanilla phpBB2 board does not offer any caching (outside of an early form of template caching). My version caches quite a bit of information, as does phpBB3. I’ve reproduced that specific table here, as it will help clarify the points made later in this post.
One interesting result of Londonvasion for me was finding out that there are team members who stop by and read my blog (at least occasionally) but don’t leave comments. DavidMJ, one of the current members of the phpBB3 development team, is apparently one of those folks. He caught me on IRC a few nights ago and offered to help explain the caching routines in phpBB3, and I was quick to take him up on his offer. Here is an excerpt from our conversation; I found it extremely enlightening.
[22:03] DavidMJ: drathbun: hey
[22:03] drathbun: greetings
[22:04] DavidMJ: I happened to catch your blog, was wondering if you wanted to know what we mean by “arbitrary” data wrt caching
[22:04] drathbun: at some point, yes, would love to
[22:04] drathbun: I hadn’t had time to investigate yet, but if you’re inclined to share, I’m listening
[22:04] DavidMJ: sure
[22:04] DavidMJ: we make a distinction between caching queries and everything else
[22:05] DavidMJ: so if we want to cache a query, all we have to do is make it so that we specify a TTL, the rest of the code is unchanged
[22:06] DavidMJ: everything else falls under “arbitrary”, we provide a nice mechanism for saying “take this array/string/object and cache it for some amount of time, I will remember because I have given it a name”
[22:06] drathbun: like smilies?
[22:06] DavidMJ: yep
[22:06] DavidMJ: so caching a query is totally unnamed while data is completely named
[22:06] drathbun: aha
[22:06] DavidMJ: it allows us to also make sure that we only cache old things wrt queries
[22:07] drathbun: so as a stupid noobish question, what exactly is cached when you say query cache? the sql or the results?
[22:07] DavidMJ: as we only will cache, and recall, something we have seen before
[22:07] DavidMJ: technically, both
[22:07] drathbun: ok
[22:07] DavidMJ: we hash the sql to be able to know that _exact_ query
[22:07] drathbun: so by caching the sql, you avoid the sql build step
[22:07] DavidMJ: we store the entire results very efficiently in 3.2
[22:07] DavidMJ: we store them quite well for 3.0
[22:09] drathbun: I do some what you might consider fairly primitive caching now…
[22:09] DavidMJ: drathbun: what do you do now?
[22:11] drathbun: what I call my “primitive” cache is just a dump to a file of a series of assignment statements
[22:11] drathbun: so things that are static, or nearly so, are included as needed rather than running queries on every page
[22:11] drathbun: I figure I’ve eliminated somewhere on the order of 500,000 queries a day from my server
[22:12] DavidMJ: drathbun: effective, but not as robust as the 3.0 mechanism
[22:12] drathbun: I’m sure it’s not
[22:12] drathbun: I have a cache-loader that checks the page being processed, and loads the cache related to those pages, also does the same for language files
[22:13] DavidMJ: ah, that is a bit strange
[22:13] drathbun: so I hope that the file I/O I added for the cache is offset by the reduced file I/O for unneeded language files
[22:13] DavidMJ: what we do is we load up caches as needed
[22:13] DavidMJ: we see if we recognize a query is in the cache, if so we load it
[22:13] DavidMJ: this way, identical queries on multiple pages are cached once, loaded once
[22:13] DavidMJ: given the same TTL, etc.
[22:14] DavidMJ: it also totally hides the caching logic
[22:14] DavidMJ: another nice trick is bypassing I/O alltogether
[22:14] drathbun: there are so many customized queries that can be run because of board permissions and so on, I never really applied any thought to caching queries because I figured it would be a lot of work for little benefit
[22:15] DavidMJ: 3.0 really does not need board permissions cached, it is all stored in a bitfield
[22:15] DavidMJ: the bitfield is stored per forum and is always easy to get to, the lookup is quite fast…
[22:16] DavidMJ: we have some issues when people do permission set up without using roles on huge boards
[22:16] drathbun: but in theory, with 20 different people online, couldn’t you have 20 different permission settings?
[22:16] drathbun: for the same forum?
[22:16] DavidMJ: yep
[22:16] DavidMJ: and it is stored with each user
[22:16] drathbun: aha
[22:16] DavidMJ: it is not cached anywhere, there is no need
[22:16] * drathbun sees a lightbulb
[22:16] DavidMJ: we grab the whole row anyway
[22:17] drathbun: right
[22:17] DavidMJ: so permissions are quite efficient
[22:17] drathbun: so you already know the permissions when you get the user data
[22:17] DavidMJ: yep
[22:17] DavidMJ: 3.0 is light years ahead of 2.0 wrt organization
So first, thanks to DavidMJ for taking the time to explain the caching routine to me. At some point I will be reading some code, but I now have a much better understanding of what the phrases on the chart were intended to mean. The way user permissions are stored sounds incredibly efficient, for one thing. The idea of hashing the SQL to recognize an exact query, and then caching the full results against it, is also interesting. Clearly the caching in 3.0 is far, far beyond what I have implemented.
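To make the distinction concrete, here is a minimal sketch of the two ideas as I understood them from the conversation: queries are cached without a name (the key is a hash of the exact SQL text, and all the caller adds is a TTL), while everything else is cached under a name the caller picks. It is written in Python purely for illustration. The class name, method names, in-memory dictionary, and choice of SHA-256 are all my own assumptions; this is not phpBB3's actual code or API.

```python
import hashlib
import time


class SketchCache:
    """Toy in-memory cache. Hypothetical; not phpBB3's real classes."""

    def __init__(self, clock=time.time):
        self._store = {}     # key -> (expires_at, value)
        self._clock = clock  # injectable so expiry can be exercised in tests

    def _get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def _put(self, key, value, ttl):
        self._store[key] = (self._clock() + ttl, value)

    # "Arbitrary" data: the caller supplies the name.
    def put(self, name, value, ttl):
        self._put("data:" + name, value, ttl)

    def get(self, name):
        return self._get("data:" + name)

    # Query caching: unnamed, keyed by a hash of the exact SQL string.
    def query(self, sql, run_query, ttl=0):
        if ttl <= 0:
            # No TTL given, so behave exactly like an uncached query.
            return run_query(sql)
        key = "sql:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()
        rows = self._get(key)
        if rows is None:
            rows = run_query(sql)
            self._put(key, rows, ttl)
        return rows
```

In this sketch, the only change at the call site is the extra `ttl` argument, and an identical query issued from two different pages would share one cache entry, since the key depends only on the SQL text.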
In case you missed it, “arbitrary data” is relatively static data like smilies. That’s what I thought they meant by database query caching. By that definition, I do not do any query caching; I cache only arbitrary data. So that means the new format for the feature comparison table is this:
|Database Query Caching:||No||Yes||No|
|Manual Cache Refreshing:||No||Yes||Yes|
The change doesn’t alter the way I scored this category. It just helps me understand more about how the caching routines work, and I am quite happy that DavidMJ offered to educate me.
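For comparison, the “primitive” cache I described in the conversation (a file of assignment statements that is included as needed) can be sketched in a few lines. This is a Python illustration only, not the actual phpBB-Dave code: the function names and the file name in the usage note are hypothetical, and it assumes the cached names are valid identifiers and the values are plain literals (strings, numbers, lists, dictionaries).

```python
import os
import runpy


def dump_cache(path, values):
    """Write each name/value pair to `path` as a plain assignment statement."""
    lines = [f"{name} = {value!r}\n" for name, value in values.items()]
    # Write to a temporary file first, then rename, so a page that loads the
    # cache mid-rebuild never sees a half-written file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.writelines(lines)
    os.replace(tmp_path, path)


def load_cache(path):
    """Execute the cache file and return the names it assigned."""
    namespace = runpy.run_path(path)
    # Drop the module bookkeeping entries (__name__, __file__, and so on).
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

A page would call something like `load_cache("cache_smilies.py")` instead of running the smilies query, and the file would be rebuilt with `dump_cache` whenever the underlying data changes. Because loading executes the file, this only makes sense for files the board itself writes.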
Oh, and my favorite quote from the conversation? DavidMJ is nothing if not bold. Here was a prediction he made about 3.2 during the conversation:
DavidMJ: 3.2 will have robust and reliability guarantees beyond anything I have seen in modern forum software
The thing is, I believe he, along with the rest of the developer team, can back that statement up and deliver on that prediction. I really do.