A Check-Up From the Doctor
I will not install code for clients that was written by someone else without reviewing it first. There is a lot of bad code out there. There is also some very good code, don’t get me wrong about that. But many excellent MOD authors just don’t know databases as well as they would like. So the queries that they write may be functional, but not optimal. At this point in my involvement with phpBB I write most of my own MODs, so most of the time this review is for a private client who wants me to install something that I don’t already have or have no interest in writing.
At one point after I essentially rewrote a MOD for one of my clients they suggested that I offer this service via the Doctor Board. Hmm. Well. It’s an interesting idea. It wasn’t in my plans for this board when I started (and still isn’t) and I would much rather get my desired services completed and released first, but yeah, I could do that.
Probably in about 2012.
The reality is I won’t offer this service, for a variety of reasons. It’s a lot of work, and the level of effort goes up substantially on complex MODs. So instead, here are a few tips that I will offer. If you are a MOD author these are the things that send up red flags to me when I read code.
Queries inside a loop
There are good reasons to put a query inside a loop. There are many more bad reasons to put a query inside a loop, especially authorization checks. Each auth check in phpbb makes two queries. I recently discovered that one of the first custom pages I created for myself was doing this. When I fixed it the number of queries per page dropped from well over 100 to 10. That’s 90+ queries that got eliminated. My server is running 30+ queries a second, so dropping 90 queries from a single page view is a substantial improvement. My current project (one of about a dozen #1 priority projects) is tuning my biggest board by going over every page specifically looking for queries that can be moved out of a loop, combined with another query, or eliminated altogether.
Some examples of MODs that I have seen that use extra queries inside a loop include some of the fancy “color group” MODs, and MODs that display a tool-tip (title attribute) that contains the text of the post. In both of these cases the MODs I rewrote the MOD that I was asked to use and they became much more efficient as a result.
Queries inside a loop can be ugly.
More than one query hitting the same tables
When I installed the Attachment MOD I noticed that the attachment icon was not visible on the search results. In order to know whether to show the icon or not, I had to do two things. First I had to know if the topic had an attachment or not, so I included the topic_attachment column to one of the standard queries. Then I had to know if the user viewing the search results was authorized to view downloads or not, so I created a new query to get the forum information. This was only one query, and it was not inside a loop.
During my review (mentioned above) I noticed that I was already hitting the phpbb_forums table in an earlier query in order to get the forum name for the search results output. So I dropped this extra query and simply added the auth_download column to the existing query. Now my search page uses one less query.
I think MOD authors end up with this situation because they don’t want to alter the existing query so they write their own instead. And in some cases having two smaller queries can be much more efficient than one really big one. But in general if you are hitting the table already, adding one more field to the select list does not have an impact on performance. Adding a second query will.
Don’t recalculate something already stored
This example comes from something that I did for a client. This was the “Top Posters on Index” MOD, and what it was supposed to do was display the top “N” members of the board ranked by post count. You may know that the user_posts field does not get updated every time a post gets deleted. In older versions of phpBB2 you could delete a post and not affect your post count. That got fixed. Pruning still does not impact the post count. What the original MOD author elected to do was recalculate the user post count every time someone viewed the index.
This is all well and good until your index starts getting hit multiple times a second. This particular client was complaining that her index took several seconds to display, and that her host was warning her about server loads. (By the way, her board is listed on big-boards.com. That means she has at least 500,000 posts and 50,000 users. This is not a small board I’m talking about.) The culprit was the Top Posters on Index MOD.
What I did was rewrite the code so that it used the user_posts field instead of recalculating the user post count every time. That should be good enough for most folks, and as a bonus, the top 10 users on the index will now match the top 10 users on the memberlist. The next thing that I did was create an index on the user_posts field so that when I do a query / order by user_posts the results come back nearly instantly. The index.php query time went from 10+ seconds to nearly instant. Oh, and the person I did the work for? I didn’t hear from her for two years after that, so I assume that the work is still holding up under an ever increasing load.
The moral of this example is that correctness (counting posts) is not necessarily as important as speed. If you can get an answer that is “good enough” and get it really really fast, that’s fine. In this case I would argue that using the user_posts field is the correct answer anyway.
Using Outer or Left Joins
Sometimes you need these. Most of the time you don’t. Outer joins are a subject for a Database Tip article that I started writing a long time ago and have not finished yet. (It will be done when… you know the rest.) Without recreating that entire article, here is the important point: Outer joins generally void index use.
Indexes exist to make your queries efficient. Earlier I mentioned creating an index on user_posts in order to speed up the Top Members on Index MOD. Without the index the code optimizations still helped. With the index the extra query added essentially zero load to the board. If you give up the index, you give up a lot of efficiency.
Avoid outer joins.
An example of an outer join is the “You posted here” MOD. There was a MOD author writing this one at phpbb.com, and he posted his SQL code. I suggested that instead of one big query he break it into two. This would allow him to avoid the outer join. He balked, at first, quoting the standard response:
Fewer queries are always better.
If you’ve been paying attention, that was what I suggested earlier in this very topic! But the point here is that you don’t always participate in every topic. Getting the “you posted here” indicator to work means you need to check and set the mark for those topics that you did post in, and leave the rest out.
This is best done with a second query.
But not a query inside the loop, no, never that.
What I do for my version of this MOD is this. First I run the general viewforum query which gets all of the topic data. I collect the topic_id values that will be shown on the page into an array. Then I run a second query that returns a count of posts that I have made per topic in the topics to be displayed on the page. I set up an array using the topic_id as the key, and the value TRUE if I have posted at least once. Now, during the viewforum output, I simply check the array for the existance of a key that matches the topic, and if I find it, I display the proper indicator.
On the search page I cheat. If you have selected one of the “canned” searches like “egosearch” then the indicator is simply turned on for every topic and I skip the extra query.
If it seemed that all of these optimization tips were related to queries, well, you are very observant. There is a reason for that. I work with databases in real life too, so that’s where I feel my expertise is. And in my experience there are a lot more “bad” MODs that can be improved by tweaking either the position or the structure of the query than by altering the actual php code.
If you have a MOD that you are thinking about, but you are wondering if it’s efficient or not, try checking it out with some of these tips in mind. If it passes, then odds are decent that it won’t kill your server. If it’s really important to you, check back in about
2016 2017 when I’ll have this service live here on the Doctor Board.