Home

Your premium source for custom modification services for phpBB

  logo

HomeForumsBlogMOD ManagerFAQSearchRegisterLogin

Comments March 9, 2008

Search Tweak: Page 2 Not Found

Filed under: Search, phpBB — Dave Rathbun @ 3:11 am CommentsComments (3) 

I have had frequent complaints from folks on my biggest board regarding the search results. It seems that for many users page two (and beyond) was not available. Frankly, I was stumped. I worked with some users to verify that they have cookie settings that work and to ensure that their search session was not timing out. I checked (and tweaked) the code that clears out old search sessions to make sure their search results were not being truncated too early. The main thing was, I was never able to reproduce the error. That makes it very frustrating when trying to debug and fix an issue.

It turns out that the root cause had nothing to do with a cookie or session problem. The cause was that these users were running searches that returned a huge number of posts or topics. I figured that out by looking at the MySQL query logs and noted something specific to this issue.

Why should the number of resulting posts matter if I am only trying to go to page 2? Shouldn’t it be an issue only if I am trying to view page 800 instead?

First, nobody is going to read 800+ pages of search results. It’s not going to happen. :) But let’s suppose that someone was going to read through all of those pages. How is that information managed from one session to the next? Meaning, just how does someone get to page 2 without passing the entire list of topics on the URL?

There is a table named phpbb_search_results that is used to contain the results of the search. Makes sense, right? :-) The search results table has a very simple structure. It contains the session_id, the search_id, and the search_array column which is text. The session_id is used to clear out old search results after a session has expired. The search_id is either a text string like “unanswered” or “egosearch” (my personal favorite) or a numeric key like the following URL:

search.php?search_id=1605853721&start=25

The search_id in that prior URL is used to look up the results from the user’s search, skip the first 25 topics (or posts), and start from there. That would be page 2. And that would fail in many cases.

Why Page Two Fails

The reason page 2 fails is right there in the table definition, but it helps to know a bit more about how the search_array field is used. Simply put, there are tons of options that a user may select on a search. Those include sort options, topics or posts, search topic title or message body, search for a forum, and so on. There are way too many options to pass on a URL. So as the search is performed, these values are all serialized and stored in the text field. Here is the code from 2.0.22 search.php:

for($i = 0; $i < count($store_vars); $i++)
{
	$store_search_data[$store_vars[$i]] = $$store_vars[$i];
}

$result_array = serialize($store_search_data);

If you want to see some examples, use phpMyAdmin (or something similar) and you'll see what goes into that column in the table.

The search_array column format is text. A text column in MySQL stores 64K, or 65,536 characters of data. That column does not just store all of the search options selected. It also includes the topic or post ID values that matched the search. So here is the problem: It is entirely possible to return more search values than will fit in the text field!

Simple Example of a Broken Search

Suppose that I do a search for posts. Suppose that on the board being used that all of the post_id values are all 5 digits in length. The post_id values are separated by ", " in the search_array text field. That means every post_id takes 7 characters of space. (That would be five characters for the post_id, one for the comma, and one for the trailing space for a total of 7.) Now suppose that a search for a common sequence of words returns 10,000 posts or perhaps even more. If I take 10,000 * 7, well, that's 70,000 characters of data. It will not fit into a text field.

First Fix

I have two fixes to suggest for this if you are seeing this behavior on your board. First, remove the space. There is nothing different between this:

AND p.post_id in (1,2,3,4)

and this:

AND p.post_id in (1, 2, 3, 4)

The extra space is cosmetic. It does not change the results. But does it help to remove it? Take my earlier example where I had 10,000 posts and 7 characters for each. By removing the space, I will reduce the amount of data that I am trying to store by 10,000 characters, dropping me from 70,000 (too many) down to 60,000. That just might fit. If you look in search.php you will find a number of places where this code exists:

WHERE post_id IN (" . implode(", ", $search_id_chunks[$i]) . ")

Simply change the implode "glue" from ", " to "," and you're done. You'll have to do this in a number of places to get them all. Frankly I would do this even if you are not having problems with search results being too big, as there is less data to push around.

Second Fix

Even after you do this you might still get some searches that are too big. It turns out that the phpBB2 developers were aware of this issue and there is code in place to take care of it. In phpBB2 2.0.22 it exists but is commented out so that it doesn't get executed. Here is the code I am talking about:

//
// Limit the character length (and with this the results displayed at all following pages) to prevent
// truncated result arrays. Normally, search results above 12000 are affected.
// - to include or not to include
/*
$max_result_length = 60000;
if (strlen($search_results) > $max_result_length)
{
	$search_results = substr($search_results, 0, $max_result_length);
	$search_results = substr($search_results, 0, strrpos($search_results, ','));
	$total_match_count = count(explode(', ', $search_results));
}
*/

In this case they are going to limit the result set to 60,000 total characters. If the search results are too long, the first line of code reduces the string to the proper size. The second line makes sure the end of the string is a valid post (or topic) ID and that something didn't get chopped off in the middle. And the last line resets the $total_match_count so the paging will be correct.

Conclusion

Should you do this? If your board has fewer than 70,000 posts I would suggest that it doesn't matter. Even if you're over 100,000 posts you are probably okay as long as you are keeping up with your stopwords file (or table, if you have implemented the ideas I posted about moving those values into the database in earlier search posts.) In my case just removing the extra space from the glue in the implode() function was enough for now, but with my board just about to exceed 400,000 posts I feel like the more bullet-proof solution provided by the phpBB2 developers will be required.

Earlier in this post I mentioned that I finally figured this out by reviewing my MySQL query logs. One of the queries that I saw was an insert into the search results table, and the list of post_id values was so large that the query was truncated before the array of posts could be stored. Once I saw that (and by the way, that is also where I saw how the post_id values were stored and decided to remove the extra spaces) the fix was easy to recognize and implement.

So that's how you fix not being able to view Page 2 of your 800 pages of search results. :-)

3 Comments

  1. Third fix

    Change in phpbb_search_results field search_array to longtext

    Comment by Milos — March 11, 2008 @ 6:31 pm

  2. Hi, Milos, and welcome to the Doctor Blog. :)

    Your fix certainly would work, and I did consider that when I first realized what the problem was. However, I decided that extending the field size would not really be appropriate. In my opinion, if a user has built a search that exceeds the 65K limit of a text field, they’re never going to get all the way to the end of their list of results anyway. Making the field bigger just means they can get more search matches that they won’t read. It also puts a heavier load on the server during the search process as it has to manage a larger array of posts (or topics).

    Comment by Dave Rathbun — March 12, 2008 @ 7:50 am

  3. I am using phpbb2.0.23. The first fix u say – I see a diff code now:


    AND f.forum_id = t.forum_id
    AND p.post_id = t.topic_first_post_id
    AND p2.post_id = t.topic_last_post_id
    AND u2.user_id = p2.poster_id";

    So not sure what to do with this.

    The implode glue seems to be fixed in my version though.

    The second fix – Its commented out in the code page. And my forum has about 1000+ posts – so not sure what to do with that.

    Pls advice.

    Jack.

    Comment by Jack — April 16, 2008 @ 2:27 am

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress