I continue to get feedback from my users that – to be concise – the search process sucks. As regular readers of my blog will probably remember, I have done a lot of work to understand and fine-tune the standard phpBB search process. I have moved stop words into the database. I have adjusted the regular expression used to parse and index the words. I have added code to provide cleaner input to the search routine. All of these changes were made to optimize the process as it works today.
But folks are still not happy.
They don’t like the fact that certain words are on the stop words list. My board is about a specific brand of reporting software, so it’s not too surprising that the word “report” appears in nearly 30% of the half-million posts on the board. Yet, for whatever reason, they still feel they would get value from including that word in their searches.
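To put a figure like that 30% in context, the share of posts containing a word (its document frequency) is easy to compute. The Python sketch below is a minimal illustration, not phpBB’s code: the tokenizer regex and the threshold cutoff are assumptions, and any word at or above the cutoff is flagged as a stop word candidate. For comparison, MySQL’s natural-language FULLTEXT search on MyISAM tables ignores words that appear in 50% or more of the rows, so a word at roughly 30% would not be dropped by that particular rule.

```python
import re
from collections import Counter

# Illustrative tokenizer (an assumption): lowercase runs of letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def document_frequency(posts):
    """Map each word to the fraction of posts that contain it at least once."""
    counts = Counter()
    for text in posts:
        # Use a set so a word repeated within one post is counted only once.
        counts.update(set(WORD_RE.findall(text.lower())))
    return {word: n / len(posts) for word, n in counts.items()}


def stop_word_candidates(posts, threshold):
    """Words whose document frequency is at or above the (hypothetical) threshold."""
    return sorted(w for w, share in document_frequency(posts).items() if share >= threshold)


if __name__ == "__main__":
    sample = [
        "How do I schedule a report?",
        "Report fails with error 1234",
        "Upgrading the universe",
        "Export the report to PDF",
        "Linking two universes",
    ]
    df = document_frequency(sample)
    print(f"report: {df['report']:.0%}")  # report: 60%
    print(stop_word_candidates(sample, threshold=0.5))  # ['report']
```

Raising the threshold toward 1.0 keeps more common words searchable; the trade-off is a larger index and result lists that are less selective.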
They don’t like the fact that short words (which, in our case, include version numbers) are not indexed either.
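A minimum-length filter is what drops short terms at index time. In this hypothetical sketch the tokenizer keeps dots inside a token so a version number such as “6.5” stays one word, and min_len is an assumed parameter. Real indexers differ: MySQL’s MyISAM FULLTEXT default for ft_min_word_len is 4, and its parser may split on the dot rather than keep it.

```python
import re

# Hypothetical tokenizer: letters/digits, with internal dots kept so that
# a version number like "6.5" is treated as a single token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def index_terms(text, min_len):
    """Return the tokens of text that survive a minimum-length filter."""
    return [t for t in TOKEN_RE.findall(text.lower()) if len(t) >= min_len]


if __name__ == "__main__":
    post = "Error after upgrading to 6.5 on XP"
    print(index_terms(post, min_len=4))  # ['error', 'after', 'upgrading']
    print(index_terms(post, min_len=3))  # ['error', 'after', 'upgrading', '6.5']
```

With min_len=4 the version number is discarded; lowering the limit to 3 keeps it, at the cost of indexing every other three-character word as well.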
They don’t like the fact that they can’t search for word combinations (exact phrase search).
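Phrase search needs to know where each word sits in a post, not just whether it appears. A word-to-post table such as search_wordmatch, assuming it stores no word positions, can only answer “which posts contain all of these words”. The Python sketch below is a minimal illustration rather than anything phpBB ships: it builds a small positional index and then checks that the words occur consecutively.

```python
import re
from collections import defaultdict

# Illustrative tokenizer (an assumption, not phpBB's regex).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_positional_index(posts):
    """Map word -> {post_id: [positions of that word in the post]}."""
    index = defaultdict(lambda: defaultdict(list))
    for post_id, text in posts.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][post_id].append(pos)
    return index


def all_words_search(index, query):
    """Posts containing every query word: all a word-to-post table can answer."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], {}))
    for w in words[1:]:
        hits &= set(index.get(w, {}))
    return sorted(hits)


def phrase_search(index, phrase):
    """Posts where the phrase's words appear consecutively and in order."""
    words = tokenize(phrase)
    hits = []
    for post_id in all_words_search(index, phrase):
        # The phrase starts at s if word k of the phrase sits at position s + k.
        starts = index[words[0]][post_id]
        if any(all(s + k in index[w][post_id] for k, w in enumerate(words)) for s in starts):
            hits.append(post_id)
    return hits


if __name__ == "__main__":
    posts = {
        1: "Report fails after upgrade",
        2: "Upgrade fails; report attached",
        3: "The report fails again",
    }
    idx = build_positional_index(posts)
    print(all_words_search(idx, "report fails"))  # [1, 2, 3]
    print(phrase_search(idx, "report fails"))  # [1, 3]
```

Post 2 contains both words but not side by side, so only the positional check filters it out. MySQL’s FULLTEXT boolean mode offers the same capability through double-quoted phrases.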
So today I started testing a FULLTEXT index on my posts table. I created the index on both the post text and the title. It took a minute and a half to build and pushed my CPU to about 33% utilization. The index is over half the size of the table itself. On the plus side, it is smaller than the index on the search_wordmatch table, so that’s something.
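Because the index covers both the title and the text, a query should name those same two columns in MATCH () so MySQL can use it. The following is a hypothetical Python helper, a sketch only: the table and column names (phpbb_posts_text, post_subject, post_text) are assumptions to adjust, and it translates a plain search box entry into a BOOLEAN MODE string in which every word is required, a leading minus excludes a word, and double-quoted text is kept as an exact phrase.

```python
import re

# Hypothetical schema: table and column names are assumptions. MATCH () lists
# the same two columns the FULLTEXT index was built on.
SEARCH_SQL = (
    "SELECT post_id, "
    "MATCH (post_subject, post_text) AGAINST (%s IN BOOLEAN MODE) AS score "
    "FROM phpbb_posts_text "
    "WHERE MATCH (post_subject, post_text) AGAINST (%s IN BOOLEAN MODE) "
    "ORDER BY score DESC LIMIT %s"
)

# Either a double-quoted phrase or a run of non-space, non-quote characters.
TERM_RE = re.compile(r'"([^"]*)"|([^\s"]+)')

# Some characters with special meaning in BOOLEAN MODE, stripped from plain words.
OPERATOR_CHARS = re.compile(r"[()<>~@]")


def to_boolean_query(user_input):
    """Require every word, honor a leading '-' to exclude, keep quoted phrases."""
    parts = []
    for phrase, word in TERM_RE.findall(user_input):
        if phrase.strip():
            parts.append('+"%s"' % phrase.strip())
            continue
        cleaned = OPERATOR_CHARS.sub("", word)
        bare = cleaned.lstrip("+-")
        if not bare:
            continue  # skip stray operators such as a lone "-"
        parts.append(("-" if cleaned.startswith("-") else "+") + bare)
    return " ".join(parts)
```

For example, to_boolean_query('schedule "exact phrase" -draft') returns +schedule +"exact phrase" -draft. That string would be bound twice as a parameter, e.g. cursor.execute(SEARCH_SQL, (q, q, 25)) with a DB-API driver that uses %s placeholders, such as PyMySQL, rather than pasted into the SQL.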
Over the coming weeks I will be experimenting with different search keywords and trying to gather some metrics on how well the fulltext index performs. I plan to rate the success of this experiment on three criteria: first, how quickly the results come back; second, how effective the results are; and third, how easy it will be to give users an interface to the new index.
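For the first two criteria, a small harness could record timings and a simple relevance score for each test query. This is a sketch under assumptions: search is a hypothetical callable wrapping whichever search method is being measured, and precision at k (the share of the top k results that a person judged relevant) is one standard way to score effectiveness, used here purely as an illustration.

```python
import statistics
import time


def time_queries(search, queries, repeats=5):
    """Median wall-clock seconds per query for a caller-supplied search callable."""
    timings = {}
    for q in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            search(q)  # hypothetical: wraps either the FULLTEXT or the word-match search
            samples.append(time.perf_counter() - start)
        # The median is less sensitive than the mean to a single slow (e.g. cold-cache) run.
        timings[q] = statistics.median(samples)
    return timings


def precision_at_k(returned_ids, relevant_ids, k=10):
    """Fraction of the top k slots filled by posts a person judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = list(returned_ids)[:k]
    return sum(1 for pid in top if pid in relevant_ids) / k
```

Running the same query list through both the old word-match search and the FULLTEXT search gives side-by-side numbers for speed and, given a hand-judged set of relevant posts per query, for effectiveness.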
Stay tuned for more details.