This is Part IV of a series of posts about the phpBB2 search process. Previous posts include:
- Part I: Table Review
- Part II: Making Effective Use of “Stop Words”
- Part III: Efficient clean_words() Function
You don’t have to read all of the prior parts to follow this one. The last post was quite long, so part of what I wanted to cover there was postponed until now. Here I’m going to analyze what one particular regex (regular expression) from the clean_words() function is doing. In very early versions of phpBB2 it worked very well at keeping words that were too short or too long out of your search index tables. In later versions it did not work so well. I will explain why, and provide an extremely easy fix.