While I have not started in-depth MODding on phpBB3 yet, I do read the phpBB3 MODders forum from time to time just to get the flavor of how things have changed. The other day a database (query) question came up and I suggested an answer that I originally thought was only slightly different from what had already been proposed. However, after being asked which of the two solutions would be less CPU-intensive, I did a bit more investigating.
I discovered that one solution was clearly better than the other, but only if the proper index was created.
Disclaimer: I tested on phpBB2. The index that I created does not exist in a standard phpBB2, nor does it exist in a standard phpBB3 install, so I suspect this post applies to both.
A long time ago I wrote a MOD for my phpBB2 board that puts a nice red banner at the top of the posting screen if you are getting ready to “bump” your post. Bumping is defined as posting two (or more) times in a row without someone else replying in between, and without waiting 24 hours first. Does it work? From a functional standpoint it certainly does. From a procedural standpoint, not so much.
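The bump rule described above can be sketched in a few lines. This is only an illustration in Python, not the MOD's actual code (which runs inside phpBB2); the function name and its parameters are assumptions made up for the example.

```python
from datetime import datetime, timedelta

# The 24-hour waiting period comes from the definition of bumping above.
BUMP_WINDOW = timedelta(hours=24)


def is_bump(last_poster_id, last_post_time, user_id, now):
    """Return True if a reply by user_id at `now` would count as a bump.

    last_poster_id and last_post_time describe the most recent post in
    the topic (hypothetical parameters, not phpBB column names).
    """
    same_author = last_poster_id == user_id
    too_soon = (now - last_post_time) < BUMP_WINDOW
    return same_author and too_soon
```

The posting screen would show the banner only when both conditions hold: nobody else has replied since the user's last post, and fewer than 24 hours have passed.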
One of the frequent questions that comes up on phpbb.com is whether phpBB offers SEO (Search Engine Optimization) features. I rarely (if ever) get involved in these discussions because they often degenerate into a “great taste / less filling” sort of argument. (Please see the “related links” section at the bottom of this blog post for an explanation of the reference if you don’t understand it.) Earlier today I read a reply by user “Eelke” that provides a nice viewpoint on the subject:
If you are in a subject area where there is heavy competition from other boards, you may want to try every trick in the book. If you’re not, than carefully consider whether the stuff you are applying really is worth the extra hazzle[sic]
After just cleaning up yet another gmail spammer tonight (I so love the Spammer Hammer™ MOD; it is one of my favorites), I found myself wondering: Is it worth setting up an extra activation step for gmail.com accounts?
I continue to get feedback from my users that – to be concise – the search process sucks. As regular readers of my blog will probably remember, I have done a lot of work to understand and fine-tune the standard phpBB search process. I have moved stop words into the database. I have adjusted the regular expression used to parse and index the words. I have added code to provide cleaner input to the search routine. All of these changes were made to optimize the process as it works today.
But folks are still not happy.
They don’t like the fact that certain words are on the stop words list. My board is related to a specific brand of software used for reporting. It’s not too surprising, then, that the word “report” appears in nearly 30% of the half-million posts on my board. Yet for some reason they still feel they would gain value from being able to include that word in their search.
They don’t like the fact that short words (which in our case includes version numbers) are not included either.
They don’t like the fact that they can’t search for word combinations (exact phrase search).
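To make these three complaints concrete, here is a minimal Python sketch of a word-based indexer of the general kind described. It is not phpBB's code: the stop list, the minimum word length, the regular expression and the sample post are all assumptions chosen for illustration.

```python
import re

# Hypothetical settings; the real stop list and length limit differ.
STOP_WORDS = {"report", "the", "in"}
MIN_WORD_LENGTH = 3


def index_words(text):
    """Return the set of words a simple word-match index would keep."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        w for w in words
        if len(w) >= MIN_WORD_LENGTH and w not in STOP_WORDS
    }


post = "The report fails in version 6.5 when I schedule it"
indexed = index_words(post)

# "report" is dropped as a stop word, and "6.5" is split into "6" and
# "5", both of which are too short to keep. Because the result is an
# unordered set, an exact phrase such as "report fails" cannot be
# matched from it either.
print(sorted(indexed))
```

A search for “report”, for “6.5”, or for an exact phrase has nothing to match against in an index like this, which is the limitation users are running into.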
So today I started testing out a FULLTEXT index on my posts table. I created the index on both the post text and the title. It took a minute and a half and spiked my CPU to about 33% use. The index is over half the size of the database table. On the other hand, the index is smaller than the index on the search_wordmatch table so that’s something positive.
Over the coming weeks I am going to be experimenting with different search keywords and will be trying to get some metrics as to how well the fulltext index performs. There are three aspects that I am hoping to use to rate the success of this experiment. First, how fast are the results provided? Second, how effective are the results? Third, how easy is it going to be to give the user an interface to use the new index?
Stay tuned for more details.
It has been a while since I visited my honeypot board. I decided to have a look today…
Our users have posted a total of 385789 articles
We have 43968 registered users
And when I logged in, I had 33 unread PMs as well.
Bots have been busy. I intend to go back and see what additional patterns I can get from the data. In light of one of my recent posts about gmail being the most abused email domain, here are some stats that speak for themselves. These are the top ten email domains in use on my honeypot board:
| email_domain | users |
| gmail.com | 11323 |
| mail.ru | 6034 |
| meltmail.com | 1179 |
| gawab.com | 859 |
| getciallis.info | 855 |
| spambox.us | 479 |
| serpdomains.com | 449 |
| atlantaclubs.cn | 282 |
| coolgwen.cn | 274 |
| coolsanta.cn | 255 |
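A tally like the one above is straightforward to produce. The figures in the table presumably came from a database query; the following is only a sketch of the same counting idea in Python, with a hypothetical function name and made-up sample addresses.

```python
from collections import Counter


def top_email_domains(emails, limit=10):
    """Count addresses per domain and return the most common ones.

    Strings without an "@" are skipped; domains are compared
    case-insensitively.
    """
    domains = (
        address.rsplit("@", 1)[1].lower()
        for address in emails
        if "@" in address
    )
    return Counter(domains).most_common(limit)


# Made-up sample data, not taken from the honeypot board.
sample = ["a@gmail.com", "b@GMAIL.com", "c@mail.ru", "not-an-address"]
print(top_email_domains(sample))
```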
One of the things I did before going “live” with my first board was add code to increment a page-view counter. Initially the code was in includes/page_tail.php but later on it was added to the banner code. I have been actively tracking daily page view activity over the past year or so. A script emails a report of the daily activity to my BlackBerry every night at midnight. But I had not looked at the cumulative total for quite a while.
Tonight I looked.
It’s over 100M page views for the life of the board.
The board launched in August of 2002. My page counter officially started on August 12th. On that day we had 2,123 views. On August 15th, which was the official launch announcement, we jumped all the way up to 8,757 page views for the day. On August 29th, which was the day the mailing list was sent a note about the list being retired, we hit 13,155 views, and then 19,441 the following day. We didn’t come close to twenty thousand daily page views again until the following July, almost a year after the board launched.
Now we are averaging over 100,000 page views daily and have a cumulative total of 107,710,630 as I type this. That’s a lot of database queries!
A while ago I posted about a software product that lets you run backups and store them on Amazon.com’s S3 data center service. It was an interesting idea, but mostly it got me thinking about how to determine an optimal backup strategy for other board owners. I do my backups every night. I guess I should actually say I never do backups; I have a script do them for me instead. That’s one aspect to consider when setting up a backup strategy for your board.
For this post I would like to cover what are probably some fairly obvious concepts for experienced board owners. The first question that needs to be asked is: What do I need to include in my backup strategy?