As I was working through some code last night I found another “in progress” MOD that I wanted to add to the list of MODs in progress that I published yesterday. Over the years I’ve seen cases where someone from the other side of the planet has a dicey Internet connection and they end up submitting the same post twice because their browser submit times out. Or someone might post the same question in more than one forum, thinking that they’ll get more attention. Or a spammer might hit multiple forums with the same post multiple times.
I think I’ve managed to come up with something that definitely helps solve the first two scenarios and as a bonus helps the spammer problem as well. I call this my “Cross Post / Double Post” MOD, and it’s being tested on my beta board now.
The MOD design has so far turned out to be fairly simple. I tie into the flood control process and retrieve the post text for the last three posts by the user. From there I take the current post text and compare it to the prior posts. The first check is a straight equality check, meaning I check for the exact same post text. This will catch the “copy/paste” folks with very little overhead. If the post text is not identical, then next I use a function called
similar_text(). (similar text reference at php.net) This function takes three arguments. The first two are the two strings to compare, and the third is a variable to store the results of the comparison, which is a number from 0 to 100. The result code should essentially be treated as a percentage. If the two posts are 95% similar then I check to see if the original post already in the database is in the same forum as the new post being attempted. If the forums are the same, then a “Double post” exception is triggered. If the forums are different, then a “Cross post” exception is triggered instead.
The number of posts (3) and percentage of similarity (95) are both controlled via the board configuration screen, so it’s quite flexible. Setting the percentage threshold to zero (0) is the same as turning the comparison process off.
This MOD is being tested on my “beta release” board right now. The first version of the MOD did not use the
similar_text() function mentioned above. I attempted to use the
soundex() function instead. However it seemed that the
soundex() function did not look at enough text, so posts that were clearly different were still being reported as being the same. Switching functions solved that issue.
I’m now wondering if I need to deal with setting different threshold values for different forums. I hate to do that, as it drastically increases the complexity of the code. But for example there are many forum “games” that people play in an “off topic” type of forum. Some of those games look very repetitive, and would potentially trigger the CP/DP exception handling. Then again, the current logic looks across all forums, so as long as the person is active in more areas than just the off-topic games area it might be okay. I don’t want this feature to get in the way of normal use, but I do want to help out the moderator team by capturing / rejecting double post and cross post events.
Stay tuned for details as we start user testing this week.