<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Welcome to the phpBB Doctor Blog &#187; MOD Writing</title>
	<atom:link href="http://www.phpbbdoctor.com/blog/category/phpbb/mod-writing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.phpbbdoctor.com/blog</link>
	<description>Your premium source for custom modification services for phpBB</description>
	<lastBuildDate>Fri, 30 Apr 2010 02:58:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Practical Jokes&#8230;</title>
		<link>http://www.phpbbdoctor.com/blog/2010/02/12/practical-jokes/</link>
		<comments>http://www.phpbbdoctor.com/blog/2010/02/12/practical-jokes/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 16:15:56 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Board Management]]></category>
		<category><![CDATA[MOD Writing]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=347</guid>
		<description><![CDATA[We have a topic on my board with the title &#8220;Please do not post in this topic&#8221;. Needless to say, this topic has survived for nearly three years, even in the &#8220;off topic&#8221; area where topics are pruned after 14 days of no activity.   So lately I have been trying to have some [...]]]></description>
			<content:encoded><![CDATA[<p>We have a topic on my board with the title &#8220;Please do not post in this topic&#8221;. Needless to say, this topic has survived for nearly three years, even in the &#8220;off topic&#8221; area where topics are pruned after 14 days of no activity. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_lol.gif' alt=':lol:' class='wp-smiley' />  So lately I have been trying to have some fun with it.</p>
<p>First I added some JavaScript to the page (but only for that topic) that made the Reply and Quote buttons move away from the mouse. That made it impossible to click on the button, but you could still tab to the buttons and invoke the required code. Yesterday I switched the normal images for the buttons with the spacer.gif and sized it to zero by zero pixels, essentially making the button invisible. I also altered the tab index to -1, which according to a few sites I read removes the button from the tab sequence.</p>
<p>Of course there are still several ways for folks to post in the topic. That&#8217;s sort of the point, to see how long it takes folks to figure out how to work around the challenges I have put in place. For example, in the first version someone could disable JavaScript and the buttons would no longer move, giving them another way to click the button rather than using the tab key.</p>
<p>To continue the fun, I am looking for suggestions for other ways to challenge folks, and keep them from posting in that one topic. The key is there has to be some sort of loophole, as I&#8217;m not trying to completely lock folks out.</p>
<p>Any ideas?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2010/02/12/practical-jokes/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Post Already Reported? Then Tell Me!</title>
		<link>http://www.phpbbdoctor.com/blog/2009/11/10/post-already-reported-then-tell-me/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/11/10/post-already-reported-then-tell-me/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 02:17:22 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Board Management]]></category>
		<category><![CDATA[MOD Writing]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=340</guid>
		<description><![CDATA[phpBB3 includes a &#8220;report a post&#8221; feature that was often requested in phpBB2 and available as a variety of MODs. I wrote my own that integrates with other MODs that I have implemented. But one of the things that I did differently (and that I prefer) is that I provide a visual indication when a [...]]]></description>
			<content:encoded><![CDATA[<p>phpBB3 includes a &#8220;report a post&#8221; feature that was often requested in phpBB2 and available as a variety of MODs. I wrote my own that integrates with other MODs that I have implemented. But one of the things that I did differently (and that I prefer) is that I provide a visual indication when a post has been reported.</p>
<p>Just a few minutes ago I was on phpbb.com and saw a post in General Discussion with the title &#8220;Is this new home page nice?&#8221; Anyone who has been around phpbb.com for a while knows that this sort of post &#8211; even in GD &#8211; is against the rules. I figured that someone might have reported it already, but there&#8217;s no indication that such an action was taken. I decided to go ahead and report the post.</p>
<p>When I clicked the proper icon, here&#8217;s the message I got:</p>
<blockquote><p>This post has already been reported.</p></blockquote>
<p>Well. If that&#8217;s the case, why not tell me? <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_confused.gif' alt=':-?' class='wp-smiley' />  <span id="more-340"></span></p>
<h3>Reporting Posts With Feedback</h3>
<p>I prefer my method. When a post is reported, a red &#8220;alert box&#8221; is added to that specific post, detailing when and why it was reported. It does not include who did the reporting, but that information is captured as well. Here&#8217;s what that box might look like:</p>
<p><img src="/blog/tips/post_report/reported_post.jpg" /></p>
<p>This box serves a couple of purposes. First, it keeps a second (and third and fourth) person from attempting to report the post when it&#8217;s already in the queue for a moderator to review.  Second, it serves as immediate feedback to the user who posted in the wrong forum (as in the case above) and helps them learn the board rules and procedures faster.</p>
<p>Once the post has been acted upon, the alert box changes to show the updated status. </p>
<p><img src="/blog/tips/post_report/handled_post.jpg" /></p>
<p>Suppose a person reported a post for being spam, or as in the example above for being in the wrong forum. The moderator may disagree with the assessment and decide to leave the post in the original forum. If there was no indication that this process had taken place, the post might very well be reported again. And again. </p>
<p>Both of the alert boxes pictured above are only shown to logged in users. Since guests can&#8217;t report posts, there&#8217;s no need for them to see this information. But any logged-in user with permissions to report a post will be told <strong>before</strong> they attempt to send the report whether it&#8217;s necessary.</p>
<h3>Icon Explanation</h3>
<p>The first image shown above has three icons, so I thought I would explain them briefly. The yellow flag icon allows the moderator to flag the post as &#8220;in process&#8221; or &#8220;being reviewed&#8221; for now. That means a moderator has picked up this post off of the queue and is working through the process but hasn&#8217;t decided what action to take yet. The green check icon allows the moderator to close the post and enter notes about the action(s) taken. Finally, the red X icon allows the moderator to reject the report and explain why.</p>
<p>If a user has accumulated a certain number of rejected reports, then their permission to report additional posts is revoked. This is to prevent someone from running around and reporting every single post they see just to be a nuisance. Over time I might also review the accept / reject ratio on post reports to determine if I want to extend an invitation to a particular user to join the moderator team.</p>
<h3>Conclusion</h3>
<p>It&#8217;s all about communication. The phpBB3 post report process is largely hidden. Mine is more visible. Is this a better design? I think so, but I would welcome any input, either in support of or contrary to my opinion.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/11/10/post-already-reported-then-tell-me/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Storing Post Revisions / Post Locking</title>
		<link>http://www.phpbbdoctor.com/blog/2009/11/02/storing-post-revisions-post-locking/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/11/02/storing-post-revisions-post-locking/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 06:53:06 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Board Management]]></category>
		<category><![CDATA[MOD Writing]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=338</guid>
		<description><![CDATA[I&#8217;ve seen this on other boards but only recently have I started seeing it on my own: people that edit the first post (or potentially even all of their posts) of a topic and remove all of the content. They might leave behind something like &#8220;&#8230;&#8221; because as we all know you can&#8217;t have a [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve seen this on other boards but only recently have I started seeing it on my own: people that edit the first post (or potentially even all of their posts) of a topic and remove all of the content. They might leave behind something like &#8220;&#8230;&#8221; because as we all know you can&#8217;t have a truly empty post. The net result is the topic is then worthless because nobody knows what we&#8217;re talking about.</p>
<p>With phpBB3 the moderator team can lock a post to prevent further editing. But once the original content is gone it doesn&#8217;t help. So tonight I started thinking about how and where to store post revisions in order to recover from this sort of action. <span id="more-338"></span></p>
<h3>Defining the Problem</h3>
<p>Here&#8217;s what I want to be able to do:</p>
<ol>
<li>Capture the prior text of any edited post and store it somewhere</li>
<li>Track who last edited a post</li>
<li>Give moderators the ability to review the post edit history and revert back to an older version</li>
<li>Provide the ability to lock a post so further editing (except by moderators) is no longer possible</li>
</ol>
<p>This MOD is still a work in progress so I don&#8217;t have final code to share yet. But there are a few interesting wrinkles that I thought about during the design process that I thought I would share.</p>
<h3>Locking Posts</h3>
<p>Locking posts is one of the easier parts. I added a status field to the phpbb_posts table. For topics there is a topic_status field that contains several different status values. I don&#8217;t need anything that complex so I called my new field post_locked and made it a tinyint(1) unsigned with a default of zero. That means that every new post that gets inserted is going to default to unlocked. Of course I added the new field to the insert statement rather than rely on the database to provide the default for me.</p>
<p>Once the field is present in the table I have to check it. At the beginning of viewtopic.php there are a number of authorization checks that determine which buttons / icons are displayed on the post. If the post is locked I do not display the edit button; that&#8217;s simple enough. However, what if someone creates their own URL rather than clicking the button image? In that case I have code at the top of posting.php to see if the requested function is &#8220;editpost&#8221; and then check to see if the post_locked field is set to 1 instead of 0, and if so I reject the edit attempt.</p>
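<p>For illustration, the schema change and the lock lookup described above might look like the following in MySQL. This is only a sketch, not the MOD&#8217;s actual code: the <code>NOT NULL</code> constraint and the sample <code>post_id</code> of 12345 are assumptions.</p>
<pre>-- Add the lock flag: 0 = unlocked, 1 = locked.
ALTER TABLE phpbb_posts
  ADD COLUMN post_locked TINYINT(1) UNSIGNED NOT NULL DEFAULT 0;

-- Lookup that posting.php could run when the requested mode is "editpost".
-- 12345 is a placeholder for the post_id taken from the request.
SELECT post_locked
FROM   phpbb_posts
WHERE  post_id = 12345;</pre>
<p>If the lookup returns 1 and the user is not a moderator, the edit attempt is rejected.</p>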
<p>At this point I have not decided just how moderators will lock / unlock posts. One easy way would be to add a lock / unlock button on each post that moderators can see and use. However, I currently envision the locking process only being used when a board member has abused their edit permissions. That means it would be more efficient to provide my moderators a way to lock a post as they are reverting the post to a prior version. </p>
<h3>Who Edited The Post?</h3>
<p>With a default phpBB2 installation we don&#8217;t store who edited a post. In fact no edit history is kept at all in many cases. For example if a moderator edits a post belonging to someone else, it does not trigger the edit history. If a normal user edits a post that is the last post of a topic, that does not trigger the edit history either. The post edit history is only stored when a user edits their own post after at least one reply has been made. Since we only store the fact that a user edits their own post, there is no user_id stored. The code stores the last edit date/time as well as incrementing the overall edit count. That&#8217;s it.</p>
<p>To address this I added a new field called last_edit_user to the phpbb_posts table. I altered the edit history so that it runs <strong>every time a post is edited</strong> instead of only when a user edits their own post. I already have a post notes feature that records who edited the post and when. But this additional step will store the last edit user (and only the last edit user) on the post itself which means I don&#8217;t have to join out to my post notes table. I made a slight adjustment to the viewtopic.php code so that the &#8220;Last edited by&#8230;&#8221; message now includes the true user name for all edits.</p>
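<p>One possible shape for that change in MySQL is sketched below. The column type is assumed to match <code>user_id</code> in the users table, the default of 0 (meaning &#8220;never edited&#8221;) is an assumption, and the topic id of 678 is a placeholder.</p>
<pre>-- Store the id of whoever made the most recent edit on the post row itself.
ALTER TABLE phpbb_posts
  ADD COLUMN last_edit_user MEDIUMINT(8) NOT NULL DEFAULT 0;

-- Fetch the edit details for every post in one topic, with the editor's
-- name, without touching a separate notes table.
SELECT p.post_id
,      p.post_edit_time
,      p.post_edit_count
,      e.username AS last_edit_username
FROM   phpbb_posts p
LEFT JOIN phpbb_users e ON e.user_id = p.last_edit_user
WHERE  p.topic_id = 678;</pre>
<p>Because of the outer join, posts that were never edited still come back, with a NULL editor name (assuming no user has an id of 0).</p>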
<h3>What Was Changed?</h3>
<p>This was the fun part: how do I store revisions of the post? After thinking through this and considering a number of esoteric ways to store differences in the post text I threw up my hands and decided simply to copy the entire text of the post to a new row. Why? I&#8217;m using a fraction of my disk space. I can also take advantage of the fact that in phpBB2 the post table and the post text table are separate. Here&#8217;s what I did.</p>
<p>First I added a new field to the phpbb_posts_text table called post_version that is mandatory (not null). It stores an unsigned integer value. The current version of the post text is <strong>always version zero</strong> in this table. Second, I went through the base phpBB2 code and added the following to every SQL statement that joins to the phpbb_posts_text table:</p>
<p><code>AND pt.post_version = 0</code></p>
<p>There were about 20 files that needed to have this change, but in the grand scheme of things it was a low-impact update. Once I added the column to my table and updated the code everything ran perfectly. The next step is to figure out what to store in this new field.</p>
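<p>As a rough sketch of what those two steps could look like in MySQL: the exact column definition, the default value, and the switch to a composite primary key are assumptions based on the description above, and the topic id of 678 is a placeholder.</p>
<pre>-- Add the version column and make (post_id, post_version) the key,
-- so that several versions of the same post can coexist.
ALTER TABLE phpbb_posts_text
  ADD COLUMN post_version INT UNSIGNED NOT NULL DEFAULT 0,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (post_id, post_version);

-- A typical join after the change: only the current text is returned.
SELECT p.post_id
,      pt.post_subject
,      pt.post_text
FROM   phpbb_posts p
,      phpbb_posts_text pt
WHERE  pt.post_id = p.post_id
AND    pt.post_version = 0
AND    p.topic_id = 678;</pre>
<p>Without the extra condition, the join would return one row per stored version and edited posts would show up more than once.</p>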
<h3>Tracking Versions</h3>
<p>For most of what goes on during the post processing on a phpBB board there isn&#8217;t much to change other than the extra join column. I only ever want to search the current version. I only want to display the current version on viewtopic.php. In fact, every time I reference the phpbb_posts_text table I want only the current version&#8230; unless I am a moderator that needs to review the actual post history. Why did I make the current post version zero instead of taking the maximum number? Simple. Every post has at least one version to start with, and that version will always be zero. By ensuring that the &#8220;current&#8221; version of the post text is always stored as version 0 my join logic is extremely simple.</p>
<p>But what does the version number mean then? In this model, I think of the version as a number representing how many &#8220;versions ago&#8221; the text was posted. Take a post with 5 version rows in the post text table. Version 0 is the currently displayed (and search indexed) version. Version 1 is one version ago. Version 2 is two versions ago, and so on up to version 4 (the original post text) which was four versions ago compared to the current text.</p>
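<p>Under this numbering, a moderator&#8217;s history view could be as simple as the query below. It is a sketch only (the review interface has not been designed yet), and the <code>post_id</code> of 12345 is a placeholder.</p>
<pre>-- List every stored version of one post, newest first.
-- Version 0 is the current text; the highest number is the original post.
SELECT post_version
,      post_subject
,      post_text
FROM   phpbb_posts_text
WHERE  post_id = 12345
ORDER BY post_version ASC;</pre>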
<p>How are these versions created? It&#8217;s fairly simple. In the standard phpBB2 code there is a statement that checks to see if the posting process is in edit mode or new post mode. It creates either an INSERT or UPDATE statement based on the mode. All I did was add this check:</p>
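<pre>        if ( $mode == 'editpost' )
        {
                // Shift every stored version of this post up by one so that
                // version 0 is free for the newly edited text. The highest
                // version is updated first to avoid a duplicate key error.
                $sql =  "UPDATE " . POSTS_TEXT_TABLE . "
                        SET     post_version = post_version + 1
                        WHERE   post_id = $post_id
                        ORDER BY post_version DESC";
                if ( !$db->sql_query($sql) )
                {
                        message_die (GENERAL_ERROR, 'Error incrementing post version', '', __LINE__, __FILE__, $sql);
                }
        }</pre>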
<p>There are a couple of interesting things here. During the edit process I increment the post version for every existing record for the post by one. This means 0 becomes 1, 1 becomes 2, 2 becomes 3, and so on. Without a defined order this generates a SQL error, because when 1 becomes 2 and 2 already exists it violates the unique primary key constraint. I added an ORDER BY clause to the update statement to ensure that I start with the end of the chain instead, and that fixed the problem. By starting at the end, 3 becomes 4 before 2 becomes 3 and the constraint is never violated.</p>
<p><em>This may or may not be cross-database compatible; I don&#8217;t know which databases allow an update to have an ORDER BY clause.</em></p>
<p>After the post version numbers are incremented I always do an insert because post version zero no longer exists. That means that posts are never edited&#8230; they are always inserts. This means fewer database locks which is also a good thing.</p>
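<p>Put together, one edit amounts to the two statements below. This is an illustrative sketch in MySQL rather than the MOD&#8217;s actual code; the <code>post_id</code>, subject, and text values are placeholders, and the column list assumes the standard phpBB2 posts text table plus the new <code>post_version</code> field.</p>
<pre>-- Before the edit, post 12345 has versions 0, 1, 2.

-- Step 1: shift the existing versions, highest first (2->3, 1->2, 0->1).
UPDATE phpbb_posts_text
SET    post_version = post_version + 1
WHERE  post_id = 12345
ORDER BY post_version DESC;

-- Step 2: the edited text always goes in as a fresh version 0.
INSERT INTO phpbb_posts_text
       (post_id, post_version, bbcode_uid, post_subject, post_text)
VALUES (12345, 0, '', 'Example subject', 'The newly edited text');

-- After the edit, post 12345 has versions 0, 1, 2, 3.</pre>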
<h3>Why Not Store Version On the Post Table?</h3>
<p>I am going to anticipate the following question because I think it&#8217;s quite logical. In fact, I reviewed this idea myself before rejecting it in favor of what I ultimately decided on. The question?</p>
<blockquote><p>Why not store the post version on the post table and increment it during edits, rather than using the backwards zero-based logic?</p></blockquote>
<p>This is a good question, I think. But it&#8217;s considered bad form to ever update a database key. Once it&#8217;s set it should never change. If the <code>post_id</code> + <code>post_version</code> combination key in the post table ever got out of sync with the post text table I would have a problem. By using only the <code>post_id</code> as the main key and the <code>post_version</code> as a qualifier (and only storing the <code>post_version</code> in one place) I will never have that problem.</p>
<p>Would it have saved me some work? The only benefit is that the post table itself would store an indication of how many versions there are. But wait, I have that already. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  When I altered the edit process so that every edit increments the <code>post_edit_count</code> field I essentially got that key value. Why not use it in my join instead?</p>
<p>I will say it again: updating a key is a bad idea. The post edit count is informative but should not be used as a key. What happens if there is a database hiccup between the update of the posts table and the insert to the posts_text table? The edit count might be updated to 5 but I only have rows 1-4 in my post text table, and because of that the join fails. In the solution I decided to go with, I will only ever have a problem if the initial post insert process fails, otherwise I will <strong>always</strong> have a post version of zero stored in my table.</p>
<h3>What&#8217;s Next?</h3>
<p>At this point I have created the field needed to allow moderators to lock a post and I have written the code to check this field and prevent users from editing their post once it has been locked. I have not written the code that would allow a moderator to lock / unlock the post.</p>
<p>I have updated the code that tracks edits and added the last edit user to the posts table. I have also updated the code so that every edit &#8211; not just the qualifying edits as determined by the standard phpBB2 code &#8211; will trigger this code.</p>
<p>Finally, I have altered the posts text table so that it includes a version tag, updated all of the base phpBB2 code to reference version zero, and altered the posting process so that every edit is an insert rather than an edit to the table.</p>
<p>At this point I have opened a discussion with my moderator team to design the interface that they will use to interact with this new information. I need to have some way to indicate to the team that edits have been performed (already in place with the edit post notes). I need to have some way for the team to open the post history and review it. I need to allow them to decide which version of the text to revert to, and I need to provide them the option to lock the post to that text once the revert process is done.</p>
<p>I will be sure to post an update as the discussion moves forward. If anyone has done this before and would be willing to share your own decision ideas / process I would certainly be interested in hearing from you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/11/02/storing-post-revisions-post-locking/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Why Is 86400 A Magic Number?</title>
		<link>http://www.phpbbdoctor.com/blog/2009/10/06/why-is-86400-a-magic-number/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/10/06/why-is-86400-a-magic-number/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 13:47:54 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Database Tips]]></category>
		<category><![CDATA[MOD Writing]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=324</guid>
		<description><![CDATA[Anyone who has worked with database date/time fields probably recognizes the number from the title of this blog post. If not, it&#8217;s simple: there are 86400 seconds in a day. Why do I care about this? Because there are all sorts of fun things that I can do with that number.   
What Happened [...]]]></description>
			<content:encoded><![CDATA[<p>Anyone who has worked with database date/time fields probably recognizes the number from the title of this blog post. If not, it&#8217;s simple: there are 86400 seconds in a day. Why do I care about this? Because there are all sorts of fun things that I can do with that number. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  <span id="more-324"></span></p>
<h3>What Happened Yesterday?</h3>
<p>One of the frequent requests that I used to see on phpbb.com was something like this:</p>
<blockquote><p>How many visitors came to my board yesterday?</p></blockquote>
<p>The problem I have with questions like this is that your &#8220;yesterday&#8221; is not the same as mine, unless you happen to live in the central time zone in the United States. When I wrote a MOD to do this for a client, I convinced them that rather than showing what happened &#8220;yesterday&#8221; it would be better to show what happened in the last 24 hours.</p>
<p>The <code>user_lastvisit</code> field shows the date/time that a user last logged in. This field is used to track new topics during a user session. It&#8217;s also used to drive the difference between &#8220;new&#8221; and &#8220;unread&#8221; personal messages. (A &#8220;new&#8221; message arrived since the last session. An &#8220;unread&#8221; message is one that hasn&#8217;t been read yet but arrived before the current session started.) I have altered my memberlist.php code to show when the user last visited as well.</p>
<p>Like most date fields in phpBB, this field is stored as int(11) rather than as a date/time field. (Other examples are the user&#8217;s registration date, the post time, and the new topic time; the list goes on from there.) The content of the field is a very large integer value and is officially known as a Unix timestamp.</p>
<blockquote cite="Wikipedia"><p>Unix time, or POSIX time, is a system for describing points in time, defined as the number of seconds elapsed since midnight proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting leap seconds.</p></blockquote>
<p>Unix timestamps are conventionally stored as a signed integer rather than unsigned. This allows a developer to store negative numbers to reflect dates prior to 1970. It also creates its own Y2K-style issue, as a signed 32-bit field like this one will overflow on January 19, 2038. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  But let me get back on track for this blog post.</p>
<h3>SQL Code for Last 24 Hours</h3>
<p>Because of the way the user last visit time is stored, I can easily get a list of people that have visited my board in the last 24 hours with this SQL code:</p>
<pre>select  user_id
,       username
from    phpbb_users
where   user_lastvisit >= (unix_timestamp() - 86400)
order by user_lastvisit desc</pre>
<p>The MySQL function <code>unix_timestamp()</code> returns the current date and time in a unix timestamp format so I don&#8217;t have to convert anything. Since the unix timestamp is a number of seconds, and since one day has 86400 seconds, by subtracting 86400 from the current time I get the matching time from 24 hours ago. Easy stuff. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>If I wanted to truly get a list of people that signed in &#8220;yesterday&#8221; then the first thing I have to do is define what yesterday means. Time zones could get involved. It could get messy. I much prefer the &#8220;last 24 hours&#8221; definition because it&#8217;s the same for everybody everywhere.</p>
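<p>For comparison, here is a sketch of what a calendar-day &#8220;yesterday&#8221; query could look like in MySQL. It assumes that &#8220;yesterday&#8221; means the previous calendar day in the database session&#8217;s time zone, which is exactly the kind of decision the 24-hour version avoids.</p>
<pre>-- Visitors whose last visit falls between midnight yesterday (inclusive)
-- and midnight today (exclusive), in the session time zone.
select  user_id
,       username
from    phpbb_users
where   user_lastvisit >= unix_timestamp(curdate() - interval 1 day)
and     user_lastvisit &lt; unix_timestamp(curdate())
order by user_lastvisit desc</pre>
<p>Two boards hosted in different time zones would get different answers from this query for the same moment in time.</p>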
<h3>What About More Than One Day?</h3>
<p>Sometimes I want to calculate more than one day. Instead of memorizing multiples of 86400 I simply multiply by the number of days. So if I want to count how many people have logged in for the past 7 days (as defined by 24-hour periods rather than &#8220;days&#8221;) I would do this:</p>
<pre>select  count(user_id)
from    phpbb_users
where   user_lastvisit >= (unix_timestamp() - ( 86400 * 7 ) )</pre>
<p>This is easy enough to do, and the code becomes &#8220;self-documenting&#8221; in a manner of speaking. I know that there are 86400 seconds in a day, and if I multiply by 7 I get a week. This is much easier to read and understand than using the number 604800.</p>
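<p>The same idea can be pushed one step further with a MySQL user variable, so the number of days appears only once. This is just a sketch; the variable name <code>@days</code> is made up for the example.</p>
<pre>-- 60 seconds * 60 minutes * 24 hours = 86400 seconds in a day.
set @days = 30;

select  count(user_id)
from    phpbb_users
where   user_lastvisit >= (unix_timestamp() - ( 86400 * @days ) )</pre>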
<h3>Measuring Board Activity</h3>
<p>About two years ago I told folks I was eagerly looking forward to the first week where my board averaged 86400 page views daily. Now that I have explained what the number is, that statement makes more sense. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I was looking for the first week that I averaged a page view every second for an entire week. That happened over a year ago, and in fact my board averages over 100K page views daily at this point.</p>
<p>Now I am looking forward to the first week that I average 172800 page views a day. Hmm, I wonder why that is? <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p><strong>Related Links</strong></p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Unix_timestamp">Wiki on Unix Timestamps</a></li>
<li><a href="http://en.wikipedia.org/wiki/UTC">Wiki on UTC</a></li>
<li><a href="http://en.wikipedia.org/wiki/Year_2038_problem">Unix Millennium Bug</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/10/06/why-is-86400-a-magic-number/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Optimizing Random Users Via Last Visit Time</title>
		<link>http://www.phpbbdoctor.com/blog/2009/09/27/optimizing-random-users-via-last-visit-time/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/09/27/optimizing-random-users-via-last-visit-time/#comments</comments>
		<pubDate>Sun, 27 Sep 2009 16:33:39 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[MOD Writing]]></category>
		<category><![CDATA[Performance Tuning]]></category>
		<category><![CDATA[phpBB]]></category>
		<category><![CDATA[phpBB3]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=336</guid>
		<description><![CDATA[While I have not started in-depth MODding on phpBB3 yet, I do read the phpBB3 MODders forum from time to time just to start to get the flavor of how things have changed. The other day a database (query) question came up and I suggested an answer that I originally thought was only slightly different [...]]]></description>
			<content:encoded><![CDATA[<p>While I have not started in-depth MODding on phpBB3 yet, I do read the phpBB3 MODders forum from time to time just to start to get the flavor of how things have changed. The other day a database (query) question came up and I suggested an answer that I originally thought was only slightly different from what had already been proposed. However, after being asked which of the two solutions would be the least CPU intensive I did a bit more investigating.</p>
<p>I discovered that one solution was clearly better than the other, but only if the proper index was created. </p>
<p><em>Disclaimer: I tested on phpBB2. The index that I created does not exist in a standard phpBB2, nor does it exist in a standard phpBB3 install, so I suspect this post applies to both.</em> <span id="more-336"></span></p>
<h3>Defining the Problem</h3>
<p>Here is a partial quote of the <a href="http://www.phpbb.com/community/viewtopic.php?f=71&#038;t=1793305">original question in the phpBB3 MOD forum</a>:</p>
<blockquote><p>I want to select 5 random memebers from last 30 active users.</p></blockquote>
<p>Simple enough, yes? First, get the last 30 members that have visited the board, then randomly select 5 of those. This could easily be done procedurally via php code, but it can also be done directly in the database with the correct SQL code.</p>
<h3>Order By Random</h3>
<p>The first suggestion given was this:</p>
<pre>$start = 0;
$number = 30;
$sql = 'SELECT *
    FROM ' . USERS_TABLE . '
    ORDER BY user_lastvisit  DESC, RAND()'  ;
    $result = $db->sql_query_limit($sql, $number, $start);</pre>
<p>With apologies to evil&lt;3 who posted this <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  there are a couple of things that I suggested changing. The first is to avoid using the * to select every column in the table unless it&#8217;s absolutely needed. In this case, I made the assumption that the original poster was looking for a way to display five random &#8220;recent visitors&#8221; somewhere on a page on the site. Doing this doesn&#8217;t require every single bit of information about the user, just certain columns. If you select * then the entire row is returned. There are 73 columns in a standard phpBB3 users table, several of them varchar(255), and in my test installation all of the fields are mandatory. That means every single column has a value, even if it is just a space or other placeholder value. By requesting a specific list of columns in the query, the amount of I/O is reduced. All other things being equal, less I/O should mean the query runs faster because there are fewer bits and bytes to move around.</p>
<p>The other issue with the query as provided is that it&#8217;s not very efficient. It has two order by columns, one of which cannot possibly be indexed. (You can&#8217;t index something that doesn&#8217;t exist until the runtime of the query, so the rand() function result is impossible to tune.) Here is an abbreviated display for the explain plan for this query:</p>
<pre>+-------------+------+---------------+------+---------+------+-------+---------------------------------+
| select_type | type | possible_keys | key  | key_len | ref  | rows  | Extra                           |
+-------------+------+---------------+------+---------+------+-------+---------------------------------+
| SIMPLE      | ALL  | NULL          | NULL | NULL    | NULL | 43367 | Using temporary; Using filesort |
+-------------+------+---------------+------+---------+------+-------+---------------------------------+</pre>
<p>This shows that no indexes will be used for this query at all, which is not good. I ran this query five times in a row on my large user table and got execution times of 0.12, 0.12, 0.12, 0.13, and 0.12 seconds. </p>
<h3>Select Last 30 Then Random 5</h3>
<p>As you can see from the explain data above, I have 43,367 rows in my users table right now, which is fairly large. Instead of scanning the entire table, it would be much more efficient to get the last 30 visitors in one pass and then pick five members randomly from that list. I suggested this SQL to do that:</p>
<pre>SELECT  u.user_id
,       u.username
FROM    phpbb_users u
,       (SELECT user_id
         FROM   phpbb_users v
         ORDER BY user_lastvisit desc limit 30) v
WHERE   u.user_id = v.user_id
ORDER BY rand() limit 5;</pre>
<p>This is an interesting technique called &#8220;inline tables&#8221; as I am creating a new table on the fly by writing SQL code inside the FROM clause. Every database I have worked with supports this technique, so it should be portable. (I do not count Microsoft Access as a real database. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  ) What this SQL code will do is run the inline table to return a list of 30 users, then join that virtual table to the real table by <code>user_id</code> (which is a unique key) and randomly select five users from the joined result set.</p>
<p>Is it more efficient?</p>
<p>I ran this query five times (as I did with the other one) and got run times of 0.10, 0.12, 0.10, 0.10, and 0.11 seconds. It seems that it&#8217;s not really that much more efficient, so is there really a clear winner?</p>
<h3>Index Key Columns</h3>
<p>I ran this query:</p>
<p><code>show indexes from phpbb_users</code></p>
<p><em>Side Note: I run my SQL directly on the database using the MySQL command line, rather than phpMyAdmin. If you use the GUI interface, then you can check for keys by looking at the appropriate screen instead of doing as I documented here.</em></p>
<p>The results of the query did not show an index on <code>user_lastvisit</code> which is crucial to this solution. Here is the explain plan for my query without the index:</p>
<pre>+-------------+--------+---------------+---------+-----------+-------+---------------------------------+
| select_type | type   | possible_keys | key     | ref       | rows  | Extra                           |
+-------------+--------+---------------+---------+-----------+-------+---------------------------------+
| PRIMARY     | ALL    | NULL          | NULL    | NULL      |    30 | Using temporary; Using filesort |
| PRIMARY     | eq_ref | PRIMARY       | PRIMARY | v.user_id |     1 |                                 |
| DERIVED     | ALL    | NULL          | NULL    | NULL      | 43367 | Using filesort                  |
+-------------+--------+---------------+---------+-----------+-------+---------------------------------+</pre>
<p>Notice that it also scans all 43,367 user rows. That&#8217;s okay. What isn&#8217;t okay is that it does so without the benefit of an index, and it also has to do some additional work since more than one table is involved. It would seem that the first query should be more efficient since it only has one explain step and the second one has three.</p>
<p>However, the magic of database indexing can fix this. The driver for this entire question is the <code>user_lastvisit</code> column. After creating an index on this field (which is not indexed by default in either phpBB2 or phpBB3) here is the new explain plan.</p>
<pre>+-------------+--------+---------------+----------------+-----------+-------+---------------------------------+
| select_type | type   | possible_keys | key            | ref       | rows  | Extra                           |
+-------------+--------+---------------+----------------+-----------+-------+---------------------------------+
| PRIMARY     | ALL    | NULL          | NULL           | NULL      |    30 | Using temporary; Using filesort |
| PRIMARY     | eq_ref | PRIMARY       | PRIMARY        | v.user_id |     1 |                                 |
| DERIVED     | index  | NULL          | user_lastvisit | NULL      | 43367 |                                 |
+-------------+--------+---------------+----------------+-----------+-------+---------------------------------+</pre>
<p>Yup, still have to look at (or so the database optimizer thinks) all 43,367 users. But this time we do so with the benefit of an index. What is the impact?</p>
<p>The query without an index, remember, ran in 0.10, 0.12, 0.10, 0.10, and 0.11 seconds. After creating the index I ran the same query five times and got 0.00 seconds of execution time on every trial. </p>
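<p>For reference, an index like the one in the explain plan above could be created with a statement along these lines. The index name <code>user_lastvisit</code> simply matches the key shown in the plan; adjust the table prefix to match your own installation.</p>
<pre>CREATE INDEX user_lastvisit ON phpbb_users (user_lastvisit);</pre>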
<p>Does the index help the first query? Interestingly enough, it does not. The explain plan is identical with or without the index, and the query execution times do not improve either. </p>
<p>However, it gets worse. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  The first query doesn&#8217;t perform as required. My query gets the last 30 users that have logged in and then randomly selects five of those. The ORDER BY clause happens after all of the select and join process is complete, so I am ordering by RAND() on at most 30 rows. <strong>The other suggestion will pick the same five users nearly every single time</strong>. Why? The secondary sort column (the &#8220;random factor&#8221;) will only come into play if two users have exactly the same last visit time (down to the second). When you have two columns in the ORDER BY clause, the first column is the primary sort and every row returned will first be sorted by that column. If there are ties in the first column then, and only then, will the second column be used to order the tied rows.</p>
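<p>The tie-breaking behavior is easy to reproduce outside the database. The following is a small stand-alone PHP sketch on made-up data (the usernames and timestamps are invented for illustration) that mimics <code>ORDER BY user_lastvisit DESC, RAND()</code>:</p>
<pre>// Illustration only: made-up rows, two of which share the same visit time
$users = array(
    array('username' => 'alice', 'user_lastvisit' => 1254050000),
    array('username' => 'bob',   'user_lastvisit' => 1254040000),
    array('username' => 'carol', 'user_lastvisit' => 1254040000),
    array('username' => 'dave',  'user_lastvisit' => 1254030000),
);

// Give every row its own random value, much as RAND() is evaluated per row
foreach ($users as $i => $user)
{
    $users[$i]['rnd'] = mt_rand();
}

function compare_visit_then_random($a, $b)
{
    // Primary sort: most recent visit first
    if ($a['user_lastvisit'] != $b['user_lastvisit'])
    {
        return ($a['user_lastvisit'] > $b['user_lastvisit']) ? -1 : 1;
    }
    // Secondary sort: only reached when the visit times tie
    if ($a['rnd'] == $b['rnd'])
    {
        return 0;
    }
    return ($a['rnd'] &lt; $b['rnd']) ? -1 : 1;
}

usort($users, 'compare_visit_then_random');
// alice is always first and dave is always last; only bob and carol can swap</pre>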
<p>So the first solution suggested is the worst of both worlds: not only is it slower, it is also incorrect. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>Conclusion</h3>
<p>There may be other solutions to this. I don&#8217;t mean to present this post as the ultimate answer to this question. What I hoped to accomplish with this post was to show how two solutions that look the same are not always equivalent. Subtle differences can have a huge impact on functionality. </p>
<p>The second lesson is that if you&#8217;re going to be asking the same question of your database over and over you should carefully consider indexing the columns used in the WHERE or ORDER BY clauses. I did some work for someone a while back (their board is one of the top ten phpBB boards on the &#8220;big boards&#8221; site). They wanted to display the top ten posters on their index. The code as written was taking over ten seconds just to run the query <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_eek.gif' alt=':shock:' class='wp-smiley' />  and then the PHP code / template process still had to complete. I rewrote the code and added an index on the <code>user_posts</code> column and the code ran in less than a hundredth of a second.</p>
<p>On the other hand, too many indexes can also be a problem, so don&#8217;t go out and create an index for every single column in your database. Just the ones that truly matter. In this case, it makes a substantial difference. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_cool.gif' alt='8-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/09/27/optimizing-random-users-via-last-visit-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Post Bump + Cross Posting Prevention</title>
		<link>http://www.phpbbdoctor.com/blog/2009/09/27/post-bump-cross-posting-prevention/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/09/27/post-bump-cross-posting-prevention/#comments</comments>
		<pubDate>Sun, 27 Sep 2009 12:24:32 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Board Management]]></category>
		<category><![CDATA[MOD Writing]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=335</guid>
		<description><![CDATA[A long time ago I wrote a MOD for my phpBB2 board that puts a nice red banner at the top of the posting screen if you are getting ready to &#8220;bump&#8221; your post. Bumping is defined as posting two (or more) times in a row without someone else replying in between, and without waiting [...]]]></description>
			<content:encoded><![CDATA[<p>A long time ago I wrote a MOD for my phpBB2 board that puts a nice red banner at the top of the posting screen if you are getting ready to &#8220;bump&#8221; your post. Bumping is defined as posting two (or more) times in a row without someone else replying in between, and without waiting 24 hours first. Does it work? From a functional standpoint it certainly does. From a procedural standpoint, not so much. <span id="more-335"></span></p>
<h3>Implementing the Bump Warning MOD</h3>
<p>The MOD was fairly simple to implement. It does require an extra query during the posting process. This query is used to find out if the current poster was also the last person to post in the selected topic. If so, it checks the post time to see if 24 hours have expired. If not, then it sets a value for a template switch which turns on the red warning label on the posting screen. Here&#8217;s the code from posting.php.</p>
<pre>// BEGIN Bump Warning 1.0.0 (www.phpBBDoctor.com)
if ($mode == 'reply')
{
        $sql = 'SELECT  p.post_id
                ,       p.poster_id
                ,       p.post_time
                FROM    ' . POSTS_TABLE . ' p
                ,       ' . TOPICS_TABLE . ' t
                WHERE   t.topic_id = ' . $topic_id . '
                AND     p.post_id = t.topic_last_post_id+0';

        if (!($result = $db->sql_query($sql)))
        {
                message_die(GENERAL_ERROR, 'Invalid cursor getting last poster', '', '', '', $sql);
        }
        $check = $db->sql_fetchrow($result);
        $last_post_id = $check['post_id'];
        $db->sql_freeresult($result);

        if ( ($check['poster_id'] == $userdata['user_id']) &#038;&#038; ($check['post_time'] > time() - (86400)) )
        {
                $template->assign_block_vars('switch_bump_warning', array(
                        'U_EDIT_PRIOR_POST' => append_sid("posting.$phpEx?mode=editpost&amp;p=$last_post_id")
                        ));
        }
}
// END Bump Warning 1.0.0 (www.phpBBDoctor.com)</pre>
<p>Then of course in the template file I have the switch defined.</p>
<p><code>        &lt;!-- BEGIN switch_bump_warning --&gt;<br />
        &lt;tr&gt;<br />
          &lt;td class="favrow_red" width="22%" height="35"&gt;&lt;span class="gen"&gt;&lt;strong&gt;Bump Warning&lt;/strong&gt;&lt;/span&gt;&lt;/td&gt;<br />
          &lt;td class="favrow_red" width="78%"&gt;&lt;span class="gen"&gt;It appears that you were the last person to post in this topic and 24 hours has not yet elapsed. Please do not reply to your post unless you have new information to add. You may also &lt;a href="{switch_bump_warning.U_EDIT_PRIOR_POST}" class="gen"&gt;edit your prior post&lt;/a&gt; since nobody has replied yet. Thanks.&lt;/span&gt;&lt;/td&gt;<br />
        &lt;/tr&gt;<br />
        &lt;!-- END switch_bump_warning --&gt;</code></p>
<p>This is all well and good, but it seems that some people can be blind. Somehow the red banner across the top of the screen isn&#8217;t enough to let them know they&#8217;re doing something that I don&#8217;t want them to do. I even give them a link to edit the prior post instead of adding a new one&#8230;</p>
<p><img src="/blog/tips/post_bump/bump_warn.png" width="500" height="162" border="0" alt="Post Bump Screen Shot" title="Red warning bar for bump warning MOD" /></p>
<p>Since I have asked nicely and they&#8217;re still bumping posts, what do I do next?</p>
<h3>Duplicate Post Prevention</h3>
<p>There are two kinds of post bumps that I typically see. The first kind is what I have described above where the person is simply ignoring my request to not do what he or she is about to do. The second kind is also common and is not really something I can blame a user for. At times the Internet gets slow, and the posting process times out. It may also be that there are several hundred people watching a topic and it takes time to send those emails. (Something else I want to modify; I want to set up a job-queue to handle this process so the poster doesn&#8217;t pay a penalty for that during their posting process.) The bottom line is that sometimes a post times out, so the user resubmits. And resubmits. And when they&#8217;re done, there are two (or three or more) posts in a row that contain the same text. </p>
<p>So tonight I am starting to test a new feature. I am supplementing my bump &#8220;warning&#8221; with a bump &#8220;rejection&#8221; as well. The bump rejection does not prevent a legitimate post bump, but it will catch users that enter the same post twice due to the time-out issue I described above. As an added bonus, this new code will also detect a cross-posted topic and prevent that as well.</p>
<p>To make this change I opened includes/functions_post.php and added code to the Flood Control section. As this is very preliminary I am not using any language strings. That being said, here&#8217;s the code.</p>
<pre>// Cross-post capture
$sql = 'select	post_text
	,	p.post_id
	,	p.topic_id
	from	' . POSTS_TEXT_TABLE . ' pt
	,	' . POSTS_TABLE	. ' p
	where	p.poster_id = '	. $userdata['user_id'] . '
	AND	p.post_id = pt.post_id
	ORDER BY p.post_id desc	limit 3';

if (!($result =	$db->sql_query($sql) ) )
{
	message_die(GENERAL_ERROR, 'Invalid cursor while checking for cross posts');
}

while ($cp_row = $db->sql_fetchrow($result))
{
	if (soundex($post_message) == soundex($cp_row['post_text']))
	{
		if ($topic_id == $cp_row['topic_id'])
		{
			message_die(GENERAL_MESSAGE, 'You are attempting to post the same item twice. It is possible that your browser timed-out on your prior post submission. If you think this is an error, please click the "back" button on your browser and review your post text. Otherwise please &lt;a href="' . append_sid('viewtopic.php?p=' . $cp_row['post_id'] . '#' . $cp_row['post_id']) .'"&gt;Click Here&lt;/a&gt; to review your prior post.');
		}
		else
		{
			message_die(GENERAL_MESSAGE, 'You seem to be posting the same text in two different topics. Please do not do this, as it is called cross-posting and it does not help our board. Cross-posting can lead to fragmented discussions and extra work for people that want to help you. Please pick only one forum to post your question. If needed the topic can be moved to a different forum later. Thanks for your help in this matter. Please &lt;a href="' . append_sid('viewtopic.php?t=' . $cp_row['topic_id']) .'"&gt;Click Here&lt;/a&gt; to return to your prior topic.');
		}
	}
}
$db->sql_freeresult($result);</pre>
<p>The query here is simple. Go get the last three posts by this user. Other than the overhead of reading the large text field into memory this query should be extremely quick. Next, check to see if the text of any of their prior posts is essentially identical to the post they are about to enter. In the case of the browser time-out situation I described earlier the post would be 100% identical. In the case of a cross-post attempt I would expect the text to be the same, as most people type up the post and then copy/paste to enter the next. By using the soundex() function on the data I can ignore spacing and punctuation differences. Keep in mind that PHP&#8217;s soundex() returns only a four-character key based on the beginning of the string, so this is a very loose comparison that can also flag posts that merely start with similar-sounding words.</p>
<p>If the post text is the same, the next step is to see if the post is going into the same topic or not. A time-out warning is issued if the user is entering the same text for two posts in a row in the same topic. If they&#8217;re not posting in the same topic, then a cross-post warning is issued instead.</p>
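<p>If the soundex() comparison proves too loose in testing, one stricter option would be to normalize both strings and compare them directly. This is only a sketch of that alternative, not the code running on the board, and the helper name <code>normalize_post_text()</code> is hypothetical rather than part of phpBB:</p>
<pre>// Hypothetical helper: reduce a post to lowercase letters and digits only
function normalize_post_text($text)
{
    return preg_replace('/[^a-z0-9]+/', '', strtolower($text));
}

// The same post apart from spacing, case, and punctuation
$first  = "Hello,   world!  How are you?";
$second = "hello world. How are you";

if (normalize_post_text($first) === normalize_post_text($second))
{
    echo 'Duplicate text detected';
}</pre>
<p>This still ignores spacing and punctuation, but it only matches when the remaining letters and digits of the whole post are identical.</p>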
<p>I am testing this now, and hope to put it into production next week. </p>
<h3>Summary</h3>
<p>Does this MOD impact the user experience? Yes, it does. But I am willing to do that in this case to &#8220;train&#8221; the user how to be a better board citizen. There won&#8217;t be any more post duplicates because of time-out issues, and cross-posting should be cut down as well. Both of those are desirable outcomes. If my users complain too much, I may consider reworking it. Then again, all I am doing is adding code to help them follow the rules.</p>
<p>Personally, I&#8217;m looking forward to no more duplicate (or triplicate) posts as well as seeing what new and creative ways people find to get around the cross-posting rejection. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_cool.gif' alt='8-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/09/27/post-bump-cross-posting-prevention/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MySQL Bug Breaks Banner System</title>
		<link>http://www.phpbbdoctor.com/blog/2009/07/12/mysql-bug-breaks-banner-system/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/07/12/mysql-bug-breaks-banner-system/#comments</comments>
		<pubDate>Sun, 12 Jul 2009 18:42:06 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Database Tips]]></category>
		<category><![CDATA[MOD Writing]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=312</guid>
		<description><![CDATA[One of the reasons I wasn&#8217;t around much earlier this year was I was in the process of moving a bunch of sites over to a new server (including this one). In most cases the move went without a hitch. In one particular case there was an interesting bug that didn&#8217;t show up right away. [...]]]></description>
			<content:encoded><![CDATA[<p>One of the reasons I wasn&#8217;t around much earlier this year was I was in the process of moving a bunch of sites over to a new server (including this one). In most cases the move went without a hitch. In one particular case there was an interesting bug that didn&#8217;t show up right away. It was related to the banner system I wrote for my largest board. Fortunately it was an error on the &#8220;good&#8221; side, so I didn&#8217;t make any sponsors angry. <span id="more-312"></span></p>
<p>The banner system is fairly complex, but at the most basic level there is a cron (scheduled) job that periodically decrements the sponsor view balance. Once the view balance hits zero, the sponsor&#8217;s banners are deactivated until they pay for another round. The code is quite simple:</p>
<pre>$sql = 'UPDATE  ' . SPONSORS_TABLE . '
	SET     view_balance =  view_balance - ' . $decrement_views . '
	WHERE   sponsor_id = ' . $sponsor_data[$i]['sponsor_id'];</pre>
<p>The value for <code>$decrement_views</code> is assigned earlier in the loop. The definition for the <code>view_balance</code> column is an unsigned integer (mediumint specifically) so it will not allow negative values. On my old server this worked perfectly. If <code>$decrement_views</code> was greater than <code>view_balance</code> the sponsor view balance was set to zero. My old server was running MySQL 4.1.</p>
<p>My new server is running 5.0, and this same code did not work. Unfortunately it did not generate a syntax or other runtime error. Instead it did the math wrong.</p>
<h3>Integer Storage in MySQL</h3>
<p>Before I talk more about the bug I think I should talk about how computers store numbers. This is not specific to MySQL; it can affect any system that stores numeric values. When I store a number I have a choice of adding the unsigned attribute. In MySQL it takes the following format. The first will store a tiny integer without a sign, and the second will store a tiny integer with a sign.</p>
<p><code>create table dave (new_column tinyint unsigned);</code></p>
<p><code>create table dave (new_column tinyint);</code></p>
<p>What is the difference? When numbers are stored they take space. A <code>tinyint</code> column in MySQL can store values from -128 to 127. An unsigned <code>tinyint</code> can store values from 0 to 255. How does this work, and why are the numbers different in each case? </p>
<p>A <code>tinyint</code> is stored in one byte or eight bits of information. With eight bits I have a range of 0000 0000 to 1111 1111. With an unsigned value I can use all eight bits for my number, so 0000 0000 = 0 and 1111 1111 is 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128, or &#8211; if you do the math &#8211; 255. That&#8217;s how an unsigned tiny integer column can store a value ranging from 0 to 255. If, however, I want to use a signed value, the first bit becomes an indication of whether the value is negative or not. That means I only have seven bits left to determine the value. 0111 1111 becomes 1 + 2 + 4 + 8 + 16 + 32 + 64 which is 127, or the maximum value that can be stored in a <strong>signed</strong> tiny integer field. What happens when the leading bit gets flipped to a 1? That&#8217;s an indication that the number is negative instead of positive. So while both signed and unsigned values take the same amount of space, the maximum signed value is only about half the maximum unsigned value because the most significant bit (the leading bit) is used to indicate the sign of the value that is stored. </p>
<p>Put another way: a signed <code>tinyint</code> has seven available bits and therefore can store 2<sup>7</sup>-1 or 127 as the maximum value. An unsigned <code>tinyint</code> has eight available bits and therefore can store 2<sup>8</sup>-1 or 255 as the maximum value. Suppose I am looking at a number in memory and the bit values are 1000 0001. What is the value represented by these bits?</p>
<p>The fact is I can&#8217;t make that determination until I know if the value is signed or not. If the value is unsigned, the number represented by 1000 0001 is 129. If it&#8217;s signed, it gets complicated <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  but the value returned would be -127. Keep in mind that I am using <code>tinyint</code> example values here. </p>
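<p>PHP&#8217;s <code>unpack()</code> function makes it easy to see this for yourself. The sketch below has nothing phpBB-specific in it; it simply reads the same eight bits once as an unsigned byte and once as a signed byte. Signed integers are normally stored in two&#8217;s complement form, which is why the signed reading is -127 rather than -1.</p>
<pre>// The same eight bits, read two different ways
$byte = chr(bindec('10000001'));

$unsigned = unpack('C', $byte); // 'C' = unsigned char
$signed   = unpack('c', $byte); // 'c' = signed char

echo $unsigned[1]; // 129
echo "\n";
echo $signed[1];   // -127</pre>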
<h3>How Unsigned Math Broke My Sponsor System</h3>
<p>In my banner system I don&#8217;t try to track the page views down to exactly zero. A sponsor will pay for two million page views at a time. If they actually use two million and twelve I am not going to complain about the few extra views. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  So my system is designed to allow each sponsor more views in the interest of good will but mostly for system performance. In my system, each sponsor&#8217;s view balance is only updated once an hour. Let me repeat the code that I showed above:</p>
<pre>$sql = 'UPDATE  ' . SPONSORS_TABLE . '
	SET     view_balance =  view_balance - ' . $decrement_views . '
	WHERE   sponsor_id = ' . $sponsor_data[$i]['sponsor_id'];</pre>
<p>First (not shown) I get a total of the banner views that have accumulated over the past hour and store them into the <code>$decrement_views</code> variable in my PHP script. Next I execute the SQL statement shown above for each sponsor with an active banner. Suppose sponsor number 12 has 1000 views left and they used 300 in the last hour. The SQL code resolves to this:</p>
<p><code>UPDATE  phpbb_sponsors<br />
SET     view_balance =  1000 - 300<br />
WHERE   sponsor_id = 12;</code></p>
<p>After this statement is executed the sponsor has a balance of 700 views left. Suppose the same sponsor has 100 views in their balance and they used 300 more during the last hour. The SQL ends up looking like:</p>
<p><code>UPDATE  phpbb_sponsors<br />
SET     view_balance =  100 - 300<br />
WHERE   sponsor_id = 12;</code></p>
<p>When 300 is subtracted from 100 it results in a negative number. <strong>Under my old version of MySQL that number was set to zero since the column is defined as unsigned and is not capable of storing a negative value.</strong> This is what broke during the upgrade to MySQL 5.</p>
<h3>MySQL Bug Explained</h3>
<p>The newer version of MySQL did the math as signed, which allowed the value to go negative, and then stored the results in the unsigned field. You might start to see the problem now. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Instead of setting the sponsor view balance to zero for any negative result, the sponsor view balance was set to the maximum value that it could possibly hold because of the overflow. In this case, the view balance got set to 16,777,215 instead of zero. What is significant about that number? Here is a quote from the MySQL web page where it details the values that can be stored for any particular numeric column type&#8230;</p>
<blockquote><p>MEDIUMINT[(M)] [UNSIGNED] [ZEROFILL]<br />
A medium-sized integer. The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215. </p></blockquote>
<p>In a nutshell, the old version of MySQL caught the overflow and set the value to zero. The newer version of MySQL did not handle the overflow and instead stored the out-of-range result in the unsigned column, where it ended up as the maximum value the column can hold. I&#8217;m sure that would have made my sponsors happy, but it certainly wasn&#8217;t how things were intended to work.</p>
<h3>Fixing The Problem</h3>
<p>There is a good lesson to be learned here. I was lazy <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  in the way I wrote my earlier code. Rather than check to see if the number of accumulated views was higher than the balance remaining and handling that exception with code, I relied on the fact that MySQL would not (should not) store a negative result in an unsigned field. When the database behavior changed (it has been recognized as a bug by MySQL according to my research) my system broke.</p>
<p>I have fixed the SQL by using a case statement so that this error will never occur for me again. Here is the revised SQL:</p>
<pre>$sql = 'UPDATE  ' . SPONSORS_TABLE . '
	SET     view_balance =  case
			when ' . $decrement_views . ' > view_balance then 0
			else view_balance - ' . $decrement_views . '
			end
	WHERE   sponsor_id = ' . $sponsor_data[$i]['sponsor_id'];</pre>
<p>This updated code uses a <code>case</code> statement structure to check to make sure that the remaining balance is larger than the value to be decremented. If it is not, the value is simply set to zero.</p>
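<p>The same guard could also live on the PHP side, which is the &#8220;handle that exception with code&#8221; option mentioned earlier. Here is a minimal sketch, assuming the current balance has already been read from the database; the function name <code>next_view_balance()</code> is hypothetical:</p>
<pre>// Hypothetical helper: never let the new balance drop below zero
function next_view_balance($view_balance, $decrement_views)
{
    return max(0, intval($view_balance) - intval($decrement_views));
}

echo next_view_balance(1000, 300); // 700
echo "\n";
echo next_view_balance(100, 300);  // 0</pre>
<p>Doing the check in SQL keeps the read and the update in a single statement, which is a good reason to prefer the <code>case</code> version when the balance is only needed inside the database.</p>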
<p>Finally, now that I&#8217;ve explained signed versus unsigned it makes the following cartoon from xkcd.com more meaningful, doesn&#8217;t it? <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p><img src="http://imgs.xkcd.com/comics/cant_sleep.png" /></p>
<p><strong>Related Links</strong></p>
<ul>
<li><a href="http://dev.mysql.com/doc/refman/5.1/en/numeric-type-overview.html">MySQL Numeric Types Explained</a></li>
<li><a href="http://www.rwc.uc.edu/koehler/comath/13.html">Unsigned and Signed Integers</a>, an article I found on the Internet with more details on signed versus unsigned storage; it&#8217;s short and a very easy read</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/07/12/mysql-bug-breaks-banner-system/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>My Favorite Error Report</title>
		<link>http://www.phpbbdoctor.com/blog/2009/03/04/my-favorite-error-report/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/03/04/my-favorite-error-report/#comments</comments>
		<pubDate>Wed, 04 Mar 2009 16:41:59 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[MOD Writing]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=306</guid>
		<description><![CDATA[Here&#8217;s a tip: if you encounter a problem with code that someone else (such as myself) has written, and you write to them asking for help, try to be more descriptive in your error report than this:
On localhost works but not works in linux server.
Localhost: all pages works
Sever linux: none pages works
So basically they&#8217;re telling [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a tip: if you encounter a problem with code that someone else (such as myself) has written, and you write to them asking for help, try to be more descriptive in your error report than this:</p>
<blockquote><p>On localhost works but not works in linux server.</p>
<p>Localhost: all pages works<br />
Sever linux: none pages works</p></blockquote>
<p>So basically they&#8217;re telling me that the MOD works on their localhost installation, but once they upload it to their linux server &#8220;none pages works&#8221; which, I guess, is a bit of a problem.</p>
<p>But without my ESP module addon for the PM system, I am unable to determine exactly what the problem is, so I was forced to ask them for a specific issue. So folks, if you have a problem with my code, tell me what it is. If it&#8217;s supposed to do X and it does Y instead, tell me that. If it&#8217;s supposed to do X and it&#8217;s not doing anything, tell me that. </p>
<p>But don&#8217;t tell me &#8220;it doesn&#8217;t work&#8221; and expect much help.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/03/04/my-favorite-error-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>User Selectable Board Width</title>
		<link>http://www.phpbbdoctor.com/blog/2008/11/11/user-selectable-board-width/</link>
		<comments>http://www.phpbbdoctor.com/blog/2008/11/11/user-selectable-board-width/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 04:57:04 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[MOD Writing]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=285</guid>
		<description><![CDATA[I got an interesting request for a feature on a new board I&#8217;m working on tonight. This is a phpBB2 board (of course   ) and I am using a variation on subSilver. That template is &#8220;fluid&#8221; meaning the width will expand to fit the size of the window. Years ago having a site [...]]]></description>
			<content:encoded><![CDATA[<p>I got an interesting request for a feature on a new board I&#8217;m working on tonight. This is a phpBB2 board (of course <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ) and I am using a variation on subSilver. That template is &#8220;fluid&#8221; meaning the width will expand to fit the size of the window. Years ago having a site expand to fit the window was okay. Now that some people are using 1600&#215;1200 pixel resolutions it can be hard to read. The human eye / brain simply can&#8217;t scan a line of text that far without losing track of where you are. I solve this myself by running my browsers at less than full screen, but that&#8217;s my choice. If someone else chooses to run their browser window stretched over two high-resolution monitors that should be there choice as well.</p>
<p>Where is this going? Based on the remarks made earlier tonight I worked out a really quick and easy MOD for phpBB2 (and the idea would work just as well for phpBB3). It&#8217;s a few bits of code that allow users to set a fixed width for the board (by pixel count) or opt for a full screen display. It took about an hour from initial concept to execution. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><span id="more-285"></span></p>
<h3>Building the User Selectable Board Width MOD</h3>
<p>Step one is creating a place to store this information.</p>
<pre>alter table phpbb_users
add user_board_width smallint(5) unsigned not null default 0;</pre>
<p>I opted to use smallint(5) because the highest value I anticipate is 1600. The MySQL smallint data type goes up to 65535 for an unsigned value, so that should be plenty. The default value is defined as 0, which my code will interpret as full screen. Any other value will be used as the screen width.</p>
<p>Step 2 is to let the user enter a value for that new column. Here&#8217;s what that looks like:</p>
<p><img src="/blog/images/board_width.jpg" width="473" height="201" alt="Screen Shot" title="User Selecting Board Width" border="2" /></p>
<p>The code to build the selector and handle the user input value is all in includes/usercp_register.php which is the dual-purpose register and edit profile program.</p>
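<p>The registration code itself is not reproduced in this post. As a rough sketch of the input handling only, assuming the form offers a fixed list of widths (the list and the function name below are hypothetical, not taken from the MOD), the submitted value could be reduced to a safe integer before it is saved:</p>
<pre>&lt;?php
// Hypothetical list of widths offered in the profile form; 0 means full screen.
$allowed_board_widths = array(0, 800, 1024, 1280, 1600);

// Hypothetical helper: convert the raw form value to one of the allowed
// widths, falling back to 0 (full screen) for anything unexpected.
function sanitize_board_width($raw_value, $allowed_widths)
{
    $width = intval($raw_value);
    return in_array($width, $allowed_widths, true) ? $width : 0;
}

echo sanitize_board_width('1024', $allowed_board_widths), "\n";  // an offered width is kept
echo sanitize_board_width('99999', $allowed_board_widths), "\n"; // anything else becomes 0</pre>
<p>Checking against a list also keeps the stored value well inside the range the smallint column can hold.</p>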
<p>What I did next I will call step 2.5, since it is not strictly necessary. I added an admin control that lets the board owner set the default width in case the user doesn&#8217;t pick anything. That value is stored in the phpbb_config table and is therefore available in the $board_config array. </p>
<p>Step 3 is to collect the user preference and pass it to the template. That process takes place in includes/page_header.php like this:</p>
<pre>'TABLE_WIDTH' =&gt; (intval($userdata['user_board_width']) == 0 ? $board_config['default_board_width'] : intval($userdata['user_board_width'])),</pre>
<p>As you can see, if the $userdata array contains a zero value for the board width, the default value entered by the board admin will be used. Otherwise the user profile option is converted to an integer and sent to the template. That&#8217;s all the coding that was required. The template changes are even easier.</p>
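<p>One detail the line above leaves open is what reaches the template when both the user value and the admin default mean full screen. A minimal sketch, assuming both settings are stored as integers with 0 meaning full screen (resolve_table_width is a hypothetical helper, not part of phpBB2 or of this MOD):</p>
<pre>&lt;?php
// Hypothetical helper: pick the width attribute for the outer table.
// Assumes 0 means "full screen" for both the user and the admin setting.
function resolve_table_width($user_width, $default_width)
{
    $width = intval($user_width);
    if ($width == 0) {
        // No user preference, so fall back to the admin default.
        $width = intval($default_width);
    }
    // A positive number is a pixel width; anything else means full width.
    return $width &gt; 0 ? (string) $width : '100%';
}

echo resolve_table_width(0, 1024), "\n";   // no user preference: admin default is used
echo resolve_table_width(800, 1024), "\n"; // the user preference wins
echo resolve_table_width(0, 0), "\n";      // nothing set anywhere: full width</pre>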
<h3>subSilver Template Changes</h3>
<p>For subSilver the final step is very simple. Open the overall_header and find the top table that contains the rest of the page structure. It starts out looking like this:</p>
<p><code>&lt;table width="<strong>100%</strong>" cellspacing="0" cellpadding="0" border="0" align="center"&gt;</code></p>
<p>Change it to this, and you&#8217;re done:</p>
<p><code>&lt;table width="<strong>{TABLE_WIDTH}</strong>" cellspacing="0" cellpadding="0" border="0" align="center"&gt;</code></p>
<h3>Fluid versus Fixed Design Debate</h3>
<p>The debate over fixed versus fluid design can be as contentious as the debate comparing Mac to PC or IE to FF. Well, perhaps not that bad, but I&#8217;ve seen it get close. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  As a user, I prefer fluid sites so I can resize the window to fit my preferences. As a designer I think fixed width designs are easier to create. Making something scale well from 800 pixels all the way up to 1600 pixels (or beyond) is quite a challenge. When I was looking for templates for one of my blogs, I found that the ratio of fixed to fluid templates was probably about 5:1 (five fixed for every one fluid). And many of the fluid templates were &#8230; less than useful.</p>
<p>The subSilver template is a great example of a style that works both ways. For any style that fits the same pattern, having this simple MOD installed puts the choice back into the hands (and mouse) of your user. I think that&#8217;s a good thing.</p>
<p>I will be writing up the full MOD install and posting it in my MOD Catalog so that it&#8217;s available on this site.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2008/11/11/user-selectable-board-width/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Processing Words Is Easy, Processing Content Is Hard</title>
		<link>http://www.phpbbdoctor.com/blog/2008/09/17/processing-words-is-easy-processing-content-is-hard/</link>
		<comments>http://www.phpbbdoctor.com/blog/2008/09/17/processing-words-is-easy-processing-content-is-hard/#comments</comments>
		<pubDate>Wed, 17 Sep 2008 19:48:02 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[MOD Writing]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=256</guid>
		<description><![CDATA[Have you ever received an email with an advertisement for something unsavory followed by a paragraph of seemingly nonsense text? The reason for the extra text was the spammer was trying to get past one of the more common email spam filters known as Bayesian Spam Filtering. The process of adding text is called &#8220;poisoning&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>Have you ever received an email with an advertisement for something unsavory followed by a paragraph of seemingly nonsense text? The extra text is there because the spammer was trying to get past one of the more common email spam filtering techniques, known as Bayesian spam filtering. The process of adding text is called &#8220;poisoning&#8221; the filter, and it&#8217;s yet another tactic in the ongoing war between legitimate content providers and spammers. I was asked at Londonvasion 2008 whether I felt that there would ever be an effective way of dealing with human spammers. My comment at the time was that the best defense against spammer posts (human or otherwise) is an active and effective moderator team. Could this sort of algorithm be adopted as an anti-spam technique for board posts? Yes, I believe it could. To the best of my knowledge nobody has yet tried to do that for phpBB2 (my google-fu may have failed me, but I did look). I would be very interested to hear of such a project if it exists. </p>
<p>The problem with this and other anti-spam techniques is that it&#8217;s based on words rather than content. This may seem like splitting hairs&#8230; after all, isn&#8217;t my content made up of words? Yes, yes it is. And that&#8217;s the problem. Confused yet? I hope so, because it gets worse from here. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_lol.gif' alt=':lol:' class='wp-smiley' /> </p>
<p><span id="more-256"></span><br />
<h3>Words are Words, Content is Combinations of Words</h3>
<p>Simply put, to catch and prevent spam posts in any sort of programmatic fashion the code has to be smart enough to understand content and not just words. Examine for a moment the following two phrases:</p>
<blockquote><p>Fruit flies like a banana.<br />
Time flies like an arrow.</p></blockquote>
<p>I didn&#8217;t make these up; the combination of these two phrases appears quite often in discussions about language or pattern recognition. The two phrases are nearly identical. Each has five words. In fact, the middle three words in each phrase are essentially identical. Yet the phrases mean something completely different. In one phrase the word &#8220;flies&#8221; is a noun (an object) and in the other it&#8217;s a verb. The word &#8220;like&#8221; is used in two different ways. If I were to examine these two sentences word by word I would probably conclude that there is a high degree of correlation between the two. In fact, there is very little.</p>
<p>Here&#8217;s another example that I saw recently. I could not remember it exactly, but it was something like this:</p>
<blockquote><p>Bank of New Zealand floods customer inboxes.<br />
New Zealand river floods, overflows bank.</p></blockquote>
<p>Again, in an individual word comparison these two sentences look very similar. They each contain the words (or forms of the words) &#8220;new&#8221;, &#8220;zealand&#8221;, &#8220;flood&#8221;, and &#8220;bank&#8221;; only the second mentions a river. When I first saw the example (which I cannot find at the moment) there were some other similar words as well. In order to properly differentiate these two sentences I have to go beyond word analysis and do a context or content analysis.</p>
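<p>To make the word-by-word comparison concrete, here is a small sketch that measures how many distinct words two sentences share (a Jaccard-style overlap). The function names are made up for illustration, and this is far simpler than anything a real filter would use:</p>
<pre>&lt;?php
// Hypothetical helper: the distinct lowercase words in a piece of text.
function word_set($text)
{
    // Lowercase, then split on anything that is not a letter.
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Hypothetical helper: shared distinct words divided by all distinct words.
function word_overlap($first, $second)
{
    $set_a = word_set($first);
    $set_b = word_set($second);
    $shared = count(array_intersect($set_a, $set_b));
    $total = count(array_unique(array_merge($set_a, $set_b)));
    return $total == 0 ? 0.0 : $shared / $total;
}

// The sentences share bank, new, zealand and floods: 4 of 9 distinct words.
echo round(word_overlap(
    'Bank of New Zealand floods customer inboxes.',
    'New Zealand river floods, overflows bank.'
), 2), "\n"; // prints 0.44</pre>
<p>Nearly half the vocabulary matches, yet nothing in that number says one sentence is about email and the other about water.</p>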
<p>And what about this headline:</p>
<blockquote><p>Hacker penetrates Paris Hilton</p></blockquote>
<p>Is that an article about a security flaw in a hotel network? Or a pornography video? <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_lol.gif' alt=':lol:' class='wp-smiley' /> </p>
<h3>Unstructured Data</h3>
<p>Unstructured data analysis is becoming more and more interesting to corporations for a wide variety of reasons. None of them are related to fighting spam. Other than hiring an army of readers, how is a company to know what is being said about it on the web? There are sites like epinions.com and resellerratings.com that allow people to log on and post reviews about various products. There are newsgroups hosted by Yahoo! and Google where people can log on and post compliments or complaints. There are blogs, discussion boards, and &#8220;sucks&#8221; sites. There are legitimate news articles or press releases. There are social networking sites. In short, there is a flood (heh) of information on the web, and very little of it is structured. If programmers at billion dollar companies are struggling with how to manage that information, what are we as phpBB MOD authors supposed to do?</p>
<p>I have often talked about my &#8220;big board&#8221; on this site. The board is an independent discussion board related to the products from a company named Business Objects (which recently was acquired by SAP). One of the companies that Business Objects bought in 2007 was Inxight, which grew out of yet another Xerox PARC research project. Its product is designed to process unstructured data and perform content recognition. They have a fairly high-level demo online; I have included a link at the end of this post. The demo is light on specifics but it does show how the product can scan unstructured data like a press release and extract the important concepts and data points.</p>
<h3>Anti-spam Application</h3>
<p>And now I am finally getting back to the idea presented in the first paragraph: can we use word analysis to combat spammers on our boards? I think that the answer is &#8220;not yet&#8221; because we don&#8217;t have algorithms that are sophisticated enough to manage context. There are a number of anti-spam MODs in various stages that look at words, but to my knowledge there aren&#8217;t any that analyze the context of the words. A collection of words taken separately might indicate spam, but when reviewed in context they might be a perfectly valid post.</p>
<p>In other words, it&#8217;s not enough to identify words; I also need to identify how those words are used.</p>
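<p>As a sketch of what word-only scoring looks like, the following combines per-word spam probabilities the way a simple Bayesian filter does. The probabilities are invented for illustration (a real filter would learn them from posts already marked as spam or not spam), and spam_score is a hypothetical function name:</p>
<pre>&lt;?php
// Invented per-word spam probabilities, for illustration only.
$word_spam_probability = array(
    'cheap'  =&gt; 0.90,
    'pills'  =&gt; 0.95,
    'doctor' =&gt; 0.40,
    'thanks' =&gt; 0.10,
);

// Combine the probabilities of the known words in a post:
// score = P / (P + Q), where P is the product of each p and Q is the
// product of each (1 - p). Assumes every p is strictly between 0 and 1.
function spam_score($text, $probabilities)
{
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $spam = 1.0;
    $ham = 1.0;
    foreach (array_unique($words) as $word) {
        if (!isset($probabilities[$word])) {
            continue; // unknown words carry no evidence either way
        }
        $spam *= $probabilities[$word];
        $ham *= 1.0 - $probabilities[$word];
    }
    return $spam / ($spam + $ham);
}

$advert = 'Cheap pills, cheap pills!';
$question = 'My doctor changed my pills and they are not cheap. Thanks for any advice.';
echo spam_score($advert, $word_spam_probability), "\n";
echo spam_score($question, $word_spam_probability), "\n";</pre>
<p>With these made-up numbers both posts score above 0.9, even though the second is an ordinary question, because the score sees which words appear and not how they are used.</p>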
<h3>Related Posts</h3>
<p>Another application for content analysis is a &#8220;related posts&#8221; MOD. There are a number of these for phpBB2. One I read used the phpBB2 search tables to identify common words by frequency across topics. Another used a special database index on the topic title only. To be honest, if posts are treated as related only because of common word usage, in my opinion the approach is flawed. If the posts are related because of <strong>relevance</strong>&#8230; that&#8217;s something I would be interested in. I did some experiments with a related posts MOD of my own and ultimately never completed the project due to my lack of satisfaction with the algorithms I could find or come up with on my own. I would like to revisit this idea in the future.</p>
<h3>Conclusion</h3>
<p>The bottom line is that blogs and boards have one very important thing in common: the data is nearly completely unstructured. I say &#8220;nearly&#8221; because with blogs we have categories, and a board has a specific category -> forum -> topic hierarchy in place. But outside of that, the content provided by a board post may have nothing to do with anything else on the board. Does that make it spam? It&#8217;s hard to say. That&#8217;s why we still need good moderators for our boards.</p>
<p>Time flies like an arrow. Fruit flies like a banana. My board can&#8217;t tell the difference, can yours?</p>
<p><strong>Related Links</strong></p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Bayesian_spam_filtering">Wiki on Bayesian Filters</a></li>
<li><a href="http://www.businessobjects.com/demos/bi_platform/index.htm">Turning Unstructured Text into Insight</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2008/09/17/processing-words-is-easy-processing-content-is-hard/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
