<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Welcome to the phpBB Doctor Blog &#187; Anti-spam</title>
	<atom:link href="http://www.phpbbdoctor.com/blog/category/phpbb/anti-spam/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.phpbbdoctor.com/blog</link>
	<description>Your premium source for custom modification services for phpBB</description>
	<lastBuildDate>Fri, 30 Apr 2010 02:58:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Registration Protection Isn&#8217;t Enough Anymore</title>
		<link>http://www.phpbbdoctor.com/blog/2010/04/29/registration-protection-isnt-enough-anymore/</link>
		<comments>http://www.phpbbdoctor.com/blog/2010/04/29/registration-protection-isnt-enough-anymore/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 02:58:53 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[Board Management]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=304</guid>
		<description><![CDATA[The focus for the past several years for board owners has been to prevent (or at least have some easy way to ignore) spammer registrations. When spammers thought it was useful to have an entry on a board memberlist they were often satisfied with getting through the registration process. They didn&#8217;t bother to activate their [...]]]></description>
			<content:encoded><![CDATA[<p>The focus for the past several years for board owners has been to prevent (or at least have some easy way to ignore) spammer registrations. When spammers thought it was useful to have an entry on a board memberlist they were often satisfied with getting through the registration process. They didn&#8217;t bother to activate their account. As a result, one of the most popular (and fortunately very easy) MODs for discussion boards was to prevent inactive members from showing up on the member list. This is the standard configuration for phpBB3, no MOD required.</p>
<p>Spammers reacted by altering their process so they can activate accounts. (I as well as other board owners have seen a dramatic increase in use of gmail accounts for this, so clearly Google&#8217;s registration process has been cracked and automated as well.) Like many board owners, I would like to have a &#8220;clean&#8221; database. But it wasn&#8217;t a huge imposition to get spammer registrations. If they never posted, they were not a contributing member of my board but at least they weren&#8217;t getting in the way. I had a MOD that prevented board members from entering a web site until they had a minimum number of posts on my board, so at least I didn&#8217;t get a member database sprinkled with unsavory web links. There are also MODs available that prevent zero-post users from showing up, and for pruning inactive or zero-post users after some specific period of time. All of these were okay in their day, but are not as effective anymore.</p>
<p>I&#8217;ve posted many times about my Checkbox Challenge code. It has served very well in protecting my blogs, several phpBB boards, and even my comment forms from spammers. However, I am starting to see some issues, and that bothers me. Why? Because the new spam seems to be coming from humans rather than bots. I don&#8217;t know how we can combat that. Spammers seem to be quite creative with their posting strategies as well. <span id="more-304"></span></p>
<h3>Spammer Posting Strategies</h3>
<p>I&#8217;ve seen many different types of spam posts. There are streams of sentences that look like they were copied from a recent news article with random links thrown in for spam. There are more creative folks that put reasonable looking text and then make the punctuation marks links. There are folks that post highly useful text like, &#8220;This post was great, it answered all of my questions.&#8221; and then create a fake signature with spammer links. There are folks that post spam and format it to match the background color of the board style. There are folks that enter a normal spam-free post and come back days (or weeks) later to edit the post to include spam links.</p>
<p>&lt;sigh&gt;</p>
<p>What is a board owner to do?</p>
<p>There is no Internet governing board where we can report this type of activity. It&#8217;s rather pointless (at least most of the time) to track down users by IP address as the spammers are either using proxies or zombie computers, or they&#8217;re in some foreign country that couldn&#8217;t care less if your small board was defaced by someone using a computer under its jurisdiction.</p>
<h3>Spammer Hammer</h3>
<p>One of the nicer features of phpBB3 is the ability for a moderator to clean up all of the posts from a specific user in one step. The posts can be deleted or moved to a specified forum. (I prefer the move option, as I can preserve the evidence in the cases where I do decide to try to take some sort of action.) There is a separate step to ban the user that often occurs just before (or just after) the posts are removed. I generally do the ban first to keep the user from further posting, and then do the clean-up work.</p>
<p>I wrote a MOD for my own boards that I called the phpBB Doctor Spammer Hammer. It is unfortunately getting used more because of the human element. The &#8220;hammer&#8221; takes the following steps:</p>
<ul>
<li>Deletes any session records that belong to the user, effectively logging them off.</li>
<li>Marks their account inactive, preventing them from logging back on.</li>
<li>Updates their registration &#8220;activation key&#8221; so that they can&#8217;t request a resend of the activation email. That way they can&#8217;t reactivate their account.</li>
<li>Any topic started by the spammer is moved to a hidden forum. This includes any posts from legitimate users, as most of them are probably just &#8220;ooh, this is spam&#8221; types of responses. Nothing of value there.</li>
<li>Any posts in topics that were not started by the spammer are also moved to a hidden forum. This catches any post replies in existing topics.</li>
</ul>
<p>The Spammer Hammer has several safeguards built in. First, you cannot hammer someone with more than a certain number of posts. If you&#8217;re a spammer, we&#8217;ll figure it out before you reach 200 posts, so anyone above that threshold (just as an example) is immune. Board moderators and administrator accounts cannot be hammered. And a log is made of each hammered account so I know who took the action and when. I started to write an &#8220;undo&#8221; function, but the complexity of the code increased dramatically and in my opinion there should never be a need to undo the action.</p>
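<p>To make the sequence concrete, here is a minimal Python sketch of the steps and safeguards described above. It is not the actual MOD (which runs inside phpBB): the <code>User</code> and <code>Board</code> classes, the <code>HIDDEN_FORUM_ID</code> value, and the field names are all hypothetical stand-ins for the real phpBB tables, and moving a single post is simplified to changing a forum id on the post record.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import secrets

MAX_HAMMER_POSTS = 200   # example threshold mentioned in the post
HIDDEN_FORUM_ID = 99     # hypothetical id of the hidden forum


@dataclass
class User:
    user_id: int
    post_count: int
    is_staff: bool = False   # moderator or administrator
    active: bool = True
    activation_key: str = ""


@dataclass
class Board:
    users: dict      # user_id -> User
    sessions: list   # dicts with a "user_id" key
    topics: list     # dicts with "topic_id", "starter_id", "forum_id"
    posts: list      # dicts with "post_id", "topic_id", "poster_id", "forum_id"
    hammer_log: list = field(default_factory=list)


def hammer(board, target_id, moderator_id):
    """Apply the clean-up steps to one account; return True if it was hammered."""
    target = board.users[target_id]

    # Safeguards: staff accounts and established posters are immune.
    if target.is_staff or target.post_count > MAX_HAMMER_POSTS:
        return False

    # 1. Delete the user's sessions, which logs them off.
    board.sessions[:] = [s for s in board.sessions if s["user_id"] != target_id]

    # 2. Mark the account inactive so they cannot log back on.
    target.active = False

    # 3. Replace the activation key so a resent activation email is useless.
    target.activation_key = secrets.token_hex(16)

    # 4. Move every topic the spammer started to the hidden forum.
    spam_topics = set()
    for topic in board.topics:
        if topic["starter_id"] == target_id:
            topic["forum_id"] = HIDDEN_FORUM_ID
            spam_topics.add(topic["topic_id"])

    # 5. Move all posts in those topics, plus the spammer's replies elsewhere.
    for post in board.posts:
        if post["topic_id"] in spam_topics or post["poster_id"] == target_id:
            post["forum_id"] = HIDDEN_FORUM_ID

    # Log who took the action and when.
    board.hammer_log.append({
        "target_id": target_id,
        "moderator_id": moderator_id,
        "when": datetime.now(timezone.utc),
    })
    return True
```

<p>Checking the safeguards before touching any data means a refused hammer leaves the board exactly as it was, which matters when there is no undo.</p>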
<h3>Conclusion</h3>
<p>Conclusion? That&#8217;s optimistic, I guess. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  The story is far from concluded. As the spammers continue to get more creative the escalation will continue. As a board owner I do my best to keep spammers from getting in. If (when) they get in, I have systems in place to clean them up quickly and easily and (most importantly) completely. That&#8217;s about the best I can do at this point.</p>
<p>I started to include some statistics on spammer posts and registrations as a percentage of valuable traffic. But the truth is that with the Checkbox Challenge in place my boards continue to be relatively protected. I get a few spammers at most a month, and I am getting 25-35 new user registrations every day on my most active board. So I decided to skip it for this post. I also thought about calculating the average response time for my moderator team. I would take the date and time for the initial spam post and compare it to the application date/time for the Spammer Hammer and see how long they take. We rarely have spammers that last more than a few hours, and in many cases it&#8217;s minutes. I have a great moderator team. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_cool.gif' alt='8-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2010/04/29/registration-protection-isnt-enough-anymore/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CAPTCHA Alternatives Part I: Question / Answer</title>
		<link>http://www.phpbbdoctor.com/blog/2009/10/12/captcha-alternatives-part-i-question-answer/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/10/12/captcha-alternatives-part-i-question-answer/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 21:06:39 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=317</guid>
		<description><![CDATA[I don&#8217;t like most current CAPTCHA techniques. There is nothing that frustrates me more than trying to use a web site and being presented with this:

Yes, that is an actual CAPTCHA image that I was presented with. If anyone can figure out what that one is supposed to be saying, you have better eyes than [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t like most current CAPTCHA techniques. There is nothing that frustrates me more than trying to use a web site and being presented with this:</p>
<p><img src="/blog/tips/captcha_q_a/why_i_hate_captchas.png" border="0" width="200" height="70" alt="captcha image" title="Weird word in my captcha" /></p>
<p>Yes, that is an actual CAPTCHA image that I was presented with. If anyone can figure out what that one is supposed to be saying, you have better eyes than I do. <span id="more-317"></span></p>
<p>These challenges are designed &#8211; in theory &#8211; to make it harder for automated processes or &#8220;bots&#8221; to use a service by requiring something like human perception or intelligence to solve a test. The full name is <strong>C</strong>ompletely <strong>A</strong>utomated <strong>P</strong>ublic <strong>T</strong>uring test to tell <strong>C</strong>omputers and <strong>H</strong>umans <strong>A</strong>part. What is a Turing Test? Wikipedia <a href="http://en.wikipedia.org/wiki/Turing_test">says</a>:</p>
<blockquote><p>The Turing test is a proposal for a test of a machine&#8217;s ability to demonstrate intelligence. It proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine&#8217;s intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen.</p></blockquote>
<p>The general concept is that the test or challenge is designed to weed out computer bots from real humans. The problem is bots are often better at solving problems than humans are, and even if they aren&#8217;t, they have a lot more patience. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>As a board owner, there is a fine line to walk here. I want my users to be able to register. I don&#8217;t want bots to be able to register. Anything that makes it harder for bots is also likely to make it harder for users. When the scales tip to where the inconvenience to my potential new users outweighs the bot protection then I have a problem. In my opinion, some CAPTCHA techniques tip the scale in that direction, especially some of the more complex image challenges. I&#8217;m going to save talking about image CAPTCHAs for another post and focus on alternate methods. I am going to pick three tests and try to propose how easy they are for humans to solve, and how susceptible I think they are to bots. Those methods are question/answer, picture or &#8220;kitten auth&#8221; method, and my own checkbox challenge.</p>
<h3>Question / Answer</h3>
<p>This technique was introduced during the phpBB2 days and is much easier to manage with phpBB3 since a board owner can set up custom registration fields. The basic premise is this: the board owner sets up a question on the registration page that requires an answer. The answer could be provided in the form of a drop-down list or other input control, or alternatively it could be an open text field that requires the user to enter the answer manually. The question can be related to the primary subject matter for the board or it could be a general knowledge question like what is 2 + 2 or what color is the sky. In any case, the question is supposed to be easily answered by a human and impossible to answer for a bot. Let&#8217;s look at some examples.</p>
<h3>Finite Result Set</h3>
<p>If the question is presented with a set of options, either via a drop down, radio grouping, or some other interface element, it reduces the risk that a human will fail the test. It also improves the success rate for bots. Let me present a simple example. The form below presents a question and a set of options. </p>
<p><iframe src="/blog/tips/captcha_q_a/finite_list.html" width="350" height="60" frameborder="0"><a href="/blog/tips/captcha_q_a/finite_list.html">Click for sample</a></iframe></p>
<p>As a typical human I should not have any trouble answering the question. I specifically left out &#8220;black&#8221; as a color choice, because some people might consider the sky at night and make that choice. I left out white (confusion with clouds) and some other colors for the same reason. My goal is to get the user to select &#8220;Blue&#8221; as the proper answer to this question. I would guess that 99.9% of humans would be able to pass this test.</p>
<p>Another advantage to this particular question is there&#8217;s no regional or subject-matter bias. No matter where you are on the planet, as long as you can read English you should be able to identify with this question and select the proper answer. </p>
<p>This challenge also fares well for the visually impaired. A screen reader will be able to present this challenge and the user should be able to solve it with the information available. While the number of visually impaired people as a percentage of total users of the Internet is certainly quite small, it&#8217;s nice to consider their needs.</p>
<p>As a board owner I could provide a question that is more specific to my audience. Suppose that my board audience is made up of electrical engineers. I might present them with a series of color codes and ask them to identify the resistor rating. If my board audience is made up of fans of a particular music artist I might ask them to identify the first hit song for that artist. Knitters could get a question about yarn. Car enthusiasts could get a question about engine technologies. The popularity of this solution is partially based on the fact that the question can be as hard or easy as you want. As a result, the number of options is essentially infinite.</p>
<h3>Bots versus Question / Answer Challenge</h3>
<p>So far the question / answer challenge seems to do okay at allowing humans to register. How will it do as a bot preventative?</p>
<p>In my opinion, as it has been presented so far, it has a number of issues. The first issue is that there is a finite list of choices to make. The list has six entries: Red, Orange, Yellow, Green, Blue, and Purple. Most unsophisticated bots will pick the first option so it&#8217;s important that &#8220;Blue&#8221; (the correct answer) is not the first on the list. As I have to allow for a user to read the form incorrectly at least once, I should not block or ban the registration after the first failure. In fact I might allow two or three registration attempts before taking any action. With six answers, three allowed attempts, and a bot that is smart enough to make a different selection each time it goes through the form, there is a 50% chance that the bot can &#8220;learn&#8221; or &#8220;guess&#8221; the right answer within the first series of registration attempts. Since there are only six answers (and the answer set does not change each time) there is a 100% chance that the bot will be able to register if six attempts are allowed.</p>
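<p>The arithmetic behind those percentages is simple enough to check in a few lines of Python. This is only a sketch under the assumptions stated above: one correct answer, a fixed answer list, and a bot that never repeats a wrong guess. The function name is made up for this example.</p>

```python
from fractions import Fraction


def guess_success_chance(options, attempts):
    """Chance that a bot trying a different option on each attempt
    hits the single correct answer within the allowed attempts."""
    tried = min(attempts, options)
    return Fraction(tried, options)


for attempts in (1, 2, 3, 6):
    chance = guess_success_chance(6, attempts)
    print(attempts, "attempt(s):", f"{float(chance):.0%}")
# 1 attempt(s): 17%
# 2 attempt(s): 33%
# 3 attempt(s): 50%
# 6 attempt(s): 100%
```

<p>Adding more options to the list lowers each number, but as long as the list is fixed the bot only needs as many attempts as there are options.</p>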
<p>What about a different interface choice?</p>
<p><iframe src="/blog/tips/captcha_q_a/radio.html" width="575" height="60" frameborder="0"><a href="/blog/tips/captcha_q_a/radio.html">Click for sample</a></iframe></p>
<p>This doesn&#8217;t solve the problem; it&#8217;s just a different interface choice (radio instead of drop-down control).</p>
<p>How about an open text box?</p>
<p><iframe src="/blog/tips/captcha_q_a/open_entry.html" width="350" height="60" frameborder="0"><a href="/blog/tips/captcha_q_a/open_entry.html">Click for sample</a></iframe></p>
<p>This interface choice presents an interesting dilemma. On the one hand, a bot can&#8217;t use brute force to solve this challenge. There are no options given so the answer must be determined by some other means. Problem solved, yes?</p>
<p>No. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Believe it or not, I have read some articles that suggest bots are sophisticated enough to plug unknown questions into a search engine and get the answer that way. When I plug the question &#8220;What color is the sky&#8221; into Google, the top three results all mention the word &#8220;color&#8221; and the word &#8220;blue&#8221; in close proximity. A reasonably sophisticated bot could figure this out. If this particular technique (question / answer with open input field) were to become widely used on the Internet, I have no doubt that bots would very soon be able to handle this challenge as well as (or perhaps better than) humans.</p>
<p>Earlier I suggested that humans should be able to solve the drop-down challenge nearly 100% of the time, certainly with two attempts. With the open text field that percentage would almost certainly drop. Let me examine that a bit further.</p>
<h3>Validating Input</h3>
<p>Here are some answers that I would anticipate coming into my form if I used the open text box version of the question / answer challenge.</p>
<p><strong>Q: What color is the sky?</strong><br />
blue<br />
Blue<br />
BLUE<br />
Bleu<br />
blu<br />
BLU</p>
<p>&#8230; and so on with the variations. Hm. Do I see a problem here? When left on their own, users are going to provide a wide variety of answers that probably should be allowed but won&#8217;t be under a strict comparison to the expected answer <strong>Blue</strong>. I would expect variations in case, in spelling, and perhaps even answers with extra spaces or entire sentences like &#8220;The sky is blue.&#8221; Once the input becomes open for anything, then anything is what I expect to get. How can I certify these answers (all of which are reasonably correct) and allow the user to register? Ironically if a bot is able to get the answer correct, they will most certainly provide the expected spelling of &#8220;blue&#8221; rather than one of the variations shown above. </p>
<p>Fortunately there is a simple function that I can use to help solve most of these challenges. That function is called <code>soundex()</code> and I will detail it next.</p>
<h3>Introducing The soundex() Function</h3>
<p>Whether I see the word &#8220;blue&#8221; or &#8220;blu&#8221; or even &#8220;bleu&#8221; the sound of the word is the same. That&#8217;s what the soundex() function does; it returns a code that is supposed to designate the sound aspects of the word rather than the literal word. First I will check the soundex() result for the required answer:</p>
<p><code>mysql> select soundex('blue');<br />
+-----------------+<br />
| soundex('blue') |<br />
+-----------------+<br />
| B400            |<br />
+-----------------+</code></p>
<p>Next I will check the results for some of the variations shown.</p>
<p><code>mysql> select soundex('bleu');<br />
+-----------------+<br />
| soundex('bleu') |<br />
+-----------------+<br />
| B400            |<br />
+-----------------+</code></p>
<p><code>mysql> select soundex('blueu');<br />
+------------------+<br />
| soundex('blueu') |<br />
+------------------+<br />
| B400             |<br />
+------------------+</code></p>
<p><code>mysql> select soundex('blu e');<br />
+------------------+<br />
| soundex('blu e') |<br />
+------------------+<br />
| B400             |<br />
+------------------+</code></p>
<p><code>mysql> select soundex('black');<br />
+------------------+<br />
| soundex('black') |<br />
+------------------+<br />
| B420             |<br />
+------------------+</code></p>
<p>Notice in every case except for the obviously wrong answer &#8220;black&#8221; I get the code B400. I won&#8217;t go into details of how this code is derived (there is a Wiki link at the end of the post if you want those details). I will make the observation that all of the spellings &#8211; both correct and &#8220;close enough&#8221; to correct &#8211; return the same code.</p>
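<p>For anyone who wants to experiment outside the database, here is a small Python sketch of the standard American Soundex rules. It is an illustration, not a port of the MySQL function: it always returns four characters, while MySQL can return longer codes for longer words. For the short answers tested above the two agree.</p>

```python
# Standard American Soundex letter codes.
# Vowels, H, W and Y carry no code of their own.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(text):
    """Return the four-character Soundex code for text ("" if it has no letters)."""
    letters = [ch for ch in text.upper() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


for answer in ("blue", "bleu", "blueu", "blu e", "black"):
    print(answer, soundex(answer))
# blue B400
# bleu B400
# blueu B400
# blu e B400
# black B420
```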
<p>Let me review how I got to this point. A question / answer challenge with a finite list of possible answers is susceptible to brute-force solving by bots or humans. The same challenge with an open text box for the answer is much harder for bots to solve, but it also reduces the success rate for humans due to input variations. I am proposing that the soundex() function could be used to reduce the number of humans rejected because of minor spelling variations, and I do believe it would work.</p>
<p>It also helps the bots. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>If I am expecting a single word answer and the user enters &#8220;The sky is blue&#8221; instead, I still have options. I can programmatically split up the phrase into component words and then apply the soundex() function to each one. As long as the expected word &#8220;blue&#8221; is in the phrase, I can decide to let the registration attempt succeed.</p>
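<p>A rough Python sketch of that word-by-word check follows. The <code>answer_matches</code> name is made up for this example, and the <code>soundex</code> helper is a plain four-character Soundex written out here so the snippet runs on its own; it is not the database function.</p>

```python
import re

# Standard American Soundex letter codes (vowels, H, W and Y have none).
CODES = {
    letter: str(digit)
    for digit, group in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1)
    for letter in group
}


def soundex(word):
    """Four-character Soundex code for a single word ("" if it has no letters)."""
    letters = [ch for ch in word.upper() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]


def answer_matches(reply, expected="blue"):
    """True if any word in the reply sounds like the expected answer."""
    target = soundex(expected)
    words = re.findall(r"[A-Za-z]+", reply)
    return any(soundex(word) == target for word in words)


print(answer_matches("The sky is blue."))   # True
print(answer_matches("  BLEU "))            # True
print(answer_matches("black"))              # False
print(answer_matches("ring my bell"))       # True, since "bell" is also B400
```

<p>The last line shows the price of being forgiving: any word that shares the code passes, and so would &#8220;The sky is not blue&#8221;.</p>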
<h3>Other Styles of Questions</h3>
<p>I have seen people propose that simple math problems are a good question. There is only one answer, right? Well, it depends.</p>
<p><strong>Q: What is 2 + 2?</strong><br />
4<br />
four<br />
for<br />
22</p>
<p>All of these are potential answers. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_lol.gif' alt=':lol:' class='wp-smiley' />  And this doesn&#8217;t help against bots; try plugging 2+2= into the Google search form and see what you get.</p>
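<p>If I did want to accept both &#8220;4&#8221; and &#8220;four&#8221; for a question like this, the validation might look something like the Python sketch below. The word list and function names are assumptions made for this example only.</p>

```python
# Spelled-out numbers the form is willing to accept (assumed list).
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_number(reply):
    """Turn "4" or "four" into 4; return None for anything else."""
    text = reply.strip().lower()
    if text.isascii() and text.isdigit():
        return int(text)
    return NUMBER_WORDS.get(text)


def check_sum(reply, a=2, b=2):
    """True if the reply is the sum of a and b, as digits or as a word."""
    return parse_number(reply) == a + b


for reply in ("4", "four", "for", "22"):
    print(repr(reply), check_sum(reply))
# '4' True
# 'four' True
# 'for' False
# '22' False
```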
<p>I have seen people suggest that the question be embedded within an image like this:</p>
<p><img src="/blog/tips/captcha_q_a/question.jpg" border="2" alt="question image" width="241" height="43" /></p>
<p>This doesn&#8217;t really help either. Bots have already demonstrated a high degree of success against CAPTCHA images so putting our question into an image rather than text doesn&#8217;t really buy much. It also reduces or eliminates the ability of a visually impaired user to solve the challenge.</p>
<h3>Summary</h3>
<p>There are several benefits and issues with the question / answer style of CAPTCHA.</p>
<p>Advantages:</p>
<ol>
<li>With a finite list it is very easy for the user to interact with</li>
<li>With a finite list it is very easy to validate</li>
<li>The question can be tailored to the board audience</li>
<li>The question and related answers can be maintained via a simple administrative page</li>
<li>The technique does not penalize visually impaired users</li>
</ol>
<p>Disadvantages:</p>
<ol>
<li>With a finite list this technique is susceptible to brute-force attacks</li>
<li>Sophisticated bots might use search engines to solve the answer</li>
<li>Use of a text field instead of a list control provides more protection from bots but requires more complex code and impacts the user experience</li>
</ol>
<p>The number of advantages does outweigh the disadvantages. There are quite a few fans of this technique. There are several MODs for phpBB2 that provide this feature, and phpBB3 essentially has it built-in with the custom registration fields option. I consider this to be a preferable option to the image CAPTCHA techniques that are much more prevalent today.</p>
<p>Next time I want to talk about the &#8220;kitten auth&#8221; technique. I hope to have that post ready soon but have been fairly busy in real life lately so please be patient if it takes a bit longer.</p>
<p><strong>Related Links</strong></p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Soundex">Wiki on the soundex() function</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/10/12/captcha-alternatives-part-i-question-answer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Is It Worth Adding Extra Activation Steps For gmail.com Accounts?</title>
		<link>http://www.phpbbdoctor.com/blog/2009/09/17/is-it-worth-adding-extra-activation-steps-for-gmail-com-accounts/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/09/17/is-it-worth-adding-extra-activation-steps-for-gmail-com-accounts/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 06:07:51 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=331</guid>
		<description><![CDATA[After just cleaning up yet another gmail spammer (I so love the Spammer Hammer™ MOD, is one of my favorites   ) tonight I found myself wondering: Is it worth setting up an extra activation step for gmail.com accounts? 
I have mentioned more than once my frustration with the amount of spam / spam [...]]]></description>
			<content:encoded><![CDATA[<p>After just cleaning up yet another gmail spammer (I so love the <a href="http://www.phpbbdoctor.com/blog/2007/02/19/introducing-the-phpbbdoctor-spammer-hammer/">Spammer Hammer™ MOD</a>, is one of my favorites <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_twisted.gif' alt=':twisted:' class='wp-smiley' />  ) tonight I found myself wondering: Is it worth setting up an extra activation step for gmail.com accounts? <span id="more-331"></span></p>
<p>I have mentioned more than once my frustration with the amount of spam / spam attempts coming from gmail.com email addresses. It&#8217;s clear that I can&#8217;t ban them (even if I wanted to) because of the number of legitimate users. But what if I made them go through an extra step? Would it help?</p>
<p>I did not spend much time thinking through this yet. That means there are likely to be holes or room for improvement. But suppose the registration process flow chart went something like this:</p>
<p><img src="/blog/tips/gmail_challenge/flowchart.jpg" width="502" height="496" border="0" alt="gmail flowchart" title="extra activation step for gmail.com registrations" /></p>
<p>The &#8220;new steps&#8221; could be anything. I&#8217;m not convinced that it&#8217;s worth the effort. Most of the spammers that get through lately are clearly humans rather than bots. Anything a normal human can pass, a spammer human can also pass. The Checkbox Challenge has remained relatively effective against bots.</p>
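<p>Whatever the extra steps turn out to be, the routing decision itself would be tiny. The Python sketch below only illustrates that branch; the step names and the <code>EXTRA_STEP_DOMAINS</code> set are placeholders invented for this example, not part of any existing MOD.</p>

```python
# Domains that get routed through the additional step (assumed list).
EXTRA_STEP_DOMAINS = {"gmail.com"}


def email_domain(address):
    """Return the lower-cased part of the address after the last @."""
    return address.rpartition("@")[2].strip().lower()


def activation_steps(address):
    """List the steps a new registration has to pass, in order."""
    steps = ["checkbox challenge", "activation email"]
    if email_domain(address) in EXTRA_STEP_DOMAINS:
        steps.append("extra activation step")  # placeholder for the new steps
    return steps


print(activation_steps("someone@example.com"))
# ['checkbox challenge', 'activation email']
print(activation_steps("Someone@GMail.com"))
# ['checkbox challenge', 'activation email', 'extra activation step']
```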
<p>Oh well. It&#8217;s probably a dumb idea.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/09/17/is-it-worth-adding-extra-activation-steps-for-gmail-com-accounts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Honey Pot Board Update</title>
		<link>http://www.phpbbdoctor.com/blog/2009/09/08/honey-pot-board-update/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/09/08/honey-pot-board-update/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 11:27:39 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=328</guid>
		<description><![CDATA[It has been a while since I visited my honeypot board. I decided to have a look today&#8230;  
Our users have posted a total of 385789 articles
We have 43968 registered users
And when I logged in, I had 33 unread PMs as well.
Bots have been busy.   I intend to go back and see [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a while since I visited my honeypot board. I decided to have a look today&#8230; <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_eek.gif' alt=':shock:' class='wp-smiley' /> </p>
<blockquote><p>Our users have posted a total of 385789 articles<br />
We have 43968 registered users</p></blockquote>
<p>And when I logged in, I had 33 unread PMs as well.</p>
<p>Bots have been busy. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I intend to go back and see what additional patterns I can get from the data. In light of one of my recent posts about <a href="http://www.phpbbdoctor.com/blog/2009/07/22/just-how-bad-is-the-gmailcom-problem/">gmail being the most abused email domain</a>, here are some stats that speak for themselves. These are the top ten email domains in use on my honey pot board:</p>
<pre>+-----------------+----------+
| email_domain    |  users   |
+-----------------+----------+
| gmail.com       |    11323 |
| mail.ru         |     6034 |
| meltmail.com    |     1179 |
| gawab.com       |      859 |
| getciallis.info |      855 |
| spambox.us      |      479 |
| serpdomains.com |      449 |
| atlantaclubs.cn |      282 |
| coolgwen.cn     |      274 |
| coolsanta.cn    |      255 |
+-----------------+----------+</pre>
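<p>A tally like this can also be produced from an exported list of addresses. The Python sketch below shows one way to do it, assuming the addresses are available as plain strings; the sample addresses are made up and use reserved example domains.</p>

```python
from collections import Counter


def top_email_domains(addresses, limit=10):
    """Count addresses per domain and return the most common ones."""
    domains = (
        address.rpartition("@")[2].strip().lower()
        for address in addresses
        if "@" in address
    )
    return Counter(domains).most_common(limit)


# Made-up sample data, not taken from the honey pot board.
sample = [
    "a@example.com", "b@Example.com", "c@example.org",
    "d@example.com", "e@example.net", "f@example.org",
    "not-an-address",
]

for domain, users in top_email_domains(sample, limit=3):
    print(domain, users)
# example.com 3
# example.org 2
# example.net 1
```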
<p><span id="more-328"></span> </p>
<p>It&#8217;s not just users either; here are the email address domains associated with posts since January 1 of 2009 on my honey pot board:</p>
<pre>+--------------+-------------+
| email_domain | total_posts |
+--------------+-------------+
| gmail.com    |      169065 |
| other        |      161505 |
+--------------+-------------+</pre>
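<p>Expressed as shares, the figures quoted in this post work out as follows. The small Python sketch uses only numbers shown above; note that the user share is measured against the board-wide total of registered users, while the post share covers posts since January 1 of 2009.</p>

```python
# Figures quoted in this post.
gmail_posts = 169065
other_posts = 161505
gmail_users = 11323
total_users = 43968

post_share = gmail_posts / (gmail_posts + other_posts)
user_share = gmail_users / total_users

print(f"gmail.com share of posts since 2009-01-01: {post_share:.1%}")
print(f"gmail.com share of registered users: {user_share:.1%}")
# gmail.com share of posts since 2009-01-01: 51.1%
# gmail.com share of registered users: 25.8%
```

<p>So roughly a quarter of the accounts are responsible for about half of the posts.</p>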
<p>There was one particular user that was responsible for over 40,000 posts. (Yes, they used a gmail account.) Here is their posting history.</p>
<pre>+------------+-------------+
| post_month | total_posts |
+------------+-------------+
| 2009-03    |        1042 |
| 2009-04    |        5770 |
| 2009-05    |        8544 |
| 2009-06    |        7201 |
| 2009-07    |        8009 |
| 2009-08    |        9747 |
| 2009-09    |        1778 |
+------------+-------------+</pre>
<p>That user has posted from 840 different IP addresses. All of them are reportedly assigned to:</p>
<pre>person:          Remiga Alexander
address:         JSC UKRTELECOM
address:         18, Shevchenko blvd
address:         Ukraine, Kiev
phone:           +380 (44) 230-9024</pre>
<p>That&#8217;s all I have for now. I want to do more time analysis as well as go back and see if <a href="http://www.phpbbdoctor.com/blog/2008/12/13/flood-interval-as-anti-spam-measure/">altering the flood control had any impact on posting frequency</a>. At first glance it doesn&#8217;t seem like it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/09/08/honey-pot-board-update/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Just How Bad Is The gmail.com Problem?</title>
		<link>http://www.phpbbdoctor.com/blog/2009/07/22/just-how-bad-is-the-gmailcom-problem/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/07/22/just-how-bad-is-the-gmailcom-problem/#comments</comments>
		<pubDate>Wed, 22 Jul 2009 13:11:12 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[Board Management]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=315</guid>
		<description><![CDATA[Not too long ago I participated in a topic at phpbb.com where the author was asking about blocking gmail email addresses. The general consensus from the community was that the board owner should not block gmail but instead rely on some other methods for blocking spammers. I don&#8217;t block gmail, but sometimes I would like [...]]]></description>
			<content:encoded><![CDATA[<p>Not too long ago I participated in a topic at phpbb.com where the <a href="http://www.phpbb.com/community/viewtopic.php?f=64&#038;t=1662765">author was asking about blocking gmail email addresses</a>. The general consensus from the community was that the board owner should not block gmail but instead rely on some other methods for blocking spammers. I don&#8217;t block gmail, but sometimes I would like to. In <a href="http://www.phpbb.com/community/viewtopic.php?p=10035705#p10035705">this post</a> I think I summarized it best, saying:</p>
<blockquote><p>hotmail, yahoo, gmail&#8230; any free email account is subject to abuse. Spammers are using the fact that board owners are, as you are, reluctant to ban gmail outright because it does have so many legitimate users.</p></blockquote>
<p>Having said that, I decided it was time to go back and work through some numbers. Instead of guessing how bad the problem is, I wanted to get actual statistics to back up my claims. Anyone can say anything they want. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Having numbers makes the claims more substantial. And graphs. Pictures are always good. The data used for this post is available as an Excel file for anyone to download and review (link at the end of the post). Here&#8217;s the summary:</p>
<p>Google: Your gmail system is borked. Fix it or risk it becoming irrelevant. <span id="more-315"></span></p>
<h3>Logging Registration Attempts</h3>
<p>I have <a href="http://www.phpbbdoctor.com/blog/index.php?s=checkbox+challenge">written more than a few posts</a> about my simple Checkbox Challenge MOD. I use it for board registrations as well as comment forms. For this post I am going to concentrate only on registration attempts at my largest phpBB board. I will use registration attempts from January of 2008 through June of 2009 (eighteen months). </p>
<p>For the first step, I ran some preliminary queries to identify the top five domains used. There are plenty of obvious spammer domains out there but that isn&#8217;t the point of this post. I know that <code>mail.ru</code> and <code>gawab.com</code> are the source of a lot of spam already. I also can recognize that domains like <code>nastyteengirl.info</code> and <code>onlineovernightpharmacy.com</code> are probably not legitimate. The point I want to drive home is how bad things are for mainstream domains, and for <code>gmail.com</code> specifically. In order to do that I want to focus only on the domains that are the source of higher volumes of registration attempts.</p>
<p>The top five domains and the total registration attempts are shown here.</p>
<pre>Domain          Total Attempts   % of Total
gmail.com                12909          61%
yahoo.com                 2968          14%
mail.ru                   2704          13%
hotmail.com               1606           8%
aol.com                    843           4%
</pre>
<p>Notice that gmail is not only number one; it is in that position by a really large margin. No other email domain comes even close. My first piece of evidence clearly shows that gmail is a popular domain. It is so popular that if I were to consider banning or blocking it, I might lose 61% of my new members. But wait, is that really true? How many of those registration attempts were successful, and how many were blocked as bots?</p>
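<p>For anyone who wants to run a similar tally against their own registration log, here is a minimal Python sketch of the domain count. It is an illustration only, not the query used for the table above: the function name <code>top_domains</code> and the sample addresses are hypothetical, and it assumes the log can be read as a plain list of email addresses.</p>

```python
from collections import Counter

def top_domains(emails, n=5):
    """Count registration attempts per email domain and return the n most common."""
    domains = Counter(
        email.rsplit("@", 1)[1].strip().lower()
        for email in emails
        if "@" in email  # skip entries that are not email addresses
    )
    return domains.most_common(n)

# Hypothetical sample data; the real log table is not shown in this post.
sample = [
    "a@gmail.com", "b@GMAIL.com", "c@yahoo.com",
    "d@mail.ru", "e@gmail.com", "not-an-address",
]

for domain, attempts in top_domains(sample, n=3):
    print(f"{domain:12} {attempts}")
```

<p>Domains are lower-cased before counting so that <code>GMAIL.com</code> and <code>gmail.com</code> land in the same bucket.</p>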
<h3>Checkbox Challenge Data Collection Process</h3>
<p>My Checkbox Challenge code presents a user with a standard registration form as well as a series of checkboxes. The user is instructed to click on only the marked checkbox in order to prove they are human. The development is well documented in other posts on my blog, so I won&#8217;t go into great detail here. Suffice it to say that bots seem to either ignore all of the checkboxes because they don&#8217;t expect them to be there, or they attempt to be smart and mark <strong>all</strong> of the checkboxes since they know they&#8217;re on the form. There are some humans that have issues with the system and might take multiple attempts to get through the screen but those situations are not very common, and for the sake of this post I will assume they don&#8217;t exist. Every attempt is logged, and that log table is the source material for this blog post.</p>
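<p>The pass/fail rule described above can be sketched in a few lines of Python. This is not the MOD itself (that is PHP running inside phpBB); the function name and the checkbox field names are hypothetical, and the sketch assumes the form handler receives the set of checkboxes that were ticked.</p>

```python
def is_probably_human(submitted, expected):
    """Pass only when exactly the one marked checkbox was ticked.

    submitted: names of the checkboxes ticked on the submitted form
    expected:  name of the single checkbox the user was told to tick
    """
    return set(submitted) == {expected}

# Hypothetical field names, for illustration only.
boxes = ["cb_1", "cb_2", "cb_3", "cb_4"]
marked = "cb_3"

print(is_probably_human({"cb_3"}, marked))    # only the marked box ticked
print(is_probably_human(set(), marked))       # checkboxes ignored entirely
print(is_probably_human(set(boxes), marked))  # every checkbox ticked
```

<p>Only the first call passes. The two failures correspond to the two bot behaviors described above: ticking nothing and ticking everything.</p>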
<p>I listed the top five domains above. For the rest of this post I am going to drop mail.ru because most board owners know it&#8217;s a standard domain used by spammers. I am also going to drop aol.com because at 4% of the total registrations it&#8217;s not that relevant. That leaves me with three remaining domains to focus on: <code>gmail.com</code>, <code>yahoo.com</code>, and <code>hotmail.com</code>. (If you&#8217;re wondering who is in position six, it was <code>gawab.com</code>, which is another notorious spammer domain.)</p>
<h3>Who&#8217;s Your Bot?</h3>
<p>Any registration attempt is a potential board member. The concept behind most any anti-spam measure is to allow real people through and block bots. I have already established that gmail is by far the number one source of registration attempts. The next step is to evaluate how many of those attempts are desirable new users, and how many are bots. To do that, I retrieved the last 18 full months of data and determined the percentage of successful versus failed registrations. Here are those numbers for the three domains I have decided to focus on for this post.</p>
<pre>Domain          Success Failed  % Success
gmail.com          5644   7265      43.7%
yahoo.com          2372    596      79.9%
hotmail.com        1384    222      86.2%</pre>
<p>Now we start to see the real problem. Both yahoo and hotmail have approximately eighty percent success rates. That means that eight out of ten registration attempts from those domains are expected to be legitimate and valuable users. With gmail over half of the registration attempts fail and therefore are presumed to be bots. Not only is gmail the number one source for registration attempts, it is the worst source in terms of the human to bot ratio.</p>
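<p>The success percentages are easy to recheck from the raw counts. Here is a small Python sketch using the figures from the table above; the helper name <code>success_rate</code> is just for illustration.</p>

```python
# (success, failed) counts copied from the table above.
attempts = {
    "gmail.com": (5644, 7265),
    "yahoo.com": (2372, 596),
    "hotmail.com": (1384, 222),
}

def success_rate(success, failed):
    """Percentage of registration attempts that passed the challenge."""
    return 100.0 * success / (success + failed)

rates = {
    domain: round(success_rate(*counts), 1)
    for domain, counts in attempts.items()
}

for domain, rate in rates.items():
    print(f"{domain:12} {rate:5.1f}%")
```

<p>This reproduces the 43.7%, 79.9%, and 86.2% shown in the table.</p>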
<h3>Is Google Doing Anything To Help?</h3>
<p>Given that these numbers start in January of 2008, the next question I want to answer is whether the problem is getting better or worse. I have to believe that Google is aware of the issues that they&#8217;re facing. Are they doing anything to help?</p>
<p>Here are the gmail numbers broken down by month.</p>
<pre>Log Month        Domain         Success Fail
 2008-01        gmail.com           297   79
 2008-02        gmail.com           260   42
 2008-03        gmail.com           320   94
 2008-04        gmail.com           293  107
 2008-05        gmail.com           290   65
 2008-06        gmail.com           286  139
 2008-07        gmail.com           395  147
 2008-08        gmail.com           346  380
 2008-09        gmail.com           316  398
 2008-10        gmail.com           283  561
 2008-11        gmail.com           316  367
 2008-12        gmail.com           254  484
 2009-01        gmail.com           291  898
 2009-02        gmail.com           343  510
 2009-03        gmail.com           346  808
 2009-04        gmail.com           330  981
 2009-05        gmail.com           291  614
 2009-06        gmail.com           387  591</pre>
<p>Here are a few things that I find interesting about these numbers. First, for the past 18 months I have averaged 314 new members (successful registrations) per month from gmail. That number is remarkably consistent, as shown by this graph. The blue line shows the raw data, and the orange line shows the trend.</p>
<p><img src="/blog/wp-content/uploads/2009/07/success.jpg" width="417" height="335" border="0" alt="trend graph for successful registrations" title="Trend for successful gmail registrations" /></p>
<p>Here is the graph for failed registration attempts from gmail.</p>
<p><img src="/blog/wp-content/uploads/2009/07/failed.jpg" width="418" height="336" border="0" alt="trend graph for failed registrations" title="Trend for failed gmail registrations" /></p>
<p>In this case the red line represents the data and the black line is the trend. The trend is not my friend in this case. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_eek.gif' alt=':shock:' class='wp-smiley' />  <strong>Pay careful attention to the scale of those two graphs.</strong> While they are presented as the same size (approximately 400 pixels square) the top graph (successes) has a maximum scale of 450 while the bottom graph (failures) goes all the way up to 1200. Here&#8217;s a combined graph without trend lines that will help drive that point home.</p>
<p><img src="/blog/wp-content/uploads/2009/07/total_number.jpg" width="417" height="383" border="0" alt="graph for all registration attempts" title="Last 18 months of gmail.com registration attempts" /></p>
<p>The data does not look good for Google. Sometime back in 2008 (it looks like August for me) the number of valid registrations and bot registrations were about the same. Prior to that date, bot registrations were in the minority. After that date the bot usage of <code>gmail.com</code> has clearly soared. In February of 2009 (2009-02 on the graph) there was a dip in bot usage, at least on my board. Was it a result of something Google did? If it was, it clearly was not very successful in the longer term as bot usage popped right back up in the following months.</p>
<p>Here&#8217;s another chart that shows the value of gmail to me as a board owner. This is a percentage column chart so it ignores the overall numbers and instead presents the data as percentages. </p>
<p><img src="/blog/wp-content/uploads/2009/07/percent_failed.jpg" width="418" height="384" border="0" alt="percentage graph for gmail.com registration attempts" title="Failure percent for last 18 months of gmail.com registration attempts" /></p>
<p>Just how significant is this? Earlier in this post I noted that for the past 18 months the average success rate for a registration attempt from a <code>gmail.com</code> email address was 43.7%. If I recalculate the value for the past six months it drops to 31.1%. That&#8217;s not good. Is it fair to pick on Google? During the same time that the success ratio for gmail has dropped from 43.7% to 31.1% (a difference of 12.6 percentage points) yahoo has dropped 2.4 points and hotmail has dropped 3.1 points. In other words, all of the top three domains have seen the ratio of legitimate registrations to bots drop, but the ratio for gmail has dropped at least four times as much as either of the other two.</p>
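<p>The eighteen-month and six-month gmail figures can be recomputed directly from the monthly table earlier in this post. The sketch below is Python with the counts copied from that table; the helper name <code>success_pct</code> is hypothetical.</p>

```python
# Monthly gmail.com (success, fail) counts copied from the table in this post.
monthly = {
    "2008-01": (297, 79),  "2008-02": (260, 42),  "2008-03": (320, 94),
    "2008-04": (293, 107), "2008-05": (290, 65),  "2008-06": (286, 139),
    "2008-07": (395, 147), "2008-08": (346, 380), "2008-09": (316, 398),
    "2008-10": (283, 561), "2008-11": (316, 367), "2008-12": (254, 484),
    "2009-01": (291, 898), "2009-02": (343, 510), "2009-03": (346, 808),
    "2009-04": (330, 981), "2009-05": (291, 614), "2009-06": (387, 591),
}

def success_pct(rows):
    """Success percentage over a list of (success, fail) pairs, to one decimal."""
    rows = list(rows)
    success = sum(s for s, _ in rows)
    failed = sum(f for _, f in rows)
    return round(100.0 * success / (success + failed), 1)

months = sorted(monthly)
overall = success_pct(monthly[m] for m in months)        # all eighteen months
last_six = success_pct(monthly[m] for m in months[-6:])  # 2009-01 through 2009-06
drop = round(overall - last_six, 1)

print(overall, last_six, drop)
```

<p>The result is 43.7 for the full period, 31.1 for the last six months, and a drop of 12.6 percentage points.</p>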
<h3>What Can I Do About gmail.com?</h3>
<p>New board members are important. Without new members a community will start to get stagnant, and a stagnant community typically doesn&#8217;t thrive. As I mentioned earlier, I get an average of over 300 new members a month from <code>gmail.com</code> alone. For the past 18 months I have averaged 751 new members each month, and 314 or 42% of those are from <code>gmail.com</code> email addresses. If I were to consider banning <code>gmail.com</code> that&#8217;s a large chunk of my community that would disappear. I don&#8217;t think that&#8217;s a realistic action to take. </p>
<h3>What Should I Do About gmail.com?</h3>
<p>I think that Google should be held responsible. I can take individual steps that impact my board&#8230; Google can (and should) take steps that will protect everyone on the Internet. Am I overstating the problem? I really don&#8217;t think so. All of the numbers I have used for this post came from registration attempts on my largest (and most active) phpBB board. Here are some other numbers to chew on. All of these have been filtered to show only log entries with <code>gmail.com</code> email addresses.</p>
<p><strong>Site Comment Form</strong><br />
Total attempts: 10,441<br />
Total rejected: 10,381<br />
Bot percent: 99.4%</p>
<p><strong>Another phpBB Board</strong><br />
Total attempts: 2,767<br />
Total rejected: 2,723<br />
Bot percent: 98.4%</p>
<p><strong>Still Another phpBB Board</strong><br />
Total attempts: 1,859<br />
Total rejected: 1,843<br />
Bot percent: 99.1%</p>
<p>What conclusion do I draw from these numbers? I submit that <strong>the problem is even worse than it appears</strong> based on the details I provided in this post! The numbers I used come from an extremely active board. Registration bots don&#8217;t pay too much attention to how many legitimate users are already registered on a board. The only goal of a bot is to find a board and register. For a smaller board this means the problem is even worse. My big board didn&#8217;t start out big. In the early days we got about 10-20 new registrations each month. Today I get more than that in one day. Because I get so many new legitimate users, it can actually mask just how bad the gmail problem really is. If you are a smaller board owner, having thousands of bogus gmail registrations can be extremely frustrating. If I didn&#8217;t have something in place that was &#8211; at least for now &#8211; somewhat effective in blocking these bogus attempts, I would very seriously have to consider blocking gmail accounts.</p>
<p>The problem is not new. While researching to see if I was the only one impacted by this (of course I am not) I found a post that shows how bots break the gmail CAPTCHA, and the post was from February of 2008. As we have long discussed on phpbb.com there are also services that will put real people to work breaking confirmation codes. I linked a few articles at the end of this post, and most of them are over a year old. The situation hasn&#8217;t improved since then either. If anything it has become much worse.</p>
<p>Google, are you listening? It&#8217;s time to fix this.</p>
<p><strong>Related Links</strong></p>
<ul>
<li><a href="/blog/wp-content/uploads/2009/07/gmail_data.xls">Raw Data</a> used in this post in Microsoft Excel format</li>
<li><a href="http://securitylabs.websense.com/content/Blogs/2919.aspx">Breaking Google&#8217;s CAPTCHA</a></li>
<li><a href="http://bits.blogs.nytimes.com/2008/03/13/breaking-google-captchas-for-3-a-day/">Breaking Google CAPTCHAs for $3 a Day</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/07/22/just-how-bad-is-the-gmailcom-problem/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Twitter Spam</title>
		<link>http://www.phpbbdoctor.com/blog/2009/06/23/twitter-spam/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/06/23/twitter-spam/#comments</comments>
		<pubDate>Tue, 23 Jun 2009 13:27:10 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=311</guid>
		<description><![CDATA[Anyone want to bet how long it takes the automated posting bots to infect twitter?
]]></description>
			<content:encoded><![CDATA[<p>Anyone want to bet how long it takes the automated posting bots to infect twitter?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/06/23/twitter-spam/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Personal Spammers</title>
		<link>http://www.phpbbdoctor.com/blog/2009/04/15/personal-spammers/</link>
		<comments>http://www.phpbbdoctor.com/blog/2009/04/15/personal-spammers/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 13:25:17 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=307</guid>
		<description><![CDATA[Will the battle never end?
Apparently not.
I have seen a new style of spam coming in on another blog that I have. Based on past experience, I normally expect the spam to include links to various sites that I have no interest in. These sites will normally promote things like products I don&#8217;t want (or need).
Lately, [...]]]></description>
			<content:encoded><![CDATA[<p>Will the battle never end?</p>
<p>Apparently not.</p>
<p>I have seen a new style of spam coming in on another blog that I have. Based on past experience, I normally expect the spam to include links to various sites that I have no interest in. These sites will normally promote things like products I don&#8217;t want (or need).</p>
<p>Lately, however, I have been getting spam comments that include links to &#8220;linked in&#8221; or other social networking sites. What&#8217;s the point of that? &lt;sigh&gt; The comments run along these lines (these are actual spam comments):</p>
<blockquote><p>After reading through the article, I just feel that I really need more information on the topic. Can you suggest some resources ?</p></blockquote>
<blockquote><p>The style of writing is quite familiar . Have you written guest posts for other bloggers?</p></blockquote>
<blockquote><p>The topic is quite hot in the net right now. What do you pay the most attention to while choosing what to write about?</p></blockquote>
<blockquote><p>My friend on Facebook shared this link with me and I&#8217;m not dissapointed that I came here.</p></blockquote>
<p>&#8230; and many more like this. The good news is that the comments were held in the moderation queue. The bad news is that these comments were all made on a blog that is protected by the checkbox challenge code that I use here. I have plans to go out and analyze the server logs to see if the comments were made by a human or a bot, based on time spent on the various pages.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2009/04/15/personal-spammers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Flood Interval as Anti-Spam Measure</title>
		<link>http://www.phpbbdoctor.com/blog/2008/12/13/flood-interval-as-anti-spam-measure/</link>
		<comments>http://www.phpbbdoctor.com/blog/2008/12/13/flood-interval-as-anti-spam-measure/#comments</comments>
		<pubDate>Sat, 13 Dec 2008 16:49:00 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[phpBB]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=295</guid>
		<description><![CDATA[A few weeks ago I posted about increasing the flood interval on my honey pot board. My theory was that since bots seem to have a fairly regular posting process I could cut down on the number of spam posts simply by changing the flood interval.
It didn&#8217;t seem to work.
I checked the post stats a [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago I posted about <a href="http://www.phpbbdoctor.com/blog/2008/11/25/spammer-evolution/">increasing the flood interval</a> on my honey pot board. My theory was that since bots seem to have a fairly regular posting process I could cut down on the number of spam posts simply by changing the flood interval.</p>
<p>It didn&#8217;t seem to work.</p>
<p><span id="more-295"></span>I checked the post stats a few minutes ago, and while the posting did drop on the days around where I first changed the flood interval, it has also dropped like that previously. So I can&#8217;t determine whether this was a natural lull in bot activity or a result of the flood interval change. I realize now that I should have added some code to track how many times the flood interval warning was issued. I did not do that, so I really don&#8217;t have any way to analyze my data or justify my conclusions. I&#8217;m a little bit disappointed in myself for not thinking about that until now.</p>
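<p>The missing tracking would not have taken much. Here is a rough Python sketch of the idea, assuming a hook that runs each time the flood warning is shown; the function and variable names are hypothetical, and a real board would store the counts in a database table rather than in memory.</p>

```python
from collections import Counter
from datetime import datetime, timezone

# ISO date (UTC) -> number of flood warnings issued that day.
flood_warnings = Counter()

def record_flood_warning(timestamp):
    """Tally one flood-interval warning under its UTC calendar day."""
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
    flood_warnings[day] += 1
    return day

# Hypothetical Unix timestamps standing in for rejected posts.
for ts in (1228089600, 1228089660, 1228176000):
    record_flood_warning(ts)

print(sorted(flood_warnings.items()))
```

<p>Comparing the daily warning counts before and after a settings change would show whether bots were actually hitting the limit.</p>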
<p>Here is a chart of the posting activity. I have marked the point where I changed the flood interval. As you can probably see, it&#8217;s not really possible to draw any conclusions about the effectiveness of this technique based on the data I have collected.</p>
<p><img src="/blog/images/after_flood.jpg" width="511" height="433" alt="posting chart" title="Posting frequency on phpBB2 honey pot board" border="0" /></p>
<p>To be very clear, I don&#8217;t think that increasing the flood control time limit is a valid anti-spam measure anyway. It does exactly what you don&#8217;t want to do as a board owner&#8230; it makes your valuable real users alter their own behavior. You want to make your board experience a good one, otherwise people will find somewhere else to go.</p>
<p><em>At the time of this post there are 5 registered users and 7 guest users online, there are 45,556 total posts and 7,670 total users registered on my honey pot.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2008/12/13/flood-interval-as-anti-spam-measure/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Spammer Evolution</title>
		<link>http://www.phpbbdoctor.com/blog/2008/11/25/spammer-evolution/</link>
		<comments>http://www.phpbbdoctor.com/blog/2008/11/25/spammer-evolution/#comments</comments>
		<pubDate>Tue, 25 Nov 2008 19:19:03 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[blog]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=289</guid>
		<description><![CDATA[Today I decided to check in on my &#8220;honey pot&#8221; board that I have running. I haven&#8217;t been there in a week or so but things were still humming along last time I looked. This time when I logged in I got a warning from my pop-up blocker. My initial reaction? I&#8217;ve been hacked.  [...]]]></description>
			<content:encoded><![CDATA[<p>Today I decided to check in on my &#8220;honey pot&#8221; board that I have running. I haven&#8217;t been there in a week or so but things were still humming along last time I looked. This time when I logged in I got a warning from my pop-up blocker. My initial reaction? I&#8217;ve been hacked. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_eek.gif' alt=':shock:' class='wp-smiley' /> </p>
<h3>PM Spammers</h3>
<p>It turned out that the real answer was much more benign&#8230; it was the notification of new private messages popping up. <span id="more-289"></span> Normally I deactivate (or remove) the PM system from my boards, but since this is supposed to be a standard phpBB2 install I left it in place. The PM spamming started on October 10<sup>th</sup> it seems. However, the initial attempts did not include the board administrator account. After the initial success there was another round on November 16<sup>th</sup>, the 19<sup>th</sup>, and again several days later. Altogether I have 329 spam PMs on the board now.</p>
<p>The PMs are from four different users from six different IP addresses. I checked and there are really only four locations associated with the IP address information: Moscow, The Ukraine, The Netherlands, and another hotbed of spammer activity, the state of Illinois. Someone with a Comcast high-speed internet connection is a <a href="http://www.phpbbdoctor.com/blog/2008/10/31/spammer-techniques-are-you-a-zombie-part-ii/">zombie</a>, it seems. I left the PM system enabled for now, just to see how far this goes.</p>
<h3>Flood Interval Update</h3>
<p>The real reason I logged in to the board today was I changed the posting flood interval. If you&#8217;re not familiar with it, the &#8220;flood&#8221; is a time limit for consecutive posts from a single user. It is designed to prevent a user from overwhelming your board with frequent posts. The default setting is fifteen seconds. Based on my analysis, bots seem to be programmed to run every 30-45 seconds. So I set the flood interval to 60 seconds earlier today.</p>
<p>It will be interesting to see how the bots react.</p>
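<p>To make the mechanism concrete, here is a minimal Python sketch of a flood check. It is not phpBB&#8217;s code: the class name is hypothetical, and it assumes a rejected post does not reset the timer, which may not match what phpBB actually does.</p>

```python
import time

class FloodControl:
    """Reject a post when the same user posted within the last `interval` seconds."""

    def __init__(self, interval=15.0):
        self.interval = interval
        self.last_post = {}  # user id -> timestamp of the last accepted post

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        previous = self.last_post.get(user_id)
        if previous is not None:
            wait = self.interval - (now - previous)
            if wait > 0:
                return False  # still inside the flood interval
        self.last_post[user_id] = now
        return True

# A bot that posts every 30 seconds, checked against two settings.
bot_times = (0, 30, 60, 90, 120)

default_board = FloodControl(interval=15)
print([default_board.allow("bot", now=t) for t in bot_times])

strict_board = FloodControl(interval=60)
print([strict_board.allow("bot", now=t) for t in bot_times])
```

<p>With the default fifteen seconds, all five posts from a bot on a 30-second cycle get through. With sixty seconds, every other attempt is rejected under these assumptions.</p>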
<h3>Checkbox Challenge Update</h3>
<p>In other news&#8230;</p>
<p>I noticed on one of my regular (but fairly dormant) boards that there was a user named &#8220;vitamary&#8221; registered recently. I saw here on the phpBB Doctor blog (which is linked on the other site I mentioned) several spam comments caught by Akismet from a user vmary82@gmail.com. Both the board and this blog have a variation of the Checkbox Challenge in place, and both have been victims of VitaMary.</p>
<p>I also saw another interesting blog comment that was <strong>not</strong> caught by Akismet but was in my approval queue. The complete content of the post was the single word &#8220;test&#8221; and the email address related to the comment was gmail. I <a href="http://www.phpbbdoctor.com/blog/2008/11/01/spammers-going-google/">posted recently</a> about the abuses coming from gmail, so the two of these items combined made me just a bit suspicious. I looked up the IP address associated with the comment&#8230; and it was from Russia.</p>
<p>To be brutally honest here, I started to post and release the Checkbox Challenge MOD at phpbb.com but stopped. Why? Because I was being selfish. <img src='http://www.phpbbdoctor.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I wanted to keep the technique all to myself. If the technique became popular enough to attract the attention of the bot writers, then I would have to do something different. Now I believe I have at least some preliminary indications that someone, somewhere, has taken an interest in my little bits of code and is trying to make their bot just a bit smarter.</p>
<p>Oh, well, I can always fall back to a suggestion from the web comic at xkcd.com:</p>
<p><img src="http://imgs.xkcd.com/comics/a_new_captcha_approach.png" alt="xkcd comic" title="xkcd CAPTCHA suggestion" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2008/11/25/spammer-evolution/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>First Spam In &#8230; Over a Year</title>
		<link>http://www.phpbbdoctor.com/blog/2008/11/18/first-spam-in-over-a-year/</link>
		<comments>http://www.phpbbdoctor.com/blog/2008/11/18/first-spam-in-over-a-year/#comments</comments>
		<pubDate>Wed, 19 Nov 2008 04:08:23 +0000</pubDate>
		<dc:creator>Dave Rathbun</dc:creator>
				<category><![CDATA[Anti-spam]]></category>
		<category><![CDATA[blog]]></category>

		<guid isPermaLink="false">http://www.phpbbdoctor.com/blog/?p=288</guid>
		<description><![CDATA[Today I got my first spam that successfully navigated the Checkbox Challenge. It was caught by Akismet, which shows the power of a layered defense. On phpBB2 boards we have seen an increase in manual spam. Manual spam is really hard to defeat because it&#8217;s done by humans. On the other hand, it&#8217;s more expensive [...]]]></description>
			<content:encoded><![CDATA[<p>Today I got my first spam that successfully navigated the Checkbox Challenge. It was caught by Akismet, which shows the power of a layered defense. On phpBB2 boards we have seen an increase in manual spam. Manual spam is really hard to defeat because it&#8217;s done by humans. On the other hand, it&#8217;s more expensive for the spammers too. I will be watching this closely to see how things trend over the next few months.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpbbdoctor.com/blog/2008/11/18/first-spam-in-over-a-year/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
