If you have ever watched poker (or other games that involve bluffing) then you might have heard people talk about “tells” from other players. A “tell” is simply something that the person does – perhaps without even being aware of it – that gives away certain information. Spammers do the same thing. If I can find their tells then I can use that information against them, just like I could use that information to my advantage in a poker game.
Here are some “tells” that I have identified after analyzing my phpBB2 honey pot board with one month of spammer data.
A while back someone posted a MOD at phpbb.com that banned anyone that registered with the time zone of GMT – 12. If you check, you’ll find that GMT – 12 is in the middle of the ocean. Which reminds me of an old joke which I will paraphrase here:
Question: What do you call 10,000 spammers at the bottom of the ocean?
Answer: A good start!
Okay, maybe that one was just for me… on with the program…
So here are the statistics from my honey pot board for all users other than the original admin (that would be me) and the Anonymous user:
+---------------+----------+
| user_timezone | count(*) |
+---------------+----------+
|        -12.00 |      463 |
+---------------+----------+
Hm. That looks like a fairly significant tell to me. Every single spammer registered with the same time zone. Why do you suppose that is happening? Is it because that’s the default time zone? In fact, it’s not. On my honey pot board I set the board timezone to GMT – 5 which becomes the default for new user registrations. That means that spammer bots are specifically changing the time zone from -5 to -12 during their registration process. The only thing significant about -12 is that it’s the first option on the drop-down list. It would seem that the registration bots are making sure they select something, and in this case it’s something that nobody should really be selecting.
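For anyone who wants to run the same check on their own board, here is a minimal sketch in Python. It assumes the user rows have already been exported as dictionaries with a `user_timezone` key (the column name shown in the table above). The function names and the sample rows are made up for illustration; the -5 default and the -12 first option are the values described for my honey pot board.

```python
from collections import Counter

BOARD_DEFAULT_TZ = -5.0   # board default on the honey pot board
FIRST_OPTION_TZ = -12.0   # first entry in the time zone drop-down


def timezone_counts(users):
    """Tally registrations per time zone, like GROUP BY user_timezone."""
    return Counter(user["user_timezone"] for user in users)


def picked_first_option(user, board_default=BOARD_DEFAULT_TZ,
                        first_option=FIRST_OPTION_TZ):
    """True when the account moved off the board default to the first
    drop-down entry, which is the pattern described above."""
    tz = user["user_timezone"]
    return tz == first_option and tz != board_default


# Hypothetical sample rows, not real data from the board.
sample_users = [
    {"user_timezone": -12.0},
    {"user_timezone": -12.0},
    {"user_timezone": -5.0},
]

for tz, count in timezone_counts(sample_users).most_common():
    print(f"{tz:6.2f}  {count}")
```

A result from `picked_first_option` is only a hint to look closer, not proof that the account belongs to a spammer.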
Is this a bullet-proof tell? It’s hard to know for sure, but the odds seem favorable.
What about the user location field, are there any patterns there? Here are the top 10 locations provided by spammer registrations on my board:
+-------------+----------+
| user_from   | count(*) |
+-------------+----------+
| Sex Relaxxx |      141 |
| USA         |       36 |
| Russia      |       33 |
| adult       |       19 |
| US          |       18 |
| Canada      |       18 |
| Greece      |        8 |
|             |        6 |
| Jamaica     |        6 |
| Kazakhstan  |        5 |
+-------------+----------+
The first one seems to indicate a spammer, as does the fourth. It’s hard to say much about the others.
Then there are those that enter a complete web site in the location field. There are only 6 (out of 464) on my honey pot board that did this, and to be honest I have seen legitimate users do this as well, so it would be hard to classify this as a solid tell of a spammer.
For many years I observed spammers that would try to register on my boards only to get their web sites listed in their profile, which would then be displayed as a link on the memberlist. The first anti-spam measure I took was to prevent inactive members from showing up (a very simple, common, and popular MOD that can be found at phpbb.com as well). The next step was to prevent a user from entering a web site until they had posted a few times.
However, things seem to have changed. These simple measures became so popular that I suspect spammers started doing things to work around them. One of the changes, interestingly enough, involved putting a legitimate web site into their profile. Would you believe that one of the most popular web sites entered by spammers now is Google? Now I like to blame Google for lots of things, but I doubt that they’re really behind all of the spammers joining my board.
I have had plenty of posts where I called out specific email domains being used by spammers. I think it’s relatively easy to see patterns here. For example, these are the top 10 email domains used to register on my honey pot board:
+----------------------+----------+
| email_domain         | count(*) |
+----------------------+----------+
| serpdomains.com      |      142 |
| mail.ru              |      126 |
| gmail.com            |       33 |
| gawab.com            |       28 |
| dp-blog.com          |       25 |
| mymail-in.net        |       15 |
| gmx.us               |       15 |
| greatfreemail.net    |       12 |
| mp3bank.in           |        9 |
| paydayloancourse.com |        4 |
+----------------------+----------+
Notice who is number three on the list? That’s right, gmail. Along with spammer favorites like mail.ru and gawab.com I now have to deal with spammers using gmail accounts. It’s relatively easy to justify banning an email domain like anotherstupeddomain4bots.org (yes, I really got that, along with other domains in this post). I have heard of board owners that take the rather drastic step of banning all “free” email providers including hotmail and yahoo. I don’t think that’s a good step to take if you are trying to attract a wide range of members. Based on behavior I don’t have any problem adding certain domains to my banlist. I do have a problem with banning gmail and other free email accounts just because some spammers use their service.
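A tally like the email domain list above is easy to reproduce. The sketch below is one way to do it in Python, assuming the registration addresses have been pulled out of the users table into a plain list. The helper names `email_domain` and `top_domains` are my own for this example, and the sample addresses are invented.

```python
from collections import Counter


def email_domain(address):
    """Return the lowercased text after the last '@', or '' if there is no '@'."""
    _, sep, domain = address.rpartition("@")
    return domain.strip().lower() if sep else ""


def top_domains(addresses, n=10):
    """Count registrations per email domain and return the n most common."""
    counts = Counter(email_domain(address) for address in addresses)
    counts.pop("", None)  # ignore malformed addresses with no domain part
    return counts.most_common(n)


# Hypothetical addresses, for illustration only.
sample = ["a@mail.ru", "b@MAIL.ru", "c@gmail.com", "not-an-address"]
for domain, count in top_domains(sample):
    print(f"{domain:20s} {count}")
```

Lowercasing the domain keeps `MAIL.ru` and `mail.ru` from being counted as two different providers.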
Are any of these individual “tells” enough to block spammers? Maybe. Certain fields seem to have a higher success rate (time zone, for example) at predicting whether an account was created by a spammer or not. The problem with relying on an individual field like time zone is that it would be easy for a bot writer to change that behavior. In addition to that, I can’t be 100% sure that it’s not a legitimate user. For example, I just checked my biggest board and I have 21 users (a whopping 0.06%) that registered with the -12 time zone. Most of them have posted at least once and have survived, so they’re not spammers. If they were, I would have figured that out by now. In my opinion that means that I can’t really “auto-ban” anyone with that time zone, as attractive as that seemed at the beginning of this post.
Instead I have to look at patterns of behavior and combinations of fields. I can do that myself, or I can wait (impatiently!) for the formal relaunch of the bbProtection service. The primary advantage of the bbProtection design is that it captures data from every subscriber and uses it to detect patterns across a much broader range of activity than any single board owner is likely to see.
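To make the idea of combining fields concrete, here is a rough sketch in Python. This is not how bbProtection works; it is only an illustration. The weights, the threshold, and the function names are all assumptions I picked for the example, and the suspect locations and domains are just a few values taken from the tables earlier in this post.

```python
# Example values taken from the tables in this post; a real list would be longer.
SUSPECT_LOCATIONS = {"sex relaxxx", "adult"}
SUSPECT_DOMAINS = {"serpdomains.com", "dp-blog.com"}


def spam_score(user):
    """Add up points for each tell; the weights here are arbitrary assumptions."""
    score = 0
    if user.get("user_timezone") == -12.0:
        score += 2
    if user.get("user_from", "").strip().lower() in SUSPECT_LOCATIONS:
        score += 2
    email = user.get("user_email", "")
    if "@" in email and email.rpartition("@")[2].lower() in SUSPECT_DOMAINS:
        score += 2
    if user.get("user_website"):
        score += 1
    return score


def needs_review(user, threshold=3):
    """Flag an account for a closer look only when several tells line up."""
    return spam_score(user) >= threshold


# Hypothetical accounts, for illustration only.
traveler = {"user_timezone": -12.0, "user_from": "Canada",
            "user_email": "someone@gmail.com"}
bot = {"user_timezone": -12.0, "user_from": "Sex Relaxxx",
       "user_email": "x@serpdomains.com"}

print(needs_review(traveler))  # one tell alone stays under the threshold
print(needs_review(bot))       # several tells together cross it
```

With these made-up weights, the -12 time zone by itself never triggers a review, which matches what I found on my biggest board: a handful of real members use it.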
This post concentrated on reviewing registration data. Are there patterns in posting behavior that I can identify? It turns out the answer is “Yes”, and that there are some sobering statistics that show just how deep and wide the spammer-bot problems go.