July 30, 2008

Building a Daily Digest Part I: Generation and Delivery

Filed under: MOD Writing, phpBB. Posted by Dave Rathbun @ 2:25 am. Comments (5)

I think I have frequently mentioned that my biggest board was at one time a mailing list. The mailing list had two modes: individual message or daily digest. Because of this, one of the first feature requests I got from my user community was a way to subscribe to a daily summary of activity from selected forums. The code to generate and deliver the digest was fairly easy. The user interface was a bit more challenging. Code is always easier if you don’t have to interact with users, because things are so much more predictable without them. :lol:

One of the main factors that I tried to keep in mind was performance. If 15,000 people subscribed to a nightly digest, I wanted the generation and delivery process to be as quick and painless as possible. So before I wrote any code I sat down with my thoughts and figured out a few shortcuts.

Determine the Options

The first decision I made was that I was going to treat my digest exactly the same way as the former mailing list. It was delivered at midnight based on the server time, and you got it at that time no matter which time zone you lived in. There is a very popular digest MOD at phpbb.com that allows a user to set the delivery schedule. I decided that was not a feature that I was going to offer. If you wanted a digest, then you got it at the same time as everyone else. I’ll explain more about why this was important in just a bit.

The next decision that I made was that I had to offer both text and HTML email formats. It took a while to figure that out (and there are still some bugs in the css used in the HTML version of the email) but I did get it eventually.

Finally, a user had to be able to select as many (or as few) forums as they wanted. They had the option to subscribe to weekday-only delivery (5 days) or full-week delivery (7 days). And they had the option to suspend the digest delivery without turning everything off and losing their preferences. All of these options except the forum selection became part of the user profile screen. The forum selection was done on its own screen.
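
The weekday-only option could be checked with something like the sketch below. The helper name delivers_today is hypothetical, and reading "weekday" as Monday through Friday of the run date is an assumption; the post does not say how the MOD decides this.

```perl
use strict;
use warnings;

# Hypothetical helper: does this delivery option include the given day?
# $wday follows Perl's localtime convention: 0 = Sunday ... 6 = Saturday.
# Assumption: "weekday only" means Monday through Friday of the run date.
sub delivers_today {
    my ($weekdays_only, $wday) = @_;
    return 1 unless $weekdays_only;              # 7 day delivery: every day
    return ($wday >= 1 && $wday <= 5) ? 1 : 0;   # 5 day delivery: Mon..Fri
}

my $wday = (localtime)[6];
print delivers_today(1, $wday) ? "send\n" : "skip\n";
```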

Generation and Delivery

I opted to use perl to write my digest code. I could have done it in php, but at the time I was more familiar with perl and therefore I went with that. The job is scheduled via cron, runs via a command line, and somehow I felt that perl would be faster for that sort of task anyway. It would be interesting to run a comparison at some point.

The biggest issue that I faced was planning for potentially thousands of different digest requests. This email wasn’t going to be like a topic notification, where everyone that is watching a topic gets exactly the same email. No, in this case, I might have 1,000 different users subscribe to 1,000 different digest configurations. :shock: It turns out that deciding to do everything at midnight made the entire process very simple, and quite efficient.

Midnight Process

On this board I have a script that runs at 11:59 PM every night. I run it at one minute prior to midnight, but for the rest of this post I will pretend that it really runs at midnight. It’s just easier to talk about. :) The script is responsible for doing a couple of things. The first thing it does is capture the last post_id value at the time the script was run. It also gets the last post_id from the prior day stat row and increments it by one and calls that the “first” post_id for the day. It captures the cumulative page view number from my counter table. Finally, it counts how many new members joined that day. All of these values are stored in a table called “daily stats” for later research. The values used for the digest process are the first_post_id and last_post_id. Everything else is used for analyzing the daily usage patterns on the board and is unrelated to the digest process.

The two post_id values give me a range of post_id values that might have occurred for that day. I say “might have occurred” because posts can get deleted. I don’t really care about those. By getting the first and last post from today I get a range of posts to work with.
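
As a rough sketch of that range calculation (the helper name and the sample numbers are made up for illustration; only the "prior day's last post_id plus one" rule comes from the description above):

```perl
use strict;
use warnings;

# Hypothetical helper: derive today's post_id range.
# $prior_last_post_id  : last_post_id stored in the prior day's stats row
# $current_max_post_id : highest post_id at the moment the script runs
sub todays_post_range {
    my ($prior_last_post_id, $current_max_post_id) = @_;
    my $first = $prior_last_post_id + 1;
    my $last  = $current_max_post_id;
    # If nothing was posted today, $first ends up greater than $last.
    my $has_posts = $last >= $first ? 1 : 0;
    return ($first, $last, $has_posts);
}

my ($first, $last, $has_posts) = todays_post_range(104_250, 104_612);
print "posts between $first and $last\n" if $has_posts;
```

Deleted posts simply leave gaps inside the range, which is harmless because a later query over the range only returns rows that still exist.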

Next the digest process starts. Before I check any of the user parameters (selected forums and so on) the first pass of the digest is executed. What it does is quite simple, and it is the core reason why the process is so efficient. The first pass is responsible for creating a text and an html extract for every forum on the board and saving them to a file using the forum_id as the file name. The extension will be either .html or .txt to show which format was used. For example, for forum_id 12 I will have files named 12.html and 12.txt in a directory on my server. (These files are created / stored in a directory that is not visible to the web for security reasons.)
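
A minimal sketch of what such a first pass might look like follows. Everything in it is assumed for illustration: the sub name, the shape of the data (a hash of forum_id to that day's posts, standing in for a single query over the post_id range), and the layout of each extract, none of which are shown in the post.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical first pass: write one .txt and one .html extract per forum.
# A real version would also HTML-escape the subject and author.
sub write_forum_extracts {
    my ($dir, $posts_by_forum) = @_;
    for my $forum_id (sort { $a <=> $b } keys %$posts_by_forum) {
        my $posts = $posts_by_forum->{$forum_id};
        my %body = (
            txt  => join('', map { "$_->{subject} (by $_->{author})\n" } @$posts),
            html => join('', map { "<p><b>$_->{subject}</b> by $_->{author}</p>\n" } @$posts),
        );
        for my $ext (sort keys %body) {
            my $path = "$dir/$forum_id.$ext";
            open(my $fh, '>', $path) or die "Cannot write $path: $!";
            print {$fh} $body{$ext};
            close($fh) or die "Cannot close $path: $!";
        }
    }
    return;
}

my $dir = tempdir(CLEANUP => 1);   # stand-in for the private digest directory
write_forum_extracts($dir, {
    12 => [ { subject => 'Cron timing', author => 'alice' } ],
});
```

The point of the design is that the cost of this pass depends on the number of forums and posts, not on the number of subscribers.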

Since everyone is getting their digest prepared and sent at the same time, and since I have a separate file (extract) for every individual forum on my board, the next step is to put those building blocks together. That code loop looks like this:

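# Fetch the first forum_id this user has subscribed to
$forum_id = $c_get_user_subscribed_forums->fetchrow();

while (defined $forum_id)
{
	$forums_processed ++;
	# Append the pre-generated extract for this forum, in the user's format
	if (open (DIGESTFILE, "<", "$DIGESTPATH/$forum_id.$digest_format[$digest_text]"))
	{
		$email_body .= $_ while (<DIGESTFILE>);
		close (DIGESTFILE);
	}
	$forum_id = $c_get_user_subscribed_forums->fetchrow();
}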

An outer loop (not shown) opens a cursor to the database and gets a list of users who have subscribed to at least one forum AND who have not suspended their digest delivery AND who have not been marked as having a bounced email. The loop shown above is run for each of these users. It opens a cursor and gets the list of forum_id values they’ve subscribed to. For each forum_id found, it slurps in the pre-generated text or html formatted file based on the user’s preference. Doing it this way means that I only have to query the forum / topic / post tables once! After I’ve generated the temporary files I use them to build the digest email for each individual user.
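
The three conditions on the outer loop could be expressed roughly as follows. This is only a sketch over in-memory records: the real process selects these users with a database cursor, and the field names used here (forums, suspended, bounced) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filter mirroring the outer loop's three conditions.
sub digest_recipients {
    my (@users) = @_;
    my @due;
    for my $user (@users) {
        next unless @{ $user->{forums} };   # subscribed to at least one forum
        next if $user->{suspended};         # delivery suspended by the user
        next if $user->{bounced};           # email marked as bounced
        push @due, $user;
    }
    return @due;
}

my @users = (
    { name => 'alice', forums => [12, 15], suspended => 0, bounced => 0 },
    { name => 'bob',   forums => [],       suspended => 0, bounced => 0 },
    { name => 'carol', forums => [12],     suspended => 1, bounced => 0 },
    { name => 'dave',  forums => [7],      suspended => 0, bounced => 1 },
);
print "$_->{name}\n" for digest_recipients(@users);
```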

Once the body has been built, the appropriate footer (text or html format again) is appended to it, and the email is sent. I have never timed how long it takes the emails to get processed through the mail queue, but the entire digest process from start (building the extract files for each forum) to finish (sending the customized email to each individual user) runs in less than sixty seconds.

My Board

The digest delivery is part of a larger MOD that I call “My Board”. While this MOD has never been released or even mentioned in the MOD forums at phpbb.com it is one of my most requested MODs from clients. It allows a user to show or hide forums on the index based on their preferences. It also allows a user to mark which forums to search or ignore, and it is also how the user marks which forums to subscribe to. I’ll show the user interface and talk a bit more about that MOD in my next post on this subject. 8-)

5 Comments »

  1. Looking forward to hearing about some of the capabilities of your “My Board”.

    > “AND have not been marked as having a bounced email.”
    Is this automated somehow or just a manual process when you get bounces to your email account? I get tons of bounced mail all the time, so much that I ended up implementing an email filter to dump it into a folder in my email program. I mostly ignore them due to quantity unless I see an excessive number of rejects in a given time period. In that case, I have an admin tool that unsubscribes them from all topic notifications, by email address.

    Comment by Everett — July 30, 2008 @ 11:39 am

  2. Hi, Everett, thanks for your interest. No, the process is not automated at this time. I have set up a bogus email address as “nobody-at-mydomain-dot-com” that is used for outbound messages, and most of the time I just let the replies go into the bit bucket and increase the general entropy of the universe. :lol: But some mail handlers reply to the domain account rather than the sender account and those show up in my inbox.

    I have an option on the User Edit screen on the admin panel that lets me suspend the digest email (if the mailbox is full) or mark the account bounced which terminates all emails from the site.

    I have considered trying to make this automatic, but just like banning I don’t like to do anything that has a potential negative impact on users in an automated fashion. I prefer to do a manual review first.

    I’ve already written the next post for this series and it’s scheduled to come out soon. It provides more details about the My Board MOD including the database design and some screen captures showing the user interface.

    Comment by Dave Rathbun — July 30, 2008 @ 12:54 pm

  3. Be careful about just letting emails bounce. I have heard from a reliable source that email giants such as Yahoo, HotMail, and Google Mail will start banning IP addresses which send out too many bad emails which bounce. Essentially, you’ll get tagged as a spammer and then none of your emails will get through. You’ll be lucky to even make it to the spam box. :P

    There are automated processes for detecting a bad email and then blacklisting it. Most people reading this blog will probably know how to implement such a system.

    Comment by Dog Cow — July 30, 2008 @ 2:08 pm

  4. Dog Cow, that’s good advice, thanks. I can certainly research the process but for now it’s unfortunately going to be on the back-burner, I think.

    Comment by Dave Rathbun — July 30, 2008 @ 2:29 pm

  5. Thanks for the info. This may explain why my site IP address was blocked recently by att.net and all their variations. It was annoying but a simple process to reverse.

    ATT might be a little aggressive though as my father recently had the IP address of his internet provider blocked as well and he lives in the middle of nowhere – 44kbps dial-up to a small local provider.

    Comment by Everett — July 30, 2008 @ 7:54 pm
