Home

Your premium source for custom modification services for phpBB

  logo

HomeForumsBlogMOD ManagerFAQSearchRegisterLogin

Comments August 28, 2007

SQL Challenge #1: Frequent Poster Analysis

Filed under: SQL Challenge — Dave Rathbun @ 5:05 am CommentsComments (3) 

I have no idea if this will be interesting or not, but I want to try it for a while and see what happens. I am going to start a series of SQL Challenge posts. The first one is based on a query that I wrote during one of my posts in my “Advertising on Forums” series. Since it was based on a table that is not part of phpBB I have provided a download link for a script that will create a simple version of the table for you to put into your database and play around with, if you are so inclined.

The Setup

Here is the table:

mysql> desc sql_challenge_01;
+-------------+-----------------------+------+-----+---------+-------+
| Field       | Type                  | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+-------+
| total_posts | int(11)               |      |     | 0       |       |
| w1_posts    | mediumint(8) unsigned | YES  |     | NULL    |       |
| w2_posts    | mediumint(8) unsigned | YES  |     | NULL    |       |
| w3_posts    | mediumint(8) unsigned | YES  |     | NULL    |       |
| w4_posts    | mediumint(8) unsigned | YES  |     | NULL    |       |
| w5_posts    | mediumint(8) unsigned | YES  |     | NULL    |       |
| w6_posts    | mediumint(8) unsigned | YES  |     | NULL    |       |
+-------------+-----------------------+------+-----+---------+-------+

My original work table included the user_id but I dropped that from the challenge version of the table since you won’t need to join to anything. Here are some sample rows from the table so you can get a feel for what it looks like:

+-------------+----------+----------+----------+----------+----------+----------+
| total_posts | w1_posts | w2_posts | w3_posts | w4_posts | w5_posts | w6_posts |
+-------------+----------+----------+----------+----------+----------+----------+
|          56 |        3 |        4 |        8 |        9 |       16 |       16 |
|         589 |       39 |       53 |       89 |      106 |      152 |      150 |
|           7 |        1 |        0 |        0 |        0 |        3 |        3 |
|           2 |        0 |        1 |        0 |        0 |        1 |        0 |
|          10 |        0 |        1 |        3 |        3 |        1 |        2 |
|         266 |       36 |       37 |       61 |       43 |       34 |       55 |
|          14 |        1 |        1 |        3 |        0 |        1 |        8 |
|           5 |        0 |        0 |        0 |        0 |        0 |        5 |
|          15 |        2 |        3 |        2 |        1 |        3 |        4 |
|         236 |       58 |       53 |       49 |       31 |       19 |       26 |
|           5 |        0 |        2 |        1 |        0 |        2 |        0 |
|          46 |       27 |        0 |        2 |        9 |        8 |        0 |
|           3 |        2 |        0 |        0 |        0 |        0 |        1 |
|           4 |        2 |        2 |        0 |        0 |        0 |        0 |
+-------------+----------+----------+----------+----------+----------+----------+

The various columns w1_posts, w2_posts, and so on contain the number of posts that a user made during that week. I built this table to help analyze how frequently users were visiting my board. In order to do that, I needed to know how many users had visited and posted in every one of the past six weeks, along with how many had posted in at least five (any of the last five) weeks, and so on.

The Challenge

Your challenge, should you decide to accept it, is to write one SQL script (not php!) that returns a count of how many rows in this table include users that posted in all six weeks, in any five weeks, in any in four weeks, and so on. If you look carefully at the data, any single column could be zero. Checking for rows that include posts in all six weeks is easy. Checking for rows that include posts in five out of the six gets a bit more challenging as any one of the six weeks could be zero. And so on from there.

When I say “in any five” I mean exactly five weeks. Anyone that posted in six weeks also posted in five, but that’s not where I went with this. “Any five” means just that.

Here is the expected output:

+------+----------+
| wk   | count(*) |
+------+----------+
|    1 |     1139 |
|    2 |      407 |
|    3 |      198 |
|    4 |       85 |
|    5 |       64 |
|    6 |       46 |
+------+----------+

So this tells me that 64 rows in the sample data had non-zero values in five out of the six weekly columns. 407 rows had non-zero values in 2 out of the six. And so on. The expected value for “wk” based on the sample rows shown above would be this:

+-------------+----------+----------+----------+----------+----------+----------+------+
| total_posts | w1_posts | w2_posts | w3_posts | w4_posts | w5_posts | w6_posts | wk   |
+-------------+----------+----------+----------+----------+----------+----------+------+
|          56 |        3 |        4 |        8 |        9 |       16 |       16 |    6 |
|         589 |       39 |       53 |       89 |      106 |      152 |      150 |    6 |
|           7 |        1 |        0 |        0 |        0 |        3 |        3 |    3 |
|           2 |        0 |        1 |        0 |        0 |        1 |        0 |    2 |
|          10 |        0 |        1 |        3 |        3 |        1 |        2 |    5 |
|         266 |       36 |       37 |       61 |       43 |       34 |       55 |    6 |
|          14 |        1 |        1 |        3 |        0 |        1 |        8 |    5 |
|           5 |        0 |        0 |        0 |        0 |        0 |        5 |    1 |
|          15 |        2 |        3 |        2 |        1 |        3 |        4 |    6 |
|         236 |       58 |       53 |       49 |       31 |       19 |       26 |    6 |
|           5 |        0 |        2 |        1 |        0 |        2 |        0 |    3 |
|          46 |       27 |        0 |        2 |        9 |        8 |        0 |    4 |
|           3 |        2 |        0 |        0 |        0 |        0 |        1 |    2 |
|           4 |        2 |        2 |        0 |        0 |        0 |        0 |    2 |
+-------------+----------+----------+----------+----------+----------+----------+------+

So, there you go. :) The first SQL Challenge from the phpBB Doctor. What do you win? You win the knowledge that you solved a puzzle. Is that enough for you? :-) It is my hope that this series of posts will be interesting but educational as well. There is often more than one way to solve the same problem, so I personally will be interested to see some of the solutions that are presented. Eventually I will edit the post and include my original answer and optionally the best answers (judged by me) submitted to the challenge as well.

The Material

This is a zip file that contains a SQL script that will create and populate the table used in this challenge.

sql_challenge_01.zip

Some of the future challenges will use the standard phpBB tables instead of requiring you to download and install something. I used this for the first challenge because I thought it presented an interesting challenge. 8)

3 Comments

  1. Cracking idea!! Always up for a challenge. Although just heading away with work for a week or so, so not that much time free this week :(

    Comment by Esmond Poynton — August 28, 2007 @ 1:42 pm

  2. Esmond, my wife took one look at your comment and said:

    He’s not from the US, is he…

    I think it was the “cracking idea” that gave her a clue… :lol:

    Comment by Dave Rathbun — August 29, 2007 @ 2:22 am

  3. Right finally downloaed it and about to give it a go.

    Comment by Esmond Poynton — September 6, 2007 @ 9:45 am

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress