I recently rewrote my banner management system for one of my boards. The board is fairly active (in fact we’re averaging over 100,000 page views daily now) so with multiple page views per second taking place during the busiest times of the day it would make sense to be concerned about performance. And I was. But I also had to be concerned about auditing my banner and page statistics, and ensuring that if I said a banner was going to be displayed every 10 page views that it was. So the system had to be 100% efficient and just as accurate. That presented some challenges.
I needed a way to set up a sequence of banners and then ensure that nobody ever got skipped. If I said that each banner was going to be displayed an equal number of times per day, then I had better be sure that I delivered that. Random number generators are historically not very random and therefore I could not use that approach. Having a “next banner to be displayed” record in a database would work, but the lock for update / update / select / release lock process might result in a deadlock situation, and certainly would not perform very well. I decided I needed to do some research and figure out the best way to manage my banner rotation.
What I found was really interesting, and has been working amazingly well for several months now. Unfortunately the technique is unique to MySQL so I can’t plan on releasing this as a MOD anytime soon.
Random Numbers versus Incrementing Page View Counter
I had decided that I did not want to use any sort of random number generator for showing banners. My preliminary testing showed that it was quite frankly a horrible solution. There was always one banner that was displayed a lot more than any of the others. Perhaps over time it would have become more even, but after 100 trials and 10 banners it seemed that there was always one banner in the 20+ range when they should have all been (if equally weighted) around 10. That was unacceptable and I quickly eliminated this from consideration.
Next I decided to leverage a sequential number that I already had. In page_tail.php I had a very simple query that was responsible for updating a page view counter. By using the mod() function on the newly assigned page counter number I could rely on a continuing series of numbers that would never skip values.
Or could I?
On a very busy board there might be multiple updates occuring nearly at the same time. If I were to use this technique, as shown by these two simple SQL statements:
UPDATE phpbb_page_table SET page_counter = page_counter + 1; SELECT page_counter FROM phpbb_page_table;
It is entirely possible (and fact even quite likely) that someone else could have updated the page counter value before I selected my result row. That meant that even if the code looked like what I typed above the reality is that it could get executed like this:
UPDATE phpbb_page_table SET page_counter = page_counter + 1; UPDATE phpbb_page_table SET page_counter = page_counter + 1; UPDATE phpbb_page_table SET page_counter = page_counter + 1; SELECT page_counter FROM phpbb_page_table; UPDATE phpbb_page_table SET page_counter = page_counter + 1; SELECT page_counter FROM phpbb_page_table; SELECT page_counter FROM phpbb_page_table; SELECT page_counter FROM phpbb_page_table;
Note that there are still four “update” statements and four “select” statements, but they’re not guaranteed to execute in the proper order since each request is coming from a different page. Of course each page only sees one SELECT and one UPDATE statement but there is no way to ensure that someone else’s page isn’t running the same queries at essentially the exact same time that I am. The queries are sent to the database from all pages at the same time. Any page logic that requires an absolutely isolated query execution sequence is going to potentially fail in a shared user environment. It might not fail every time, but it could potentially fail. That’s a problem.
What is the result of having the queries out of order in the above example? Banner number 1 and 2 get skipped because the update runs with no select. Banner number 3 is displayed once, and banner number 4 is displayed three times because there are three selects following a single update. The net result is that I am going to potentially see the same issue here that I saw with my random number system… banners were not going to be guaranteed to come out with an even distribution of page views.
Select … For Update
To avoid having multiple updates run before I can get my selected value returned I could consider using a lock statement of some kind. That way when the select statement is issued it is guaranteed that nobody else has updated that row. After the select is performed then the lock would be released. Here’s how this would have worked out if I had wanted to try this solution.
MySQL provides the “Select FOR UPDATE” syntax. This will use page or row locking (whatever is available based on the engine in use) instead of locking the entire table. In the scenario shown in the prior section I ran the update and then selected the resulting value. This time the logic will be reversed, as shown here:
SELECT page_counter FROM phpbb_page_table FOR UPDATE; UPDATE phpbb_page_table SET page_counter = page_counter + 1;
The “FOR UPDATE” clause is used to prevent anyone else from updating the row until I get my value. Once I get the value, I increment the counter for the next person. The lock has to be an exclusive lock otherwise two pages would end up competing for the update.
I am really hesitant to issue application locks for these and other reasons. Obviously it’s fine if the database issues locks as a result of commands that I execute, but having specific code in my php files to issues locks? Not a good idea. To be honest, I didn’t even try to prototype or test this solution because it would have been horrible. What if a page dies in the middle of the lock and never releases it?
To summarize so far:
- Random numbers are efficient but not guaranteed accurate
- Sequential counters are efficient but not guaranteed accurate
- Sequential counters with locks are accurate but not efficient and could lead to a deadlock situation
It turns out that while I was researching lock statements on mysql.com I found this buried at the very bottom of the page:
The preceding description is merely an example of how SELECT … FOR UPDATE works. In MySQL, the specific task of generating a unique identifier actually can be accomplished using only a single access to the table:UPDATE child_codes SET counter_field = LAST_INSERT_ID(counter_field + 1); SELECT LAST_INSERT_ID();
The SELECT statement merely retrieves the identifier information (specific to the current connection). It does not access any table.
Hm. That sounds very interesting. In fact, it’s exactly the solution that I needed for my banner system to be both 100% accurate and efficient at the same time. I’ll talk more about how the LAST_INSERT_ID() function works in my next post.