
Tracking peak users online with Redis

This probably isn't the best way to track things, but it's the best I came up with. Tracking the total number of users that log in per day or month is trivial, as I discussed in a previous post. Tracking the peak number of users online at a given time requires a bit more thought.

The major issue for me is that I don't necessarily want to be slamming Redis on every page view, so I end up throttling the writes to once every couple of minutes. Delay aside, I have come up with two ways to track the peak number of users. One method results in more data being stored while the other requires a small bit of computation. YMMV, and you should use whichever method makes the most sense at your current scale.
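For what it's worth, the throttling part can be sketched in a few lines of Python. This is only an illustration, not the code running on my site: the WriteThrottle name, the 120 second default and the injectable clock are all made up for the example, and the throttle is per process, so with several web workers each one would still write once per interval.

```python
import time


class WriteThrottle:
    """Allow an action at most once per `interval` seconds (per process).

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """

    def __init__(self, interval=120.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock   # injectable so the behaviour is easy to test
        self._last = None    # clock reading of the last allowed write

    def ready(self):
        """Return True (and start a new interval) if a write is due."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

On each page view you would call ready() and only talk to Redis when it returns True.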

The first method is to write to a sorted set periodically with the number of users online. You can then pull the peak number of users by grabbing the member with the largest score:

ZADD :key :score :member

:key is your key (I'd name it something like ns:users:online:peak:20151224). :score is how many users were online at that moment and :member would be the current timestamp (CCYYMMDDHHMMSS).

To grab the highest value for the day (or whatever period you're using) you would run:

ZREVRANGE :key 0 0 WITHSCORES

This returns the :member and :score for the member with the highest score. You could break this up by hour, day, week, year, whatever. You could even write to multiple keys at once to create something a bit more like a time series.
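Here is a rough Python sketch of the sorted set method. It assumes `client` behaves like a redis-py client (zadd taking a {member: score} mapping, zrevrange accepting withscores=True); the function names and the "ns" namespace are placeholders for the example.

```python
from datetime import datetime, timezone


def record_online(client, users_online, now=None, namespace="ns"):
    """Store the current user count in the day's sorted set.

    `client` is assumed to be a redis-py style client; names are illustrative.
    """
    now = now or datetime.now(timezone.utc)
    key = f"{namespace}:users:online:peak:{now:%Y%m%d}"
    member = now.strftime("%Y%m%d%H%M%S")      # CCYYMMDDHHMMSS timestamp
    client.zadd(key, {member: users_online})   # ZADD :key :score :member
    return key


def peak_for(client, key):
    """Return (member, score) for the highest score, or None if the key is empty."""
    rows = client.zrevrange(key, 0, 0, withscores=True)  # ZREVRANGE :key 0 0 WITHSCORES
    if not rows:
        return None
    member, score = rows[0]   # member is bytes unless the client decodes responses
    return member, int(score)
```

If two samples tie for the highest score, Redis orders them by member, so you get the later timestamp back.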

The next method, and the one I am currently using on one of my sites, is to keep a single value per period, check whether the new value is larger and, if so, overwrite it. I took this approach because I wanted to store the values in a hash to avoid key bloat.

As mentioned, this method requires a bit more logic, as you have to retrieve the stored value, compare it to the current value and, if the current value is greater, store that. Because the read, the comparison and the write are separate steps, two processes can interleave and a smaller value can overwrite a larger one, so there is a small margin of error. Like I said, though, it reduces some of the key bloat and, in my opinion, makes it a ton easier to pull multiple days' peak values without pipelining.

First, you need to get the stored value:

HGET :key :period

As before, :period can be whatever period you want it to be. I tend to use the current date, as I like having peak user data broken out by day. The :key can be something like ns:users:online:peak.

Once you have this value, you'll need to compare it to the current value (this is a code-agnostic post, I trust that you know how to compare variables ;). If nothing has been stored for the period yet, HGET returns nil, which you can treat as zero. If the current value is larger than the value you retrieved, you will need to write the current value back out:

HSET :key :period :users

Where :users is the current number of users online.
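Since I said I trust you to compare variables, take this as optional: a minimal Python sketch of the get, compare and set steps, plus reading several days back with a single HMGET. It assumes `client` behaves like a redis-py client (hget, hset, hmget); the function names are invented for the example. The read and the write are separate commands here, so it is not atomic. If that matters to you, a WATCH/MULTI transaction or a small Lua script would be the usual ways to close the gap.

```python
def update_daily_peak(client, period, users_online, key="ns:users:online:peak"):
    """Overwrite the stored peak for `period` only if the new count is larger.

    `client` is assumed to be a redis-py style client; names are illustrative.
    Not atomic: another process could write between the HGET and the HSET.
    """
    stored = client.hget(key, period)                    # HGET :key :period
    previous = int(stored) if stored is not None else 0  # nothing stored yet -> 0
    if users_online > previous:
        client.hset(key, period, users_online)           # HSET :key :period :users
        return users_online
    return previous


def peaks_for(client, periods, key="ns:users:online:peak"):
    """Fetch the peaks for several periods in one round trip (HMGET)."""
    values = client.hmget(key, periods)
    return {p: (int(v) if v is not None else None) for p, v in zip(periods, values)}
```

For example, update_daily_peak(client, "20151224", 75) stores 75 if the hash currently holds anything lower for that day, and peaks_for(client, ["20151223", "20151224"]) pulls both days at once.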

That's pretty much it. Both methods allow you to track the peak number of users per day. If so desired, you could run either one on every single page load, but I find that doing so introduces quite a bit of unnecessary overhead.

Assuming you don't prune the keys for the first method, you could end up with a better picture of your site's daily usage whereas the second method only shows you the peak.

As mentioned, you have to choose the method that best fits your use case. If you've come up with a better method of tracking, I'd love to hear about it. Hit me up in the comments below!

Oh! and by the way, Merry Christmas :)