Recently I was building a piece of functionality to track the daily and monthly active users on one of my sites. I already track this data in MySQL but retrieving the data was sluggish, even after creating a few new indexes, so I decided that I would use Redis. The question then became, which data type would I use to track this?
The data itself is just a list of the unique user IDs that have accessed the site on a given day or month. I could use a sorted set and use the timestamp as the score, but I don’t need the score or the ability to sort the data, I just need the count.
I did some research and came across this package to track user actions quickly
in Redis with Node.js. I was implementing this in PHP but still took a
gander at the code. To my surprise, they were using bitmaps in Redis to track
SETBIT to track and then
GET and some extra logic to count the
number of active users. I’m unsure why they didn’t use
BITCOUNT to found the
number of set bits, but then again, I have never worked with bitmaps in Redis
so I set out to learn more.
After digging around the wonderful Redis documentation, I came to the conclusion that bitmaps are great and all, but based on time complexity, sets (not sorted sets) would be better for my particular use case.
In the following examples I’ll be tracking the monthly active users for
September 2015 and
:UID represents the unique ID for the user.
SETBIT MAU:BITMAP:201509 :UID 1
The argument of
1 tells it to flip the bit on. The time complexity for this
SADD MAU:SET:201509 :UID
The time complexity is documented as O(N) but N represents the number of items being added, not the number of items in the set, so it would also be O(1).
This would count every bit that’s set to on in the bitmap. The time complexity is O(N). Unfortunately, the documentation doesn’t clearly explain what N represents. Feel free to comment on this if you know, but from what I gathered based on the performance considerations with larger bitmaps and this command, the N represents the total number of bits set in the bitmap.
It’s documented that with smaller bitmaps the
BITCOUNT command “is still as
fast as any other O(1) Redis command”. Thus, it gets slower as the bitmap grows
Time complexity on this one is O(1) regardless of the size of the set.
Based on the vagueness of the documentation regarding
complexity, sets are the clear winner for me in this particular use case.
One thing I didn’t factor into this is the memory footprint for each of the types. I suspect that bitmaps would be smaller then sets. Do you want a smaller footprint or faster retrieval?
The other consideration is how large your user IDs are. Bitmaps only support bit offsets up to 2^32 (4,294,967,296). More than most of us will ever need but if you thought it would be cute to start your user IDs at 5 billion or something you’re pretty much fucked.
I’m planning to put together an active user PHP package similar to the Node.js one I found, be sure to follow me on GitHub if you’re interested.