Recently I was building a piece of functionality to track the daily and monthly active users on one of my sites. I already track this data in MySQL but retrieving the data was sluggish, even after creating a few new indexes, so I decided that I would use Redis. The question then became, which data type would I use to track this?

The data itself is just a list of the unique user IDs that have accessed the site on a given day or month. I could use a sorted set and use the timestamp as the score, but I don’t need the score or the ability to sort the data, I just need the count.

I did some research and came across this package to track user actions quickly in Redis with Node.js. I was implementing this in PHP but still took a gander at the code. To my surprise, they were using bitmaps in Redis to track the users. SETBIT to track and then GET and some extra logic to count the number of active users. I’m unsure why they didn’t use BITCOUNT to found the number of set bits, but then again, I have never worked with bitmaps in Redis so I set out to learn more.

After digging around the wonderful Redis documentation, I came to the conclusion that bitmaps are great and all, but based on time complexity, sets (not sorted sets) would be better for my particular use case.

In the following examples I’ll be tracking the monthly active users for September 2015 and :UID represents the unique ID for the user.

Tracking

Bitmap

SETBIT MAU:BITMAP:201509 :UID 1

The argument of 1 tells it to flip the bit on. The time complexity for this is O(1).

Set

SADD MAU:SET:201509 :UID

The time complexity is documented as O(N) but N represents the number of items being added, not the number of items in the set, so it would also be O(1).

Retrieving

Bitmap

BITCOUNT MAU:BITMAP:201509

This would count every bit that’s set to on in the bitmap. The time complexity is O(N). Unfortunately, the documentation doesn’t clearly explain what N represents. Feel free to comment on this if you know, but from what I gathered based on the performance considerations with larger bitmaps and this command, the N represents the total number of bits set in the bitmap.

It’s documented that with smaller bitmaps the BITCOUNT command “is still as fast as any other O(1) Redis command”. Thus, it gets slower as the bitmap grows in size.

Set

SCARD MAU:SET:2015

Time complexity on this one is O(1) regardless of the size of the set.

Conclusion

Based on the vagueness of the documentation regarding BITCOUNT’s time complexity, sets are the clear winner for me in this particular use case.

One thing I didn’t factor into this is the memory footprint for each of the types. I suspect that bitmaps would be smaller then sets. Do you want a smaller footprint or faster retrieval?

The other consideration is how large your user IDs are. Bitmaps only support bit offsets up to 2^32 (4,294,967,296). More than most of us will ever need but if you thought it would be cute to start your user IDs at 5 billion or something you’re pretty much fucked.

I’m planning to put together an active user PHP package similar to the Node.js one I found, be sure to follow me on GitHub if you’re interested.