Bitmaps vs. Sets to track Monthly Active Users in Redis

Recently I was building a piece of functionality to track the daily and monthly
active users on one of my sites. I already track this data in MySQL but
retrieving the data was sluggish, even after creating a few new indexes, so I
decided that I would use Redis. The question then became, which data type would
I use to track this?

The data itself is just a list of the unique user IDs that have accessed the
site on a given day or month. I could use a sorted set and use the timestamp as
the score, but I don’t need the score or the ability to sort the data, I just
need the count.

I did some research and came across this package to track user actions quickly
in Redis with Node.js
. I was implementing this in PHP but still took a
gander at the code. To my surprise, they were using bitmaps in Redis to track
the users. SETBIT to track and then GET and some extra logic to count the
number of active users. I’m unsure why they didn’t use BITCOUNT to found the
number of set bits, but then again, I have never worked with bitmaps in Redis
so I set out to learn more.

After digging around the wonderful Redis documentation, I came to the
conclusion that bitmaps are great and all, but based on time complexity, sets
(not sorted sets) would be better for my particular use case.

In the following examples I’ll be tracking the monthly active users for
September 2015 and :UID represents the unique ID for the user.

Tracking

Bitmap

SETBIT MAU:BITMAP:201509 :UID 1

The argument of 1 tells it to flip the bit on. The time complexity for this
is O(1).

Set

SADD MAU:SET:201509 :UID

The time complexity is documented as O(N) but N represents the number of items
being added, not the number of items in the set, so it would also be O(1).

Retrieving

Bitmap

BITCOUNT MAU:BITMAP:201509

This would count every bit that’s set to on in the bitmap. The time complexity
is O(N). Unfortunately, the documentation doesn’t clearly explain what N
represents. Feel free to comment on this if you know, but from what I gathered
based on the performance considerations with larger bitmaps and this command,
the N represents the total number of bits set in the bitmap.

It’s documented that with smaller bitmaps the BITCOUNT command “is still as
fast as any other O(1) Redis command”. Thus, it gets slower as the bitmap grows
in size.

Set

SCARD MAU:SET:2015

Time complexity on this one is O(1) regardless of the size of the set.

Conclusion

Based on the vagueness of the documentation regarding BITCOUNT’s time
complexity, sets are the clear winner for me in this particular use case.

One thing I didn’t factor into this is the memory footprint for each of the
types. I suspect that bitmaps would be smaller then sets. Do you want a smaller
footprint or faster retrieval?

The other consideration is how large your user IDs are. Bitmaps only support
bit offsets up to 2^32 (4,294,967,296). More than most of us will ever need but
if you thought it would be cute to start your user IDs at 5 billion or
something you’re pretty much fucked.

I’m planning to put together an active user PHP package similar to the Node.js
one I found, be sure to follow me on GitHub if you’re interested.

Josh Sherman - The Man, The Myth, The Avatar

About Josh

Husband. Father. Pug dad. Musician. Founder of Holiday API, Head of Engineering and Emoji Specialist at Mailshake, and author of the best damn Lorem Ipsum Library for PHP.


If you found this article helpful, please consider buying me a coffee.