Real-time Web Analytics with GoAccess
My mission to gain more technical independence has me questioning any and all third-party services I use. While I do value having visibility into my websites' traffic, relying on Google for anything has been top of mind during this process.
For my personal projects, Google Analytics has always felt like overkill. Products that compete directly with Google Analytics, especially those that need me to set up a database, also felt pretty heavy.
Also, I'm lazy, and didn't feel like adding another server to the fleet.
Logfile-based analytics
Logfile-based analytics software uses your existing web server logs; in my case, the logs produced by nginx. Those logfiles are analyzed, and pretty reports and/or webpages are generated from them.
This tends to be more than sufficient to get some insight into your traffic.
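For reference, a line in nginx's default combined log format looks roughly like this (values made up):

203.0.113.7 - - [12/Mar/2024:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5123 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"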
Showing my age a bit, back in the day you'd slap AWStats on your server and call it a day. Fast-forward a quarter century: I definitely considered it, but I also wanted to explore any new options that have popped up during that time.
GoAccess seemed like a good choice
I'm not going to act like I did a ton of research on this one. I did a few web searches and GoAccess came up a bunch. I asked a few of my buddies what they are doing for web analytics, and GoAccess came up again.
Always a skeptic, I did a bit of direct research on GoAccess, and here's what I came up with: it's fast and it doesn't look old.
Good enough for me... LFG!
Back at it again with the Unix philosophy
GoAccess seemed like a great choice, until it didn't. GoAccess is, first and foremost, a command-line application for logfile analysis. There is a real-time aspect, but that would require me to log into the server and launch the app.
Fortunately, you can generate a beautiful static site with GoAccess. Since it sticks to the Unix philosophy of doing one thing well, I'd need to get creative to achieve my goal of having real-time analytics available in my web browser.
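To give a rough idea of the two modes (log path is just an example): the first command opens the interactive terminal dashboard, the second writes a one-off HTML report.

goaccess /var/log/nginx/example.com.access.log --log-format=COMBINED
goaccess /var/log/nginx/example.com.access.log --log-format=COMBINED -o report.html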
High-level overview
Rather than jumping right in and throwing code at the problem, I did some research to figure out whether what I was trying to accomplish was feasible. The report generation needed to meet a few requirements:
- Runs for every site on the server
- Runs on a frequent, near real-time schedule
- Runs against logfiles and compressed logfiles
The last one I wasn't sure about. Digging into it unlocked the whole world of z* tools, which had somehow eluded me my entire career.
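As a quick sanity check, zcat -f happily concatenates plain and gzip-rotated logs into one stream (paths are just an example):

zcat -f /var/log/nginx/example.com.access.log /var/log/nginx/example.com.access.log.* | wc -l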
Everything I had outlined was doable, and hosting was easy enough. I took the path of least resistance: nginx basic authentication and autoindex for serving the reports, cron for scheduling every 15 minutes, and each site's analytics tossed into an appropriately named directory.
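The cron entry itself is nothing fancy. Assuming the script below is saved somewhere like /usr/local/bin/generate-stats.sh (name and location are up to you) and made executable, a line like this in root's crontab runs it every 15 minutes:

*/15 * * * * /usr/local/bin/generate-stats.sh >/dev/null 2>&1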
Web analytics generation script
You could certainly wire this up with systemd, but I opted for a simple cron job. The script itself isn't very sophisticated either: loop through the files in sites-enabled and feed the available logfiles into goaccess.
The final touch is to bump the modification time on each site's directory so the nginx autoindex shows when its report was last generated.
#!/bin/bash
# Generate a GoAccess report for every site defined in sites-enabled.
for FILE in /etc/nginx/sites-enabled/*; do
    SITE=$(basename "$FILE")
    echo "Site: $SITE"

    # Each site gets its own directory under the stats vhost.
    mkdir -p "/var/www/stats.example.com/$SITE"

    # zcat -f reads both plain and gzip-rotated logfiles,
    # then pipes everything into goaccess via stdin.
    zcat -f "/var/log/nginx/$SITE.access.log" \
        /var/log/nginx/$SITE.access.log.* \
        "/var/log/nginx/$SITE.error.log" \
        /var/log/nginx/$SITE.error.log.* \
        | goaccess - \
            -o "/var/www/stats.example.com/$SITE/index.html" \
            --log-format=COMBINED

    # Bump the directory's mtime so the autoindex listing reflects the last run.
    touch "/var/www/stats.example.com/$SITE"
done
Nginx site configuration
There's quite a bit going on here, but the important pieces are:
- Making sure you have authentication if you want your stats to be private
- Turning on autoindex if you want the index page to list multiple sites
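If you don't already have a password file, the htpasswd utility from apache2-utils (httpd-tools on RHEL-flavored distros) can create the one referenced in the config below:

htpasswd -c /etc/nginx/stats.htpasswd youruser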
# Redirect all plain HTTP traffic to HTTPS.
server {
    listen 80;
    server_name stats.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name stats.example.com;

    root /var/www/stats.example.com;
    index index.html;

    access_log /var/log/nginx/stats.example.com.access.log;
    error_log /var/log/nginx/stats.example.com.error.log warn;

    ssl_certificate /etc/letsencrypt/live/stats.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/stats.example.com/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

    location / {
        # Keep the stats behind basic auth.
        auth_basic "Admin Area";
        auth_basic_user_file /etc/nginx/stats.htpasswd;

        # List the per-site report directories at the top level.
        autoindex on;
        try_files $uri $uri/ =404;
    }
}
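With the config in place, a quick syntax check and reload (assuming a systemd-based distro) picks it up:

nginx -t && systemctl reload nginx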
Conclusion
You could argue that there's elegance in the simplicity. You could also argue that this isn't a suitable replacement for Google Analytics. This solution is sufficient for my needs, and it didn't eat up a lot of time to implement.
If my sites' logfiles ever take more than 15 minutes to analyze, I'll probably revisit things. Or I'll just dial the cron job back to every 30-60 minutes and call it a day.