Ditching GitHub
I'm late to the party. Better late than never, right?
I got a bug up my ass recently and have started to move away from services in favor of self-hosting. This site is back home, being hosted on a VPS with Linode. My code has mostly been moved out of the evil clutches of Microsoft, with a small number of exceptions that I plan to remedy in the near future.
Taking back my technical independence will be its own dedicated topic for another day. Until then, I can tell you that it has been an extremely fulfilling process thus far. Learning and relearning things has been mostly fun and occasionally frustrating.
Putting in the reps has felt damn good.
Historical Motivation
I can't remember exactly what my younger "1337 haX0r" self would have said about Microsoft, but it probably would have included "fuck" and/or "sux".
I've grown up a smidgen since then, but I still reference them as "Micro$oft". I do use TypeScript, although I've grown to wonder how necessary it is on new projects. I can't shake the feeling that Microsoft caring about open source is nothing more than a ruse.
Their purchase of GitHub was probably when I should have jumped ship. Sadly, my own laziness got the better of me, with promises of code being archived in perpetuity always lingering in the back of my head.
"Forever" is a long time, long enough that nobody can guarantee shit.
Copiloted Motivation
Fast-forward a few years after the purchase of the world's largest collection of open and closed source software. Copilot is released, allegedly trained on publicly available source code. Licensing drama ensues.
While I've been doing a lot of leaning into agentic coding, I'm also skeptical that private code isn't leaking into these models. There's always a lot of gray area when it comes to new things.
A lack of transparency led the Software Freedom Conservancy to begin calling on FOSS developers to Give Up GitHub! after the general public release of Copilot.
While my own desire to own more of my stack factored in heavily, there's a lot of strong evidence out there that Microsoft is still evil, and yadda yadda, insert some quote from RMS or something.
Putting in the Reps
I've run CVS and Subversion servers before. Admittedly, and maybe it's just my memory failing, I don't ever remember setting up git on a server and hosting it myself. GitHub was one of my first forays into using git and ended up becoming a bit of a synonym for git in my head.
I'm honestly a bit embarrassed by this. I love computers, and more importantly, I love knowing how things work.
Typically I "outsource" to a service because I'd like to save the time it would take in maintaining the thing myself. Too often these days, it's easier to just stitch together services, often "free", rather than learn how the underpinnings work.
I put free in quotes, because there's no such thing as a free lunch. If you're not paying for something, then you're most likely the product. Also, I didn't want to bother trying to figure out how to do a parenthetical reference in markdown.
Self-hosting Git
I did it, I set up a git server: Debian 12 with keyed SSH access to the git repos. It didn't take much to get things moving.
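The core of it is just a locked-down git user and a pile of bare repositories. A rough sketch of the moving parts, assuming Debian's adduser and default paths (the user name, key, and repo name are placeholders):

```sh
# Dedicated user whose login shell is git-shell, so SSH sessions
# can only run git commands, never a full interactive shell.
sudo adduser --disabled-password --gecos "" --shell /usr/bin/git-shell git

# Keyed access: drop the public key into authorized_keys.
sudo mkdir -p /home/git/.ssh
sudo cp id_ed25519.pub /home/git/.ssh/authorized_keys
sudo chown -R git:git /home/git/.ssh
sudo chmod 700 /home/git/.ssh
sudo chmod 600 /home/git/.ssh/authorized_keys

# A bare repository to push to.
sudo -u git git init --bare /home/git/my-project.git
```

From there, `git clone git@example.com:my-project.git` works like you'd expect, with example.com standing in for my actual box.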
Initially I was eyeing cgit as a lightweight web interface to my public repos. There's something charming about it, and it's good enough for the Linux kernel.
Unfortunately, software forges provide some features that cgit doesn't, and that I do enjoy. Pull / merge requests are something I use, even as a solo developer on a project. Also, I do like me a good CI/CD pipeline, so having that all stitched together would be nice.
Beyond Coding, We Forge
I did some research on the git forge software out there, and I landed on Forgejo as my pick. That was after I had run the gamut of setting up various other forges, including GitLab, which was way too resource-intensive for my needs.
It was a fun process, even if it took a bit of time to get my sea legs. Once the forge was established, I built out a Forgejo Runner server to run Forgejo Actions. There's something truly magical about not having to wait in line to run your build pipeline.
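If you've used GitHub Actions, Forgejo Actions will feel familiar: workflows live under .forgejo/workflows and the syntax is largely compatible. A minimal sketch of the kind of pipeline I mean (the job, label, and build command here are made up for illustration):

```yaml
# .forgejo/workflows/ci.yaml -- illustrative, not my actual pipeline
name: ci
on: [push]

jobs:
  test:
    # "docker" must match a label your Forgejo Runner registered with.
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - name: Run the test suite
        run: make test
```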
CONTRIBUTING.md
Using a third-party git forge does have its advantages, some of which I'm completely giving up on at this time. One such advantage is ease of contributing for others.
Sad truth is, most of my projects have only ever gotten feature requests. It's like the pull request button was broken or something.
While I'm a huge proponent of open source, my sentiments from 2012 about social coding platforms still ring true:
I'm really starting to think that open source is just a code flea market for folks that believe copy and paste is a design pattern.
I get it, my code's so fucking perfect that I don't need any help from y'all.
In all seriousness, nature finds a way, so I'm sure if the need arises, somebody will reach out. Maybe I'll move the code to a more open platform, like Codeberg, or I'll just ask the person to send me a git diff and call it a day.
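That workflow predates every forge and still works fine. Roughly (branch and file names are hypothetical):

```sh
# Contributor: bundle commits from a topic branch into a patch file.
git format-patch origin/main --stdout > fix-the-thing.patch

# Maintainer: apply it with authorship and commit messages intact.
git am < fix-the-thing.patch
```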
Work In Progress
I've drawn the line in the sand, but I'm still pragmatic. I do have some public projects I've yet to move, as they are part of my side hustle. I want to make sure I go about it in the right way. Most likely I'll move those repos to another service. We'll see though.
I'm also making hard and fast decisions about my personal projects, and not necessarily trying to crusade in the workplace. You can have strong convictions, but trying to sway stakeholders on some ideological decision is a long game.
Growing Pains
Whenever you start working out, you discover "muscles you never knew you had". The same is true when you rely on a service for over a decade, then decide to roll the dice on your own thing.
The first bit of pain I encountered was trying to update some URLs over on Packagist. Using their update functionality to point to my new server flat out failed. After some research, it seemed related to using Cloudflare's protection.
Keep in mind, I was able to do the normal git stuff, like cloning and pushing, without any issue. Seemed like maybe Packagist was hitting a bot challenge or something, and couldn't fetch the metadata.
I've been wondering if Cloudflare was going to remain in my life as I continue to seize my technical independence. For the moment, it's sticking around, but I thought that gray clouding my git box would be a good test.
Packagist was happy, and at first, all was well...
Attack of the AI Botnet
Then a day or so later, I noticed my git server was using more CPU. Initially I thought it was due to me moving over my most popular project, the work of art known as php-loremipsum.
Packagist actually made me jump through hoops to change the URL for that one, since it's just that damn popular.
Insert sound of deflating ego
No dice, it was fucking robots. Not just any robots, but robots from AI companies. Not just aggregating the latest code, but traversing every single commit.
My poor server had approached 50% CPU, and disk space was being eaten up as archives were being generated at a record clip.
The biggest offender seemed to be ClaudeBot.
I'm Under Attack
There's two sides to the situation:
- Crap, now I need to waste my Saturday dealing with this
- Holy fucking shit! I wasn't aware that it was this bad when I was hosting on GitHub
Fortunately it wasn't hard to mitigate the situation once I had identified the cause. I went with the combo loco of Cloudflare's AI bot blocking functionality as well as some Nginx rules.
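The Nginx side boils down to matching user agents and turning them away before anything expensive happens. The gist of it, not my exact rules (the bot list here is illustrative, not exhaustive):

```nginx
# In the http context: flag known AI crawler user agents.
map $http_user_agent $ai_bot {
    default         0;
    ~*ClaudeBot     1;
    ~*GPTBot        1;
    ~*CCBot         1;
    ~*Bytespider    1;
}

# In the relevant server block: reject them outright.
if ($ai_bot) {
    return 403;
}
```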
I'm also exploring adding Anubis to the mix.
It may seem like overkill to have so many layers of protection, and you're right to call out that I didn't even bother mentioning robots.txt.
Thing is, it's already been documented that these AI companies aren't all playing by the established rules, so attacking from as many sides as possible isn't overkill, it's a necessity.
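For completeness, here's roughly what that polite request looks like. Again, an illustrative list, since the actual one keeps growing:

```txt
# robots.txt -- a polite request, nothing more
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```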
Conclusion
It's been a fun week or so, building out servers, testing out software, doing the work and putting in the reps.
It's also been eye opening.
While I'm embracing these so-called AI tools, it's hard not to question the motives and intentions of these companies as they so willingly cripple a server with reckless abandon.
Technical independence is important, but not everybody knows they should have a robots.txt set up to block this sort of malicious scraping. More so, the landscape is changing daily, so even if every company actually respected these rules, there's always another bot that needs to be added to the list.
I have my longstanding hatred of Microsoft, but they aren't the only evil empire in town. Billion-dollar valuations are being tossed around on the shoulders of the altruistic open source community.
If you're hosting on GitHub, or any other closed platform, you have no idea what they are doing with your information.
Fortunately, we still live in a world with a lot of options. Choose wisely, my friends.