I hate foreign keys. They are rarely implemented properly, often times failing to implement cascading deletes and updates. Sure, the concept is sound, and relationships are the first letter in RDBMS, I personally avoid using foreign keys as I find the potential for orphaned data to be way more manageable.
While I don’t use foreign keys, that doesn’t mean I never have to work with them. Recently, I ran into an issue with some poorly optimized tables with foreign keys. Lacking decent indexing, the query that would clean up the older rows on the tables started to fail.
Without decent monitoring on the table, it grew to over 50 million rows. At that size, other queries started to break, reaching a point that an alarm finally sounded to let us know.
Cool, I’ll just clean up the data manually.
Famous last words.
I was able to delete a decent chunk of data, but eventually I started to run into foreign key constraint errors. This was exacerbated by the fact that one of the tables had a self referential foreign key, which led to multiple levels of nesting.
As per usual, I tried getting as creative as possible. I fiddled with sorting the data. Changed my approach by trying to delete the deepest level of nesting. I even tried to delete the single rows that were causing me grief before trying to delete larger batches again.
My creativity wasn’t quite enough to get things to a healthier spot.
Can’t we just skip those rows?
Since I try to avoid foreign keys, I can honestly admit that I don’t know as much as I could about them. One trick that has somehow has eluded me over my 25 year career is using the IGNORE
modifier when deleting.
The modifier tells MySQL to not worry about errors that are considered ignorable. When deleting, any rows that would throw an error are simply retained, while the unafflicted are deleted like normal.
The query with the additional modifier looks something like this:
-- Adjust the table name and conditions to suit your needs
DELETE IGNORE FROM table_name WHERE some_col <= ?;
SQLUpon completion, you’ll be advised of how many rows were deleted and how many warnings were encountered.
This may not be an ideal solution for some situations, but for what I was attempting to clean up, it was just fine. Given the nested nature of the self referential table, I was able to run the queries multiple times, eventually removing all of the data that I needed to.