How to loop through all files in an S3 bucket in PHP

Recently I decided to finally tackle the ever-growing S3 bucket for the niche
social network I run.

It’s ever-growing because I never implemented any sort of hard
deletion logic. At one point, I was planning to move the images over to
soft deletions, but never did. The soft deletions were going to help with the
infamous “so I accidentally deleted my account and…” requests.

At this point, the site’s pretty much on life support as it dies a fairly slow
death, and the S3 bill just doesn’t go down, so it was time to clean things up.

The structure of the bucket is a series of directories or shards, each
containing multiple directories named after the user’s ID. Inside of those user
ID directories, there are images that are named UNIXTIMESTAMP.{jpg,gif} as
well as background.{jpg,gif}.
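To make that layout concrete, here is a minimal sketch of how a key could be broken into its parts. The function name `parseImageKey`, the exact `shard/userId/filename` pattern, and the assumption that user IDs are numeric are all mine for illustration, so adjust the pattern to whatever your keys really look like:

```php
<?php
// Hypothetical helper: splits a key like "a1/12345/1500000000.jpg" into its parts.
// The "shard/userId/filename" layout and numeric user IDs are assumptions.
function parseImageKey(string $key): ?array
{
    $pattern = '#^([^/]+)/(\d+)/(\d+|background)\.(jpg|gif)$#';

    if (!preg_match($pattern, $key, $matches)) {
        return null; // Not an image key we recognize, so leave it alone
    }

    $isBackground = $matches[3] === 'background';

    return [
        'shard' => $matches[1],
        'user_id' => (int) $matches[2],
        // Background images have no timestamp in their name
        'timestamp' => $isBackground ? null : (int) $matches[3],
        'is_background' => $isBackground,
        'extension' => $matches[4],
    ];
}
```

Returning `null` for anything that doesn't match means unexpected files get skipped rather than deleted, which seems like the safer default for a cleanup script.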

Not to bog you down with too much domain information, but the database table for
the images is keyed on the user ID + Unix timestamp. Any images that were
previously deleted would lack a row in the database table.

The background.* images that belonged to users that had since deleted their
accounts also needed to be purged.

I could have looped through every deleted user and deleted their respective
images, but that would leave behind any images deleted by users who still
had an active account.

Without any records in the database for the deleted images, I couldn’t
accurately generate a list of images that needed to be deleted. Because of this,
the logical approach was to loop through every damn file in the S3 bucket, check
to see if it had been deleted, and if so, remove it from S3.
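The "has it been deleted?" check boils down to something like the sketch below. It assumes each key has already been broken into a user ID, a timestamp, and a background flag. `shouldDelete`, `$imageExists`, and `$userExists` are hypothetical names standing in for whatever database lookups your project already has:

```php
<?php
// Decides whether an image is orphaned. $image is expected to look like
// ['user_id' => int, 'timestamp' => int|null, 'is_background' => bool].
// $imageExists and $userExists are hypothetical callables wrapping database lookups.
function shouldDelete(array $image, callable $imageExists, callable $userExists): bool
{
    if ($image['is_background']) {
        // Backgrounds have no row of their own, so they go when the account goes
        return !$userExists($image['user_id']);
    }

    // Regular images are keyed on user ID + Unix timestamp
    return !$imageExists($image['user_id'], $image['timestamp']);
}
```

Passing the lookups in as callables keeps the decision logic easy to try out against fake data before pointing it at a real bucket.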

And why would I choose PHP for such a task? To be honest, at this point in my
life, PHP wouldn’t have been my first pick. I went with it, though, because the
project I was working on was originally built in PHP and it was easy enough to
hack together a new script leveraging all of the existing infrastructure.

The logic was straightforward enough, unlike most of the AWS documentation. We
create an iterator that uses the ListObjects method and then loop until the
cows come home.
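If you're curious what an iterator like that saves you from, ListObjects hands back results a page at a time, and you ask for the next page by passing a marker. Here is a rough sketch of that paging idea, not the SDK's actual implementation. `iterateObjects` and `$fetchPage` are hypothetical names, and `$fetchPage` is assumed to return an array shaped like a ListObjects response (`Contents` and `IsTruncated`):

```php
<?php
// Sketch of marker-based paging. $fetchPage is a hypothetical callable that takes
// a marker (or null for the first page) and returns
// ['Contents' => [...], 'IsTruncated' => bool].
function iterateObjects(callable $fetchPage): Generator
{
    $marker = null;

    do {
        $page = $fetchPage($marker);
        $contents = $page['Contents'] ?? [];

        foreach ($contents as $object) {
            yield $object;
        }

        // Without a delimiter, the last key of a page is the marker for the next one
        if ($contents) {
            $marker = end($contents)['Key'];
        }
    } while (!empty($page['IsTruncated']) && $contents);
}
```

Because it's a generator, only one page of results sits in memory at a time, which matters when the bucket has been growing for years.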

The following code is going to assume you already have the AWS SDK installed and
configured. I’ve omitted my sanity checking logic as well:

<?php
require './path/to/your/autoload.php';

$region = 'Your Region';
$bucket = 'Your Bucket';
$key = 'Your Key';
$secret = 'Your Secret';
// Include this if you want to loop through a specific directory
// $prefix = 'Your Prefix';

$s3 = new Aws\S3\S3Client([
    'version' => 'latest',
    'region' => $region,
    'credentials' => [
        'key' => $key,
        'secret' => $secret,
    ],
]);

// The iterator requests additional pages of results as you loop
$objects = $s3->getIterator('ListObjects', [
    'Bucket' => $bucket,
    // 'Prefix' => $prefix,
]);

foreach ($objects as $object) {
    // Each $object is an array that includes the file's 'Key'
    print_r($object);

    // Deletes the current file in the iterator
    // $s3->deleteObject([
    //     'Bucket' => $bucket,
    //     'Key' => $object['Key'],
    // ]);
}
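Deleting one object per request works, but it's a lot of round trips for a big bucket. S3's DeleteObjects operation accepts up to 1,000 keys per request, so one option is to collect the keys and delete them in batches. A minimal sketch, where `buildDeleteBatches` and `$keysToDelete` are hypothetical names of mine:

```php
<?php
// Groups keys into DeleteObjects request payloads of at most 1,000 keys each.
function buildDeleteBatches(string $bucket, array $keys, int $batchSize = 1000): array
{
    $batches = [];

    foreach (array_chunk($keys, $batchSize) as $chunk) {
        $batches[] = [
            'Bucket' => $bucket,
            'Delete' => [
                // DeleteObjects expects a list of ['Key' => ...] entries
                'Objects' => array_map(fn(string $key): array => ['Key' => $key], $chunk),
            ],
        ];
    }

    return $batches;
}

// Usage, assuming $s3 and $bucket from the script above and a $keysToDelete
// array (hypothetical) collected inside the loop:
// foreach (buildDeleteBatches($bucket, $keysToDelete) as $batch) {
//     $s3->deleteObjects($batch);
// }
```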

Love it or hate it, this PHP code isn’t all that bad. If you’re looking to
implement this in another language, it should be pretty simple to port to
that language’s AWS SDK.

About Josh Sherman

Husband. Father. Pug dad. Musician. Founder of Holiday API, Head of Engineering and Emoji Specialist at Mailshake, and author of the best damn Lorem Ipsum Library for PHP.