Utility for Version-Enabled AWS S3 Buckets


OK, Amazon, I have to tell you something: you dropped the ball a bit on this one! I absolutely love AWS, and every day I seem to find something new about it (though, granted, I’m not sure whether that says something about Amazon’s innovation or about my ignorance!?). However, having used S3 for a while, I waited so long for a utility like this that in the end I had to write it myself!

The problem I have is that I use a few buckets on which I have enabled Amazon’s versioning feature. This means that every time I write a file, S3 keeps all of its versions, which is great because I get a full trail of each file’s changes, together with the metadata around them. The trouble starts when you delete a file. Any S3 file browser you use (including Amazon’s own web-based one) shows that the file is gone. However, if you delete all the files in the bucket and then try to delete the bucket itself, you get an error telling you that the bucket is in fact not empty, and therefore cannot be deleted! If you start digging, you will find that the AWS API offers a specific set of functions for versioning. Once you use those, you see that your bucket is far from empty and holds lots of versions. Unfortunately, those versions are not visible because the file has been deleted, yet unless you delete them you cannot delete the bucket (not to mention that they sit in S3 in stealth “ninja” mode, taking up space and costing you money).
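To make the “empty but not really” effect concrete, here is a minimal plain-Java sketch that makes no AWS calls. It assumes a hypothetical Version class holding a key, a last-modified timestamp and a delete-marker flag. In the AWS SDK for Java this information would come from S3VersionSummary, but the class below is only a stand-in. A key shows up in an ordinary listing only when its newest version is real data, yet every version, delete markers included, is still stored in the bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: a versioned bucket can look empty while still storing versions. */
public final class HiddenVersions {

    /** Hypothetical stand-in for one S3 object version (not an SDK type). */
    static final class Version {
        final String key;
        final long lastModified; // epoch millis
        final boolean deleteMarker;

        Version(String key, long lastModified, boolean deleteMarker) {
            this.key = key;
            this.lastModified = lastModified;
            this.deleteMarker = deleteMarker;
        }
    }

    /** Picks the newest version of each key. */
    static Map<String, Version> latestPerKey(List<Version> versions) {
        Map<String, Version> latest = new HashMap<>();
        for (Version v : versions) {
            Version current = latest.get(v.key);
            if (current == null || v.lastModified > current.lastModified) {
                latest.put(v.key, v);
            }
        }
        return latest;
    }

    /** Keys a plain (non-versioned) listing shows: newest version is not a delete marker. */
    static Set<String> visibleKeys(List<Version> versions) {
        Set<String> visible = new TreeSet<>();
        for (Version v : latestPerKey(versions).values()) {
            if (!v.deleteMarker) {
                visible.add(v.key);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Version> versions = new ArrayList<>();
        versions.add(new Version("report.csv", 1000L, false));
        versions.add(new Version("report.csv", 2000L, false));
        versions.add(new Version("report.csv", 3000L, true)); // the "delete"
        System.out.println("visible keys: " + visibleKeys(versions)); // visible keys: []
        System.out.println("versions stored: " + versions.size());   // versions stored: 3
    }
}
```

The browser sees nothing under report.csv, but all three entries, including the delete marker, still have to be removed before the bucket can go.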

That’s when I started looking for a utility that could deal with this, but so far I haven’t been lucky. And the rule of today’s software tools says that if it’s not out there, it’s time to write one! Hence this utility, and this post, where I’m offering it for download.

By the way, if you get the idea and don’t want to bother with the implementation details, you can skip right to the end, where there’s a link to the executable jar. Download the one-jar file and simply run:

java -jar aws-version-mgmt-1.0.0-SNAPSHOT.one-jar.jar

That on its own will print the usage instructions, and you can figure out the rest from there. For those of you who want to know a bit more, read on.

Right now, this project is “hosted” on my laptop, which is why I’m also offering an archive with all the source files and everything else. Time permitting, I’ll try to move it onto SourceForge and make it public, like any other open-source project. For that I will need help: so far I have only implemented the functionality I needed myself, but there’s quite likely much more that can be done in the space of S3 versioning, so I’d like to hear what else is needed out there. (Just drop me a line with your ideas/requirements and we’ll take it from there. Nothing fancy like formal improvement requests, though hopefully once I move it to SourceForge we’ll get to that too!)

The current version of this project is 1.0.0-SNAPSHOT (very maven-ized, I know!), which is to say it’s pretty much the first cut of this baby. I just wanted to get it out into the wild in proper Silicon Valley manner, where it’s important to hit the market quickly and iterate fast, right? 😉

As it stands, the current version does only two things (though I’m already thinking of working on a third, the “list orphaned” functionality):

  • lists all the versions in a bucket: this lists every version of every file in the bucket, grouped by file name and sorted (per file) in decreasing date order. It also shows an asterisk (*) next to the latest (i.e. most recent) version and marks “delete markers” as (D). NOTE: When a file is deleted from a bucket with versioning enabled, S3 actually creates a new version of that file with no data attached, marked as deleted. So a version flagged (D) by this utility is simply a delete marker. Incidentally, when the latest version is a delete marker, the file is no longer visible in the S3 browser, and you end up with “orphaned” versions that cannot be reached by any means other than the API.
  • purges a bucket: this goes through every single version found in the bucket and deletes it. At the end, you have a completely empty bucket, which you can either destroy or start again with from scratch. NOTE: Once a bucket has had versioning enabled there’s no going back (S3 only lets you suspend versioning, not turn it off), so the only way to have that bucket without versioning is to purge it, destroy it and create a new bucket with the same name!
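The purge described in the second bullet boils down to a simple loop. List a page of versions, delete each one by key and version id, and repeat until the listing comes back empty. The sketch below runs that loop against a hypothetical VersionStore interface with an in-memory implementation, so it needs no AWS credentials. It is not the code in Program. In a real tool the two operations would map onto SDK calls: in the AWS SDK for Java v1, these are AmazonS3.listVersions and AmazonS3.deleteVersion. The interface, its method names and the page size are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a purge: keep deleting listed versions until none are left. */
public final class PurgeSketch {

    /** Hypothetical (key, versionId) pair; delete markers are versions too. */
    static final class VersionRef {
        final String key;
        final String versionId;

        VersionRef(String key, String versionId) {
            this.key = key;
            this.versionId = versionId;
        }
    }

    /** Hypothetical abstraction over the two S3 operations a purge needs. */
    interface VersionStore {
        List<VersionRef> listPage(String prefix, int maxKeys);

        void deleteVersion(String key, String versionId);
    }

    /** In-memory store, used only to exercise the loop without AWS. */
    static final class InMemoryStore implements VersionStore {
        final List<VersionRef> versions = new ArrayList<>();

        @Override
        public List<VersionRef> listPage(String prefix, int maxKeys) {
            List<VersionRef> page = new ArrayList<>();
            for (VersionRef v : versions) {
                if (page.size() == maxKeys) {
                    break;
                }
                if (v.key.startsWith(prefix)) {
                    page.add(v);
                }
            }
            return page;
        }

        @Override
        public void deleteVersion(String key, String versionId) {
            versions.removeIf(v -> v.key.equals(key) && v.versionId.equals(versionId));
        }
    }

    /** Deletes every version under the prefix and returns how many were deleted. */
    static int purge(VersionStore store, String prefix, int pageSize) {
        int deleted = 0;
        List<VersionRef> page = store.listPage(prefix, pageSize);
        while (!page.isEmpty()) {
            for (VersionRef v : page) {
                store.deleteVersion(v.key, v.versionId);
                System.out.println("deleted " + v.key + " " + v.versionId);
                deleted++;
            }
            // Re-list from the start: deleted versions are gone, so an empty page means done.
            page = store.listPage(prefix, pageSize);
        }
        return deleted;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.versions.add(new VersionRef("docs/a.txt", "v1"));
        store.versions.add(new VersionRef("docs/a.txt", "v2"));
        store.versions.add(new VersionRef("img/b.png", "v1"));
        int count = purge(store, "", 2);
        System.out.println(count + " deleted, " + store.versions.size() + " left"); // 3 deleted, 0 left
    }
}
```

This version re-lists from the start after each page instead of following continuation markers. That is a deliberate simplification: every deleted version drops out of the listing, so an empty page is a reliable “done” signal.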

Hopefully the code is not too difficult to read. The main “beef” is in com.liviutudor.aws.versionmgmt.Program, the class that makes the calls to S3. It uses Apache Commons CLI to parse the command-line parameters and, obviously, relies on the AWS SDK for Java for the S3 API calls. At the moment the application is pretty simple and takes a few parameters:

  • Your AWS access key;
  • AWS secret key — this is used in conjunction with the above to authenticate against AWS;
  • S3 Bucket name;
  • You can also specify a directory — if you use directory structures and only want to operate against files in a certain directory;
  • Then you can specify either the command for listing versions or the one for purging the bucket.
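S3 has no real directories. A “directory” is just a common key prefix such as photos/2012/, which the web console happens to render as a folder. The versioning listing calls accept such a prefix: in the AWS SDK for Java v1, for example, listVersions(bucketName, prefix) filters on the server side. The sketch below shows the semantics with a client-side filter. The helper names are hypothetical, and appending a trailing slash is an assumption about how a directory argument might reasonably be treated, not necessarily what the tool does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: a "directory" in S3 is just a key prefix. */
public final class DirectoryPrefix {

    /** Turns a directory argument into a key prefix; null or empty means the whole bucket. */
    static String toPrefix(String directory) {
        if (directory == null || directory.isEmpty()) {
            return "";
        }
        // A trailing slash stops "logs" from also matching "logs-archive/...".
        return directory.endsWith("/") ? directory : directory + "/";
    }

    /** Keeps only the keys that fall under the given directory. */
    static List<String> keysUnder(List<String> keys, String directory) {
        String prefix = toPrefix(directory);
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("logs/a.log", "logs-archive/b.log", "img/c.png");
        System.out.println(keysUnder(keys, "logs")); // [logs/a.log]
    }
}
```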

It prints its progress to stdout as it traverses (or deletes) each version and, as I said, it uses a couple of markers:

  • star / asterisk (*) to mark the latest (current) version
  • (D) to mark a delete marker (i.e. a deleted file)
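The two markers can be illustrated with a small plain-Java sketch of a listing: versions are grouped by file name and sorted newest first within each file. The latest version gets * and delete markers get (D). It uses a hypothetical Version class holding a key, version id, last-modified time, latest flag and delete-marker flag, roughly what S3VersionSummary exposes in the AWS SDK for Java. The exact line layout is an assumption, not the tool’s real output format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the listing output: grouped by file, newest first, with * and (D) markers. */
public final class VersionReport {

    /** Hypothetical stand-in for one S3 object version (not an SDK type). */
    static final class Version {
        final String key;
        final String versionId;
        final long lastModified; // epoch millis
        final boolean latest;
        final boolean deleteMarker;

        Version(String key, String versionId, long lastModified,
                boolean latest, boolean deleteMarker) {
            this.key = key;
            this.versionId = versionId;
            this.lastModified = lastModified;
            this.latest = latest;
            this.deleteMarker = deleteMarker;
        }
    }

    /** Builds the report lines: a header per file, then one line per version. */
    static List<String> format(List<Version> versions) {
        // TreeMap keeps the file names in alphabetical order.
        Map<String, List<Version>> byKey = new TreeMap<>();
        for (Version v : versions) {
            byKey.computeIfAbsent(v.key, k -> new ArrayList<>()).add(v);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Version>> entry : byKey.entrySet()) {
            lines.add(entry.getKey());
            List<Version> group = entry.getValue();
            // Decreasing date order within each file.
            group.sort(Comparator.comparingLong((Version v) -> v.lastModified).reversed());
            for (Version v : group) {
                lines.add("  " + (v.latest ? "* " : "  ") + v.versionId
                        + " @" + v.lastModified + (v.deleteMarker ? " (D)" : ""));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Version> versions = new ArrayList<>();
        versions.add(new Version("notes.txt", "v1", 1000L, false, false));
        versions.add(new Version("notes.txt", "v3", 3000L, true, true));
        versions.add(new Version("notes.txt", "v2", 2000L, false, false));
        versions.add(new Version("a.txt", "x1", 500L, true, false));
        format(versions).forEach(System.out::println);
    }
}
```

For the sample data in main, a.txt is listed first with its single version starred. It is followed by notes.txt, where v3 is both starred and flagged (D) because the newest version is a delete marker (an “orphaned” file), then v2 and v1.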

I’ve added checkstyle, findbugs, PMD, etc. to the project pom to ensure a bit of code quality, and I’ve enabled the maven-site-plugin to generate reports on these too. So if you’re building from source, make sure you also generate the reports (mvn site) at the end and check that there are no bugs or checkstyle issues.

To make this easy to use out of the box, without installing dependencies or forgetting to copy files into the right folder, I’ve used the one-jar plugin, which packages all the code and dependencies into a single big executable jar. This can then be run with java -jar, as with any other executable jar file.

Other than that, give me a shout if you want to help with putting this on SourceForge. As and when I get to do that, I will add a post here with the URL, but for now, as I said, I just wanted to push this out “to the masses”, so to speak.

The one-jar executable can be downloaded from here: aws-version-mgmt-1.0.0-SNAPSHOT.one-jar.jar

Or if you want to download the full project sources, pom, etc, use this link: aws-version-mgmt.tar.bz2

Hopefully I’ll move this to SourceForge soon and y’all can add your 2 cents to it. Thank you! 🙂