Groovy Scripting at the Power of Commons CLI


I’ve found myself scripting more and more stuff in Groovy: partly, I’m guessing, because of my Java background, and partly because I’m probably not such a good bash/awk/perl/sed/etc script hacker 🙂 Nevertheless, make no mistake, Groovy is an awesome scripting tool! (I’m not going to praise its benefits for building applications, though they do exist; I’ll concentrate strictly on using it as a scripting tool.)

While it is powerful on its own, I recently found a cool new mix: Groovy + Apache Commons CLI. The two let you quickly combine Groovy’s scripting facilities with proper parsing and validation of command-line parameters; the result is a more flexible command-line scripting framework.

Here’s a small example that tortured me in the past: take a simple text file containing words/characters, whitespace (tabs, spaces and new lines) and perhaps some punctuation marks. Now you want to “compact” this a bit and eliminate redundant spaces and new lines: from a reader’s point of view it might not matter whether you have 2, 3 or 5 spaces or just one, nor whether you have 5 new lines or just one (in fact one new line might be preferable, as it involves less scrolling). So you want to replace all the repeated spaces with a single space. You could of course replace a regex like /\s+/ with a space; however, that regex will replace a set of newlines with a single space too! And if you instead want to convert only a set of 2 or more newlines into a single newline, that won’t do it. You could of course catch just /[ \t]+/ and have a separate regex which catches /\n+/ and replaces it with a single \n; however, you can now end up with a line that contains a single space and a newline, which you need to count as a simple new line and, if it’s followed by another newline, replace the 2 of them with a single new line… and the list can go on!
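To make those two pitfalls concrete, here is a small sketch in plain Groovy. The sample string is made up for illustration; it has a run of spaces, a line holding only a space, and a run of newlines.

```groovy
// Made-up sample: redundant spaces, a whitespace-only line and a run of newlines.
def text = "one   two\n \n\nthree"

// Naive approach: \s also matches newlines, so every line break is lost.
def naive = text.replaceAll(/\s+/, " ")
assert naive == "one two three"

// Two separate passes: spaces/tabs first, then runs of newlines.
def twoPass = text.replaceAll(/[ \t]+/, " ").replaceAll(/\n+/, "\n")

// The whitespace-only line splits the run of newlines in two,
// so a line containing a single space survives the clean-up.
assert twoPass == "one two\n \nthree"
```

Neither result is the compact text we are after, which is why the script further down works line by line instead.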

Hardcore Perl gurus are by now licking their lips to tell me this can be done in one line of Perl with some uber-weird regex 🙂 Yes, I’m sure it can; in fact, it can be done via some awkward awk/sed too 🙂 And probably a dozen other scripting methods. But my point here is, first, to show how simply this can be done in a Groovy script; and secondly, it took me a while to discover how to do it using awk, but less than 15 minutes to put together this script! So, as usual when it comes to scripting, it’s a matter of what you know best.

If you’re familiar with the Apache Commons CLI framework, then you know it’s pretty much as simple as creating a bunch of Option objects and passing them to a parser, which validates the parameters and lets you retrieve the values passed on the command line. In this case, I want my script to be a bit more flexible: if called on its own it will simply read data from the console input (so it can be used in a piped command), or one can specify a file name, which, if present, will be used as input. (Yes, I know you can do cat <filename> | myprogram to achieve the same, but bear in mind the implications of that: you execute a command first, then you take its output and use it in your program! Whereas in this case a single process is enough to achieve the same.) Also, I thought it would be good to be able to specify an encoding too. I’ve had plenty of experiences in the past where these details were ignored, and all of a sudden consuming a file created in a Windows environment (default character encoding typically Windows-1252, a close relative of ISO-8859-1) on a Linux machine (default character encoding in most cases UTF-8) or a Mac (MacRoman char encoding) creates so many problems! (Again, yes, you can run the file through iconv, but first you end up executing another process, and secondly you’re generating a second file in the process!)

I’m a big fan of the POSIX command-line switches (-f <filename> etc) so I will use this parser in this example — however, if you want to use a different one, just change the way the cmdParser variable is initialized in the code below. Since we need a filename and an encoding, using the POSIX system, it comes naturally to use -f <filename> and -e <encoding>. And also it’s nice to have a -help option to present the user with a summary of usage and options supported.

As such, in our script we would do something like this:

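import org.apache.commons.cli.HelpFormatter
import org.apache.commons.cli.Options
import org.apache.commons.cli.PosixParser

//declare the switches this script understands
def createOptions() {
    def opts = new Options()
    opts.addOption "help", false, "provides help on using this"
    opts.addOption "f", true, "File to read data from (assumes default console input)"
    opts.addOption "e", true, "File encoding (assumes UTF-8)"
    return opts
}
def options = createOptions()
def cmdParser = new PosixParser()
def cmd
try {
    cmd = cmdParser.parse(options, this.args)
    //-help is treated like a parse failure so both paths print the usage summary
    if( cmd.hasOption("help") )
        throw new Exception( "show help")
} catch(Exception e) {
    new HelpFormatter().printHelp "GroovyCLI", options
    //stop here: there is nothing to process
    return 1
}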

Note that you can take the preparation of the options outside the function — I just feel shoving all that in a function makes it clearer to read what the script does:

  1. Creates the options
  2. Creates the command-line parameters parser
  3. Runs the parameters passed in the command-line (this.args) to this script through the parser
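To see what the parser hands back, here is a minimal sketch that feeds it hard-coded argument arrays instead of the real command line. It assumes the Commons CLI jar (1.2, as used in this post) is on the classpath; the file name notes.txt is made up for illustration.

```groovy
import org.apache.commons.cli.MissingArgumentException
import org.apache.commons.cli.Options
import org.apache.commons.cli.PosixParser

// Same two value-carrying switches as in the script.
def opts = new Options()
opts.addOption "f", true, "File to read data from"
opts.addOption "e", true, "File encoding"

// A well-formed command line: -f with its value, no -e.
def cmd = new PosixParser().parse(opts, ["-f", "notes.txt"] as String[])
assert cmd.hasOption("f")
assert cmd.getOptionValue("f") == "notes.txt"
assert !cmd.hasOption("e")
// The two-argument getOptionValue returns the fallback when the switch is absent.
assert cmd.getOptionValue("e", "UTF-8") == "UTF-8"

// A malformed command line: -f without a value is rejected by the parser.
def rejected = false
try {
    new PosixParser().parse(opts, ["-f"] as String[])
} catch (MissingArgumentException ex) {
    rejected = true
}
assert rejected
```

The validation in the last few lines is exactly the part you would otherwise have to write by hand.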

Then comes the bit about using the command-line parameters. As I said, if a file is not specified, we’ll be using the standard input; also, if there’s no encoding specified, it’s nice to have a default one. In this case I went for UTF-8 since it’s widespread. The code to sort this out looks like this:

//decide encoding
def encoding = "UTF-8"
if( cmd.hasOption("e") )
    encoding = cmd.getOptionValue("e")
 
//use standard input or file specified
def input = System.in
if( cmd.hasOption("f"))
    input = new FileInputStream(cmd.getOptionValue("f"))
input = new BufferedReader( new InputStreamReader(input, encoding))
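A quick way to see why the encoding switch matters is to run the same reader chain over an in-memory stream instead of System.in. This is only a sketch: the sample text and the readFirstLine closure are made up for illustration.

```groovy
// Hypothetical helper: wraps raw bytes in the same reader chain the script uses.
def readFirstLine = { byte[] bytes, String encoding ->
    def reader = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(bytes), encoding))
    reader.readLine()
}

// "cafe" with an acute e, written as ISO-8859-1 (the accented e is one byte, 0xE9).
def original = "caf\u00e9"
byte[] latin1 = original.getBytes("ISO-8859-1")

// Decoding with the encoding the bytes were written in round-trips cleanly...
assert readFirstLine(latin1, "ISO-8859-1") == original

// ...while decoding the very same bytes as UTF-8 does not give the text back.
assert readFirstLine(latin1, "UTF-8") != original
```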

I know by now some of you would argue that the whole “preparation” of the script takes about 30-40 lines — but imagine what the code would be like if you didn’t use CLI:

for( a in this.args ) {
 if( a == "-f" ) {
   //need to check next arg, if not found, error
   //if found, make sure it's not -e or -help or any of the other switches 
   //...
 }
}

And this is just to deal with the filename! To be fair, the actual script can be written in more than a hundred ways (have that, Perl developers! 😛 lol), but below is my version:

input.eachLine { line ->
    //collapse runs of spaces and tabs into a single space
    line = line.replaceAll( /[ \t]+/, " ")
    if( line ==~ /[ \t]?/ ) {
        //suck up empty (or whitespace-only) lines: print nothing for them
        return
    }
    //a line with real content: print it followed by a single new line
    println line
}
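If you want to try the idea without a file or a pipe, here is a sketch of the compaction goal described earlier (collapse runs of spaces and tabs, drop lines that end up empty) wrapped in a closure that collects its output instead of printing it. The name compact and the sample text are made up for illustration.

```groovy
// Hypothetical helper: compacts whatever the reader provides and returns it as one string.
def compact = { Reader reader ->
    def out = []
    reader.eachLine { line ->
        // Collapse runs of spaces and tabs into a single space.
        def squeezed = line.replaceAll(/[ \t]+/, " ")
        // Keep the line only if something other than a lone space is left.
        if (!(squeezed ==~ /[ \t]?/)) {
            out << squeezed
        }
    }
    out.join("\n")
}

// Made-up sample: repeated spaces, a tab, empty lines and a whitespace-only line.
def sample = "first   line\n\n \t \n\nsecond\tline\n"
assert compact(new StringReader(sample)) == "first line\nsecond line"
```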

Now wrap up everything in a command-line like this:

groovy --classpath commons-cli-1.2.jar GroovyCLI.groovy [add your parameters here]

and you’re off! Neat or what? 😉

The source code is attached for download: GroovyCLI.groovy.bz2