Dependency overload … or laziness?

This is something that started bugging me back in the Maven era, when I switched from the likes of Ant, a build tool that relied on the user being explicit about a lot of things and doing a lot of the grunt work themselves, to the Maven world. Maven was great, some said, because of the dependency management it offered.

That did help, I must agree. When you pulled in a library, it saved you the time of finding its dependencies, including them in the build and so on. It put the onus on the library developer to provide the right set of dependencies needed for the library to work correctly.

And that’s where it started hitting us, I think!

The issue I have is that we empowered library developers to declare their own dependencies, but with that power comes great responsibility, and I often see this responsibility not being met. After all, pulling in an extra dependency seems to cost us nothing: the tools we use, be they Maven, Gradle, Ivy etc., do the heavy lifting for us. So really, for any library we decide to pull into our project directly, its developer can do whatever the heck they want and in most cases we won’t notice. This is accentuated by the fact that RAM is pretty cheap for your average server nowadays, so realistically, until our dependencies grow to about 1 GB, I bet your average dev doesn’t care what they pull into their project. And I think this is what allows a lot of projects to grow fat around the belly, so to speak.

Because if I find a library that offers one class or method I need, I can just use it and save myself a few hours of coding, right? I know that my library’s users won’t mind an extra 1 MB of jar files. And in most cases they won’t know any different!

The only time they start noticing is when different versions of the same library get pulled in and the classloader barfs. And that’s when they start looking at their dependency graphs, and pulling their hair out 🙂
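
For what it’s worth, here is a minimal sketch of how you could spot those version clashes before the classloader does. It assumes a Gradle-style text dependency report, where an overridden version shows up as "requested -> resolved". The class and method names (VersionOverrides, findOverrides) are made up for illustration and are not part of any build tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionOverrides {
    // Assumption: an overridden dependency is printed as
    // "group:artifact:requested -> resolved" somewhere on the report line.
    private static final Pattern OVERRIDE =
        Pattern.compile("([\\w.\\-]+):([\\w.\\-]+):(\\S+) -> (\\S+)");

    /** Returns one "group:artifact requested -> resolved" entry per overridden line. */
    static List<String> findOverrides(List<String> reportLines) {
        List<String> found = new ArrayList<>();
        for (String line : reportLines) {
            Matcher m = OVERRIDE.matcher(line);
            if (m.find()) {
                found.add(m.group(1) + ":" + m.group(2) + " "
                    + m.group(3) + " -> " + m.group(4));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // A few report lines pasted in by hand; in practice you would read
        // the saved report from a file instead.
        List<String> sample = Arrays.asList(
            "+--- org.slf4j:slf4j-api:1.7.2 -> 1.7.25",
            "+--- com.google.inject:guice:4.1.0",
            "|    \\--- com.google.guava:guava:19.0 -> 20.0");
        findOverrides(sample).forEach(System.out::println);
    }
}
```

Lines without an arrow (like the guice one above) are skipped, so what is left is the list of libraries where somebody asked for one version and got another. It won’t tell you whether the override is harmless, only where to start looking.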

This has happened to me a few times in a few projects, and that’s when I discovered how lazy we have become since automatic dependency management became a thing.

Let me give you some examples. I’m not singling out any library in particular here; sadly there are numerous examples of this kind.

For example, if you pull in the Netflix Governator library you get this dependency tree with it:

+--- com.netflix.governator:governator:1.14.+ -> 1.16.0
+--- com.netflix.governator:governator-api:1.16.0
| \--- javax.inject:javax.inject:1
+--- com.netflix.governator:governator-core:1.16.0
| +--- com.netflix.governator:governator-api:1.16.0 (*)
| +--- javax.inject:javax.inject:1
| +--- org.slf4j:slf4j-api:1.7.2 -> 1.7.25
| +--- com.google.inject:guice:4.1.0
| | +--- javax.inject:javax.inject:1
| | +--- aopalliance:aopalliance:1.0
| | \--- com.google.guava:guava:19.0 -> 20.0
| +--- com.google.inject.extensions:guice-multibindings:4.1.0
| | \--- com.google.inject:guice:4.1.0 (*)
| \--- com.google.inject.extensions:guice-grapher:4.1.0
| +--- com.google.inject.extensions:guice-assistedinject:4.1.0
| | \--- com.google.inject:guice:4.1.0 (*)
| +--- com.google.inject.extensions:guice-multibindings:4.1.0 (*)
| \--- com.google.inject:guice:4.1.0 (*)
+--- org.hibernate:hibernate-validator:4.1.0.Final -> 5.2.4.Final
| +--- javax.validation:validation-api:1.1.0.Final
| +--- org.jboss.logging:jboss-logging:3.2.1.Final
| \--- com.fasterxml:classmate:1.1.0
+--- org.ow2.asm:asm:5.0.4
\--- com.fasterxml.jackson.core:jackson-databind:2.4.3 -> 2.7.2 (*)

This looks OK at first glance: you expect a whole bunch of Google Guice-related stuff to be pulled in. But why Hibernate???? And that of course brings in JBoss Logging?

Here’s another one: pull in the Spymemcached client for memcached and you also get … FindBugs-related libraries???

net.spy:spymemcached:2.11.4 -> 2.11.4+4
+--- com.google.code.findbugs:annotations:3.0.1 (*)
+--- org.hamcrest:hamcrest-core:1.3
+--- log4j:log4j:1.2.17
\--- org.slf4j:slf4j-api:1.7.13 -> 1.7.25

This is clearly because the maintainers are using FindBugs in their build process and annotated a few cases to be exempt from those checks. But why do we actually need to pull that library ourselves?

And did you know that if you decide to use the Elasticsearch client you also get YAML support???

--- org.elasticsearch:elasticsearch:1.5.2 -> 1.7.6
+--- org.apache.lucene:lucene-core:4.10.4
+--- org.apache.lucene:lucene-analyzers-common:4.10.4
| \--- org.apache.lucene:lucene-core:4.10.4
+--- org.apache.lucene:lucene-queries:4.10.4
| \--- org.apache.lucene:lucene-core:4.10.4
+--- org.apache.lucene:lucene-memory:4.10.4
| \--- org.apache.lucene:lucene-core:4.10.4
+--- org.apache.lucene:lucene-highlighter:4.10.4
| +--- org.apache.lucene:lucene-core:4.10.4
| +--- org.apache.lucene:lucene-memory:4.10.4 (*)
| \--- org.apache.lucene:lucene-queries:4.10.4 (*)
+--- org.apache.lucene:lucene-queryparser:4.10.4
| +--- org.apache.lucene:lucene-core:4.10.4
| +--- org.apache.lucene:lucene-queries:4.10.4 (*)
| \--- org.apache.lucene:lucene-sandbox:4.10.4
| \--- org.apache.lucene:lucene-core:4.10.4
+--- org.apache.lucene:lucene-sandbox:4.10.4 (*)
+--- org.apache.lucene:lucene-suggest:4.10.4
| +--- org.apache.lucene:lucene-analyzers-common:4.10.4 (*)
| +--- org.apache.lucene:lucene-core:4.10.4
| +--- org.apache.lucene:lucene-misc:4.10.4
| | \--- org.apache.lucene:lucene-core:4.10.4
| \--- org.apache.lucene:lucene-queries:4.10.4 (*)
+--- org.apache.lucene:lucene-misc:4.10.4 (*)
+--- org.apache.lucene:lucene-join:4.10.4
| +--- org.apache.lucene:lucene-core:4.10.4
| \--- org.apache.lucene:lucene-grouping:4.10.4
| +--- org.apache.lucene:lucene-core:4.10.4
| \--- org.apache.lucene:lucene-queries:4.10.4 (*)
+--- org.apache.lucene:lucene-grouping:4.10.4 (*)
+--- org.apache.lucene:lucene-spatial:4.10.4
| +--- org.apache.lucene:lucene-core:4.10.4
| +--- org.apache.lucene:lucene-queries:4.10.4 (*)
| \--- com.spatial4j:spatial4j:0.4.1 -> 0.5
\--- org.yaml:snakeyaml:1.12 -> 1.17

Why on earth would I need YAML in this context?

Here’s a (huuuuge) dependency list for Apache Cassandra as well:

\--- org.apache.cassandra:cassandra-all:2.0.12
+--- org.xerial.snappy:snappy-java:1.0.5 -> 1.1.1.7
+--- net.jpountz.lz4:lz4:1.2.0 -> 1.3.0
+--- com.ning:compress-lzf:0.8.4 -> 0.9.5
+--- com.google.guava:guava:15.0 -> 20.0
+--- commons-cli:commons-cli:1.1 -> 1.4
+--- commons-codec:commons-codec:1.2 -> 1.10
+--- org.apache.commons:commons-lang3:3.1 -> 3.5
+--- com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:1.3
+--- org.antlr:antlr:3.2
| \--- org.antlr:antlr-runtime:3.2 -> 3.4 (*)
+--- org.slf4j:slf4j-api:1.7.2 -> 1.7.25
+--- org.codehaus.jackson:jackson-core-asl:1.9.2 -> 1.9.13
+--- org.codehaus.jackson:jackson-mapper-asl:1.9.2 -> 1.9.13 (*)
+--- jline:jline:1.0
+--- com.googlecode.json-simple:json-simple:1.1
+--- com.github.stephenc.high-scale-lib:high-scale-lib:1.1.2
+--- org.yaml:snakeyaml:1.11 -> 1.17
+--- edu.stanford.ppl:snaptree:0.1
+--- org.mindrot:jbcrypt:0.3m
+--- com.yammer.metrics:metrics-core:2.2.0 (*)
+--- com.addthis.metrics:reporter-config:2.1.0
| +--- org.slf4j:slf4j-api:1.7.2 -> 1.7.25
| +--- org.yaml:snakeyaml:1.12 -> 1.17
| +--- org.hibernate:hibernate-validator:4.3.0.Final -> 5.2.4.Final (*)
| \--- com.yammer.metrics:metrics-core:2.2.0 (*)
+--- com.thinkaurelius.thrift:thrift-server:0.3.7
| +--- com.lmax:disruptor:3.0.1
| +--- org.apache.thrift:libthrift:0.9.1 -> 0.9.2
| | +--- org.slf4j:slf4j-api:1.5.8 -> 1.7.25
| | +--- org.apache.httpcomponents:httpclient:4.2.5 -> 4.5.3 (*)
| | \--- org.apache.httpcomponents:httpcore:4.2.4 -> 4.4.6
| +--- org.slf4j:slf4j-api:1.6.1 -> 1.7.25
| +--- org.slf4j:slf4j-log4j12:1.7.2 -> 1.7.25 (*)
| \--- junit:junit:4.8.1 -> 4.12
| \--- org.hamcrest:hamcrest-core:1.3
+--- net.sf.supercsv:super-csv:2.1.0
+--- log4j:log4j:1.2.16 -> 1.2.17
+--- org.apache.thrift:libthrift:0.9.1 -> 0.9.2 (*)
+--- org.apache.cassandra:cassandra-thrift:2.0.12
| +--- org.apache.commons:commons-lang3:3.1 -> 3.5
| +--- org.slf4j:slf4j-api:1.7.2 -> 1.7.25
| \--- org.apache.thrift:libthrift:0.9.1 -> 0.9.2 (*)
+--- com.github.stephenc:jamm:0.2.5
+--- io.netty:netty:3.6.6.Final -> 3.10.5.Final
\--- org.slf4j:slf4j-log4j12:1.7.2 -> 1.7.25 (*)

There are some interesting ones here: Yammer metrics? AddThis metrics? And look at this one: super-csv! Why oh why would one need CSV support in Cassandra? Not to mention there is a bunch of stuff in there that I have no idea what it’s for: jline, jamm? I can bet good money that you can explicitly exclude some of them and everything will still work fine.

And one last example: if you pull the Kafka client into your app, did you know you automatically get Scala support? If you don’t believe me, have a look:

\--- org.apache.kafka:kafka_2.11:0.9.0.0
+--- com.101tec:zkclient:0.7
| +--- org.slf4j:slf4j-api:1.6.1 -> 1.7.25
| +--- org.slf4j:slf4j-log4j12:1.6.1 -> 1.7.25 (*)
| +--- log4j:log4j:1.2.15 -> 1.2.17
| \--- org.apache.zookeeper:zookeeper:3.4.6
| +--- org.slf4j:slf4j-api:1.6.1 -> 1.7.25
| +--- org.slf4j:slf4j-log4j12:1.6.1 -> 1.7.25 (*)
| +--- log4j:log4j:1.2.16 -> 1.2.17
| +--- jline:jline:0.9.94 -> 1.0
| \--- io.netty:netty:3.7.0.Final -> 3.10.5.Final
+--- com.yammer.metrics:metrics-core:2.2.0
| \--- org.slf4j:slf4j-api:1.7.2 -> 1.7.25
+--- org.scala-lang.modules:scala-xml_2.11:1.0.4
| \--- org.scala-lang:scala-library:2.11.4 -> 2.11.7
+--- org.scala-lang:scala-library:2.11.7
+--- org.scala-lang.modules:scala-parser-combinators_2.11:1.0.4
| \--- org.scala-lang:scala-library:2.11.6 -> 2.11.7
+--- org.apache.kafka:kafka-clients:0.9.0.0 -> 0.9.0.2-nflx.16 (*)
+--- net.sf.jopt-simple:jopt-simple:3.2
+--- org.slf4j:slf4j-log4j12:1.7.6 -> 1.7.25 (*)
\--- org.apache.zookeeper:zookeeper:3.4.6 (*)

I think Gradle, Maven and all these tools are great. They are so great that we allow our projects to grow a bit too fat. To me this spells one thing: Winter is coming 🙂