Common mistake when dealing with Reader in Java

Posted April 26th, 2017 by Liv & filed under Blogroll, Tech.

I’ve encountered this one a few times, came across it again recently, and thought it relevant enough to deserve its own post, so here it is.

If you have done any I/O in Java, you have likely come across the Reader class. Unlike the InputStream class(es), which deal with bytes, a Reader makes the transition to reading characters, taking character encoding into account, and provides a higher-level abstraction for the programmer when dealing with data.

The programming interfaces of the two classes are rather similar: both offer operations such as read, skip, close, reset and mark, and, as I mentioned, one operates on bytes (InputStream) while the other operates on characters (Reader). There is, though, a method in the Reader class which cannot be found in InputStream: ready(). This is the one which seems to confuse programmers occasionally.
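As a rough illustration of the bytes-versus-characters distinction, the sketch below reads the same UTF-8 data once through an InputStream and once through a Reader. The class name BytesVsChars, the helper methods and the sample text are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class BytesVsChars {

    // Counts the units an InputStream hands back: raw bytes.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    // Counts the units a Reader hands back: decoded characters.
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" contains one accented letter that takes two bytes in UTF-8.
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + countBytes(data)); // 6
        System.out.println("chars: " + countChars(data)); // 5
    }
}
```

The byte stream sees six values while the Reader sees five, because the Reader decodes the two-byte sequence into a single character.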

I believe the confusion comes from the naming: “ready” (being so close to “read”) might suggest that no further read is possible if it returns false; in other words, that we have reached the end of the stream! The short JavaDoc doesn’t help either, as it states:

Tells whether this stream is ready to be read.

At first glance this doesn’t explain what it means for the stream to be, or not to be, ready to be read, and from what I can see a lot of programmers interpret it along the lines of the above. As such they produce code like this:

BufferedReader reader = new BufferedReader(...);
while( reader.ready() ) { // keep reading while the stream has data !?!?
   String line = reader.readLine();
   // do some processing with the line
}

This code is wrong! One has to look into the detailed JavaDoc to understand why. This is what the full JavaDoc for ready() says about its return value:

True if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.

So it turns out that ready() does NOT signal that we have reached the end of the stream at all. Instead, it tells us whether the reader has any data (buffered internally) ready for us, or whether a read() operation may have to block and wait for data to become available!
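This behaviour can be observed with a small sketch, here using a PipedReader connected to a PipedWriter in a single thread. The class name ReadyIsNotEof and the observe() helper are invented for this example. The reader reports "not ready" both before anything is written and after the buffered data has been drained, yet in neither case has the stream ended, since the writer is still open.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

// Hypothetical demo class, not part of any library.
class ReadyIsNotEof {

    // Records what ready() reports at three points in the pipe's life.
    static boolean[] observe() throws IOException {
        boolean[] seen = new boolean[3];
        try (PipedWriter writer = new PipedWriter();
             PipedReader reader = new PipedReader(writer)) {

            // Nothing written yet: not ready, but the stream has NOT ended.
            seen[0] = reader.ready();

            // Five characters are now sitting in the pipe's buffer.
            writer.write("hello");
            seen[1] = reader.ready();

            // Drain exactly the five characters that were written,
            // so none of these reads can block.
            for (int i = 0; i < 5; i++) {
                reader.read();
            }

            // Buffer is empty again; the writer is still open, so more
            // data could arrive later. Still not end of stream.
            seen[2] = reader.ready();
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        boolean[] seen = observe();
        System.out.println("before write: " + seen[0]); // false
        System.out.println("after write:  " + seen[1]); // true
        System.out.println("after drain:  " + seen[2]); // false
    }
}
```

A loop guarded by ready() would have given up at the first or third check, although a writer that is still open may well produce more data.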

Looking at the Reader.read() method JavaDoc this is what we get:

The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached

This means the correct way to check for end of stream is actually read() == -1. Furthermore, if we are using something like BufferedReader.readLine() (as in our example), the JavaDoc tells us:

A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached

So another way to check for end of stream using a BufferedReader is readLine() == null.
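For a plain Reader, which has no readLine(), the read() == -1 check can be sketched roughly as follows. The class ReadUntilEof and the readAll() helper are names made up for this example, and a StringReader stands in for a real data source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class, not part of any library.
class ReadUntilEof {

    // Reads character by character until read() reports end of stream (-1).
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c; // int, not char, so that -1 can be told apart from real data
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new StringReader("abc")) {
            System.out.println(readAll(reader)); // abc
        }
    }
}
```

Note that the result of read() is kept in an int: casting to char before the comparison would make the -1 marker indistinguishable from the character 0xffff.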

As such, the correct version of the original loop would be:

BufferedReader reader = new BufferedReader(...);
String line;
while( (line = reader.readLine()) != null ) {
   // do some processing with the line
}
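To see the two loops side by side, here is a minimal sketch that runs both against a source whose ready() always returns false. The class ReadyLoopDemo and its helpers are hypothetical. The neverReady() helper builds a Reader that implements only the two abstract methods and so inherits the base Reader.ready(), which returns false; it stands in for a slow source such as a socket with no bytes buffered yet.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class, not part of any library.
class ReadyLoopDemo {

    // A Reader over a String that never claims to be ready, although
    // it has data: it inherits Reader.ready(), which returns false.
    static Reader neverReady(String text) {
        final StringReader source = new StringReader(text);
        return new Reader() {
            @Override
            public int read(char[] cbuf, int off, int len) throws IOException {
                return source.read(cbuf, off, len);
            }

            @Override
            public void close() {
                source.close();
            }
        };
    }

    // The mistaken loop: treats ready() == false as end of stream.
    static int countWithReady(Reader source) throws IOException {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.ready()) {
                reader.readLine();
                lines++;
            }
        }
        return lines;
    }

    // The correct loop: stops only when readLine() returns null.
    static int countWithNullCheck(Reader source) throws IOException {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "one\ntwo\nthree\n";
        System.out.println(countWithReady(neverReady(text)));     // 0
        System.out.println(countWithNullCheck(neverReady(text))); // 3
    }
}
```

The ready()-based loop reads nothing at all from this source, while the null-check loop reads all three lines.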

Comparing the two, you can see that the first approach will stop reading the moment the Reader’s internal buffer is exhausted, even though this doesn’t mean the data is finished, only that a blocking read is required to replenish the buffer! That, needless to say, causes some weird (and unrepeatable!) issues in a production environment.

I can only guess that ready() was introduced to support some sort of non-blocking I/O in the early JDKs: you can envisage a use case where you build a reader on top of a socket input stream and you don’t want to block your app waiting for bytes to arrive down the wire, but instead want to leave the underlying TCP layer to do all the heavy lifting and only read from the socket once the data is available.

Nowadays we have java.nio and a whole plethora of classes which support non-blocking implementations, so I’m going to venture a guess that the ready() method is no longer needed. Hopefully Oracle will mark it @Deprecated, but until they do, be wary of it and don’t use it for checking for end-of-stream.

Liviu Tudor — Of Man and Internet

I’m a nobody, nobody is perfect, therefore I’m perfect.
