bb.io
Class ParseReader

java.lang.Object
  extended by java.io.Reader
      extended by bb.io.ParseReader
All Implemented Interfaces:
Closeable, Readable

public class ParseReader
extends Reader

A Reader class designed for convenient and high-performance parsing.

This class satisfies the exact same API as the PushbackReader class, so that it can be a drop-in replacement. The main differences between this class and PushbackReader are:

  1. the pushback capacity is not fixed at construction, but is dynamically increased as necessary, so the programmer never has to size it in advance
  2. the skip method is overridden to correctly count line numbers as they are skipped over
  3. this class is not synchronized
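
To make difference 1 concrete, the following minimal sketch shows the limit that the standard java.io.PushbackReader imposes: once its fixed-size pushback buffer is full, a further unread fails with an IOException. The class name PushbackLimitDemo and the helper overflows are hypothetical names chosen for illustration; ParseReader itself is not used here.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Illustration only: demonstrates java.io.PushbackReader, not bb.io.ParseReader.
class PushbackLimitDemo {
    /** Returns true if pushing back count chars overflows a PushbackReader of the given capacity. */
    static boolean overflows(int capacity, int count) {
        try (PushbackReader reader = new PushbackReader(new StringReader(""), capacity)) {
            for (int i = 0; i < count; i++) {
                reader.unread('x');
            }
            return false;
        } catch (IOException e) {
            return true; // PushbackReader signals a full pushback buffer with an IOException
        }
    }

    public static void main(String[] args) {
        System.out.println("4 of 4 overflows: " + overflows(4, 4));
        System.out.println("5 of 4 overflows: " + overflows(4, 5));
    }
}
```

A reader whose pushback capacity grows on demand, as described above, would accept the fifth unread instead of throwing.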

This class also satisfies the exact same line number API as the LineNumberReader class. In particular, whenever data is read from the stream, the line number count is incremented if a complete line terminator sequence has just been read. (This class uses the same set of line termination sequences as LineNumberReader.) The main differences between this class and LineNumberReader are:

  1. all line termination chars are exactly preserved by reads (LineNumberReader compresses each line terminator sequence into a single '\n')
  2. the mark and reset methods are currently unsupported
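
The LineNumberReader behavior that difference 1 contrasts against can be observed directly. This sketch reads a few chars through the standard java.io.LineNumberReader and shows that the "\r\n" pair comes back as a single '\n'. LineTerminatorDemo and readChars are hypothetical names; the sketch deliberately stops before end of stream so that the reported line number does not depend on how a given JDK counts a final unterminated line.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustration only: demonstrates java.io.LineNumberReader, not bb.io.ParseReader.
class LineTerminatorDemo {
    /** Reads up to count chars; returns the chars read, then ':' and the current line number. */
    static String readChars(String text, int count) {
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < count; i++) {
                int c = reader.read();
                if (c == -1) {
                    break;
                }
                sb.append((char) c);
            }
            return sb + ":" + reader.getLineNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Make the terminator visible when printing.
        System.out.println(readChars("a\r\nb", 3).replace("\n", "\\n"));
    }
}
```

Here the input holds four chars ('a', '\r', '\n', 'b') but three reads already return "a\nb". A reader that preserves terminators, as this class is documented to do, would return the '\r' and the '\n' as separate chars.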

In addition to the above APIs, this class adds some useful parsing methods like skipWhitespace, isTokenNext, skipTillTokenNext, confirmTokenNext, and readThruToken.

See also the ParseReader(Reader, char[], int, int) constructor for more discussion on the optimal Reader type.

This class is not multithread safe.

Author:
Brent Boyer

Nested Class Summary
static class ParseReader.UnitTest
          See the Overview page of the project's javadocs for a general description of this unit test class.
 
Field Summary
private  char[] buffer
          The single buffer simultaneously used for read ahead, pushback, and skip purposes.
private static int bufferLength_default
           
private static String charEncoding_default
           
private  int end
          Index (exclusive) of the position in buffer where data ends.
private  Reader in
          The underlying character-input stream.
private  int lineNumber
          Current line number.
private static int lineNumberInitial_default
           
private  int pushbackCapacity
          Records how much space at the beginning of buffer should be reserved for future pushbacks.
private  int start
          Index (inclusive) of the position in buffer where data starts.
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
ParseReader(char[] buffer)
          Calls this( null, buffer, 0, buffer.length ).
ParseReader(File file)
          Calls this( new FileInputStream(file) ).
ParseReader(InputStream in)
          Calls this(in, charEncoding_default).
ParseReader(InputStream in, String charEncoding)
          Calls this( new InputStreamReader(in, charEncoding) ).
ParseReader(Reader in)
          Calls this( in, new char[bufferLength_default], 0, 0 ).
ParseReader(Reader in, char[] buffer, int start, int end)
          Calls this( in, buffer, start, end, lineNumberInitial_default ).
ParseReader(Reader in, char[] buffer, int start, int end, int lineNumber)
          The fundamental constructor.
 
Method Summary
private static Reader checkReader(Reader in)
           
 void close()
          Sets start > end, since that is one signal of the closed state.
 void confirmTokenNext(String token)
          Calls confirmTokenNext(token, true) (i.e. is always case sensitive).
 void confirmTokenNext(String token, boolean isCaseSensitive)
          Confirms that the supplied token's chars next occur on the stream.
private  void ensureBufferHasData()
          Checks that buffer has at least 1 char of data.
private  void ensureOpen()
          Checks that the stream has not been closed.
private  void ensurePushbackCapacity(int pushbackCapacityNeeded)
          Checks that buffer has the requested free space, increasing its size if necessary.
 int getLineNumber()
          Get the current line number.
 boolean hasData()
          Reports whether or not data can still be read.
private  boolean isNewLineNext()
          Reports if the next character on the stream is a newline char (i.e. '\n') or not.
 boolean isTokenNext(String token)
          Returns isTokenNext(token, true) (i.e. is always case sensitive).
 boolean isTokenNext(String token, boolean isCaseSensitive)
          Determines if token's chars next occur on the stream.
 void mark(int readAheadLimit)
          Mark the present position in the stream.
 boolean markSupported()
          Tell whether this stream supports the mark() operation.
 int read()
          Returns the next char from the stream.
 int read(char[] cbuf, int offset, int length)
          Attempts to read chars into the specified portion of cbuf.
 String readLine()
          Reads all the characters in the stream up to but not including the next line termination sequence or until end of stream is hit.
 String readThruToken(String token)
          Returns readThruToken(token, true, false) (i.e. is always case sensitive and excludes token from result).
 String readThruToken(String token, boolean isCaseSensitive, boolean includeToken)
          Reads over as many chars as necessary until token is read thru.
 boolean ready()
          Tell whether this stream is ready to be read.
 void reset()
          Reset the stream.
private  void resizeBuffer(int capacityNew)
          Resizes buffer to the requested (greater) capacity.
 void setLineNumber(int lineNumber)
          Set the current line number.
 long skip(long n)
          This method attempts to skip over the requested number of characters.
 void skipFully(long n)
          This method guarantees to skip over the specified number of chars.
 long skipTillTokenNext(String token)
          Returns skipTillTokenNext(token, true) (i.e. is always case sensitive).
 long skipTillTokenNext(String token, boolean isCaseSensitive)
          Skips over as many chars as necessary until token is next on the stream.
 int skipWhitespace()
          Skips over all whitespace on the stream until the first non-whitespace char (or end of stream) is hit; that first non-whitespace char will be what is next read from the stream.
 void unread(char[] cbuf)
          Convenience method that simply calls unread(cbuf, 0, cbuf.length).
 void unread(char[] cbuf, int offset, int length)
          Pushes back the specified portion of cbuf to the stream.
 void unread(int charAsInt)
          Pushes back the supplied char to the stream.
 
Methods inherited from class java.io.Reader
read, read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lineNumberInitial_default

private static final int lineNumberInitial_default
See Also:
Constant Field Values

bufferLength_default

private static final int bufferLength_default
See Also:
Constant Field Values

charEncoding_default

private static final String charEncoding_default
See Also:
encoding : Java Glossary

in

private Reader in
The underlying character-input stream. May initially be null, in which case all reads come solely from data initially in buffer. Will be null after this instance is closed.


buffer

private char[] buffer
The single buffer simultaneously used for read ahead, pushback, and skip purposes. Will never be null, except after this instance is closed. Will never be zero-length.


start

private int start
Index (inclusive) of the position in buffer where data starts. Along with end, this field defines where valid data exists in buffer, namely, for indices in the interval [start, end). All other space in buffer is considered to be free, and may be overwritten at will.

The relation start < end is always satisfied except (a) when there is no more data in the buffer, which is always signified by start == end or (b) when this instance has been closed, in which case start > end. In every case, start and end must both always be a value in the range [0, buffer.length].

The next read will usually return the char at start (and start will subsequently be incremented). Exception: if there is no data in buffer, then buffer will first be populated from in, with start (and end) assigned appropriate values for the data read in.

The next unread will usually write to start - 1 (and start will subsequently be decremented). Exception: if start == 0, then buffer will first be resized to allow pushback near the beginning of the array, with start incremented appropriately for the resize.


end

private int end
Index (exclusive) of the position in buffer where data ends. See documentation on start for more details on buffer's data format.

Like start, end must always be a value in the range [0, buffer.length].
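
The [start, end) layout described for these two fields can be sketched as a tiny standalone class. This is only an illustration under stated assumptions: GrowableBuffer, its doubling growth policy, and its method names are hypothetical and are not taken from the ParseReader source. It shows a read consuming the char at start, an unread writing at start - 1, and a resize that copies the live data to the end of the larger array so the free space sits in front.

```java
// Hypothetical sketch of the documented buffer layout; not the actual ParseReader code.
class GrowableBuffer {
    private char[] buffer;
    private int start; // inclusive index of the first valid char
    private int end;   // exclusive index just past the last valid char

    GrowableBuffer(char[] initial) {
        buffer = initial.clone();
        start = 0;
        end = buffer.length;
    }

    /** Returns the char at start and advances start, or -1 when start == end (no data). */
    int read() {
        return (start < end) ? buffer[start++] : -1;
    }

    /** Writes c at start - 1, first growing the array if no free space precedes start. */
    void unread(char c) {
        if (start == 0) {
            grow(Math.max(2 * buffer.length, 1)); // doubling is an assumed policy
        }
        buffer[--start] = c;
    }

    /** Copies the live data to the END of a larger array, leaving the free space in front. */
    private void grow(int newCapacity) {
        char[] bigger = new char[newCapacity];
        int length = end - start;
        int newStart = newCapacity - length;
        System.arraycopy(buffer, start, bigger, newStart, length);
        buffer = bigger;
        start = newStart;
        end = newCapacity;
    }

    public static void main(String[] args) {
        GrowableBuffer b = new GrowableBuffer("bc".toCharArray());
        b.unread('a'); // start == 0 here, so the buffer grows before the write
        for (int c; (c = b.read()) != -1; ) {
            System.out.print((char) c);
        }
        System.out.println();
    }
}
```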


pushbackCapacity

private int pushbackCapacity
Records how much space at the beginning of buffer should be reserved for future pushbacks.


lineNumber

private int lineNumber
Current line number. There are no restrictions on its value.

Constructor Detail

ParseReader

public ParseReader(Reader in,
                   char[] buffer,
                   int start,
                   int end,
                   int lineNumber)
            throws IllegalArgumentException
The fundamental constructor.

Parameters:
in - the underlying Reader from which characters will be read; may be null (i.e. reads solely come from contents of buffer)
buffer - assigned to the buffer field; for peak performance, user should ensure that it is sufficiently large; see also start for further details concerning the data format of buffer
start - will be assigned to the start field
end - will be assigned to the end field
lineNumber - will be assigned to the lineNumber field
Throws:
IllegalArgumentException - if buffer == null; buffer.length == 0; start < 0; start > buffer.length; (in == null) && (start == buffer.length)

ParseReader

public ParseReader(Reader in,
                   char[] buffer,
                   int start,
                   int end)
            throws IllegalArgumentException
Calls this( in, buffer, start, end, lineNumberInitial_default ).

Note: the best performance is obtained when Reader is an unbuffered "low level" Reader (e.g. FileReader) and not when it is a higher level buffered Reader (e.g. BufferedReader). This is because this class always does its own buffering, so any buffering done by Reader will simply waste memory and involve extra method calls.

Parameters:
in - Reader from which chars will be read
Throws:
IllegalArgumentException - if buffer == null; buffer.length == 0; start < 0; start > buffer.length; (in == null) && (start == buffer.length)

ParseReader

public ParseReader(char[] buffer)
            throws IllegalArgumentException
Calls this( null, buffer, 0, buffer.length ).

Parameters:
buffer - an array into which all chars have already been read
Throws:
IllegalArgumentException - if buffer == null

ParseReader

public ParseReader(Reader in)
            throws IllegalArgumentException
Calls this( in, new char[bufferLength_default], 0, 0 ).

Parameters:
in - Reader from which chars will be read
Throws:
IllegalArgumentException - if in == null

ParseReader

public ParseReader(InputStream in,
                   String charEncoding)
            throws UnsupportedEncodingException
Calls this( new InputStreamReader(in, charEncoding) ).

Parameters:
in - InputStream from which chars will be read
charEncoding - name of the Charset to use to decode bytes into chars
Throws:
UnsupportedEncodingException - if charEncoding is not supported

ParseReader

public ParseReader(InputStream in)
            throws UnsupportedEncodingException
Calls this(in, charEncoding_default).

Parameters:
in - InputStream from which chars will be read
Throws:
UnsupportedEncodingException - if charEncoding_default is not supported (this should never happen)

ParseReader

public ParseReader(File file)
            throws FileNotFoundException,
                   SecurityException,
                   UnsupportedEncodingException
Calls this( new FileInputStream(file) ).

Parameters:
file - File from which chars will be read
Throws:
FileNotFoundException - if file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading.
SecurityException - if a security manager exists and its checkRead method denies read access to file
UnsupportedEncodingException - if charEncoding_default is not supported (this should never happen)
Method Detail

checkReader

private static Reader checkReader(Reader in)
                           throws IllegalArgumentException
Throws:
IllegalArgumentException

getLineNumber

public int getLineNumber()
Get the current line number.


setLineNumber

public void setLineNumber(int lineNumber)
Set the current line number.


ensureOpen

private void ensureOpen()
                 throws IOException
Checks that the stream has not been closed.

Throws:
IOException

ensureBufferHasData

private void ensureBufferHasData()
                          throws RuntimeException,
                                 IOException
Checks that buffer has at least 1 char of data.

Will read from in if necessary (and if in exists). A read from in changes the start and end values as appropriate, but this method guarantees that pushbackCapacity is still respected (i.e. after a read from in, start == pushbackCapacity).

Throws:
RuntimeException - if unable to guarantee that buffer has data
IOException - if an I/O problem occurs

ensurePushbackCapacity

private void ensurePushbackCapacity(int pushbackCapacityNeeded)
Checks that buffer has the requested free space, increasing its size if necessary.


resizeBuffer

private void resizeBuffer(int capacityNew)
Resizes buffer to the requested (greater) capacity. Data from the current buffer is transferred to the end of the new array (this reserves space for pushbacks).


read

public int read()
         throws IOException
Returns the next char from the stream. Additionally, it increments the line number count if the last char of a line terminator sequence was just read.

Overrides:
read in class Reader
Returns:
the char read, or -1 if the end of stream has been reached
Throws:
IOException - if an I/O problem occurs
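
The counting rule stated above (increment when the last char of a line terminator sequence is read, with terminators '\n', '\r', and "\r\n" as in LineNumberReader) can be sketched over an in-memory sequence. LineCounter and countTerminators are hypothetical names, and the one-char lookahead is an assumption about how a '\r' followed by '\n' is recognized as a single terminator; the private isNewLineNext method plausibly plays that role, but the source does not confirm it.

```java
// Hypothetical sketch of the documented counting rule; not the actual ParseReader code.
class LineCounter {
    /** Counts complete line terminator sequences, treating "\r\n" as one terminator. */
    static int countTerminators(CharSequence text) {
        int lines = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean newlineFollows = (i + 1 < text.length()) && text.charAt(i + 1) == '\n';
            // '\n' always ends a terminator; '\r' ends one only when no '\n' follows it.
            if (c == '\n' || (c == '\r' && !newlineFollows)) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(countTerminators("a\r\nb\rc\nd"));
    }
}
```

Note that no char is dropped or rewritten by this rule: the count changes, while every '\r' and '\n' would still be delivered to the caller.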

read

public int read(char[] cbuf,
                int offset,
                int length)
         throws IllegalArgumentException,
                IOException
Attempts to read chars into the specified portion of cbuf. Will only block if no char initially available; will never block merely to read all the requested chars. Increments the line number count each time the last char of a line terminator sequence is encountered on the stream.

Specified by:
read in class Reader
Parameters:
cbuf - destination buffer
offset - offset in cbuf at which to start writing chars
length - maximum number of chars to read
Returns:
the number of chars actually read into cbuf, or -1 if the end of stream has already been reached
Throws:
IllegalArgumentException - if cbuf == null; offset < 0; length <= 0; offset + length > cbuf.length
IOException - if an I/O problem occurs

readLine

public String readLine()
                throws IOException
Reads all the characters in the stream up to but not including the next line termination sequence or until end of stream is hit. Additionally, it always increments the line number count (unless end of stream was encountered).

Returns:
a String containing the contents of the line, but not including any line-termination characters; result will be zero length if the next char(s) on the stream are a line termination sequence; result will be null if end of stream is immediately encountered
Throws:
IOException - if an I/O problem occurs
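
The return conventions listed above (line content without terminators, a zero-length result for an empty line, null at end of stream) are the same ones the standard java.io.BufferedReader.readLine follows, so they can be tried out with it. ReadLineDemo and lines are hypothetical names; that ParseReader.readLine agrees with BufferedReader in every edge case is an assumption, not something the documentation states.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustration only: uses java.io.BufferedReader to show the same return conventions.
class ReadLineDemo {
    /** Collects every line of text until readLine returns null (end of stream). */
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // The second line is empty, so it comes back as a zero-length String.
        System.out.println(lines("a\r\n\nb"));
    }
}
```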

skip

public long skip(long n)
          throws IllegalArgumentException,
                 IOException
This method attempts to skip over the requested number of characters. Will only block if no char initially available; will never block merely to skip all the requested chars. Increments the line number count each time the last char of a line terminator sequence is encountered on the stream.

Overrides:
skip in class Reader
Parameters:
n - the number of chars to attempt to skip
Returns:
the number of characters actually skipped
Throws:
IllegalArgumentException - if n < 0
IOException - if an I/O problem occurs
See Also:
skipFully

skipFully

public void skipFully(long n)
               throws IOException,
                      EOFException
This method guarantees to skip over the specified number of chars. Will block as often as needed in order to guarantee the requested chars are skipped. If end of stream is encountered first, it throws an EOFException.

Parameters:
n - the number of chars required to skip
Throws:
IOException - if an I/O problem occurs
EOFException - if hit end of stream before skipping n chars
See Also:
skip
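
The relationship between skip (best effort) and skipFully (all or EOFException) can be sketched as a loop over any Reader. This is a minimal sketch under assumptions: SkipFullyDemo and remainderAfterSkip are hypothetical names, and the loop is one common way to build such a guarantee, not necessarily how ParseReader implements it.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of a skipFully-style guarantee on top of Reader.skip.
class SkipFullyDemo {
    /** Skips exactly n chars or throws EOFException if the stream ends first. */
    static void skipFully(Reader in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException("end of stream with " + remaining + " chars left to skip");
            } else {
                remaining--; // the probing read() consumed one char
            }
        }
    }

    /** Demo helper: returns what is left of text after skipping n chars, or null on EOFException. */
    static String remainderAfterSkip(String text, long n) {
        try (StringReader in = new StringReader(text)) {
            skipFully(in, n);
            StringBuilder rest = new StringBuilder();
            for (int c; (c = in.read()) != -1; ) {
                rest.append((char) c);
            }
            return rest.toString();
        } catch (EOFException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(remainderAfterSkip("abcdef", 4));
        System.out.println(remainderAfterSkip("abc", 5));
    }
}
```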

skipWhitespace

public int skipWhitespace()
                   throws IOException
Skips over all whitespace on the stream until the first non-whitespace char (or end of stream) is hit; that first non-whitespace char will be what is next read from the stream.

Character.isWhitespace defines what constitutes whitespace.

Returns:
the number of whitespace chars skipped over; will be >= 0
Throws:
IOException - if an I/O problem occurs
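
The documented contract (skip while Character.isWhitespace holds, leave the first non-whitespace char as the next char to be read, return the count) can be sketched with the standard java.io.PushbackReader. SkipWhitespaceDemo and demo are hypothetical names, and the read-then-unread approach is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of the skipWhitespace contract using java.io.PushbackReader.
class SkipWhitespaceDemo {
    /** Skips leading whitespace and returns how many chars were skipped (>= 0). */
    static int skipWhitespace(PushbackReader in) throws IOException {
        int skipped = 0;
        while (true) {
            int c = in.read();
            if (c == -1) {
                return skipped; // end of stream
            }
            if (!Character.isWhitespace(c)) {
                in.unread(c); // leave the first non-whitespace char on the stream
                return skipped;
            }
            skipped++;
        }
    }

    /** Demo helper: returns "<skipped>:<next char>", or "<skipped>:EOF" at end of stream. */
    static String demo(String text) {
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            int skipped = skipWhitespace(in);
            int next = in.read();
            return skipped + ":" + (next == -1 ? "EOF" : String.valueOf((char) next));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(" \t\nx y"));
    }
}
```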

unread

public void unread(int charAsInt)
            throws IllegalArgumentException,
                   IOException
Pushes back the supplied char to the stream. Additionally, it decrements the line number count if the last char of a line terminator sequence has just been unread.

If charAsInt equals -1, this method immediately returns without doing anything. Thus, this method can always undo a call to read when supplied with the result returned by read.

Throws:
IllegalArgumentException - if charAsInt is neither a legitimate char value nor -1
IOException - if an I/O problem occurs

unread

public void unread(char[] cbuf,
                   int offset,
                   int length)
            throws IllegalArgumentException,
                   IOException
Pushes back the specified portion of cbuf to the stream. Additionally, it decrements the line number count each time the last char of a line terminator sequence goes by.

After this method returns, the next chars to be read will be cbuf[offset], cbuf[offset+1], etc. Thus, this method can undo a call to read when supplied with the char[] that was just read into.

Parameters:
cbuf - char array
offset - offset of first char to push back
length - number of chars to push back

Throws:
IllegalArgumentException - if cbuf == null; offset < 0; length <= 0; offset + length > cbuf.length
IOException - if an I/O problem occurs

unread

public void unread(char[] cbuf)
            throws IllegalArgumentException,
                   IOException
Convenience method that simply calls unread(cbuf, 0, cbuf.length).

Throws:
IllegalArgumentException - if there is a problem with one of the args
IOException - if an I/O problem occurs

close

public void close()
Sets start > end, since that is one signal of the closed state. Nulls the reference to buffer, which both allows it to be garbage collected and signals the closed state. Finally, closes in and then nulls the reference to it.

Specified by:
close in interface Closeable
Specified by:
close in class Reader

mark

public void mark(int readAheadLimit)
          throws IOException
Mark the present position in the stream.

The implementation here always throws an IOException.

Overrides:
mark in class Reader
Throws:
IOException - since mark is not supported

markSupported

public boolean markSupported()
Tell whether this stream supports the mark() operation.

The implementation here always returns false.

Overrides:
markSupported in class Reader

ready

public boolean ready()
              throws IOException
Tell whether this stream is ready to be read.

Overrides:
ready in class Reader
Throws:
IOException - if an I/O problem occurs

reset

public void reset()
           throws IOException
Reset the stream.

The implementation here always throws an IOException.

Overrides:
reset in class Reader
Throws:
IOException - since reset is not supported

isNewLineNext

private boolean isNewLineNext()
                       throws IOException
Reports if the next character on the stream is a newline char (i.e. '\n') or not.

From the caller's perspective, the stream's state is unaffected by this method. (Internally, reads into the buffer etc may have to be performed.)

Throws:
IOException - if an I/O problem occurs

hasData

public boolean hasData()
                throws IOException
Reports whether or not data can still be read. Unlike ready, this method will block if necessary, because it returns false only if end of stream has been reached, which is a stronger guarantee than ready provides.

From the caller's perspective, the stream's state is unaffected by this method. (Internally, reads into the buffer etc may have to be performed.)

Throws:
IOException - if an I/O problem occurs
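
The difference between ready and a hasData-style check can be seen with standard classes. StringReader.ready always returns true, even when nothing is left to read, whereas a check that actually reads one char and pushes it back can report end of stream. HasDataDemo and probe are hypothetical names; the read-then-unread approach is an assumption, not the documented ParseReader implementation.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch contrasting Reader.ready with an end-of-stream check.
class HasDataDemo {
    /** Returns false only at end of stream; the char read as a probe is pushed back. */
    static boolean hasData(PushbackReader in) throws IOException {
        int c = in.read(); // may block until a char or end of stream arrives
        if (c == -1) {
            return false;
        }
        in.unread(c); // the caller sees an unchanged stream
        return true;
    }

    /** Demo helper: returns { ready(), hasData() } for a reader over text. */
    static boolean[] probe(String text) {
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            return new boolean[] { in.ready(), hasData(in) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] empty = probe("");
        System.out.println("empty stream: ready=" + empty[0] + ", hasData=" + empty[1]);
    }
}
```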

isTokenNext

public boolean isTokenNext(String token)
                    throws IllegalArgumentException,
                           IOException
Returns isTokenNext(token, true) (i.e. is always case sensitive).

Throws:
IllegalArgumentException
IOException

isTokenNext

public boolean isTokenNext(String token,
                           boolean isCaseSensitive)
                    throws IllegalArgumentException,
                           IOException
Determines if token's chars next occur on the stream.

From the caller's perspective, the stream's state is unaffected by this method. (Internally, various buffer operations may have to be performed.)

Parameters:
isCaseSensitive - if true, specifies that case matters in matching the chars of token; false means that case is irrelevant
Throws:
IllegalArgumentException - if token is null or zero-length
IOException - if an I/O problem occurs
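
A lookahead of this kind, which leaves the stream unchanged from the caller's perspective, can be sketched with java.io.PushbackReader: read up to token.length() chars, push them all back, then compare. TokenPeekDemo and peek are hypothetical names, and the PushbackReader must be created with a capacity of at least token.length() for the sketch to work (a restriction the documented class does not have).

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of an isTokenNext-style lookahead using java.io.PushbackReader.
class TokenPeekDemo {
    static boolean isTokenNext(PushbackReader in, String token, boolean isCaseSensitive)
            throws IOException {
        if (token == null || token.isEmpty()) {
            throw new IllegalArgumentException("token is null or zero-length");
        }
        char[] seen = new char[token.length()];
        int count = 0;
        while (count < seen.length) {
            int n = in.read(seen, count, seen.length - count);
            if (n == -1) {
                break; // stream ended before token.length() chars were available
            }
            count += n;
        }
        in.unread(seen, 0, count); // restore everything that was read
        if (count < seen.length) {
            return false;
        }
        return token.regionMatches(!isCaseSensitive, 0, new String(seen), 0, seen.length);
    }

    /** Demo helper: returns "<result>:<first char still on the stream>". */
    static String peek(String text, String token, boolean isCaseSensitive) {
        try (PushbackReader in = new PushbackReader(new StringReader(text), token.length())) {
            boolean found = isTokenNext(in, token, isCaseSensitive);
            return found + ":" + (char) in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(peek("Hello world", "hello", false));
        System.out.println(peek("Hello world", "hello", true));
    }
}
```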

skipTillTokenNext

public long skipTillTokenNext(String token)
                       throws IllegalArgumentException,
                              IOException
Returns skipTillTokenNext(token, true) (i.e. is always case sensitive).

Throws:
IllegalArgumentException
IOException

skipTillTokenNext

public long skipTillTokenNext(String token,
                              boolean isCaseSensitive)
                       throws IllegalArgumentException,
                              IOException
Skips over as many chars as necessary until token is next on the stream.

Parameters:
isCaseSensitive - if true, specifies that case matters in matching the chars of token; false means that case is irrelevant
Returns:
the number of chars skipped over before token was found; returns -1 if hit end of stream first
Throws:
IllegalArgumentException - if token is null or zero-length
IOException - if an I/O problem occurs
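
The return value described above (chars skipped before token, or -1 at end of stream) can be sketched with a sliding window over the last token.length() chars read. SkipTillTokenDemo and demo are hypothetical names; the sketch handles only the case-sensitive variant, and what it leaves on the stream when it returns -1 is an assumption, since the documentation does not say.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical, case-sensitive sketch of a skipTillTokenNext-style scan.
class SkipTillTokenDemo {
    static long skipTillTokenNext(PushbackReader in, String token) throws IOException {
        StringBuilder window = new StringBuilder(); // the last token.length() chars read
        long read = 0;
        int c;
        while ((c = in.read()) != -1) {
            read++;
            window.append((char) c);
            if (window.length() > token.length()) {
                window.deleteCharAt(0);
            }
            if (token.contentEquals(window)) {
                in.unread(token.toCharArray()); // token must be next on the stream
                return read - token.length();
            }
        }
        return -1; // end of stream reached without finding token
    }

    /** Demo helper: returns "<skipped>:<rest of stream>". */
    static String demo(String text, String token) {
        try (PushbackReader in = new PushbackReader(new StringReader(text), token.length())) {
            long skipped = skipTillTokenNext(in, token);
            StringBuilder rest = new StringBuilder();
            for (int c; (c = in.read()) != -1; ) {
                rest.append((char) c);
            }
            return skipped + ":" + rest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("abc--def", "--"));
    }
}
```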

confirmTokenNext

public void confirmTokenNext(String token)
                      throws IllegalArgumentException,
                             IOException,
                             ParseException
Calls confirmTokenNext(token, true) (i.e. is always case sensitive).

Throws:
IllegalArgumentException
IOException
ParseException

confirmTokenNext

public void confirmTokenNext(String token,
                             boolean isCaseSensitive)
                      throws IllegalArgumentException,
                             IOException,
                             ParseException
Confirms that the supplied token's chars next occur on the stream. Chars are read off the stream until either a mismatch is encountered or all of token's chars have been matched.

If token is fully read, then the stream's state is affected by this method: all the values are read off and the line number count may be increased. If a mismatch is encountered, however, the stream state is restored before return which, in this case, will be abnormal termination with a ParseException thrown. Abnormal termination thru any other exception, however, may leave the stream in an indeterminate state.

Throws:
IllegalArgumentException - if token is null or zero-length
IOException - if an I/O problem occurs
ParseException - if token is not fully matched by the stream's next contents; the stream's state will be restored before return
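
The consume-on-match, restore-on-mismatch behavior described above can be sketched as follows. ConfirmTokenDemo and tryConfirm are hypothetical names, the sketch is case sensitive only, and the use of java.text.ParseException is an assumption: the documentation names ParseException without giving its package.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.ParseException;

// Hypothetical, case-sensitive sketch of a confirmTokenNext-style check.
class ConfirmTokenDemo {
    static void confirmTokenNext(PushbackReader in, String token)
            throws IOException, ParseException {
        char[] seen = new char[token.length()];
        for (int i = 0; i < seen.length; i++) {
            int c = in.read();
            if (c != token.charAt(i)) {
                // Restore the stream: the mismatching char goes back first,
                // then the chars that matched before it.
                if (c != -1) {
                    in.unread(c);
                }
                in.unread(seen, 0, i);
                throw new ParseException("expected \"" + token + "\"", i);
            }
            seen[i] = (char) c;
        }
    }

    /** Demo helper: returns "ok:<rest>" on a match, "mismatch:<rest>" otherwise. */
    static String tryConfirm(String text, String token) {
        try (PushbackReader in = new PushbackReader(new StringReader(text), token.length())) {
            String status;
            try {
                confirmTokenNext(in, token);
                status = "ok:";
            } catch (ParseException e) {
                status = "mismatch:";
            }
            StringBuilder rest = new StringBuilder();
            for (int c; (c = in.read()) != -1; ) {
                rest.append((char) c);
            }
            return status + rest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryConfirm("<tag>body", "<tag>"));
        System.out.println(tryConfirm("<tab>body", "<tag>"));
    }
}
```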

readThruToken

public String readThruToken(String token)
                     throws IllegalArgumentException,
                            IOException,
                            IllegalStateException
Returns readThruToken(token, true, false) (i.e. is always case sensitive and excludes token from result).

Throws:
IllegalArgumentException
IOException
IllegalStateException

readThruToken

public String readThruToken(String token,
                            boolean isCaseSensitive,
                            boolean includeToken)
                     throws IllegalArgumentException,
                            IOException,
                            IllegalStateException
Reads over as many chars as necessary until token is read thru.

Parameters:
isCaseSensitive - if true, specifies that case matters in matching the chars of token; false means that case is irrelevant
includeToken - if true, specifies that token's chars are included (at the end) of the result; false means that they are left out; note that if includeToken is true and isCaseSensitive is false, then the case of these token chars included in the result may differ from the case of what occurred on the stream
Returns:
all chars which were read up to token plus, if includeToken is true, token itself (otherwise token is excluded)
Throws:
IllegalArgumentException - if token is null or zero-length
IOException - if an I/O problem occurs
IllegalStateException - if token cannot be read thru
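
The effect of the includeToken parameter can be sketched over any Reader: accumulate chars until the accumulated text ends with token, then keep or trim the token. ReadThruTokenDemo and demo are hypothetical names, the sketch is case sensitive only, and throwing IllegalStateException at end of stream is an assumed reading of "fail to read thru token".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical, case-sensitive sketch of a readThruToken-style read.
class ReadThruTokenDemo {
    static String readThruToken(Reader in, String token, boolean includeToken)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
            if (endsWith(sb, token)) {
                if (!includeToken) {
                    sb.setLength(sb.length() - token.length()); // trim the token off the end
                }
                return sb.toString();
            }
        }
        throw new IllegalStateException("end of stream reached before token \"" + token + "\"");
    }

    private static boolean endsWith(StringBuilder sb, String token) {
        int offset = sb.length() - token.length();
        return offset >= 0 && sb.indexOf(token, offset) == offset;
    }

    /** Demo helper that wraps the checked IOException. */
    static String demo(String text, String token, boolean includeToken) {
        try (StringReader in = new StringReader(text)) {
            return readThruToken(in, token, includeToken);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("key=value;rest", "=", false));
        System.out.println(demo("key=value;rest", "=", true));
    }
}
```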