bb.io
Class FileParser

java.lang.Object
  extended by bb.io.FileParser
All Implemented Interfaces:
Closeable

public class FileParser
extends Object
implements Closeable

Many file formats consist of lines of data, with tokens of data on each line being separated by a constant set of delimiters. Familiar examples are tab, space, and comma delimited files. This class was written to aid the parsing of such file types.

You simply construct an instance for the desired file, along with regular expressions for the delimiter token(s) and nondata lines (e.g. comment or blank lines). Then you may repeatedly call readDataLine and process the data. When finished, call close.
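
The workflow described above can be approximated with JDK classes alone. The following is a minimal sketch, not FileParser's actual implementation: the class name DelimitedLineDemo and its helper methods are hypothetical stand-ins, and it assumes that nondata lines are detected by a full-line regex match and that tokens are produced by splitting each data line on the delimiter pattern.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in that mimics the documented workflow with JDK classes only.
public class DelimitedLineDemo {
    // The example regular expressions from the constructor documentation.
    private static final Pattern DELIMITER = Pattern.compile("[ ]+|[\\t,]");
    private static final Pattern NONDATA = Pattern.compile("#.*|\\s*");

    // Returns the tokens of the next data line, or null at end of input.
    static String[] readDataLine(BufferedReader in) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (NONDATA.matcher(line).matches()) {
                    continue; // skip comment and blank lines
                }
                return DELIMITER.split(line);
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every data line, then closes the reader (try-with-resources).
    static List<String[]> parseAll(String text) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String[] tokens;
            while ((tokens = readDataLine(in)) != null) {
                rows.add(tokens);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "# header comment\nalpha,beta\n\n1\t2  3\n";
        for (String[] row : parseAll(text)) {
            System.out.println(String.join("|", row));
        }
        // prints "alpha|beta" then "1|2|3"
    }
}
```

With the real class, the same loop shape applies: construct the parser, call readDataLine until it returns null, then call close.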

Warning: parsing tab, space, or comma delimited files becomes considerably more complicated if the tokens themselves may contain any of the token delimiters. In that case, you need to know how the delimiter is escaped so that it can appear inside a token (e.g. Excel may put double quotes around such tokens).
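
To make the warning concrete, the sketch below splits a comma delimited line whose middle token contains a quoted comma. It uses only java.util.regex, and the class name QuotedDelimiterDemo and the sample line are made up for illustration. It assumes a plain regex split, which has no notion of quoting.

```java
import java.util.regex.Pattern;

// Hypothetical demo: a plain regex split cannot honor quoted delimiters.
public class QuotedDelimiterDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    // Splits on every comma, including commas inside double quotes.
    static String[] naiveSplit(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        // Three logical fields; the second one contains a comma inside quotes.
        String line = "Smith,\"Portland, OR\",42";
        String[] tokens = naiveSplit(line);
        System.out.println(tokens.length); // prints 4, not 3
        for (String token : tokens) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The quoted field is broken into two tokens, so files that escape delimiters this way need a parser that understands the escaping convention.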

This class is not multithread safe.

Author:
Brent Boyer

Field Summary
private  File file
           
private  ParseReader in
           
private  int lastLineNumber
           
private  Pattern nondataLinePattern
           
private  Pattern tokenDelimiterPattern
           
 
Constructor Summary
FileParser(File file, String tokenDelimiterRegexp, String nondataLineRegexp)
          Constructor.
 
Method Summary
 void close()
          Closes all resources associated with the parsing.
 String getLocation()
          Returns the location (line # and file path) associated with the previous call to readDataLine.
 boolean isNonDataLine(String line)
           
 String[] readDataLine()
          Reads the next line of data for the file, parses all the tokens on that line (using tokenDelimiterRegexp), and returns them.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

file

private final File file

in

private final ParseReader in

tokenDelimiterPattern

private final Pattern tokenDelimiterPattern

nondataLinePattern

private final Pattern nondataLinePattern

lastLineNumber

private int lastLineNumber

Constructor Detail

FileParser

public FileParser(File file,
                  String tokenDelimiterRegexp,
                  String nondataLineRegexp)
           throws IllegalArgumentException,
                  IllegalStateException,
                  SecurityException,
                  IOException,
                  UnsupportedEncodingException,
                  PatternSyntaxException
Constructor.

Parameters:
tokenDelimiterRegexp - regular expression to match token delimiters (e.g. "[ ]+|[\\t,]" matches one or more spaces, or a single tab or comma)
nondataLineRegexp - regular expression to match nondata lines (e.g. "#.*|\\s*" matches any line which starts with '#' or which is empty or all whitespace); may be null, in which case every line is treated as a data line
Throws:
IllegalArgumentException - if file is null, does not exist, is a directory, or refers to a file that cannot be read by this application; or if tokenDelimiterRegexp is null
IllegalStateException - if file holds more than Integer.MAX_VALUE bytes (which cannot be held in a Java array)
SecurityException - if a security manager exists and its SecurityManager.checkRead(java.lang.String) method denies read access to file
IOException - if an I/O problem occurs
UnsupportedEncodingException - if the default char encoding used by ParseReader is not supported (this should never happen)
PatternSyntaxException - if either regex's syntax is invalid
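
The two example regular expressions from the parameter descriptions can be tried out directly with java.util.regex. This is only a sketch under assumptions: the class name RegexpExamplesDemo and its helpers are hypothetical, and it assumes the delimiter pattern is applied with Pattern.split and the nondata pattern with a full-line match, which this documentation does not confirm.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo of the example regular expressions from the parameter docs.
public class RegexpExamplesDemo {
    static final Pattern TOKEN_DELIMITER = Pattern.compile("[ ]+|[\\t,]");
    static final Pattern NONDATA_LINE = Pattern.compile("#.*|\\s*");

    // Splits a line on runs of spaces, or on a single tab or comma.
    static String[] tokens(String line) {
        return TOKEN_DELIMITER.split(line);
    }

    // True if the whole line is a '#' comment, empty, or all whitespace.
    static boolean isNondata(String line) {
        return NONDATA_LINE.matcher(line).matches();
    }

    // False if compiling the regexp throws PatternSyntaxException.
    static boolean isValidRegexp(String regexp) {
        try {
            Pattern.compile(regexp);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tokens("a   b\tc,d").length); // 4
        System.out.println(tokens("a, b").length);       // 3: empty token between ',' and ' '
        System.out.println(isNondata("# comment"));      // true
        System.out.println(isNondata("   "));            // true
        System.out.println(isNondata("  # indented"));   // false: '#' is not the first character
        System.out.println(isValidRegexp("["));          // false: unclosed character class
    }
}
```

Note that with this delimiter expression a comma followed by a space counts as two separate delimiters, so an empty token appears between them.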

Method Detail

readDataLine

public String[] readDataLine()
                      throws IOException
Reads the next line of data for the file, parses all the tokens on that line (using tokenDelimiterRegexp), and returns them. Any nondata lines encountered are skipped over. If end of file is encountered, then null is returned.

Throws:
IOException - if an I/O problem occurs

isNonDataLine

public boolean isNonDataLine(String line)

getLocation

public String getLocation()
                   throws IllegalStateException
Returns the location (line # and file path) associated with the previous call to readDataLine. Typically use this method when reporting errors associated with the data obtained from that call.

Throws:
IllegalStateException - if getLocation called before readDataLine has ever been called
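
The location tracking described above can be imitated with java.io.LineNumberReader. The sketch below is a hypothetical stand-in, not FileParser's code: the class name LocationDemo, the path "data.csv", and the exact format of the location string are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in showing line-number tracking for error messages.
public class LocationDemo {
    private final LineNumberReader in;
    private final String path;
    private int lastLineNumber = -1; // -1 means no line has been read yet

    LocationDemo(String path, String text) {
        this.path = path;
        this.in = new LineNumberReader(new StringReader(text));
    }

    // Reads the next line and remembers its 1-based line number.
    String readLine() {
        try {
            String line = in.readLine();
            if (line != null) {
                lastLineNumber = in.getLineNumber();
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed format; the real getLocation may format its result differently.
    String getLocation() {
        if (lastLineNumber < 0) {
            throw new IllegalStateException("getLocation called before any line was read");
        }
        return "line " + lastLineNumber + " of " + path;
    }

    public static void main(String[] args) {
        LocationDemo demo = new LocationDemo("data.csv", "a,b\nx,oops\n");
        demo.readLine();
        demo.readLine();
        // Typical use: prefix an error message with the location of the bad data.
        System.out.println("Bad number at " + demo.getLocation());
        // prints "Bad number at line 2 of data.csv"
    }
}
```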

close

public void close()
Closes all resources associated with the parsing.

Specified by:
close in interface Closeable