bb.io
Class TarUtil

java.lang.Object
  extended by bb.io.TarUtil

public final class TarUtil
extends Object

Provides static utility methods for dealing with TAR Files.

This class is multithread safe: it is immutable (both its immediate state, as well as the deep state of its fields).

Author:
Brent Boyer
See Also:
Wikipedia TAR file format, BSD TAR file format, TAR file format, Forum posting on TAR and Java, jakarta compression library

Nested Class Summary
static class TarUtil.UnitTest
          See the Overview page of the project's javadocs for a general description of this unit test class.
 
Field Summary
private static String appendAll_key
           
private static String appendBackup_key
           
private static String appendExtension_key
           
private static String appendTimeStamp_key
           
private static String directoryExtraction_key
           
private static String filter_key
           
private static boolean giveUserFeedback
           
private static List<String> keysLegal_archive
          Specifies all the switch keys which can legally appear as command line arguments to main for an archive.
private static List<String> keysLegal_extract
          Specifies all the switch keys which can legally appear as command line arguments to main for an extract.
private static String overwrite_key
           
private static String pathsToArchive_key
           
private static long tarableFileSizeLimit
          Maximum size of a file that can be put into a TAR archive file by this class.
private static String tarFile_key
           
 
Constructor Summary
private TarUtil()
          This sole private constructor suppresses the default (public) constructor, ensuring non-instantiability outside of this class.
 
Method Summary
static void archive(File tarFile, FileFilter filter, File... pathsToArchive)
          Writes each element of pathsToArchive to a new TAR format archive file specified by tarFile.
private static void archive(File path, FileParent fileParent, org.apache.commons.compress.archivers.tar.TarArchiveOutputStream tarArchiveOutputStream, FileFilter filter)
          Writes path as a new TarArchiveEntry to tarArchiveOutputStream.
static void extract(File tarFile, File directoryExtraction, boolean overwrite)
          Extracts the contents of tarFile to directoryExtraction.
static org.apache.commons.compress.archivers.tar.TarArchiveEntry[] getEntries(File tarFile, boolean sortResult)
          Returns all the TarArchiveEntrys inside tarFile.
static org.apache.commons.compress.archivers.tar.TarArchiveEntry[] getEntries(org.apache.commons.compress.archivers.tar.TarArchiveInputStream tarArchiveInputStream, boolean sortResult)
          Returns all the TarArchiveEntrys that can next be read by tarArchiveInputStream.
private static InputStream getInputStream(File tarFile)
          If tarFile's extension is simply tar, then returns a new FileInputStream.
private static OutputStream getOutputStream(File tarFile)
          If tarFile's extension is simply tar, then returns a new FileOutputStream.
static boolean isTarable(File path)
          If path is a directory, then returns true.
static void main(String[] args)
          May be used either to archive to or extract from a TAR file.
private static void readInFile(File path, OutputStream out)
          Reads all the bytes from path and writes them to out.
private static void writeOutFile(org.apache.commons.compress.archivers.tar.TarArchiveInputStream tarArchiveInputStream, File path)
          Writes all the bytes from tarArchiveInputStream's current TarArchiveEntry to path.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tarFile_key

private static final String tarFile_key
See Also:
Constant Field Values

pathsToArchive_key

private static final String pathsToArchive_key
See Also:
Constant Field Values

appendBackup_key

private static final String appendBackup_key
See Also:
Constant Field Values

appendTimeStamp_key

private static final String appendTimeStamp_key
See Also:
Constant Field Values

appendExtension_key

private static final String appendExtension_key
See Also:
Constant Field Values

appendAll_key

private static final String appendAll_key
See Also:
Constant Field Values

filter_key

private static final String filter_key
See Also:
Constant Field Values

directoryExtraction_key

private static final String directoryExtraction_key
See Also:
Constant Field Values

overwrite_key

private static final String overwrite_key
See Also:
Constant Field Values

keysLegal_archive

private static final List<String> keysLegal_archive
Specifies all the switch keys which can legally appear as command line arguments to main for an archive.


keysLegal_extract

private static final List<String> keysLegal_extract
Specifies all the switch keys which can legally appear as command line arguments to main for an extract.


tarableFileSizeLimit

private static final long tarableFileSizeLimit
Maximum size of a file that can be put into a TAR archive file by this class. The value here is for the classic TAR format, namely (2^33) - 1 = 8,589,934,591 bytes (one byte less than 8 GB):
For historical reasons numerical values are encoded in octal with leading zeroes. The final character is either a null or a space. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes on archived files. To overcome this limitation some versions of tar, including the GNU implementation, support an extension in which the file size is encoded in binary.
Tar (file format)

See Also:
Constant Field Values
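
The arithmetic behind this limit can be checked with a minimal sketch. The class and method names below are hypothetical and are not part of TarUtil; the only facts used are the ones quoted above (11 usable octal digits in the size field).

```java
final class TarSizeLimitSketch {
    /** Usable octal digits in the classic 12-byte size field (the last byte is a null or a space). */
    static final int OCTAL_DIGITS = 11;

    /** Largest value that fits in OCTAL_DIGITS octal digits: 8^11 - 1 = 2^33 - 1. */
    static long maxClassicTarFileSize() {
        // each octal digit carries 3 bits, so 11 digits carry 33 bits
        return (1L << (3 * OCTAL_DIGITS)) - 1;
    }

    public static void main(String[] args) {
        long limit = maxClassicTarFileSize();
        System.out.println(limit);                     // 8589934591
        System.out.println(Long.toOctalString(limit)); // 77777777777 (eleven 7s)
    }
}
```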

giveUserFeedback

private static final boolean giveUserFeedback
See Also:
Constant Field Values
Constructor Detail

TarUtil

private TarUtil()
This sole private constructor suppresses the default (public) constructor, ensuring non-instantiability outside of this class.

Method Detail

main

public static void main(String[] args)
May be used either to archive to or extract from a TAR file. The action to perform and all of its specifications are embedded as command line switches in args.

If archiving, the source path(s) to be archived (which can be either normal files or directories) are specified as the (key/value) command line switch -pathsToArchive commaSeparatedListOfPaths. The target TAR archive file is the (key/value) command line switch -tarFile insertPathHere. The following optional switches may also be supplied:

  1. -appendBackup appends _backup to the TAR file's name
  2. -appendTimeStamp appends _ followed by a timestamp to the TAR file's name
  3. -appendExtension appends .tar to the TAR file's name
  4. -appendAll is equivalent to supplying all the above append options
  5. -filter fullyQualifiedClassName specifies the name of a FileFilter which limits what gets archived. Since this FileFilter class will be instantiated by a call to Class.forName, it must have a no-arg constructor.
For example, here is a complete command line that archives just the class files found under two different class directories:

                java  bb.io.TarUtil  -tarFile ../log/test.tar  -pathsToArchive ../class1,../class2  -filter bb.io.filefilter.ClassFilter
 

If extracting, the target directory to extract into is always specified as the command line switch -directoryExtraction insertPathHere. The source TAR archive file is the same -tarFile command line switch mentioned before. An optional switch -overwrite true/false may also be supplied to control whether overwriting of existing normal files is allowed. By default overwriting is not allowed (an Exception will be thrown if extraction needs to overwrite an existing file). For example, here is a complete command line that extracts a TAR file to a specific directory, overwriting any existing files:


                java  bb.io.TarUtil  -tarFile ../log/test.tar  -directoryExtraction ../log/tarExtractOutput  -overwrite true
 

Optional GZIP compression/decompression may also be done when archiving/extracting a TAR file. Normally, the value for the -tarFile switch must be a path which ends in a ".tar" (case insensitive) extension. However, this program will also accept either ".tar.gz" or ".tgz" extensions, in which case it will automatically perform GZIP compression/decompression on the TAR file.

Note that the switches may appear in any order on the command line.

If this method is this Java process's entry point (i.e. first main method), then its final action is a call to System.exit, which means that this method never returns; its exit code is 0 if it executes normally, 1 if it throws a Throwable (which will be caught and logged). Otherwise, this method returns and leaves the JVM running.
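
The switch conventions described above (key/value pairs, value-less flags, any order) can be illustrated with a minimal sketch. This is not the parsing code used by this class: the names SwitchParsingSketch and parse are hypothetical, and mapping a value-less switch such as -appendBackup to an empty string is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SwitchParsingSketch {
    /**
     * Collects "-key value" pairs and bare "-key" flags into a map.
     * Assumption: a token is a value only if it does not start with '-'.
     */
    static Map<String, String> parse(String[] args) {
        Map<String, String> switches = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String token = args[i];
            if (!token.startsWith("-")) {
                throw new IllegalArgumentException("expected a switch key but found: " + token);
            }
            String key = token.substring(1);
            String value = "";  // bare flag, e.g. -appendBackup
            if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                value = args[++i];  // consume the value that follows the key
            }
            switches.put(key, value);
        }
        return switches;
    }
}
```

With this sketch, the value stored under pathsToArchive would still be the raw comma separated list; splitting it with String.split(",") yields the individual paths.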


isTarable

public static boolean isTarable(File path)
                         throws IllegalArgumentException,
                                SecurityException
If path is a directory, then returns true. Else if path is a normal file, then returns true if path's length (in bytes) is <= tarableFileSizeLimit, false otherwise. Else returns false.

Throws:
IllegalArgumentException - if path == null or path does not exist
SecurityException - if a security manager exists and its SecurityManager.checkRead method denies read access to path
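
The decision rule described above can be sketched as follows. This is an illustrative reimplementation under the stated contract, not the actual source: the class name IsTarableSketch and the constant LIMIT are hypothetical stand-ins for this class and its private tarableFileSizeLimit field.

```java
import java.io.File;

final class IsTarableSketch {
    /** Stand-in for tarableFileSizeLimit: (2^33) - 1 bytes. */
    static final long LIMIT = (1L << 33) - 1;

    static boolean isTarable(File path) {
        if (path == null || !path.exists()) {
            throw new IllegalArgumentException("path is null or does not exist: " + path);
        }
        if (path.isDirectory()) {
            return true;                    // directories are always tarable
        }
        if (path.isFile()) {
            return path.length() <= LIMIT;  // normal files must fit in the size field
        }
        return false;                       // anything else (e.g. a special file)
    }
}
```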

archive

public static void archive(File tarFile,
                           FileFilter filter,
                           File... pathsToArchive)
                    throws Exception
Writes each element of pathsToArchive to a new TAR format archive file specified by tarFile. If any element is a directory, the entire contents of its directory tree will be archived (as limited by filter). Paths that would otherwise be archived may be screened out by supplying a non null value for filter.

Although this method does not use DirUtil.getTree, it uses filter to control subdirectory exploration in a similar manner.

In general, the path stored in the archive is the path relative to the parent of the relevant element of pathsToArchive. For example, suppose that some element of pathsToArchive corresponds to D:/someDirectory, and suppose that that directory contains the subdirectory and child file D:/someDirectory/anotherDirectory/childFile. Then the paths stored in the archive are anotherDirectory and anotherDirectory/childFile respectively.

One complication with the above scheme is paths which are file system roots: they have no parents. Examples include the Windows path C: or the Unix path /. In cases like these, this method uses an imaginary parent name of the form rootXXX (where XXX is an integer). For example, on a Windows machine, if pathsToArchive contains the paths C: and D:, then the contents of C: might be stored in the archive with a path that starts with root1, and the contents of D: may have an archive path that starts with root2. This behavior ensures that the archive preserves the separate origins of the 2 sources, which is necessary so that they do not get mixed when extracted.

The TAR archive written by this method will use GNU TAR rules for the entry headers if long path names are encountered. This means that standard POSIX compliant programs that do not support the GNU TAR extension will be unable to extract the contents.

Optional GZIP compression may also be done. Normally, tarFile must be a path which ends in a ".tar" (case insensitive) extension. However, this method will also accept either ".tar.gz" or ".tgz" extensions, in which case it will perform GZIP compression on tarFile as part of archiving.

Parameters:
tarFile - the TAR file that the archive data will be written to
filter - a FileFilter that can be used to screen out paths from being written to the archive; may be null, which means everything inside pathsToArchive gets archived; if not null, see warnings in DirUtil.getTree on directory acceptance
pathsToArchive - array of all the paths to archive
Throws:
Exception - if any Throwable is caught; the Throwable is stored as the cause, and the message stores the path of tarFile; here are some of the possible causes:
  1. IllegalArgumentException if pathsToArchive == null; pathsToArchive.length == 0; tarFile == null; tarFile already exists and either is not a normal file or is but already has data inside it; tarFile has an invalid extension; any element of pathsToArchive is null, does not exist, cannot be read, is equal to tarFile, its path contains tarFile, or it fails isTarable
  2. SecurityException if a security manager exists and its SecurityManager.checkRead method denies read access to some path
  3. IOException if an I/O problem occurs

getOutputStream

private static OutputStream getOutputStream(File tarFile)
                                     throws IllegalArgumentException,
                                            IOException
If tarFile's extension is simply tar, then returns a new FileOutputStream. Else if tarFile's extension is tar.gz or tgz, then returns a new GZIPOutputStream wrapping a new FileOutputStream.

Note: the result is never buffered, since the TarArchiveOutputStream which will use the result always has an internal buffer.

Throws:
IllegalArgumentException - if tarFile has an unrecognized extension
IOException - if an I/O problem occurs
See Also:
gzip home page, .tar.gz file format FAQ
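
The extension-based choice of stream described above can be sketched with the standard library alone. The names TarStreamSketch, isGzipped and openForWriting are hypothetical, and the actual private method may differ in its details.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

final class TarStreamSketch {
    /** True for .tar.gz or .tgz, false for .tar (case insensitive); anything else is rejected. */
    static boolean isGzipped(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".tar.gz") || lower.endsWith(".tgz")) return true;
        if (lower.endsWith(".tar")) return false;
        throw new IllegalArgumentException("unrecognized TAR extension: " + fileName);
    }

    /** Opens an unbuffered stream, wrapped in GZIP compression when the extension asks for it. */
    static OutputStream openForWriting(File tarFile) throws IOException {
        boolean gzip = isGzipped(tarFile.getName());  // validate before touching the file
        OutputStream out = new FileOutputStream(tarFile);
        if (!gzip) return out;
        try {
            return new GZIPOutputStream(out);         // writes the GZIP header immediately
        } catch (IOException e) {
            out.close();                              // do not leak the file handle
            throw e;
        }
    }
}
```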

archive

private static void archive(File path,
                            FileParent fileParent,
                            org.apache.commons.compress.archivers.tar.TarArchiveOutputStream tarArchiveOutputStream,
                            FileFilter filter)
                     throws Exception
Writes path as a new TarArchiveEntry to tarArchiveOutputStream. If path is a normal file, then next writes path's data to tarArchiveOutputStream.

If path is a directory, then this method additionally calls itself on the contents (thus recursing through the entire directory tree).

Warning: several popular programs (e.g. winzip) fail to display mere directory entries. Furthermore, if just a directory entry is present (i.e. it is empty), they also may fail to create a new empty directory when extracting the TAR file's contents. These are bugs in their behavior.

An optional FileFilter can be supplied to screen out paths that would otherwise be archived.

This method does not close tarArchiveOutputStream: that is the responsibility of the caller.

The caller also must take on the responsibility to not do anything stupid, like write path more than once, or have the path be the same File that tarArchiveOutputStream is writing to.

Parameters:
path - the File to archive
fileParent - the FileParent for path
tarArchiveOutputStream - the TarArchiveOutputStream that the archive data will be written to
filter - a FileFilter that can be used to screen out certain files from being written to the archive; may be null (so everything specified by path gets archived)
Throws:
Exception - if any Throwable is caught; the Throwable is stored as the cause, and the message stores path's information; here are some of the possible causes:
  1. IllegalArgumentException if path fails isTarable
  2. SecurityException if a security manager exists and its SecurityManager.checkRead method denies read access to path
  3. IOException if an I/O problem occurs
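
The recursion with an optional filter can be sketched without any TAR library. The sketch below only collects the paths that would be visited; the names FilteredWalkSketch and collect are hypothetical, and it assumes that a directory rejected by filter is skipped together with its entire subtree (the actual rules are the ones documented for DirUtil.getTree).

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Arrays;
import java.util.List;

final class FilteredWalkSketch {
    /** Adds path, and recursively its contents, to accepted unless filter rejects them. */
    static void collect(File path, FileFilter filter, List<File> accepted) {
        if (filter != null && !filter.accept(path)) {
            return;                       // assumption: a rejected directory is not explored
        }
        accepted.add(path);               // stands in for writing a TarArchiveEntry
        if (path.isDirectory()) {
            File[] children = path.listFiles();
            if (children == null) return; // directory could not be listed
            Arrays.sort(children);        // deterministic order for the sketch
            for (File child : children) {
                collect(child, filter, accepted);
            }
        }
    }
}
```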

readInFile

private static void readInFile(File path,
                               OutputStream out)
                        throws IOException
Reads all the bytes from path and writes them to out.

Throws:
IOException - if an I/O problem occurs

getEntries

public static org.apache.commons.compress.archivers.tar.TarArchiveEntry[] getEntries(File tarFile,
                                                                                     boolean sortResult)
                                                                              throws IllegalArgumentException,
                                                                                     IOException
Returns all the TarArchiveEntrys inside tarFile.

Parameters:
tarFile - the TAR format file to be read
sortResult - if true, then the result is first sorted by each entry's name before return; otherwise the order is the sequence read from tarFile
Throws:
IllegalArgumentException - if tarFile fails Check.validFile
IOException - if an I/O problem occurs

getEntries

public static org.apache.commons.compress.archivers.tar.TarArchiveEntry[] getEntries(org.apache.commons.compress.archivers.tar.TarArchiveInputStream tarArchiveInputStream,
                                                                                     boolean sortResult)
                                                                              throws IllegalArgumentException,
                                                                                     IOException
Returns all the TarArchiveEntrys that can next be read by tarArchiveInputStream.

Nothing should have been previously read from tarArchiveInputStream if the full result is desired. Nothing more can be read from tarArchiveInputStream when this method returns, since the final action will be to close tarArchiveInputStream.

Parameters:
tarArchiveInputStream - the TarArchiveInputStream to get the entries from
sortResult - if true, then the result is first sorted by each entry's name before return; otherwise the order is the sequence read from tarArchiveInputStream
Throws:
IllegalArgumentException - if tarArchiveInputStream == null
IOException - if an I/O problem occurs

extract

public static void extract(File tarFile,
                           File directoryExtraction,
                           boolean overwrite)
                    throws IllegalArgumentException,
                           SecurityException,
                           IllegalStateException,
                           IOException
Extracts the contents of tarFile to directoryExtraction.

It is an error if tarFile does not exist, is not a normal file, or is not in the proper TAR format. In contrast, directoryExtraction need not exist, since it (and any parent directories) will be created if necessary.

Optional GZIP decompression may also be done. Normally, tarFile must be a path which ends in a ".tar" (case insensitive) extension. However, this method will also accept either ".tar.gz" or ".tgz" extensions, in which case it will perform GZIP decompression on tarFile as part of extracting.

Parameters:
tarFile - the TAR archive file
directoryExtraction - the directory into which the contents of tarFile will be extracted
overwrite - specifies whether or not extraction is allowed to overwrite an existing normal file inside directoryExtraction
Throws:
IllegalArgumentException - if tarFile is not valid or has an invalid extension; if directoryExtraction fails DirUtil.ensureExists
SecurityException - if a security manager exists and its SecurityManager.checkRead method denies read access to tarFile or directoryExtraction
IllegalStateException - if directoryExtraction failed to be created or is not an actual directory but is some other type of file
IOException - if an I/O problem occurs
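
The overwrite rule and the on-demand creation of directories can be sketched as below. The names ExtractTargetSketch and prepareTarget are hypothetical, and throwing FileAlreadyExistsException when overwriting is refused is an assumption: this documentation does not say which exception type the real method uses in that case.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.util.Objects;

final class ExtractTargetSketch {
    /** Resolves where an entry would be written, refusing to overwrite unless allowed. */
    static File prepareTarget(File directoryExtraction, String entryName, boolean overwrite) throws IOException {
        Objects.requireNonNull(directoryExtraction, "directoryExtraction");
        File target = new File(directoryExtraction, entryName);
        if (target.isFile() && !overwrite) {
            // assumption: the exception type is chosen for illustration only
            throw new FileAlreadyExistsException(target.getPath());
        }
        File parent = target.getParentFile();
        if (!parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("failed to create directory " + parent);
        }
        return target;
    }
}
```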

getInputStream

private static InputStream getInputStream(File tarFile)
                                   throws IllegalArgumentException,
                                          IOException
If tarFile's extension is simply tar, then returns a new FileInputStream. Else if tarFile's extension is tar.gz or tgz, then returns a new GZIPInputStream wrapping a new FileInputStream.

Note: the result is never buffered, since the TarArchiveInputStream which will use the result always has an internal buffer.

Throws:
IllegalArgumentException - if tarFile has an unrecognized extension
IOException - if an I/O problem occurs
See Also:
gzip home page, .tar.gz file format FAQ

writeOutFile

private static void writeOutFile(org.apache.commons.compress.archivers.tar.TarArchiveInputStream tarArchiveInputStream,
                                 File path)
                          throws IOException
Writes all the bytes from tarArchiveInputStream's current TarArchiveEntry to path.

Throws:
IOException - if an I/O problem occurs