356 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			Groff
		
	
	
	
	
	
			
		
		
	
	
			356 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			Groff
		
	
	
	
	
	
| .\" Copyright (c) 2003-2009 Tim Kientzle
 | |
| .\" All rights reserved.
 | |
| .\"
 | |
| .\" Redistribution and use in source and binary forms, with or without
 | |
| .\" modification, are permitted provided that the following conditions
 | |
| .\" are met:
 | |
| .\" 1. Redistributions of source code must retain the above copyright
 | |
| .\"    notice, this list of conditions and the following disclaimer.
 | |
| .\" 2. Redistributions in binary form must reproduce the above copyright
 | |
| .\"    notice, this list of conditions and the following disclaimer in the
 | |
| .\"    documentation and/or other materials provided with the distribution.
 | |
| .\"
 | |
| .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 | |
| .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 | |
| .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 | |
| .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 | |
| .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 | |
| .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 | |
| .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 | |
| .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 | |
| .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 | |
| .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 | |
| .\" SUCH DAMAGE.
 | |
| .\"
 | |
| .\" $FreeBSD: head/lib/libarchive/libarchive-formats.5 201077 2009-12-28 01:50:23Z kientzle $
 | |
| .\"
 | |
| .Dd December 27, 2009
 | |
| .Dt libarchive-formats 5
 | |
| .Os
 | |
| .Sh NAME
 | |
| .Nm libarchive-formats
 | |
| .Nd archive formats supported by the libarchive library
 | |
| .Sh DESCRIPTION
 | |
| The
 | |
| .Xr libarchive 3
 | |
| library reads and writes a variety of streaming archive formats.
 | |
| Generally speaking, all of these archive formats consist of a series of
 | |
| .Dq entries .
 | |
| Each entry stores a single file system object, such as a file, directory,
 | |
| or symbolic link.
 | |
| .Pp
 | |
| The following provides a brief description of each format supported
 | |
| by libarchive, with some information about recognized extensions or
 | |
| limitations of the current library support.
 | |
| Note that just because a format is supported by libarchive does not
 | |
| imply that a program that uses libarchive will support that format.
 | |
| Applications that use libarchive specify which formats they wish
 | |
| to support, though many programs do use libarchive convenience
 | |
| functions to enable all supported formats.
 | |
| .Ss Tar Formats
 | |
| The
 | |
| .Xr libarchive 3
 | |
| library can read most tar archives.
 | |
| However, it only writes POSIX-standard
 | |
| .Dq ustar
 | |
| and
 | |
| .Dq pax interchange
 | |
| formats.
 | |
| .Pp
 | |
| All tar formats store each entry in one or more 512-byte records.
 | |
| The first record is used for file metadata, including filename,
 | |
| timestamp, and mode information, and the file data is stored in
 | |
| subsequent records.
 | |
| Later variants have extended this by either appropriating undefined
 | |
| areas of the header record, extending the header to multiple records,
 | |
| or by storing special entries that modify the interpretation of
 | |
| subsequent entries.
 | |
| .Pp
 | |
| .Bl -tag -width indent
 | |
| .It Cm gnutar
 | |
| The
 | |
| .Xr libarchive 3
 | |
| library can read GNU-format tar archives.
 | |
| It currently supports the most popular GNU extensions, including
 | |
| modern long filename and linkname support, as well as atime and ctime data.
 | |
| The libarchive library does not support multi-volume
 | |
| archives, nor the old GNU long filename format.
 | |
| It can read GNU sparse file entries, including the new POSIX-based
 | |
| formats, but cannot write GNU sparse file entries.
 | |
| .It Cm pax
 | |
| The
 | |
| .Xr libarchive 3
 | |
| library can read and write POSIX-compliant pax interchange format
 | |
| archives.
 | |
| Pax interchange format archives are an extension of the older ustar
 | |
| format that adds a separate entry with additional attributes stored
 | |
| as key/value pairs immediately before each regular entry.
 | |
| The presence of these additional entries is the only difference between
 | |
| pax interchange format and the older ustar format.
 | |
| The extended attributes are of unlimited length and are stored
 | |
| as UTF-8 Unicode strings.
 | |
| Keywords defined in the standard are in all lowercase; vendors are allowed
 | |
| to define custom keys by preceding them with the vendor name in all uppercase.
 | |
| When writing pax archives, libarchive uses many of the SCHILY keys
 | |
| defined by Joerg Schilling's
 | |
| .Dq star
 | |
| archiver and a few LIBARCHIVE keys.
 | |
| The libarchive library can read most of the SCHILY keys
 | |
| and most of the GNU keys introduced by GNU tar.
 | |
| It silently ignores any keywords that it does not understand.
 | |
| .It Cm restricted pax
 | |
| The libarchive library can also write pax archives in which it
 | |
| attempts to suppress the extended attributes entry whenever
 | |
| possible.
 | |
| The result will be identical to a ustar archive unless the
 | |
| extended attributes entry is required to store a long file
 | |
| name, long linkname, extended ACL, file flags, or if any of the standard
 | |
| ustar data (user name, group name, UID, GID, etc) cannot be fully
 | |
| represented in the ustar header.
 | |
| In all cases, the result can be dearchived by any program that
 | |
| can read POSIX-compliant pax interchange format archives.
 | |
| Programs that correctly read ustar format (see below) will also be
 | |
| able to read this format; any extended attributes will be extracted as
 | |
| separate files stored in
 | |
| .Pa PaxHeader
 | |
| directories.
 | |
| .It Cm ustar
 | |
| The libarchive library can both read and write this format.
 | |
| This format has the following limitations:
 | |
| .Bl -bullet -compact
 | |
| .It
 | |
| Device major and minor numbers are limited to 21 bits.
 | |
| Nodes with larger numbers will not be added to the archive.
 | |
| .It
 | |
| Path names in the archive are limited to 255 bytes.
 | |
| (Shorter if there is no / character in exactly the right place.)
 | |
| .It
 | |
| Symbolic links and hard links are stored in the archive with
 | |
| the name of the referenced file.
 | |
| This name is limited to 100 bytes.
 | |
| .It
 | |
| Extended attributes, file flags, and other extended
 | |
| security information cannot be stored.
 | |
| .It
 | |
| Archive entries are limited to 8 gigabytes in size.
 | |
| .El
 | |
| Note that the pax interchange format has none of these restrictions.
 | |
| .El
 | |
| .Pp
 | |
| The libarchive library also reads a variety of commonly-used extensions to
 | |
| the basic tar format.
 | |
| These extensions are recognized automatically whenever they appear.
 | |
| .Bl -tag -width indent
 | |
| .It Numeric extensions.
 | |
| The POSIX standards require fixed-length numeric fields to be written with
 | |
| some character position reserved for terminators.
 | |
| Libarchive allows these fields to be written without terminator characters.
 | |
| This extends the allowable range; in particular, ustar archives with this
 | |
| extension can support entries up to 64 gigabytes in size.
 | |
| Libarchive also recognizes base-256 values in most numeric fields.
 | |
| This essentially removes all limitations on file size, modification time,
 | |
| and device numbers.
 | |
| .It Solaris extensions
 | |
| Libarchive recognizes ACL and extended attribute records written
 | |
| by Solaris tar.
 | |
| Currently, libarchive only has support for old-style ACLs; the
 | |
| newer NFSv4 ACLs are recognized but discarded.
 | |
| .El
 | |
| .Pp
 | |
| The first tar program appeared in Seventh Edition Unix in 1979.
 | |
| The first official standard for the tar file format was the
 | |
| .Dq ustar
 | |
| (Unix Standard Tar) format defined by POSIX in 1988.
 | |
| POSIX.1-2001 extended the ustar format to create the
 | |
| .Dq pax interchange
 | |
| format.
 | |
| .Ss Cpio Formats
 | |
| The libarchive library can read a number of common cpio variants and can write
 | |
| .Dq odc
 | |
| and
 | |
| .Dq newc
 | |
| format archives.
 | |
| A cpio archive stores each entry as a fixed-size header followed
 | |
| by a variable-length filename and variable-length data.
 | |
| Unlike the tar format, the cpio format does only minimal padding
 | |
| of the header or file data.
 | |
| There are several cpio variants, which differ primarily in
 | |
| how they store the initial header: some store the values as
 | |
| octal or hexadecimal numbers in ASCII, others as binary values of
 | |
| varying byte order and length.
 | |
| .Bl -tag -width indent
 | |
| .It Cm binary
 | |
| The libarchive library transparently reads both big-endian and little-endian
 | |
| variants of the original binary cpio format.
 | |
| This format used 32-bit binary values for file size and mtime,
 | |
| and 16-bit binary values for the other fields.
 | |
| .It Cm odc
 | |
| The libarchive library can both read and write this
 | |
| POSIX-standard format, which is officially known as the
 | |
| .Dq cpio interchange format
 | |
| or the
 | |
| .Dq octet-oriented cpio archive format
 | |
| and sometimes unofficially referred to as the
 | |
| .Dq old character format .
 | |
| This format stores the header contents as octal values in ASCII.
 | |
| It is standard, portable, and immune from byte-order confusion.
 | |
| File sizes and mtime are limited to 33 bits (8GB file size),
 | |
| other fields are limited to 18 bits.
 | |
| .It Cm SVR4
 | |
| The libarchive library can read both CRC and non-CRC variants of
 | |
| this format.
 | |
| The SVR4 format uses eight-digit hexadecimal values for
 | |
| all header fields.
 | |
| This limits file size to 4GB, and also limits the mtime and
 | |
| other fields to 32 bits.
 | |
| The SVR4 format can optionally include a CRC of the file
 | |
| contents, although libarchive does not currently verify this CRC.
 | |
| .El
 | |
| .Pp
 | |
| Cpio first appeared in PWB/UNIX 1.0, which was released within
 | |
| AT&T in 1977.
 | |
| PWB/UNIX 1.0 formed the basis of System III Unix, released outside
 | |
| of AT&T in 1981.
 | |
| This makes cpio older than tar, although cpio was not included
 | |
| in Version 7 AT&T Unix.
 | |
| As a result, the tar command became much better known in universities
 | |
| and research groups that used Version 7.
 | |
| The combination of the
 | |
| .Nm find
 | |
| and
 | |
| .Nm cpio
 | |
| utilities provided very precise control over file selection.
 | |
| Unfortunately, the format has many limitations that make it unsuitable
 | |
| for widespread use.
 | |
| Only the POSIX format permits files over 4GB, and its 18-bit
 | |
| limit for most other fields makes it unsuitable for modern systems.
 | |
| In addition, cpio formats only store numeric UID/GID values (not
 | |
| usernames and group names), which can make it very difficult to correctly
 | |
| transfer archives across systems with dissimilar user numbering.
 | |
| .Ss Shar Formats
 | |
| A
 | |
| .Dq shell archive
 | |
| is a shell script that, when executed on a POSIX-compliant
 | |
| system, will recreate a collection of file system objects.
 | |
| The libarchive library can write two different kinds of shar archives:
 | |
| .Bl -tag -width indent
 | |
| .It Cm shar
 | |
| The traditional shar format uses a limited set of POSIX
 | |
| commands, including
 | |
| .Xr echo 1 ,
 | |
| .Xr mkdir 1 ,
 | |
| and
 | |
| .Xr sed 1 .
 | |
| It is suitable for portably archiving small collections of plain text files.
 | |
| However, it is not generally well-suited for large archives
 | |
| (many implementations of
 | |
| .Xr sh 1
 | |
| have limits on the size of a script) nor should it be used with non-text files.
 | |
| .It Cm shardump
 | |
| This format is similar to shar but encodes files using
 | |
| .Xr uuencode 1
 | |
| so that the result will be a plain text file regardless of the file contents.
 | |
| It also includes additional shell commands that attempt to reproduce as
 | |
| many file attributes as possible, including owner, mode, and flags.
 | |
| The additional commands used to restore file attributes make
 | |
| shardump archives less portable than plain shar archives.
 | |
| .El
 | |
| .Ss ISO9660 format
 | |
| Libarchive can read and extract from files containing ISO9660-compliant
 | |
| CDROM images.
 | |
| In many cases, this can remove the need to burn a physical CDROM
 | |
| just in order to read the files contained in an ISO9660 image.
 | |
| It also avoids security and complexity issues that come with
 | |
| virtual mounts and loopback devices.
 | |
| Libarchive supports the most common Rockridge extensions and has partial
 | |
| support for Joliet extensions.
 | |
| If both extensions are present, the Joliet extensions will be
 | |
| used and the Rockridge extensions will be ignored.
 | |
| In particular, this can create problems with hardlinks and symlinks,
 | |
| which are supported by Rockridge but not by Joliet.
 | |
| .Ss Zip format
 | |
| Libarchive can read and write zip format archives that have
 | |
| uncompressed entries and entries compressed with the
 | |
| .Dq deflate
 | |
| algorithm.
 | |
| Older zip compression algorithms are not supported.
 | |
| It can extract jar archives, archives that use Zip64 extensions and many
 | |
| self-extracting zip archives.
 | |
| Libarchive reads Zip archives as they are being streamed,
 | |
| which allows it to read archives of arbitrary size.
 | |
| It currently does not use the central directory; this
 | |
| limits libarchive's ability to support some self-extracting
 | |
| archives and ones that have been modified in certain ways.
 | |
| .Ss Archive (library) file format
 | |
| The Unix archive format (commonly created by the
 | |
| .Xr ar 1
 | |
| archiver) is a general-purpose format which is
 | |
| used almost exclusively for object files to be
 | |
| read by the link editor
 | |
| .Xr ld 1 .
 | |
| The ar format has never been standardised.
 | |
| There are two common variants:
 | |
| the GNU format derived from SVR4,
 | |
| and the BSD format, which first appeared in 4.4BSD.
 | |
| The two differ primarily in their handling of filenames
 | |
| longer than 15 characters:
 | |
| the GNU/SVR4 variant writes a filename table at the beginning of the archive;
 | |
| the BSD format stores each long filename in an extension
 | |
| area adjacent to the entry.
 | |
| Libarchive can read both extensions,
 | |
| including archives that may include both types of long filenames.
 | |
| Programs using libarchive can write GNU/SVR4 format
 | |
| if they provide a filename table to be written into
 | |
| the archive before any of the entries.
 | |
| Any entries whose names are not in the filename table
 | |
| will be written using BSD-style long filenames.
 | |
| This can cause problems for programs such as
 | |
| GNU ld that do not support the BSD-style long filenames.
 | |
| .Ss mtree
 | |
| Libarchive can read and write files in
 | |
| .Xr mtree 5
 | |
| format.
 | |
| This format is not a true archive format, but rather a textual description
 | |
| of a file hierarchy in which each line specifies the name of a file and
 | |
| provides specific metadata about that file.
 | |
| Libarchive can read all of the keywords supported by both
 | |
| the NetBSD and FreeBSD versions of
 | |
| .Xr mtree 1 ,
 | |
| although many of the keywords cannot currently be stored in an
 | |
| .Tn archive_entry
 | |
| object.
 | |
| When writing, libarchive supports use of the
 | |
| .Xr archive_write_set_options 3
 | |
| interface to specify which keywords should be included in the
 | |
| output.
 | |
| If libarchive was compiled with access to suitable
 | |
| cryptographic libraries (such as the OpenSSL libraries),
 | |
| it can compute hash entries such as
 | |
| .Cm sha512
 | |
| or
 | |
| .Cm md5
 | |
| from file data being written to the mtree writer.
 | |
| .Pp
 | |
| When reading an mtree file, libarchive will locate the corresponding
 | |
| files on disk using the
 | |
| .Cm contents
 | |
| keyword if present or the regular filename.
 | |
| If it can locate and open the file on disk, it will use that
 | |
| to fill in any metadata that is missing from the mtree file
 | |
| and will read the file contents and return those to the program
 | |
| using libarchive.
 | |
| If it cannot locate and open the file on disk, libarchive
 | |
| will return an error for any attempt to read the entry
 | |
| body.
 | |
| .Sh SEE ALSO
 | |
| .Xr ar 1 ,
 | |
| .Xr cpio 1 ,
 | |
| .Xr mkisofs 1 ,
 | |
| .Xr shar 1 ,
 | |
| .Xr tar 1 ,
 | |
| .Xr zip 1 ,
 | |
| .Xr zlib 3 ,
 | |
| .Xr cpio 5 ,
 | |
| .Xr mtree 5 ,
 | |
| .Xr tar 5
 | 
