.\" Copyright (c) 2003-2009 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: head/lib/libarchive/tar.5 201077 2009-12-28 01:50:23Z kientzle $
.\"
.Dd December 27, 2009
.Dt tar 5
.Os
.Sh NAME
.Nm tar
.Nd format of tape archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
The format was originally designed to be used with
tape drives that operate with fixed-size blocks, but is widely used as
a general packaging mechanism.
.Ss General Format
A
.Nm
archive consists of a series of 512-byte records.
Each file system object requires a header record which stores basic metadata
(pathname, owner, permissions, etc.) and zero or more records containing any
file data.
The end of the archive is indicated by two records consisting
entirely of zero bytes.
.Pp
For compatibility with tape drives that use fixed block sizes,
programs that read or write tar files always read or write a fixed
number of records with each I/O operation.
These
.Dq blocks
are always a multiple of the record size.
The maximum block size supported by early
implementations was 10240 bytes or 20 records.
This is still the default for most implementations
although block sizes of 1MiB (2048 records) or larger are
commonly used with modern high-speed tape drives.
(Note: the terms
.Dq block
and
.Dq record
here are not entirely standard; this document follows the
convention established by John Gilmore in documenting
.Nm pdtar . )
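.Pp
The record and block arithmetic described above can be sketched in C.
This is a minimal illustration, not code taken from any particular tar
implementation; the names TAR_RECORD_SIZE, tar_data_records,
tar_record_is_zero, and tar_block_bytes are hypothetical.
It assumes that the last data record of a file is padded out to a full
512 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512u

/* Number of data records that follow the header of a file of this size.
 * Assumes the final record is padded to a full 512 bytes. */
static uint64_t tar_data_records(uint64_t file_size)
{
	return file_size / TAR_RECORD_SIZE + (file_size % TAR_RECORD_SIZE != 0);
}

/* Returns 1 if the record consists entirely of zero bytes, else 0.
 * Two such records in a row mark the end of the archive. */
static int tar_record_is_zero(const unsigned char *rec)
{
	size_t i;

	for (i = 0; i < TAR_RECORD_SIZE; i++)
		if (rec[i] != 0)
			return 0;
	return 1;
}

/* Size in bytes of one I/O block holding the given number of records. */
static size_t tar_block_bytes(size_t records_per_block)
{
	return records_per_block * TAR_RECORD_SIZE;
}
```

With the historic default of 20 records per block, tar_block_bytes(20)
yields the 10240 bytes mentioned above.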
.Ss Old-Style Archive Format
The original tar archive format has been extended many times to
include additional information that various implementors found
necessary.
This section describes the variant implemented by the tar command
included in
.At v7 ,
which seems to be the earliest widely used version of the tar program.
.Pp
The header record for an old-style
.Nm
archive consists of the following:
.Bd -literal -offset indent
struct header_old_tar {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char checksum[8];
	char linkflag[1];
	char linkname[100];
	char pad[255];
};
.Ed
All unused bytes in the header record are filled with nulls.
.Bl -tag -width indent
.It Va name
Pathname, stored as a null-terminated string.
Early tar implementations only stored regular files (including
hardlinks to those files).
One common early convention used a trailing "/" character to indicate
a directory name, allowing directory permissions and owner information
to be archived and restored.
.It Va mode
File mode, stored as an octal number in ASCII.
.It Va uid , Va gid
User id and group id of owner, as octal numbers in ASCII.
.It Va size
Size of file, as an octal number in ASCII.
For regular files only, this indicates the amount of data
that follows the header.
In particular, this field was ignored by early tar implementations
when extracting hardlinks.
Modern writers should always store a zero length for hardlink entries.
.It Va mtime
Modification time of file, as an octal number in ASCII.
This indicates the number of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
Note that negative values should be avoided
here, as they are handled inconsistently.
.It Va checksum
Header checksum, stored as an octal number in ASCII.
To compute the checksum, set the checksum field to all spaces,
then sum all bytes in the header using unsigned arithmetic.
This field should be stored as six octal digits followed by a null and a space
character.
Note that many early implementations of tar used signed arithmetic
for the checksum field, which can cause interoperability problems
when transferring archives between systems.
Modern robust readers compute the checksum both ways and accept the
header if either computation matches.
.It Va linkflag , Va linkname
In order to preserve hardlinks and conserve tape, a file
with multiple links is only written to the archive the first
time it is encountered.
The next time it is encountered, the
.Va linkflag
is set to an ASCII
.Sq 1
and the
.Va linkname
field holds the first name under which this file appears.
(Note that regular files have a null value in the
.Va linkflag
field.)
.El
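.Pp
The checksum rule described above can be sketched as follows.
This is an illustrative sketch rather than the code of any particular
implementation; the function and macro names are hypothetical, and the
offset 148 is simply the sum of the sizes of the fields that precede
checksum in the structure shown earlier.
Both the unsigned and the signed sums are computed so that a reader can
accept a header when either one matches.

```c
#include <stddef.h>

#define TAR_HEADER_SIZE     512u
#define TAR_CHECKSUM_OFFSET 148u	/* name+mode+uid+gid+size+mtime */
#define TAR_CHECKSUM_LENGTH 8u

/* Sum all header bytes, treating the checksum field as eight spaces.
 * usum uses unsigned bytes; ssum treats each byte as a signed char. */
static void tar_checksums(const unsigned char *hdr,
    unsigned long *usum, long *ssum)
{
	size_t i;

	*usum = 0;
	*ssum = 0;
	for (i = 0; i < TAR_HEADER_SIZE; i++) {
		unsigned char b = hdr[i];

		if (i >= TAR_CHECKSUM_OFFSET &&
		    i < TAR_CHECKSUM_OFFSET + TAR_CHECKSUM_LENGTH)
			b = ' ';
		*usum += b;
		*ssum += (b < 128) ? (long)b : (long)b - 256;
	}
}

/* Accept the header if the stored value matches either computation. */
static int tar_checksum_ok(const unsigned char *hdr, long stored)
{
	unsigned long usum;
	long ssum;

	tar_checksums(hdr, &usum, &ssum);
	return (stored >= 0 && (unsigned long)stored == usum) ||
	    stored == ssum;
}
```

The two sums differ only when the header contains bytes with the high
bit set, which is why the problem tends to appear with non-ASCII
pathnames.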
.Pp
Early tar implementations varied in how they terminated these fields.
The tar command in
.At v7
used the following conventions (this is also documented in early BSD manpages):
the pathname must be null-terminated;
the mode, uid, and gid fields must end in a space and a null byte;
the size and mtime fields must end in a space;
the checksum is terminated by a null and a space.
Early implementations filled the numeric fields with leading spaces.
This seems to have been common practice until the
.St -p1003.1-88
standard was released.
For best portability, modern implementations should fill the numeric
fields with leading zeros.
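.Pp
Because both padding and termination vary, a tolerant reader has to accept
several layouts for the same numeric field.
The following is a minimal sketch of such a parser, assuming only the
conventions listed above (leading spaces or zeros, and a space or null
terminator); the name tar_parse_octal is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse an octal numeric header field of the given length.
 * Leading spaces are skipped; parsing stops at the first space or
 * null byte, or at the end of the field.
 * Returns 0 on success and -1 if a non-octal character is found. */
static int tar_parse_octal(const char *field, size_t len, uint64_t *out)
{
	size_t i = 0;
	uint64_t v = 0;

	while (i < len && field[i] == ' ')
		i++;
	for (; i < len; i++) {
		char c = field[i];

		if (c == ' ' || c == '\0')
			break;
		if (c < '0' || c > '7')
			return -1;
		v = (v << 3) | (uint64_t)(c - '0');
	}
	*out = v;
	return 0;
}
```

A 12-byte field holds at most 12 octal digits (36 bits), so the
accumulator cannot overflow for the field sizes used in these headers.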
.Ss Pre-POSIX Archives
An early draft of
.St -p1003.1-88
served as the basis for John Gilmore's
.Nm pdtar
program and many system implementations from the late 1980s
and early 1990s.
These archives generally follow the POSIX ustar
format described below with the following variations:
.Bl -bullet -compact -width indent
.It
The magic value is
.Dq ustar\ \&
(note the following space).
The version field contains a space character followed by a null.
.It
The numeric fields are generally filled with leading spaces
(not leading zeros as recommended in the final standard).
.It
The prefix field is often not used, limiting pathnames to
the 100 characters of old-style archives.
.El
.El
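.Pp
A reader can use the magic and version bytes to guess which variant it is
looking at.
The sketch below is a heuristic only, with hypothetical names; it assumes
the ustar header layout, in which the magic field begins at byte 257
(the sum of the sizes of the name, mode, uid, gid, size, mtime, checksum,
typeflag, and linkname fields) and the version field at byte 263.

```c
#include <string.h>

enum tar_variant { TAR_OLD_STYLE, TAR_PRE_POSIX, TAR_POSIX_USTAR };

#define TAR_MAGIC_OFFSET   257
#define TAR_VERSION_OFFSET 263

/* Classify a 512-byte header record by its magic and version fields. */
static enum tar_variant tar_guess_variant(const unsigned char *hdr)
{
	const unsigned char *magic = hdr + TAR_MAGIC_OFFSET;
	const unsigned char *version = hdr + TAR_VERSION_OFFSET;

	/* POSIX: "ustar" plus a null, then two ASCII zero digits. */
	if (memcmp(magic, "ustar\0", 6) == 0 &&
	    memcmp(version, "00", 2) == 0)
		return TAR_POSIX_USTAR;
	/* Pre-POSIX: "ustar" plus a space, then a space and a null. */
	if (memcmp(magic, "ustar ", 6) == 0 &&
	    memcmp(version, " \0", 2) == 0)
		return TAR_PRE_POSIX;
	return TAR_OLD_STYLE;
}
```

Anything without a recognized magic value is treated here as an old-style
header; a real reader would also validate the checksum before trusting
the result.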
.Ss POSIX ustar Archives
.St -p1003.1-88
defined a standard tar file format to be read and written
by compliant implementations of
.Xr tar 1 .
This format is often called the
.Dq ustar
format, after the magic value used
in the header.
(The name is an acronym for
.Dq Unix Standard TAR . )
It extends the historic format with new fields:
.Bd -literal -offset indent
struct header_posix_ustar {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char checksum[8];
	char typeflag[1];
	char linkname[100];
	char magic[6];
	char version[2];
	char uname[32];
	char gname[32];
	char devmajor[8];
	char devminor[8];
	char prefix[155];
	char pad[12];
};
.Ed
.Bl -tag -width indent
.It Va typeflag
Type of entry.
POSIX extended the earlier
.Va linkflag
field with several new type values:
.Bl -tag -width indent -compact
.It Dq 0
Regular file.
NUL should be treated as a synonym, for compatibility purposes.
.It Dq 1
Hard link.
.It Dq 2
Symbolic link.
.It Dq 3
Character device node.
.It Dq 4
Block device node.
.It Dq 5
Directory.
.It Dq 6
FIFO node.
.It Dq 7
Reserved.
.It Other
A POSIX-compliant implementation must treat any unrecognized typeflag value
as a regular file.
In particular, writers should ensure that all entries
have a valid filename so that they can be restored by readers that do not
support the corresponding extension.
Uppercase letters "A" through "Z" are reserved for custom extensions.
Note that sockets and whiteout entries are not archivable.
.El
It is worth noting that the
.Va size
field, in particular, has different meanings depending on the type.
For regular files, of course, it indicates the amount of data
following the header.
For directories, it may be used to indicate the total size of all
files in the directory, for use by operating systems that pre-allocate
directory space.
For all other types, it should be set to zero by writers and ignored
by readers.
.It Va magic
Contains the magic value
.Dq ustar
followed by a NUL byte to indicate that this is a POSIX standard archive.
Full compliance requires the uname and gname fields be properly set.
.It Va version
Version.
This should be
.Dq 00
(two copies of the ASCII digit zero) for POSIX standard archives.
.It Va uname , Va gname
User and group names, as null-terminated ASCII strings.
These should be used in preference to the uid/gid values
when they are set and the corresponding names exist on
the system.
.It Va devmajor , Va devminor
Major and minor numbers for character device or block device entry.
.It Va name , Va prefix
If the pathname is too long to fit in the 100 bytes provided by the standard
format, it can be split at any
.Pa /
character with the first portion going into the prefix field.
If the prefix field is not empty, the reader will prepend
the prefix value and a
.Pa /
character to the regular name field to obtain the full pathname.
The standard does not require a trailing
.Pa /
character on directory names, though most implementations still
include this for compatibility reasons.
.El
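.Pp
The way a reader reassembles a pathname from the prefix and name fields
can be sketched as follows.
The helper names are hypothetical, and the sketch assumes that either
field may lack a null terminator when it fills its entire width.

```c
#include <stddef.h>
#include <string.h>

/* Length of a field that may or may not be null-terminated. */
static size_t ustar_field_len(const char *f, size_t max)
{
	const char *end = memchr(f, '\0', max);

	return end ? (size_t)(end - f) : max;
}

/* Build the full pathname from the 155-byte prefix and 100-byte name.
 * out must hold at least 257 bytes (155 + '/' + 100 + null).
 * Returns the length of the resulting string. */
static size_t ustar_full_path(const char *prefix, const char *name,
    char *out)
{
	size_t plen = ustar_field_len(prefix, 155);
	size_t nlen = ustar_field_len(name, 100);
	size_t n = 0;

	if (plen > 0) {
		memcpy(out, prefix, plen);
		n = plen;
		out[n++] = '/';
	}
	memcpy(out + n, name, nlen);
	n += nlen;
	out[n] = '\0';
	return n;
}
```

When the prefix is empty, the name is used unchanged, so the same routine
also handles headers written without the prefix field.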
.Pp
Note that all unused bytes must be set to
.Dv NUL .
.Pp
Field termination is specified slightly differently by POSIX
than by previous implementations.
The
.Va magic ,
.Va uname ,
and
.Va gname
fields must have a trailing
.Dv NUL .
The
.Va name ,
.Va linkname ,
and
.Va prefix
fields must have a trailing
.Dv NUL
unless they fill the entire field.
(In particular, it is possible to store a 256-character pathname if it
happens to have a
.Pa /
as the 156th character.)
POSIX requires numeric fields to be zero-padded in the front, and requires
them to be terminated with either space or
.Dv NUL
characters.
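.Pp
On the writer side, the same limits determine where a long pathname may be
split.
The following sketch uses a hypothetical function, ustar_split_path, that
looks for a usable
.Pa /
under the assumption that the prefix may hold up to 155 bytes and the name
up to 100 bytes, with neither part empty.

```c
#include <string.h>

/* Decide how to store a pathname in a ustar header.
 * Returns 0 if the whole path fits in the name field, the index of the
 * '/' at which to split (prefix before it, name after it), or -1 if the
 * path cannot be represented. */
static int ustar_split_path(const char *path)
{
	size_t len = strlen(path);
	size_t i;

	if (len <= 100)
		return 0;
	if (len > 256)
		return -1;
	/* The prefix is path[0..i) and the name is path[i+1..len). */
	for (i = 1; i <= 155 && i + 1 < len; i++)
		if (path[i] == '/' && len - i - 1 <= 100)
			return (int)i;
	return -1;
}
```

For a 256-character pathname the only acceptable split point is index 155,
that is, the 156th character, which matches the remark above.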
.Pp
Currently, most tar implementations comply with the ustar
format, occasionally extending it by adding new fields to the
blank area at the end of the header record.
.Ss Pax Interchange Format
There are many attributes that cannot be portably stored in a
POSIX ustar archive.
.St -p1003.1-2001
defined a
.Dq pax interchange format
that uses two new types of entries to hold text-formatted
metadata that applies to following entries.
Note that a pax interchange format archive is a ustar archive in every
respect.
The new data is stored in ustar-compatible archive entries that use the
.Dq x
or
.Dq g
typeflag.
In particular, older implementations that do not fully support these
extensions will extract the metadata into regular files, where the
metadata can be examined as necessary.
.Pp
An entry in a pax interchange format archive consists of one or
two standard ustar entries, each with its own header and data.
The first optional entry stores the extended attributes
for the following entry.
This optional first entry has an "x" typeflag and a size field that
indicates the total size of the extended attributes.
The extended attributes themselves are stored as a series of text-format
lines encoded in the portable UTF-8 encoding.
Each line consists of a decimal number, a space, a key string, an equals
sign, a value string, and a newline.
The decimal number indicates the length of the entire line, including the
initial length field and the trailing newline.
An example of such a field is:
.Dl 25 ctime=1084839148.1212\en
Keys in all lowercase are standard keys.
Vendors can add their own keys by prefixing them with an all uppercase
vendor name and a period.
Note that, unlike the historic header, numeric values are stored using
decimal, not octal.
A description of some common keys follows:
.Bl -tag -width indent
.It Cm atime , Cm ctime , Cm mtime
File access, inode change, and modification times.
These fields can be negative or include a decimal point and a fractional value.
.It Cm uname , Cm uid , Cm gname , Cm gid
User name, group name, and numeric UID and GID values.
The user name and group name stored here are encoded in UTF-8
and can thus include non-ASCII characters.
The UID and GID fields can be of arbitrary length.
.It Cm linkpath
The full path of the linked-to file.
Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
.It Cm path
The full pathname of the entry.
Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
.It Cm realtime.* , Cm security.*
These keys are reserved and may be used for future standardization.
.It Cm size
The size of the file.
Note that there is no length limit on this field, allowing conforming
archives to store files much larger than the historic 8GB limit.
.It Cm SCHILY.*
Vendor-specific attributes used by Joerg Schilling's
.Nm star
implementation.
.It Cm SCHILY.acl.access , Cm SCHILY.acl.default
Stores the access and default ACLs as textual strings in a format
that is an extension of the format specified by POSIX.1e draft 17.
In particular, each user or group access specification can include a fourth
colon-separated field with the numeric UID or GID.
This allows ACLs to be restored on systems that may not have complete
user or group information available (such as when NIS/YP or LDAP services
are temporarily unavailable).
.It Cm SCHILY.devminor , Cm SCHILY.devmajor
The full minor and major numbers for device nodes.
.It Cm SCHILY.fflags
The file flags.
.It Cm SCHILY.realsize
The full size of the file on disk.
XXX explain? XXX
.It Cm SCHILY.dev , Cm SCHILY.ino , Cm SCHILY.nlinks
The device number, inode number, and link count for the entry.
In particular, note that a pax interchange format archive using Joerg
Schilling's
.Cm SCHILY.*
extensions can store all of the data from
.Va struct stat .
.It Cm LIBARCHIVE.xattr. Ns Ar namespace Ns . Ns Ar key
Libarchive stores POSIX.1e-style extended attributes using
keys of this form.
The
.Ar key
value is URL-encoded:
All non-ASCII characters and the two special characters
.Dq =
and
.Dq %
are encoded as
.Dq %
followed by two uppercase hexadecimal digits.
The value of this key is the extended attribute value
encoded in base 64.
XXX Detail the base-64 format here XXX
.It Cm VENDOR.*
XXX document other vendor-specific extensions XXX
.El
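.Pp
The self-referential length field is the only subtle part of writing one of
these lines: the number must count its own digits.
The sketch below shows one way to compute it.
The name pax_format_record is hypothetical, and the sketch assumes that both
key and value are null-terminated C strings, which is a simplification
because values in a real archive may contain arbitrary bytes.

```c
#include <stdio.h>
#include <string.h>

/* Write "<len> <key>=<value>\n" into out, where <len> is the length of
 * the whole line including the digits of <len> itself.
 * Returns that length, or 0 if the line does not fit in outsz bytes. */
static size_t pax_format_record(char *out, size_t outsz,
    const char *key, const char *value)
{
	/* Everything except the length digits: ' ' key '=' value '\n'. */
	size_t body = 1 + strlen(key) + 1 + strlen(value) + 1;
	size_t digits = 1, limit = 10, total;
	int n;

	/* Find the digit count for which digits + body has that many digits. */
	for (;;) {
		total = digits + body;
		if (total < limit)
			break;
		digits++;
		limit *= 10;
	}
	n = snprintf(out, outsz, "%zu %s=%s\n", total, key, value);
	if (n < 0 || (size_t)n >= outsz)
		return 0;
	return total;
}
```

Formatting the key ctime with the value 1084839148.1212 reproduces the
25-byte example line given earlier.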
.Pp
Any values stored in an extended attribute override the corresponding
values in the regular tar header.
Note that compliant readers should ignore the regular fields when they
are overridden.
This is important, as existing archivers are known to store non-compliant
values in the standard header fields in this situation.
There are no limits on length for any of these fields.
In particular, numeric fields can be arbitrarily large.
All text fields are encoded in UTF-8.
Compliant writers should store only portable 7-bit ASCII characters in
the standard ustar header and use extended
attributes whenever a text value contains non-ASCII characters.
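.Pp
A reader applies these overrides by scanning the body of the
.Cm x
entry for the keys it understands.
The following sketch parses the line format described earlier and looks up
a single key.
The structure and function names are hypothetical, and keeping the last
matching line when a key appears more than once is an assumption of this
sketch.

```c
#include <stddef.h>
#include <string.h>

struct pax_record {
	const char *key;
	size_t key_len;
	const char *value;
	size_t value_len;
};

/* Parse one "<len> <key>=<value>\n" line at buf.
 * Returns the number of bytes consumed, or 0 if the line is malformed. */
static size_t pax_parse_record(const char *buf, size_t avail,
    struct pax_record *rec)
{
	size_t len = 0, d = 0, start;
	const char *eq;

	while (d < avail && buf[d] >= '0' && buf[d] <= '9') {
		if (len > avail)
			return 0;	/* already too long; avoid overflow */
		len = len * 10 + (size_t)(buf[d] - '0');
		d++;
	}
	if (d == 0 || d >= avail || buf[d] != ' ')
		return 0;
	/* Need at least the space, an '=' and the newline after the digits. */
	if (len > avail || len < d + 3 || buf[len - 1] != '\n')
		return 0;
	start = d + 1;
	eq = memchr(buf + start, '=', len - 1 - start);
	if (eq == NULL)
		return 0;
	rec->key = buf + start;
	rec->key_len = (size_t)(eq - (buf + start));
	rec->value = eq + 1;
	rec->value_len = (size_t)((buf + len - 1) - (eq + 1));
	return len;
}

/* Find a key in an extended attribute body; the last match is kept.
 * Returns 1 if found, 0 otherwise. */
static int pax_lookup(const char *body, size_t body_len, const char *key,
    struct pax_record *out)
{
	size_t off = 0, n, klen = strlen(key);
	struct pax_record rec;
	int found = 0;

	while (off < body_len &&
	    (n = pax_parse_record(body + off, body_len - off, &rec)) != 0) {
		if (rec.key_len == klen && memcmp(rec.key, key, klen) == 0) {
			*out = rec;
			found = 1;
		}
		off += n;
	}
	return found;
}
```

A reader would call pax_lookup for a key such as path or size and, when it
returns 1, use that value instead of the corresponding ustar header field.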
.Pp
In addition to the
.Cm x
entry described above, the pax interchange format
also supports a
.Cm g
entry.
The
.Cm g
entry is identical in format, but specifies attributes that serve as
defaults for all subsequent archive entries.
The
.Cm g
entry is not widely used.
.Pp
Besides the new
.Cm x
and
.Cm g
entries, the pax interchange format has a few other minor variations
from the earlier ustar format.
The most troubling one is that hardlinks are permitted to have
data following them.
This allows readers to restore any hardlink to a file without
having to rewind the archive to find an earlier entry.
However, it creates complications for robust readers, as it is no longer
clear whether or not they should ignore the size field for hardlink entries.
.Ss GNU Tar Archives
The GNU tar program started with a pre-POSIX format similar to that
described earlier and has extended it using several different mechanisms:
It added new fields to the empty space in the header (some of which was later
used by POSIX for conflicting purposes);
it allowed the header to be continued over multiple records;
and it defined new entries that modify following entries
(similar in principle to the
.Cm x
entry described above, but each GNU special entry is single-purpose,
unlike the general-purpose
.Cm x
entry).
As a result, GNU tar archives are not POSIX compatible, although
more lenient POSIX-compliant readers can successfully extract most
GNU tar archives.
.Bd -literal -offset indent
struct header_gnu_tar {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char checksum[8];
	char typeflag[1];
	char linkname[100];
	char magic[6];
	char version[2];
	char uname[32];
	char gname[32];
	char devmajor[8];
	char devminor[8];
	char atime[12];
	char ctime[12];
	char offset[12];
	char longnames[4];
	char unused[1];
	struct {
		char offset[12];
		char numbytes[12];
	} sparse[4];
	char isextended[1];
	char realsize[12];
	char pad[17];
};
.Ed
.Bl -tag -width indent
.It Va typeflag
GNU tar uses the following special entry types, in addition to
those defined by POSIX:
.Bl -tag -width indent
.It "7"
GNU tar treats type "7" records identically to type "0" records,
except on one obscure RTOS where they are used to indicate the
pre-allocation of a contiguous file on disk.
.It "D"
This indicates a directory entry.
Unlike the POSIX-standard "5"
typeflag, the header is followed by data records listing the names
of files in this directory.
Each name is preceded by an ASCII "Y"
if the file is stored in this archive or "N" if the file is not
stored in this archive.
Each name is terminated with a null, and
an extra null marks the end of the name list.
The purpose of this
entry is to support incremental backups; a program restoring from
such an archive may wish to delete files on disk that did not exist
in the directory when the archive was made.
.Pp
Note that the "D" typeflag specifically violates POSIX, which requires
that unrecognized typeflags be restored as normal files.
In this case, restoring the "D" entry as a file could interfere
with subsequent creation of the like-named directory.
.It "K"
The data for this entry is a long linkname for the following regular entry.
.It "L"
The data for this entry is a long pathname for the following regular entry.
.It "M"
This is a continuation of the last file on the previous volume.
GNU multi-volume archives guarantee that each volume begins with a valid
entry header.
To ensure this, a file may be split, with part stored at the end of one volume,
and part stored at the beginning of the next volume.
The "M" typeflag indicates that this entry continues an existing file.
Such entries can only occur as the first or second entry
in an archive (the latter only if the first entry is a volume label).
The
.Va size
field specifies the size of this entry.
The
.Va offset
field at bytes 369-380 specifies the offset where this file fragment
begins.
The
.Va realsize
field specifies the total size of the file (which must equal
.Va size
plus
.Va offset ) .
When extracting, GNU tar checks that the header file name is the one it is
expecting, that the header offset is in the correct sequence, and that
the sum of offset and size is equal to realsize.
.It "N"
Type "N" records are no longer generated by GNU tar.
They contained a
list of files to be renamed or symlinked after extraction; this was
originally used to support long names.
The contents of this record
are a text description of the operations to be done, in the form
.Dq Rename %s to %s\en
or
.Dq Symlink %s to %s\en ;
in either case, both
filenames are escaped using K&R C syntax.
Due to security concerns, "N" records are now generally ignored
when reading archives.
.It "S"
This is a
.Dq sparse
regular file.
Sparse files are stored as a series of fragments.
The header contains a list of fragment offset/length pairs.
If more than four such entries are required, the header is
extended as necessary with
.Dq extra
header extensions (an older format that is no longer used), or
.Dq sparse
extensions.
.It "V"
The
.Va name
field should be interpreted as a tape/volume header name.
This entry should generally be ignored on extraction.
.El
.It Va magic
The magic field holds the five characters
.Dq ustar
followed by a space.
Note that POSIX ustar archives have a trailing null.
.It Va version
The version field holds a space character followed by a null.
Note that POSIX ustar archives use two copies of the ASCII digit
.Dq 0 .
.It Va atime , Va ctime
The time the file was last accessed and the time of
last change of file information, stored in octal as with
.Va mtime .
.It Va longnames
This field is apparently no longer used.
.It Sparse Va offset / Va numbytes
Each such structure specifies a single fragment of a sparse
file.
The two fields store values as octal numbers.
The fragments are each padded to a multiple of 512 bytes
in the archive.
On extraction, the list of fragments is collected from the
header (including any extension headers), and the data
is then read and written to the file at appropriate offsets.
.It Va isextended
If this is set to non-zero, the header will be followed by additional
.Dq sparse header
records.
Each such record contains information about as many as 21 additional
sparse blocks as shown here:
.Bd -literal -offset indent
struct gnu_sparse_header {
	struct {
		char offset[12];
		char numbytes[12];
	} sparse[21];
	char isextended[1];
	char padding[7];
};
.Ed
.It Va realsize
A binary representation of the file's complete size, with a much larger range
than the POSIX file size.
In particular, with
.Cm M
type files, the current entry is only a portion of the file.
In that case, the POSIX size field will indicate the size of this
entry; the
.Va realsize
field will indicate the total size of the file.
.El
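.Pp
Collecting the sparse fragment list from a header or extension record can
be sketched as follows.
All names are hypothetical.
The offsets follow from the two structures above: the four sparse slots in
the main header start at byte 386 with isextended at byte 482, while an
extension record holds 21 slots from byte 0 with isextended at byte 504.
The sketch assumes that an unused slot begins with a null byte and that the
values are plain octal; it does not handle other encodings.

```c
#include <stddef.h>
#include <stdint.h>

#define GNU_SPARSE_BASE       386u	/* sparse[4] in header_gnu_tar */
#define GNU_ISEXTENDED_OFFSET 482u
#define GNU_EXT_ISEXTENDED    504u	/* in gnu_sparse_header */

struct sparse_frag {
	uint64_t offset;
	uint64_t numbytes;
};

/* Parse a 12-byte octal field, skipping any leading spaces. */
static uint64_t gnu_octal12(const unsigned char *f)
{
	uint64_t v = 0;
	size_t i = 0;

	while (i < 12 && f[i] == ' ')
		i++;
	for (; i < 12 && f[i] >= '0' && f[i] <= '7'; i++)
		v = (v << 3) | (uint64_t)(f[i] - '0');
	return v;
}

/* Collect up to count offset/numbytes slots starting at rec + base.
 * Stops at the first slot that begins with a null byte (assumed unused).
 * Returns the number of fragments stored in out. */
static size_t gnu_sparse_collect(const unsigned char *rec, size_t base,
    size_t count, struct sparse_frag *out)
{
	size_t i, n = 0;

	for (i = 0; i < count; i++) {
		const unsigned char *slot = rec + base + i * 24;

		if (slot[0] == '\0')
			break;
		out[n].offset = gnu_octal12(slot);
		out[n].numbytes = gnu_octal12(slot + 12);
		n++;
	}
	return n;
}

/* Bytes a fragment occupies in the archive when padded to 512. */
static uint64_t gnu_sparse_padded(uint64_t numbytes)
{
	return (numbytes / 512 + (numbytes % 512 != 0)) * 512;
}
```

A reader would call gnu_sparse_collect with base 386 and count 4 for the
main header, then with base 0 and count 21 for each extension record for
as long as the preceding isextended byte is non-zero.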
.Ss GNU tar pax archives
GNU tar 1.14 (XXX check this XXX) and later will write
pax interchange format archives when you specify the
.Fl -posix
flag.
This format uses custom keywords to store sparse file information.
There have been three iterations of this support, referred to
as
.Dq 0.0 ,
.Dq 0.1 ,
and
.Dq 1.0 .
.Bl -tag -width indent
.It Cm GNU.sparse.numblocks , Cm GNU.sparse.offset , Cm GNU.sparse.numbytes , Cm GNU.sparse.size
The
.Dq 0.0
format used an initial
.Cm GNU.sparse.numblocks
attribute to indicate the number of blocks in the file, a pair of
.Cm GNU.sparse.offset
and
.Cm GNU.sparse.numbytes
to indicate the offset and size of each block,
and a single
.Cm GNU.sparse.size
to indicate the full size of the file.
This is not the same as the size in the tar header because the
latter value does not include the size of any holes.
This format required that the order of attributes be preserved and
relied on readers accepting multiple appearances of the same attribute
names, which is not officially permitted by the standards.
.It Cm GNU.sparse.map
The
.Dq 0.1
format used a single attribute that stored a comma-separated
list of decimal numbers.
Each pair of numbers indicated the offset and size, respectively,
of a block of data.
This does not work well if the archive is extracted by an archiver
that does not recognize this extension, since many pax implementations
simply discard unrecognized attributes.
.It Cm GNU.sparse.major , Cm GNU.sparse.minor , Cm GNU.sparse.name , Cm GNU.sparse.realsize
The
.Dq 1.0
format stores the sparse block map in one or more 512-byte blocks
prepended to the file data in the entry body.
The pax attributes indicate the existence of this map
(via the
.Cm GNU.sparse.major
and
.Cm GNU.sparse.minor
fields)
and the full size of the file.
The
.Cm GNU.sparse.name
attribute holds the true name of the file.
To avoid confusion, the name stored in the regular tar header
is a modified name so that extraction errors will be apparent
to users.
.El
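.Pp
The two attribute-only layouts described above can be sketched in a few
lines of Python.
This is only an illustration, not code from GNU tar or libarchive:
the function names are hypothetical, and the sketch assumes the pax
records have already been decoded into strings and, for the
.Dq 0.0
format, kept in archive order.

```python
from typing import Iterable, List, Optional, Tuple

Block = Tuple[int, int]  # (offset, size) of one run of real data


def parse_sparse_map_0_1(value: str) -> List[Block]:
    """Split a GNU.sparse.map value such as "0,512,4096,1024" into pairs."""
    numbers = [int(field, 10) for field in value.split(",")] if value else []
    if len(numbers) % 2 != 0:
        raise ValueError("GNU.sparse.map needs an even count of numbers")
    return list(zip(numbers[0::2], numbers[1::2]))


def collect_sparse_map_0_0(
    records: Iterable[Tuple[str, str]],
) -> Tuple[Optional[int], List[Block]]:
    """Rebuild (full size, blocks) from 0.0 records kept in archive order."""
    blocks: List[Block] = []
    declared = None        # value of GNU.sparse.numblocks, if seen
    full_size = None       # value of GNU.sparse.size, if seen
    pending_offset = None  # offset waiting for its matching numbytes
    for key, value in records:
        if key == "GNU.sparse.numblocks":
            declared = int(value)
        elif key == "GNU.sparse.offset":
            pending_offset = int(value)
        elif key == "GNU.sparse.numbytes":
            if pending_offset is None:
                raise ValueError("numbytes without a preceding offset")
            blocks.append((pending_offset, int(value)))
            pending_offset = None
        elif key == "GNU.sparse.size":
            full_size = int(value)
    if declared is not None and declared != len(blocks):
        raise ValueError("numblocks does not match the pairs seen")
    return full_size, blocks
```

Note that the second function only works on an ordered sequence of
records; a reader that stores attributes in a dictionary would lose the
repeated keys, which is the weakness of the
.Dq 0.0
format noted above.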
.Ss Solaris Tar
XXX More Details Needed XXX
.Pp
Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
.Dq extended
format that is fundamentally similar to pax interchange format,
with the following differences:
.Bl -bullet -compact -width indent
.It
Extended attributes are stored in an entry whose type is
.Cm X ,
not
.Cm x ,
as used by pax interchange format.
The detailed format of this entry appears to be the same
as described above for the
.Cm x
entry.
.It
An additional
.Cm A
entry is used to store an ACL for the following regular entry.
The body of this entry contains a seven-digit octal number
followed by a zero byte, followed by the
textual ACL description.
The octal value is the number of ACL entries
plus a constant that indicates the ACL type: 01000000
for POSIX.1e ACLs and 03000000 for NFSv4 ACLs.
.El
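.Pp
As a rough illustration of the
.Cm A
entry body described above, the following Python sketch splits the
seven-digit octal header from the ACL text.
The function and type names are hypothetical, and the sketch assumes
that the entry count always fits in the low six octal digits, so that
the type constant can be separated with a mask.
The ACL text itself is returned unparsed.

```python
from typing import NamedTuple

ACL_TYPE_POSIX1E = 0o1000000  # constant given above for POSIX.1e ACLs
ACL_TYPE_NFS4 = 0o3000000     # constant given above for NFSv4 ACLs
COUNT_MASK = 0o777777         # assumption: the count stays below the type bits


class SolarisAcl(NamedTuple):
    acl_type: str
    entry_count: int
    text: str


def parse_solaris_acl_body(body: bytes) -> SolarisAcl:
    """Decode '<7 octal digits> NUL <textual ACL>' from an A entry body."""
    header, separator, rest = body.partition(b"\x00")
    if separator != b"\x00" or len(header) != 7:
        raise ValueError("expected seven octal digits followed by a zero byte")
    value = int(header.decode("ascii"), 8)
    type_bits = value & ~COUNT_MASK
    if type_bits == ACL_TYPE_POSIX1E:
        acl_type = "posix1e"
    elif type_bits == ACL_TYPE_NFS4:
        acl_type = "nfs4"
    else:
        raise ValueError("unrecognized ACL type constant")
    # Drop any trailing NUL padding before decoding the description.
    text = rest.rstrip(b"\x00").decode("utf-8")
    return SolarisAcl(acl_type, value & COUNT_MASK, text)
```

For example, a body beginning with the digits 1000003 would be reported
as a POSIX.1e ACL with three entries.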
.Ss AIX Tar
XXX More details needed XXX
.Ss Mac OS X Tar
The tar distributed with Apple's Mac OS X stores most regular files
as two separate entries in the tar archive.
The two entries have the same name except that the first
one has
.Dq ._
added to the beginning of the name.
This first entry stores the
.Dq resource fork
with additional attributes for the file.
The Mac OS X
.Fn copyfile
API is used to separate a file on disk into separate
resource and data streams and to reassemble those separate
streams when the file is restored to disk.
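.Pp
A reader that wants to pair the two entries only needs the naming rule.
The Python sketch below shows one way to do that; the function names
are hypothetical, and it assumes that the
.Dq ._
prefix is applied to the final path component rather than to the start
of the whole path.

```python
import posixpath
from typing import Optional

APPLEDOUBLE_PREFIX = "._"


def companion_entry_name(path: str) -> str:
    """Name of the entry carrying the resource fork for a regular file."""
    directory, base = posixpath.split(path)
    return posixpath.join(directory, APPLEDOUBLE_PREFIX + base)


def data_entry_name(path: str) -> Optional[str]:
    """Inverse mapping; None if the entry is not a '._' companion."""
    directory, base = posixpath.split(path)
    if not base.startswith(APPLEDOUBLE_PREFIX) or base == APPLEDOUBLE_PREFIX:
        return None
    return posixpath.join(directory, base[len(APPLEDOUBLE_PREFIX):])
```

An extractor without resource fork support can use the second function
to recognize the companion entries and skip them or store them as
ordinary files.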
.Ss Other Extensions
One obvious extension to increase the maximum file size is to
eliminate the terminating characters from the various
numeric fields.
For example, the standard only allows the size field to contain
11 octal digits, reserving the twelfth byte for a trailing
NUL character.
Using all 12 bytes for octal digits raises the largest
representable file size from just under 8 GB to just under 64 GB.
.Pp
Another extension, used by GNU tar, star, and other newer
.Nm
implementations, permits binary numbers in the standard numeric fields.
This is flagged by setting the high bit of the first byte.
This permits 95-bit values for the length and time fields
and 63-bit values for the uid, gid, and device numbers.
GNU tar supports this extension for the
length, mtime, ctime, and atime fields.
Joerg Schilling's star program supports this extension for
all numeric fields.
Note that this extension has largely been made obsolete by the
extended attribute record provided by the pax interchange format.
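.Pp
The following Python sketch shows how a reader might accept both the
octal and the binary encodings of a numeric header field.
It is a minimal illustration rather than the logic of any particular
implementation: the function name is hypothetical, and only
non-negative binary values are handled (the bits after the flag bit are
read as a big-endian unsigned number).

```python
def parse_numeric_field(field: bytes) -> int:
    """Decode a tar header numeric field in octal or binary form."""
    if not field:
        raise ValueError("empty numeric field")
    if field[0] & 0x80:
        # Binary form: the high bit of the first byte is the flag, so a
        # 12-byte field carries 95 value bits and an 8-byte field 63.
        value = field[0] & 0x7F
        for byte in field[1:]:
            value = (value << 8) | byte
        return value
    # Octal form: ASCII digits with optional space or NUL padding.
    # A field with no terminator (all 12 bytes used) is accepted too.
    digits = field.strip(b" \x00").decode("ascii")
    return int(digits, 8) if digits else 0


# 11 digits plus NUL: the largest size the standard layout allows.
standard_max = parse_numeric_field(b"77777777777\x00")
# 12 digits with the terminator dropped.
extended_max = parse_numeric_field(b"777777777777")
# Binary form of 2**33, one more than standard_max.
binary_value = parse_numeric_field(
    b"\x80" + b"\x00" * 6 + b"\x02\x00\x00\x00\x00"
)
```

Here standard_max is 8**11 - 1, extended_max is 8**12 - 1, and
binary_value is 2**33, a size that does not fit in the 11-digit octal
form.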
.Pp
Another early GNU extension allowed base-64 values rather than octal.
This extension was short-lived and is no longer supported by any
implementation.
.Sh SEE ALSO
.Xr ar 1 ,
.Xr pax 1 ,
.Xr tar 1
.Sh STANDARDS
The
.Nm tar
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The ustar format is currently part of the specification for the
.Xr pax 1
utility.
The pax interchange file format is new with
.St -p1003.1-2001 .
.Sh HISTORY
A
.Nm tar
command appeared in Seventh Edition Unix, which was released in January 1979.
It replaced the
.Nm tp
program from Fourth Edition Unix, which in turn replaced the
.Nm tap
program from First Edition Unix.
John Gilmore's
.Nm pdtar
public-domain implementation (circa 1987) was highly influential
and formed the basis of
.Nm GNU tar
(circa 1988).
Joerg Schilling's
.Nm star
archiver is another open-source (GPL) implementation (originally developed
circa 1985) which features complete support for pax interchange
format.
.Pp
This documentation was written as part of the
.Nm libarchive
and
.Nm bsdtar
project by
.An Tim Kientzle Aq kientzle@FreeBSD.org .