| .\" Copyright (c) 2003-2009 Tim Kientzle
 | |
| .\" All rights reserved.
 | |
| .\"
 | |
| .\" Redistribution and use in source and binary forms, with or without
 | |
| .\" modification, are permitted provided that the following conditions
 | |
| .\" are met:
 | |
| .\" 1. Redistributions of source code must retain the above copyright
 | |
| .\"    notice, this list of conditions and the following disclaimer.
 | |
| .\" 2. Redistributions in binary form must reproduce the above copyright
 | |
| .\"    notice, this list of conditions and the following disclaimer in the
 | |
| .\"    documentation and/or other materials provided with the distribution.
 | |
| .\"
 | |
| .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 | |
| .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 | |
| .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 | |
| .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 | |
| .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 | |
| .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 | |
| .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 | |
| .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 | |
| .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 | |
| .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 | |
| .\" SUCH DAMAGE.
 | |
| .\"
 | |
| .\" $FreeBSD: head/lib/libarchive/tar.5 201077 2009-12-28 01:50:23Z kientzle $
 | |
| .\"
 | |
| .Dd December 27, 2009
 | |
| .Dt tar 5
 | |
| .Os
 | |
| .Sh NAME
 | |
| .Nm tar
 | |
| .Nd format of tape archive files
 | |
| .Sh DESCRIPTION
 | |
| The
 | |
| .Nm
 | |
| archive format collects any number of files, directories, and other
 | |
| file system objects (symbolic links, device nodes, etc.) into a single
 | |
| stream of bytes.
 | |
| The format was originally designed to be used with
 | |
| tape drives that operate with fixed-size blocks, but is widely used as
 | |
| a general packaging mechanism.
 | |
| .Ss General Format
 | |
| A
 | |
| .Nm
 | |
| archive consists of a series of 512-byte records.
 | |
| Each file system object requires a header record which stores basic metadata
 | |
| (pathname, owner, permissions, etc.) and zero or more records containing any
 | |
| file data.
 | |
| The end of the archive is indicated by two records consisting
 | |
| entirely of zero bytes.
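The record arithmetic described above can be sketched in C. This is a minimal illustration, not code from any particular tar implementation; the names tar_data_records and tar_is_zero_record are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512

/* Number of 512-byte data records that follow a header for `size` bytes
 * of file data; the last record is padded out to a full record. */
uint64_t tar_data_records(uint64_t size)
{
	return size / TAR_RECORD_SIZE + (size % TAR_RECORD_SIZE != 0);
}

/* Return 1 if the record consists entirely of zero bytes, else 0.
 * Two such records in a row mark the end of the archive. */
int tar_is_zero_record(const unsigned char rec[TAR_RECORD_SIZE])
{
	for (size_t i = 0; i < TAR_RECORD_SIZE; i++)
		if (rec[i] != 0)
			return 0;
	return 1;
}
```

A reader would typically call tar_is_zero_record on each header position and stop once it has seen two zero records back to back.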
 | |
| .Pp
 | |
| For compatibility with tape drives that use fixed block sizes,
 | |
| programs that read or write tar files always read or write a fixed
 | |
| number of records with each I/O operation.
 | |
| These
 | |
| .Dq blocks
 | |
| are always a multiple of the record size.
 | |
| The maximum block size supported by early
 | |
| implementations was 10240 bytes or 20 records.
 | |
| This is still the default for most implementations
 | |
| although block sizes of 1MiB (2048 records) or larger are
 | |
| commonly used with modern high-speed tape drives.
 | |
| (Note: the terms
 | |
| .Dq block
 | |
| and
 | |
| .Dq record
 | |
| here are not entirely standard; this document follows the
 | |
| convention established by John Gilmore in documenting
 | |
| .Nm pdtar . )
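Because I/O is done in whole blocks, a writer usually has to pad the archive so that its length is a multiple of the block size. The sketch below shows that rounding; the function name is hypothetical, and padding the final block with zero bytes is an assumption about common practice rather than something this document specifies.

```c
#include <stdint.h>

/* Round an archive length up to a whole number of blocks, where one
 * block is `records_per_block` 512-byte records (20 by default in most
 * implementations).  records_per_block must be non-zero. */
uint64_t tar_padded_length(uint64_t archive_bytes, unsigned records_per_block)
{
	uint64_t block = (uint64_t)records_per_block * 512;
	uint64_t rem = archive_bytes % block;

	return rem == 0 ? archive_bytes : archive_bytes + (block - rem);
}
```

For example, an archive holding one 512-byte file (header, one data record, two end-of-archive records) is 2048 bytes long and would be padded to 10240 bytes with the default blocking factor.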
 | |
| .Ss Old-Style Archive Format
 | |
| The original tar archive format has been extended many times to
 | |
| include additional information that various implementors found
 | |
| necessary.
 | |
| This section describes the variant implemented by the tar command
 | |
| included in
 | |
| .At v7 ,
 | |
| which seems to be the earliest widely-used version of the tar program.
 | |
| .Pp
 | |
| The header record for an old-style
 | |
| .Nm
 | |
| archive consists of the following:
 | |
| .Bd -literal -offset indent
 | |
| struct header_old_tar {
 | |
| 	char name[100];
 | |
| 	char mode[8];
 | |
| 	char uid[8];
 | |
| 	char gid[8];
 | |
| 	char size[12];
 | |
| 	char mtime[12];
 | |
| 	char checksum[8];
 | |
| 	char linkflag[1];
 | |
| 	char linkname[100];
 | |
| 	char pad[255];
 | |
| };
 | |
| .Ed
 | |
| All unused bytes in the header record are filled with nulls.
 | |
| .Bl -tag -width indent
 | |
| .It Va name
 | |
| Pathname, stored as a null-terminated string.
 | |
| Early tar implementations only stored regular files (including
 | |
| hardlinks to those files).
 | |
| One common early convention used a trailing "/" character to indicate
 | |
| a directory name, allowing directory permissions and owner information
 | |
| to be archived and restored.
 | |
| .It Va mode
 | |
| File mode, stored as an octal number in ASCII.
 | |
| .It Va uid , Va gid
 | |
| User id and group id of owner, as octal numbers in ASCII.
 | |
| .It Va size
 | |
| Size of file, as octal number in ASCII.
 | |
| For regular files only, this indicates the amount of data
 | |
| that follows the header.
 | |
| In particular, this field was ignored by early tar implementations
 | |
| when extracting hardlinks.
 | |
| Modern writers should always store a zero length for hardlink entries.
 | |
| .It Va mtime
 | |
| Modification time of file, as an octal number in ASCII.
 | |
| This indicates the number of seconds since the start of the epoch,
 | |
| 00:00:00 UTC January 1, 1970.
 | |
| Note that negative values should be avoided
 | |
| here, as they are handled inconsistently.
 | |
| .It Va checksum
 | |
| Header checksum, stored as an octal number in ASCII.
 | |
| To compute the checksum, set the checksum field to all spaces,
 | |
| then sum all bytes in the header using unsigned arithmetic.
 | |
| This field should be stored as six octal digits followed by a null and a space
 | |
| character.
 | |
| Note that many early implementations of tar used signed arithmetic
 | |
| for the checksum field, which can cause interoperability problems
 | |
| when transferring archives between systems.
 | |
| Modern robust readers compute the checksum both ways and accept the
 | |
| header if either computation matches.
 | |
| .It Va linkflag , Va linkname
 | |
| In order to preserve hardlinks and conserve tape, a file
 | |
| with multiple links is only written to the archive the first
 | |
| time it is encountered.
 | |
| The next time it is encountered, the
 | |
| .Va linkflag
 | |
| is set to an ASCII
 | |
| .Sq 1
 | |
| and the
 | |
| .Va linkname
 | |
| field holds the first name under which this file appears.
 | |
| (Note that regular files have a null value in the
 | |
| .Va linkflag
 | |
| field.)
 | |
| .El
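The checksum rule described under checksum above can be sketched as follows. The function names are hypothetical; the offset 148 follows from the field sizes in struct header_old_tar. The signed sum relies on the conversion of values above 127 to signed char, which is implementation-defined in C but yields the expected negative values on common platforms.

```c
#include <stddef.h>

/* name[100] + mode[8] + uid[8] + gid[8] + size[12] + mtime[12] = 148 */
#define TAR_CHECKSUM_OFFSET 148
#define TAR_CHECKSUM_LENGTH 8

/* Compute the header sum both ways, treating the checksum field as
 * eight spaces: once with unsigned bytes and once with signed bytes. */
void tar_checksums(const unsigned char hdr[512], unsigned long *usum, long *ssum)
{
	unsigned long u = 0;
	long s = 0;

	for (size_t i = 0; i < 512; i++) {
		unsigned char b = hdr[i];

		if (i >= TAR_CHECKSUM_OFFSET &&
		    i < TAR_CHECKSUM_OFFSET + TAR_CHECKSUM_LENGTH)
			b = ' ';
		u += b;
		s += (signed char)b;
	}
	*usum = u;
	*ssum = s;
}

/* Accept the header if the stored value matches either computation. */
int tar_checksum_ok(const unsigned char hdr[512], unsigned long stored)
{
	unsigned long u;
	long s;

	tar_checksums(hdr, &u, &s);
	return stored == u || (s >= 0 && stored == (unsigned long)s);
}
```

The two sums differ only when the header contains bytes with the high bit set, which is why the problem mostly shows up with non-ASCII pathnames.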
 | |
| .Pp
 | |
| Early tar implementations varied in how they terminated these fields.
 | |
| The tar command in
 | |
| .At v7
 | |
| used the following conventions (this is also documented in early BSD manpages):
 | |
| the pathname must be null-terminated;
 | |
| the mode, uid, and gid fields must end in a space and a null byte;
 | |
| the size and mtime fields must end in a space;
 | |
| the checksum is terminated by a null and a space.
 | |
| Early implementations filled the numeric fields with leading spaces.
 | |
| This seems to have been common practice until the
 | |
| .St -p1003.1-88
 | |
| standard was released.
 | |
| For best portability, modern implementations should fill the numeric
 | |
| fields with leading zeros.
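A tolerant reader and a conservative writer for these numeric fields might look like the sketch below. Both function names are hypothetical. The parser accepts leading spaces or zeros and stops at the first space or null; the formatter writes leading zeros followed by a single null, which is only one of the termination conventions listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse an octal field of `len` bytes (len <= 21 assumed, so the value
 * fits in 63 bits).  Returns the value, or -1 if the digits are
 * followed by anything other than a space, a null, or the field end. */
int64_t tar_parse_octal(const char *field, size_t len)
{
	size_t i = 0;
	int64_t v = 0;

	while (i < len && field[i] == ' ')
		i++;
	for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
		v = v * 8 + (field[i] - '0');
	if (i < len && field[i] != ' ' && field[i] != '\0')
		return -1;
	return v;
}

/* Write `value` as len-1 zero-padded octal digits plus a trailing null.
 * Returns 0 on success, -1 if the value does not fit in the field. */
int tar_format_octal(char *field, size_t len, uint64_t value)
{
	char buf[32];
	int n;

	if (len < 2 || len > sizeof(buf))
		return -1;
	n = snprintf(buf, sizeof(buf), "%0*llo", (int)(len - 1),
	    (unsigned long long)value);
	if (n < 0 || (size_t)n != len - 1)
		return -1;
	memcpy(field, buf, len);	/* copies the digits and the null */
	return 0;
}
```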
 | |
| .Ss Pre-POSIX Archives
 | |
| An early draft of
 | |
| .St -p1003.1-88
 | |
| served as the basis for John Gilmore's
 | |
| .Nm pdtar
 | |
| program and many system implementations from the late 1980s
 | |
| and early 1990s.
 | |
| These archives generally follow the POSIX ustar
 | |
| format described below with the following variations:
 | |
| .Bl -bullet -compact -width indent
 | |
| .It
 | |
| The magic value is
 | |
| .Dq ustar\ \&
 | |
| (note the following space).
 | |
| The version field contains a space character followed by a null.
 | |
| .It
 | |
| The numeric fields are generally filled with leading spaces
 | |
| (not leading zeros as recommended in the final standard).
 | |
| .It
 | |
| The prefix field is often not used, limiting pathnames to
 | |
| the 100 characters of old-style archives.
 | |
| .El
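A reader can tell these variants apart by looking at the eight bytes of the magic and version fields, which start at byte 257 of the header. The sketch below assumes the POSIX values given in the next section (the magic followed by a null, then two ASCII zeros); the enum and function names are hypothetical.

```c
#include <string.h>

enum tar_flavor { TAR_OLD, TAR_PRE_POSIX, TAR_USTAR };

/* name through linkname occupy bytes 0-256 of the header. */
#define TAR_MAGIC_OFFSET 257

/* Classify a header by its magic[6] and version[2] bytes. */
enum tar_flavor tar_detect_flavor(const unsigned char hdr[512])
{
	const unsigned char *m = hdr + TAR_MAGIC_OFFSET;

	/* POSIX ustar: "ustar", a null, then the version "00".  The
	 * literal is split so that "\0" is not read as a longer octal
	 * escape. */
	if (memcmp(m, "ustar\0" "00", 8) == 0)
		return TAR_USTAR;
	/* Pre-POSIX: "ustar" plus a space, then a space and a null. */
	if (memcmp(m, "ustar  \0", 8) == 0)
		return TAR_PRE_POSIX;
	return TAR_OLD;
}
```

Real archives are less tidy than this, so a robust reader would treat the result as a hint rather than a guarantee.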
 | |
| .Ss POSIX ustar Archives
 | |
| .St -p1003.1-88
 | |
| defined a standard tar file format to be read and written
 | |
| by compliant implementations of
 | |
| .Xr tar 1 .
 | |
| This format is often called the
 | |
| .Dq ustar
 | |
| format, after the magic value used
 | |
| in the header.
 | |
| (The name is an acronym for
 | |
| .Dq Unix Standard TAR . )
 | |
| It extends the historic format with new fields:
 | |
| .Bd -literal -offset indent
 | |
| struct header_posix_ustar {
 | |
| 	char name[100];
 | |
| 	char mode[8];
 | |
| 	char uid[8];
 | |
| 	char gid[8];
 | |
| 	char size[12];
 | |
| 	char mtime[12];
 | |
| 	char checksum[8];
 | |
| 	char typeflag[1];
 | |
| 	char linkname[100];
 | |
| 	char magic[6];
 | |
| 	char version[2];
 | |
| 	char uname[32];
 | |
| 	char gname[32];
 | |
| 	char devmajor[8];
 | |
| 	char devminor[8];
 | |
| 	char prefix[155];
 | |
| 	char pad[12];
 | |
| };
 | |
| .Ed
 | |
| .Bl -tag -width indent
 | |
| .It Va typeflag
 | |
| Type of entry.
 | |
| POSIX extended the earlier
 | |
| .Va linkflag
 | |
| field with several new type values:
 | |
| .Bl -tag -width indent -compact
 | |
| .It Dq 0
 | |
| Regular file.
 | |
| NUL should be treated as a synonym, for compatibility purposes.
 | |
| .It Dq 1
 | |
| Hard link.
 | |
| .It Dq 2
 | |
| Symbolic link.
 | |
| .It Dq 3
 | |
| Character device node.
 | |
| .It Dq 4
 | |
| Block device node.
 | |
| .It Dq 5
 | |
| Directory.
 | |
| .It Dq 6
 | |
| FIFO node.
 | |
| .It Dq 7
 | |
| Reserved.
 | |
| .It Other
 | |
| A POSIX-compliant implementation must treat any unrecognized typeflag value
 | |
| as a regular file.
 | |
| In particular, writers should ensure that all entries
 | |
| have a valid filename so that they can be restored by readers that do not
 | |
| support the corresponding extension.
 | |
| Uppercase letters "A" through "Z" are reserved for custom extensions.
 | |
| Note that sockets and whiteout entries are not archivable.
 | |
| .El
 | |
| It is worth noting that the
 | |
| .Va size
 | |
| field, in particular, has different meanings depending on the type.
 | |
| For regular files, of course, it indicates the amount of data
 | |
| following the header.
 | |
| For directories, it may be used to indicate the total size of all
 | |
| files in the directory, for use by operating systems that pre-allocate
 | |
| directory space.
 | |
| For all other types, it should be set to zero by writers and ignored
 | |
| by readers.
 | |
| .It Va magic
 | |
| Contains the magic value
 | |
| .Dq ustar
 | |
| followed by a NUL byte to indicate that this is a POSIX standard archive.
 | |
| Full compliance requires the uname and gname fields be properly set.
 | |
| .It Va version
 | |
| Version.
 | |
| This should be
 | |
| .Dq 00
 | |
| (two copies of the ASCII digit zero) for POSIX standard archives.
 | |
| .It Va uname , Va gname
 | |
| User and group names, as null-terminated ASCII strings.
 | |
| These should be used in preference to the uid/gid values
 | |
| when they are set and the corresponding names exist on
 | |
| the system.
 | |
| .It Va devmajor , Va devminor
 | |
| Major and minor numbers for character device or block device entry.
 | |
| .It Va name , Va prefix
 | |
| If the pathname is too long to fit in the 100 bytes provided by the standard
 | |
| format, it can be split at any
 | |
| .Pa /
 | |
| character with the first portion going into the prefix field.
 | |
| If the prefix field is not empty, the reader will prepend
 | |
| the prefix value and a
 | |
| .Pa /
 | |
| character to the regular name field to obtain the full pathname.
 | |
| The standard does not require a trailing
 | |
| .Pa /
 | |
| character on directory names, though most implementations still
 | |
| include this for compatibility reasons.
 | |
| .El
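The way a reader rebuilds the full pathname from the name and prefix fields can be sketched as follows. The function names are hypothetical. The helper treats each field as terminated by a null only if one is present, since either field may fill its entire space.

```c
#include <stddef.h>
#include <string.h>

/* Length of a header field that may or may not be null-terminated. */
static size_t field_len(const char *f, size_t max)
{
	const char *end = memchr(f, '\0', max);

	return end ? (size_t)(end - f) : max;
}

/* Join prefix and name into `out`, which must hold at least 257 bytes
 * (155 + '/' + 100 + null).  Returns the length of the result. */
size_t ustar_full_path(const char name[100], const char prefix[155],
    char out[257])
{
	size_t plen = field_len(prefix, 155);
	size_t nlen = field_len(name, 100);
	size_t n = 0;

	if (plen > 0) {
		memcpy(out, prefix, plen);
		n = plen;
		out[n++] = '/';
	}
	memcpy(out + n, name, nlen);
	n += nlen;
	out[n] = '\0';
	return n;
}
```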
 | |
| .Pp
 | |
| Note that all unused bytes must be set to
 | |
| .Dv NUL .
 | |
| .Pp
 | |
| Field termination is specified slightly differently by POSIX
 | |
| than by previous implementations.
 | |
| The
 | |
| .Va magic ,
 | |
| .Va uname ,
 | |
| and
 | |
| .Va gname
 | |
| fields must have a trailing
 | |
| .Dv NUL .
 | |
| The
 | |
| .Va name ,
 | |
| .Va linkname ,
 | |
| and
 | |
| .Va prefix
 | |
| fields must have a trailing
 | |
| .Dv NUL
 | |
| unless they fill the entire field.
 | |
| (In particular, it is possible to store a 256-character pathname if it
 | |
| happens to have a
 | |
| .Pa /
 | |
| as the 156th character.)
 | |
| POSIX requires numeric fields to be zero-padded in the front, and requires
 | |
| them to be terminated with either space or
 | |
| .Dv NUL
 | |
| characters.
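On the writing side, a long pathname has to be split at a slash so that both halves fit. The sketch below tries each slash in turn and takes the first one that works; this is one possible strategy, not the one any particular implementation is known to use, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Split `path` into the ustar name and prefix fields.  Both fields are
 * zero-filled first; a part that fills its field exactly is stored
 * without a trailing null.  Returns 0 on success, -1 if the path
 * cannot be stored in a ustar header. */
int ustar_split_path(const char *path, char name[100], char prefix[155])
{
	size_t len = strlen(path);

	memset(name, 0, 100);
	memset(prefix, 0, 155);
	if (len == 0)
		return -1;
	if (len <= 100) {
		memcpy(name, path, len);
		return 0;
	}
	/* i is the index of the candidate '/'; the prefix is path[0..i). */
	for (size_t i = 1; i < len && i <= 155; i++) {
		size_t rest;

		if (path[i] != '/')
			continue;
		rest = len - i - 1;
		if (rest >= 1 && rest <= 100) {
			memcpy(prefix, path, i);
			memcpy(name, path + i + 1, rest);
			return 0;
		}
	}
	return -1;
}
```

With a 256-character path whose 156th character is a slash, the prefix receives the first 155 characters and the name the last 100, and neither field has room for a null.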
 | |
| .Pp
 | |
| Currently, most tar implementations comply with the ustar
 | |
| format, occasionally extending it by adding new fields to the
 | |
| blank area at the end of the header record.
 | |
| .Ss Pax Interchange Format
 | |
| There are many attributes that cannot be portably stored in a
 | |
| POSIX ustar archive.
 | |
| .St -p1003.1-2001
 | |
| defined a
 | |
| .Dq pax interchange format
 | |
| that uses two new types of entries to hold text-formatted
 | |
| metadata that applies to following entries.
 | |
| Note that a pax interchange format archive is a ustar archive in every
 | |
| respect.
 | |
| The new data is stored in ustar-compatible archive entries that use the
 | |
| .Dq x
 | |
| or
 | |
| .Dq g
 | |
| typeflag.
 | |
| In particular, older implementations that do not fully support these
 | |
| extensions will extract the metadata into regular files, where the
 | |
| metadata can be examined as necessary.
 | |
| .Pp
 | |
| An entry in a pax interchange format archive consists of one or
 | |
| two standard ustar entries, each with its own header and data.
 | |
| The first optional entry stores the extended attributes
 | |
| for the following entry.
 | |
| This optional first entry has an "x" typeflag and a size field that
 | |
| indicates the total size of the extended attributes.
 | |
| The extended attributes themselves are stored as a series of text-format
 | |
| lines encoded in the portable UTF-8 encoding.
 | |
| Each line consists of a decimal number, a space, a key string, an equals
 | |
| sign, a value string, and a newline.
 | |
| The decimal number indicates the length of the entire line, including the
 | |
| initial length field and the trailing newline.
 | |
| An example of such a field is:
 | |
| .Dl 25 ctime=1084839148.1212\en
 | |
| Keys in all lowercase are standard keys.
 | |
| Vendors can add their own keys by prefixing them with an all uppercase
 | |
| vendor name and a period.
 | |
| Note that, unlike the historic header, numeric values are stored using
 | |
| decimal, not octal.
 | |
| A description of some common keys follows:
 | |
| .Bl -tag -width indent
 | |
| .It Cm atime , Cm ctime , Cm mtime
 | |
| File access, inode change, and modification times.
 | |
| These fields can be negative or include a decimal point and a fractional value.
 | |
| .It Cm uname , Cm uid , Cm gname , Cm gid
 | |
| User name, group name, and numeric UID and GID values.
 | |
| The user name and group name stored here are encoded in UTF-8
 | |
| and can thus include non-ASCII characters.
 | |
| The UID and GID fields can be of arbitrary length.
 | |
| .It Cm linkpath
 | |
| The full path of the linked-to file.
 | |
| Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
 | |
| .It Cm path
 | |
| The full pathname of the entry.
 | |
| Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
 | |
| .It Cm realtime.* , Cm security.*
 | |
| These keys are reserved and may be used for future standardization.
 | |
| .It Cm size
 | |
| The size of the file.
 | |
| Note that there is no length limit on this field, allowing conforming
 | |
| archives to store files much larger than the historic 8GB limit.
 | |
| .It Cm SCHILY.*
 | |
| Vendor-specific attributes used by Joerg Schilling's
 | |
| .Nm star
 | |
| implementation.
 | |
| .It Cm SCHILY.acl.access , Cm SCHILY.acl.default
 | |
| Stores the access and default ACLs as textual strings in a format
 | |
| that is an extension of the format specified by POSIX.1e draft 17.
 | |
| In particular, each user or group access specification can include a fourth
 | |
| colon-separated field with the numeric UID or GID.
 | |
| This allows ACLs to be restored on systems that may not have complete
 | |
| user or group information available (such as when NIS/YP or LDAP services
 | |
| are temporarily unavailable).
 | |
| .It Cm SCHILY.devminor , Cm SCHILY.devmajor
 | |
| The full minor and major numbers for device nodes.
 | |
| .It Cm SCHILY.fflags
 | |
| The file flags.
 | |
| .It Cm SCHILY.realsize
 | |
| The full size of the file on disk.
 | |
| XXX explain? XXX
 | |
| .It Cm SCHILY.dev , Cm SCHILY.ino , Cm SCHILY.nlinks
 | |
| The device number, inode number, and link count for the entry.
 | |
| In particular, note that a pax interchange format archive using Joerg
 | |
| Schilling's
 | |
| .Cm SCHILY.*
 | |
| extensions can store all of the data from
 | |
| .Va struct stat .
 | |
| .It Cm LIBARCHIVE.xattr. Ns Ar namespace Ns . Ns Ar key
 | |
| Libarchive stores POSIX.1e-style extended attributes using
 | |
| keys of this form.
 | |
| The
 | |
| .Ar key
 | |
| value is URL-encoded:
 | |
| All non-ASCII characters and the two special characters
 | |
| .Dq =
 | |
| and
 | |
| .Dq %
 | |
| are encoded as
 | |
| .Dq %
 | |
| followed by two uppercase hexadecimal digits.
 | |
| The value of this key is the extended attribute value
 | |
| encoded in base 64.
 | |
| XXX Detail the base-64 format here XXX
 | |
| .It Cm VENDOR.*
 | |
| XXX document other vendor-specific extensions XXX
 | |
| .El
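The record layout described before the key list (decimal length, space, key, equals sign, value, newline) can be parsed with a sketch like the one below. The function name and its out-parameters are hypothetical; the returned pointers refer into the caller's buffer, and the length prefix is capped at nine digits to keep the arithmetic simple.

```c
#include <stddef.h>
#include <string.h>

/* Parse one extended-attribute record starting at `buf`, with `avail`
 * bytes remaining.  On success returns the record length and sets
 * key/klen and val/vlen to slices of buf; returns 0 if malformed. */
size_t pax_parse_record(const char *buf, size_t avail,
    const char **key, size_t *klen, const char **val, size_t *vlen)
{
	size_t i = 0, len = 0;
	const char *eq;

	while (i < avail && buf[i] >= '0' && buf[i] <= '9') {
		if (i >= 9)
			return 0;	/* length prefix too long for this sketch */
		len = len * 10 + (size_t)(buf[i] - '0');
		i++;
	}
	if (i == 0 || i >= avail || buf[i] != ' ')
		return 0;
	i++;
	/* The length covers the digits, the space, and the newline. */
	if (len > avail || len < i + 2 || buf[len - 1] != '\n')
		return 0;
	/* The key ends at the first '='; the value may contain more. */
	eq = memchr(buf + i, '=', len - 1 - i);
	if (eq == NULL)
		return 0;
	*key = buf + i;
	*klen = (size_t)(eq - *key);
	*val = eq + 1;
	*vlen = (size_t)(buf + len - 1 - *val);
	return len;
}
```

Applied to the ctime example above, the function returns 25 with a five-byte key and a fifteen-byte value. A caller would advance by the returned length and repeat until the entry body is consumed.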
 | |
| .Pp
 | |
| Any values stored in an extended attribute override the corresponding
 | |
| values in the regular tar header.
 | |
| Note that compliant readers should ignore the regular fields when they
 | |
| are overridden.
 | |
| This is important, as existing archivers are known to store non-compliant
 | |
| values in the standard header fields in this situation.
 | |
| There are no limits on length for any of these fields.
 | |
| In particular, numeric fields can be arbitrarily large.
 | |
| All text fields are encoded in UTF-8.
 | |
| Compliant writers should store only portable 7-bit ASCII characters in
 | |
| the standard ustar header and use extended
 | |
| attributes whenever a text value contains non-ASCII characters.
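A writer that moves a value into an extended attribute has to emit a record whose length prefix counts its own digits, which makes the length slightly awkward to compute. The sketch below iterates until the digit count is stable; the function name is hypothetical, and the caller is assumed to pass a value already encoded in UTF-8.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "<len> key=value\n" into `out` (capacity `outsz`, including
 * room for a terminating null).  Returns the record length, or 0 if
 * the record does not fit. */
size_t pax_format_record(const char *key, const char *value,
    char *out, size_t outsz)
{
	/* Everything except the digits: ' ' key '=' value '\n'. */
	size_t body = 1 + strlen(key) + 1 + strlen(value) + 1;
	size_t digits = 1, total;
	int n;

	for (;;) {
		size_t need = 0;

		total = body + digits;
		for (size_t t = total; t != 0; t /= 10)
			need++;
		if (need == digits)
			break;
		digits = need;	/* adding digits changed the total; retry */
	}
	n = snprintf(out, outsz, "%zu %s=%s\n", total, key, value);
	if (n < 0 || (size_t)n != total || (size_t)n >= outsz)
		return 0;
	return total;
}
```

The retry matters at the boundaries: a record whose other parts total 98 bytes needs a three-digit prefix, because 98 plus two digits is already 100.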
 | |
| .Pp
 | |
| In addition to the
 | |
| .Cm x
 | |
| entry described above, the pax interchange format
 | |
| also supports a
 | |
| .Cm g
 | |
| entry.
 | |
| The
 | |
| .Cm g
 | |
| entry is identical in format, but specifies attributes that serve as
 | |
| defaults for all subsequent archive entries.
 | |
| The
 | |
| .Cm g
 | |
| entry is not widely used.
 | |
| .Pp
 | |
| Besides the new
 | |
| .Cm x
 | |
| and
 | |
| .Cm g
 | |
| entries, the pax interchange format has a few other minor variations
 | |
| from the earlier ustar format.
 | |
| The most troubling one is that hardlinks are permitted to have
 | |
| data following them.
 | |
| This allows readers to restore any hardlink to a file without
 | |
| having to rewind the archive to find an earlier entry.
 | |
| However, it creates complications for robust readers, as it is no longer
 | |
| clear whether or not they should ignore the size field for hardlink entries.
 | |
| .Ss GNU Tar Archives
 | |
| The GNU tar program started with a pre-POSIX format similar to that
 | |
| described earlier and has extended it using several different mechanisms:
 | |
| It added new fields to the empty space in the header (some of which was later
 | |
| used by POSIX for conflicting purposes);
 | |
| it allowed the header to be continued over multiple records;
 | |
| and it defined new entries that modify following entries
 | |
| (similar in principle to the
 | |
| .Cm x
 | |
| entry described above, but each GNU special entry is single-purpose,
 | |
| unlike the general-purpose
 | |
| .Cm x
 | |
| entry).
 | |
| As a result, GNU tar archives are not POSIX compatible, although
 | |
| more lenient POSIX-compliant readers can successfully extract most
 | |
| GNU tar archives.
 | |
| .Bd -literal -offset indent
 | |
| struct header_gnu_tar {
 | |
| 	char name[100];
 | |
| 	char mode[8];
 | |
| 	char uid[8];
 | |
| 	char gid[8];
 | |
| 	char size[12];
 | |
| 	char mtime[12];
 | |
| 	char checksum[8];
 | |
| 	char typeflag[1];
 | |
| 	char linkname[100];
 | |
| 	char magic[6];
 | |
| 	char version[2];
 | |
| 	char uname[32];
 | |
| 	char gname[32];
 | |
| 	char devmajor[8];
 | |
| 	char devminor[8];
 | |
| 	char atime[12];
 | |
| 	char ctime[12];
 | |
| 	char offset[12];
 | |
| 	char longnames[4];
 | |
| 	char unused[1];
 | |
| 	struct {
 | |
| 		char offset[12];
 | |
| 		char numbytes[12];
 | |
| 	} sparse[4];
 | |
| 	char isextended[1];
 | |
| 	char realsize[12];
 | |
| 	char pad[17];
 | |
| };
 | |
| .Ed
 | |
| .Bl -tag -width indent
 | |
| .It Va typeflag
 | |
| GNU tar uses the following special entry types, in addition to
 | |
| those defined by POSIX:
 | |
| .Bl -tag -width indent
 | |
| .It "7"
 | |
| GNU tar treats type "7" records identically to type "0" records,
 | |
| except on one obscure RTOS where they are used to indicate the
 | |
| pre-allocation of a contiguous file on disk.
 | |
| .It "D"
 | |
| This indicates a directory entry.
 | |
| Unlike the POSIX-standard "5"
 | |
| typeflag, the header is followed by data records listing the names
 | |
| of files in this directory.
 | |
| Each name is preceded by an ASCII "Y"
 | |
| if the file is stored in this archive or "N" if the file is not
 | |
| stored in this archive.
 | |
| Each name is terminated with a null, and
 | |
| an extra null marks the end of the name list.
 | |
| The purpose of this
 | |
| entry is to support incremental backups; a program restoring from
 | |
| such an archive may wish to delete files on disk that did not exist
 | |
| in the directory when the archive was made.
 | |
| .Pp
 | |
| Note that the "D" typeflag specifically violates POSIX, which requires
 | |
| that unrecognized typeflags be restored as normal files.
 | |
| In this case, restoring the "D" entry as a file could interfere
 | |
| with subsequent creation of the like-named directory.
 | |
| .It "K"
 | |
| The data for this entry is a long linkname for the following regular entry.
 | |
| .It "L"
 | |
| The data for this entry is a long pathname for the following regular entry.
 | |
| .It "M"
 | |
| This is a continuation of the last file on the previous volume.
 | |
| GNU multi-volume archives guarantee that each volume begins with a valid
 | |
| entry header.
 | |
| To ensure this, a file may be split, with part stored at the end of one volume,
 | |
| and part stored at the beginning of the next volume.
 | |
| The "M" typeflag indicates that this entry continues an existing file.
 | |
| Such entries can only occur as the first or second entry
 | |
| in an archive (the latter only if the first entry is a volume label).
 | |
| The
 | |
| .Va size
 | |
| field specifies the size of this entry.
 | |
| The
 | |
| .Va offset
 | |
| field at bytes 369-380 specifies the offset where this file fragment
 | |
| begins.
 | |
| The
 | |
| .Va realsize
 | |
| field specifies the total size of the file (which must equal
 | |
| .Va size
 | |
| plus
 | |
| .Va offset ) .
 | |
| When extracting, GNU tar checks that the header file name is the one it is
 | |
| expecting, that the header offset is in the correct sequence, and that
 | |
| the sum of offset and size is equal to realsize.
 | |
| .It "N"
 | |
| Type "N" records are no longer generated by GNU tar.
 | |
| They contained a
 | |
| list of files to be renamed or symlinked after extraction; this was
 | |
| originally used to support long names.
 | |
| The contents of this record
 | |
| are a text description of the operations to be done, in the form
 | |
| .Dq Rename %s to %s\en
 | |
| or
 | |
| .Dq Symlink %s to %s\en ;
 | |
| in either case, both
 | |
| filenames are escaped using K&R C syntax.
 | |
| Due to security concerns, "N" records are now generally ignored
 | |
| when reading archives.
 | |
| .It "S"
 | |
| This is a
 | |
| .Dq sparse
 | |
| regular file.
 | |
| Sparse files are stored as a series of fragments.
 | |
| The header contains a list of fragment offset/length pairs.
 | |
| If more than four such entries are required, the header is
 | |
| extended as necessary with
 | |
| .Dq extra
 | |
| header extensions (an older format that is no longer used), or
 | |
| .Dq sparse
 | |
| extensions.
 | |
| .It "V"
 | |
| The
 | |
| .Va name
 | |
| field should be interpreted as a tape/volume header name.
 | |
| This entry should generally be ignored on extraction.
 | |
| .El
 | |
| .It Va magic
 | |
| The magic field holds the five characters
 | |
| .Dq ustar
 | |
| followed by a space.
 | |
| Note that POSIX ustar archives have a trailing null.
 | |
| .It Va version
 | |
| The version field holds a space character followed by a null.
 | |
| Note that POSIX ustar archives use two copies of the ASCII digit
 | |
| .Dq 0 .
 | |
| .It Va atime , Va ctime
 | |
| The time the file was last accessed and the time of
 | |
| last change of file information, stored in octal as with
 | |
| .Va mtime .
 | |
| .It Va longnames
 | |
| This field is apparently no longer used.
 | |
| .It Sparse Va offset / Va numbytes
 | |
| Each such structure specifies a single fragment of a sparse
 | |
| file.
 | |
| The two fields store values as octal numbers.
 | |
| The fragments are each padded to a multiple of 512 bytes
 | |
| in the archive.
 | |
| On extraction, the list of fragments is collected from the
 | |
| header (including any extension headers), and the data
 | |
| is then read and written to the file at appropriate offsets.
 | |
| .It Va isextended
 | |
| If this is set to non-zero, the header will be followed by additional
 | |
| .Dq sparse header
 | |
| records.
 | |
| Each such record contains information about as many as 21 additional
 | |
| sparse blocks as shown here:
 | |
| .Bd -literal -offset indent
 | |
| struct gnu_sparse_header {
 | |
| 	struct {
 | |
| 		char offset[12];
 | |
| 		char numbytes[12];
 | |
| 	} sparse[21];
 | |
| 	char    isextended[1];
 | |
| 	char    padding[7];
 | |
| };
 | |
| .Ed
 | |
| .It Va realsize
 | |
| A binary representation of the file's complete size, with a much larger range
 | |
| than the POSIX file size.
 | |
| In particular, with
 | |
| .Cm M
 | |
| type files, the current entry is only a portion of the file.
 | |
| In that case, the POSIX size field will indicate the size of this
 | |
| entry; the
 | |
| .Va realsize
 | |
| field will indicate the total size of the file.
 | |
| .El
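The byte positions quoted above can be checked mechanically against the struct. The sketch below repeats struct header_gnu_tar from this section and asserts its size and the position of the offset field at compile time (this needs a C11 compiler). The gnu_multivol_consistent function is a hypothetical illustration of the consistency check described for "M" entries; it takes already-decoded values and is not GNU tar's own code.

```c
#include <stddef.h>
#include <stdint.h>

struct header_gnu_tar {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char checksum[8];
	char typeflag[1];
	char linkname[100];
	char magic[6];
	char version[2];
	char uname[32];
	char gname[32];
	char devmajor[8];
	char devminor[8];
	char atime[12];
	char ctime[12];
	char offset[12];
	char longnames[4];
	char unused[1];
	struct {
		char offset[12];
		char numbytes[12];
	} sparse[4];
	char isextended[1];
	char realsize[12];
	char pad[17];
};

/* The header must fill exactly one record, and offset must start at
 * byte 369 as stated in the description of "M" entries. */
_Static_assert(sizeof(struct header_gnu_tar) == 512, "header is one record");
_Static_assert(offsetof(struct header_gnu_tar, offset) == 369, "offset at 369");

/* Check a multi-volume continuation entry: the fragment must start
 * where the previous volume stopped, and offset + size must equal the
 * total file size. */
int gnu_multivol_consistent(uint64_t size, uint64_t offset,
    uint64_t realsize, uint64_t expected_offset)
{
	if (offset != expected_offset)
		return 0;
	if (size > UINT64_MAX - offset)
		return 0;	/* sum would overflow */
	return size + offset == realsize;
}
```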
 | |
| .Ss GNU tar pax archives
 | |
| GNU tar 1.14 (XXX check this XXX) and later will write
 | |
| pax interchange format archives when you specify the
 | |
| .Fl -posix
 | |
| flag.
 | |
| This format uses custom keywords to store sparse file information.
 | |
| There have been three iterations of this support, referred to
 | |
| as
 | |
| .Dq 0.0 ,
 | |
| .Dq 0.1 ,
 | |
| and
 | |
| .Dq 1.0 .
 | |
| .Bl -tag -width indent
 | |
| .It Cm GNU.sparse.numblocks , Cm GNU.sparse.offset , Cm GNU.sparse.numbytes , Cm GNU.sparse.size
 | |
| The
 | |
| .Dq 0.0
 | |
| format used an initial
 | |
| .Cm GNU.sparse.numblocks
 | |
| attribute to indicate the number of blocks in the file, a pair of
 | |
| .Cm GNU.sparse.offset
 | |
| and
 | |
| .Cm GNU.sparse.numbytes
 | |
| to indicate the offset and size of each block,
 | |
| and a single
 | |
| .Cm GNU.sparse.size
 | |
| to indicate the full size of the file.
 | |
| This is not the same as the size in the tar header because the
 | |
| latter value does not include the size of any holes.
 | |
| This format required that the order of attributes be preserved and
 | |
| relied on readers accepting multiple appearances of the same attribute
 | |
| names, which is not officially permitted by the standards.
 | |
| .It Cm GNU.sparse.map
 | |
| The
 | |
| .Dq 0.1
 | |
| format used a single attribute that stored a comma-separated
 | |
| list of decimal numbers.
 | |
| Each pair of numbers indicated the offset and size, respectively,
 | |
| of a block of data.
 | |
| This does not work well if the archive is extracted by an archiver
 | |
| that does not recognize this extension, since many pax implementations
 | |
| simply discard unrecognized attributes.
 | |
| .It Cm GNU.sparse.major , Cm GNU.sparse.minor , Cm GNU.sparse.name , Cm GNU.sparse.realsize
 | |
| The
 | |
| .Dq 1.0
 | |
| format stores the sparse block map in one or more 512-byte blocks
 | |
| prepended to the file data in the entry body.
 | |
| The pax attributes indicate the existence of this map
 | |
| (via the
 | |
| .Cm GNU.sparse.major
 | |
| and
 | |
| .Cm GNU.sparse.minor
 | |
| fields)
 | |
| and the full size of the file.
 | |
| The
 | |
| .Cm GNU.sparse.name
 | |
| holds the true name of the file.
 | |
| To avoid confusion, the name stored in the regular tar header
 | |
| is a modified name so that extraction errors will be apparent
 | |
| to users.
 | |
| .El
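As an illustration of the "0.1" variant, the comma-separated GNU.sparse.map value can be decoded into offset and size pairs as sketched below. The struct and function names are hypothetical, and the sketch does not guard against decimal values that overflow 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

struct sparse_block {
	uint64_t offset;
	uint64_t numbytes;
};

/* Parse "offset,size,offset,size,..." into at most `max` entries.
 * Returns the number of pairs, or -1 if the map is malformed or holds
 * more than `max` pairs.  An empty string yields 0 pairs. */
int gnu_sparse_map_parse(const char *map, struct sparse_block *out, int max)
{
	const char *p = map;
	int count = 0;

	if (*p == '\0')
		return 0;
	for (;;) {
		uint64_t v[2];

		for (int k = 0; k < 2; k++) {
			if (*p < '0' || *p > '9')
				return -1;
			v[k] = 0;
			while (*p >= '0' && *p <= '9')
				v[k] = v[k] * 10 + (uint64_t)(*p++ - '0');
			if (k == 0) {
				/* an offset must be followed by its size */
				if (*p != ',')
					return -1;
				p++;
			}
		}
		if (count == max)
			return -1;
		out[count].offset = v[0];
		out[count].numbytes = v[1];
		count++;
		if (*p == '\0')
			return count;
		if (*p != ',')
			return -1;
		p++;
	}
}
```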
 | |
| .Ss Solaris Tar
 | |
| XXX More Details Needed XXX
 | |
| .Pp
 | |
| Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
 | |
| .Dq extended
 | |
| format that is fundamentally similar to pax interchange format,
 | |
| with the following differences:
 | |
| .Bl -bullet -compact -width indent
 | |
| .It
 | |
| Extended attributes are stored in an entry whose type is
 | |
| .Cm X ,
 | |
| not
 | |
| .Cm x ,
 | |
| as used by pax interchange format.
 | |
| The detailed format of this entry appears to be the same
 | |
| as detailed above for the
 | |
| .Cm x
 | |
| entry.
 | |
| .It
 | |
| An additional
 | |
| .Cm A
 | |
| entry is used to store an ACL for the following regular entry.
 | |
| The body of this entry contains a seven-digit octal number
 | |
| followed by a zero byte, followed by the
 | |
| textual ACL description.
 | |
| The octal value is the number of ACL entries
 | |
| plus a constant that indicates the ACL type: 01000000
 | |
| for POSIX.1e ACLs and 03000000 for NFSv4 ACLs.
 | |
| .El
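The leading octal number of an ACL entry body can be decoded as sketched below. The names are hypothetical. Separating the type constant from the entry count with a bit mask is an assumption made for this sketch; it works only while the count stays below 01000000.

```c
#include <stddef.h>

enum solaris_acl_type {
	SOLARIS_ACL_UNKNOWN,
	SOLARIS_ACL_POSIX1E,
	SOLARIS_ACL_NFS4
};

/* Decode the seven-digit octal number and zero byte at the start of an
 * ACL entry body.  On success returns 0 and sets the ACL type, the
 * entry count, and a pointer to the textual ACL that follows (which is
 * empty when len is exactly 8). */
int solaris_acl_header(const char *body, size_t len,
    enum solaris_acl_type *type, unsigned long *count, const char **text)
{
	unsigned long v = 0;

	if (len < 8 || body[7] != '\0')
		return -1;
	for (size_t i = 0; i < 7; i++) {
		if (body[i] < '0' || body[i] > '7')
			return -1;
		v = v * 8 + (unsigned long)(body[i] - '0');
	}
	switch (v & 07000000UL) {
	case 01000000UL:
		*type = SOLARIS_ACL_POSIX1E;
		break;
	case 03000000UL:
		*type = SOLARIS_ACL_NFS4;
		break;
	default:
		*type = SOLARIS_ACL_UNKNOWN;
		break;
	}
	*count = v & 0777777UL;
	*text = body + 8;
	return 0;
}
```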
 | |
| .Ss AIX Tar
 | |
| XXX More details needed XXX
 | |
| .Ss Mac OS X Tar
 | |
| The tar distributed with Apple's Mac OS X stores most regular files
 | |
| as two separate entries in the tar archive.
 | |
| The two entries have the same name except that the first
 | |
| one has
 | |
| .Dq ._
 | |
| added to the beginning of the name.
 | |
| This first entry stores the
 | |
| .Dq resource fork
 | |
| with additional attributes for the file.
 | |
| The Mac OS X
 | |
| .Fn CopyFile
 | |
| API is used to separate a file on disk into separate
 | |
| resource and data streams and to reassemble those separate
 | |
| streams when the file is restored to disk.
 | |
| .Ss Other Extensions
 | |
| One obvious extension to increase the size of files is to
 | |
| eliminate the terminating characters from the various
 | |
| numeric fields.
 | |
| For example, the standard only allows the size field to contain
 | |
| 11 octal digits, reserving the twelfth byte for a trailing
 | |
| NUL character.
 | |
| Allowing 12 octal digits allows file sizes up to 64 GB.
 | |
| .Pp
 | |
| Another extension, utilized by GNU tar, star, and other newer
 | |
| .Nm
 | |
| implementations, permits binary numbers in the standard numeric fields.
 | |
| This is flagged by setting the high bit of the first byte.
 | |
| This permits 95-bit values for the length and time fields
 | |
| and 63-bit values for the uid, gid, and device numbers.
 | |
| GNU tar supports this extension for the
 | |
| length, mtime, ctime, and atime fields.
 | |
| Joerg Schilling's star program supports this extension for
 | |
| all numeric fields.
 | |
| Note that this extension is largely obsoleted by the extended attribute
 | |
| record provided by the pax interchange format.
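A reader that accepts both encodings might decode a numeric field as sketched below. The function name is hypothetical. The sketch assumes the binary form is big-endian with the flag bit cleared from the first byte, handles only non-negative values, and rejects anything that does not fit in 64 bits even though the field itself can hold more.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a numeric header field of `len` bytes.  If the high bit of
 * the first byte is set the field is read as a binary number,
 * otherwise as octal digits.  Returns 0 on success, -1 if the field is
 * empty or the value does not fit in a uint64_t. */
int tar_parse_number(const unsigned char *field, size_t len, uint64_t *out)
{
	uint64_t v = 0;
	size_t i = 0;

	if (len == 0)
		return -1;
	if (field[0] & 0x80) {
		v = field[0] & 0x7f;	/* drop the flag bit */
		for (i = 1; i < len; i++) {
			if (v >> 56)
				return -1;	/* next shift would overflow */
			v = (v << 8) | field[i];
		}
		*out = v;
		return 0;
	}
	while (i < len && field[i] == ' ')
		i++;
	for (; i < len && field[i] >= '0' && field[i] <= '7'; i++) {
		if (v >> 61)
			return -1;	/* next shift would overflow */
		v = (v << 3) | (uint64_t)(field[i] - '0');
	}
	*out = v;
	return 0;
}
```

With a 12-byte size field this accepts, for example, a binary-encoded 8 GiB value, which the 11-digit octal form cannot represent.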
 | |
| .Pp
 | |
| Another early GNU extension allowed base-64 values rather than octal.
 | |
| This extension was short-lived and is no longer supported by any
 | |
| implementation.
 | |
| .Sh SEE ALSO
 | |
| .Xr ar 1 ,
 | |
| .Xr pax 1 ,
 | |
| .Xr tar 1
 | |
| .Sh STANDARDS
 | |
| The
 | |
| .Nm tar
 | |
| utility is no longer a part of POSIX or the Single UNIX Specification.
 | |
| It last appeared in
 | |
| .St -susv2 .
 | |
| It has been supplanted in subsequent standards by
 | |
| .Xr pax 1 .
 | |
| The ustar format is currently part of the specification for the
 | |
| .Xr pax 1
 | |
| utility.
 | |
| The pax interchange file format is new with
 | |
| .St -p1003.1-2001 .
 | |
| .Sh HISTORY
 | |
| A
 | |
| .Nm tar
 | |
| command appeared in Seventh Edition Unix, which was released in January, 1979.
 | |
| It replaced the
 | |
| .Nm tp
 | |
| program from Fourth Edition Unix which in turn replaced the
 | |
| .Nm tap
 | |
| program from First Edition Unix.
 | |
| John Gilmore's
 | |
| .Nm pdtar
 | |
| public-domain implementation (circa 1987) was highly influential
 | |
| and formed the basis of
 | |
| .Nm GNU tar
 | |
| (circa 1988).
 | |
| Joerg Schilling's
 | |
| .Nm star
 | |
| archiver is another open-source (GPL) archiver (originally developed
 | |
| circa 1985) which features complete support for pax interchange
 | |
| format.
 | |
| .Pp
 | |
| This documentation was written as part of the
 | |
| .Nm libarchive
 | |
| and
 | |
| .Nm bsdtar
 | |
| project by
 | |
| .An Tim Kientzle Aq kientzle@FreeBSD.org .
 | 
