tar {utils} | R Documentation |
Create a Tar Archive
Description
Create a tar archive.
Usage
tar(tarfile, files = NULL,
compression = c("none", "gzip", "bzip2", "xz", "zstd"),
compression_level = 6, tar = Sys.getenv("tar"),
extra_flags = "")
Arguments
tarfile |
The pathname of the tar file: tilde expansion (see
|
files |
A character vector of filepaths to be archived: the default is to archive all files under the current directory. |
compression |
character string giving the type of compression to be used (default none). Can be abbreviated. |
compression_level |
integer: the level of compression. Only used
for the internal method: see the help for |
tar |
character string: the path to the command to be used. If
the command itself contains spaces it needs to be quoted (e.g., by
|
extra_flags |
Any extra flags for an external |
Details
This is either a wrapper for a tar command or uses an internal implementation in R. The latter is used if tarfile is a connection or if the argument tar is "internal" or "" (the ‘factory-fresh’ default). Note that whereas Unix-alike versions of R set the environment variable TAR, its value is not the default for this function.
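As a minimal sketch of the internal route, the following builds a small archive with tar = "internal" and lists it back with untar. The scratch directory and file names are illustrative assumptions, not part of the original page.

```r
# Scratch directory with two small files (names are illustrative only).
src <- tempfile("tar_demo_")
dir.create(src)
writeLines("hello", file.path(src, "a.txt"))
writeLines("world", file.path(src, "b.txt"))

tarball <- tempfile(fileext = ".tar")

# Work inside 'src' so the stored paths are short and relative.
old <- setwd(src)
# tar = "internal" selects R's own implementation explicitly.
rc <- tar(tarball, files = c("a.txt", "b.txt"), tar = "internal")
setwd(old)

# List the contents, again with the internal implementation.
listed <- untar(tarball, list = TRUE, tar = "internal")
print(listed)
print(rc)  # the internal version returns 0 (invisibly)
```

Passing tar = "internal" rather than relying on the default keeps the behaviour the same even if the environment variable tar has been set.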
Argument extra_flags is passed to an external tar and so is platform-dependent. Possibly useful values include -h (follow symbolic links, also -L on some platforms), --acls, --exclude-backups, --exclude-vcs (and similar) and on Windows --force-local (so drives can be included in filepaths: this used to be the default for R on Windows).
A convenient and robust way to set options for GNU tar is via environment variable TAR_OPTIONS. Appending --force-local to TAR does not work with GNU tar due to restrictions on how some options can be mixed. The tar available on Windows 10 (libarchive's bsdtar) supports drive letters by default. It does not support --force-local, but it ignores TAR_OPTIONS.
For GNU tar, --format=ustar forces a more portable format. (The default is set at compilation and will be shown at the end of the output from tar --help: for version 1.35 ‘out-of-the-box’ it is --format=gnu, but the manual says the intention is to change to --format=posix, which is the same as pax; it was never part of the POSIX standard for tar and should not be used. However, the intention has been stated now for several years without changing the default.)
For libarchive's bsdtar, --format=ustar is more portable than the default.
One issue which can cause an external command to fail is a command line too long for the system shell: this is worked around if the external command is detected to be GNU tar or libarchive tar (aka bsdtar).
Note that files = '.' will usually not work with an external tar as that would expand the list of files after tarfile is created. (It does work with the default internal method.)
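One way to sidestep the files = '.' problem is to keep the tarball outside the tree being archived and to name the directory from its parent. The sketch below uses the internal method so that it runs without an external tar; the directory name proj is a hypothetical example, and the same layout should also suit an external command.

```r
# Parent directory containing the tree to archive (names are illustrative).
parent <- tempfile("tar_parent_")
dir.create(file.path(parent, "proj"), recursive = TRUE)
writeLines("x <- 1", file.path(parent, "proj", "script.R"))

# Keep the tarball outside the tree being archived.
tarball <- tempfile(fileext = ".tar")

old <- setwd(parent)
# Name the directory explicitly rather than using files = "."
tar(tarball, files = "proj", tar = "internal")
setwd(old)

listed <- untar(tarball, list = TRUE, tar = "internal")
print(listed)
```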
Value
The return code from system or 0 for the internal version, invisibly.
Portability
The ‘tar’ format no longer has an agreed standard! ‘Unix Standard Tar’ was part of POSIX 1003.1:1998 but has been removed in favour of pax, and in any case many common implementations diverged from the former standard.
Many R platforms use a version of GNU tar, but the behaviour seems to be changed with each version. macOS >= 10.6, FreeBSD and Windows 10 use bsdtar from the libarchive project (but for macOS often a quite-old version), and commercial Unixes will have their own versions. bsdtar is available for many other platforms: macOS up to at least 10.9 (but not recently) had GNU tar as gnutar and other platforms, e.g. Solaris, have it as gtar: on a Unix-alike configure will try gnutar and gtar before tar.
Known problems arise from:
The handling of file paths of more than 100 bytes. These were unsupported in early versions of tar, and supported in one way by POSIX tar and in another by GNU tar and yet another by the POSIX pax command which recent tar programs often support. The internal implementation warns on paths of more than 100 bytes, uses the ‘ustar’ way from the 1998 POSIX standard which supports up to 256 bytes (depending on the path: in particular the final component is limited to 100 bytes) if possible, otherwise the GNU way (which is widely supported, including by untar).
Most formats do not record the encoding of file paths.
(File) links. tar was developed on an OS that used hard links, and physical files that were referred to more than once in the list of files to be included were included only once, the remaining instances being added as links. Later a means to include symbolic links was added. The internal implementation supports only symbolic links (on OSes that support them). Of course, the question arises as to how links should be unpacked on OSes that do not support them: for regular files file copies can be used.
Names of links in the ‘ustar’ format are restricted to 100 bytes. There is a GNU extension for arbitrarily long link names, but bsdtar ignores it. The internal method uses the GNU extension, with a warning.
Header fields, in particular the padding to be used when fields are not full or not used. POSIX did define the correct behaviour but commonly used implementations did (and still do) not comply.
File sizes. The ‘ustar’ format is restricted to 8GB per (uncompressed) file.
For portability, avoid file paths of more than 100 bytes and all links (especially hard links and symbolic links to directories).
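The portability advice above can be checked before archiving. The helper below is a hypothetical sketch (check_tar_paths is not part of R): it flags paths longer than 100 bytes and paths that are symbolic links, using only base R functions.

```r
# Hypothetical helper: flag paths that may be unportable in a tar archive.
check_tar_paths <- function(paths) {
  bytes <- nchar(paths, type = "bytes")
  # Sys.readlink() gives "" for non-links; treat NA (an error) as "not a link".
  link <- Sys.readlink(paths)
  data.frame(
    path = paths,
    bytes = bytes,
    over_100 = bytes > 100,
    is_symlink = !is.na(link) & nzchar(link),
    stringsAsFactors = FALSE
  )
}

# Illustrative paths; neither needs to exist for the length check.
short <- "R/utils.R"
long <- paste(c(rep("a_rather_long_directory_name", 4), "file.txt"),
              collapse = "/")
res <- check_tar_paths(c(short, long))
print(res[, c("bytes", "over_100", "is_symlink")])
```

Lengths are measured in bytes rather than characters because the limits in the tar formats are byte limits, which matters for non-ASCII file names.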
The internal implementation writes only the blocks of 512 bytes required (including trailing blocks of NULs), unlike GNU tar which by default pads with ‘nul’ to a multiple of 20 blocks (10KB). Implementations which pad differ on whether the block padding should occur before or after compression (or both): padding was designed for improved performance on physical tape drives.
The ‘ustar’ format records file modification times to a resolution of 1 second: on file systems with higher resolution it is conventional to discard fractional seconds.
Compression
When an external tar command is used, compressing the tar archive requires that tar supports the -z, -j, -J or --zstd flag, and may require the appropriate command (gzip, bzip2, xz or zstd) to be available. For GNU tar, further compression programs can be specified via argument extra_flags, e.g. extra_flags = "-I lz4", "--lzip" or "--lzop". Some versions of bsdtar accept options such as --lz4, --lzop and --lrzip or an external compressor via --use-compress-program lz4: these could be supplied in extra_flags.
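For comparison, the internal method can compress without relying on any external program. The sketch below, using illustrative file names, writes the same file uncompressed and with gzip at compression_level = 9, then compares the sizes.

```r
# A scratch file with repetitive content, which compresses well.
src <- tempfile("tar_gz_demo_")
dir.create(src)
writeLines(rep("some repetitive text", 1000), file.path(src, "data.txt"))

plain <- tempfile(fileext = ".tar")
gz <- tempfile(fileext = ".tar.gz")

old <- setwd(src)
tar(plain, files = "data.txt", tar = "internal")
# compression_level is only used by the internal method.
tar(gz, files = "data.txt", compression = "gzip",
    compression_level = 9, tar = "internal")
setwd(old)

sizes <- c(plain = file.size(plain), gzip = file.size(gz))
print(sizes)

# A gzip stream starts with the magic bytes 1f 8b.
magic <- readBin(gz, what = "raw", n = 2)
print(magic)
```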
NetBSD prior to 8.0 used flag --xz rather than -J, so this should be used via extra_flags = "--xz" rather than compression = "xz". The commands from OpenBSD and the Heirloom Toolchest are not documented to support xz or zstd.
The tar program in recent macOS (e.g. 15.2) does support zstd compression via an external command, but Apple does not supply one.
The tar programs in commercial Unixen such as AIX and Solaris do not support compression.
GNU tar added support in version 1.22 for xz compression and in version 1.31 for zstd compression.
bsdtar added support for xz in 2019 and for zstd in 2020.
Neither the internal method nor the known external tar commands support parallel compression, but this function can be used to write an uncompressed tarball which can then be compressed in parallel, for example with zstd -T0.
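A minimal sketch of that two-step approach follows. It assumes a zstd command-line program may be on the PATH and skips the compression step if it is not; the file names are illustrative.

```r
# Step 1: write an uncompressed tarball with the internal method.
src <- tempfile("tar_zstd_")
dir.create(src)
writeLines(rep("payload", 1000), file.path(src, "data.txt"))

tarball <- tempfile(fileext = ".tar")
old <- setwd(src)
tar(tarball, files = "data.txt", compression = "none", tar = "internal")
setwd(old)

# Step 2: compress in parallel, if zstd is available.
zst <- paste0(tarball, ".zst")  # zstd's default output name
if (nzchar(Sys.which("zstd"))) {
  # -T0: use all cores; -q: quiet; -f: overwrite an existing output file.
  status <- system2("zstd", c("-T0", "-q", "-f", shQuote(tarball)))
  compressed <- status == 0 && file.exists(zst)
} else {
  compressed <- FALSE
  message("zstd not found on the PATH; leaving ", tarball, " uncompressed")
}
print(compressed)
```

By default zstd keeps the input file, so the uncompressed tarball remains alongside the .zst file and can be removed once the compressed copy has been verified.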
Note
For users of macOS. Apple's file systems have a legacy concept of ‘resource forks’ dating from classic Mac OS and rarely used nowadays. Apple's version of tar stores these as separate files in the tarball with names prefixed by ‘._’, and unpacks such files into resource forks (if possible): other ways of unpacking (including untar in R) unpack them as separate files.
When argument tar is set to the command tar on macOS, environment variable COPYFILE_DISABLE=1 is set, which for the system version of tar prevents these separate files being included in the tarball.
See Also
https://en.wikipedia.org/wiki/Tar_(file_format),
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06 for the way the POSIX utility pax handles tar formats.
https://github.com/libarchive/libarchive/wiki/FormatTar.