Linux data compressors (gzip, bzip2 and xz)

Data compression makes backups more efficient and saves space on backup media.

There are basically three Linux data compressors, each with a different compression algorithm. The first to appear was gzip, then bzip2 and finally xz.

Gzip and gunzip data compressor

The first widely used data compressor is gzip. It uses a compression algorithm from the Lempel-Ziv family (LZ77, combined with Huffman coding). This technique finds duplicate strings in the input data. The second occurrence of a string is replaced by a pointer to the earlier occurrence, in the form of a distance and length pair. When compressing a file, gzip adds the suffix .gz.

To compress a file:

$ gzip file

To decompress a file:

$ gzip -d file.gz

Or

$ gunzip file.gz
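
The round trip can be tried safely in a scratch directory. This is a minimal sketch, assuming gzip is installed; the file names sample.txt and original.txt and the temporary directory are hypothetical. It shows that gzip replaces the input file with the .gz file and that gunzip restores it byte for byte.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (all names here are hypothetical).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Repetitive text gives the compressor plenty of repeats to point back to.
awk 'BEGIN { for (i = 0; i < 1000; i++) print "the same line again" }' > sample.txt
cp sample.txt original.txt
size_before=$(wc -c < sample.txt)

gzip sample.txt                     # sample.txt is replaced by sample.txt.gz
size_after=$(wc -c < sample.txt.gz)
echo "before: $size_before bytes, after: $size_after bytes"

gunzip sample.txt.gz                # sample.txt.gz is replaced by sample.txt
cmp sample.txt original.txt && echo "round trip OK"
```

To keep the original file while compressing, one option is to write to standard output instead: gzip -c sample.txt > sample.txt.gz.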

bzip2 and bunzip2 data compressor

The bzip2 compressor compresses files using the Burrows-Wheeler block-sorting algorithm together with Huffman coding. This technique operates on large blocks of data. The larger the block size, the higher the compression ratio achieved. It generally compresses better than conventional Lempel-Ziv compressors such as gzip, though it is usually slower. When compressing a file, bzip2 adds the suffix .bz2.

To compress a file:

$ bzip2 file

To decompress a file:

$ bzip2 -d file.bz2

Or

$ bunzip2 file.bz2
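
The effect of block size can be observed with the -1 to -9 options of bzip2, which select blocks from 100 kB up to 900 kB. The following is only a sketch under stated assumptions: bzip2 is installed, /dev/urandom is available, and the file names are hypothetical. The test data is built so that its second half repeats the first, a redundancy that only a block large enough to hold both halves can exploit.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build about 400 kB in which the second half repeats the first half.
head -c 200000 /dev/urandom > chunk.bin
cat chunk.bin chunk.bin > data.bin

# -1 selects 100 kB blocks, -9 selects 900 kB blocks.
# -c writes to standard output, so data.bin is left in place.
size_small_blocks=$(bzip2 -1 -c data.bin | wc -c)
size_large_blocks=$(bzip2 -9 -c data.bin | wc -c)

echo "original:          $(wc -c < data.bin) bytes"
echo "bzip2 -1 (100 kB): $size_small_blocks bytes"
echo "bzip2 -9 (900 kB): $size_large_blocks bytes"
```

With small blocks the repeat falls outside each block and nothing is gained; with the large block the whole file fits in one block and the output should be noticeably smaller.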

There are some cases where the compressed file may become larger than the original file. This can occur when the algorithm finds no redundancy to exploit in the data, so the compressor's header and bookkeeping are added on top of data that did not shrink.
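
Random bytes are a convenient way to see this happen. A minimal sketch, assuming gzip and bzip2 are installed and /dev/urandom is available; random.bin is a hypothetical file name.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Random bytes contain no repeats worth encoding.
head -c 100000 /dev/urandom > random.bin
original_size=$(wc -c < random.bin)

# -c writes the compressed stream to standard output so only its size is measured.
gzip_size=$(gzip -c random.bin | wc -c)
bzip2_size=$(bzip2 -c random.bin | wc -c)

echo "original: $original_size bytes"
echo "gzip:     $gzip_size bytes"
echo "bzip2:    $bzip2_size bytes"
```

Both compressed sizes should come out slightly above the original size. Already compressed files such as JPEG images or other archives tend to behave the same way.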

xz and unxz data compressor

The third compressor is xz, which uses the LZMA2 algorithm, a member of the same Lempel-Ziv family as the algorithm used by gzip. It produces files with the extension .xz or, in the older legacy format, .lzma.

To compress a file:

$ xz file

To decompress a file:

$ xz --decompress file.xz

Or

$ unxz file.xz
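
The two extensions can be produced from the same input. This is a sketch assuming the xz utilities are installed; notes.txt and original.txt are hypothetical names. It uses the -k (--keep) option so the input file is not removed, and --format=lzma to request the legacy .lzma format.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

awk 'BEGIN { for (i = 0; i < 1000; i++) print "the same line again" }' > notes.txt
cp notes.txt original.txt

# -k (--keep) leaves notes.txt in place next to the compressed copy.
xz -k notes.txt                   # writes notes.txt.xz
xz -k --format=lzma notes.txt     # writes notes.txt.lzma (legacy format)
ls -1

# Decompress both to standard output and compare with the original.
xz --decompress --stdout notes.txt.xz   | cmp - original.txt && echo ".xz OK"
xz --decompress --stdout notes.txt.lzma | cmp - original.txt && echo ".lzma OK"
```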

To give you an idea of the difference between the three compressors gzip, bzip2 and xz, compare the sizes of a tar archive of a website backup:

site.tar      9.8M  # uncompressed file
site.tar.gz   2.6M  # file compressed with gzip
site.tar.bz2  2.4M  # file compressed with bzip2
site.tar.xz   2.1M  # file compressed with xz

Joining Files with Tarball

Tarball files are bundles of files and directories that maintain the original directory and file structure in a tar archive, with the possibility of compressing data.

The command to package files and directories is tar. The name of this command comes from “Tape-Archive”. It reads files and directories and saves them to tape or file.

Along with the data, it saves important information such as the last modification, access permissions, and others. This makes it able to restore the original state of the data.

Despite the name, the tar options are not really optional. A tar command line is made up of:

  • options: Tell tar what to do;
  • [source]: When tar is used for backup, this is the file, device or directory to be copied;
  • [destination]: When tar is used for backup, this specifies the destination for the data, which can be a tarball file or a device. When tar is used to restore files, it specifies the tarball file or device from which the data will be extracted.

First, you must choose what tar should do using the options:

  • -c: Creates a new .tar file;
  • -u: Adds files to the .tar file only if they are new or have been modified;
  • -r: Appends the specified files to the end of the .tar file;
  • -g: Creates an incremental backup, using a snapshot file given as its argument to track what has changed;
  • -t: Lists the contents of a .tar file;
  • -x: Extracts the files from the .tar archive;
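
The main operations can be tried on a small scratch tree. This is a minimal sketch, assuming a tar that accepts these common options (GNU tar does); the directory and file names are hypothetical. Each command also uses f to name the archive file, and the extraction uses -C to choose the target directory.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir project
echo "first"  > project/a.txt
echo "second" > project/b.txt

tar -cf project.tar project/a.txt    # -c: create an archive holding one file
tar -rf project.tar project/b.txt    # -r: append another file at the end
listing=$(tar -tf project.tar)       # -t: list the members
echo "$listing"

mkdir restore
tar -xf project.tar -C restore       # -x: extract, here into ./restore
cmp project/a.txt restore/project/a.txt
cmp project/b.txt restore/project/b.txt && echo "restore OK"
```

Note that -r and -u work on uncompressed .tar files; a compressed tarball has to be decompressed before files can be appended to it.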

It also has auxiliary options:

  • -j: Uses bzip2 to compress and decompress .tar.bz2 files;
  • -J: Uses xz to compress and decompress .tar.xz files;
  • -z: Uses gzip to compress and decompress .tar.gz files;
  • -v: Lists all processed files;
  • -f: Gives the name of the archive, indicating that it is a file on disk rather than a magnetic tape drive;

The tar options can be combined into a single parameter such as “cvzf”.
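
To make the bundling concrete, the sketch below builds the same archive twice, once with the bundled form and once with every option written separately, and checks that both list the same contents. It assumes tar and gzip are installed; docs and the archive names are hypothetical.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir docs
echo "hello" > docs/readme.txt

# Bundled form: the archive name follows the bundle because f comes last.
tar cvzf bundled.tar.gz docs

# The same request with each option written out separately.
tar -c -v -z -f separate.tar.gz docs

bundled_list=$(tar tzf bundled.tar.gz)
separate_list=$(tar tzf separate.tar.gz)
[ "$bundled_list" = "$separate_list" ] && echo "same contents"
```

Keeping f as the last letter of the bundle is a good habit, since the word that follows is taken as the archive name.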

Because tar was originally designed to read from and write to tape, the “f” option should always be used when creating a tar archive on disk or reading one from disk.

Examples:

To save the /var/lib/mysql directory to the /var/backup/mysql.tar.gz file:

$ tar cvzf /var/backup/mysql.tar.gz /var/lib/mysql

To extract the same package:

$ tar xvzf /var/backup/mysql.tar.gz -C /
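
The -C option tells tar to change to the given directory before extracting. GNU tar normally strips the leading / from member names when it creates an archive, which is why extracting under / puts the files back in their original location. The sketch below rehearses the same backup and restore inside a temporary directory instead of the real /var; all paths and the file db.frm are hypothetical stand-ins.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-ins for the data directory, the backup location and the restore target.
mkdir -p "$workdir/var/lib/mysql" "$workdir/backup" "$workdir/restore"
echo "table data" > "$workdir/var/lib/mysql/db.frm"

# -C changes directory first, so members are stored as var/lib/mysql/...
tar czf "$workdir/backup/mysql.tar.gz" -C "$workdir" var/lib/mysql

# On extraction, -C picks the directory under which the stored paths are recreated.
tar xzf "$workdir/backup/mysql.tar.gz" -C "$workdir/restore"

restored="$workdir/restore/var/lib/mysql/db.frm"
cmp "$workdir/var/lib/mysql/db.frm" "$restored" && echo "restored to $restored"
```

Restoring into a separate directory first, as done here, is a cautious way to inspect a backup before overwriting live data.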

You can open the contents of a tarball file in two steps:

$ gzip -d file.tar.gz

The gzip command decompresses file.tar.gz and removes the suffix .gz.

$ tar xvf file.tar

The tar utility extracts the contents of the package.

We can also use simpler forms:

$ tar xvzf file.tar.gz

Or

$ gzip -dc file.tar.gz | tar xvf -
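
The one-step and piped forms should give the same result. A minimal sketch to check this, assuming tar and gzip are installed; the site directory and the output directories are hypothetical.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir site one_step piped
echo "<h1>home</h1>" > site/index.html
tar czf site.tar.gz site

# One step: tar calls gzip itself.
tar xzf site.tar.gz -C one_step

# Piped form: "f -" tells tar to read the archive from standard input,
# and site.tar.gz stays untouched because gzip -c writes to standard output.
gzip -dc site.tar.gz | tar xf - -C piped

cmp one_step/site/index.html piped/site/index.html && echo "identical result"
```

Unlike gzip -d followed by tar, the piped form never writes the intermediate .tar file to disk.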

If the file is compressed with bzip2, it must be decompressed with bunzip2 or with bzip2’s -d option.

$ bzip2 -d file.tar.bz2

Or

$ bunzip2 file.tar.bz2

Followed by:

$ tar xvf file.tar

In the case of files compressed with xz, the xz command can be used:

$ xz -d linux.tar.xz

Followed by:

$ tar xvf linux.tar

Or

$ tar xvJf linux.tar.xz

In the graphical environment, you can decompress and extract a tarball file without much effort, just by clicking on the file. The desktop's archive manager then invokes the appropriate compressor in the background, along with tar, to extract the data package in the current directory.

See the comparison between the compression performed by the gzip, bzip2 and xz compressors and an uncompressed tar archive:

$ ls -1shS linux*
895M linux.tar
165M linux.tar.gz
126M linux.tar.bz2
105M linux-5.4.3.tar.xz
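
A similar comparison can be reproduced on any directory. The sketch below assumes tar, gzip, bzip2 and xz are installed; the sample tree of generated text files is hypothetical, and the sizes obtained depend entirely on the data, so they will not match the figures above.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample tree: 200 small text files with similar content.
mkdir sample
i=0
while [ "$i" -lt 200 ]; do
    awk -v n="$i" 'BEGIN { for (j = 0; j < 200; j++) print "record", n, j, "status=ok" }' \
        > "sample/log_$i.txt"
    i=$((i + 1))
done

tar cf  sample.tar     sample    # no compression
tar czf sample.tar.gz  sample    # z: gzip
tar cjf sample.tar.bz2 sample    # j: bzip2
tar cJf sample.tar.xz  sample    # J: xz

# -s shows sizes, -h makes them human-readable, -S sorts largest first.
ls -1shS sample.tar*

tar_size=$(wc -c < sample.tar)
gz_size=$(wc -c < sample.tar.gz)
bz2_size=$(wc -c < sample.tar.bz2)
xz_size=$(wc -c < sample.tar.xz)
```

The ranking among the three compressors can vary with the input, which is why the sketch only records the sizes rather than assuming an order.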

For the exam, it is recommended to memorize the following table:

Extension   Compressor Used   Tar Option
.tar.gz     Gzip              $ tar xvzf file.tar.gz
.tar.bz2    Bzip2             $ tar xvjf file.tar.bz2
.tar.xz     Xz                $ tar xvJf file.tar.xz