Iconv command on Linux (converts files)

The iconv command on Linux converts text files from one character encoding to another.

The most common options are:

  • -l: Lists all supported encodings;
  • -f: Sets the encoding of the input data;
  • -t: Sets the encoding of the output data;
  • -o: Sets the output file.

Examples:

In this example, iconv converts arquivo1.txt to saída.txt, re-encoding its characters from ISO-8859-1, a legacy Western European encoding that is close to the Windows-1252 code page used on Windows, to UTF-8:

$ iconv -f ISO-8859-1 -t UTF-8 -o saída.txt arquivo1.txt
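To check what such a conversion does to the bytes, the sketch below builds a tiny ISO-8859-1 file, converts it, and dumps both files in hexadecimal. The temporary directory and the sample text are assumptions made for illustration, and the output is written through stdout redirection, which iconv uses when no output file is given.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical sample input: "caf" + byte 0xE9 (é in ISO-8859-1) + newline.
printf 'caf\351\n' > "$workdir/arquivo1.txt"

# Convert from ISO-8859-1 to UTF-8, redirecting stdout to the output file.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/arquivo1.txt" > "$workdir/saida.txt"

# Hex dumps of both files, with whitespace stripped for easy comparison.
before=$(od -An -tx1 "$workdir/arquivo1.txt" | tr -d ' \n')
after=$(od -An -tx1 "$workdir/saida.txt" | tr -d ' \n')

echo "ISO-8859-1 bytes: $before"
echo "UTF-8 bytes:      $after"
```

The input dump is 636166e90a and the output dump is 636166c3a90a: the single Latin-1 byte e9 becomes the two-byte UTF-8 sequence c3 a9, while the ASCII letters are unchanged.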

The -l option shows all the encoding formats supported by iconv:

$ iconv -l
437, 500, 500V1, 850, 851, 852, 855, 856, 857, 858, 860, 861, 862, 863, 864,
865, 866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3
(...)
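Because the list is long, it is often easier to search it than to read it. The following is a minimal sketch that checks whether a given name appears in the listing; the variable name and the encoding searched for are illustrative choices, and the exact listing format differs between iconv implementations.

```shell
# Hypothetical encoding name to look for; change it as needed.
wanted="UTF-8"

# Count the lines of `iconv -l` that mention the name, ignoring case.
# grep -c exits with status 1 when the count is 0, hence the `|| true`.
matches=$(iconv -l | grep -ci "$wanted" || true)

if [ "$matches" -gt 0 ]; then
    echo "$wanted appears in the list of supported encodings"
else
    echo "$wanted was not found"
fi
```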

Character Encoding Standards

Like any operating system, Linux needs to work with several character maps in order to cover several languages and character formats.

To make this possible, "character maps" were agreed upon. Each one maps the characters of a given alphabet to the sequences of bits that represent them.

These character maps are conventions used worldwide by many computer systems, which is why each one has a name or number: knowing which map a file uses makes it possible to interpret its bits as the correct characters. It also makes it possible to convert text from one map to another, sometimes with loss of data when the target map lacks a character.
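The possible loss of data can be observed with iconv itself. This is a rough sketch that uses a made-up sample word: a character that exists in both maps survives a round trip, while a character missing from the target map cannot be carried over. How iconv reacts in the second case (an error, or a substitute character) depends on the implementation, so the sketch only checks that the result differs from the input.

```shell
# "café" encoded as UTF-8 (é is the byte pair c3 a9).
original=$(printf 'caf\303\251')

# ISO-8859-1 contains é, so a round trip through it preserves the text.
round_trip=$(printf '%s' "$original" \
    | iconv -f UTF-8 -t ISO-8859-1 \
    | iconv -f ISO-8859-1 -t UTF-8)

# ASCII has no é. Depending on the iconv implementation, this either fails
# or substitutes the character, so the result no longer matches the input.
to_ascii=$(printf '%s' "$original" | iconv -f UTF-8 -t ASCII 2>/dev/null || true)

[ "$round_trip" = "$original" ] && echo "ISO-8859-1 round trip: lossless"
[ "$to_ascii" != "$original" ] && echo "ASCII conversion: lossy or rejected"
```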

The following is a brief description of the most commonly used character maps.

ASCII

ASCII stands for American Standard Code for Information Interchange. This standard is a seven-bit character encoding based on the English alphabet.

ASCII codes represent text on computers, communication equipment, and other devices that work with text. ASCII grew out of telegraph codes: work on the standard began in 1960, with 7-bit teleprinters in mind, and its first edition was published in 1963. Most modern character encodings inherited it as a basis.

The encoding defines 128 characters, completely filling the seven available bits. Of those, 33 are non-printable control characters, many of them now obsolete, that were designed to affect how text is processed. The remaining 95 are printable characters, counting the space.
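The numeric codes can be inspected from the shell. This short sketch, with variable names chosen only for illustration, prints the decimal code of one character, dumps the bytes of a few printable characters, and checks the arithmetic behind the 128 code values.

```shell
# Decimal code of a character: a leading quote makes printf print its value.
code_A=$(printf '%d' "'A")

# Hex dump of "A", a space, and "~" (the last printable ASCII character).
hex=$(printf 'A ~' | od -An -tx1 | tr -d ' \n')

# 33 control codes plus 95 printable characters fill all 2^7 slots.
total=$((33 + 95))

echo "A = $code_A, bytes of 'A ~' = $hex, total codes = $total"
```

Here A has the code 65 (hexadecimal 41), the space is 20, and the tilde is 7e, all of which fit in seven bits.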

ASCII table
![](https://learnlinux.com.br/editor/files/ascii_pt.jpg)

ISO-8859

ASCII's 95 printable characters are sufficient for exchanging information written in English. However, other languages that use the Latin alphabet, as well as languages written in other scripts, need symbols that ASCII does not cover, such as accented letters.

The ISO-8859 standard addressed this problem by using an 8-bit encoding, which adds 128 code values to the 128 that already exist in ASCII.

Even with 128 more values, a single table could not hold all the special characters that German, Spanish, Portuguese, Swedish, Hungarian, and other languages require. For this reason, ISO-8859 was divided into several character maps, as follows:

  • ISO-8859-1 — Latin-1: Western European Latin characters. It is the most used because it covers English, German, French, Italian, Portuguese, Spanish and other languages from the western region of Europe;
  • ISO-8859-2 — Latin-2: Characters from Central and Eastern Europe, such as Polish, Slovenian, Serbian, Hungarian, etc.;
  • ISO-8859-3 — Latin-3: Characters from Southern Europe, such as Maltese, and also Esperanto. It was originally intended for Turkish as well, which is now covered by Latin-5;
  • ISO-8859-4 — Latin-4: Characters from Northern Europe, such as Estonian, Lithuanian, among others;
  • ISO-8859-5 — Latin/Cyrillic: Characters used in Russia and Ukraine;
  • ISO-8859-6 — Latin/Arabic: Arabic characters;
  • ISO-8859-7 — Latin/Greek: Greek characters;
  • ISO-8859-8 — Latin/Hebrew: Hebrew characters;
  • ISO-8859-9 — Latin-5: Turkish characters;
  • ISO-8859-10 — Latin-6: Used in Nordic languages, such as Icelandic and Sami;
  • ISO-8859-11 — Latin/Thai: Thai characters;
  • ISO-8859-12 — Latin/Devanagari: Planned for Devanagari, but the work was abandoned and this part was never published;
  • ISO-8859-13 — Latin-7: Used in Baltic languages. Added some characters that were missing in Latin-4 and Latin-6;
  • ISO-8859-14 — Latin-8: Celtic characters;
  • ISO-8859-15 — Latin-9: Revision of Latin 1, removing some unused symbols and adding others;
  • ISO-8859-16 — Latin-10: Used in Southeastern Europe for Albanian, Croatian, Hungarian, Italian, Polish, Romanian, and Slovenian, but also Finnish, French, German, and Irish Gaelic (new spelling). The focus is more on letters than symbols. The currency sign is replaced with the euro symbol.
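Because all these maps reuse the same 128 upper code values, the meaning of a byte depends on which part of ISO-8859 is assumed. As an illustrative sketch, the byte 0xA4 is decoded below first as ISO-8859-1 and then as ISO-8859-15; the variable names are arbitrary.

```shell
# The same byte, 0xA4 (octal 244), decoded under two parts of ISO-8859.
as_latin1=$(printf '\244' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
as_latin9=$(printf '\244' | iconv -f ISO-8859-15 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "0xA4 as ISO-8859-1:  $as_latin1"   # currency sign, U+00A4
echo "0xA4 as ISO-8859-15: $as_latin9"   # euro sign, U+20AC
```

The first conversion yields c2a4 (the generic currency sign) and the second yields e282ac (the euro sign), which is why passing the correct -f value to iconv matters.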

UNICODE

Unicode is a standard that allows computers to consistently represent and manipulate text from any existing writing system.

The standard consists of a repertoire of well over one hundred thousand characters, a set of code charts for visual reference, an encoding methodology with a set of standard character encodings, an enumeration of character properties such as upper case and lower case, a set of computer files with reference data, and rules for normalization, decomposition, alphabetical ordering, and rendering.

Unicode characters are stored and transmitted using standardized encoding schemes called Unicode Transformation Formats, or UTF, such as UTF-8, UTF-16, and UTF-32.

Its success in unifying character sets led to its widespread and predominant use in the internationalization and localization of computer programs. The standard has been implemented in several recent technologies, including XML, Java, and modern operating systems.

Unicode has the explicit purpose of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard, which are widely used in several countries but remain mostly incompatible with one another.

UTF-8 (8-bit Unicode Transformation Format) is a type of variable-length Unicode encoding created by Ken Thompson and Rob Pike.

It can represent any Unicode character and is backward compatible with ASCII. For this reason, it has become the dominant encoding for email, web pages, and other places where text is stored or transmitted.
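Both properties, variable length and ASCII compatibility, can be observed from the shell. The sketch below uses sample characters picked for illustration and writes them as octal escapes, so the result does not depend on the terminal's locale.

```shell
# Byte length of one character from each UTF-8 length class.
len_ascii=$(printf 'A' | wc -c | tr -d ' ')                 # U+0041
len_latin=$(printf '\303\251' | wc -c | tr -d ' ')          # U+00E9, é
len_euro=$(printf '\342\202\254' | wc -c | tr -d ' ')       # U+20AC, €
len_emoji=$(printf '\360\237\230\200' | wc -c | tr -d ' ')  # U+1F600

# ASCII compatibility: pure ASCII text is byte-for-byte unchanged.
ascii_in=$(printf 'plain text' | od -An -tx1 | tr -d ' \n')
ascii_out=$(printf 'plain text' | iconv -f ASCII -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "bytes per character: $len_ascii $len_latin $len_euro $len_emoji"
[ "$ascii_in" = "$ascii_out" ] && echo "ASCII text is already valid UTF-8"
```

The four characters take 1, 2, 3, and 4 bytes respectively, and the ASCII sample comes out of the conversion unchanged.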

The “Internet Engineering Task Force” (IETF) requires that all protocols used on the Internet support at least UTF-8.
