Regular Expressions on Linux [Basic Guide]

Let’s be serious. Get the kids out of the classroom. If you don’t know much about regular expressions on Linux, you don’t really know Linux yet. It’s like owning a yacht and only sailing it in the pool at home.

A regular expression is a formal method of specifying a text pattern to search for in one or more files.

It is a combination of characters with special functions (metacharacters) that, grouped together with literal characters (such as the letters A to Z) and digits, forms a sequence: an expression that shell tools and text editors can interpret and search for.

Regular expressions are useful for searching or validating variable texts such as:

IP addresses;
E-mail addresses;
Internet addresses (URLs);
Data in a column of a text file;
Data written in a markup language, such as HTML;
Identification numbers, such as an ID number or social security number;
Dates and times.

Several text editors and programming languages support regular expressions. For the CompTIA Linux+ exam, the important tools that work with this feature are grep and sed.

Grep command

Use:

$ grep [options] regular-expression files

The grep command is widely used in everyday administrative tasks in Linux.

By default, it filters the lines of a given file by searching for a regular expression.

grep can read one or more files passed as arguments, or it can receive on standard input the redirected output of another process.

If grep receives more than one file or a wildcard as an argument to perform its search, it will indicate the file name followed by a colon and the line found.
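The three ways of feeding grep described above can be sketched as follows. This is a minimal illustration that assumes a throwaway directory created with mktemp; the file names one.txt and two.txt and the variable names are made up for the example.

```shell
# Work in a temporary directory so the example touches no real files.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/one.txt"
printf 'gamma\nalphabet\n' > "$tmpdir/two.txt"

# One file: grep prints only the matching line.
single=$(grep 'alpha' "$tmpdir/one.txt")

# Two files: each matching line is prefixed with "filename:".
# The cd keeps the prefix down to the short file name.
multi=$(cd "$tmpdir" && grep 'alpha' one.txt two.txt)

# Standard input: grep filters the output of another process.
piped=$(printf 'alpha\nbeta\n' | grep 'beta')

printf '%s\n' "$single" "$multi" "$piped"
rm -rf "$tmpdir"
```

With a single file the output is the bare line; with two files each line carries its file name and a colon in front.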

Its most frequently used options are:

-c: Shows only the number of matching lines in each file, not the lines themselves;
-h: Shows only the lines found, without indicating the file names;
-i: Searches ignoring the difference between uppercase and lowercase letters;
-v: Shows all lines of the searched file except those that match. It inverts the search;
-n: Shows, in addition to the text of the lines found, the line number of each one within its file;
-B n: Shows n lines before the line found;
-A n: Shows n lines after the line found.
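As a rough sketch of how these options behave, the snippet below runs them against a small sample file. The file contents and variable names are invented for the illustration.

```shell
# Five sample lines: two contain "error" in different letter cases.
tmpfile=$(mktemp)
printf 'Error: disk full\nok\nerror: retry\nok\ndone\n' > "$tmpfile"

count=$(grep -c 'error' "$tmpfile")       # matching lines, case-sensitive
count_i=$(grep -ci 'error' "$tmpfile")    # matching lines, ignoring case
inverted=$(grep -vi 'error' "$tmpfile")   # lines that do NOT match
numbered=$(grep -n 'done' "$tmpfile")     # match prefixed with its line number
after=$(grep -A 1 'retry' "$tmpfile")     # match plus one line after it
before=$(grep -B 1 'retry' "$tmpfile")    # match plus one line before it

printf '%s\n' "$count" "$count_i" "$numbered"
rm -f "$tmpfile"
```

Here count is 1 and count_i is 2, because only -i also accepts the capitalized "Error".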

Examples:

$ grep uira /etc/passwd

uira:x:500:100:uira:/home/uira:/bin/bash

Search for the word uira in the /etc/passwd file.

$ grep '^u' /etc/passwd

uucp:x:10:14:Unix-to-UNIX CoPy system:/etc/uucp:/bin/bash

uira:x:500:100:uira:/home/uira:/bin/bash

Search for all lines starting with the letter u in the /etc/passwd file. The caret (^) represents the beginning of a line.

$ grep 'false$' /etc/passwd

mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false

wwwrun:x:30:8:WWW daemon apache:/var/lib/wwwrun:/bin/false

Search for all lines that end with the word false. The $ symbol represents the end of a line.

$ grep '^[aeiou]' /etc/passwd

uucp:x:10:14:Unix-to-UNIX CoPy system:/etc/uucp:/bin/bash

at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash

uira:x:500:100:uira:/home/uira:/bin/bash

alias:x:501:1000::/var/qmail:/bin/false

Search for all lines that begin with a vowel. The regular expression construct called a list matches any one of the characters inside the brackets. Note that a list matches exactly one character, no matter how many characters it contains.

In the shell, always enclose regular expressions in 'single quotes', so that the shell passes the metacharacters to the command unchanged.

$ grep '^.[aeiou]' /etc/passwd

root:x:0:0:root:/root:/bin/bash

bin:x:1:1:bin:/bin:/bin/bash

news:x:9:13:News system:/etc/news:/bin/bash

uira:x:500:100:uira:/home/uira:/bin/bash

Search for all lines where the first character is any character and the second character is a vowel. The period (.) in a regular expression stands for “any single character”.

$ grep '[0-9][0-9][0-9][0-9]' /etc/passwd

squid:x:31:65534:WWW-proxy squid:/var/cache/squid:/bin/false

nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash

alias:x:501:1000::/var/qmail:/bin/false

qmails:x:502:1000::/var/qmail:/bin/false

Search for lines containing a sequence of four consecutive digits.
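The patterns from these examples can be tried safely on a copy of the data rather than on the real /etc/passwd. The sketch below assumes a temporary file holding five lines in passwd format; names() is a hypothetical helper written only for this illustration, which prints the login names of the matching lines on one row.

```shell
# Sample data in /etc/passwd format (illustrative only).
tmpfile=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'uira:x:500:100:uira:/home/uira:/bin/bash' \
    'mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false' \
    'squid:x:31:65534:WWW-proxy squid:/var/cache/squid:/bin/false' \
    'nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash' > "$tmpfile"

# Hypothetical helper: first field of every matching line, joined by spaces.
names() {
    grep "$1" "$tmpfile" | cut -d: -f1 | paste -sd ' ' -
}

starts_u=$(names '^u')                       # line begins with u
ends_false=$(names 'false$')                 # line ends with false
vowel_first=$(names '^[aeiou]')              # first character is a vowel
vowel_second=$(names '^.[aeiou]')            # any character, then a vowel
four_digits=$(names '[0-9][0-9][0-9][0-9]')  # four digits in a row

printf '%s\n' "$starts_u" "$ends_false" "$vowel_first" "$vowel_second" "$four_digits"
rm -f "$tmpfile"
```

Note that squid is absent from vowel_second because its second character is q, while it does appear in four_digits thanks to the group ID 65534.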

$ cat * | grep security

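Search the contents of all files in the current directory for the word security. Because grep reads the concatenated text from standard input here, it does not show which file each line came from.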
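For comparison, here is a small sketch of the same search done through a pipe and done by passing the files to grep directly. The directory, the file names a.log and b.log and their contents are assumptions made for the example.

```shell
# Two sample files; only a.log mentions the word.
tmpdir=$(mktemp -d)
printf 'security patch applied\n' > "$tmpdir/a.log"
printf 'nothing here\n' > "$tmpdir/b.log"

# Through cat: grep sees one anonymous stream, so no file name is shown.
piped=$(cd "$tmpdir" && cat * | grep 'security')

# Files as arguments: each match is prefixed with its file name.
named=$(cd "$tmpdir" && grep 'security' *)

# -l lists only the names of the files that contain a match.
listed=$(cd "$tmpdir" && grep -l 'security' *)

printf '%s\n' "$piped" "$named" "$listed"
rm -rf "$tmpdir"
```

Passing the files as arguments is usually the more informative form when you need to know where a match was found.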

Egrep command

Use:

$ egrep [options] regular-expression files

The egrep command is very similar to grep, but it supports extended regular expressions, which add the metacharacters +, ?, | and (). It is equivalent to grep -E.
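A minimal sketch of these extra metacharacters follows. It uses grep -E, which the grep manual describes as equivalent to egrep; the word list and variable names are invented for the example.

```shell
tmpfile=$(mktemp)
printf 'color\ncolour\ncolr\ngray\ngrey\n' > "$tmpfile"

# ? makes the preceding character optional: color and colour.
optional=$(grep -E 'colou?r' "$tmpfile")

# + requires one or more of the preceding character: only color.
one_or_more=$(grep -E 'colo+r' "$tmpfile")

# ( ) groups and | chooses between alternatives: gray and grey.
alternation=$(grep -E 'gr(a|e)y' "$tmpfile")

printf '%s\n' "$optional" "$one_or_more" "$alternation"
rm -f "$tmpfile"
```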

fgrep command

Use:

$ fgrep [options] search-key files

The fgrep command is also similar to grep, but it does not support regular expressions; it searches files only for a fixed search key or text. For this reason it can be faster than grep, but it is less versatile. It is equivalent to grep -F.
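The difference between a regular expression and a fixed search key shows up as soon as the key contains a metacharacter. The sketch below uses grep -F, which the grep manual describes as equivalent to fgrep; the sample lines are made up for the illustration.

```shell
tmpfile=$(mktemp)
printf '10.0.0.1\n10x0y0z1\n' > "$tmpfile"

# As a regular expression, each period matches any character,
# so both lines count as matches.
as_regex=$(grep -c '10.0.0.1' "$tmpfile")

# As a fixed string, the periods are literal: only the real address matches.
as_fixed=$(grep -cF '10.0.0.1' "$tmpfile")

printf '%s %s\n' "$as_regex" "$as_fixed"
rm -f "$tmpfile"
```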

sed command

Use:

$ sed [options] {script} [files]

The sed command is a simple stream editor used to make small transformations to the content of files.

sed takes text from one or more files passed as arguments on the command line and transforms it, sending the modified text to standard output (the screen).

If we want sed to actually change the contents of a file, we need to redirect the output with the greater-than operator “>” to another file and then copy that file over the original. The -i option, followed by an extension, also makes it possible to edit the file directly while saving a backup copy with the indicated extension.

$ sed 's/\/usr\/local\/bin/\/usr\/bin/' texto.txt > textonovo.txt

Alternatively, use -i so that sed alters texto.txt itself and keeps a backup in texto.txt.old. With GNU sed, the extension is attached directly to the option, with no space:

$ sed -i.old 's/\/usr\/local\/bin/\/usr\/bin/' texto.txt

Both commands change the /usr/local/bin sequence to /usr/bin in the texto.txt file. Note that the backslashes (\) tell the regular expression interpreter that the character immediately after them must be understood in its literal form and not as a special symbol, in this case the / that delimits the parts of the s command.
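The two approaches can be compared on a scratch copy. This is a minimal sketch, assuming GNU sed (where the backup extension is attached to -i without a space) and a temporary directory; the one-line file content is invented for the example.

```shell
tmpdir=$(mktemp -d)
printf 'PATH=/usr/local/bin\n' > "$tmpdir/texto.txt"

# Redirection: the result goes to a new file, the original is untouched.
sed 's/\/usr\/local\/bin/\/usr\/bin/' "$tmpdir/texto.txt" > "$tmpdir/textonovo.txt"
redirected=$(cat "$tmpdir/textonovo.txt")
untouched=$(cat "$tmpdir/texto.txt")

# In-place editing: texto.txt is rewritten, texto.txt.old keeps the old text.
sed -i.old 's/\/usr\/local\/bin/\/usr\/bin/' "$tmpdir/texto.txt"
edited=$(cat "$tmpdir/texto.txt")
backup=$(cat "$tmpdir/texto.txt.old")

printf '%s\n' "$redirected" "$untouched" "$edited" "$backup"
rm -rf "$tmpdir"
```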

$ sed 's/uira/carla/' /etc/passwd

Replace the first occurrence of uira on each line with carla. The result goes to standard output; the /etc/passwd file itself is not changed.
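A short sketch of how the s command treats repeated matches, run on a single sample line instead of the real file. The g flag and the alternative delimiter shown here are standard sed features that the text above does not cover; the variable names are made up.

```shell
line='uira:x:500:100:uira:/home/uira:/bin/bash'

# Without a flag, only the first match on each line is replaced.
first=$(printf '%s\n' "$line" | sed 's/uira/carla/')

# The g flag replaces every match on the line.
every=$(printf '%s\n' "$line" | sed 's/uira/carla/g')

# Any character can delimit the s command; using | avoids escaping each /.
path=$(printf '/usr/local/bin\n' | sed 's|/usr/local/bin|/usr/bin|')

printf '%s\n' "$first" "$every" "$path"
```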

Summary of Regular Expressions

The table shows the main symbols that can be combined to form complex regular expressions:

TABLE – Regular Expressions

Symbol	Description
'	Single quotes delimit text whose characters are all treated as literal by the shell; nothing inside them is interpreted as a metacharacter.
"	Double quotes delimit text whose characters are treated as literal by the shell, except the “$”, “`” and “\” signs.
\	The backslash indicates that the character immediately after it must be interpreted as literal.
^	The caret indicates the beginning of a line.
$	The dollar sign indicates the end of a line.
[abc]	A list of characters inside square brackets matches any one of those characters.
[a-z]	The hyphen indicates a range: any character between the letter a and the letter z is valid.
[^abc]	Indicates that any character except the letters a, b and c is valid in the search.
\<word\>	Indicates that the search word must appear as a whole word, with a word boundary (such as a space) on its left and on its right.
[0-9]	Indicates that any digit from zero to nine is valid for the search.
.	The period matches any single character, so it never matches an empty line.
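Several rows of the table can be checked with a few one-liners. This is only a sketch: the variable names and sample input are invented, and LC_ALL=C is set on the range test as a precaution, since the meaning of [a-z] can depend on the locale.

```shell
name='world'

single='hello $name'     # single quotes: $name stays literal
double="hello $name"     # double quotes: $name is expanded by the shell
escaped="hello \$name"   # backslash keeps the $ literal inside double quotes

# [^abc]: any character except a, b or c.
not_abc=$(printf 'a\nb\nd\n' | grep '[^abc]')

# [a-z]: one lowercase letter in the range a to z.
lower=$(printf 'A\nq\n7\n' | LC_ALL=C grep '[a-z]')

# [0-9]: one digit.
digit=$(printf 'A\nq\n7\n' | grep '[0-9]')

# The period needs a character to match, so the empty line is not counted.
nonempty=$(printf 'ab\n\nc\n' | grep -c '.')

printf '%s\n' "$single" "$double" "$escaped" "$not_abc" "$lower" "$digit" "$nonempty"
```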

Examples:

To search for lines starting with “Nov 10”:

$ grep '^Nov 10' messages

Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s

Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to LOCAL (0), stratum 10

Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13, stratum 3

To search for lines ending with “terminating.”, escape the period so that it is matched literally instead of as “any character”:

$ grep 'terminating\.$' messages

Jul 12 17:01:09 Cloneme kernel: Kernel log daemon terminating.

Oct 28 06:29:54 Cloneme Kernel: Kernel log daemon terminating.

To count the blank lines in meulog.log, we use '^$':

$ grep -c '^$' meulog.log

3

To search a file for every occurrence of “ola” preceded by any character (the file is called file.txt here for illustration):

$ grep '.ola' file.txt

Mola

Tola

Hola
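The anchors and the period used in these examples can be verified with a small sketch. The sample log lines below are invented; they are not taken from a real messages file.

```shell
tmpfile=$(mktemp)
printf 'daemon terminating.\ndaemon terminatingX\n\nlast line\n\n' > "$tmpfile"

# An unescaped period matches any character, so "terminatingX" also counts.
any_char=$(grep -c 'terminating.$' "$tmpfile")

# An escaped period matches only a literal ".".
literal_dot=$(grep -c 'terminating\.$' "$tmpfile")

# ^$ matches lines with nothing between their beginning and their end.
blank=$(grep -c '^$' "$tmpfile")

printf '%s %s %s\n' "$any_char" "$literal_dot" "$blank"
rm -f "$tmpfile"
```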

Uirá Endy Ribeiro

Uirá Endy Ribeiro is a Software Developer and Cloud Computing Architect with a 23-year career. He has master's degrees in computer science and fifteen IT certifications and is the author of 11 books recognized in the IT world market. He is also Director at Universidade Salgado de Oliveira and Director of the Linux Professional Institute - LPI Director's Board.
