Linux Grep Regular Expression Example

grep is one of the most useful and powerful commands for text processing in Linux.

By myfreax

grep is one of the most useful and powerful commands for text processing in Linux. grep searches one or more input files for lines that match a regular expression and writes each matching line to standard output.

A regular expression is a pattern that matches a set of strings. A pattern consists of operators, literal characters, and metacharacters, which have special meanings. GNU grep supports three regular expression syntaxes: Basic, Extended, and Perl-compatible.

When no regular expression type is specified, grep interprets the search pattern as a basic regular expression. To interpret the pattern as an extended regular expression, use the -E (--extended-regexp) option.

In GNU grep, there is no difference in available functionality between the basic and extended regular expression syntaxes.

The only difference is that in basic regular expressions, the metacharacters ?, +, {, |, (, and ) lose their special meaning and are interpreted as literal characters.

To give these metacharacters their special meaning in a basic regular expression, escape them with a backslash: \?, \+, \{, \|, \(, and \). We'll explain what these and other metacharacters mean later.
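A quick way to see the difference is to pipe a short string into grep on standard input. This sketch assumes GNU grep, where \+ in a basic regular expression is a GNU extension that behaves like + in an extended one.

```shell
# BRE: the backslash turns \+ into "one or more of the preceding item"; prints aaa
echo 'aaa' | grep 'a\+'

# ERE (-E): + is special without a backslash; prints aaa
echo 'aaa' | grep -E 'a+'

# BRE: an unescaped + is just a literal plus sign, so this matches the text "a+"
echo 'a+' | grep 'a+'
```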

In general, you should always enclose regular expressions in single quotes so that the shell does not interpret or expand the metacharacters before grep sees them.
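To see why the quotes matter, the sketch below uses echo, which simply prints its arguments after the shell has processed them. It runs in a subshell inside a fresh temporary directory, and the file name sXright is made up so that the glob has something to match. By default, if no file matches, the shell passes the pattern through unchanged, which is why this kind of bug often goes unnoticed.

```shell
(
  cd "$(mktemp -d)" || exit
  touch sXright     # a file whose name happens to match the glob s*right

  echo s*right      # unquoted: the shell expands the glob and prints sXright
  echo 's*right'    # single-quoted: the text reaches the command unchanged
)
```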

Literal Matches

The most basic use of the grep command is searching a file for a literal character or string. Besides searching file contents, grep can also search standard input.

For example, to find all users whose default login shell is bash, you can search the /etc/passwd file for all lines containing the string bash.

The following grep command searches the file and prints the lines of the users whose login shell is bash:

grep bash /etc/passwd

The output should look like this:

root:x:0:0:root:/root:/bin/bash
myfreax:x:1000:1000:myfreax:/home/myfreax:/bin/bash

In this example, the string bash is a basic regular expression consisting of four literal characters. It tells grep to search for lines in which b is immediately followed by a, s, and h.
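Because grep reads standard input when no file is given, the same literal search can be tried without touching /etc/passwd. The lines below are made-up samples in the /etc/passwd format; only the two lines that contain bash are printed.

```shell
# Made-up lines in /etc/passwd format, piped to grep on standard input
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'myfreax:x:1000:1000:myfreax:/home/myfreax:/bin/bash' |
  grep bash
```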

By default, grep is case-sensitive, meaning that uppercase and lowercase characters are treated as different characters. To ignore case when searching, use the -i (--ignore-case) option.
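A minimal illustration of the -i option, using a made-up input string. The "|| echo" part only makes the no-match case visible, since grep prints nothing when no line matches.

```shell
echo 'GNU Linux' | grep gnu || echo 'no match'   # case-sensitive: prints "no match"
echo 'GNU Linux' | grep -i gnu                   # case-insensitive: prints "GNU Linux"
```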

It's worth mentioning that grep searches for the pattern as a string, not as a whole word. So if you search for gnu, grep also prints lines in which gnu is embedded in a larger word, such as cygnus or magnum.
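The sketch below feeds a few made-up words to grep. gnu, cygnus, and magnum are all printed because each contains the three characters g, n, u in a row; only linux is left out.

```shell
printf '%s\n' gnu cygnus magnum linux | grep gnu
```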

If the search string contains spaces, enclose it in single or double quotes, like this:

grep "Gnome Display Manager" /etc/passwd

Line Start and Line End

The ^ (caret) symbol matches the empty string at the beginning of a line. If a regular expression starts with ^, grep matches the string that follows it only at the beginning of each line.

The following grep command searches file.txt for lines that start with the string linux:

grep '^linux' file.txt

The $ (dollar) symbol matches the empty string at the end of a line. Put the string you are searching for before the $, and grep matches it only at the end of each line.

The following grep command searches file.txt for lines that end with the string linux:

grep 'linux$' file.txt

Besides anchoring at the start or the end of a line, you can combine both anchors into a regular expression of the form ^keyword$. This matches lines that contain only the specified content, not lines in which it is embedded in a longer string.

Another useful example is the pattern ^$, which matches empty lines, i.e. lines with nothing between the beginning and the end. This is especially handy when looking for blank lines.

The following grep command searches file.txt for lines that contain only linux:

grep '^linux$' file.txt
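The anchors above can be compared side by side on a small file. This sketch creates a throwaway file with mktemp; its contents are made up for the demo, and the comments list the lines each command prints.

```shell
# Throwaway sample file with made-up contents
file=$(mktemp)
printf '%s\n' 'linux is fun' 'I run linux' 'linux' '' 'not a match' > "$file"

grep '^linux' "$file"    # starts with linux: "linux is fun" and "linux"
grep 'linux$' "$file"    # ends with linux: "I run linux" and "linux"
grep '^linux$' "$file"   # exactly linux: "linux"
grep -c '^$' "$file"     # number of empty lines: 1

rm -f "$file"
```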

Matching a Single Character

The . (period) symbol is a metacharacter that matches any single character.

For example, to match lines containing kan, followed by any two characters, followed by the string roo, you can use the following pattern:

grep 'kan..roo' file.txt
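Each . stands for exactly one character, so the pattern needs exactly two characters between kan and roo. With these made-up words, kangaroo and kan12roo are printed, while kanroo is skipped.

```shell
printf '%s\n' kangaroo kanroo kan12roo | grep 'kan..roo'
```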

Bracket Expressions

Bracket expressions let you match a group of characters by enclosing them in square brackets []. A bracket expression matches any single character listed inside the brackets.

For example, the following grep command searches file.txt for lines containing accept or accent:

grep 'acce[np]t' file.txt

If the first character inside the brackets is a caret ^, the expression matches any single character that is not enclosed in the brackets.

The following pattern matches any string that starts with co, followed by any character except l, followed by a, such as coca and cobalt, but it does not match cola.

For example, the following grep command searches file.txt for such strings:

grep 'co[^l]a' file.txt
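Both bracket forms can be tried on a short word list. In this sketch, sample_words is a hypothetical helper function that just prints a few made-up words, one per line.

```shell
# Hypothetical helper: prints a made-up word list
sample_words() { printf '%s\n' accept accent access coca cobalt cola; }

sample_words | grep 'acce[np]t'   # n or p after "acce": accept, accent
sample_words | grep 'co[^l]a'     # any character except l after "co": coca, cobalt
```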

Instead of listing every character one by one, you can specify a range of characters inside a bracket expression by giving the first and last characters of the range, separated by a hyphen.

For example, [a-e] is equivalent to [abcde], and [1-3] is equivalent to [123]. The following expression matches every line that starts with an uppercase letter:

grep '^[A-Z]' file.txt
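Here is a small sketch of range expressions on made-up input, with sample_lines as a hypothetical helper. How ranges such as [A-Z] behave can depend on the locale; the results in the comments assume the C locale or a typical modern UTF-8 locale.

```shell
# Hypothetical helper: prints a few made-up lines
sample_lines() { printf '%s\n' 'Linux' 'ubuntu' 'release 2' 'Debian' 'version 7'; }

sample_lines | grep '^[A-Z]'   # starts with an uppercase letter: Linux, Debian
sample_lines | grep '[1-3]'    # contains 1, 2 or 3: release 2
```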

grep also supports predefined character classes, which are written as [:name:] and used inside a bracket expression, as in [[:digit:]]. [:alnum:] matches a single alphanumeric character, the same as [0-9A-Za-z] in the C locale. [:alpha:] matches a single alphabetic character, the same as [A-Za-z] in the C locale.

[:blank:] matches a single space or tab character. [:digit:] matches a single digit from 0 to 9.

[:lower:] matches a single lowercase letter, the same as [a-z] in the C locale. [:upper:] matches a single uppercase letter, the same as [A-Z] in the C locale.
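Note that each class name sits inside an outer pair of brackets. In the sketch below, sample_text is a hypothetical helper that prints made-up lines; the \t in the printf format becomes a real tab character, so one line contains a tab rather than a space.

```shell
# Hypothetical helper: made-up lines, one of them containing a tab
sample_text() { printf 'Port 8080\nno_digits_here\ncol1\tcol2\nlowercase\n'; }

sample_text | grep '[[:digit:]]'   # contains a digit: "Port 8080", "col1<TAB>col2"
sample_text | grep '^[[:upper:]]'  # starts with an uppercase letter: "Port 8080"
sample_text | grep '[[:blank:]]'   # contains a space or tab: "Port 8080", "col1<TAB>col2"
```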

Quantifiers

Quantifiers specify how many times the preceding item must occur for a match. The following are some of the quantifiers supported by GNU grep.

* matches the preceding item zero or more times. ? matches the preceding item zero or one time. + matches the preceding item one or more times. {n} matches the preceding item exactly n times, where n is a number.

{n,} matches the preceding item at least n times. {,m} matches the preceding item at most m times. {n,m} matches the preceding item from n to m times; for example, {2,4} means 2 to 4 times.

Now that we know about quantifiers, let's look at some examples of using them with grep, and at how to keep the shell from interpreting special characters such as * and ?.

The * character matches the preceding character zero or more times. The following grep commands match right, sright, ssright, and so on.

In the regular expression s*right, the * quantifier matches the s character zero or more times with no upper limit, so any number of s characters (s, ss, sssss, and so on) is accepted. Enclosing the regular expression in single quotes, as in 's*right', keeps the shell from interpreting the special characters.

echo right | grep 's*right'
echo ssright | grep 's*right'
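Because * also allows zero occurrences, s*right matches any line that contains right, even one where right follows some other letter. In this sketch with made-up words, right, sright, ssright, and bright are all printed; only left is skipped.

```shell
printf '%s\n' right sright ssright bright left | grep 's*right'
```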

The following is a more advanced pattern that matches all lines that start with a capital letter and end with either a period or a comma. The .* part of the regular expression matches any number of any characters.

In the following grep command, the -E option enables extended regular expressions, ^ anchors the match to the start of the line, [A-Z] matches an uppercase letter from A to Z, and [.,]$ matches a period or a comma at the end of the line:

grep -E '^[A-Z].*[.,]$' file.txt
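The same pattern can be checked against a few made-up lines. Only the lines that both begin with a capital letter and end with a period or a comma are printed: "Hello world." and "Yes, and,".

```shell
printf '%s\n' 'Hello world.' 'Hello world' 'lowercase start.' 'Yes, and,' |
  grep -E '^[A-Z].*[.,]$'
```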

The ? character makes the preceding character optional, so it matches zero times or once. The following grep command matches both bright and right.

Note the backslash before the ? character. In basic regular expressions, ? is a literal character, so it must be escaped with a backslash to act as a quantifier. The single quotes are what protect the pattern from the shell.

grep 'b\?right' file.txt

The following command uses grep -E, so the pattern 'b?right' is an extended regular expression and there is no need to escape characters that have special meanings.

grep -E 'b?right' file.txt
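This sketch contrasts the escaped basic form, the extended form, and an unescaped ? in a basic regular expression, using made-up words and assuming GNU grep (where \? is a GNU extension). The last command only matches the literal text b?right.

```shell
printf '%s\n' bright right fight | grep 'b\?right'     # BRE, escaped: bright, right
printf '%s\n' bright right fight | grep -E 'b?right'   # ERE: bright, right
printf '%s\n' bright 'b?right' | grep 'b?right'        # BRE, unescaped ?: only b?right
```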

The + character matches the preceding item one or more times. The following command matches sright and ssright, but not right.

The -E option enables extended regular expressions. In the pattern 's+right', s+ means that at least one s must be present (s, ss, and so on), with no upper limit.

grep -E 's+right' file.txt
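A short check of + on made-up words, in both extended syntax and the GNU basic-syntax form \+. In both cases sright and ssright are printed, and right is not.

```shell
printf '%s\n' right sright ssright | grep -E 's+right'   # ERE
printf '%s\n' right sright ssright | grep 's\+right'     # BRE with GNU \+
```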

Braces {} let you specify an exact number, or a range, for how many times the preceding item must match. The following grep command matches lines that contain a run of 3 to 9 consecutive digits. Because the pattern is not anchored, a line with a longer run of digits also matches, since it contains such a run.

In the pattern '[[:digit:]]{3,9}', [[:digit:]] is a bracket expression containing the [:digit:] class and is equivalent to [0-9], and {3,9} means the preceding item must match 3 to 9 times in a row.

grep -E '[[:digit:]]{3,9}' file.txt
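The effect of leaving the pattern unanchored is easy to see with a few made-up numbers, printed by the hypothetical helper sample_numbers. The second command adds ^ and $, an extension of the example above, so that only lines consisting entirely of 3 to 9 digits match.

```shell
# Hypothetical helper: prints made-up numbers, one per line
sample_numbers() { printf '%s\n' 12 123 123456789 1234567890; }

sample_numbers | grep -E '[[:digit:]]{3,9}'     # 123, 123456789, 1234567890
sample_numbers | grep -E '^[[:digit:]]{3,9}$'   # whole line is 3 to 9 digits: 123, 123456789
```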

Alternation (OR)

The | (pipe) operator, also called alternation, lets you specify different possible matches, which can be literal strings or regular expressions. Of all regular expression operators, it has the lowest precedence.

In the example below, we search the Nginx error log file for lines containing the words fatal, error, or critical. With basic regular expressions, the | must be escaped as \|; with extended regular expressions, it does not need to be escaped.

grep 'fatal\|error\|critical' /var/log/nginx/error.log
grep -E 'fatal|error|critical' /var/log/nginx/error.log
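To try the alternation without a real Nginx installation, this sketch uses sample_log, a hypothetical helper that prints made-up log lines standing in for /var/log/nginx/error.log. Both commands print the same three lines: the fatal, error, and critical ones.

```shell
# Hypothetical helper: made-up log lines
sample_log() {
  printf '%s\n' 'fatal: out of memory' 'warn: disk 80% full' \
    'error: upstream timed out' 'info: all good' 'critical: disk full'
}

sample_log | grep 'fatal\|error\|critical'    # BRE with escaped \|
sample_log | grep -E 'fatal|error|critical'   # ERE, no escaping needed
```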

Grouping

Grouping is a feature of regular expressions that lets you group patterns together and reference them as a single item. Groups are created with parentheses (). When using basic regular expressions, the parentheses must be escaped with a backslash, as \( and \).

A regular expression can contain multiple groups. The groups are numbered in the order in which their opening parentheses appear in the pattern, and the text matched by each group is captured.

Within a grep pattern, a captured group can be referenced with a back-reference from \1 to \9. The $1, ..., $9 notation seen in some other tools and programming languages is not used by grep.

The following example matches both fearless and less. The ? quantifier makes the (fear) group optional.

grep -E '(fear)?less' file.txt
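The first command below checks the optional group on made-up words and prints fearless and less. The second is an illustrative back-reference, assuming GNU grep, which accepts \1 in extended syntax as well: \1 must repeat exactly what group 1 matched, so the pattern finds an accidentally doubled word.

```shell
printf '%s\n' fearless less fear | grep -E '(fear)?less'

# \1 repeats the text captured by ([a-z]+); prints "this is is a typo"
printf '%s\n' 'this is is a typo' 'this is fine' | grep -E '\b([a-z]+) \1\b'
```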

Special Backslash Expressions

GNU grep includes several metacharacters that consist of a backslash followed by a regular character. Here are some of the most common special backslash expressions.

\b matches a word boundary. \< matches an empty string at the beginning of a word. \> matches an empty string at the end of a word. \w matches a word character (a letter, digit, or underscore). \s matches a whitespace character.

The following pattern matches the words abject and object on their own. It does not match them when they are embedded in larger words:

grep '\b[ao]bject\b' file.txt
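This sketch runs the word-boundary pattern, and an equivalent form using \< and \>, on made-up lines printed by the hypothetical helper sample_phrases. Both print "an abject failure" and "the object"; objection and subobject are skipped because object is part of a larger word there. The last line shows \s on its own.

```shell
# Hypothetical helper: made-up lines
sample_phrases() { printf '%s\n' 'an abject failure' 'objection overruled' 'the object' 'subobject'; }

sample_phrases | grep '\b[ao]bject\b'   # word boundaries on both sides
sample_phrases | grep '\<[ao]bject\>'   # start-of-word and end-of-word anchors

printf '%s\n' 'two words' 'oneword' | grep '\s'   # contains whitespace: "two words"
```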

Conclusion

Regular expressions are commonly used in text editors, programming languages, and command-line tools such as grep, sed, and awk. Knowing how to construct regular expressions can be very helpful when searching text files, writing scripts, or filtering command output.

If you have any questions or feedback, please feel free to leave a comment.