Linux Grep Regular Expression Example
grep is one of the most useful and powerful commands for text processing in Linux. It searches one or more input files for lines matching a regular expression and writes each matching line to standard output.
A regular expression is a pattern that matches a set of strings. Patterns consist of operators, literal characters, and metacharacters, which have special meanings. GNU grep supports three regular expression syntaxes: Basic, Extended, and Perl-compatible.
When grep is called with no regular expression type, it interprets the search pattern as a basic regular expression. To interpret the pattern as an extended regular expression, use the -E / --extended-regexp option.
In GNU grep's implementation, there is no difference in available functionality between the basic and extended regular expression syntaxes. The only difference is that in basic regular expressions the metacharacters ?, +, {, |, (, and ) are interpreted as literal characters. To keep the special meaning of these metacharacters when using basic regular expressions, they must be escaped with a backslash (\). We'll explain what these and other metacharacters mean later.
In general, you should always enclose regular expressions in single quotes so that the shell does not interpret and expand the metacharacters itself.
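As a quick illustration of the difference between the two syntaxes, the sketch below runs the same + quantifier through basic and extended mode. The input string aaab and the variable names are made up for this example, and the backslash form relies on GNU grep behavior.

```shell
# Hypothetical input: the string "aaab".
# Basic syntax: + is a literal character, so 'a+b' looks for the text "a+b".
bre_plain=$(echo 'aaab' | grep -c 'a+b' || true)
# Basic syntax with a backslash: \+ acts as the "one or more" quantifier (GNU grep).
bre_escaped=$(echo 'aaab' | grep -c 'a\+b')
# Extended syntax: + is a quantifier without any escaping.
ere_plain=$(echo 'aaab' | grep -c -E 'a+b')
echo "basic: $bre_plain, basic escaped: $bre_escaped, extended: $ere_plain"
```

The -c option prints the number of matching lines instead of the lines themselves, which makes the three results easy to compare.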
Character Match
The most basic usage of the grep command is to search a file for a character or string. In addition to searching the contents of files, grep can also search standard input.
For example, to list the users who have bash as their login shell, you can search the /etc/passwd file for all lines containing the string bash.
The following grep command searches the file and prints the entries of the users whose login shell is bash:
grep bash /etc/passwd
The output should look like this:
root:x:0:0:root:/root:/bin/bash
myfreax:x:1000:1000:myfreax:/home/myfreax:/bin/bash
In this example, the string bash is a basic regular expression consisting of four literal characters. This tells grep to search for a string that has a b immediately followed by a, s, and h.
By default, the grep command is case-sensitive. This means that uppercase and lowercase characters are treated as different characters. To ignore case when searching, use the -i / --ignore-case option.
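The sketch below shows both points from this section: grep reading from standard input, and the effect of -i. The three input lines are invented for the example.

```shell
# Hypothetical input lines, piped to grep on standard input.
input='Bash is a shell
bash is also a command
zsh is another shell'
# Case-sensitive search: only the line with lowercase "bash" matches.
sensitive=$(printf '%s\n' "$input" | grep -c 'bash')
# With -i the case is ignored, so "Bash" matches as well.
insensitive=$(printf '%s\n' "$input" | grep -c -i 'bash')
echo "case-sensitive: $sensitive, ignore-case: $insensitive"
```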
It's worth mentioning that grep looks for the search pattern as a string rather than a word. So if you search for gnu, grep will also print the lines where gnu is embedded in a larger word, for example cygnus or magnum.
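A minimal sketch of this substring behavior, using a made-up word list fed through standard input:

```shell
# Hypothetical word list; only "linux" does not contain the string gnu.
words='gnu
cygnus
magnum
linux'
# The pattern is matched anywhere in the line, not only as a whole word.
matches=$(printf '%s\n' "$words" | grep 'gnu')
echo "$matches"
```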
If the search string contains spaces, you need to enclose it in single or double quotes, like this:
grep "Gnome Display Manager" /etc/passwd
Line Start and Line End
The ^ (caret) symbol matches the empty string at the beginning of a line. If a regular expression starts with ^, the string that follows it is matched only at the beginning of each line.
The following grep command searches file.txt for lines starting with the string linux:
grep '^linux' file.txt
The $ (dollar) symbol matches the empty string at the end of a line. To find lines that end with a given string, put $ after that string.
The following grep command searches file.txt for lines ending with the string linux:
grep 'linux$' file.txt
In addition to searching for the beginning and end of a line, you can also combine both anchors in a regular expression of the form ^keyword$. This matches only lines that consist of exactly the specified content, not lines where it is embedded in a longer string.
Another useful example is the ^$ pattern, which matches all empty lines, i.e. lines with nothing between the beginning and the end.
The following grep command searches file.txt for lines that contain only linux:
grep '^linux$' file.txt
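To compare the anchors side by side, the sketch below counts the matches for each pattern in a small temporary file. The file contents and variable names are assumptions made for this example.

```shell
# Hypothetical sample file, created only for this sketch.
sample=$(mktemp)
printf '%s\n' 'linux is free' 'I use linux' 'linux' '' 'unix' > "$sample"
starts=$(grep -c '^linux' "$sample")   # lines beginning with linux
ends=$(grep -c 'linux$' "$sample")     # lines ending with linux
exact=$(grep -c '^linux$' "$sample")   # lines that are exactly linux
empty=$(grep -c '^$' "$sample")        # empty lines
echo "starts: $starts, ends: $ends, exact: $exact, empty: $empty"
rm -f "$sample"
```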
Matching a Single Character
The . (period) symbol is a metacharacter that matches any single character.
For example, to match anything that begins with kan, then has any two characters, and then ends with the string roo, you can use the following pattern:
grep 'kan..roo' file.txt
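The following sketch tries the same pattern on a few invented words to show that each dot must consume exactly one character:

```shell
# Hypothetical word list.
words='kangaroo
kanroo
kan12roo
kanxroo'
# Each . stands for exactly one character, so two are required between kan and roo.
matched=$(printf '%s\n' "$words" | grep 'kan..roo')
echo "$matched"
```

Here kanroo and kanxroo are skipped because they have zero and one character between kan and roo.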
Bracket Expression
Bracket expressions allow you to match a set of characters by enclosing them in brackets []. That is, any single one of the characters within the brackets can match at that position.
For example, the following grep command searches file.txt for lines containing accept or accent:
grep 'acce[np]t' file.txt
If the first character inside the brackets is the caret ^, the expression matches any single character that is not listed in the brackets.
The following pattern matches any string that contains co, followed by any character except l, followed by a, such as coca or cobalt. A line that contains only cola is not matched. The following grep command searches file.txt for such lines:
grep 'co[^l]a' file.txt
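A small sketch of the negated bracket expression, run on a made-up word list:

```shell
# Hypothetical word list.
words='coca
cobalt
cola
coma'
# [^l] accepts any single character except l between co and a.
matched=$(printf '%s\n' "$words" | grep 'co[^l]a')
echo "$matched"
```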
Instead of writing all the characters one by one, you can specify a sequence of characters within a bracket expression as a range, giving the first and last characters separated by a hyphen.
For example, [a-e] is equivalent to [abcde], and [1-3] is equivalent to [123]. The following expression matches every line starting with an uppercase letter:
grep '^[A-Z]' file.txt
grep also supports predefined character classes, which are used inside a bracket expression, for example [[:alnum:]]:
[:alnum:] matches a single alphanumeric character, the same as [0-9A-Za-z].
[:alpha:] matches a single alphabetic character, the same as [A-Za-z].
[:blank:] matches a single space or tab.
[:digit:] matches a single digit, 0 1 2 3 4 5 6 7 8 9.
[:lower:] matches a single lowercase letter, the same as [a-z].
[:upper:] matches a single uppercase letter, the same as [A-Z].
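The sketch below uses a few of these classes on invented input. Note that the class name, including its own brackets, sits inside an outer bracket expression.

```shell
# Hypothetical input lines.
lines='abc123
hello world
42
___'
# Lines containing at least one digit.
with_digit=$(printf '%s\n' "$lines" | grep -c '[[:digit:]]')
# Lines containing a space or a tab.
with_blank=$(printf '%s\n' "$lines" | grep -c '[[:blank:]]')
# Lines made up only of letters and digits.
only_alnum=$(printf '%s\n' "$lines" | grep -c -E '^[[:alnum:]]+$')
echo "digit: $with_digit, blank: $with_blank, alnum only: $only_alnum"
```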
Quantifiers
Quantifiers allow you to specify the number of times an item must occur for a match to happen. The following are some of the quantifiers supported by GNU grep.
* matches the previous item zero or more times.
? matches the previous item zero or one time.
+ matches the previous item one or more times.
{n} matches the previous item exactly n times, where n is a number.
{n,} matches the previous item at least n times.
{,m} matches the previous item at most m times.
{n,m} matches the previous item from n to m times. For example, {2,4} means 2 to 4 times.
Now that we know about quantifiers in regular expressions, the examples below show how to search with quantifiers in grep and how to keep the shell from interpreting special characters such as * and ?.
The * (asterisk) character matches the preceding item zero or more times. The following pattern matches right, sright, ssright, and so on.
s*right
In this regular expression, the * quantifier means that the s character is matched zero or more times. There is no upper limit, so it can be many, as in sssss. Writing the pattern in single quotes, as 's*right', keeps the shell from interpreting the special characters.
echo right | grep 's*right'
echo ssright | grep 's*right'
The following is a more advanced pattern that matches all lines starting with a capital letter and ending with a period or comma. The .* regular expression matches any number of any characters.
In the following grep command, the -E option indicates the use of extended regular expressions, ^ indicates the start of the line, and [A-Z] matches any uppercase letter from A to Z:
grep -E '^[A-Z].*[.,]$' file.txt
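The sketch below applies this pattern to a few invented lines. Setting LC_ALL=C is an addition that is not in the command above. It is used here because the meaning of a range such as [A-Z] can depend on the locale.

```shell
# Hypothetical input lines.
lines='Hello, world.
hello, world.
First item,
No punctuation'
# LC_ALL=C makes [A-Z] mean exactly the ASCII uppercase letters.
matched=$(printf '%s\n' "$lines" | LC_ALL=C grep -E '^[A-Z].*[.,]$')
echo "$matched"
```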
The ? (question mark) character makes the preceding item optional, so it can match at most once. The following grep command matches both bright and right.
Note the extra backslash in front of the ? character here. With basic regular expressions, ? must be escaped with a backslash so that grep treats it as a quantifier rather than as a literal character.
grep 'b\?right' file.txt
The following grep -E command matches the same pattern using extended regular expressions, 'b?right', so there is no need to escape the characters that have special meanings.
grep -E 'b?right' file.txt
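To make the effect of the backslash visible, the sketch below uses the -x option, which is not used elsewhere in this article and which requires the whole line to match. The word list is invented for the example.

```shell
# Hypothetical word list; the last entry contains a literal question mark.
words='right
bright
bbright
b?right'
# Basic syntax, escaped: \? is a quantifier (GNU grep), so b is optional.
bre_escaped=$(printf '%s\n' "$words" | grep -x 'b\?right')
# Extended syntax: ? is a quantifier without escaping.
ere_plain=$(printf '%s\n' "$words" | grep -x -E 'b?right')
# Basic syntax, unescaped: ? is an ordinary character.
bre_literal=$(printf '%s\n' "$words" | grep -x 'b?right')
printf '%s\n' "$bre_escaped" '---' "$ere_plain" '---' "$bre_literal"
```

The first two commands select right and bright, while the unescaped basic pattern selects only the line that literally reads b?right.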
The + (plus) character matches the previous item one or more times. The following pattern matches sright and ssright, but not right.
In the following grep command, the -E option indicates the use of extended regular expressions. The s+ part of the pattern means that at least one s must be present, with no upper limit, as in ss.
grep -E 's+right' file.txt
The brace characters {} allow you to specify an exact number, an upper or lower bound, or a range of occurrences that must occur for a match to happen. The following grep command matches runs of 3 to 9 digits.
In the '[[:digit:]]{3,9}' pattern, [:digit:] represents a digit from 0 to 9, [[:digit:]] is the same as [0-9], and {3,9} means matching 3 to 9 times. A line must therefore contain at least 3 consecutive digits to be printed. Because the pattern is not anchored, a line with a longer number is printed as well, since part of that number satisfies the pattern.
grep -E '[[:digit:]]{3,9}' file.txt
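The sketch below shows what the pattern actually selects, using the -o option (not used elsewhere in this article) to print only the matched parts. The input lines are made up for the example.

```shell
# Hypothetical input lines with 2, 5 and 13 digits.
lines='id 12
code 12345
serial 1234567890123'
# Number of lines that contain at least 3 consecutive digits.
matched_lines=$(printf '%s\n' "$lines" | grep -c -E '[[:digit:]]{3,9}')
# -o prints each matched piece; a single match covers at most 9 digits.
pieces=$(printf '%s\n' "$lines" | grep -o -E '[[:digit:]]{3,9}')
echo "matching lines: $matched_lines"
echo "$pieces"
```

The 13-digit number is reported as two pieces, which shows that the upper bound limits the length of one match rather than the length of the number in the line.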
OR Operation
The | (pipe) or operator lets you specify different possible matches, which can be literal strings or regular expressions. Of all regular expression operators, this operator has the lowest precedence.
In the example below, we search Nginx's error log file for lines containing the words fatal, error, or critical. With basic regular expressions, the | operator must be escaped with a backslash. If extended regular expressions are used, it does not need to be escaped.
grep 'fatal\|error\|critical' /var/log/nginx/error.log
grep -E 'fatal|error|critical' /var/log/nginx/error.log
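The sketch below checks that both forms select the same lines. The log file is a temporary file with invented lines, not real Nginx output.

```shell
# Hypothetical log lines, written to a temporary file for this sketch.
log=$(mktemp)
printf '%s\n' \
  'error: disk full' \
  'notice: service started' \
  'critical: temperature too high' \
  'fatal: out of memory' > "$log"
# Basic syntax needs \| for alternation (GNU grep).
bre=$(grep -c 'fatal\|error\|critical' "$log")
# Extended syntax uses a plain |.
ere=$(grep -c -E 'fatal|error|critical' "$log")
echo "basic: $bre, extended: $ere"
rm -f "$log"
```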
Grouping
Grouping is a feature of regular expressions that allows you to group patterns together and reference them as one item. Groups are created using parentheses (). When using basic regular expressions, the parentheses must be escaped with a backslash (\).
Regular expressions can have multiple groups. The text matched by each group is captured, and the groups are numbered from left to right in the order in which they appear in the pattern.
A captured group can be referenced later in the same pattern. In grep, the back-references \1, ..., \9 match the same text that the corresponding group matched.
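In grep, a captured group is referenced inside the pattern with a backslash and the group number, such as \1. The sketch below uses this to find repeated text in an invented word list. The extended form with a back-reference is a GNU grep feature.

```shell
# Hypothetical word list.
words='hello
world
book
abcabc'
# Basic syntax: \(.\) captures one character, \1 requires the same character again.
doubled=$(printf '%s\n' "$words" | grep '\(.\)\1')
# Extended syntax: (abc) captures abc, \1 requires abc to follow immediately.
repeated=$(printf '%s\n' "$words" | grep -E '(abc)\1')
echo "$doubled"
echo "$repeated"
```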
The following example matches both fearless and less. The ? quantifier makes the (fear) group optional.
grep -E '(fear)?less' file.txt
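To see which part of each line the optional group matches, the sketch below adds the -o option (not used elsewhere in this article), which prints only the matched text. The words are invented for the example.

```shell
# Hypothetical word list.
words='fearless
careless
fear'
# The group (fear) is matched when present and skipped otherwise.
matched=$(printf '%s\n' "$words" | grep -o -E '(fear)?less')
echo "$matched"
```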
Backslash Expressions
GNU grep includes several meta-expressions that consist of a backslash followed by a regular character. Here are some of the most common special backslash expressions.
\b matches a word boundary.
\< matches an empty string at the beginning of a word.
\> matches an empty string at the end of a word.
\w matches a word character (a letter, a digit, or an underscore).
\s matches a whitespace character.
The following pattern matches the separate words abject and object. If they are embedded in larger words, they are not matched:
grep '\b[ao]bject\b' file.txt
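The sketch below tries the word-boundary expressions on an invented word list. These backslash expressions are GNU grep extensions, so other grep implementations may behave differently.

```shell
# Hypothetical input lines.
words='object
abject
objective
subject
the object is here'
# \b on both sides: abject or object as a separate word.
whole=$(printf '%s\n' "$words" | grep '\b[ao]bject\b')
# \< anchors obj to the start of a word.
word_start=$(printf '%s\n' "$words" | grep -c '\<obj')
# \> anchors ject to the end of a word.
word_end=$(printf '%s\n' "$words" | grep -c 'ject\>')
echo "$whole"
echo "word start: $word_start, word end: $word_end"
```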
Conclusion
Regular expressions are commonly used in text editors, programming languages, and command-line tools such as grep, sed, and awk. Knowing how to construct regular expressions can be very helpful when searching text files, writing scripts, or filtering command output.
If you have any questions or feedback, please feel free to leave a comment.