mirror of
https://github.com/ziishaned/learn-regex.git
synced 2025-08-23 20:06:08 -04:00
Merge pull request #197 from tommcandrew/master
A few grammar and punctuation improvements
commit
a287bb6bd2
284
README.md
@ -31,64 +31,65 @@
|
|||||||
* [Tiếng Việt](translations/README-vn.md)
|
* [Tiếng Việt](translations/README-vn.md)
|
||||||
* [فارسی](translations/README-fa.md)
|
* [فارسی](translations/README-fa.md)
|
||||||
|
|
||||||
## What is Regular Expression?
|
## What are Regular Expressions?
|
||||||
|
|
||||||
> Regular expression is a group of characters or symbols which is used to find a specific pattern from a text.
|
> A regular expression is a group of characters or symbols which is used to find a specific pattern in a text.
|
||||||
|
|
||||||
A regular expression is a pattern that is matched against a subject string from
|
A regular expression is a pattern that is matched against a subject string from
|
||||||
left to right. Regular expression is used for replacing a text within a string,
|
left to right. Regular expressions are used to replace text within a string,
|
||||||
validating form, extract a substring from a string based upon a pattern match,
|
validating forms, extracting a substring from a string based on a pattern match,
|
||||||
and so much more. The word "Regular expression" is a mouthful, so you will usually
|
and so much more. The term "regular expression" is a mouthful, so you will usually
|
||||||
find the term abbreviated as "regex" or "regexp".
|
find the term abbreviated to "regex" or "regexp".
|
||||||
|
|
||||||
Imagine you are writing an application and you want to set the rules for when a
|
Imagine you are writing an application and you want to set the rules for when a
|
||||||
user chooses their username. We want to allow the username to contain letters,
|
user chooses their username. We want to allow the username to contain letters,
|
||||||
numbers, underscores and hyphens. We also want to limit the number of characters
|
numbers, underscores and hyphens. We also want to limit the number of characters
|
||||||
in username so it does not look ugly. We use the following regular expression to
|
in the username so it does not look ugly. We can use the following regular expression to
|
||||||
validate a username:
|
validate the username:
|
||||||
|
|
||||||
<br/><br/>
|
<br/><br/>
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<img src="./img/regexp-en.png" alt="Regular expression">
|
<img src="./img/regexp-en.png" alt="Regular expression">
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
Above regular expression can accept the strings `john_doe`, `jo-hn_doe` and
|
The regular expression above can accept the strings `john_doe`, `jo-hn_doe` and
|
||||||
`john12_as`. It does not match `Jo` because that string contains uppercase
|
`john12_as`. It does not match `Jo` because that string contains an uppercase
|
||||||
letter and also it is too short.
|
letter and also it is too short.
|
||||||
|
|
||||||
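The username rule described above can be tried out with a short Python sketch. The pattern itself only appears as an image, so the expression used here, `^[a-z0-9_-]{3,15}$`, is an assumption about what the image shows, and the helper name `is_valid_username` is invented for illustration.

```python
import re

# Assumed pattern (not shown as text in this document): lowercase letters,
# digits, underscores and hyphens, between 3 and 15 characters in total.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True when the string satisfies the assumed username rule."""
    return USERNAME_RE.match(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

Under that assumption, the first three names are accepted and `Jo` is rejected, both for its uppercase letter and for its length.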
## Table of Contents
|
## Table of Contents
|
||||||
|
|
||||||
- [Basic Matchers](#1-basic-matchers)
|
- [Basic Matchers](#1-basic-matchers)
|
||||||
- [Meta character](#2-meta-characters)
|
- [Meta Characters](#2-meta-characters)
|
||||||
- [Full stop](#21-full-stop)
|
  - [The Full Stop](#21-the-full-stop)
|
||||||
- [Character set](#22-character-set)
|
- [Character Sets](#22-character-sets)
|
||||||
- [Negated character set](#221-negated-character-set)
|
- [Negated Character Sets](#221-negated-character-sets)
|
||||||
- [Repetitions](#23-repetitions)
|
- [Repetitions](#23-repetitions)
|
||||||
- [The Star](#231-the-star)
|
- [The Star](#231-the-star)
|
||||||
- [The Plus](#232-the-plus)
|
- [The Plus](#232-the-plus)
|
||||||
- [The Question Mark](#233-the-question-mark)
|
- [The Question Mark](#233-the-question-mark)
|
||||||
- [Braces](#24-braces)
|
- [Braces](#24-braces)
|
||||||
- [Character Group](#25-character-group)
|
- [Capturing Groups](#25-capturing-groups)
|
||||||
|
- [Non-Capturing Groups](#251-non-capturing-groups)
|
||||||
- [Alternation](#26-alternation)
|
- [Alternation](#26-alternation)
|
||||||
- [Escaping special character](#27-escaping-special-character)
|
- [Escaping Special Characters](#27-escaping-special-characters)
|
||||||
- [Anchors](#28-anchors)
|
- [Anchors](#28-anchors)
|
||||||
- [Caret](#281-caret)
|
- [The Caret](#281-the-caret)
|
||||||
- [Dollar](#282-dollar)
|
- [The Dollar Sign](#282-the-dollar-sign)
|
||||||
- [Shorthand Character Sets](#3-shorthand-character-sets)
|
- [Shorthand Character Sets](#3-shorthand-character-sets)
|
||||||
- [Lookaround](#4-lookaround)
|
- [Lookarounds](#4-lookarounds)
|
||||||
- [Positive Lookahead](#41-positive-lookahead)
|
- [Positive Lookahead](#41-positive-lookahead)
|
||||||
- [Negative Lookahead](#42-negative-lookahead)
|
- [Negative Lookahead](#42-negative-lookahead)
|
||||||
- [Positive Lookbehind](#43-positive-lookbehind)
|
- [Positive Lookbehind](#43-positive-lookbehind)
|
||||||
- [Negative Lookbehind](#44-negative-lookbehind)
|
- [Negative Lookbehind](#44-negative-lookbehind)
|
||||||
- [Flags](#5-flags)
|
- [Flags](#5-flags)
|
||||||
- [Case Insensitive](#51-case-insensitive)
|
- [Case Insensitive](#51-case-insensitive)
|
||||||
- [Global search](#52-global-search)
|
- [Global Search](#52-global-search)
|
||||||
- [Multiline](#53-multiline)
|
- [Multiline](#53-multiline)
|
||||||
- [Greedy vs lazy matching](#6-greedy-vs-lazy-matching)
|
- [Greedy vs Lazy Matching](#6-greedy-vs-lazy-matching)
|
||||||
|
|
||||||
## 1. Basic Matchers
|
## 1. Basic Matchers
|
||||||
|
|
||||||
A regular expression is just a pattern of characters that we use to perform
|
A regular expression is just a pattern of characters that we use to perform a
|
||||||
search in a text. For example, the regular expression `the` means: the letter
|
search in a text. For example, the regular expression `the` means: the letter
|
||||||
`t`, followed by the letter `h`, followed by the letter `e`.
|
`t`, followed by the letter `h`, followed by the letter `e`.
|
||||||
|
|
||||||
@ -112,7 +113,7 @@ not match the string `the`.
|
|||||||
|
|
||||||
## 2. Meta Characters
|
## 2. Meta Characters
|
||||||
|
|
||||||
Meta characters are the building blocks of the regular expressions. Meta
|
Meta characters are the building blocks of regular expressions. Meta
|
||||||
characters do not stand for themselves but instead are interpreted in some
|
characters do not stand for themselves but instead are interpreted in some
|
||||||
special way. Some meta characters have a special meaning and are written inside
|
special way. Some meta characters have a special meaning and are written inside
|
||||||
square brackets. The meta characters are as follows:
|
square brackets. The meta characters are as follows:
|
||||||
@ -132,9 +133,9 @@ square brackets. The meta characters are as follows:
|
|||||||
|^|Matches the beginning of the input.|
|
|^|Matches the beginning of the input.|
|
||||||
|$|Matches the end of the input.|
|
|$|Matches the end of the input.|
|
||||||
|
|
||||||
## 2.1 Full stop
|
## 2.1 The Full Stop
|
||||||
|
|
||||||
Full stop `.` is the simplest example of meta character. The meta character `.`
|
The full stop `.` is the simplest example of a meta character. The meta character `.`
|
||||||
matches any single character. It will not match return or newline characters.
|
matches any single character. It will not match return or newline characters.
|
||||||
For example, the regular expression `.ar` means: any character, followed by the
|
For example, the regular expression `.ar` means: any character, followed by the
|
||||||
letter `a`, followed by the letter `r`.
|
letter `a`, followed by the letter `r`.
|
||||||
@ -145,11 +146,11 @@ letter `a`, followed by the letter `r`.
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
|
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
|
||||||
|
|
||||||
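As a small sketch of the full stop in practice, the snippet below runs `.ar` through Python's standard `re` module. The sample sentence is chosen for illustration only.

```python
import re

# `.ar` = any single character (except a newline), then "a", then "r".
matches = re.findall(r".ar", "The car parked in the garage.")
print(matches)  # ['car', 'par', 'gar']
```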
## 2.2 Character set
|
## 2.2 Character Sets
|
||||||
|
|
||||||
Character sets are also called character class. Square brackets are used to
|
Character sets are also called character classes. Square brackets are used to
|
||||||
specify character sets. Use a hyphen inside a character set to specify the
|
specify character sets. Use a hyphen inside a character set to specify the
|
||||||
characters' range. The order of the character range inside square brackets
|
characters' range. The order of the character range inside the square brackets
|
||||||
doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
|
doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
|
||||||
`T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
|
`T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
|
||||||
|
|
||||||
@ -160,7 +161,7 @@ doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
|
|||||||
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
|
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
|
||||||
|
|
||||||
A period inside a character set, however, means a literal period. The regular
|
A period inside a character set, however, means a literal period. The regular
|
||||||
expression `ar[.]` means: a lowercase character `a`, followed by letter `r`,
|
expression `ar[.]` means: a lowercase character `a`, followed by the letter `r`,
|
||||||
followed by a period `.` character.
|
followed by a period `.` character.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -169,7 +170,7 @@ followed by a period `.` character.
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
|
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
|
||||||
|
|
||||||
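The two character-set examples above can be checked with a few lines of Python. This is only a sketch using the standard `re` module; the sample sentences are illustrative.

```python
import re

# [Tt] accepts either an uppercase T or a lowercase t in that position.
the_matches = re.findall(r"[Tt]he", "The car parked in the garage.")
print(the_matches)  # ['The', 'the']

# Inside square brackets the period is literal, so only "ar." matches,
# not "ara" in "garage" or "ark" in "park".
period_matches = re.findall(r"ar[.]", "A garage is a good place to park a car.")
print(period_matches)  # ['ar.']
```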
### 2.2.1 Negated character set
|
### 2.2.1 Negated Character Sets
|
||||||
|
|
||||||
In general, the caret symbol represents the start of the string, but when it is
|
In general, the caret symbol represents the start of the string, but when it is
|
||||||
typed after the opening square bracket it negates the character set. For
|
typed after the opening square bracket it negates the character set. For
|
||||||
@ -184,14 +185,14 @@ followed by the character `a`, followed by the letter `r`.
|
|||||||
|
|
||||||
## 2.3 Repetitions
|
## 2.3 Repetitions
|
||||||
|
|
||||||
Following meta characters `+`, `*` or `?` are used to specify how many times a
|
The meta characters `+`, `*` or `?` are used to specify how many times a
|
||||||
subpattern can occur. These meta characters act differently in different
|
subpattern can occur. These meta characters act differently in different
|
||||||
situations.
|
situations.
|
||||||
|
|
||||||
### 2.3.1 The Star
|
### 2.3.1 The Star
|
||||||
|
|
||||||
The symbol `*` matches zero or more repetitions of the preceding matcher. The
|
The `*` symbol matches zero or more repetitions of the preceding matcher. The
|
||||||
regular expression `a*` means: zero or more repetitions of preceding lowercase
|
regular expression `a*` means: zero or more repetitions of the preceding lowercase
|
||||||
character `a`. But if it appears after a character set or class then it finds
|
character `a`. But if it appears after a character set or class then it finds
|
||||||
the repetitions of the whole character set. For example, the regular expression
|
the repetitions of the whole character set. For example, the regular expression
|
||||||
`[a-z]*` means: any number of lowercase letters in a row.
|
`[a-z]*` means: any number of lowercase letters in a row.
|
||||||
@ -205,8 +206,8 @@ the repetitions of the whole character set. For example, the regular expression
|
|||||||
The `*` symbol can be used with the meta character `.` to match any string of
|
The `*` symbol can be used with the meta character `.` to match any string of
|
||||||
characters `.*`. The `*` symbol can be used with the whitespace character `\s`
|
characters `.*`. The `*` symbol can be used with the whitespace character `\s`
|
||||||
to match a string of whitespace characters. For example, the expression
|
to match a string of whitespace characters. For example, the expression
|
||||||
`\s*cat\s*` means: zero or more spaces, followed by lowercase character `c`,
|
`\s*cat\s*` means: zero or more spaces, followed by a lowercase `c`,
|
||||||
followed by lowercase character `a`, followed by lowercase character `t`,
|
followed by a lowercase `a`, followed by a lowercase `t`,
|
||||||
followed by zero or more spaces.
|
followed by zero or more spaces.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -217,9 +218,9 @@ followed by zero or more spaces.
|
|||||||
|
|
||||||
### 2.3.2 The Plus
|
### 2.3.2 The Plus
|
||||||
|
|
||||||
The symbol `+` matches one or more repetitions of the preceding character. For
|
The `+` symbol matches one or more repetitions of the preceding character. For
|
||||||
example, the regular expression `c.+t` means: lowercase letter `c`, followed by
|
example, the regular expression `c.+t` means: a lowercase `c`, followed by
|
||||||
at least one character, followed by the lowercase character `t`. It needs to be
|
at least one character, followed by a lowercase `t`. It needs to be
|
||||||
clarified that `t` is the last `t` in the sentence.
|
clarified that `t` is the last `t` in the sentence.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -230,11 +231,10 @@ clarified that `t` is the last `t` in the sentence.
|
|||||||
|
|
||||||
### 2.3.3 The Question Mark
|
### 2.3.3 The Question Mark
|
||||||
|
|
||||||
In regular expression the meta character `?` makes the preceding character
|
In regular expressions, the meta character `?` makes the preceding character
|
||||||
optional. This symbol matches zero or one instance of the preceding character.
|
optional. This symbol matches zero or one instance of the preceding character.
|
||||||
For example, the regular expression `[T]?he` means: Optional the uppercase
|
For example, the regular expression `[T]?he` means: Optional uppercase
|
||||||
letter `T`, followed by the lowercase character `h`, followed by the lowercase
|
`T`, followed by a lowercase `h`, followed by a lowercase `e`.
|
||||||
character `e`.
|
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
|
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
|
||||||
@ -250,10 +250,10 @@ character `e`.
|
|||||||
|
|
||||||
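The three repetition meta characters from section 2.3 can be compared side by side. The sketch below uses Python's standard `re` module; the variable names are illustrative.

```python
import re

sentence = "The fat cat sat on the mat."

# `*`: zero or more lowercase letters in a row. Because zero repetitions
# are allowed, findall also returns empty strings, which are dropped here.
words = [w for w in re.findall(r"[a-z]*", sentence) if w]
print(words)  # ['he', 'fat', 'cat', 'sat', 'on', 'the', 'mat']

# `+`: `.+` is greedy, so the match runs to the last `t` in the sentence.
greedy = re.search(r"c.+t", sentence).group(0)
print(greedy)  # cat sat on the mat

# `?`: the uppercase T is optional, so both "The" and "he" are found.
optional = re.findall(r"[T]?he", "The car is parked in the garage.")
print(optional)  # ['The', 'he']
```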
## 2.4 Braces
|
## 2.4 Braces
|
||||||
|
|
||||||
In regular expression braces that are also called quantifiers are used to
|
In regular expressions, braces (also called quantifiers) are used to
|
||||||
specify the number of times that a character or a group of characters can be
|
specify the number of times that a character or a group of characters can be
|
||||||
repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
|
repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
|
||||||
2 digits but not more than 3 (characters in the range of 0 to 9).
|
2 digits, but not more than 3, ranging from 0 to 9.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
|
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
|
||||||
@ -262,7 +262,7 @@ repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
|
|||||||
[Test the regular expression](https://regex101.com/r/juM86s/1)
|
[Test the regular expression](https://regex101.com/r/juM86s/1)
|
||||||
|
|
||||||
We can leave out the second number. For example, the regular expression
|
We can leave out the second number. For example, the regular expression
|
||||||
`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma the
|
`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma, the
|
||||||
regular expression `[0-9]{3}` means: Match exactly 3 digits.
|
regular expression `[0-9]{3}` means: Match exactly 3 digits.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -277,16 +277,16 @@ regular expression `[0-9]{3}` means: Match exactly 3 digits.
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/Sivu30/1)
|
[Test the regular expression](https://regex101.com/r/Sivu30/1)
|
||||||
|
|
||||||
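The three brace forms can be compared on the same input. A minimal sketch with Python's `re` module, reusing the sentence from the example above:

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

# {2,3}: at least 2 and at most 3 digits in a row.
between_two_and_three = re.findall(r"[0-9]{2,3}", text)
print(between_two_and_three)  # ['999', '10']

# {2,}: 2 or more digits, with no upper limit.
two_or_more = re.findall(r"[0-9]{2,}", text)
print(two_or_more)  # ['9997', '10']

# {3}: exactly 3 digits.
exactly_three = re.findall(r"[0-9]{3}", text)
print(exactly_three)  # ['999']
```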
## 2.5 Capturing Group
|
## 2.5 Capturing Groups
|
||||||
|
|
||||||
A capturing group is a group of sub-patterns that is written inside Parentheses
|
A capturing group is a group of sub-patterns that is written inside parentheses
|
||||||
`(...)`. Like as we discussed before that in regular expression if we put a quantifier
|
`(...)`. As discussed before, in regular expressions, if we put a quantifier
|
||||||
after a character then it will repeat the preceding character. But if we put quantifier
|
after a character then it will repeat the preceding character. But if we put a quantifier
|
||||||
after a capturing group then it repeats the whole capturing group. For example,
|
after a capturing group then it repeats the whole capturing group. For example,
|
||||||
the regular expression `(ab)*` matches zero or more repetitions of the sequence
|
the regular expression `(ab)*` matches zero or more repetitions of the sequence
|
||||||
"ab". We can also use the alternation `|` meta character inside capturing group.
|
"ab". We can also use the alternation `|` meta character inside a capturing group.
|
||||||
For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
|
For example, the regular expression `(c|g|p)ar` means: a lowercase `c`,
|
||||||
`g` or `p`, followed by character `a`, followed by character `r`.
|
`g` or `p`, followed by `a`, followed by `r`.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||||
@ -294,15 +294,15 @@ For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
|
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
|
||||||
|
|
||||||
Note that capturing groups do not only match but also capture the characters for use in
|
Note that capturing groups do not only match, but also capture, the characters for use in
|
||||||
the parent language. The parent language could be python or javascript or virtually any
|
the parent language. The parent language could be Python or JavaScript or virtually any
|
||||||
language that implements regular expressions in a function definition.
|
language that implements regular expressions in a function definition.
|
||||||
|
|
||||||
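To see what "capture for use in the parent language" looks like, here is a sketch with Python as the parent language. It also shows the `(?:...)` form covered in the next section, for contrast. The variable names are illustrative.

```python
import re

text = "The car is parked in the garage."

# With a capturing group, findall returns what the group captured,
# not the whole match.
captured = re.findall(r"(c|g|p)ar", text)
print(captured)  # ['c', 'p', 'g']

# On a match object, group(0) is the whole match and group(1) is the
# text captured by the first pair of parentheses.
first = re.search(r"(c|g|p)ar", text)
print(first.group(0), first.group(1))  # car c

# With a non-capturing group nothing is captured, so findall returns
# the whole matches instead.
uncaptured = re.findall(r"(?:c|g|p)ar", text)
print(uncaptured)  # ['car', 'par', 'gar']
```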
### 2.5.1 Non-capturing group
|
### 2.5.1 Non-Capturing Groups
|
||||||
|
|
||||||
A non-capturing group is a capturing group that only matches the characters, but
|
A non-capturing group is a group that matches the characters but
|
||||||
does not capture the group. A non-capturing group is denoted by a `?` followed by a `:`
|
does not capture the group. A non-capturing group is denoted by a `?` followed by a `:`
|
||||||
within parenthesis `(...)`. For example, the regular expression `(?:c|g|p)ar` is similar to
|
within parentheses `(...)`. For example, the regular expression `(?:c|g|p)ar` is similar to
|
||||||
`(c|g|p)ar` in that it matches the same characters but will not create a capture group.
|
`(c|g|p)ar` in that it matches the same characters but will not create a capture group.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -319,13 +319,13 @@ See also [4. Lookaround](#4-lookaround).
|
|||||||
|
|
||||||
In a regular expression, the vertical bar `|` is used to define alternation.
|
In a regular expression, the vertical bar `|` is used to define alternation.
|
||||||
Alternation is like an OR statement between multiple expressions. Now, you may be
|
Alternation is like an OR statement between multiple expressions. Now, you may be
|
||||||
thinking that character set and alternation works the same way. But the big
|
thinking that character sets and alternation work the same way. But the big
|
||||||
difference between character set and alternation is that character set works on
|
difference between character sets and alternation is that character sets work at the
|
||||||
character level but alternation works on expression level. For example, the
|
character level but alternation works at the expression level. For example, the
|
||||||
regular expression `(T|t)he|car` means: either (uppercase character `T` or lowercase
|
regular expression `(T|t)he|car` means: either (an uppercase `T` or a lowercase
|
||||||
`t`, followed by lowercase character `h`, followed by lowercase character `e`) OR
|
`t`, followed by a lowercase `h`, followed by a lowercase `e`) OR
|
||||||
(lowercase character `c`, followed by lowercase character `a`, followed by
|
(a lowercase `c`, followed by a lowercase `a`, followed by
|
||||||
lowercase character `r`). Note that I put the parentheses for clarity, to show that either expression
|
a lowercase `r`). Note that I included the parentheses for clarity, to show that either expression
|
||||||
in parentheses can be met and it will match.
|
in parentheses can be met and it will match.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -334,17 +334,15 @@ in parentheses can be met and it will match.
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
|
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
|
||||||
|
|
||||||
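The alternation example can be reproduced in Python as a quick sketch. `finditer` is used so that the whole match is reported rather than the contents of the `(T|t)` group.

```python
import re

text = "The car is parked in the garage."

# Either "The"/"the" or "car" is accepted at each position.
matches = [m.group(0) for m in re.finditer(r"(T|t)he|car", text)]
print(matches)  # ['The', 'car', 'the']

# A character set expresses the single-character choice more compactly,
# while the top-level `|` still chooses between whole expressions.
same_matches = re.findall(r"[Tt]he|car", text)
print(same_matches)  # ['The', 'car', 'the']
```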
## 2.7 Escaping special character
|
## 2.7 Escaping Special Characters
|
||||||
|
|
||||||
Backslash `\` is used in regular expression to escape the next character. This
|
A backslash `\` is used in regular expressions to escape the next character. This
|
||||||
allows us to specify a symbol as a matching character including reserved
|
allows us to include reserved characters such as `{ } [ ] / \ + * . $ ^ | ?` as matching characters. To use one of these special characters as a matching character, prepend it with `\`.
|
||||||
characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching
|
|
||||||
character prepend `\` before it.
|
|
||||||
|
|
||||||
For example, the regular expression `.` is used to match any character except
|
For example, the regular expression `.` is used to match any character except a
|
||||||
newline. Now to match `.` in an input string the regular expression
|
newline. Now, to match `.` in an input string, the regular expression
|
||||||
`(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by lowercase
|
`(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase
|
||||||
character `a`, followed by lowercase letter `t`, followed by optional `.`
|
`a`, followed by a lowercase `t`, followed by an optional `.`
|
||||||
character.
|
character.
|
||||||
|
|
||||||
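The effect of escaping can be sketched in Python as follows. `re.escape` is a standard-library helper that adds the backslashes for you; the sample strings are illustrative.

```python
import re

text = "The fat cat sat on the mat."

# `\.?` matches an optional literal period after "at".
matches = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]
print(matches)  # ['fat', 'cat', 'mat.']

# Unescaped, `.` matches any character, so "345" is accepted as well.
unescaped = re.findall(r"3.5", "3.5 and 345")
print(unescaped)  # ['3.5', '345']

# re.escape backslash-escapes special characters in a literal string.
escaped = re.findall(re.escape("3.5"), "3.5 and 345")
print(escaped)  # ['3.5']
```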
<pre>
|
<pre>
|
||||||
@ -357,20 +355,20 @@ character.
|
|||||||
|
|
||||||
In regular expressions, we use anchors to check if the matching symbol is the
|
In regular expressions, we use anchors to check if the matching symbol is the
|
||||||
starting symbol or ending symbol of the input string. Anchors are of two types:
|
starting symbol or ending symbol of the input string. Anchors are of two types:
|
||||||
First type is Caret `^` that check if the matching character is the start
|
The first type is the caret `^`, which checks if the matching character is the first
|
||||||
character of the input and the second type is Dollar `$` that checks if matching
|
character of the input and the second type is the dollar sign `$` which checks if a matching
|
||||||
character is the last character of the input string.
|
character is the last character of the input string.
|
||||||
|
|
||||||
### 2.8.1 Caret
|
### 2.8.1 The Caret
|
||||||
|
|
||||||
Caret `^` symbol is used to check if matching character is the first character
|
The caret symbol `^` is used to check if a matching character is the first character
|
||||||
of the input string. If we apply the following regular expression `^a` (if a is
|
of the input string. If we apply the following regular expression `^a` (meaning 'a' must be
|
||||||
the starting symbol) to input string `abc` it matches `a`. But if we apply
|
the starting character) to the string `abc`, it will match `a`. But if we apply
|
||||||
regular expression `^b` on above input string it does not match anything.
|
the regular expression `^b` to the above string, it will not match anything.
|
||||||
Because in input string `abc` "b" is not the starting symbol. Let's take a look
|
Because in the string `abc`, the "b" is not the starting character. Let's take a look
|
||||||
at another regular expression `^(T|t)he` which means: uppercase character `T` or
|
at another regular expression `^(T|t)he` which means: an uppercase `T` or
|
||||||
lowercase character `t` is the start symbol of the input string, followed by
|
a lowercase `t` must be the first character in the string, followed by a
|
||||||
lowercase character `h`, followed by lowercase character `e`.
|
lowercase `h`, followed by a lowercase `e`.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||||
@ -384,12 +382,12 @@ lowercase character `h`, followed by lowercase character `e`.
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/jXrKne/1)
|
[Test the regular expression](https://regex101.com/r/jXrKne/1)
|
||||||
|
|
||||||
### 2.8.2 Dollar
|
### 2.8.2 The Dollar Sign
|
||||||
|
|
||||||
Dollar `$` symbol is used to check if matching character is the last character
|
The dollar sign `$` is used to check if a matching character is the last character
|
||||||
of the input string. For example, regular expression `(at\.)$` means: a
|
in the string. For example, the regular expression `(at\.)$` means: a
|
||||||
lowercase character `a`, followed by lowercase character `t`, followed by a `.`
|
lowercase `a`, followed by a lowercase `t`, followed by a `.`
|
||||||
character and the matcher must be end of the string.
|
character and the matcher must be at the end of the string.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
|
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
|
||||||
@ -405,30 +403,29 @@ character and the matcher must be end of the string.
|
|||||||
|
|
||||||
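Both anchors from section 2.8 can be exercised with a short Python sketch. The sentences are the ones used in the examples above, and the variable names are illustrative.

```python
import re

# `^` ties the match to the start of the string, so the second "the"
# is ignored.
at_start = re.findall(r"^(?:T|t)he", "The car is parked in the garage.")
print(at_start)  # ['The']

# "b" is not the first character of "abc", so there is no match.
no_match = re.search(r"^b", "abc")
print(no_match)  # None

# `$` ties the match to the end of the string, so only the final "at."
# is reported.
at_end = [m.group(0) for m in re.finditer(r"(at\.)$", "The fat cat. sat. on the mat.")]
print(at_end)  # ['at.']
```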
## 3. Shorthand Character Sets
|
## 3. Shorthand Character Sets
|
||||||
|
|
||||||
Regular expression provides shorthands for the commonly used character sets,
|
There are a number of convenient shorthands for commonly used character sets/
|
||||||
which offer convenient shorthands for commonly used regular expressions. The
|
regular expressions:
|
||||||
shorthand character sets are as follows:
|
|
||||||
|
|
||||||
|Shorthand|Description|
|
|Shorthand|Description|
|
||||||
|:----:|----|
|
|:----:|----|
|
||||||
|.|Any character except new line|
|
|.|Any character except new line|
|
||||||
|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
|
|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
|
||||||
|\W|Matches non-alphanumeric characters: `[^\w]`|
|
|\W|Matches non-alphanumeric characters: `[^\w]`|
|
||||||
|\d|Matches digit: `[0-9]`|
|
|\d|Matches digits: `[0-9]`|
|
||||||
|\D|Matches non-digit: `[^\d]`|
|
|\D|Matches non-digits: `[^\d]`|
|
||||||
|\s|Matches whitespace character: `[\t\n\f\r\p{Z}]`|
|
|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
|
||||||
|\S|Matches non-whitespace character: `[^\s]`|
|
|\S|Matches non-whitespace characters: `[^\s]`|
|
||||||
|
|
||||||
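A few of the shorthands in the table can be tried in Python, as sketched below. Two caveats apply to Python specifically: for `str` patterns `\w`, `\d` and `\s` are Unicode-aware by default (pass `re.ASCII` to restrict them to the ASCII sets listed in the table), and the standard `re` module does not accept `\p{Z}`. The sample text is made up for illustration.

```python
import re

text = "Order 66 shipped\ton 2024-05-01"

# \d+ : runs of digits.
digits = re.findall(r"\d+", text)
print(digits)  # ['66', '2024', '05', '01']

# \w+ : runs of letters, digits and underscores (hyphens split the date).
words = re.findall(r"\w+", text)
print(words)  # ['Order', '66', 'shipped', 'on', '2024', '05', '01']

# \S+ : runs of non-whitespace characters; the tab counts as whitespace.
chunks = re.findall(r"\S+", text)
print(chunks)  # ['Order', '66', 'shipped', 'on', '2024-05-01']
```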
## 4. Lookaround
|
## 4. Lookarounds
|
||||||
|
|
||||||
Lookbehind and lookahead (also called lookaround) are specific types of
|
Lookbehinds and lookaheads (also called lookarounds) are specific types of
|
||||||
***non-capturing groups*** (used to match the pattern but not included in matching
|
***non-capturing groups*** (used to match a pattern but without including it in the matching
|
||||||
list). Lookarounds are used when we have the condition that this pattern is
|
list). Lookarounds are used when a pattern must be
|
||||||
preceded or followed by another certain pattern. For example, we want to get all
|
preceded or followed by another pattern. For example, imagine we want to get all
|
||||||
numbers that are preceded by `$` character from the following input string
|
numbers that are preceded by the `$` character from the string
|
||||||
`$4.44 and $10.88`. We will use following regular expression `(?<=\$)[0-9\.]*`
|
`$4.44 and $10.88`. We will use the following regular expression `(?<=\$)[0-9\.]*`
|
||||||
which means: get all the numbers which contain `.` character and are preceded
|
which means: get all the numbers which contain the `.` character and are preceded
|
||||||
by `$` character. Following are the lookarounds that are used in regular
|
by the `$` character. These are the lookarounds that are used in regular
|
||||||
expressions:
|
expressions:
|
||||||
|
|
||||||
|Symbol|Description|
|
|Symbol|Description|
|
||||||
@ -444,12 +441,12 @@ The positive lookahead asserts that the first part of the expression must be
|
|||||||
followed by the lookahead expression. The returned match only contains the text
|
followed by the lookahead expression. The returned match only contains the text
|
||||||
that is matched by the first part of the expression. To define a positive
|
that is matched by the first part of the expression. To define a positive
|
||||||
lookahead, parentheses are used. Within those parentheses, a question mark with
|
lookahead, parentheses are used. Within those parentheses, a question mark with
|
||||||
equal sign is used like this: `(?=...)`. Lookahead expression is written after
|
an equals sign is used like this: `(?=...)`. The lookahead expression is written after
|
||||||
the equal sign inside parentheses. For example, the regular expression
|
the equals sign inside parentheses. For example, the regular expression
|
||||||
`(T|t)he(?=\sfat)` means: optionally match lowercase letter `t` or uppercase
|
`(T|t)he(?=\sfat)` means: match either a lowercase `t` or an uppercase
|
||||||
letter `T`, followed by letter `h`, followed by letter `e`. In parentheses we
|
`T`, followed by the letter `h`, followed by the letter `e`. In parentheses we
|
||||||
define positive lookahead which tells regular expression engine to match `The`
|
define a positive lookahead which tells the regular expression engine to match `The`
|
||||||
or `the` which are followed by the word `fat`.
|
or `the` only if it's followed by the word `fat`.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||||
@ -459,13 +456,12 @@ or `the` which are followed by the word `fat`.
|
|||||||
|
|
||||||
### 4.2 Negative Lookahead
|
### 4.2 Negative Lookahead
|
||||||
|
|
||||||
Negative lookahead is used when we need to get all matches from input string
|
Negative lookaheads are used when we need to get all matches from an input string
|
||||||
that are not followed by a pattern. Negative lookahead is defined same as we define
|
that are not followed by a certain pattern. A negative lookahead is written the same way as a
|
||||||
positive lookahead but the only difference is instead of equal `=` character we
|
positive lookahead. The only difference is, instead of an equals sign `=`, we
|
||||||
use negation `!` character i.e. `(?!...)`. Let's take a look at the following
|
use an exclamation mark `!` to indicate negation i.e. `(?!...)`. Let's take a look at the following
|
||||||
regular expression `(T|t)he(?!\sfat)` which means: get all `The` or `the` words
|
regular expression `(T|t)he(?!\sfat)` which means: get all `The` or `the` words
|
||||||
from input string that are not followed by the word `fat` precedes by a space
|
from the input string that are not followed by a space character and the word `fat`.
|
||||||
character.
|
|
||||||
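The negative lookahead can be sketched the same way. This assumes Python's standard `re` module, which the README does not prescribe; the variable names are illustrative only.

```python
import re

text = "The fat cat sat on the mat."

# (?!\sfat) rejects any "The"/"the" that is directly followed by " fat".
pattern = re.compile(r"(T|t)he(?!\sfat)")

negative_lookahead = [(m.group(0), m.start()) for m in pattern.finditer(text)]
print(negative_lookahead)  # [('the', 19)]
```

Only the `the` before `mat` survives, because the leading `The` is followed by ` fat`.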
|
|
||||||
<pre>
|
<pre>
|
||||||
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
||||||
@ -475,10 +471,10 @@ character.
|
|||||||
|
|
||||||
### 4.3 Positive Lookbehind
|
### 4.3 Positive Lookbehind
|
||||||
|
|
||||||
Positive lookbehind is used to get all the matches that are preceded by a
|
Positive lookbehinds are used to get all the matches that are preceded by a
|
||||||
specific pattern. Positive lookbehind is denoted by `(?<=...)`. For example, the
|
specific pattern. Positive lookbehinds are written `(?<=...)`. For example, the
|
||||||
regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words
|
regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words
|
||||||
from input string that are after the word `The` or `the`.
|
from the input string that come after the word `The` or `the`.
|
||||||
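A minimal sketch of the positive lookbehind, assuming Python's standard `re` module (an assumption; other engines have different lookbehind rules). Python requires the lookbehind to have a fixed width, which `(T|t)he\s` satisfies since every alternative is four characters long.

```python
import re

text = "The fat cat sat on the mat."

# (?<=(T|t)he\s) requires "The " or "the " immediately before the match.
pattern = re.compile(r"(?<=(T|t)he\s)(fat|mat)")

lookbehind_matches = [m.group(0) for m in pattern.finditer(text)]
print(lookbehind_matches)  # ['fat', 'mat']
```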
|
|
||||||
<pre>
|
<pre>
|
||||||
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
|
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
|
||||||
@ -488,9 +484,9 @@ from input string that are after the word `The` or `the`.
|
|||||||
|
|
||||||
### 4.4 Negative Lookbehind
|
### 4.4 Negative Lookbehind
|
||||||
|
|
||||||
Negative lookbehind is used to get all the matches that are not preceded by a
|
Negative lookbehinds are used to get all the matches that are not preceded by a
|
||||||
specific pattern. Negative lookbehind is denoted by `(?<!...)`. For example, the
|
specific pattern. Negative lookbehinds are written `(?<!...)`. For example, the
|
||||||
regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from input
|
regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from the input
|
||||||
string that are not after the word `The` or `the`.
|
string that are not after the word `The` or `the`.
|
||||||
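The negative lookbehind can be tried out with a short sketch like the one below. It assumes Python's standard `re` module, and the sample sentence is chosen for illustration rather than taken from this diff.

```python
import re

# Hypothetical sample sentence for this sketch.
text = "The cat sat on cat."

# (?<!(T|t)he\s) rejects any "cat" that comes right after "The " or "the ".
pattern = re.compile(r"(?<!(T|t)he\s)(cat)")

negative_lookbehind = [m.start() for m in pattern.finditer(text)]
print(negative_lookbehind)  # [15]
```

The first `cat` (index 4) is preceded by `The ` and is skipped; only the second one matches.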
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -507,17 +503,17 @@ integral part of the RegExp.
|
|||||||
|
|
||||||
|Flag|Description|
|
|Flag|Description|
|
||||||
|:----:|----|
|
|:----:|----|
|
||||||
|i|Case insensitive: Sets matching to be case-insensitive.|
|
|i|Case insensitive: Match will be case-insensitive.|
|
||||||
|g|Global Search: Search for a pattern throughout the input string.|
|
|g|Global Search: Match all instances, not just the first.|
|
||||||
|m|Multiline: Anchor meta character works on each line.|
|
|m|Multiline: Anchor meta characters work on each line.|
|
||||||
|
|
||||||
### 5.1 Case Insensitive
|
### 5.1 Case Insensitive
|
||||||
|
|
||||||
The `i` modifier is used to perform case-insensitive matching. For example, the
|
The `i` modifier is used to perform case-insensitive matching. For example, the
|
||||||
regular expression `/The/gi` means: uppercase letter `T`, followed by lowercase
|
regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase
|
||||||
character `h`, followed by character `e`. And at the end of regular expression
|
`h`, followed by an `e`. At the end of the regular expression,
|
||||||
the `i` flag tells the regular expression engine to ignore the case. As you can
|
the `i` flag tells the regular expression engine to ignore the case. As you can
|
||||||
see we also provided `g` flag because we want to search for the pattern in the
|
see, we also provided the `g` flag because we want to search for the pattern in the
|
||||||
whole input string.
|
whole input string.
|
||||||
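The `/The/gi` notation is the slash-delimited style used by JavaScript and many regex testers. As a hedged sketch of the same idea in Python (an assumption, not something the README prescribes), the `i` flag corresponds to `re.IGNORECASE`, and `re.findall` already returns every match, so no separate `g` flag is needed.

```python
import re

text = "The fat cat sat on the mat."

# Case-sensitive: only the capitalised "The" matches.
case_sensitive = re.findall(r"The", text)

# Case-insensitive: re.IGNORECASE plays the role of the `i` flag.
case_insensitive = re.findall(r"The", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['The']
print(case_insensitive)  # ['The', 'the']
```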
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -532,13 +528,13 @@ whole input string.
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
|
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
|
||||||
|
|
||||||
### 5.2 Global search
|
### 5.2 Global Search
|
||||||
|
|
||||||
The `g` modifier is used to perform a global match (find all matches rather than
|
The `g` modifier is used to perform a global match (finds all matches rather than
|
||||||
stopping after the first match). For example, the regular expression`/.(at)/g`
|
stopping after the first match). For example, the regular expression `/.(at)/g`
|
||||||
means: any character except new line, followed by lowercase character `a`,
|
means: any character except a new line, followed by a lowercase `a`,
|
||||||
followed by lowercase character `t`. Because we provided `g` flag at the end of
|
followed by a lowercase `t`. Because we provided the `g` flag at the end of
|
||||||
the regular expression now it will find all matches in the input string, not just the first one (which is the default behavior).
|
the regular expression, it will now find all matches in the input string, not just the first one (which is the default behavior).
|
||||||
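Python's `re` module has no `g` flag, so the following sketch (Python being an assumption here) approximates the difference by choosing the function instead: `re.search` stops at the first match, while `re.finditer` walks through all of them.

```python
import re

text = "The fat cat sat on the mat."
pattern = re.compile(r".(at)")

# Without "global" behaviour: only the first match is returned.
first_only = pattern.search(text).group(0)

# With "global" behaviour: every non-overlapping match is returned.
all_matches = [m.group(0) for m in pattern.finditer(text)]

print(first_only)   # fat
print(all_matches)  # ['fat', 'cat', 'sat', 'mat']
```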
|
|
||||||
<pre>
|
<pre>
|
||||||
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
|
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
|
||||||
@ -554,12 +550,12 @@ the regular expression now it will find all matches in the input string, not jus
|
|||||||
|
|
||||||
### 5.3 Multiline
|
### 5.3 Multiline
|
||||||
|
|
||||||
The `m` modifier is used to perform a multi-line match. As we discussed earlier
|
The `m` modifier is used to perform a multi-line match. As we discussed earlier,
|
||||||
anchors `(^, $)` are used to check if pattern is the beginning of the input or
|
anchors `(^, $)` are used to check if a pattern is at the beginning of the input or
|
||||||
end of the input string. But if we want that anchors works on each line we use
|
the end. But if we want the anchors to work on each line, we use
|
||||||
`m` flag. For example, the regular expression `/at(.)?$/gm` means: lowercase
|
the `m` flag. For example, the regular expression `/at(.)?$/gm` means: a lowercase
|
||||||
character `a`, followed by lowercase character `t`, optionally anything except
|
`a`, followed by a lowercase `t` and, optionally, anything except
|
||||||
new line. And because of `m` flag now regular expression engine matches pattern
|
a new line. And because of the `m` flag, the regular expression engine now matches patterns
|
||||||
at the end of each line in a string.
|
at the end of each line in a string.
|
||||||
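To see the effect of the `m` flag, here is a minimal sketch assuming Python's standard `re` module, where the flag is spelled `re.MULTILINE`. The three-line sample string is an assumption made for this example.

```python
import re

# Hypothetical multi-line input for this sketch.
text = "The fat\ncat sat\non the mat."
pattern = r"at(.)?$"

# Without MULTILINE, $ only matches at the very end of the input.
single_line = [m.group(0) for m in re.finditer(pattern, text)]

# With MULTILINE, $ also matches at the end of every line.
multi_line = [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]

print(single_line)  # ['at.']
print(multi_line)   # ['at', 'at', 'at.']
```

In the first two lines the optional `(.)` matches nothing, because `.` does not match the newline that follows `at`.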
|
|
||||||
<pre>
|
<pre>
|
||||||
@ -578,9 +574,9 @@ at the end of each line in a string.
|
|||||||
|
|
||||||
[Test the regular expression](https://regex101.com/r/E88WE2/1)
|
[Test the regular expression](https://regex101.com/r/E88WE2/1)
|
||||||
|
|
||||||
## 6. Greedy vs lazy matching
|
## 6. Greedy vs Lazy Matching
|
||||||
By default regex will do greedy matching which means it will match as long as
|
By default, a regex will perform a greedy match, which means the match will be as long as
|
||||||
possible. We can use `?` to match in lazy way which means as short as possible.
|
possible. We can use `?` to match in a lazy way, which means the match should be as short as possible.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"/(.*at)/" => <a href="#learn-regex"><strong>The fat cat sat on the mat</strong></a>. </pre>
|
"/(.*at)/" => <a href="#learn-regex"><strong>The fat cat sat on the mat</strong></a>. </pre>
|
||||||
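The greedy and lazy variants can be compared side by side. The sketch below assumes Python's standard `re` module; the lazy pattern `(.*?at)` is the greedy one with `?` added after the quantifier.

```python
import re

text = "The fat cat sat on the mat."

# Greedy: .* grabs as much as possible, then backtracks to the last "at".
greedy = re.search(r"(.*at)", text).group(0)

# Lazy: .*? grabs as little as possible, stopping at the first "at".
lazy = re.search(r"(.*?at)", text).group(0)

print(greedy)  # The fat cat sat on the mat
print(lazy)    # The fat
```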
@ -597,7 +593,7 @@ possible. We can use `?` to match in lazy way which means as short as possible.
|
|||||||
|
|
||||||
## Contribution
|
## Contribution
|
||||||
|
|
||||||
* Open pull request with improvements
|
* Open a pull request with improvements
|
||||||
* Discuss ideas in issues
|
* Discuss ideas in issues
|
||||||
* Spread the word
|
* Spread the word
|
||||||
* Reach out with any feedback [](https://twitter.com/ziishaned)
|
* Reach out with any feedback [](https://twitter.com/ziishaned)
|
||||||
|