Mirror of https://github.com/ziishaned/learn-regex.git
Synced 2025-08-04 10:36:33 -04:00

Homogenized images (#69)

* RegExp image for French
* Some typos corrected
* Typographic rules (such as a space before `:`)
* "égual" → "égal"
* Images folder added, `img src` updated
* Paragraphs wrapped at 80 columns

Parent: 4b641cd11d
Commit: 9968a235b6

@@ -21,7 +21,7 @@ Imagine you are writing an application and you want to add rules for when

<br/><br/>
<p align="center">
-<img src="http://imgur.com/EtlKH14.png" alt="Regular expression">
+<img src="./img/regexp-es.png" alt="Expresión regular">
</p>

The regular expression above accepts the strings 'john_doe', 'jo-hn_doe' and
'john12_as'. It does not match the username 'Jo', because that string contains
uppercase letters and is too short.

@@ -24,7 +24,7 @@ the username to contain letters, numbers, underscores and hyphens
of characters in the username so that it does not look ugly. We use the
following regular expression to validate a username:
<br/><br/>
<p align="center">
-<img src="http://i.imgur.com/OGM7KV8.png" alt="Expression régulière">
+<img src="./img/regexp-fr.png" alt="Expressions régulières">
</p>

The regular expression above accepts the strings `john_doe`, `jo-hn_doe` and
`john12_as`. It does not work with `Jo` because

@@ -27,7 +27,7 @@

<br/><br/>
<p align="center">
-<img src="https://i.imgur.com/ekFpQUg.png" alt="Regular expression">
+<img src="./img/regexp-en.png" alt="Regular expression">
</p>

With this regular expression, strings such as `john_doe`, `jo-hn_doe` and
`john12_as` are accepted.
282
README.md

@@ -15,20 +15,26 @@

> A regular expression is a group of characters or symbols that is used to find a specific pattern in a text.

A regular expression is a pattern that is matched against a subject string from
left to right. The term "regular expression" is a mouthful, so you will usually
find it abbreviated as "regex" or "regexp". Regular expressions are used to
replace text within a string, validate forms, extract a substring from a string
based on a pattern match, and much more.

Imagine you are writing an application and you want to set the rules for when a
user chooses their username. We want to allow the username to contain letters,
numbers, underscores and hyphens. We also want to limit the number of characters
in the username so it does not look ugly. We use the following regular
expression to validate a username:

<br/><br/>
<p align="center">
-<img src="https://i.imgur.com/ekFpQUg.png" alt="Regular expression">
+<img src="./img/regexp-en.png" alt="Regular expression">
</p>

The regular expression above accepts the strings `john_doe`, `jo-hn_doe` and
`john12_as`. It does not match `Jo` because that string contains an uppercase
letter and is also too short.
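
The pattern appears only in the image above. This is a minimal sketch using Python's standard `re` module. It assumes the pattern is `^[a-z0-9_-]{3,15}$`, meaning lowercase letters, digits, underscores and hyphens, 3 to 15 characters long. `USERNAME_PATTERN` and `is_valid_username` are hypothetical names.

```python
import re

# Assumption: the exact pattern from the image is not reproduced in the text.
USERNAME_PATTERN = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True if the entire string satisfies the assumed username rules."""
    # fullmatch() requires the whole string to match; unlike search() with "$",
    # it also rejects a trailing newline.
    return USERNAME_PATTERN.fullmatch(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

Under this assumed pattern, the first three candidates print `True` and `Jo` prints `False`.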

## Table of Contents

@@ -61,8 +67,9 @@ contains uppercase letter and also it is too short.

## 1. Basic Matchers

A regular expression is just a pattern of characters that we use to perform a
search in a text. For example, the regular expression `the` means: the letter
`t`, followed by the letter `h`, followed by the letter `e`.
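
Here is a minimal sketch of this literal match, assuming Python's standard `re` module. `re.search()` scans the text from left to right and returns the first place where `t`, `h` and `e` occur in sequence.

```python
import re

text = "The fat cat sat on the mat."

# A pattern without meta characters matches the exact sequence "the".
match = re.search(r"the", text)
print(match.group(), match.start())  # the 19
```

The capitalized `The` at index 0 is not reported, because `T` and `t` are different characters.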

<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.

@@ -70,9 +77,11 @@ A regular expression is just a pattern of characters that we use to perform sear

[Test the regular expression](https://regex101.com/r/dmRygT/1)

The regular expression `123` matches the string `123`. The regular expression is
matched against an input string by comparing each character in the regular
expression to each character in the input string, one after another. Regular
expressions are normally case-sensitive, so the regular expression `The` would
not match the string `the`.
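
The same module can check the character-by-character comparison and case sensitivity. This sketch assumes Python, where `re.IGNORECASE` is the standard flag for case-insensitive matching. Other engines spell it differently; JavaScript, for example, uses the `i` flag.

```python
import re

# "123" matches only the characters 1, 2, 3 appearing consecutively.
digits = re.search(r"123", "abc123def")
print(digits.group())  # 123

# Matching is case-sensitive by default: "The" does not match "the".
strict = re.search(r"The", "the")
print(strict)  # None

# IGNORECASE makes uppercase and lowercase letters compare as equal.
relaxed = re.search(r"The", "the", re.IGNORECASE)
print(relaxed.group())  # the
```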

<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.

@@ -82,9 +91,10 @@ case-sensitive so the regular expression `The` would not match the string `the`.

## 2. Meta Characters

Meta characters are the building blocks of regular expressions. Meta characters
do not stand for themselves but are instead interpreted in some special way.
Some meta characters have a special meaning and are written inside square
brackets. The meta characters are as follows:

|Meta character|Description|
|:----:|----|

@@ -103,9 +113,10 @@ The meta characters are as follows:

## 2.1 Full stop

The full stop `.` is the simplest example of a meta character. The meta
character `.` matches any single character. By default it does not match a
newline character, and some engines (such as JavaScript's) also exclude the
carriage return. For example, the regular expression `.ar` means: any
character, followed by the letter `a`, followed by the letter `r`.
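
This short sketch shows the `.ar` example and the newline rule, assuming Python's `re` module. In Python, `re.DOTALL` is the flag that lets `.` match newlines as well.

```python
import re

text = "The car parked in the garage."

# "." stands for any single character, so ".ar" finds "car", "par" and "gar".
print(re.findall(r".ar", text))  # ['car', 'par', 'gar']

# By default "." does not match a newline...
print(re.findall(r"a.b", "a\nb"))  # []

# ...unless the DOTALL flag is set.
print(re.findall(r"a.b", "a\nb", re.DOTALL))  # ['a\nb']
```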

<pre>
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.

@@ -115,9 +126,11 @@ letter `r`.

## 2.2 Character set

Character sets are also called character classes. Square brackets are used to
specify character sets. Use a hyphen inside a character set to specify a range
of characters. The order of the characters listed inside the square brackets
doesn't matter. For example, the regular expression `[Tt]he` means: an
uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the
letter `e`.
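
This sketch uses Python's `re` module, which is an assumption; the bracket syntax is the same in most engines. It shows three things: a set matches exactly one character, the order inside the brackets is irrelevant, and a hyphen defines a range.

```python
import re

text = "The car parked in the garage."

# [Tt] matches exactly one character: either "T" or "t".
print(re.findall(r"[Tt]he", text))  # ['The', 'the']

# Order inside the brackets does not matter: [tT] is the same set.
print(re.findall(r"[tT]he", text))  # ['The', 'the']

# A hyphen between two characters defines a range, here any digit.
print(re.findall(r"[0-9]", "Room 42"))  # ['4', '2']
```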

<pre>
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.

@@ -125,7 +138,9 @@ expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the le

[Test the regular expression](https://regex101.com/r/2ITLQ4/1)

A period inside a character set, however, means a literal period. The regular
expression `ar[.]` means: a lowercase character `a`, followed by the letter `r`,
followed by a period `.` character.
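
This Python sketch (using the `re` module) contrasts the bracketed period with the bare wildcard on the same sentence.

```python
import re

text = "A garage is a good place to park a car."

# Inside a character set, "." is a literal period.
print(re.findall(r"ar[.]", text))  # ['ar.']

# Outside a set, "." is the wildcard, so it also matches "ara" and "ark".
print(re.findall(r"ar.", text))  # ['ara', 'ark', 'ar.']
```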

<pre>
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>

@@ -135,9 +150,10 @@ A period inside a character set, however, means a literal period. The regular ex

### 2.2.1 Negated character set

In general, the caret symbol represents the start of the string, but when it is
typed right after the opening square bracket, it negates the character set. For
example, the regular expression `[^c]ar` means: any character except `c`,
followed by the character `a`, followed by the letter `r`.
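
This sketch assumes Python's `re` module. It shows that a negated set still consumes exactly one character, and that character can be anything other than those listed, including a space.

```python
import re

text = "The car parked in the garage."

# [^c] matches any single character except "c", so "car" is skipped.
print(re.findall(r"[^c]ar", text))  # ['par', 'gar']

# The negated set still consumes one character, which may be a space.
print(re.findall(r"[^c]ar", " ar"))  # [' ar']
```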

<pre>
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.

@@ -147,14 +163,17 @@ the letter `r`.

## 2.3 Repetitions

The meta characters `+`, `*` and `?` are used to specify how many times a
subpattern can occur. Each of them allows a different number of repetitions, as
described below.

### 2.3.1 The Star

The symbol `*` matches zero or more repetitions of the preceding matcher. The
regular expression `a*` means: zero or more repetitions of the preceding
lowercase character `a`. If it appears after a character set or class, it
matches repetitions of the whole character set. For example, the regular
expression `[a-z]*` means: any number of lowercase letters in a row.
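
Here is a sketch in Python's `re` module. Because `*` also accepts zero repetitions, `re.finditer()` reports empty matches wherever no lowercase letter begins. This sketch filters those out to reproduce the highlighted words.

```python
import re

# "a*" accepts zero or more "a" characters, including none at all.
print(bool(re.fullmatch(r"a*", "")))      # True
print(bool(re.fullmatch(r"a*", "aaaa")))  # True

# After a character set, "*" repeats the whole set.
text = "The car parked in the garage #21."
words = [m.group() for m in re.finditer(r"[a-z]*", text) if m.group()]
print(words)  # ['he', 'car', 'parked', 'in', 'the', 'garage']
```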

<pre>
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.

@@ -162,10 +181,12 @@ character set. For example, the regular expression `[a-z]*` means: any number of

[Test the regular expression](https://regex101.com/r/7m8me5/1)

The `*` symbol can be used with the meta character `.` to match any string of
characters: `.*`. The `*` symbol can also be used with the whitespace character
class `\s` to match a string of whitespace characters. For example, the
expression `\s*cat\s*` means: zero or more whitespace characters, followed by
lowercase character `c`, followed by lowercase character `a`, followed by
lowercase character `t`, followed by zero or more whitespace characters.
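
This sketch uses Python's `re` module on the sentence from the example below. It shows that `\s*` absorbs surrounding whitespace when there is some, and also succeeds when there is none.

```python
import re

text = "The fat cat sat on the concatenation."

# \s* takes surrounding whitespace when present and matches nothing otherwise.
print(re.findall(r"\s*cat\s*", text))  # [' cat ', 'cat']

# ".*" matches any run of characters (except newlines) and is greedy.
print(re.search(r"fat.*mat", "The fat cat sat on the mat.").group())  # fat cat sat on the mat
```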

<pre>
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the <a href="#learn-regex">con<strong>cat</strong>enation</a>.

@@ -175,8 +196,10 @@ zero or more spaces.

### 2.3.2 The Plus

The symbol `+` matches one or more repetitions of the preceding character. For
example, the regular expression `c.+t` means: lowercase letter `c`, followed by
at least one character, followed by the lowercase character `t`. Note that the
`t` matched here is the last `t` in the sentence, because `+` is greedy and
matches as many characters as possible.
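
This sketch assumes Python's `re` module and shows the greedy behavior described above. For contrast, it also uses the lazy form `+?`, which the text does not cover; `+?` stops at the first possible `t`.

```python
import re

text = "The fat cat sat on the mat."

# ".+" is greedy: it runs to the end, then backtracks to the LAST "t".
print(re.search(r"c.+t", text).group())   # cat sat on the mat

# Adding "?" makes it lazy, so it stops at the FIRST possible "t".
print(re.search(r"c.+?t", text).group())  # cat

# "+" needs at least one character between "c" and "t".
print(re.search(r"c.+t", "ct"))  # None
```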

<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.

@@ -186,9 +209,11 @@ letter `c`, followed by at least one character, followed by the lowercase charac

### 2.3.3 The Question Mark

In regular expressions, the meta character `?` makes the preceding character
optional. This symbol matches zero or one instance of the preceding character.
For example, the regular expression `[T]?he` means: an optional uppercase
letter `T`, followed by the lowercase character `h`, followed by the lowercase
character `e`.
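
This sketch uses Python's `re` module to compare `[T]?he` with `[T]he` on the same sentence.

```python
import re

text = "The car is parked in the garage."

# [T]? makes the "T" optional: "The" matches in full, and "the" matches as "he".
print(re.findall(r"[T]?he", text))  # ['The', 'he']

# Without "?", the "T" is required.
print(re.findall(r"[T]he", text))   # ['The']
```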

<pre>
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.

@@ -204,9 +229,10 @@ character `h`, followed by the lowercase character `e`.

## 2.4 Braces

In regular expressions, braces, which are a kind of quantifier, are used to
specify the number of times that a character or a group of characters can be
repeated. For example, the regular expression `[0-9]{2,3}` means: match at least
2 but not more than 3 digits (characters in the range 0 to 9).
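
This is a minimal Python sketch with `re.findall()`. Because the quantifier is greedy, the run `9997` yields `999`, and the leftover `7` is too short to match on its own.

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

# {2,3}: at least 2 and at most 3 consecutive digits.
print(re.findall(r"[0-9]{2,3}", text))  # ['999', '10']
```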

<pre>
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.

@@ -214,8 +240,9 @@ characters in the range of 0 to 9).

[Test the regular expression](https://regex101.com/r/juM86s/1)

We can leave out the second number. For example, the regular expression
`[0-9]{2,}` means: match 2 or more digits. If we also remove the comma, the
regular expression `[0-9]{3}` means: match exactly 3 digits.
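
These are the open-ended and exact forms on the same sentence, again sketched with Python's `re` module.

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

print(re.findall(r"[0-9]{2,}", text))  # ['9997', '10']  (2 or more digits)
print(re.findall(r"[0-9]{3}", text))   # ['999']          (exactly 3 digits)
```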

<pre>
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.

@@ -231,10 +258,13 @@ the comma the regular expression `[0-9]{3}` means: Match exactly 3 digits.

## 2.5 Character Group

A character group is a group of sub-patterns that is written inside parentheses
`(...)`. As discussed before, if we put a quantifier after a character, it
repeats the preceding character. But if we put a quantifier after a character
group, it repeats the whole character group. For example, the regular
expression `(ab)*` matches zero or more repetitions of the characters "ab". We
can also use the alternation meta character `|` inside a character group. For
example, the regular expression `(c|g|p)ar` means: lowercase character `c`, `g`
or `p`, followed by character `a`, followed by character `r`.
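
This sketch uses Python's `re` module. Note one Python-specific behavior: `re.findall()` returns only the group's text when the pattern contains a group. For that reason, the full matches are collected with `re.finditer()`.

```python
import re

# A quantifier after a group repeats the whole group.
print(bool(re.fullmatch(r"(ab)*", "ababab")))  # True
# Without the group, "*" applies only to "b": "a" then zero or more "b".
print(bool(re.fullmatch(r"ab*", "ababab")))    # False

text = "The car is parked in the garage."
# Alternation inside a group: "c", "g" or "p", followed by "ar".
print([m.group(0) for m in re.finditer(r"(c|g|p)ar", text)])  # ['car', 'par', 'gar']
# findall() with a group returns just the captured letter.
print(re.findall(r"(c|g|p)ar", text))  # ['c', 'p', 'g']
```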

<pre>

@@ -245,11 +275,15 @@ We can also use the alternation `|` meta character inside character group. For e

## 2.6 Alternation

In regular expressions, the vertical bar `|` is used to define alternation.
Alternation acts like an OR between multiple expressions: the match succeeds if
any one of them matches. Now, you may be thinking that character sets and
alternation work the same way. The big difference is that a character set works
at the character level, while alternation works at the expression level. For
example, the regular expression `(T|t)he|car` means: either an uppercase
character `T` or lowercase `t`, followed by lowercase character `h`, followed by
lowercase character `e`; or lowercase character `c`, followed by lowercase
character `a`, followed by lowercase character `r`.
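
This sketch, assuming Python's `re` module, contrasts the two levels. Alternation chooses between whole expressions, while a character set chooses between single characters.

```python
import re

text = "The car is parked in the garage."

# Alternation works on whole expressions: "(T|t)he" OR "car".
print([m.group(0) for m in re.finditer(r"(T|t)he|car", text)])  # ['The', 'car', 'the']

# A character set works on single characters: [cr] matches one "c" or one "r".
print(re.findall(r"[cr]", "car"))  # ['c', 'r']
```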

<pre>
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.

@@ -259,12 +293,16 @@ or lowercase character `c`, followed by lowercase character `a`, followed by low

## 2.7 Escaping special character

A backslash `\` is used in regular expressions to escape the next character.
This allows us to match a symbol literally, including reserved characters such
as `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching
character, prepend `\` to it.

For example, the regular expression `.` is used to match any character except a
newline. To match a literal `.` in an input string, it must be escaped: the
regular expression `(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`,
followed by lowercase character `a`, followed by lowercase letter `t`, followed
by an optional `.` character.
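
This sketch assumes Python's `re` module. Besides escaping by hand, Python offers `re.escape()`, a standard helper that backslash-escapes the special characters in a literal string.

```python
import re

text = "The fat cat sat on the mat."

# "\." matches a literal period; "?" makes it optional.
print([m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)])  # ['fat', 'cat', 'mat.']

# re.escape() turns a literal string into a pattern that matches it exactly.
print(re.escape("3.14"))  # 3\.14
```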

<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>

@@ -274,18 +312,22 @@ expression `(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by l

## 2.8 Anchors

In regular expressions, we use anchors to check whether the matching symbol is
the starting symbol or the ending symbol of the input string. Anchors are of
two types: the caret `^`, which checks whether the matching character is the
first character of the input, and the dollar sign `$`, which checks whether the
matching character is the last character of the input string.

### 2.8.1 Caret

The caret `^` symbol is used to check whether the matching character is the
first character of the input string. If we apply the regular expression `^a`
(that is, `a` must be the starting symbol) to the input string `abc`, it matches
`a`. But if we apply the regular expression `^b` to the same input string, it
does not match anything, because in `abc` the letter "b" is not the starting
symbol. Let's take a look at another regular expression, `^(T|t)he`, which
means: uppercase character `T` or lowercase character `t` as the first
character of the input string, followed by lowercase character `h`, followed by
lowercase character `e`.
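
This sketch of the caret examples assumes Python's `re` module, where `^` matches only at the start of the input by default. The `re.MULTILINE` flag, not used here, would also let it match after each newline.

```python
import re

print(re.search(r"^a", "abc").group())  # a
print(re.search(r"^b", "abc"))          # None: "b" is not the first character

text = "The car is parked in the garage."
print([m.group(0) for m in re.finditer(r"(T|t)he", text)])   # ['The', 'the']
print([m.group(0) for m in re.finditer(r"^(T|t)he", text)])  # ['The']
```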

<pre>
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.

@@ -301,9 +343,10 @@ followed by lowercase character `h`, followed by lowercase character `e`.

### 2.8.2 Dollar

The dollar `$` symbol is used to check whether the matching character is the
last character of the input string. For example, the regular expression
`(at\.)$` means: a lowercase character `a`, followed by lowercase character `t`,
followed by a `.` character, and the match must be at the end of the string.
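
This sketch of the dollar anchor uses Python's `re` module. It also shows a Python-specific detail that the text does not mention: `$` can match just before a final newline, while `\Z` matches only at the true end of the input.

```python
import re

text = "The fat cat. sat. on the mat."

# Without the anchor, every "at." matches.
print(re.findall(r"at\.", text))  # ['at.', 'at.', 'at.']

# With "$", only the "at." at the very end of the input matches.
m = re.search(r"(at\.)$", text)
print(m.group(0), m.end() == len(text))  # at. True

# "$" also accepts a trailing newline; "\Z" does not.
print(re.search(r"at\.$", "mat.\n") is not None)   # True
print(re.search(r"at\.\Z", "mat.\n") is not None)  # False
```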

<pre>
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>

@@ -319,8 +362,9 @@ must be end of the string.

## 3. Shorthand Character Sets

Regular expressions provide shorthands for commonly used character sets, which
make common patterns more convenient to write. The shorthand character sets are
as follows:

|Shorthand|Description|
|:----:|----|

@@ -334,11 +378,15 @@ regular expressions. The shorthand character sets are as follows:

## 4. Lookaround

Lookbehinds and lookaheads, sometimes collectively known as lookarounds, are a
specific type of ***non-capturing group***: they check for a pattern, but that
pattern is not included in the match. Lookarounds are used when a pattern must
be preceded or followed by another specific pattern. For example, suppose we
want to get all numbers that are preceded by the `$` character in the input
string `$4.44 and $10.88`. We will use the regular expression
`(?<=\$)[0-9\.]*`, which means: get every run of digits and `.` characters that
is preceded by the `$` character. The following lookarounds are used in regular
expressions:

|Symbol|Description|
|:----:|----|

@@ -349,12 +397,16 @@ by `$` character. Following are the lookarounds that are used in regular express

### 4.1 Positive Lookahead

The positive lookahead asserts that the first part of the expression must be
followed by the lookahead expression. The returned match only contains the text
that is matched by the first part of the expression. To define a positive
lookahead, parentheses are used. Within those parentheses, a question mark with
an equals sign is used like this: `(?=...)`. The lookahead expression is written
after the equals sign inside the parentheses. For example, the regular
expression `[T|t]he(?=\sfat)` means: uppercase letter `T` or lowercase letter
`t`, followed by the letter `h`, followed by the letter `e`. Inside square
brackets `|` is a literal character, so `[T|t]` also matches `|`; `[Tt]` is the
stricter form. In parentheses, we define the positive lookahead, which tells the
regular expression engine to match `The` or `the` only when it is followed by a
whitespace character and the word `fat`.
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||||
@ -364,10 +416,13 @@ followed by letter `h`, followed by letter `e`. In parentheses we define positiv
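
A small sketch of the `[T|t]he(?=\sfat)` example using Python's `re` module
(an assumption; other engines with lookahead support behave the same way for
this input). Printing the span of the first match shows that the lookahead
text is checked but not consumed.

```python
import re

text = "The fat cat sat on the mat."
pattern = r"[T|t]he(?=\sfat)"

# Only "The" qualifies: it is followed by " fat", while "the" is followed by " mat".
matches = [m.group(0) for m in re.finditer(pattern, text)]
print(matches)  # ['The']

# The match ends right before the lookahead text " fat".
first = re.search(pattern, text)
print(first.span())  # (0, 3)
```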

### 4.2 Negative Lookahead

Negative lookahead is used when we need to get all matches from the input string
that are not followed by a certain pattern. A negative lookahead is defined the
same way as a positive lookahead; the only difference is that instead of the
equal sign `=` we use the negation character `!`, i.e. `(?!...)`. Let's take a
look at the following regular expression `[T|t]he(?!\sfat)`, which means: get all
`The` or `the` words from the input string that are not followed by a space
character and the word `fat`.

<pre>
"[T|t]he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
@ -377,9 +432,10 @@ input string that are not followed by the word `fat` precedes by a space charact
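
The negative lookahead example can be checked the same way, again assuming
Python's `re` module. Recording the start offset makes it clear which of the two
candidate words was kept.

```python
import re

text = "The fat cat sat on the mat."

# (?!\sfat) succeeds only where a space followed by "fat" does NOT come next.
matches = [(m.start(), m.group(0)) for m in re.finditer(r"[T|t]he(?!\sfat)", text)]
print(matches)  # [(19, 'the')]
```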

### 4.3 Positive Lookbehind

Positive lookbehind is used to get all the matches that are preceded by a
specific pattern. Positive lookbehind is denoted by `(?<=...)`. For example, the
regular expression `(?<=[T|t]he\s)(fat|mat)` means: get all `fat` or `mat` words
from the input string that come right after the word `The` or `the` and a space.

<pre>
"(?<=[T|t]he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
@ -389,9 +445,10 @@ are after the word `The` or `the`.
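
A minimal Python sketch of the positive lookbehind example, assuming the
standard `re` module. Because the pattern contains a capturing group,
`re.finditer` with `group(0)` is used to collect the whole matches; `re.findall`
would return the group contents instead.

```python
import re

text = "The fat cat sat on the mat."

# (?<=[T|t]he\s) looks back for "The " or "the "; Python only accepts
# fixed-width lookbehinds, and this one is always 4 characters long.
matches = [m.group(0) for m in re.finditer(r"(?<=[T|t]he\s)(fat|mat)", text)]
print(matches)  # ['fat', 'mat']
```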

### 4.4 Negative Lookbehind

Negative lookbehind is used to get all the matches that are not preceded by a
specific pattern. Negative lookbehind is denoted by `(?<!...)`. For example, the
regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from the
input string that do not come right after the word `The` or `the` and a space.

<pre>
"(?<!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
@ -401,8 +458,9 @@ are not after the word `The` or `the`.
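
The negative lookbehind `(?<!(T|t)he\s)(cat)` can be sketched in Python as
well, assuming the standard `re` module and the README's sample sentence
`The cat sat on cat.`. The alternation `(T|t)` keeps the lookbehind at a fixed
width of 4 characters, which Python requires.

```python
import re

text = "The cat sat on cat."

# (?<!(T|t)he\s) rejects any "cat" that directly follows "The " or "the ".
pattern = r"(?<!(T|t)he\s)(cat)"
matches = [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]
print(matches)  # [(15, 'cat')]
```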

## 5. Flags

Flags are also called modifiers because they modify the output of a regular
expression. These flags can be used in any order or combination, and are an
integral part of the RegExp.

|Flag|Description|
|:----:|----|
@ -412,10 +470,12 @@ combination, and are an integral part of the RegExp.

### 5.1 Case Insensitive

The `i` modifier is used to perform case-insensitive matching. For example, the
regular expression `/The/gi` means: uppercase letter `T`, followed by lowercase
character `h`, followed by character `e`. The `i` flag at the end of the regular
expression tells the regular expression engine to ignore case. As you can see,
we also provided the `g` flag because we want to search for the pattern in the
whole input string.

<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
@ -431,10 +491,11 @@ the whole input string.
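
In Python's `re` module (used here as an assumption, since the README writes
flags in `/.../gi` notation), the `i` flag corresponds to `re.IGNORECASE`.
Python has no `g` flag: `re.findall` already returns every non-overlapping
match.

```python
import re

text = "The fat cat sat on the mat."

# Without a flag, "The" only matches the capitalised word.
case_sensitive = re.findall(r"The", text)
print(case_sensitive)  # ['The']

# re.IGNORECASE plays the role of the `i` flag.
case_insensitive = re.findall(r"The", text, flags=re.IGNORECASE)
print(case_insensitive)  # ['The', 'the']
```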

### 5.2 Global search

The `g` modifier is used to perform a global match (find all matches rather than
stopping after the first match). For example, the regular expression `/.(at)/g`
means: any character except a new line, followed by lowercase character `a`,
followed by lowercase character `t`. Because we provided the `g` flag at the end
of the regular expression, it will now find every match in the whole input
string.

<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
@ -450,10 +511,13 @@ string.
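
A rough Python equivalent of matching with and without the `g` flag, assuming
the standard `re` module: `re.search` stops at the first match, while
`re.finditer` walks the whole string.

```python
import re

text = "The fat cat sat on the mat."
pattern = r".(at)"

# Like /.(at)/ without `g`: only the first match is returned.
first = re.search(pattern, text)
print(first.group(0))  # fat

# Like /.(at)/g: every match in the input string.
every = [m.group(0) for m in re.finditer(pattern, text)]
print(every)  # ['fat', 'cat', 'sat', 'mat']

# re.findall returns only the captured group when the pattern has one.
groups_only = re.findall(pattern, text)
print(groups_only)  # ['at', 'at', 'at', 'at']
```

The last call is a common surprise when porting patterns with groups to Python,
which is why the sketch collects whole matches through `finditer`.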

### 5.3 Multiline

The `m` modifier is used to perform a multi-line match. As we discussed earlier,
the anchors `^` and `$` are used to check whether a pattern is at the beginning
or at the end of the input string. If we want the anchors to work on each line
instead, we use the `m` flag. For example, the regular expression `/.at(.)?$/gm`
means: any character except a new line, followed by lowercase character `a`,
followed by lowercase character `t`, optionally followed by one more character
other than a new line, at the end of a line. Because of the `m` flag, the regular
expression engine now matches the pattern at the end of each line in a string.
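
The effect of the `m` flag on the `.at(.)?$` pattern can be sketched with
Python's `re` module, where `re.MULTILINE` corresponds to `m`. Both the choice
of Python and the three-line input string are assumptions made for
illustration.

```python
import re

text = "The fat\ncat sat\non the mat."
pattern = r".at(.)?$"

# Without re.MULTILINE, $ only matches at the very end of the string.
single = [m.group(0) for m in re.finditer(pattern, text)]
print(single)  # ['mat.']

# With re.MULTILINE, $ also matches right before each "\n".
per_line = [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]
print(per_line)  # ['fat', 'sat', 'mat.']
```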

<pre>
"/.at(.)?$/" => The fat
BIN img/img_original.png (Normal file, binary, 6.5 KiB)
BIN img/regexp-en.png (Normal file, binary, 31 KiB)
BIN img/regexp-es.png (Normal file, binary, 33 KiB)
BIN img/regexp-fr.png (Normal file, binary, 32 KiB)
397 img/regexp.svg (Normal file, 35 KiB; file diff suppressed because one or more lines are too long)