mirror of
https://github.com/ziishaned/learn-regex.git
synced 2025-12-03 16:20:56 -05:00
commit
ed4ce1d6a4
285
README.md
@ -30,65 +30,67 @@
|
||||
* [Polish](translations/README-pl.md)
|
||||
* [Русский](translations/README-ru.md)
|
||||
* [Tiếng Việt](translations/README-vn.md)
|
||||
* [فارسی](translations/README-fa.md)
|
||||
|
||||
## What is Regular Expression?
|
||||
|
||||
> Regular expression is a group of characters or symbols which is used to find a specific pattern from a text.
|
||||
> A regular expression is a group of characters or symbols which is used to find a specific pattern in a text.
|
||||
|
||||
A regular expression is a pattern that is matched against a subject string from
|
||||
left to right. Regular expression is used for replacing a text within a string,
|
||||
validating form, extract a substring from a string based upon a pattern match,
|
||||
and so much more. The word "Regular expression" is a mouthful, so you will usually
|
||||
find the term abbreviated as "regex" or "regexp".
|
||||
left to right. Regular expressions are used to replace text within a string,
|
||||
validating forms, extracting a substring from a string based on a pattern match,
|
||||
and so much more. The term "regular expression" is a mouthful, so you will usually
|
||||
find the term abbreviated to "regex" or "regexp".
|
||||
|
||||
Imagine you are writing an application and you want to set the rules for when a
|
||||
user chooses their username. We want to allow the username to contain letters,
|
||||
numbers, underscores and hyphens. We also want to limit the number of characters
|
||||
in username so it does not look ugly. We use the following regular expression to
|
||||
validate a username:
|
||||
in the username so it does not look ugly. We can use the following regular expression to
|
||||
validate the username:
|
||||
|
||||
<br/><br/>
|
||||
<p align="center">
|
||||
<img src="./img/regexp-en.png" alt="Regular expression">
|
||||
</p>
|
||||
|
||||
Above regular expression can accept the strings `john_doe`, `jo-hn_doe` and
|
||||
`john12_as`. It does not match `Jo` because that string contains uppercase
|
||||
The regular expression above can accept the strings `john_doe`, `jo-hn_doe` and
|
||||
`john12_as`. It does not match `Jo` because that string contains an uppercase
|
||||
letter and also it is too short.
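The pattern itself is only shown as an image, so the sketch below has to guess at it. It assumes a rule of lowercase letters, digits, underscores and hyphens with a length of 3 to 15 characters; the exact limits and the helper name `is_valid_username` are assumptions made for illustration, with Python standing in as the host language.

```python
import re

# Assumed pattern: the original shows it only as an image, so the character
# set and the 3-15 length limits here are a guess that fits the description.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True when the whole string satisfies the assumed username rule."""
    return USERNAME_RE.fullmatch(name) is not None
```

Under these assumptions `john_doe`, `jo-hn_doe` and `john12_as` are accepted, while `Jo` is rejected both for its uppercase letter and for its length.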
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Basic Matchers](#1-basic-matchers)
|
||||
- [Meta character](#2-meta-characters)
|
||||
- [Full stop](#21-full-stop)
|
||||
- [Character set](#22-character-set)
|
||||
- [Negated character set](#221-negated-character-set)
|
||||
- [Meta Characters](#2-meta-characters)
|
||||
- [The Full Stop](#21-the-full-stop)
|
||||
- [Character Sets](#22-character-sets)
|
||||
- [Negated Character Sets](#221-negated-character-sets)
|
||||
- [Repetitions](#23-repetitions)
|
||||
- [The Star](#231-the-star)
|
||||
- [The Plus](#232-the-plus)
|
||||
- [The Question Mark](#233-the-question-mark)
|
||||
- [Braces](#24-braces)
|
||||
- [Character Group](#25-character-group)
|
||||
- [Capturing Groups](#25-capturing-groups)
|
||||
- [Non-Capturing Groups](#251-non-capturing-groups)
|
||||
- [Alternation](#26-alternation)
|
||||
- [Escaping special character](#27-escaping-special-character)
|
||||
- [Escaping Special Characters](#27-escaping-special-characters)
|
||||
- [Anchors](#28-anchors)
|
||||
- [Caret](#281-caret)
|
||||
- [Dollar](#282-dollar)
|
||||
- [The Caret](#281-the-caret)
|
||||
- [The Dollar Sign](#282-the-dollar-sign)
|
||||
- [Shorthand Character Sets](#3-shorthand-character-sets)
|
||||
- [Lookaround](#4-lookaround)
|
||||
- [Lookarounds](#4-lookarounds)
|
||||
- [Positive Lookahead](#41-positive-lookahead)
|
||||
- [Negative Lookahead](#42-negative-lookahead)
|
||||
- [Positive Lookbehind](#43-positive-lookbehind)
|
||||
- [Negative Lookbehind](#44-negative-lookbehind)
|
||||
- [Flags](#5-flags)
|
||||
- [Case Insensitive](#51-case-insensitive)
|
||||
- [Global search](#52-global-search)
|
||||
- [Global Search](#52-global-search)
|
||||
- [Multiline](#53-multiline)
|
||||
- [Greedy vs lazy matching](#6-greedy-vs-lazy-matching)
|
||||
- [Greedy vs Lazy Matching](#6-greedy-vs-lazy-matching)
|
||||
|
||||
## 1. Basic Matchers
|
||||
|
||||
A regular expression is just a pattern of characters that we use to perform
|
||||
A regular expression is just a pattern of characters that we use to perform a
|
||||
search in a text. For example, the regular expression `the` means: the letter
|
||||
`t`, followed by the letter `h`, followed by the letter `e`.
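As a small sketch of a literal matcher in practice (the sample sentence and variable names are ours, and Python's `re` module is just one possible host), the pattern `the` finds only the lowercase occurrence because matching is case-sensitive by default:

```python
import re

text = "The fat cat sat on the mat."

# A plain pattern matches its characters literally and case-sensitively,
# so the capitalised "The" at the start of the sentence is skipped.
positions = [m.start() for m in re.finditer(r"the", text)]
```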
|
||||
|
||||
@ -112,7 +114,7 @@ not match the string `the`.
|
||||
|
||||
## 2. Meta Characters
|
||||
|
||||
Meta characters are the building blocks of the regular expressions. Meta
|
||||
Meta characters are the building blocks of regular expressions. Meta
|
||||
characters do not stand for themselves but instead are interpreted in some
|
||||
special way. Some meta characters have a special meaning and are written inside
|
||||
square brackets. The meta characters are as follows:
|
||||
@ -132,9 +134,9 @@ square brackets. The meta characters are as follows:
|
||||
|^|Matches the beginning of the input.|
|
||||
|$|Matches the end of the input.|
|
||||
|
||||
## 2.1 Full stop
|
||||
## 2.1 The Full Stop
|
||||
|
||||
Full stop `.` is the simplest example of meta character. The meta character `.`
|
||||
The full stop `.` is the simplest example of a meta character. The meta character `.`
|
||||
matches any single character. It will not match return or newline characters.
|
||||
For example, the regular expression `.ar` means: any character, followed by the
|
||||
letter `a`, followed by the letter `r`.
|
||||
@ -145,11 +147,11 @@ letter `a`, followed by the letter `r`.
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
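The same idea can be tried outside regex101. This is a minimal sketch in Python, with a sample sentence of our own choosing:

```python
import re

text = "The car is parked in the garage."

# "." stands for any single character except a newline,
# so ".ar" matches any character followed by "ar".
dot_matches = re.findall(r".ar", text)

# The dot does not match a newline by default.
newline_match = re.search(r".ar", "\nar")
```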
|
||||
|
||||
## 2.2 Character set
|
||||
## 2.2 Character Sets
|
||||
|
||||
Character sets are also called character class. Square brackets are used to
|
||||
Character sets are also called character classes. Square brackets are used to
|
||||
specify character sets. Use a hyphen inside a character set to specify the
|
||||
characters' range. The order of the character range inside square brackets
|
||||
characters' range. The order of the character range inside the square brackets
|
||||
doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
|
||||
`T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
|
||||
|
||||
@ -160,7 +162,7 @@ doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
|
||||
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
|
||||
|
||||
A period inside a character set, however, means a literal period. The regular
|
||||
expression `ar[.]` means: a lowercase character `a`, followed by letter `r`,
|
||||
expression `ar[.]` means: a lowercase character `a`, followed by the letter `r`,
|
||||
followed by a period `.` character.
|
||||
|
||||
<pre>
|
||||
@ -169,7 +171,7 @@ followed by a period `.` character.
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
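A short Python sketch of both character-set behaviours described above; the sample sentences are illustrative only:

```python
import re

# [Tt] accepts either an uppercase T or a lowercase t in that position.
articles = re.findall(r"[Tt]he", "The car is parked in the garage.")

# Inside a character set the period loses its special meaning,
# so "ar[.]" only matches "ar" followed by a literal period.
literal_dot = re.findall(r"ar[.]", "A garage is a good place to park a car.")
```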
|
||||
|
||||
### 2.2.1 Negated character set
|
||||
### 2.2.1 Negated Character Sets
|
||||
|
||||
In general, the caret symbol represents the start of the string, but when it is
|
||||
typed after the opening square bracket it negates the character set. For
|
||||
@ -184,14 +186,14 @@ followed by the character `a`, followed by the letter `r`.
|
||||
|
||||
## 2.3 Repetitions
|
||||
|
||||
Following meta characters `+`, `*` or `?` are used to specify how many times a
|
||||
The meta characters `+`, `*` or `?` are used to specify how many times a
|
||||
subpattern can occur. These meta characters act differently in different
|
||||
situations.
|
||||
|
||||
### 2.3.1 The Star
|
||||
|
||||
The symbol `*` matches zero or more repetitions of the preceding matcher. The
|
||||
regular expression `a*` means: zero or more repetitions of preceding lowercase
|
||||
The `*` symbol matches zero or more repetitions of the preceding matcher. The
|
||||
regular expression `a*` means: zero or more repetitions of the preceding lowercase
|
||||
character `a`. But if it appears after a character set or class then it finds
|
||||
the repetitions of the whole character set. For example, the regular expression
|
||||
`[a-z]*` means: any number of lowercase letters in a row.
|
||||
@ -205,8 +207,8 @@ the repetitions of the whole character set. For example, the regular expression
|
||||
The `*` symbol can be used with the meta character `.` to match any string of
|
||||
characters `.*`. The `*` symbol can be used with the whitespace character `\s`
|
||||
to match a string of whitespace characters. For example, the expression
|
||||
`\s*cat\s*` means: zero or more spaces, followed by lowercase character `c`,
|
||||
followed by lowercase character `a`, followed by lowercase character `t`,
|
||||
`\s*cat\s*` means: zero or more spaces, followed by a lowercase `c`,
|
||||
followed by a lowercase `a`, followed by a lowercase `t`,
|
||||
followed by zero or more spaces.
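To see "zero or more" in action, here is a hedged Python sketch; the sample sentence is made up for the purpose:

```python
import re

# Zero repetitions are allowed, so "[a-z]*" even matches the empty string.
empty_ok = re.fullmatch(r"[a-z]*", "") is not None
letters_ok = re.fullmatch(r"[a-z]*", "regex") is not None

# "\s*cat\s*" takes along any whitespace around "cat", if there is some.
cats = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")
```

The second match comes from inside `concatenation`, where there is no surrounding whitespace to consume.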
|
||||
|
||||
<pre>
|
||||
@ -217,10 +219,10 @@ followed by zero or more spaces.
|
||||
|
||||
### 2.3.2 The Plus
|
||||
|
||||
The symbol `+` matches one or more repetitions of the preceding character. For
|
||||
example, the regular expression `c.+t` means: lowercase letter `c`, followed by
|
||||
at least one character, followed by the lowercase character `t`. It needs to be
|
||||
clarified that `t` is the last `t` in the sentence.
|
||||
The `+` symbol matches one or more repetitions of the preceding character. For
|
||||
example, the regular expression `c.+t` means: a lowercase `c`, followed by
|
||||
at least one character, followed by a lowercase `t`. It needs to be
|
||||
clarified that `t` is the last `t` in the sentence.
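A quick way to check that `.+` really runs to the last `t` is to try it in a host language. A minimal Python sketch:

```python
import re

text = "The fat cat sat on the mat."

# ".+" is greedy: it grabs as much as it can and then backs off
# only as far as needed to find a final "t".
plus_match = re.search(r"c.+t", text).group(0)
```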
|
||||
|
||||
<pre>
|
||||
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
|
||||
@ -230,11 +232,10 @@ clarified that `t` is the last `t` in the sentence.
|
||||
|
||||
### 2.3.3 The Question Mark
|
||||
|
||||
In regular expression the meta character `?` makes the preceding character
|
||||
In regular expressions, the meta character `?` makes the preceding character
|
||||
optional. This symbol matches zero or one instance of the preceding character.
|
||||
For example, the regular expression `[T]?he` means: Optional the uppercase
|
||||
letter `T`, followed by the lowercase character `h`, followed by the lowercase
|
||||
character `e`.
|
||||
For example, the regular expression `[T]?he` means: an optional uppercase
|
||||
`T`, followed by a lowercase `h`, followed by a lowercase `e`.
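The effect of the optional `T` can be sketched in Python (variable names are ours):

```python
import re

text = "The car is parked in the garage."

# Without "?" the uppercase T is required.
required_t = re.findall(r"[T]he", text)

# With "?" the T may be absent, so the "he" inside "the" matches as well.
optional_t = re.findall(r"[T]?he", text)
```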
|
||||
|
||||
<pre>
|
||||
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
|
||||
@ -250,10 +251,10 @@ character `e`.
|
||||
|
||||
## 2.4 Braces
|
||||
|
||||
In regular expression braces that are also called quantifiers are used to
|
||||
In regular expressions, braces (also called quantifiers) are used to
|
||||
specify the number of times that a character or a group of characters can be
|
||||
repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
|
||||
2 digits but not more than 3 ( characters in the range of 0 to 9).
|
||||
2 digits, but not more than 3, ranging from 0 to 9.
|
||||
|
||||
<pre>
|
||||
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
|
||||
@ -262,7 +263,7 @@ repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
|
||||
[Test the regular expression](https://regex101.com/r/juM86s/1)
|
||||
|
||||
We can leave out the second number. For example, the regular expression
|
||||
`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma the
|
||||
`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma, the
|
||||
regular expression `[0-9]{3}` means: Match exactly 3 digits.
|
||||
|
||||
<pre>
|
||||
@ -277,16 +278,16 @@ regular expression `[0-9]{3}` means: Match exactly 3 digits.
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/Sivu30/1)
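The three brace forms can be compared side by side. This Python sketch reuses the sentence from the example above:

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

two_to_three = re.findall(r"[0-9]{2,3}", text)  # at least 2, at most 3 digits
two_or_more = re.findall(r"[0-9]{2,}", text)    # 2 or more digits
exactly_three = re.findall(r"[0-9]{3}", text)   # exactly 3 digits
```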
|
||||
|
||||
## 2.5 Capturing Group
|
||||
## 2.5 Capturing Groups
|
||||
|
||||
A capturing group is a group of sub-patterns that is written inside Parentheses
|
||||
`(...)`. Like as we discussed before that in regular expression if we put a quantifier
|
||||
after a character then it will repeat the preceding character. But if we put quantifier
|
||||
A capturing group is a group of subpatterns that is written inside parentheses
|
||||
`(...)`. As discussed before, in regular expressions, if we put a quantifier
|
||||
after a character then it will repeat the preceding character. But if we put a quantifier
|
||||
after a capturing group then it repeats the whole capturing group. For example,
|
||||
the regular expression `(ab)*` matches zero or more repetitions of the character
|
||||
"ab". We can also use the alternation `|` meta character inside capturing group.
|
||||
For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
|
||||
`g` or `p`, followed by character `a`, followed by character `r`.
|
||||
"ab". We can also use the alternation `|` meta character inside a capturing group.
|
||||
For example, the regular expression `(c|g|p)ar` means: a lowercase `c`,
|
||||
`g` or `p`, followed by `a`, followed by `r`.
|
||||
|
||||
<pre>
|
||||
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
@ -294,15 +295,15 @@ For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
|
||||
|
||||
Note that capturing groups do not only match but also capture the characters for use in
|
||||
the parent language. The parent language could be python or javascript or virtually any
|
||||
Note that capturing groups do not only match, but also capture, the characters for use in
|
||||
the parent language. The parent language could be Python or JavaScript or virtually any
|
||||
language that implements regular expressions in a function definition.
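As an illustration of what "capturing for use in the parent language" can look like, here is a sketch that takes Python as the parent language (our choice, not the only option):

```python
import re

text = "The car is parked in the garage."

m = re.search(r"(c|g|p)ar", text)
whole_match = m.group(0)   # the full text that matched
first_letter = m.group(1)  # only what the first capturing group matched

# In Python, findall returns the group contents when the pattern has one group.
captured = re.findall(r"(c|g|p)ar", text)
```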
|
||||
|
||||
### 2.5.1 Non-capturing group
|
||||
### 2.5.1 Non-Capturing Groups
|
||||
|
||||
A non-capturing group is a capturing group that only matches the characters, but
|
||||
A non-capturing group is a group that matches the characters but
|
||||
does not capture the group. A non-capturing group is denoted by a `?` followed by a `:`
|
||||
within parenthesis `(...)`. For example, the regular expression `(?:c|g|p)ar` is similar to
|
||||
within parentheses `(...)`. For example, the regular expression `(?:c|g|p)ar` is similar to
|
||||
`(c|g|p)ar` in that it matches the same characters but will not create a capture group.
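The difference is easy to observe in Python, where a compiled pattern reports how many capturing groups it has. A minimal sketch:

```python
import re

text = "The car is parked in the garage."

capturing = re.compile(r"(c|g|p)ar")
non_capturing = re.compile(r"(?:c|g|p)ar")

# Both patterns match the same text, but only the first one creates a group.
group_counts = (capturing.groups, non_capturing.groups)
full_matches = non_capturing.findall(text)
```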
|
||||
|
||||
<pre>
|
||||
@ -319,13 +320,13 @@ See also [4. Lookaround](#4-lookaround).
|
||||
|
||||
In a regular expression, the vertical bar `|` is used to define alternation.
|
||||
Alternation is like an OR statement between multiple expressions. Now, you may be
|
||||
thinking that character set and alternation works the same way. But the big
|
||||
difference between character set and alternation is that character set works on
|
||||
character level but alternation works on expression level. For example, the
|
||||
regular expression `(T|t)he|car` means: either (uppercase character `T` or lowercase
|
||||
`t`, followed by lowercase character `h`, followed by lowercase character `e`) OR
|
||||
(lowercase character `c`, followed by lowercase character `a`, followed by
|
||||
lowercase character `r`). Note that I put the parentheses for clarity, to show that either expression
|
||||
thinking that character sets and alternation work the same way. But the big
|
||||
difference between character sets and alternation is that character sets work at the
|
||||
character level but alternation works at the expression level. For example, the
|
||||
regular expression `(T|t)he|car` means: either (an uppercase `T` or a lowercase
|
||||
`t`, followed by a lowercase `h`, followed by a lowercase `e`) OR
|
||||
(a lowercase `c`, followed by a lowercase `a`, followed by
|
||||
a lowercase `r`). Note that I included the parentheses for clarity, to show that either expression
|
||||
in parentheses can be met and it will match.
|
||||
|
||||
<pre>
|
||||
@ -334,17 +335,15 @@ in parentheses can be met and it will match.
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
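A small Python sketch of alternation at the expression level, using a sample sentence of our own:

```python
import re

text = "The car is parked in the garage."

# Either "(T|t)he" or "car" may match at each position.
alternatives = [m.group(0) for m in re.finditer(r"(T|t)he|car", text)]
```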
|
||||
|
||||
## 2.7 Escaping special character
|
||||
## 2.7 Escaping Special Characters
|
||||
|
||||
Backslash `\` is used in regular expression to escape the next character. This
|
||||
allows us to specify a symbol as a matching character including reserved
|
||||
characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching
|
||||
character prepend `\` before it.
|
||||
A backslash `\` is used in regular expressions to escape the next character. This
|
||||
allows us to include reserved characters such as `{ } [ ] / \ + * . $ ^ | ?` as matching characters. To use one of these special characters as a matching character, prepend it with `\`.
|
||||
|
||||
For example, the regular expression `.` is used to match any character except
|
||||
newline. Now to match `.` in an input string the regular expression
|
||||
`(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by lowercase
|
||||
character `a`, followed by lowercase letter `t`, followed by optional `.`
|
||||
For example, the regular expression `.` is used to match any character except a
|
||||
newline. Now, to match `.` in an input string, the regular expression
|
||||
`(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase
|
||||
`a`, followed by a lowercase `t`, followed by an optional `.`
|
||||
character.
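A sketch of escaping in Python follows. `re.escape` is a helper from Python's standard library, not something the tutorial itself relies on, and is shown only as a convenience for escaping a whole literal string:

```python
import re

text = "The fat cat sat on the mat."

# "\." matches a literal period; "?" makes that period optional.
escaped = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]

# re.escape builds a pattern in which special characters are taken literally.
literal = re.escape("3.14")
matches_itself = re.fullmatch(literal, "3.14") is not None
matches_other = re.fullmatch(literal, "3x14") is not None
```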
|
||||
|
||||
<pre>
|
||||
@ -357,20 +356,20 @@ character.
|
||||
|
||||
In regular expressions, we use anchors to check if the matching symbol is the
|
||||
starting symbol or ending symbol of the input string. Anchors are of two types:
|
||||
First type is Caret `^` that check if the matching character is the start
|
||||
character of the input and the second type is Dollar `$` that checks if matching
|
||||
The first type is the caret `^` that checks if the matching character is the first
|
||||
character of the input and the second type is the dollar sign `$` which checks if a matching
|
||||
character is the last character of the input string.
|
||||
|
||||
### 2.8.1 Caret
|
||||
### 2.8.1 The Caret
|
||||
|
||||
Caret `^` symbol is used to check if matching character is the first character
|
||||
of the input string. If we apply the following regular expression `^a` (if a is
|
||||
the starting symbol) to input string `abc` it matches `a`. But if we apply
|
||||
regular expression `^b` on above input string it does not match anything.
|
||||
Because in input string `abc` "b" is not the starting symbol. Let's take a look
|
||||
at another regular expression `^(T|t)he` which means: uppercase character `T` or
|
||||
lowercase character `t` is the start symbol of the input string, followed by
|
||||
lowercase character `h`, followed by lowercase character `e`.
|
||||
The caret symbol `^` is used to check if a matching character is the first character
|
||||
of the input string. If we apply the following regular expression `^a` (meaning 'a' must be
|
||||
the starting character) to the string `abc`, it will match `a`. But if we apply
|
||||
the regular expression `^b` to the above string, it will not match anything.
|
||||
This is because, in the string `abc`, the "b" is not the starting character. Let's take a look
|
||||
at another regular expression `^(T|t)he` which means: an uppercase `T` or
|
||||
a lowercase `t` must be the first character in the string, followed by a
|
||||
lowercase `h`, followed by a lowercase `e`.
|
||||
|
||||
<pre>
|
||||
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||
@ -384,12 +383,12 @@ lowercase character `h`, followed by lowercase character `e`.
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/jXrKne/1)
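The effect of the caret can be sketched in Python like this (the sample sentence is ours):

```python
import re

text = "The car is parked in the garage."

anywhere = [m.group(0) for m in re.finditer(r"(T|t)he", text)]
at_start = [m.group(0) for m in re.finditer(r"^(T|t)he", text)]

# "^b" finds nothing in "abc" because "b" is not the first character.
no_match = re.search(r"^b", "abc")
```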
|
||||
|
||||
### 2.8.2 Dollar
|
||||
### 2.8.2 The Dollar Sign
|
||||
|
||||
Dollar `$` symbol is used to check if matching character is the last character
|
||||
of the input string. For example, regular expression `(at\.)$` means: a
|
||||
lowercase character `a`, followed by lowercase character `t`, followed by a `.`
|
||||
character and the matcher must be end of the string.
|
||||
The dollar sign `$` is used to check if a matching character is the last character
|
||||
in the string. For example, the regular expression `(at\.)$` means: a
|
||||
lowercase `a`, followed by a lowercase `t`, followed by a `.`
|
||||
character and the matcher must be at the end of the string.
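A matching sketch for the dollar sign, again in Python and with variable names of our own:

```python
import re

text = "The fat cat. sat. on the mat."

# Without the anchor every "at." is found.
anywhere = [m.start() for m in re.finditer(r"(at\.)", text)]

# With "$" only the occurrence that ends the string is accepted.
at_end = [m.start() for m in re.finditer(r"(at\.)$", text)]
```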
|
||||
|
||||
<pre>
|
||||
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
|
||||
@ -405,30 +404,29 @@ character and the matcher must be end of the string.
|
||||
|
||||
## 3. Shorthand Character Sets
|
||||
|
||||
Regular expression provides shorthands for the commonly used character sets,
|
||||
which offer convenient shorthands for commonly used regular expressions. The
|
||||
shorthand character sets are as follows:
|
||||
There are a number of convenient shorthands for commonly used character sets and
|
||||
regular expressions:
|
||||
|
||||
|Shorthand|Description|
|
||||
|:----:|----|
|
||||
|.|Any character except new line|
|
||||
|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
|
||||
|\W|Matches non-alphanumeric characters: `[^\w]`|
|
||||
|\d|Matches digit: `[0-9]`|
|
||||
|\D|Matches non-digit: `[^\d]`|
|
||||
|\s|Matches whitespace character: `[\t\n\f\r\p{Z}]`|
|
||||
|\S|Matches non-whitespace character: `[^\s]`|
|
||||
|\d|Matches digits: `[0-9]`|
|
||||
|\D|Matches non-digits: `[^\d]`|
|
||||
|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
|
||||
|\S|Matches non-whitespace characters: `[^\s]`|
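
The shorthands can be tried out in any host language. The sketch below uses Python, where `\w` and `\d` are Unicode-aware by default for text strings; the `re.ASCII` flag is passed here so that `\w` behaves like the `[a-zA-Z0-9_]` range in the table. The sample strings are made up.

```python
import re

digits = re.findall(r"\d+", "Order 66 shipped 3 items on 2024-01-05")

# re.ASCII restricts \w to [a-zA-Z0-9_], matching the table above.
words = re.findall(r"\w+", "snake_case & kebab-case", flags=re.ASCII)

gaps = re.findall(r"\s+", "a \t b\nc")
```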
|
||||
|
||||
## 4. Lookaround
|
||||
## 4. Lookarounds
|
||||
|
||||
Lookbehind and lookahead (also called lookaround) are specific types of
|
||||
***non-capturing groups*** (Used to match the pattern but not included in matching
|
||||
list). Lookarounds are used when we have the condition that this pattern is
|
||||
preceded or followed by another certain pattern. For example, we want to get all
|
||||
numbers that are preceded by `$` character from the following input string
|
||||
`$4.44 and $10.88`. We will use following regular expression `(?<=\$)[0-9\.]*`
|
||||
which means: get all the numbers which contain `.` character and are preceded
|
||||
by `$` character. Following are the lookarounds that are used in regular
|
||||
Lookbehinds and lookaheads (also called lookarounds) are specific types of
|
||||
***non-capturing groups*** (used to match a pattern but without including it in the matching
|
||||
list). Lookarounds are used when a pattern must be
|
||||
preceded or followed by another pattern. For example, imagine we want to get all
|
||||
numbers that are preceded by the `$` character from the string
|
||||
`$4.44 and $10.88`. We will use the following regular expression `(?<=\$)[0-9\.]*`
|
||||
which means: get all the numbers which contain the `.` character and are preceded
|
||||
by the `$` character. These are the lookarounds that are used in regular
|
||||
expressions:
|
||||
|
||||
|Symbol|Description|
|
||||
@ -444,12 +442,12 @@ The positive lookahead asserts that the first part of the expression must be
|
||||
followed by the lookahead expression. The returned match only contains the text
|
||||
that is matched by the first part of the expression. To define a positive
|
||||
lookahead, parentheses are used. Within those parentheses, a question mark with
|
||||
equal sign is used like this: `(?=...)`. Lookahead expression is written after
|
||||
the equal sign inside parentheses. For example, the regular expression
|
||||
`(T|t)he(?=\sfat)` means: optionally match lowercase letter `t` or uppercase
|
||||
letter `T`, followed by letter `h`, followed by letter `e`. In parentheses we
|
||||
define positive lookahead which tells regular expression engine to match `The`
|
||||
or `the` which are followed by the word `fat`.
|
||||
an equals sign is used like this: `(?=...)`. The lookahead expression is written after
|
||||
the equals sign inside parentheses. For example, the regular expression
|
||||
`(T|t)he(?=\sfat)` means: match either a lowercase `t` or an uppercase
|
||||
`T`, followed by the letter `h`, followed by the letter `e`. In parentheses we
|
||||
define a positive lookahead which tells the regular expression engine to match `The`
|
||||
or `the` only if it's followed by the word `fat`.
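A Python sketch of the positive lookahead, showing that the text inside the lookahead is checked but not returned:

```python
import re

text = "The fat cat sat on the mat."

# Only a "The"/"the" that is followed by " fat" qualifies,
# and " fat" itself is not part of the returned match.
lookahead = [(m.start(), m.group(0)) for m in re.finditer(r"(T|t)he(?=\sfat)", text)]
```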
|
||||
|
||||
<pre>
|
||||
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
@ -459,13 +457,12 @@ or `the` which are followed by the word `fat`.
|
||||
|
||||
### 4.2 Negative Lookahead
|
||||
|
||||
Negative lookahead is used when we need to get all matches from input string
|
||||
that are not followed by a pattern. Negative lookahead is defined same as we define
|
||||
positive lookahead but the only difference is instead of equal `=` character we
|
||||
use negation `!` character i.e. `(?!...)`. Let's take a look at the following
|
||||
Negative lookaheads are used when we need to get all matches from an input string
|
||||
that are not followed by a certain pattern. A negative lookahead is written the same way as a
|
||||
positive lookahead. The only difference is, instead of an equals sign `=`, we
|
||||
use an exclamation mark `!` to indicate negation i.e. `(?!...)`. Let's take a look at the following
|
||||
regular expression `(T|t)he(?!\sfat)` which means: get all `The` or `the` words
|
||||
from input string that are not followed by the word `fat` precedes by a space
|
||||
character.
|
||||
from the input string that are not followed by a space character and the word `fat`.
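The negative variant can be sketched the same way in Python:

```python
import re

text = "The fat cat sat on the mat."

# The leading "The" is rejected because " fat" follows it;
# the later "the" is followed by " mat" and is therefore accepted.
negative_lookahead = [(m.start(), m.group(0)) for m in re.finditer(r"(T|t)he(?!\sfat)", text)]
```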
|
||||
|
||||
<pre>
|
||||
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
||||
@ -475,10 +472,10 @@ character.
|
||||
|
||||
### 4.3 Positive Lookbehind
|
||||
|
||||
Positive lookbehind is used to get all the matches that are preceded by a
|
||||
specific pattern. Positive lookbehind is denoted by `(?<=...)`. For example, the
|
||||
Positive lookbehinds are used to get all the matches that are preceded by a
|
||||
specific pattern. Positive lookbehinds are written `(?<=...)`. For example, the
|
||||
regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words
|
||||
from input string that are after the word `The` or `the`.
|
||||
from the input string that come after the word `The` or `the`.
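A hedged Python sketch of the positive lookbehind. Python's `re` module only accepts lookbehinds of fixed width, which this pattern satisfies because both alternatives in `(T|t)` are one character long; other engines have different restrictions.

```python
import re

text = "The fat cat sat on the mat."

# "fat" or "mat" is matched only when "The " or "the " comes directly before it.
lookbehind = [m.group(0) for m in re.finditer(r"(?<=(T|t)he\s)(fat|mat)", text)]
```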
|
||||
|
||||
<pre>
|
||||
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
|
||||
@ -488,9 +485,9 @@ from input string that are after the word `The` or `the`.
|
||||
|
||||
### 4.4 Negative Lookbehind
|
||||
|
||||
Negative lookbehind is used to get all the matches that are not preceded by a
|
||||
specific pattern. Negative lookbehind is denoted by `(?<!...)`. For example, the
|
||||
regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from input
|
||||
Negative lookbehinds are used to get all the matches that are not preceded by a
|
||||
specific pattern. Negative lookbehinds are written `(?<!...)`. For example, the
|
||||
regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from the input
|
||||
string that are not after the word `The` or `the`.
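And the negative lookbehind, sketched in Python with an illustrative sentence of our own:

```python
import re

text = "The cat sat on cat."

# The first "cat" follows "The " and is rejected; the second one follows "on ".
starts = [m.start() for m in re.finditer(r"(?<!(T|t)he\s)(cat)", text)]
```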
|
||||
|
||||
<pre>
|
||||
@ -507,17 +504,17 @@ integral part of the RegExp.
|
||||
|
||||
|Flag|Description|
|
||||
|:----:|----|
|
||||
|i|Case insensitive: Sets matching to be case-insensitive.|
|
||||
|g|Global Search: Search for a pattern throughout the input string.|
|
||||
|m|Multiline: Anchor meta character works on each line.|
|
||||
|i|Case insensitive: Match will be case-insensitive.|
|
||||
|g|Global Search: Match all instances, not just the first.|
|
||||
|m|Multiline: Anchor meta characters work on each line.|
|
||||
|
||||
### 5.1 Case Insensitive
|
||||
|
||||
The `i` modifier is used to perform case-insensitive matching. For example, the
|
||||
regular expression `/The/gi` means: uppercase letter `T`, followed by lowercase
|
||||
character `h`, followed by character `e`. And at the end of regular expression
|
||||
regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase
|
||||
`h`, followed by an `e`. At the end of the regular expression,
|
||||
the `i` flag tells the regular expression engine to ignore the case. As you can
|
||||
see we also provided `g` flag because we want to search for the pattern in the
|
||||
see, we also provided the `g` flag because we want to search for the pattern in the
|
||||
whole input string.
|
||||
|
||||
<pre>
|
||||
@ -532,13 +529,13 @@ whole input string.
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
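The `/.../gi` notation is the slash-delimited form used by regex101 and JavaScript. As a sketch of the same idea in Python, the flag is passed as an argument instead (`re.IGNORECASE`), and `re.findall` already returns every match:

```python
import re

text = "The fat cat sat on the mat."

case_sensitive = re.findall(r"The", text)
case_insensitive = re.findall(r"The", text, flags=re.IGNORECASE)
```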
|
||||
|
||||
### 5.2 Global search
|
||||
### 5.2 Global Search
|
||||
|
||||
The `g` modifier is used to perform a global match (find all matches rather than
|
||||
The `g` modifier is used to perform a global match (finds all matches rather than
|
||||
stopping after the first match). For example, the regular expression `/.(at)/g`
|
||||
means: any character except new line, followed by lowercase character `a`,
|
||||
followed by lowercase character `t`. Because we provided `g` flag at the end of
|
||||
the regular expression now it will find all matches in the input string, not just the first one (which is the default behavior).
|
||||
means: any character except a new line, followed by a lowercase `a`,
|
||||
followed by a lowercase `t`. Because we provided the `g` flag at the end of
|
||||
the regular expression, it will now find all matches in the input string, not just the first one (which is the default behavior).
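Not every host language has a `g` flag. In Python, for instance, the choice is made by picking a function: `re.search` stops at the first match, while `re.finditer` (or `re.findall`) walks through all of them. A minimal sketch:

```python
import re

text = "The fat cat sat on the mat."

# Comparable to "/.(at)/": only the first match is returned.
first_only = re.search(r".(at)", text).group(0)

# Comparable to "/.(at)/g": every match in the string is returned.
all_matches = [m.group(0) for m in re.finditer(r".(at)", text)]
```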
|
||||
|
||||
<pre>
|
||||
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
|
||||
@ -554,12 +551,12 @@ the regular expression now it will find all matches in the input string, not jus
|
||||
|
||||
### 5.3 Multiline
|
||||
|
||||
The `m` modifier is used to perform a multi-line match. As we discussed earlier
|
||||
anchors `(^, $)` are used to check if pattern is the beginning of the input or
|
||||
end of the input string. But if we want that anchors works on each line we use
|
||||
`m` flag. For example, the regular expression `/at(.)?$/gm` means: lowercase
|
||||
character `a`, followed by lowercase character `t`, optionally anything except
|
||||
new line. And because of `m` flag now regular expression engine matches pattern
|
||||
The `m` modifier is used to perform a multi-line match. As we discussed earlier,
|
||||
anchors `(^, $)` are used to check if a pattern is at the beginning of the input or
|
||||
the end. But if we want the anchors to work on each line, we use
|
||||
the `m` flag. For example, the regular expression `/at(.)?$/gm` means: a lowercase
|
||||
`a`, followed by a lowercase `t` and, optionally, anything except
|
||||
a new line. And because of the `m` flag, the regular expression engine now matches patterns
|
||||
at the end of each line in a string.
|
||||
|
||||
<pre>
|
||||
@ -578,9 +575,9 @@ at the end of each line in a string.
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/E88WE2/1)
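A Python sketch of the multiline behaviour, where `re.MULTILINE` plays the role of the `m` flag; the three-line sample string is ours:

```python
import re

text = "The fat\ncat sat\non the mat."

# Without the flag, "$" only matches at the very end of the input.
single_line = [m.group(0) for m in re.finditer(r"at(.)?$", text)]

# With re.MULTILINE, "$" also matches at the end of every line.
multi_line = [m.group(0) for m in re.finditer(r"at(.)?$", text, flags=re.MULTILINE)]
```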
|
||||
|
||||
## 6. Greedy vs lazy matching
|
||||
By default regex will do greedy matching , means it will match as long as
|
||||
possible. we can use `?` to match in lazy way means as short as possible
|
||||
## 6. Greedy vs Lazy Matching
|
||||
By default, a regex will perform a greedy match, which means the match will be as long as
|
||||
possible. We can use `?` to match in a lazy way, which means the match should be as short as possible.
|
||||
|
||||
<pre>
|
||||
"/(.*at)/" => <a href="#learn-regex"><strong>The fat cat sat on the mat</strong></a>. </pre>
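The difference between the two modes can be checked with a short Python sketch. It assumes the same sample sentence used throughout this guide; the names `greedy` and `lazy` are only illustrative.

```python
import re

text = "The fat cat sat on the mat."

# Greedy: `.*` grabs as much as possible, then backs off to the last "at".
greedy = re.search(r"(.*at)", text).group(1)

# Lazy: `.*?` grabs as little as possible, so it stops at the first "at".
lazy = re.search(r"(.*?at)", text).group(1)

print(greedy)  # The fat cat sat on the mat
print(lazy)    # The fat
```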
|
||||
@ -597,7 +594,7 @@ possible. we can use `?` to match in lazy way means as short as possible
|
||||
|
||||
## Contribution
|
||||
|
||||
* Open pull request with improvements
|
||||
* Open a pull request with improvements
|
||||
* Discuss ideas in issues
|
||||
* Spread the word
|
||||
* Reach out with any feedback [](https://twitter.com/ziishaned)
|
||||
|
||||
@ -15,7 +15,7 @@
|
||||
</p>
|
||||
|
||||
|
||||
## 翻译:
|
||||
## 翻译:
|
||||
|
||||
* [English](../README.md)
|
||||
* [Español](../translations/README-es.md)
|
||||
@ -30,26 +30,27 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## 什么是正则表达式?
|
||||
## 什么是正则表达式?
|
||||
|
||||
> 正则表达式是一组由字母和符号组成的特殊文本, 它可以用来从文本中找出满足你想要的格式的句子.
|
||||
> 正则表达式是一组由字母和符号组成的特殊文本,它可以用来从文本中找出满足你想要的格式的句子。
|
||||
|
||||
一个正则表达式是一种从左到右匹配主体字符串的模式。
|
||||
“Regular expression”这个词比较拗口,我们常使用缩写的术语“regex”或“regexp”。
|
||||
正则表达式可以从一个基础字符串中根据一定的匹配模式替换文本中的字符串、验证表单、提取字符串等等。
|
||||
|
||||
一个正则表达式是在一个主体字符串中从左到右匹配字符串时的一种样式.
|
||||
"Regular expression"这个词比较拗口, 我们常使用缩写的术语"regex"或"regexp".
|
||||
正则表达式可以从一个基础字符串中根据一定的匹配模式替换文本中的字符串、验证表单、提取字符串等等.
|
||||
|
||||
想象你正在写一个应用, 然后你想设定一个用户命名的规则, 让用户名包含字符,数字,下划线和连字符,以及限制字符的个数,好让名字看起来没那么丑.
|
||||
我们使用以下正则表达式来验证一个用户名:
|
||||
想象你正在写一个应用,然后你想设定一个用户命名的规则,让用户名包含字符、数字、下划线和连字符,以及限制字符的个数,好让名字看起来没那么丑。
|
||||
我们使用以下正则表达式来验证一个用户名:
|
||||
|
||||
<br/><br/>
|
||||
|
||||
<p align="center">
|
||||
<img src="../img/regexp-cn.png" alt="Regular expression">
|
||||
</p>
|
||||
|
||||
以上的正则表达式可以接受 `john_doe`, `jo-hn_doe`, `john12_as`.
|
||||
但不匹配`Jo`, 因为它包含了大写的字母而且太短了.
|
||||
以上的正则表达式可以接受 `john_doe`、`jo-hn_doe`、`john12_as`。
|
||||
但不匹配`Jo`,因为它包含了大写的字母而且太短了。
|
||||
|
||||
目录
|
||||
=================
|
||||
@ -77,17 +78,17 @@
|
||||
* [4.3 ?<= ... 正后发断言](#43---正后发断言)
|
||||
* [4.4 ?<!... 负后发断言](#44--负后发断言)
|
||||
* [5. 标志](#5-标志)
|
||||
* [5.1 忽略大小写 (Case Insensitive)](#51-忽略大小写-case-insensitive)
|
||||
* [5.2 全局搜索 (Global search)](#52-全局搜索-global-search)
|
||||
* [5.3 多行修饰符 (Multiline)](#53-多行修饰符-multiline)
|
||||
* [5.1 忽略大小写(Case Insensitive)](#51-忽略大小写-case-insensitive)
|
||||
* [5.2 全局搜索(Global search)](#52-全局搜索-global-search)
|
||||
* [5.3 多行修饰符(Multiline)](#53-多行修饰符-multiline)
|
||||
* [额外补充](#额外补充)
|
||||
* [贡献](#贡献)
|
||||
* [许可证](#许可证)
|
||||
|
||||
## 1. 基本匹配
|
||||
|
||||
正则表达式其实就是在执行搜索时的格式, 它由一些字母和数字组合而成.
|
||||
例如: 一个正则表达式 `the`, 它表示一个规则: 由字母`t`开始,接着是`h`,再接着是`e`.
|
||||
正则表达式其实就是在执行搜索时的格式,它由一些字母和数字组合而成。
|
||||
例如:一个正则表达式 `the`,它表示一个规则:由字母`t`开始,接着是`h`,再接着是`e`。
|
||||
|
||||
<pre>
|
||||
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
||||
@ -95,9 +96,9 @@
|
||||
|
||||
[在线练习](https://regex101.com/r/dmRygT/1)
|
||||
|
||||
正则表达式`123`匹配字符串`123`. 它逐个字符的与输入的正则表达式做比较.
|
||||
正则表达式`123`匹配字符串`123`。它逐个字符的与输入的正则表达式做比较。
|
||||
|
||||
正则表达式是大小写敏感的, 所以`The`不会匹配`the`.
|
||||
正则表达式是大小写敏感的,所以`The`不会匹配`the`。
|
||||
|
||||
<pre>
|
||||
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
@ -107,29 +108,29 @@
|
||||
|
||||
## 2. 元字符
|
||||
|
||||
正则表达式主要依赖于元字符.
|
||||
元字符不代表他们本身的字面意思, 他们都有特殊的含义. 一些元字符写在方括号中的时候有一些特殊的意思. 以下是一些元字符的介绍:
|
||||
正则表达式主要依赖于元字符。
|
||||
元字符不代表他们本身的字面意思,他们都有特殊的含义。一些元字符写在方括号中的时候有一些特殊的意思。以下是一些元字符的介绍:
|
||||
|
||||
|元字符|描述|
|
||||
|:----:|----|
|
||||
|.|句号匹配任意单个字符除了换行符.|
|
||||
|[ ]|字符种类. 匹配方括号内的任意字符.|
|
||||
|[^ ]|否定的字符种类. 匹配除了方括号里的任意字符|
|
||||
|*|匹配>=0个重复的在*号之前的字符.|
|
||||
|+|匹配>=1个重复的+号前的字符.
|
||||
|.|句号匹配任意单个字符除了换行符。|
|
||||
|[ ]|字符种类。匹配方括号内的任意字符。|
|
||||
|[^ ]|否定的字符种类。匹配除了方括号里的任意字符|
|
||||
|*|匹配>=0个重复的在*号之前的字符。|
|
||||
|+|匹配>=1个重复的+号前的字符。
|
||||
|?|标记?之前的字符为可选.|
|
||||
|{n,m}|匹配num个大括号之前的字符 (n <= num <= m).|
|
||||
|(xyz)|字符集, 匹配与 xyz 完全相等的字符串.|
|
||||
|||或运算符,匹配符号前或后的字符.|
|
||||
|{n,m}|匹配num个大括号之前的字符或字符集 (n <= num <= m).|
|
||||
|(xyz)|字符集,匹配与 xyz 完全相等的字符串.|
|
||||
|||或运算符,匹配符号前或后的字符.|
|
||||
|\|转义字符,用于匹配一些保留的字符 <code>[ ] ( ) { } . * + ? ^ $ \ |</code>|
|
||||
|^|从开始行开始匹配.|
|
||||
|$|从末端开始匹配.|
|
||||
|
||||
## 2.1 点运算符 `.`
|
||||
|
||||
`.`是元字符中最简单的例子.
|
||||
`.`匹配任意单个字符, 但不匹配换行符.
|
||||
例如, 表达式`.ar`匹配一个任意字符后面跟着是`a`和`r`的字符串.
|
||||
`.`是元字符中最简单的例子。
|
||||
`.`匹配任意单个字符,但不匹配换行符。
|
||||
例如,表达式`.ar`匹配一个任意字符后面跟着是`a`和`r`的字符串。
|
||||
|
||||
<pre>
|
||||
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
@ -139,11 +140,11 @@
|
||||
|
||||
## 2.2 字符集
|
||||
|
||||
字符集也叫做字符类.
|
||||
方括号用来指定一个字符集.
|
||||
在方括号中使用连字符来指定字符集的范围.
|
||||
在方括号中的字符集不关心顺序.
|
||||
例如, 表达式`[Tt]he` 匹配 `the` 和 `The`.
|
||||
字符集也叫做字符类。
|
||||
方括号用来指定一个字符集。
|
||||
在方括号中使用连字符来指定字符集的范围。
|
||||
在方括号中的字符集不关心顺序。
|
||||
例如,表达式`[Tt]he` 匹配 `the` 和 `The`。
|
||||
|
||||
<pre>
|
||||
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||
@ -151,7 +152,7 @@
|
||||
|
||||
[在线练习](https://regex101.com/r/2ITLQ4/1)
|
||||
|
||||
方括号的句号就表示句号.
|
||||
方括号的句号就表示句号。
|
||||
表达式 `ar[.]` 匹配 `ar.`字符串
|
||||
|
||||
<pre>
|
||||
@ -162,8 +163,8 @@
|
||||
|
||||
### 2.2.1 否定字符集
|
||||
|
||||
一般来说 `^` 表示一个字符串的开头, 但它用在一个方括号的开头的时候, 它表示这个字符集是否定的.
|
||||
例如, 表达式`[^c]ar` 匹配一个后面跟着`ar`的除了`c`的任意字符.
|
||||
一般来说 `^` 表示一个字符串的开头,但它用在一个方括号的开头的时候,它表示这个字符集是否定的。
|
||||
例如,表达式`[^c]ar` 匹配一个后面跟着`ar`的除了`c`的任意字符。
|
||||
|
||||
<pre>
|
||||
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
@ -173,13 +174,13 @@
|
||||
|
||||
## 2.3 重复次数
|
||||
|
||||
后面跟着元字符 `+`, `*` or `?` 的, 用来指定匹配子模式的次数.
|
||||
这些元字符在不同的情况下有着不同的意思.
|
||||
后面跟着元字符 `+`,`*` or `?` 的,用来指定匹配子模式的次数。
|
||||
这些元字符在不同的情况下有着不同的意思。
|
||||
|
||||
### 2.3.1 `*` 号
|
||||
|
||||
`*`号匹配 在`*`之前的字符出现`大于等于0`次.
|
||||
例如, 表达式 `a*` 匹配以0或更多个a开头的字符, 因为有0个这个条件, 其实也就匹配了所有的字符. 表达式`[a-z]*` 匹配一个行中所有以小写字母开头的字符串.
|
||||
`*`号匹配 在`*`之前的字符出现`大于等于0`次。
|
||||
例如,表达式 `a*` 匹配0或更多个以a开头的字符。表达式`[a-z]*` 匹配一个行中所有以小写字母开头的字符串。
|
||||
|
||||
<pre>
|
||||
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
|
||||
@ -187,8 +188,8 @@
|
||||
|
||||
[在线练习](https://regex101.com/r/7m8me5/1)
|
||||
|
||||
`*`字符和`.`字符搭配可以匹配所有的字符`.*`.
|
||||
`*`和表示匹配空格的符号`\s`连起来用, 如表达式`\s*cat\s*`匹配0或更多个空格开头和0或更多个空格结尾的cat字符串.
|
||||
`*`字符和`.`字符搭配可以匹配所有的字符`.*`。
|
||||
`*`和表示匹配空格的符号`\s`连起来用,如表达式`\s*cat\s*`匹配0或更多个空格开头和0或更多个空格结尾的cat字符串。
|
||||
|
||||
<pre>
|
||||
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the con<a href="#learn-regex"><strong>cat</strong></a>enation.
|
||||
@ -198,8 +199,8 @@
|
||||
|
||||
### 2.3.2 `+` 号
|
||||
|
||||
`+`号匹配`+`号之前的字符出现 >=1 次.
|
||||
例如表达式`c.+t` 匹配以首字母`c`开头以`t`结尾,中间跟着任意个字符的字符串.
|
||||
`+`号匹配`+`号之前的字符出现 >=1 次。
|
||||
例如表达式`c.+t` 匹配以首字母`c`开头以`t`结尾,中间跟着至少一个字符的字符串。
|
||||
|
||||
<pre>
|
||||
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
|
||||
@ -209,8 +210,8 @@
|
||||
|
||||
### 2.3.3 `?` 号
|
||||
|
||||
在正则表达式中元字符 `?` 标记在符号前面的字符为可选, 即出现 0 或 1 次.
|
||||
例如, 表达式 `[T]?he` 匹配字符串 `he` 和 `The`.
|
||||
在正则表达式中元字符 `?` 标记在符号前面的字符为可选,即出现 0 或 1 次。
|
||||
例如,表达式 `[T]?he` 匹配字符串 `he` 和 `The`。
|
||||
|
||||
<pre>
|
||||
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
|
||||
@ -226,8 +227,8 @@
|
||||
|
||||
## 2.4 `{}` 号
|
||||
|
||||
在正则表达式中 `{}` 是一个量词, 常用来一个或一组字符可以重复出现的次数.
|
||||
例如, 表达式 `[0-9]{2,3}` 匹配最少 2 位最多 3 位 0~9 的数字.
|
||||
在正则表达式中 `{}` 是一个量词,常用来限定一个或一组字符可以重复出现的次数。
|
||||
例如, 表达式 `[0-9]{2,3}` 匹配最少 2 位最多 3 位 0~9 的数字。
|
||||
|
||||
<pre>
|
||||
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
|
||||
@ -235,8 +236,8 @@
|
||||
|
||||
[在线练习](https://regex101.com/r/juM86s/1)
|
||||
|
||||
我们可以省略第二个参数.
|
||||
例如, `[0-9]{2,}` 匹配至少两位 0~9 的数字.
|
||||
我们可以省略第二个参数。
|
||||
例如,`[0-9]{2,}` 匹配至少两位 0~9 的数字。
|
||||
|
||||
<pre>
|
||||
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
|
||||
@ -244,8 +245,8 @@
|
||||
|
||||
[在线练习](https://regex101.com/r/Gdy4w5/1)
|
||||
|
||||
如果逗号也省略掉则表示重复固定的次数.
|
||||
例如, `[0-9]{3}` 匹配3位数字
|
||||
如果逗号也省略掉则表示重复固定的次数。
|
||||
例如,`[0-9]{3}` 匹配3位数字
|
||||
|
||||
<pre>
|
||||
"[0-9]{3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to 10.0.
|
||||
@ -255,9 +256,10 @@
|
||||
|
||||
## 2.5 `(...)` 特征标群
|
||||
|
||||
特征标群是一组写在 `(...)` 中的子模式. 例如之前说的 `{}` 是用来表示前面一个字符出现指定次数. 但如果在 `{}` 前加入特征标群则表示整个标群内的字符重复 N 次. 例如, 表达式 `(ab)*` 匹配连续出现 0 或更多个 `ab`.
|
||||
特征标群是一组写在 `(...)` 中的子模式。`(...)` 中包含的内容将会被看成一个整体,和数学中小括号( )的作用相同。例如, 表达式 `(ab)*` 匹配连续出现 0 或更多个 `ab`。如果没有使用 `(...)` ,那么表达式 `ab*` 将匹配连续出现 0 或更多个 `b` 。再比如之前说的 `{}` 是用来表示前面一个字符出现指定次数。但如果在 `{}` 前加上特征标群 `(...)` 则表示整个标群内的字符重复 N 次。
|
||||
|
||||
我们还可以在 `()` 中用或字符 `|` 表示或. 例如, `(c|g|p)ar` 匹配 `car` 或 `gar` 或 `par`.
|
||||
|
||||
我们还可以在 `()` 中用或字符 `|` 表示或。例如,`(c|g|p)ar` 匹配 `car` 或 `gar` 或 `par`.
|
||||
|
||||
<pre>
|
||||
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
@ -267,9 +269,9 @@
|
||||
|
||||
## 2.6 `|` 或运算符
|
||||
|
||||
或运算符就表示或, 用作判断条件.
|
||||
或运算符就表示或,用作判断条件。
|
||||
|
||||
例如 `(T|t)he|car` 匹配 `(T|t)he` 或 `car`.
|
||||
例如 `(T|t)he|car` 匹配 `(T|t)he` 或 `car`。
|
||||
|
||||
<pre>
|
||||
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||
@ -279,9 +281,9 @@
|
||||
|
||||
## 2.7 转码特殊字符
|
||||
|
||||
反斜线 `\` 在表达式中用于转码紧跟其后的字符. 用于指定 `{ } [ ] / \ + * . $ ^ | ?` 这些特殊字符. 如果想要匹配这些特殊字符则要在其前面加上反斜线 `\`.
|
||||
反斜线 `\` 在表达式中用于转码紧跟其后的字符。用于指定 `{ } [ ] / \ + * . $ ^ | ?` 这些特殊字符。如果想要匹配这些特殊字符则要在其前面加上反斜线 `\`。
|
||||
|
||||
例如 `.` 是用来匹配除换行符外的所有字符的. 如果想要匹配句子中的 `.` 则要写成 `\.` 以下这个例子 `\.?`是选择性匹配`.`
|
||||
例如 `.` 是用来匹配除换行符外的所有字符的。如果想要匹配句子中的 `.` 则要写成 `\.` 以下这个例子 `\.?`是选择性匹配`.`
|
||||
|
||||
<pre>
|
||||
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
|
||||
@ -291,15 +293,15 @@
|
||||
|
||||
## 2.8 锚点
|
||||
|
||||
在正则表达式中, 想要匹配指定开头或结尾的字符串就要使用到锚点. `^` 指定开头, `$` 指定结尾.
|
||||
在正则表达式中,想要匹配指定开头或结尾的字符串就要使用到锚点。`^` 指定开头,`$` 指定结尾。
|
||||
|
||||
### 2.8.1 `^` 号
|
||||
|
||||
`^` 用来检查匹配的字符串是否在所匹配字符串的开头.
|
||||
`^` 用来检查匹配的字符串是否在所匹配字符串的开头。
|
||||
|
||||
例如, 在 `abc` 中使用表达式 `^a` 会得到结果 `a`. 但如果使用 `^b` 将匹配不到任何结果. 因为在字符串 `abc` 中并不是以 `b` 开头.
|
||||
例如,在 `abc` 中使用表达式 `^a` 会得到结果 `a`。但如果使用 `^b` 将匹配不到任何结果。因为在字符串 `abc` 中并不是以 `b` 开头。
|
||||
|
||||
例如, `^(T|t)he` 匹配以 `The` 或 `the` 开头的字符串.
|
||||
例如,`^(T|t)he` 匹配以 `The` 或 `the` 开头的字符串。
|
||||
|
||||
<pre>
|
||||
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||
@ -315,9 +317,9 @@
|
||||
|
||||
### 2.8.2 `$` 号
|
||||
|
||||
同理于 `^` 号, `$` 号用来匹配字符是否是最后一个.
|
||||
同理于 `^` 号,`$` 号用来匹配字符是否是最后一个。
|
||||
|
||||
例如, `(at\.)$` 匹配以 `at.` 结尾的字符串.
|
||||
例如,`(at\.)$` 匹配以 `at.` 结尾的字符串。
|
||||
|
||||
<pre>
|
||||
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
|
||||
@ -333,33 +335,33 @@
|
||||
|
||||
## 3. 简写字符集
|
||||
|
||||
正则表达式提供一些常用的字符集简写. 如下:
|
||||
正则表达式提供一些常用的字符集简写。如下:
|
||||
|
||||
|简写|描述|
|
||||
|:----:|----|
|
||||
|.|除换行符外的所有字符|
|
||||
|\w|匹配所有字母数字, 等同于 `[a-zA-Z0-9_]`|
|
||||
|\W|匹配所有非字母数字, 即符号, 等同于: `[^\w]`|
|
||||
|\d|匹配数字: `[0-9]`|
|
||||
|\D|匹配非数字: `[^\d]`|
|
||||
|\s|匹配所有空格字符, 等同于: `[\t\n\f\r\p{Z}]`|
|
||||
|\S|匹配所有非空格字符: `[^\s]`|
|
||||
|\w|匹配所有字母数字,等同于 `[a-zA-Z0-9_]`|
|
||||
|\W|匹配所有非字母数字,即符号,等同于: `[^\w]`|
|
||||
|\d|匹配数字: `[0-9]`|
|
||||
|\D|匹配非数字: `[^\d]`|
|
||||
|\s|匹配所有空格字符,等同于: `[\t\n\f\r\p{Z}]`|
|
||||
|\S|匹配所有非空格字符: `[^\s]`|
|
||||
|\f|匹配一个换页符|
|
||||
|\n|匹配一个换行符|
|
||||
|\r|匹配一个回车符|
|
||||
|\t|匹配一个制表符|
|
||||
|\v|匹配一个垂直制表符|
|
||||
|\p|匹配 CR/LF (等同于 `\r\n`),用来匹配 DOS 行终止符|
|
||||
|\p|匹配 CR/LF(等同于 `\r\n`),用来匹配 DOS 行终止符|
|
||||
|
||||
## 4. 零宽度断言(前后预查)
|
||||
## 4. 零宽度断言(前后预查)
|
||||
|
||||
先行断言和后发断言都属于**非捕获簇**(不捕获文本 ,也不针对组合计进行计数).
|
||||
先行断言用于判断所匹配的格式是否在另一个确定的格式之前, 匹配结果不包含该确定格式(仅作为约束).
|
||||
先行断言和后发断言都属于**非捕获簇**(不捕获文本 ,也不针对组合计进行计数)。
|
||||
先行断言用于判断所匹配的格式是否在另一个确定的格式之前,匹配结果不包含该确定格式(仅作为约束)。
|
||||
|
||||
例如, 我们想要获得所有跟在 `$` 符号后的数字, 我们可以使用正后发断言 `(?<=\$)[0-9\.]*`.
|
||||
这个表达式匹配 `$` 开头, 之后跟着 `0,1,2,3,4,5,6,7,8,9,.` 这些字符可以出现大于等于 0 次.
|
||||
例如,我们想要获得所有跟在 `$` 符号后的数字,我们可以使用正后发断言 `(?<=\$)[0-9\.]*`。
|
||||
这个表达式匹配 `$` 开头,之后跟着 `0,1,2,3,4,5,6,7,8,9,.` 这些字符可以出现大于等于 0 次。
|
||||
|
||||
零宽度断言如下:
|
||||
零宽度断言如下:
|
||||
|
||||
|符号|描述|
|
||||
|:----:|----|
|
||||
@ -370,13 +372,13 @@
|
||||
|
||||
### 4.1 `?=...` 正先行断言
|
||||
|
||||
`?=...` 正先行断言, 表示第一部分表达式之后必须跟着 `?=...`定义的表达式.
|
||||
`?=...` 正先行断言,表示第一部分表达式之后必须跟着 `?=...`定义的表达式。
|
||||
|
||||
返回结果只包含满足匹配条件的第一部分表达式.
|
||||
定义一个正先行断言要使用 `()`. 在括号内部使用一个问号和等号: `(?=...)`.
|
||||
返回结果只包含满足匹配条件的第一部分表达式。
|
||||
定义一个正先行断言要使用 `()`。在括号内部使用一个问号和等号: `(?=...)`。
|
||||
|
||||
正先行断言的内容写在括号中的等号后面.
|
||||
例如, 表达式 `(T|t)he(?=\sfat)` 匹配 `The` 和 `the`, 在括号中我们又定义了正先行断言 `(?=\sfat)` ,即 `The` 和 `the` 后面紧跟着 `(空格)fat`.
|
||||
正先行断言的内容写在括号中的等号后面。
|
||||
例如,表达式 `(T|t)he(?=\sfat)` 匹配 `The` 和 `the`,在括号中我们又定义了正先行断言 `(?=\sfat)` ,即 `The` 和 `the` 后面紧跟着 `(空格)fat`。
|
||||
|
||||
<pre>
|
||||
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
@ -386,10 +388,10 @@
|
||||
|
||||
### 4.2 `?!...` 负先行断言
|
||||
|
||||
负先行断言 `?!` 用于筛选所有匹配结果, 筛选条件为 其后不跟随着断言中定义的格式.
|
||||
`正先行断言` 定义和 `负先行断言` 一样, 区别就是 `=` 替换成 `!` 也就是 `(?!...)`.
|
||||
负先行断言 `?!` 用于筛选所有匹配结果,筛选条件为 其后不跟随着断言中定义的格式。
|
||||
`正先行断言` 定义和 `负先行断言` 一样,区别就是 `=` 替换成 `!` 也就是 `(?!...)`。
|
||||
|
||||
表达式 `(T|t)he(?!\sfat)` 匹配 `The` 和 `the`, 且其后不跟着 `(空格)fat`.
|
||||
表达式 `(T|t)he(?!\sfat)` 匹配 `The` 和 `the`,且其后不跟着 `(空格)fat`。
|
||||
|
||||
<pre>
|
||||
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
||||
@ -399,8 +401,8 @@
|
||||
|
||||
### 4.3 `?<= ...` 正后发断言
|
||||
|
||||
正后发断言 记作`(?<=...)` 用于筛选所有匹配结果, 筛选条件为 其前跟随着断言中定义的格式.
|
||||
例如, 表达式 `(?<=(T|t)he\s)(fat|mat)` 匹配 `fat` 和 `mat`, 且其前跟着 `The` 或 `the`.
|
||||
正后发断言 记作`(?<=...)` 用于筛选所有匹配结果,筛选条件为 其前跟随着断言中定义的格式。
|
||||
例如,表达式 `(?<=(T|t)he\s)(fat|mat)` 匹配 `fat` 和 `mat`,且其前跟着 `The` 或 `the`。
|
||||
|
||||
<pre>
|
||||
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
|
||||
@ -410,8 +412,8 @@
|
||||
|
||||
### 4.4 `?<!...` 负后发断言
|
||||
|
||||
负后发断言 记作 `(?<!...)` 用于筛选所有匹配结果, 筛选条件为 其前不跟随着断言中定义的格式.
|
||||
例如, 表达式 `(?<!(T|t)he\s)(cat)` 匹配 `cat`, 且其前不跟着 `The` 或 `the`.
|
||||
负后发断言 记作 `(?<!...)` 用于筛选所有匹配结果,筛选条件为 其前不跟随着断言中定义的格式。
|
||||
例如,表达式 `(?<!(T|t)he\s)(cat)` 匹配 `cat`,且其前不跟着 `The` 或 `the`。
|
||||
|
||||
<pre>
|
||||
"(?<!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
|
||||
@ -421,19 +423,19 @@
|
||||
|
||||
## 5. 标志
|
||||
|
||||
标志也叫模式修正符, 因为它可以用来修改表达式的搜索结果.
|
||||
这些标志可以任意的组合使用, 它也是整个正则表达式的一部分.
|
||||
标志也叫模式修正符,因为它可以用来修改表达式的搜索结果。
|
||||
这些标志可以任意的组合使用,它也是整个正则表达式的一部分。
|
||||
|
||||
|标志|描述|
|
||||
|:----:|----|
|
||||
|i|忽略大小写.|
|
||||
|g|全局搜索.|
|
||||
|m|多行的: 锚点元字符 `^` `$` 工作范围在每行的起始.|
|
||||
|i|忽略大小写。|
|
||||
|g|全局搜索。|
|
||||
|m|多行修饰符:锚点元字符 `^` `$` 工作范围在每行的起始。|
|
||||
|
||||
### 5.1 忽略大小写 (Case Insensitive)
|
||||
|
||||
修饰语 `i` 用于忽略大小写.
|
||||
例如, 表达式 `/The/gi` 表示在全局搜索 `The`, 在后面的 `i` 将其条件修改为忽略大小写, 则变成搜索 `the` 和 `The`, `g` 表示全局搜索.
|
||||
修饰语 `i` 用于忽略大小写。
|
||||
例如,表达式 `/The/gi` 表示在全局搜索 `The`,在后面的 `i` 将其条件修改为忽略大小写,则变成搜索 `the` 和 `The`,`g` 表示全局搜索。
|
||||
|
||||
<pre>
|
||||
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
@ -449,8 +451,8 @@
|
||||
|
||||
### 5.2 全局搜索 (Global search)
|
||||
|
||||
修饰符 `g` 常用于执行一个全局搜索匹配, 即(不仅仅返回第一个匹配的, 而是返回全部).
|
||||
例如, 表达式 `/.(at)/g` 表示搜索 任意字符(除了换行) + `at`, 并返回全部结果.
|
||||
修饰符 `g` 常用于执行一个全局搜索匹配,即(不仅仅返回第一个匹配的,而是返回全部)。
|
||||
例如,表达式 `/.(at)/g` 表示搜索 任意字符(除了换行)+ `at`,并返回全部结果。
|
||||
|
||||
<pre>
|
||||
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
|
||||
@ -466,11 +468,11 @@
|
||||
|
||||
### 5.3 多行修饰符 (Multiline)
|
||||
|
||||
多行修饰符 `m` 常用于执行一个多行匹配.
|
||||
多行修饰符 `m` 常用于执行一个多行匹配。
|
||||
|
||||
像之前介绍的 `(^,$)` 用于检查格式是否是在待检测字符串的开头或结尾. 但我们如果想要它在每行的开头和结尾生效, 我们需要用到多行修饰符 `m`.
|
||||
像之前介绍的 `(^,$)` 用于检查格式是否是在待检测字符串的开头或结尾。但我们如果想要它在每行的开头和结尾生效,我们需要用到多行修饰符 `m`。
|
||||
|
||||
例如, 表达式 `/at(.)?$/gm` 表示小写字符 `a` 后跟小写字符 `t` , 末尾可选除换行符外任意字符. 根据 `m` 修饰符, 现在表达式匹配每行的结尾.
|
||||
例如,表达式 `/at(.)?$/gm` 表示小写字符 `a` 后跟小写字符 `t` ,末尾可选除换行符外任意字符。根据 `m` 修饰符,现在表达式匹配每行的结尾。
|
||||
|
||||
<pre>
|
||||
"/.at(.)?$/" => The fat
|
||||
@ -500,7 +502,6 @@
|
||||
<pre>
|
||||
"/(.*?at)/" => <a href="#learn-regex"><strong>The fat</strong></a> cat sat on the mat. </pre>
|
||||
|
||||
|
||||
[在线练习](https://regex101.com/r/AyAdgJ/2)
|
||||
|
||||
## 贡献
|
||||
|
||||
@ -30,11 +30,13 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## Qué es una expresión regular?
|
||||
|
||||
> Una expresión regular es un grupo de caracteres o símbolos, los cuales son usados para buscar un patrón específico dentro de un texto.
|
||||
|
||||
Una expresión regular es un patrón que que se compara con una cadena de caracteres de izquierda a derecha. La palabra "expresión regular" puede también ser escrita como "Regex" o "Regexp". Las expresiones regulares se utilizan para remplazar un texto dentro de una cadena de caracteres (*string*), validar formularios, extraer una porción de una cadena de caracteres (*substring*) basado en la coincidencia de una patrón, y muchas cosas más.
|
||||
Una expresión regular es un patrón que que se compara con una cadena de caracteres de izquierda a derecha. La palabra "expresión regular" puede también ser escrita como "Regex" o "Regexp". Las expresiones regulares se utilizan para reemplazar un texto dentro de una cadena de caracteres (*string*), validar formularios, extraer una porción de una cadena de caracteres (*substring*) basado en la coincidencia de una patrón, y muchas cosas más.
|
||||
|
||||
Imagina que estás escribiendo una aplicación y quieres agregar reglas para cuando el usuario elija su nombre de usuario. Nosotros queremos permitir que el nombre de usuario contenga letras, números, guión bajo (raya), y guión medio. También queremos limitar el número de caracteres en el nombre de usuario para que no se vea feo. Para ello usamos la siguiente expresión regular para validar el nombre de usuario.
|
||||
|
||||
@ -430,7 +432,7 @@ El modificador `g` se utiliza para realizar una coincidencia global
|
||||
Por ejemplo, la expresión regular `/.(At)/g` significa: cualquier carácter,
|
||||
excepto la nueva línea, seguido del carácter en minúscula `a`, seguido del carácter
|
||||
en minúscula `t`. Debido a que proveimos el indicador `g` al final de la expresión
|
||||
regular, ahora encontrará todas las coincidencias de toda la cadena de entrada, no sólo la
|
||||
regular, ahora encontrará todas las coincidencias de toda la cadena de entrada, no sólo la
|
||||
primera instancia (el cual es el comportamiento normal).
|
||||
|
||||
|
||||
|
||||
608
translations/README-fa.md
Normal file
@ -0,0 +1,608 @@
|
||||
<p align="center">
|
||||
<br/>
|
||||
<a href="https://github.com/ziishaned/learn-regex">
|
||||
<img src="https://i.imgur.com/bYwl7Vf.png" alt="Learn Regex">
|
||||
</a>
|
||||
<br /><br />
|
||||
<p>
|
||||
<a href="https://twitter.com/ziishaned">
|
||||
<img src="https://img.shields.io/twitter/follow/ziishaned.svg?style=social" />
|
||||
</a>
|
||||
<a href="https://github.com/ziishaned">
|
||||
<img src="https://img.shields.io/github/followers/ziishaned.svg?label=Follow%20%40ziishaned&style=social" />
|
||||
</a>
|
||||
</p>
|
||||
</p>
|
||||
|
||||
## برگردان ها:
|
||||
|
||||
* [English](../README.md)
|
||||
* [Español](../translations/README-es.md)
|
||||
* [Français](../translations/README-fr.md)
|
||||
* [Português do Brasil](../translations/README-pt_BR.md)
|
||||
* [中文版](../translations/README-cn.md)
|
||||
* [日本語](../translations/README-ja.md)
|
||||
* [한국어](../translations/README-ko.md)
|
||||
* [Turkish](../translations/README-tr.md)
|
||||
* [Greek](../translations/README-gr.md)
|
||||
* [Magyar](../translations/README-hu.md)
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
<div dir="rtl">
|
||||
|
||||
## عبارت منظم چیست؟
|
||||
</div>
|
||||
<div dir="rtl">
|
||||
|
||||
> عبارت منظم یک گروه از کارکترها یا نمادهاست که برای پیدا کردن یک الگوی مشخص در یک متن به کار گرفته می شود.
|
||||
</div>
|
||||
|
||||
<div dir="rtl">
|
||||
یک عبارت منظم یک الگو است که با رشته ای خاص مطابقت دارد. عبارت منظم در اعتبار سنجی داده های ورودی فرم ها، پیدا کردن یک زیر متن در یک متن بزرگتر بر اساس یک الگوی ویژه و مواردی از این دست به کار گرفته می شود. عبارت "Regular expression" کمی ثقیل است، پس معمولا بیشتر مخفف آن - "regex" یا "regexp" - را به کار می برند.
|
||||
|
||||
فرض کنید یک برنامه نوشته اید و می خواهید قوانینی برای گزینش نام کاربری برای کاربران بگذارید. می خواهیم اجازه دهیم که نام کاربری شامل حروف، اعداد، خط زیر و خط فاصله باشد. همچنین می خواهیم تعداد مشخصه ها یا همان کارکترها را در نام کاربری محدود کنیم. ما از چنین عبارت منظمی برای اعتبار سنجی نام کاربری استفاده می کنیم:
|
||||
</div>
|
||||
<br/><br/>
|
||||
<p align="center">
|
||||
<img src="../img/regexp-en.png" alt="Regular expression">
|
||||
</p>
|
||||
<div dir="rtl">
|
||||
عبارت منظم به کار رفته در اینجا رشته `john_doe` و `jo-hn_doe` و `john12_as` می پذیرد ولی `Jo` را به دلیل کوتاه بودن بیش از حد و همچنین به کار بردن حروف بزرگ نمی پذیرد.
|
||||
</div>
|
||||
<div dir="rtl">
|
||||
|
||||
## فهرست
|
||||
|
||||
- [پایه ای ترین همخوانی](#1-basic-matchers)
|
||||
- [Meta character](#2-meta-characters)
|
||||
- [Full stop](#21-full-stop)
|
||||
- [Character set](#22-character-set)
|
||||
- [Negated character set](#221-negated-character-set)
|
||||
- [Repetitions](#23-repetitions)
|
||||
- [The Star](#231-the-star)
|
||||
- [The Plus](#232-the-plus)
|
||||
- [The Question Mark](#233-the-question-mark)
|
||||
- [Braces](#24-braces)
|
||||
- [Capturing Group](#25-capturing-group)
|
||||
- [Alternation](#26-alternation)
|
||||
- [Escaping special character](#27-escaping-special-character)
|
||||
- [Anchors](#28-anchors)
|
||||
- [Caret](#281-caret)
|
||||
- [Dollar](#282-dollar)
|
||||
- [Shorthand Character Sets](#3-shorthand-character-sets)
|
||||
- [Lookaround](#4-lookaround)
|
||||
- [Positive Lookahead](#41-positive-lookahead)
|
||||
- [Negative Lookahead](#42-negative-lookahead)
|
||||
- [Positive Lookbehind](#43-positive-lookbehind)
|
||||
- [Negative Lookbehind](#44-negative-lookbehind)
|
||||
- [Flags](#5-flags)
|
||||
- [Case Insensitive](#51-case-insensitive)
|
||||
- [Global search](#52-global-search)
|
||||
- [Multiline](#53-multiline)
|
||||
- [Greedy vs lazy matching](#6-greedy-vs-lazy-matching)
|
||||
</div>
|
||||
<div dir="rtl">
|
||||
|
||||
## 1. پایه ای ترین همخوانی
|
||||
|
||||
یک عبارت منظم در واقع یک الگو برای جست و جو در یک متن است. برای مثال عبارت منظم `the` به معنی : حرف
|
||||
`t`, پس از آن حرف `h`, پس از آن حرف `e` است.
|
||||
</div>
|
||||
<pre>
|
||||
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
||||
</pre>
|
||||
|
||||
<div dir="rtl">
|
||||
|
||||
[عبارت منظم را در عمل ببینید](https://regex101.com/r/dmRygT/1)
|
||||
|
||||
عبارت منظم `123` با رشته `123` مطابقت دارد. عبارت منظم با مقایسه حرف به حرف و کارکتر به کارکترش با متن مورد نظر تطابق را می یابد. همچنین عبارت منظم حساس به اندازه (بزرگی یا کوچکی حروف) هستند. بنابر این واژه ی `The` با `the` همخوان نیست.
|
||||
</div>
|
||||
|
||||
<pre>
|
||||
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
</pre>
|
||||
|
||||
<div dir="rtl">
|
||||
|
||||
[این عبارت منظم را در عمل ببینید](https://regex101.com/r/1paXsy/1)
|
||||
</div>
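The basic matching described in section 1 can be sketched in Python. This is a minimal illustration, assuming the sample sentence from the examples above; it records where each literal pattern is found to show that matching is case-sensitive.

```python
import re

text = "The fat cat sat on the mat."

# Literal patterns are compared character by character and are case-sensitive.
lower = [m.start() for m in re.finditer(r"the", text)]
upper = [m.start() for m in re.finditer(r"The", text)]

print(lower)  # [19]
print(upper)  # [0]
```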
|
||||
|
||||
## 2. Meta Characters
|
||||
|
||||
Meta characters are the building blocks of regular expressions. Meta
characters do not stand for themselves but instead are interpreted in some
special way. Some meta characters have a special meaning when written inside
square brackets. The meta characters are as follows:
|
||||
|
||||
|Meta character|Description|
|:----:|----|
|.|Period matches any single character except a line break.|
|[ ]|Character class. Matches any character contained between the square brackets.|
|[^ ]|Negated character class. Matches any character that is not contained between the square brackets.|
|*|Matches 0 or more repetitions of the preceding symbol.|
|+|Matches 1 or more repetitions of the preceding symbol.|
|?|Makes the preceding symbol optional.|
|{n,m}|Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.|
|(xyz)|Character group. Matches the characters xyz in that exact order.|
|&#124;|Alternation. Matches either the characters before or the characters after the symbol.|
|&#92;|Escapes the next character. This allows you to match reserved characters <code>[ ] ( ) { } . * + ? ^ $ \ &#124;</code>|
|^|Matches the beginning of the input.|
|$|Matches the end of the input.|
|
||||
|
||||
## 2.1 Full stop
|
||||
|
||||
The full stop `.` is the simplest example of a meta character. The meta character `.`
matches any single character. It will not match return or newline characters.
For example, the regular expression `.ar` means: any character, followed by the
letter `a`, followed by the letter `r`.
|
||||
|
||||
<pre>
|
||||
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
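As a quick check of the `.ar` example, here is a minimal Python sketch using the standard `re` module. The variable names are illustrative only.

```python
import re

text = "The car parked in the garage."

# `.` stands for any single character except a line break.
matches = re.findall(r".ar", text)
print(matches)  # ['car', 'par', 'gar']

# A newline in front of "ar" does not satisfy the `.`.
no_match = re.search(r".ar", "\nar")
print(no_match)  # None
```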
|
||||
|
||||
## 2.2 Character set
|
||||
|
||||
Character sets are also called character classes. Square brackets are used to
specify character sets. Use a hyphen inside a character set to specify the
characters' range. The order of the character range inside square brackets
doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
`T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
|
||||
|
||||
<pre>
|
||||
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
|
||||
|
||||
A period inside a character set, however, means a literal period. The regular
expression `ar[.]` means: a lowercase character `a`, followed by the letter `r`,
followed by a period `.` character.
|
||||
|
||||
<pre>
|
||||
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
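The two character set examples above can be reproduced with a short Python sketch. The third call is an added contrast, not from the original text, showing what happens when the period is left outside the brackets.

```python
import re

# A set matches exactly one character out of those listed.
articles = re.findall(r"[Tt]he", "The car parked in the garage.")
print(articles)  # ['The', 'the']

sentence = "A garage is a good place to park a car."

# Inside square brackets the period is a literal period.
literal_dot = re.findall(r"ar[.]", sentence)
print(literal_dot)  # ['ar.']

# Outside the brackets it is a meta character again and matches any character.
any_char = re.findall(r"ar.", sentence)
print(any_char)  # ['ara', 'ark', 'ar.']
```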
|
||||
|
||||
### 2.2.1 Negated character set
|
||||
|
||||
In general, the caret symbol represents the start of the string, but when it is
|
||||
typed after the opening square bracket it negates the character set. For
|
||||
example, the regular expression `[^c]ar` means: any character except `c`,
|
||||
followed by the character `a`, followed by the letter `r`.
|
||||
|
||||
<pre>
|
||||
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/nNNlq3/1)
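A minimal Python sketch of the negated set, assuming the same sample sentence:

```python
import re

text = "The car parked in the garage."

# `[^c]` matches any single character that is not a lowercase `c`.
matches = re.findall(r"[^c]ar", text)
print(matches)  # ['par', 'gar']
```

Note that `car` is skipped because the character before `ar` is exactly the one the set excludes.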
|
||||
|
||||
## 2.3 Repetitions
|
||||
|
||||
The meta characters `+`, `*` or `?` are used to specify how many times a
subpattern can occur. These meta characters act differently in different
situations.
|
||||
|
||||
### 2.3.1 The Star
|
||||
|
||||
The symbol `*` matches zero or more repetitions of the preceding matcher. The
regular expression `a*` means: zero or more repetitions of the preceding lowercase
character `a`. But if it appears after a character set or class, then it finds
the repetitions of the whole character set. For example, the regular expression
`[a-z]*` means: any number of lowercase letters in a row.
|
||||
|
||||
<pre>
|
||||
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/7m8me5/1)
|
||||
|
||||
The `*` symbol can be used with the meta character `.` to match any string of
|
||||
characters `.*`. The `*` symbol can be used with the whitespace character `\s`
|
||||
to match a string of whitespace characters. For example, the expression
|
||||
`\s*cat\s*` means: zero or more spaces, followed by lowercase character `c`,
|
||||
followed by lowercase character `a`, followed by lowercase character `t`,
|
||||
followed by zero or more spaces.
|
||||
|
||||
<pre>
|
||||
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the con<a href="#learn-regex"><strong>cat</strong></a>enation.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/gGrwuz/1)
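Both star examples can be tried in Python. One caveat that this sketch makes visible: because `*` allows zero repetitions, `[a-z]*` also produces empty matches, so they are filtered out here to mirror the highlighted results above. The variable names are illustrative.

```python
import re

text = "The car parked in the garage #21."

# `[a-z]*` can match the empty string, so drop the empty results.
words = [m for m in re.findall(r"[a-z]*", text) if m]
print(words)  # ['he', 'car', 'parked', 'in', 'the', 'garage']

# Optional whitespace on both sides of "cat".
cats = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")
print(cats)  # [' cat ', 'cat']
```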
|
||||
|
||||
### 2.3.2 The Plus
|
||||
|
||||
The symbol `+` matches one or more repetitions of the preceding character. For
example, the regular expression `c.+t` means: lowercase letter `c`, followed by
at least one character, followed by the lowercase character `t`. Note that
because `+` is greedy, the match extends to the last `t` in the sentence.
|
||||
|
||||
<pre>
|
||||
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/Dzf9Aa/1)
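A short Python sketch of the `c.+t` example. The lazy variant `c.+?t` is added here only as a contrast and is not part of the original example.

```python
import re

text = "The fat cat sat on the mat."

# Greedy `.+` runs to the end of the line and backs off to the last `t`.
greedy = re.search(r"c.+t", text).group(0)
print(greedy)  # cat sat on the mat

# Adding `?` makes the quantifier lazy, so it stops at the first possible `t`.
lazy = re.search(r"c.+?t", text).group(0)
print(lazy)  # cat
```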
|
||||
|
||||
### 2.3.3 The Question Mark
|
||||
|
||||
In regular expressions, the meta character `?` makes the preceding character
optional. This symbol matches zero or one instance of the preceding character.
For example, the regular expression `[T]?he` means: an optional uppercase
letter `T`, followed by the lowercase character `h`, followed by the lowercase
character `e`.
|
||||
|
||||
<pre>
|
||||
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/cIg9zm/1)
|
||||
|
||||
<pre>
|
||||
"[T]?he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in t<a href="#learn-regex"><strong>he</strong></a> garage.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/kPpO2x/1)
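
The two expressions above can be compared side by side. This sketch assumes Python's `re` module; the names `required` and `optional` are made up for the illustration.

```python
import re

text = "The car is parked in the garage."

# Without "?", the uppercase "T" is mandatory.
required = re.findall(r"[T]he", text)

# With "?", the "T" may be absent, so the "he" inside "the" also matches.
optional = re.findall(r"[T]?he", text)

print(required)  # ['The']
print(optional)  # ['The', 'he']
```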
|
||||
|
||||
## 2.4 Braces
|
||||
|
||||
In regular expressions, braces (also called quantifiers) are used to
specify the number of times that a character or a group of characters can be
repeated. For example, the regular expression `[0-9]{2,3}` means: match at least
2 but not more than 3 digits (characters in the range 0 to 9).
|
||||
|
||||
<pre>
|
||||
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/juM86s/1)
|
||||
|
||||
We can leave out the second number. For example, the regular expression
`[0-9]{2,}` means: match 2 or more digits. If we also remove the comma, the
regular expression `[0-9]{3}` means: match exactly 3 digits.
|
||||
|
||||
<pre>
|
||||
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/Gdy4w5/1)
|
||||
|
||||
<pre>
|
||||
"[0-9]{3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to 10.0.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/Sivu30/1)
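
The three brace forms can be checked against the same sentence. A minimal sketch, assuming Python's `re` module, with illustrative variable names:

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

# {2,3}: between two and three digits; the trailing "7" is left over.
two_to_three = re.findall(r"[0-9]{2,3}", text)

# {2,}: two or more digits, with no upper limit.
two_or_more = re.findall(r"[0-9]{2,}", text)

# {3}: exactly three digits.
exactly_three = re.findall(r"[0-9]{3}", text)

print(two_to_three)   # ['999', '10']
print(two_or_more)    # ['9997', '10']
print(exactly_three)  # ['999']
```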
|
||||
|
||||
## 2.5 Capturing Group
|
||||
|
||||
A capturing group is a group of sub-patterns that is written inside parentheses
`(...)`. As discussed before, a quantifier placed after a character repeats
that character. But if we put a quantifier after a capturing group, it repeats
the whole capturing group. For example, the regular expression `(ab)*` matches
zero or more repetitions of the sequence `ab`. We can also use the alternation
meta character `|` inside a capturing group. For example, the regular expression
`(c|g|p)ar` means: the lowercase character `c`, `g` or `p`, followed by the
character `a`, followed by the character `r`.
|
||||
|
||||
<pre>
|
||||
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
|
||||
|
||||
Note that capturing groups not only match but also capture the characters for use in
the parent language. The parent language could be Python or JavaScript or virtually any
language that provides a regular expression implementation.
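
As an illustration of "capturing for use in the parent language", the sketch below assumes Python's `re` module as that parent language. The variable names are hypothetical.

```python
import re

text = "The car is parked in the garage."

first = re.search(r"(c|g|p)ar", text)
whole_match = first.group(0)  # the full text matched by the expression
captured = first.group(1)     # only what the first pair of parentheses matched

# When the pattern contains exactly one capturing group, re.findall returns
# the captured text of that group rather than the whole match.
captured_letters = re.findall(r"(c|g|p)ar", text)

print(whole_match, captured)  # car c
print(captured_letters)       # ['c', 'p', 'g']
```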
|
||||
|
||||
### 2.5.1 Non-capturing group
|
||||
|
||||
A non-capturing group is a group that only matches the characters but
does not capture them. A non-capturing group is denoted by a `?` followed by a `:`
right after the opening parenthesis: `(?:...)`. For example, the regular expression
`(?:c|g|p)ar` is similar to `(c|g|p)ar` in that it matches the same characters but
will not create a capture group.
|
||||
|
||||
<pre>
|
||||
"(?:c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/Rm7Me8/1)
|
||||
|
||||
Non-capturing groups can come in handy in find-and-replace operations, or
when mixed with capturing groups, so that only the groups you actually need appear
in the captured output.
See also [4. Lookaround](#4-lookaround).
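
The difference between the two kinds of group shows up in what the parent language reports. A short sketch, assuming Python's `re` module:

```python
import re

text = "The car is parked in the garage."

capturing = re.compile(r"(c|g|p)ar")
non_capturing = re.compile(r"(?:c|g|p)ar")

# Both patterns match the same text, but only the first one records a group.
print(capturing.groups)      # 1
print(non_capturing.groups)  # 0

# With no capturing group in the pattern, findall returns the whole matches.
whole_matches = non_capturing.findall(text)
print(whole_matches)  # ['car', 'par', 'gar']
```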
|
||||
|
||||
## 2.6 Alternation
|
||||
|
||||
In a regular expression, the vertical bar `|` is used to define alternation.
Alternation is like an OR statement between multiple expressions. Now, you may be
thinking that character sets and alternation work the same way. But the big
difference between them is that a character set works at the character level,
whereas alternation works at the expression level. For example, the
regular expression `(T|t)he|car` means: either (uppercase character `T` or lowercase
`t`, followed by lowercase character `h`, followed by lowercase character `e`) OR
(lowercase character `c`, followed by lowercase character `a`, followed by
lowercase character `r`). Note that the parentheses in this description are there
for clarity, to show that the expression matches when either parenthesized
alternative is met.
|
||||
|
||||
<pre>
|
||||
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
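
To contrast alternation with a character set, here is a brief sketch assuming Python's `re` module. `re.finditer` is used so that the whole match is collected regardless of the capturing group in the pattern.

```python
import re

text = "The car is parked in the garage."

# Alternation chooses between whole sub-expressions: "(T|t)he" OR "car".
alternation = [m.group(0) for m in re.finditer(r"(T|t)he|car", text)]

# A character set chooses between single characters only.
char_set = re.findall(r"[Tt]he", text)

print(alternation)  # ['The', 'car', 'the']
print(char_set)     # ['The', 'the']
```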
|
||||
|
||||
## 2.7 Escaping special character
|
||||
|
||||
The backslash `\` is used in regular expressions to escape the next character. This
allows us to use a symbol as a literal matching character, including reserved
characters such as `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a
matching character, prepend `\` to it.
|
||||
|
||||
For example, the regular expression `.` is used to match any character except a
newline. To match a literal `.` in an input string, we escape it: the regular
expression `(f|c|m)at\.?` means: the lowercase letter `f`, `c` or `m`, followed by
the lowercase character `a`, followed by the lowercase letter `t`, followed by an
optional `.` character.
|
||||
|
||||
<pre>
|
||||
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/DOc5Nu/1)
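
The effect of escaping can be sketched in Python (an assumed parent language). The second half uses `re.escape`, a helper in Python's standard `re` module that adds the backslashes for you; the sample string `"3514 and 3.14"` is invented for the illustration.

```python
import re

text = "The fat cat sat on the mat."

# "\." matches a literal dot, and "?" makes that dot optional.
escaped_dot = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]
print(escaped_dot)  # ['fat', 'cat', 'mat.']

# An unescaped "." matches any character, which may be more than intended.
sample = "3514 and 3.14"
unescaped = re.findall(r"3.14", sample)
literal = re.findall(re.escape("3.14"), sample)
print(unescaped)  # ['3514', '3.14']
print(literal)    # ['3.14']
```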
|
||||
|
||||
## 2.8 Anchors
|
||||
|
||||
In regular expressions, we use anchors to check whether the matching symbol is the
first or the last symbol of the input string. There are two types of anchors:
the caret `^` checks whether the matching character is the first
character of the input, and the dollar sign `$` checks whether the matching
character is the last character of the input string.
|
||||
|
||||
### 2.8.1 Caret
|
||||
|
||||
The caret symbol `^` is used to check whether the matching character is the first
character of the input string. If we apply the regular expression `^a` (is `a`
the starting symbol?) to the input string `abc`, it matches `a`. But if we apply
the regular expression `^b` to the same input string, it does not match anything,
because in the input string `abc`, `b` is not the starting symbol. Let's take a look
at another regular expression, `^(T|t)he`, which means: the uppercase character `T` or
the lowercase character `t` must be the first symbol of the input string, followed by
the lowercase character `h`, followed by the lowercase character `e`.
|
||||
|
||||
<pre>
|
||||
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/5ljjgB/1)
|
||||
|
||||
<pre>
|
||||
"^(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/jXrKne/1)
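
The caret examples can be reproduced with a few lines of Python (assumed here as the parent language; variable names are illustrative).

```python
import re

# "^a" matches because "a" is the first character; "^b" finds nothing.
starts_with_a = re.search(r"^a", "abc")
starts_with_b = re.search(r"^b", "abc")
print(starts_with_a.group(0))  # a
print(starts_with_b)           # None

text = "The car is parked in the garage."

# Without the anchor both occurrences match; with it, only the one at
# the very start of the input does.
unanchored = [m.group(0) for m in re.finditer(r"(T|t)he", text)]
anchored = [m.group(0) for m in re.finditer(r"^(T|t)he", text)]
print(unanchored)  # ['The', 'the']
print(anchored)    # ['The']
```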
|
||||
|
||||
### 2.8.2 Dollar
|
||||
|
||||
The dollar symbol `$` is used to check whether the matching character is the last
character of the input string. For example, the regular expression `(at\.)$` means:
the lowercase character `a`, followed by the lowercase character `t`, followed by a
`.` character, and the match must be at the end of the string.
|
||||
|
||||
<pre>
|
||||
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/y4Au4D/1)
|
||||
|
||||
<pre>
|
||||
"(at\.)$" => The fat cat. sat. on the m<a href="#learn-regex"><strong>at.</strong></a>
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/t0AkOd/1)
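
A short sketch of the dollar anchor, again assuming Python's `re` module:

```python
import re

text = "The fat cat. sat. on the mat."

# Without "$" every "at." in the sentence matches.
anywhere = [m.group(0) for m in re.finditer(r"(at\.)", text)]

# With "$" only the occurrence that ends the input matches.
at_end = list(re.finditer(r"(at\.)$", text))

print(anywhere)                      # ['at.', 'at.', 'at.']
print([m.group(0) for m in at_end])  # ['at.']
```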
|
||||
|
||||
## 3. Shorthand Character Sets
|
||||
|
||||
Regular expressions provide convenient shorthands for commonly used character
sets. The shorthand character sets are as follows:
|
||||
|
||||
|Shorthand|Description|
|:----:|----|
|.|Any character except new line|
|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
|\W|Matches non-alphanumeric characters: `[^\w]`|
|\d|Matches digit: `[0-9]`|
|\D|Matches non-digit: `[^\d]`|
|\s|Matches whitespace character: `[\t\n\f\r\p{Z}]`|
|\S|Matches non-whitespace character: `[^\s]`|
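
A few of these shorthands in action, as a sketch that assumes Python's `re` module. The sample strings are invented for the illustration. Note that in Python 3, `\w` and `\d` also match non-ASCII letters and digits in text patterns unless the `re.ASCII` flag is passed, so the bracketed equivalents in the table describe the ASCII behavior.

```python
import re

sample = "snake_case and kebab-case, room 101"

# \w+ : runs of letters, digits and underscores ("-" and "," split words).
words = re.findall(r"\w+", sample)

# \d+ : runs of digits.
digits = re.findall(r"\d+", sample)

# \s+ : split on any run of whitespace (spaces, tabs, newlines).
parts = re.split(r"\s+", "a  b\tc\nd")

# \D : remove everything that is not a digit.
only_digits = re.sub(r"\D", "", "+1 (555) 010-9999")

print(words)        # ['snake_case', 'and', 'kebab', 'case', 'room', '101']
print(digits)       # ['101']
print(parts)        # ['a', 'b', 'c', 'd']
print(only_digits)  # 15550109999
```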
|
||||
|
||||
## 4. Lookaround
|
||||
|
||||
Lookbehinds and lookaheads (together called lookarounds) are specific types of
***non-capturing groups*** (used to match a pattern without including it in the
list of matches). Lookarounds are used when a pattern must be
preceded or followed by another pattern. For example, imagine we want to get all
numbers that are preceded by the `$` character from the input string
`$4.44 and $10.88`. We will use the regular expression `(?<=\$)[0-9\.]*`,
which means: get all sequences of digits and `.` characters that are preceded
by the `$` character. These are the lookarounds that are used in regular
expressions:
|
||||
|
||||
|Symbol|Description|
|:----:|----|
|?=|Positive Lookahead|
|?!|Negative Lookahead|
|?<=|Positive Lookbehind|
|?<!|Negative Lookbehind|
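
The price example from the introduction to this section can be run as follows. This is a sketch assuming Python's `re` module; the variable names are hypothetical.

```python
import re

text = "$4.44 and $10.88"

# (?<=\$) requires a "$" immediately before the match but does not
# include it in the result.
prices = re.findall(r"(?<=\$)[0-9\.]*", text)
print(prices)  # ['4.44', '10.88']
```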
|
||||
|
||||
### 4.1 Positive Lookahead
|
||||
|
||||
The positive lookahead asserts that the first part of the expression must be
followed by the lookahead expression. The returned match only contains the text
that is matched by the first part of the expression. To define a positive
lookahead, parentheses are used. Within those parentheses, a question mark with
an equals sign is used like this: `(?=...)`. The lookahead expression is written
after the equals sign inside the parentheses. For example, the regular expression
`(T|t)he(?=\sfat)` means: match either the lowercase letter `t` or the uppercase
letter `T`, followed by the letter `h`, followed by the letter `e`. In parentheses we
define a positive lookahead, which tells the regular expression engine to match only
a `The` or `the` that is followed by a whitespace character and the word `fat`.
|
||||
|
||||
<pre>
|
||||
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/IDDARt/1)
|
||||
|
||||
### 4.2 Negative Lookahead
|
||||
|
||||
A negative lookahead is used when we need to get all matches from an input string
that are not followed by a certain pattern. A negative lookahead is written the same
way as a positive lookahead; the only difference is that instead of the equals sign
`=` we use the negation character `!`, i.e. `(?!...)`. Let's take a look at the
regular expression `(T|t)he(?!\sfat)`, which means: get all `The` or `the` words
from the input string that are not followed by a space character and the word `fat`.
|
||||
|
||||
<pre>
|
||||
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/V32Npg/1)
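
Positive and negative lookahead can be compared on the same sentence. A minimal sketch, assuming Python's `re` module; `re.finditer` is used so that the whole match and its position can be inspected.

```python
import re

text = "The fat cat sat on the mat."

# Positive lookahead: keep "The"/"the" only when " fat" comes next.
followed_by_fat = [(m.group(0), m.start())
                   for m in re.finditer(r"(T|t)he(?=\sfat)", text)]

# Negative lookahead: keep "The"/"the" only when " fat" does NOT come next.
not_followed_by_fat = [(m.group(0), m.start())
                       for m in re.finditer(r"(T|t)he(?!\sfat)", text)]

print(followed_by_fat)      # [('The', 0)]
print(not_followed_by_fat)  # [('the', 19)]
```

In both cases the text inside the lookahead is checked but is not part of the returned match.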
|
||||
|
||||
### 4.3 Positive Lookbehind
|
||||
|
||||
A positive lookbehind is used to get all the matches that are preceded by a
specific pattern. It is denoted by `(?<=...)`. For example, the
regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words
from the input string that come after the word `The` or `the`.
|
||||
|
||||
<pre>
|
||||
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/avH165/1)
|
||||
|
||||
### 4.4 Negative Lookbehind
|
||||
|
||||
A negative lookbehind is used to get all the matches that are not preceded by a
specific pattern. It is denoted by `(?<!...)`. For example, the
regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from the input
string that do not come after the word `The` or `the`.
|
||||
|
||||
<pre>
|
||||
"(?<!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
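
Both lookbehind examples can be sketched in Python (assumed as the parent language). Python's `re` module requires the lookbehind pattern to have a fixed width, which `(T|t)he\s` satisfies because both alternatives are one character long.

```python
import re

# Positive lookbehind: "fat" or "mat" only when preceded by "The " or "the ".
after_the = [m.group(0) for m in
             re.finditer(r"(?<=(T|t)he\s)(fat|mat)", "The fat cat sat on the mat.")]

# Negative lookbehind: "cat" only when NOT preceded by "The " or "the ".
not_after_the = [(m.group(0), m.start()) for m in
                 re.finditer(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat.")]

print(after_the)      # ['fat', 'mat']
print(not_after_the)  # [('cat', 15)]
```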
|
||||
|
||||
## 5. Flags
|
||||
|
||||
Flags are also called modifiers because they modify the output of a regular
|
||||
expression. These flags can be used in any order or combination, and are an
|
||||
integral part of the RegExp.
|
||||
|
||||
|Flag|Description|
|:----:|----|
|i|Case insensitive: Sets matching to be case-insensitive.|
|g|Global Search: Search for a pattern throughout the input string.|
|m|Multiline: Anchor meta character works on each line.|
|
||||
|
||||
### 5.1 Case Insensitive
|
||||
|
||||
The `i` modifier is used to perform case-insensitive matching. For example, the
regular expression `/The/gi` means: the uppercase letter `T`, followed by the
lowercase character `h`, followed by the character `e`. The `i` flag at the end of
the regular expression tells the regular expression engine to ignore case. As you
can see, we also provided the `g` flag because we want to search for the pattern in
the whole input string.
|
||||
|
||||
<pre>
|
||||
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/dpQyf9/1)
|
||||
|
||||
<pre>
|
||||
"/The/gi" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
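
The `/pattern/flags` notation above follows the JavaScript and PCRE style. As a sketch of the same idea in Python (an assumption; the guide does not prescribe a language), the `i` flag corresponds to `re.IGNORECASE`, and there is no `g` flag because `re.findall` already returns every match.

```python
import re

text = "The fat cat sat on the mat."

case_sensitive = re.findall(r"The", text)
case_insensitive = re.findall(r"The", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['The']
print(case_insensitive)  # ['The', 'the']
```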
|
||||
|
||||
### 5.2 Global search
|
||||
|
||||
The `g` modifier is used to perform a global match (find all matches rather than
stopping after the first match). For example, the regular expression `/.(at)/g`
means: any character except a new line, followed by the lowercase character `a`,
followed by the lowercase character `t`. Because we provided the `g` flag at the end
of the regular expression, it will now find all matches in the input string, not just
the first one (which is the default behavior).
|
||||
|
||||
<pre>
|
||||
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/jnk6gM/1)
|
||||
|
||||
<pre>
|
||||
"/.(at)/g" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> <a href="#learn-regex"><strong>sat</strong></a> on the <a href="#learn-regex"><strong>mat</strong></a>.
|
||||
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/dO1nef/1)
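
Python has no `g` flag; the choice between "first match" and "all matches" is made by picking a function instead. The following sketch, assuming Python's `re` module, mirrors the two examples above.

```python
import re

text = "The fat cat sat on the mat."

# re.search stops at the first match (like the pattern without "g").
first_only = re.search(r".(at)", text).group(0)

# re.finditer walks through every match (like the pattern with "g").
all_matches = [m.group(0) for m in re.finditer(r".(at)", text)]

print(first_only)   # fat
print(all_matches)  # ['fat', 'cat', 'sat', 'mat']
```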
|
||||
|
||||
### 5.3 Multiline
|
||||
|
||||
The `m` modifier is used to perform a multi-line match. As we discussed earlier,
the anchors `(^, $)` are used to check whether a pattern is at the beginning or
the end of the input string. But if we want the anchors to work on each line, we use
the `m` flag. For example, the regular expression `/.at(.)?$/gm` means: any character
except a new line, followed by the lowercase character `a`, followed by the lowercase
character `t`, optionally followed by any character except a new line. Because of the
`m` flag, the regular expression engine now matches the pattern
at the end of each line in a string.
|
||||
|
||||
<pre>
"/.at(.)?$/" => The fat
cat sat
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/hoGMkP/1)
|
||||
|
||||
<pre>
"/.at(.)?$/gm" => The <a href="#learn-regex"><strong>fat</strong></a>
cat <a href="#learn-regex"><strong>sat</strong></a>
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/E88WE2/1)
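
The multi-line behavior can be reproduced with Python's `re.MULTILINE` flag, which plays the role of `m` in this sketch (Python is an assumption here; the variable names are illustrative).

```python
import re

text = "The fat\ncat sat\non the mat."

# Without the flag, "$" only matches at the end of the whole input.
single_line = [m.group(0) for m in re.finditer(r".at(.)?$", text)]

# With re.MULTILINE, "$" also matches at the end of every line.
multi_line = [m.group(0) for m in
              re.finditer(r".at(.)?$", text, flags=re.MULTILINE)]

print(single_line)  # ['mat.']
print(multi_line)   # ['fat', 'sat', 'mat.']
```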
|
||||
|
||||
## 6. Greedy vs lazy matching
|
||||
By default, a regex performs greedy matching, which means it matches as much as
possible. We can add `?` after a quantifier to match lazily, which means it matches
as little as possible.
|
||||
|
||||
<pre>
|
||||
"/(.*at)/" => <a href="#learn-regex"><strong>The fat cat sat on the mat</strong></a>. </pre>
|
||||
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/AyAdgJ/1)
|
||||
|
||||
<pre>
|
||||
"/(.*?at)/" => <a href="#learn-regex"><strong>The fat</strong></a> cat sat on the mat. </pre>
|
||||
|
||||
|
||||
[Test the regular expression](https://regex101.com/r/AyAdgJ/2)
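
The two behaviors can be compared directly. A minimal sketch, assuming Python's `re` module:

```python
import re

text = "The fat cat sat on the mat."

# Greedy: ".*" grabs as much as it can while still ending in "at".
greedy = re.search(r"(.*at)", text).group(0)

# Lazy: ".*?" stops at the first place where "at" can follow.
lazy = re.search(r"(.*?at)", text).group(0)

print(greedy)  # The fat cat sat on the mat
print(lazy)    # The fat
```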
|
||||
|
||||
|
||||
## Contribution
|
||||
|
||||
* Open a pull request with improvements
|
||||
* Discuss ideas in issues
|
||||
* Spread the word
|
||||
* Reach out with any feedback [](https://twitter.com/ziishaned)
|
||||
|
||||
## License
|
||||
|
||||
MIT © [Zeeshan Ahmad](https://twitter.com/ziishaned)
|
||||
@ -30,6 +30,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## Qu'est-ce qu'une expression régulière?
|
||||
|
||||
|
||||
@ -30,6 +30,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## Τι είναι μια Κανονική Έκφραση (Regular Expression);
|
||||
|
||||
|
||||
@ -30,6 +30,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## Mi az a reguláris kifejezés?
|
||||
|
||||
|
||||
@ -30,6 +30,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## 正規表現とは
|
||||
|
||||
|
||||
@ -30,6 +30,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## 정규표현식이란 무엇인가?
|
||||
|
||||
@ -76,7 +77,7 @@
|
||||
- [대소문자 구분없음](#51-대소문자-구분없음)
|
||||
- [전체 검색](#52-전체-검색)
|
||||
- [멀티 라인](#53-멀티-라인)
|
||||
- [탐욕적 vs 게으른 매칭](#6-탐욕적-vs-게으른 매칭)
|
||||
- [탐욕적 vs 게으른 매칭](#6-탐욕적-vs-게으른-매칭)
|
||||
|
||||
## 1. 기본 매쳐
|
||||
|
||||
|
||||
@ -30,6 +30,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## Co to jest wyrażenie regularne?
|
||||
|
||||
|
||||
@ -30,12 +30,13 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## O que é uma Expressão Regular?
|
||||
|
||||
> Expressão Regular é um grupo de caracteres ou símbolos utilizado para encontrar um padrão específico a partir de um texto.
|
||||
|
||||
Uma expressão regular é um padrão que é comparado com uma cadeia de caracteres da esquerda para a direita. A expressão "Expressão regular" é longa e difícil de falar; você geralmente vai encontrar o termo abreviado como "regex" ou "regexp". Expressões regulares são usadas para substituir um texto dentro de uma string, validar formulários, extrair uma parte de uma string baseada em um padrão encontrado e muito mais.
|
||||
Uma expressão regular é um padrão que é comparado com uma cadeia de caracteres da esquerda para a direita. O termo "Expressão regular" é longo e difícil de falar; você geralmente vai encontrar o termo abreviado como "regex" ou "regexp". Expressões regulares são usadas para substituir um texto dentro de uma string, validar formulários, extrair uma parte de uma string baseada em um padrão encontrado e muito mais.
|
||||
|
||||
Imagine que você está escrevendo uma aplicação e quer colocar regras para quando um usuário escolher seu username. Nós queremos permitir que o username contenha letras, números, underlines e hífens. Nós também queremos limitar o número de caracteres para não ficar muito feio. Então usamos a seguinte expressão regular para validar o username:
|
||||
|
||||
@ -306,7 +307,7 @@ As expressões regulares fornecem abreviações para conjuntos de caracteres com
|
||||
|
||||
## 4. Olhar ao Redor
|
||||
|
||||
Lookbehind (olhar atrás) e lookahead (olhar à frente), às vezes conhecidos como lookarounds (olhar ao redor), são tipos específicos de ***grupo de não captura*** (utilizado para encontrar um padrão, mas não incluí-lo na lista de ocorrêncoas). Lookarounds são usados quando temos a condição de que determinado padrão seja precedido ou seguido de outro padrão. Por exemplo, queremos capturar todos os números precedidos do caractere `$` da seguinte string de entrada: `$4.44 and $10.88`. Vamos usar a seguinte expressão regular `(?<=\$)[0-9\.]*` que significa: procure todos os números que contêm o caractere `.` e são precedidos pelo caractere `$`. A seguir estão os lookarounds que são utilizados em expressões regulares:
|
||||
Lookbehind (olhar atrás) e lookahead (olhar à frente), às vezes conhecidos como lookarounds (olhar ao redor), são tipos específicos de ***grupo de não captura*** (utilizado para encontrar um padrão, mas não incluí-lo na lista de ocorrências). Lookarounds são usados quando temos a condição de que determinado padrão seja precedido ou seguido de outro padrão. Por exemplo, queremos capturar todos os números precedidos do caractere `$` da seguinte string de entrada: `$4.44 and $10.88`. Vamos usar a seguinte expressão regular `(?<=\$)[0-9\.]*` que significa: procure todos os números que contêm o caractere `.` e são precedidos pelo caractere `$`. A seguir estão os lookarounds que são utilizados em expressões regulares:
|
||||
|
||||
|Símbolo|Descrição|
|
||||
|:----:|----|
|
||||
@ -317,7 +318,7 @@ Lookbehind (olhar atrás) e lookahead (olhar à frente), às vezes conhecidos co
|
||||
|
||||
### 4.1 Lookahead Positivo
|
||||
|
||||
O lookahead positivo impõe que a primeira parte da expressão deve ser seguida pela expressão lookahead. A combinação retornada contém apenas o texto que encontrado pela primeira parte da expressão. Para definir um lookahead positivo, deve-se usar parênteses. Dentro desses parênteses, é usado um ponto de interrogação seguido de um sinal de igual, dessa forma: `(?=...)`. Expressões lookahead são escritas depois do sinal de igual dentro do parênteses. Por exemplo, a expressão regular `[T|t]he(?=\sfat)` significa: encontre a letra minúscula `t` ou a letra maiúscula `T`, seguida da letra `h`, seguida da letra `e`. Entre parênteses, nós definimos o lookahead positivo que diz para o motor de expressões regulares para encontrar `The` ou `the` que são seguidos pela palavra `fat`.
|
||||
O lookahead positivo impõe que a primeira parte da expressão deve ser seguida pela expressão lookahead. A combinação retornada contém apenas o texto que é encontrado pela primeira parte da expressão. Para definir um lookahead positivo, deve-se usar parênteses. Dentro desses parênteses, é usado um ponto de interrogação seguido de um sinal de igual, dessa forma: `(?=...)`. Expressões lookahead são escritas depois do sinal de igual dentro do parênteses. Por exemplo, a expressão regular `[T|t]he(?=\sfat)` significa: encontre a letra minúscula `t` ou a letra maiúscula `T`, seguida da letra `h`, seguida da letra `e`. Entre parênteses, nós definimos o lookahead positivo que diz para o motor de expressões regulares para encontrar `The` ou `the` que são seguidos pela palavra `fat`.
|
||||
|
||||
<pre>
|
||||
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
|
||||
@ -399,7 +400,7 @@ O modificador `g` é usado para realizar uma busca global (encontrar todas as oc
|
||||
|
||||
### 5.3 Multilinhas
|
||||
|
||||
O modificador `m` é usado para realizar uma busca em várias linhas. Como falamos antes, as âncoras `(^, $)` são usadas para verificar se o padrão está no início ou no final da string de entrada. Mas se queremos que as âncoras funcionem em cada uma das linhas, usamos a flag `m`. Por exemplo, a expressão regular `/at(.)?$/gm` significa: o caractere minúsculo `a`, seguido do caractere minúsculo `t`, opcionalmente seguido por qualquer caractere, exceto nova linha. E por causa da flag `m`, agora o motor de expressões regulares encontra o padrão no final de cada uma das linhas da string.
|
||||
O modificador `m` é usado para realizar uma busca em várias linhas. Como falamos antes, as âncoras `(^, $)` são usadas para verificar se o padrão está no início ou no final da string de entrada respectivamente. Mas se queremos que as âncoras funcionem em cada uma das linhas, usamos a flag `m`. Por exemplo, a expressão regular `/.at(.)?$/gm` significa: o caractere minúsculo `a`, seguido do caractere minúsculo `t`, opcionalmente seguido por qualquer caractere, exceto nova linha. E por causa da flag `m`, agora o motor de expressões regulares encontra o padrão no final de cada uma das linhas da string.
|
||||
|
||||
<pre>
|
||||
"/.at(.)?$/" => The fat
|
||||
|
||||
@ -29,6 +29,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## Что такое Регулярное выражение?
|
||||
|
||||
@ -170,7 +171,7 @@
|
||||
|
||||
## 2.3 Повторения
|
||||
|
||||
Символы `+`, `*` или `?` используются для обозначения того, как сколько раз появляется какой-либо подшаблон.
|
||||
Символы `+`, `*` или `?` используются для обозначения того сколько раз появляется какой-либо подшаблон.
|
||||
Данные метасимволы могут вести себя по-разному, в зависимости от ситуации.
|
||||
|
||||
### 2.3.1 Звёздочка
|
||||
@ -289,7 +290,7 @@
|
||||
|
||||
[Запустить регулярное выражение](https://regex101.com/r/Rm7Me8/1)
|
||||
|
||||
Не запоминающиеся группы пригодиться, когда они используются в функциях поиска и замены
|
||||
Незапоминающиеся группы могут пригодиться, когда они используются в функциях поиска и замены,
|
||||
или в сочетании со скобочными группами, например, для предпросмотра при создании скобочной группы или другого вида выходных данных,
|
||||
смотрите также [4. Опережающие и ретроспективные проверки](#4-опережающие-и-ретроспективные-проверки).
|
||||
|
||||
@ -392,8 +393,8 @@
|
||||
|
||||
Опережающие и ретроспективные проверки (в английской литературе lookbehind, lookahead) это особый вид
|
||||
***не запоминающих скобочных групп*** (находящих совпадения, но не добавляющих в массив).
|
||||
Данные проверки используются, мы знаем, что шаблон предшествует или сопровождается другим шаблоном.
|
||||
Например, мы хотим получить получить цену в долларах `$`, из следующей входной строки
|
||||
Данные проверки используются когда мы знаем, что шаблон предшествует или сопровождается другим шаблоном.
|
||||
Например, мы хотим получить цену в долларах `$` из следующей входной строки
|
||||
`$4.44 and $10.88`. Для этого используем следующее регулярное выражение `(?<=\$)[0-9\.]*`, означающее
|
||||
получение всех дробных (с точкой `.`) цифр, которым предшествует знак доллара `$`. Существуют
|
||||
следующие виды проверок:
|
||||
|
||||
@ -30,6 +30,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## Düzenli İfade Nedir?
|
||||
|
||||
|
||||
@ -31,6 +31,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
|
||||
## Biểu thức chính quy là gì?
|
||||
|
||||
@ -29,6 +29,7 @@
|
||||
* [Polish](../translations/README-pl.md)
|
||||
* [Русский](../translations/README-ru.md)
|
||||
* [Tiếng Việt](../translations/README-vn.md)
|
||||
* [فارسی](../translations/README-fa.md)
|
||||
|
||||
## 什么是正则表达式?
|
||||
|
||||
|
||||