diff --git a/README-pt_BR.md b/README-pt_BR.md new file mode 100644 index 0000000..d26324e --- /dev/null +++ b/README-pt_BR.md @@ -0,0 +1,505 @@ +
+

+Learn Regex +


+ +## Traduções: + +* [English](README.md) +* [中文版](README-cn.md) +* [日本語](README-ja.md) +* [Português do Brasil](README-pt_BR.md) + +## O que é uma Expressão Regular? + +> Expressão Regular é um grupo de caracteres ou símbolos que é usado para encontrar um padrão específico a partir de um texto. + +A regular expression is a pattern that is matched against a subject string from left to right. The word "Regular expression" is a +mouthful, you will usually find the term abbreviated as "regex" or "regexp". Regular expression is used for replacing a text within +a string, validating form, extract a substring from a string based upon a pattern match, and so much more. + +Imagine you are writing an application and you want to set the rules for when a user chooses their username. We want to +allow the username to contain letters, numbers, underscores and hyphens. We also want to limit the number of +characters in username so it does not look ugly. We use the following regular expression to validate a username: +

+

+Regular expression +

+ +Above regular expression can accept the strings `john_doe`, `jo-hn_doe` and `john12_as`. It does not match `Jo` because that string +contains uppercase letter and also it is too short. + +## Table of Contents + +- [Basic Matchers](#1-basic-matchers) +- [Meta character](#2-meta-characters) + - [Full stop](#21-full-stop) + - [Character set](#22-character-set) + - [Negated character set](#221-negated-character-set) + - [Repetitions](#23-repetitions) + - [The Star](#231-the-star) + - [The Plus](#232-the-plus) + - [The Question Mark](#233-the-question-mark) + - [Braces](#24-braces) + - [Character Group](#25-character-group) + - [Alternation](#26-alternation) + - [Escaping special character](#27-escaping-special-character) + - [Anchors](#28-anchors) + - [Caret](#281-caret) + - [Dollar](#282-dollar) +- [Shorthand Character Sets](#3-shorthand-character-sets) +- [Lookaround](#4-lookaround) + - [Positive Lookahead](#41-positive-lookahead) + - [Negative Lookahead](#42-negative-lookahead) + - [Positive Lookbehind](#43-positive-lookbehind) + - [Negative Lookbehind](#44-negative-lookbehind) +- [Flags](#5-flags) + - [Case Insensitive](#51-case-insensitive) + - [Global search](#52-global-search) + - [Multiline](#53-multiline) +- [Bonus](#bonus) + +## 1. Basic Matchers + +A regular expression is just a pattern of characters that we use to perform search in a text. For example, the regular expression +`the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`. + +
+"the" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/dmRygT/1) + +The regular expression `123` matches the string `123`. The regular expression is matched against an input string by comparing each +character in the regular expression to each character in the input string, one after another. Regular expressions are normally +case-sensitive so the regular expression `The` would not match the string `the`. + +
+"The" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/1paXsy/1) + +## 2. Meta Characters + +Meta characters are the building blocks of the regular expressions. Meta characters do not stand for themselves but instead are +interpreted in some special way. Some meta characters have a special meaning and are written inside square brackets. +The meta characters are as follows: + +|Meta character|Description| +|:----:|----| +|.|Period matches any single character except a line break.| +|[ ]|Character class. Matches any character contained between the square brackets.| +|[^ ]|Negated character class. Matches any character that is not contained between the square brackets| +|*|Matches 0 or more repetitions of the preceding symbol.| +|+|Matches 1 or more repetitions of the preceding symbol. +|?|Makes the preceding symbol optional.| +|{n,m}|Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.| +|(xyz)|Character group. Matches the characters xyz in that exact order.| +|||Alternation. Matches either the characters before or the characters after the symbol.| +|\|Escapes the next character. This allows you to match reserved characters [ ] ( ) { } . * + ? ^ $ \ || +|^|Matches the beginning of the input.| +|$|Matches the end of the input.| + +## 2.1 Full stop + +Full stop `.` is the simplest example of meta character. The meta character `.` matches any single character. It will not match return +or newline characters. For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the +letter `r`. + +
+".ar" => The car parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/xc9GkU/1) + +## 2.2 Character set + +Character sets are also called character class. Square brackets are used to specify character sets. Use a hyphen inside a character set to +specify the characters' range. The order of the character range inside square brackets doesn't matter. For example, the regular +expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`. + +
+"[Tt]he" => The car parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/2ITLQ4/1) + +A period inside a character set, however, means a literal period. The regular expression `ar[.]` means: a lowercase character `a`, followed by letter `r`, followed by a period `.` character. + +
+"ar[.]" => A garage is a good place to park a car.
+
+ +[Test the regular expression](https://regex101.com/r/wL3xtE/1) + +### 2.2.1 Negated character set + +In general, the caret symbol represents the start of the string, but when it is typed after the opening square bracket it negates the +character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the character `a`, followed by +the letter `r`. + +
+"[^c]ar" => The car parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/nNNlq3/1) + +## 2.3 Repetitions + +Following meta characters `+`, `*` or `?` are used to specify how many times a subpattern can occur. These meta characters act +differently in different situations. + +### 2.3.1 The Star + +The symbol `*` matches zero or more repetitions of the preceding matcher. The regular expression `a*` means: zero or more repetitions +of preceding lowercase character `a`. But if it appears after a character set or class then it finds the repetitions of the whole +character set. For example, the regular expression `[a-z]*` means: any number of lowercase letters in a row. + +
+"[a-z]*" => The car parked in the garage #21.
+
+ +[Test the regular expression](https://regex101.com/r/7m8me5/1) + +The `*` symbol can be used with the meta character `.` to match any string of characters `.*`. The `*` symbol can be used with the +whitespace character `\s` to match a string of whitespace characters. For example, the expression `\s*cat\s*` means: zero or more +spaces, followed by lowercase character `c`, followed by lowercase character `a`, followed by lowercase character `t`, followed by +zero or more spaces. + +
+"\s*cat\s*" => The fat cat sat on the concatenation.
+
+ +[Test the regular expression](https://regex101.com/r/gGrwuz/1) + +### 2.3.2 The Plus + +The symbol `+` matches one or more repetitions of the preceding character. For example, the regular expression `c.+t` means: lowercase +letter `c`, followed by at least one character, followed by the lowercase character `t`. + +
+"c.+t" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/Dzf9Aa/1) + +### 2.3.3 The Question Mark + +In regular expression the meta character `?` makes the preceding character optional. This symbol matches zero or one instance of +the preceding character. For example, the regular expression `[T]?he` means: Optional the uppercase letter `T`, followed by the lowercase +character `h`, followed by the lowercase character `e`. + +
+"[T]he" => The car is parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/cIg9zm/1) + +
+"[T]?he" => The car is parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/kPpO2x/1) + +## 2.4 Braces + +In regular expression braces that are also called quantifiers are used to specify the number of times that a +character or a group of characters can be repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least 2 digits but not more than 3 ( +characters in the range of 0 to 9). + +
+"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0.
+
+ +[Test the regular expression](https://regex101.com/r/juM86s/1) + +We can leave out the second number. For example, the regular expression `[0-9]{2,}` means: Match 2 or more digits. If we also remove +the comma the regular expression `[0-9]{3}` means: Match exactly 3 digits. + +
+"[0-9]{2,}" => The number was 9.9997 but we rounded it off to 10.0.
+
+ +[Test the regular expression](https://regex101.com/r/Gdy4w5/1) + +
+"[0-9]{3}" => The number was 9.9997 but we rounded it off to 10.0.
+
+ +[Test the regular expression](https://regex101.com/r/Sivu30/1) + +## 2.5 Character Group + +Character group is a group of sub-patterns that is written inside Parentheses `(...)`. As we discussed before that in regular expression +if we put a quantifier after a character then it will repeat the preceding character. But if we put quantifier after a character group then +it repeats the whole character group. For example, the regular expression `(ab)*` matches zero or more repetitions of the character "ab". +We can also use the alternation `|` meta character inside character group. For example, the regular expression `(c|g|p)ar` means: lowercase character `c`, +`g` or `p`, followed by character `a`, followed by character `r`. + +
+"(c|g|p)ar" => The car is parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/tUxrBG/1) + +## 2.6 Alternation + +In regular expression Vertical bar `|` is used to define alternation. Alternation is like a condition between multiple expressions. Now, +you may be thinking that character set and alternation works the same way. But the big difference between character set and alternation +is that character set works on character level but alternation works on expression level. For example, the regular expression +`(T|t)he|car` means: uppercase character `T` or lowercase `t`, followed by lowercase character `h`, followed by lowercase character `e` +or lowercase character `c`, followed by lowercase character `a`, followed by lowercase character `r`. + +
+"(T|t)he|car" => The car is parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/fBXyX0/1) + +## 2.7 Escaping special character + +Backslash `\` is used in regular expression to escape the next character. This allows to to specify a symbol as a matching character +including reserved characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching character prepend `\` before it. +For example, the regular expression `.` is used to match any character except newline. Now to match `.` in an input string the regular +expression `(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by lowercase character `a`, followed by lowercase letter +`t`, followed by optional `.` character. + +
+"(f|c|m)at\.?" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/DOc5Nu/1) + +## 2.8 Anchors + +In regular expressions, we use anchors to check if the matching symbol is the starting symbol or ending symbol of the +input string. Anchors are of two types: First type is Caret `^` that check if the matching character is the start +character of the input and the second type is Dollar `$` that checks if matching character is the last character of the +input string. + +### 2.8.1 Caret + +Caret `^` symbol is used to check if matching character is the first character of the input string. If we apply the following regular +expression `^a` (if a is the starting symbol) to input string `abc` it matches `a`. But if we apply regular expression `^b` on above +input string it does not match anything. Because in input string `abc` "b" is not the starting symbol. Let's take a look at another +regular expression `^(T|t)he` which means: uppercase character `T` or lowercase character `t` is the start symbol of the input string, +followed by lowercase character `h`, followed by lowercase character `e`. + +
+"(T|t)he" => The car is parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/5ljjgB/1) + +
+"^(T|t)he" => The car is parked in the garage.
+
+ +[Test the regular expression](https://regex101.com/r/jXrKne/1) + +### 2.8.2 Dollar + +Dollar `$` symbol is used to check if matching character is the last character of the input string. For example, regular expression +`(at\.)$` means: a lowercase character `a`, followed by lowercase character `t`, followed by a `.` character and the matcher +must be end of the string. + +
+"(at\.)" => The fat cat. sat. on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/y4Au4D/1) + +
+"(at\.)$" => The fat cat. sat. on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/t0AkOd/1) + +## 3. Shorthand Character Sets + +Regular expression provides shorthands for the commonly used character sets, which offer convenient shorthands for commonly used +regular expressions. The shorthand character sets are as follows: + +|Shorthand|Description| +|:----:|----| +|.|Any character except new line| +|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`| +|\W|Matches non-alphanumeric characters: `[^\w]`| +|\d|Matches digit: `[0-9]`| +|\D|Matches non-digit: `[^\d]`| +|\s|Matches whitespace character: `[\t\n\f\r\p{Z}]`| +|\S|Matches non-whitespace character: `[^\s]`| + +## 4. Lookaround + +Lookbehind and lookahead sometimes known as lookaround are specific type of ***non-capturing group*** (Use to match the pattern but not +included in matching list). Lookaheads are used when we have the condition that this pattern is preceded or followed by another certain +pattern. For example, we want to get all numbers that are preceded by `$` character from the following input string `$4.44 and $10.88`. +We will use following regular expression `(?<=\$)[0-9\.]*` which means: get all the numbers which contain `.` character and are preceded +by `$` character. Following are the lookarounds that are used in regular expressions: + +|Symbol|Description| +|:----:|----| +|?=|Positive Lookahead| +|?!|Negative Lookahead| +|?<=|Positive Lookbehind| +|? +"[T|t]he(?=\sfat)" => The fat cat sat on the mat. + + +[Test the regular expression](https://regex101.com/r/IDDARt/1) + +### 4.2 Negative Lookahead + +Negative lookahead is used when we need to get all matches from input string that are not followed by a pattern. Negative lookahead +defined same as we define positive lookahead but the only difference is instead of equal `=` character we use negation `!` character +i.e. `(?!...)`. Let's take a look at the following regular expression `[T|t]he(?!\sfat)` which means: get all `The` or `the` words from +input string that are not followed by the word `fat` precedes by a space character. + +
+"[T|t]he(?!\sfat)" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/V32Npg/1) + +### 4.3 Positive Lookbehind + +Positive lookbehind is used to get all the matches that are preceded by a specific pattern. Positive lookbehind is denoted by +`(?<=...)`. For example, the regular expression `(?<=[T|t]he\s)(fat|mat)` means: get all `fat` or `mat` words from input string that +are after the word `The` or `the`. + +
+"(?<=[T|t]he\s)(fat|mat)" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/avH165/1) + +### 4.4 Negative Lookbehind + +Negative lookbehind is used to get all the matches that are not preceded by a specific pattern. Negative lookbehind is denoted by +`(? +"(?<![T|t]he\s)(cat)" => The cat sat on cat. + + +[Test the regular expression](https://regex101.com/r/8Efx5G/1) + +## 5. Flags + +Flags are also called modifiers because they modify the output of a regular expression. These flags can be used in any order or +combination, and are an integral part of the RegExp. + +|Flag|Description| +|:----:|----| +|i|Case insensitive: Sets matching to be case-insensitive.| +|g|Global Search: Search for a pattern throughout the input string.| +|m|Multiline: Anchor meta character works on each line.| + +### 5.1 Case Insensitive + +The `i` modifier is used to perform case-insensitive matching. For example, the regular expression `/The/gi` means: uppercase letter +`T`, followed by lowercase character `h`, followed by character `e`. And at the end of regular expression the `i` flag tells the +regular expression engine to ignore the case. As you can see we also provided `g` flag because we want to search for the pattern in +the whole input string. + +
+"The" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/dpQyf9/1) + +
+"/The/gi" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/ahfiuh/1) + +### 5.2 Global search + +The `g` modifier is used to perform a global match (find all matches rather than stopping after the first match). For example, the +regular expression`/.(at)/g` means: any character except new line, followed by lowercase character `a`, followed by lowercase +character `t`. Because we provided `g` flag at the end of the regular expression now it will find every matches from whole input +string. + +
+"/.(at)/" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/jnk6gM/1) + +
+"/.(at)/g" => The fat cat sat on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/dO1nef/1) + +### 5.3 Multiline + +The `m` modifier is used to perform a multi-line match. As we discussed earlier anchors `(^, $)` are used to check if pattern is +the beginning of the input or end of the input string. But if we want that anchors works on each line we use `m` flag. For example, the +regular expression `/at(.)?$/gm` means: lowercase character `a`, followed by lowercase character `t`, optionally anything except new +line. And because of `m` flag now regular expression engine matches pattern at the end of each line in a string. + +
+"/.at(.)?$/" => The fat
+                cat sat
+                on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/hoGMkP/1) + +
+"/.at(.)?$/gm" => The fat
+                  cat sat
+                  on the mat.
+
+ +[Test the regular expression](https://regex101.com/r/E88WE2/1) + +## Bonus + +* *Positive Integers*: `^\d+$` +* *Negative Integers*: `^-\d+$` +* *US Phone Number*: `^+?[\d\s]{3,}$` +* *US Phone with code*: `^+?[\d\s]+(?[\d\s]{10,}$` +* *Integers*: `^-?\d+$` +* *Username*: `^[\w.]{4,16}$` +* *Alpha-numeric characters*: `^[a-zA-Z0-9]*$` +* *Alpha-numeric characters with spaces*: `^[a-zA-Z0-9 ]*$` +* *Password*: `^(?=^.{6,}$)((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.*$` +* *email*: `^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})*$` +* *IPv4 address*: `^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*$` +* *Lowercase letters only*: `^([a-z])*$` +* *Uppercase letters only*: `^([A-Z])*$` +* *URL*: `^(((http|https|ftp):\/\/)?([[a-zA-Z0-9]\-\.])+(\.)([[a-zA-Z0-9]]){2,4}([[a-zA-Z0-9]\/+=%&_\.~?\-]*))*$` +* *VISA credit card numbers*: `^(4[0-9]{12}(?:[0-9]{3})?)*$` +* *Date (DD/MM/YYYY)*: `^(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}$` +* *Date (MM/DD/YYYY)*: `^(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}$` +* *Date (YYYY/MM/DD)*: `^(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])$` +* *MasterCard credit card numbers*: `^(5[1-5][0-9]{14})*$` +* *Hashtags*: Including hashtags with preceding text (abc123#xyz456) or containing white spaces within square brackets (#[foo bar]) : `\S*#(?:\[[^\]]+\]|\S+)` +* *@mentions*: `\B@[a-z0-9_-]+` +## Contribution + +* Report issues +* Open pull request with improvements +* Spread the word +* Reach out to me directly at ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned) + +## License + +MIT © [Zeeshan Ahmed](mailto:ziishaned@gmail.com)