From 18c6bf5a984bdc73126201858664858a7aa25cd6 Mon Sep 17 00:00:00 2001
From: Tom McAndrew <42588609+tommcandrew@users.noreply.github.com>
Date: Tue, 10 Mar 2020 17:11:31 +0000
Subject: [PATCH] Improve grammar and punctuation
---
README.md | 300 +++++++++++++++++++++++++++---------------------------
1 file changed, 148 insertions(+), 152 deletions(-)
diff --git a/README.md b/README.md
index 0546c6d..6d534fb 100644
--- a/README.md
+++ b/README.md
@@ -33,62 +33,63 @@
## What is Regular Expression?
-> Regular expression is a group of characters or symbols which is used to find a specific pattern from a text.
+> A regular expression is a group of characters or symbols which is used to find a specific pattern in a text.
A regular expression is a pattern that is matched against a subject string from
-left to right. Regular expression is used for replacing a text within a string,
-validating form, extract a substring from a string based upon a pattern match,
-and so much more. The word "Regular expression" is a mouthful, so you will usually
-find the term abbreviated as "regex" or "regexp".
+left to right. Regular expressions are used for replacing text within a string,
+validating forms, extracting a substring from a string based on a pattern match,
+and so much more. The term "regular expression" is a mouthful, so you will usually
+find the term abbreviated to "regex" or "regexp".
Imagine you are writing an application and you want to set the rules for when a
user chooses their username. We want to allow the username to contain letters,
numbers, underscores and hyphens. We also want to limit the number of characters
-in username so it does not look ugly. We use the following regular expression to
-validate a username:
+in the username so it does not look ugly. We can use the following regular expression to
+validate the username:
@@ -169,7 +170,7 @@ followed by a period `.` character.
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
-### 2.2.1 Negated character set
+### 2.2.1 Negated Character Sets
In general, the caret symbol represents the start of the string, but when it is
typed after the opening square bracket it negates the character set. For
@@ -184,14 +185,14 @@ followed by the character `a`, followed by the letter `r`.
## 2.3 Repetitions
-Following meta characters `+`, `*` or `?` are used to specify how many times a
+The meta characters `+`, `*` or `?` are used to specify how many times a
subpattern can occur. These meta characters act differently in different
situations.
### 2.3.1 The Star
-The symbol `*` matches zero or more repetitions of the preceding matcher. The
-regular expression `a*` means: zero or more repetitions of preceding lowercase
+The `*` symbol matches zero or more repetitions of the preceding matcher. The
+regular expression `a*` means: zero or more repetitions of the preceding lowercase
character `a`. But if it appears after a character set or class then it finds
the repetitions of the whole character set. For example, the regular expression
`[a-z]*` means: any number of lowercase letters in a row.
@@ -205,8 +206,8 @@ the repetitions of the whole character set. For example, the regular expression
The `*` symbol can be used with the meta character `.` to match any string of
characters `.*`. The `*` symbol can be used with the whitespace character `\s`
to match a string of whitespace characters. For example, the expression
-`\s*cat\s*` means: zero or more spaces, followed by lowercase character `c`,
-followed by lowercase character `a`, followed by lowercase character `t`,
+`\s*cat\s*` means: zero or more spaces, followed by a lowercase `c`,
+followed by a lowercase `a`, followed by a lowercase `t`,
followed by zero or more spaces.
@@ -217,10 +218,10 @@ followed by zero or more spaces.
### 2.3.2 The Plus
-The symbol `+` matches one or more repetitions of the preceding character. For
-example, the regular expression `c.+t` means: lowercase letter `c`, followed by
-at least one character, followed by the lowercase character `t`. It needs to be
-clarified that `t` is the last `t` in the sentence.
+The `+` symbol matches one or more repetitions of the preceding character. For
+example, the regular expression `c.+t` means: a lowercase `c`, followed by
+at least one character, followed by a lowercase `t`. It needs to be
+clarified that `t` is the last `t` in the sentence.
"c.+t" => The fat cat sat on the mat.
@@ -230,11 +231,10 @@ clarified that `t` is the last `t` in the sentence.
### 2.3.3 The Question Mark
-In regular expression the meta character `?` makes the preceding character
+In regular expressions, the meta character `?` makes the preceding character
optional. This symbol matches zero or one instance of the preceding character.
-For example, the regular expression `[T]?he` means: Optional the uppercase
-letter `T`, followed by the lowercase character `h`, followed by the lowercase
-character `e`.
+For example, the regular expression `[T]?he` means: an optional uppercase
+`T`, followed by a lowercase `h`, followed by a lowercase `e`.
"[T]he" => The car is parked in the garage.
@@ -250,10 +250,10 @@ character `e`.
## 2.4 Braces
-In regular expression braces that are also called quantifiers are used to
+In regular expressions, braces (also called quantifiers) are used to
specify the number of times that a character or a group of characters can be
repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
-2 digits but not more than 3 (characters in the range of 0 to 9).
+2 digits, but not more than 3, ranging from 0 to 9.
"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0.
@@ -262,7 +262,7 @@ repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
[Test the regular expression](https://regex101.com/r/juM86s/1)
We can leave out the second number. For example, the regular expression
-`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma the
+`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma, the
regular expression `[0-9]{3}` means: Match exactly 3 digits.
@@ -277,16 +277,16 @@ regular expression `[0-9]{3}` means: Match exactly 3 digits.
[Test the regular expression](https://regex101.com/r/Sivu30/1)
-## 2.5 Capturing Group
+## 2.5 Capturing Groups
-A capturing group is a group of sub-patterns that is written inside Parentheses
-`(...)`. Like as we discussed before that in regular expression if we put a quantifier
-after a character then it will repeat the preceding character. But if we put quantifier
+A capturing group is a group of sub-patterns that is written inside parentheses
+`(...)`. As discussed before, in regular expressions, if we put a quantifier
+after a character then it will repeat the preceding character. But if we put a quantifier
after a capturing group then it repeats the whole capturing group. For example,
the regular expression `(ab)*` matches zero or more repetitions of the character
-"ab". We can also use the alternation `|` meta character inside capturing group.
-For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
-`g` or `p`, followed by character `a`, followed by character `r`.
+"ab". We can also use the alternation `|` meta character inside a capturing group.
+For example, the regular expression `(c|g|p)ar` means: a lowercase `c`,
+`g` or `p`, followed by `a`, followed by `r`.
"(c|g|p)ar" => The car is parked in the garage.
@@ -294,15 +294,15 @@ For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
-Note that capturing groups do not only match but also capture the characters for use in
-the parent language. The parent language could be python or javascript or virtually any
+Note that capturing groups do not only match, but also capture, the characters for use in
+the parent language. The parent language could be Python or JavaScript or virtually any
language that implements regular expressions in a function definition.
-### 2.5.1 Non-capturing group
+### 2.5.1 Non-Capturing Groups
-A non-capturing group is a capturing group that only matches the characters, but
+A non-capturing group is a capturing group that matches the characters but
does not capture the group. A non-capturing group is denoted by a `?` followed by a `:`
-within parenthesis `(...)`. For example, the regular expression `(?:c|g|p)ar` is similar to
+within parentheses `(...)`. For example, the regular expression `(?:c|g|p)ar` is similar to
`(c|g|p)ar` in that it matches the same characters but will not create a capture group.
@@ -319,13 +319,13 @@ See also [4. Lookaround](#4-lookaround).
In a regular expression, the vertical bar `|` is used to define alternation.
Alternation is like an OR statement between multiple expressions. Now, you may be
-thinking that character set and alternation works the same way. But the big
-difference between character set and alternation is that character set works on
-character level but alternation works on expression level. For example, the
-regular expression `(T|t)he|car` means: either (uppercase character `T` or lowercase
-`t`, followed by lowercase character `h`, followed by lowercase character `e`) OR
-(lowercase character `c`, followed by lowercase character `a`, followed by
-lowercase character `r`). Note that I put the parentheses for clarity, to show that either expression
+thinking that character sets and alternation work the same way. But the big
+difference between character sets and alternation is that character sets work at the
+character level but alternation works at the expression level. For example, the
+regular expression `(T|t)he|car` means: either (an uppercase `T` or a lowercase
+`t`, followed by a lowercase `h`, followed by a lowercase `e`) OR
+(a lowercase `c`, followed by a lowercase `a`, followed by
+a lowercase `r`). Note that I included the parentheses for clarity, to show that either expression
in parentheses can be met and it will match.
@@ -334,17 +334,15 @@ in parentheses can be met and it will match.
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
-## 2.7 Escaping special character
+## 2.7 Escaping Special Characters
-Backslash `\` is used in regular expression to escape the next character. This
-allows us to specify a symbol as a matching character including reserved
-characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching
-character prepend `\` before it.
+A backslash `\` is used in regular expressions to escape the next character. This
+allows us to include reserved characters such as `{ } [ ] / \ + * . $ ^ | ?` as matching characters. To use one of these special characters as a matching character, prepend it with `\`.
-For example, the regular expression `.` is used to match any character except
-newline. Now to match `.` in an input string the regular expression
-`(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by lowercase
-character `a`, followed by lowercase letter `t`, followed by optional `.`
+For example, the regular expression `.` is used to match any character except a
+newline. Now, to match `.` in an input string, the regular expression
+`(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase
+`a`, followed by a lowercase `t`, followed by an optional `.`
character.
@@ -357,20 +355,20 @@ character.
In regular expressions, we use anchors to check if the matching symbol is the
starting symbol or ending symbol of the input string. Anchors are of two types:
-First type is Caret `^` that check if the matching character is the start
-character of the input and the second type is Dollar `$` that checks if matching
+The first type is the caret `^` which checks if the matching character is the first
+character of the input and the second type is the dollar sign `$` which checks if a matching
character is the last character of the input string.
-### 2.8.1 Caret
+### 2.8.1 The Caret
-Caret `^` symbol is used to check if matching character is the first character
-of the input string. If we apply the following regular expression `^a` (if a is
-the starting symbol) to input string `abc` it matches `a`. But if we apply
-regular expression `^b` on above input string it does not match anything.
-Because in input string `abc` "b" is not the starting symbol. Let's take a look
-at another regular expression `^(T|t)he` which means: uppercase character `T` or
-lowercase character `t` is the start symbol of the input string, followed by
-lowercase character `h`, followed by lowercase character `e`.
+The caret symbol `^` is used to check if a matching character is the first character
+of the input string. If we apply the following regular expression `^a` (meaning 'a' must be
+the starting character) to the string `abc`, it will match `a`. But if we apply
+the regular expression `^b` to the above string, it will not match anything,
+because in the string `abc`, the "b" is not the starting character. Let's take a look
+at another regular expression `^(T|t)he` which means: an uppercase `T` or
+a lowercase `t` must be the first character in the string, followed by a
+lowercase `h`, followed by a lowercase `e`.
"(T|t)he" => The car is parked in the garage.
@@ -384,12 +382,12 @@ lowercase character `h`, followed by lowercase character `e`.
[Test the regular expression](https://regex101.com/r/jXrKne/1)
-### 2.8.2 Dollar
+### 2.8.2 The Dollar Sign
-Dollar `$` symbol is used to check if matching character is the last character
-of the input string. For example, regular expression `(at\.)$` means: a
-lowercase character `a`, followed by lowercase character `t`, followed by a `.`
-character and the matcher must be end of the string.
+The dollar sign `$` is used to check if a matching character is the last character
+in the string. For example, the regular expression `(at\.)$` means: a
+lowercase `a`, followed by a lowercase `t`, followed by a `.`
+character and the matcher must be at the end of the string.
"(at\.)" => The fat cat. sat. on the mat.
@@ -405,30 +403,29 @@ character and the matcher must be end of the string.
## 3. Shorthand Character Sets
-Regular expression provides shorthands for the commonly used character sets,
-which offer convenient shorthands for commonly used regular expressions. The
-shorthand character sets are as follows:
+There are a number of convenient shorthands for commonly used character sets/
+regular expressions:
|Shorthand|Description|
|:----:|----|
|.|Any character except new line|
|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
|\W|Matches non-alphanumeric characters: `[^\w]`|
-|\d|Matches digit: `[0-9]`|
-|\D|Matches non-digit: `[^\d]`|
-|\s|Matches whitespace character: `[\t\n\f\r\p{Z}]`|
-|\S|Matches non-whitespace character: `[^\s]`|
+|\d|Matches digits: `[0-9]`|
+|\D|Matches non-digits: `[^\d]`|
+|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
+|\S|Matches non-whitespace characters: `[^\s]`|
-## 4. Lookaround
+## 4. Lookarounds
-Lookbehind and lookahead (also called lookaround) are specific types of
-***non-capturing groups*** (used to match the pattern but not included in matching
-list). Lookarounds are used when we have the condition that this pattern is
-preceded or followed by another certain pattern. For example, we want to get all
-numbers that are preceded by `$` character from the following input string
-`$4.44 and $10.88`. We will use following regular expression `(?<=\$)[0-9\.]*`
-which means: get all the numbers which contain `.` character and are preceded
-by `$` character. Following are the lookarounds that are used in regular
+Lookbehinds and lookaheads (also called lookarounds) are specific types of
+***non-capturing groups*** (used to match a pattern but without including it in the matching
+list). Lookarounds are used when a pattern must be
+preceded or followed by another pattern. For example, imagine we want to get all
+numbers that are preceded by the `$` character from the string
+`$4.44 and $10.88`. We will use the following regular expression `(?<=\$)[0-9\.]*`
+which means: get all the numbers which contain the `.` character and are preceded
+by the `$` character. These are the lookarounds that are used in regular
expressions:
|Symbol|Description|
@@ -438,18 +435,18 @@ expressions:
|?<=|Positive Lookbehind|
|? "(T|t)he(?=\sfat)" => The fat cat sat on the mat.
@@ -457,15 +454,14 @@ or `the` which are followed by the word `fat`.
[Test the regular expression](https://regex101.com/r/IDDARt/1)
-### 4.2 Negative Lookahead
+### 4.2 Negative Lookaheads
-Negative lookahead is used when we need to get all matches from input string
-that are not followed by a pattern. Negative lookahead is defined same as we define
-positive lookahead but the only difference is instead of equal `=` character we
-use negation `!` character i.e. `(?!...)`. Let's take a look at the following
+Negative lookaheads are used when we need to get all matches from an input string
+that are not followed by a certain pattern. A negative lookahead is written the same way as a
+positive lookahead. The only difference is, instead of an equals sign `=`, we
+use an exclamation mark `!` to indicate negation i.e. `(?!...)`. Let's take a look at the following
regular expression `(T|t)he(?!\sfat)` which means: get all `The` or `the` words
-from input string that are not followed by the word `fat` precedes by a space
-character.
+from the input string that are not followed by a space character and the word `fat`.
"(T|t)he(?!\sfat)" => The fat cat sat on the mat.
@@ -473,12 +469,12 @@ character.
[Test the regular expression](https://regex101.com/r/V32Npg/1)
-### 4.3 Positive Lookbehind
+### 4.3 Positive Lookbehinds
-Positive lookbehind is used to get all the matches that are preceded by a
-specific pattern. Positive lookbehind is denoted by `(?<=...)`. For example, the
+Positive lookbehinds are used to get all the matches that are preceded by a
+specific pattern. Positive lookbehinds are written `(?<=...)`. For example, the
regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words
-from input string that are after the word `The` or `the`.
+from the input string that come after the word `The` or `the`.
"(?<=(T|t)he\s)(fat|mat)" => The fat cat sat on the mat.
@@ -486,11 +482,11 @@ from input string that are after the word `The` or `the`.
[Test the regular expression](https://regex101.com/r/avH165/1)
-### 4.4 Negative Lookbehind
+### 4.4 Negative Lookbehinds
-Negative lookbehind is used to get all the matches that are not preceded by a
-specific pattern. Negative lookbehind is denoted by `(?
@@ -507,17 +503,17 @@ integral part of the RegExp.
|Flag|Description|
|:----:|----|
-|i|Case insensitive: Sets matching to be case-insensitive.|
-|g|Global Search: Search for a pattern throughout the input string.|
-|m|Multiline: Anchor meta character works on each line.|
+|i|Case insensitive: Match will be case-insensitive.|
+|g|Global Search: Match all instances, not just the first.|
+|m|Multiline: Anchor meta characters work on each line.|
### 5.1 Case Insensitive
The `i` modifier is used to perform case-insensitive matching. For example, the
-regular expression `/The/gi` means: uppercase letter `T`, followed by lowercase
-character `h`, followed by character `e`. And at the end of regular expression
+regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase
+`h`, followed by an `e`. And at the end of the regular expression
the `i` flag tells the regular expression engine to ignore the case. As you can
-see we also provided `g` flag because we want to search for the pattern in the
+see, we also provided the `g` flag because we want to search for the pattern in the
whole input string.
@@ -532,13 +528,13 @@ whole input string.
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
-### 5.2 Global search
+### 5.2 Global Search
-The `g` modifier is used to perform a global match (find all matches rather than
+The `g` modifier is used to perform a global match (finds all matches rather than
stopping after the first match). For example, the regular expression`/.(at)/g`
-means: any character except new line, followed by lowercase character `a`,
-followed by lowercase character `t`. Because we provided `g` flag at the end of
-the regular expression now it will find all matches in the input string, not just the first one (which is the default behavior).
+means: any character except a new line, followed by a lowercase `a`,
+followed by a lowercase `t`. Because we provided the `g` flag at the end of
+the regular expression, it will now find all matches in the input string, not just the first one (which is the default behavior).
"/.(at)/" => The fat cat sat on the mat.
@@ -554,12 +550,12 @@ the regular expression now it will find all matches in the input string, not jus
### 5.3 Multiline
-The `m` modifier is used to perform a multi-line match. As we discussed earlier
-anchors `(^, $)` are used to check if pattern is the beginning of the input or
-end of the input string. But if we want that anchors works on each line we use
-`m` flag. For example, the regular expression `/at(.)?$/gm` means: lowercase
-character `a`, followed by lowercase character `t`, optionally anything except
-new line. And because of `m` flag now regular expression engine matches pattern
+The `m` modifier is used to perform a multi-line match. As we discussed earlier,
+anchors `(^, $)` are used to check if a pattern is at the beginning of the input or
+the end. But if we want the anchors to work on each line, we use
+the `m` flag. For example, the regular expression `/at(.)?$/gm` means: a lowercase
+`a`, followed by a lowercase `t` and, optionally, anything except
+a new line. And because of the `m` flag, the regular expression engine now matches patterns
at the end of each line in a string.
@@ -578,9 +574,9 @@ at the end of each line in a string.
[Test the regular expression](https://regex101.com/r/E88WE2/1)
-## 6. Greedy vs lazy matching
-By default regex will do greedy matching which means it will match as long as
-possible. We can use `?` to match in lazy way which means as short as possible.
+## 6. Greedy vs Lazy Matching
+By default, a regex will perform a greedy match, which means the match will be as long as
+possible. We can use `?` to match in a lazy way, which means the match should be as short as possible.
"/(.*at)/" => The fat cat sat on the mat.
@@ -597,7 +593,7 @@ possible. We can use `?` to match in lazy way which means as short as possible.
## Contribution
-* Open pull request with improvements
+* Open a pull request with improvements
* Discuss ideas in issues
* Spread the word
* Reach out with any feedback [](https://twitter.com/ziishaned)
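Reviewer note on the "Greedy vs Lazy Matching" wording in this patch: the behaviour it describes can be checked with a minimal sketch. This assumes Python's built-in `re` module as the parent language (the README does not prescribe one), and the variable names are illustrative only.

```python
import re

# Sample sentence used throughout the README examples.
text = "The fat cat sat on the mat."

# Greedy: `.*` consumes as much as possible, then backtracks to the last "at".
greedy = re.search(r"(.*at)", text).group(1)

# Lazy: `.*?` consumes as little as possible, stopping at the first "at".
lazy = re.search(r"(.*?at)", text).group(1)

print(greedy)  # The fat cat sat on the mat
print(lazy)    # The fat
```

The only difference between the two patterns is the `?` after `*`, which matches the README's description of switching from the longest to the shortest possible match.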