Merge remote-tracking branch 'zeeshanu/master'

# Conflicts:
#	README.md
GrayLand 2017-09-01 09:08:06 +08:00
commit 459095b917
14 changed files with 2873 additions and 224 deletions


@@ -6,23 +6,28 @@
## Translations:
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
* [Turkish](README-tr.md)
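## What is a regular expression?
> A regular expression is a special piece of text made up of letters and symbols; it can be used to find the strings in a text that match the format you want.
A regular expression is a pattern that is matched against a subject string from left to right.
For example, "Regular expression" is a complete term, but we often use the abbreviations "regex" or "regexp".
Regular expressions can be used to replace strings in text, validate forms, extract strings, and so on.
"Regular expression" is a bit of a mouthful, so we often use the abbreviations "regex" or "regexp".
A regular expression can replace strings within a base string according to a matching pattern, validate forms, extract strings, and so on.
Imagine you are writing an application and want to set rules for usernames: a username may contain letters, numbers, underscores and hyphens, and its length is limited so that it does not look ugly.
We use the following regular expression to validate a username: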
<br/><br/>
<p align="center">
<img src="https://i.imgur.com/Pq5Llat.png" alt="Regular expression">
<img src="./img/regexp-en.png" alt="Regular expression">
</p>
The regular expression above accepts `john_doe`, `jo-hn_doe` and `john12_as`.
@@ -67,7 +72,7 @@
For example, the regular expression `the` expresses one rule: the letter `t`, followed by `h`, followed by `e`.
<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
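[Practice online](https://regex101.com/r/dmRygT/1)
@@ -84,7 +89,7 @@
## 2. Meta characters
Regular expressions rely mainly on meta characters.
Meta characters do not stand for their literal selves; each has a special meaning. Some meta characters take on a special meaning when written inside square brackets. The meta characters are introduced below:
|Meta character|Description|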
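@@ -104,7 +109,7 @@
## 2.1 The dot operator `.`
`.` is the simplest example of a meta character.
`.` matches any single character except a line break.
For example, the expression `.ar` matches any one character followed by `a` and then `r`.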
@@ -150,7 +155,7 @@
## 2.3 Repetitions
The meta characters `+`, `*` or `?` follow a subpattern and specify how many times it may occur.
These meta characters behave differently in different situations.
### 2.3.1 `*`
@@ -204,7 +209,7 @@
## 2.4 `{}`
In a regular expression, `{}` is a quantifier, used to specify how many times a character or a group of characters may repeat.
For example, the expression `[0-9]{2,3}` matches 2 to 3 digits in the range 0 to 9.
<pre>
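@@ -216,7 +221,7 @@
We can omit the second number.
For example, `[0-9]{2,}` matches at least two digits in the range 0 to 9.
If the comma is omitted as well, the quantifier specifies a fixed number of repetitions.
For example, `[0-9]{3}` matches exactly 3 digits.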
<pre>
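@@ -259,7 +264,7 @@
The backslash `\` is used in an expression to escape the character that immediately follows it. This lets you specify the special characters `{ } [ ] / \ + * . $ ^ | ?` as literals. To match one of these special characters, put a backslash `\` in front of it.
For example, `.` matches any character except a line break. To match a literal `.` in a sentence, write `\.`.
For example, `.` matches any character except a line break. To match a literal `.` in a sentence, write `\.`. In the example below, `\.?` optionally matches a `.`.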
<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
@@ -275,7 +280,7 @@
`^` checks whether the match occurs at the start of the input string.
For example, applying the expression `^a` to `abc` yields the match `a`. Using `^b` instead matches nothing, because the string `abc` does not start with `b`.
For example, `^(T|t)he` matches strings that start with `The` or `the`.
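@@ -322,6 +327,12 @@
|\D|Matches non-digits: `[^\d]`|
|\s|Matches all whitespace characters, equivalent to: `[\t\n\f\r\p{Z}]`|
|\S|Matches all non-whitespace characters: `[^\s]`|
|\f|Matches a form feed|
|\n|Matches a line feed|
|\r|Matches a carriage return|
|\t|Matches a tab|
|\v|Matches a vertical tab|
|\p|Matches CR/LF (equivalent to `\r\n`); used to match DOS line terminators|
## 4. Lookaround (lookahead and lookbehind)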
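@@ -344,14 +355,14 @@
The positive lookahead `?=...` asserts that the first part of the expression must be followed by the expression defined in `?=...`.
The returned match contains only the text matched by the first part of the expression.
A positive lookahead is defined with parentheses `()`, with a question mark and an equals sign inside them: `(?=...)`.
The lookahead expression is written after the equals sign inside the parentheses.
For example, the expression `[T|t]he(?=\sfat)` matches `The` or `the`; inside the parentheses we define the positive lookahead `(?=\sfat)`, so the `The` or `the` must be immediately followed by a space and `fat`.
For example, the expression `(T|t)he(?=\sfat)` matches `The` or `the`; inside the parentheses we define the positive lookahead `(?=\sfat)`, so the `The` or `the` must be immediately followed by a space and `fat`.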
<pre>
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
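[Practice online](https://regex101.com/r/IDDARt/1)
@@ -359,12 +370,12 @@
### 4.2 `?!...` Negative lookahead
The negative lookahead `?!` filters the matches, keeping only those that are not followed by the defined pattern.
A negative lookahead is defined the same way as a positive lookahead, except that `=` is replaced by `!`, giving `(?!...)`.
The expression `[T|t]he(?!\sfat)` matches `The` or `the` when it is not followed by a space and `fat`.
The expression `(T|t)he(?!\sfat)` matches `The` or `the` when it is not followed by a space and `fat`.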
<pre>
"[T|t]he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
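[Practice online](https://regex101.com/r/V32Npg/1)
@@ -372,10 +383,10 @@
### 4.3 `?<= ...` Positive lookbehind
The positive lookbehind, written `(?<=...)`, filters the matches, keeping only those that are preceded by the defined pattern.
For example, the expression `(?<=[T|t]he\s)(fat|mat)` matches `fat` or `mat` when it is preceded by `The` or `the`.
For example, the expression `(?<=(T|t)he\s)(fat|mat)` matches `fat` or `mat` when it is preceded by `The` or `the`.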
<pre>
"(?<=[T|t]he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Practice online](https://regex101.com/r/avH165/1)
@@ -386,7 +397,7 @@
For example, the expression `(?<!(T|t)he\s)(cat)` matches `cat` when it is not preceded by `The` or `the`.
<pre>
"(?&lt;![T|t]he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
"(?&lt;!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Practice online](https://regex101.com/r/8Efx5G/1)
@@ -421,7 +432,7 @@
### 5.2 Global search
The `g` modifier performs a global match, that is, it returns all matches rather than only the first.
For example, the expression `/.(at)/g` searches for any character (except a line break) followed by `at`, and returns every match.
<pre>
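@@ -438,7 +449,7 @@
### 5.3 Multiline modifier
The multiline modifier `m` performs a multi-line match.
As introduced earlier, the anchors `(^,$)` check whether the pattern is at the start or the end of the input string. If we want them to apply at the start and end of every line, we need the multiline modifier `m`.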
@@ -460,27 +471,6 @@
[Practice online](https://regex101.com/r/E88WE2/1)
## Bonus
* *Positive integers*: `^\d+$`
* *Negative integers*: `^-\d+$`
* *Phone country code*: `^+?[\d\s]{3,}$`
* *Phone number*: `^+?[\d\s]+(?[\d\s]{10,}$`
* *Integers*: `^-?\d+$`
* *Username*: `^[\w\d_.]{4,16}$`
* *Alphanumeric characters*: `^[a-zA-Z0-9]*$`
* *Alphanumeric characters with spaces*: `^[a-zA-Z0-9 ]*$`
* *Password*: `^(?=^.{6,}$)((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.*$`
* *Email*: `^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})*$`
* *IPv4 address*: `^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*$`
* *Lowercase letters only*: `^([a-z])*$`
* *Uppercase letters only*: `^([A-Z])*$`
* *URL*: `^(((http|https|ftp):\/\/)?([[a-zA-Z0-9]\-\.])+(\.)([[a-zA-Z0-9]]){2,4}([[a-zA-Z0-9]\/+=%&_\.~?\-]*))*$`
* *VISA credit card numbers*: `^(4[0-9]{12}(?:[0-9]{3})?)*$`
* *Date (MM/DD/YYYY)*: `^(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}$`
* *Date (YYYY/MM/DD)*: `^(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])$`
* *MasterCard credit card numbers*: `^(5[1-5][0-9]{14})*$`
## Contribution
* Report issues

464
README-es.md Normal file

@@ -0,0 +1,464 @@
<br/>
<p align="center">
<img src="https://i.imgur.com/bYwl7Vf.png" alt="Learn Regex">
</p><br/>
## Translations:
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
* [Turkish](README-tr.md)
## What is a regular expression?
> A regular expression is a group of characters or symbols used to find a specific pattern within a text.
A regular expression is a pattern that is matched against a subject string from left to right. The term "regular expression" can also be written as "regex" or "regexp". Regular expressions are used to replace text within a string, validate formats, extract a substring from a string based on a pattern match, and much more.
Imagine that you are writing an application and want to set rules for when users choose their username. We want to allow the username to contain letters, numbers, underscores and hyphens. We also want to limit the number of characters in the username so it does not look ugly. We use the following regular expression to validate a username:
<br/><br/>
<p align="center">
<img src="./img/regexp-es.png" alt="Expresión regular">
</p>
The regular expression above accepts the strings `john_doe`, `jo-hn_doe` and `john12_as`. It does not match the username `Jo`, because that string contains an uppercase letter and is too short.
## Table of Contents
- [Introduction](#1-introduccion)
- [Meta characters](#2-meta-caracteres)
  - [Full stop](#21-full-stop)
  - [Character set](#22-conjunto-de-caracteres)
    - [Negated character set](#221-conjunto-de-caracteres-negado)
  - [Repetitions](#23-repeticiones)
    - [The star](#231-asterisco)
    - [The plus](#232-signo-mas)
    - [The question mark](#233-signo-de-pregunta)
  - [Braces](#24-llaves)
  - [Character group](#25-grupo-de-caracteres)
  - [Alternation](#26-alternacia)
  - [Escaping special characters](#27-caracteres-especiales-de-escape)
  - [Anchors](#28-anclas)
    - [Caret](#281-simbolo-de-intercalacion)
    - [Dollar](#282-simbolo-dolar)
- [Shorthand character sets](#3-conjunto-de-caracteres-abreviados)
- [Lookaround](#4-mirar-alrededor)
  - [Positive lookahead](#41-mirar-hacia-delante-positivo)
  - [Negative lookahead](#41-mirar-hacia-delaten-negativo)
  - [Positive lookbehind](#41-mirar-hacia-atras-positivo)
  - [Negative lookbehind](#41-mirar-hacia-atras-negativo)
- [Flags](#5-banderas)
  - [Case insensitive](#51-mayusculas-y-minusculas)
  - [Global search](#52-busqueda-global)
  - [Multiline](#53-multilinea)
- [Bonus](#bonus)
## 1. Introduction
A regular expression is just a pattern of characters that we use to perform a search in a text. For example, the regular expression `the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/dmRygT/1)
The regular expression `123` matches the string `123`. The regular expression is matched against an input string by comparing each character in the regular expression to each character in the input string, one after another. Regular expressions are normally case-sensitive, so the regular expression `The` would not match the string `the`.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/1paXsy/1)
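The literal, case-sensitive matching described above can be tried out in code. The sketch below assumes Python's standard `re` module, which is our choice for illustration and not something the README prescribes; the variable names are hypothetical.

```python
import re

text = "The fat cat sat on the mat."

# A literal pattern matches the same characters in the same order,
# and the comparison is case-sensitive by default.
lowercase_hits = re.findall(r"the", text)
capitalized_hits = re.findall(r"The", text)

print(lowercase_hits)    # ['the']
print(capitalized_hits)  # ['The']
```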
## 2. Meta characters
Meta characters are the building blocks of regular expressions. Meta characters do not stand for themselves but are interpreted in some special way. Some meta characters have a special meaning when written inside square brackets. The meta characters are as follows:
|Meta character|Description|
|:----:|----|
|.|Period. Matches any single character except a line break.|
|[ ]|Character class. Matches any character contained between the square brackets.|
|[^ ]|Negated character class. Matches any character that is not contained between the square brackets.|
|*|Matches 0 or more repetitions of the preceding symbol.|
|+|Matches 1 or more repetitions of the preceding symbol.|
|?|Makes the preceding symbol optional.|
|{n,m}|Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.|
|(xyz)|Character group. Matches the characters xyz in that exact order.|
|&#124;|Alternation. Matches either the characters before or the characters after the symbol.|
|&#92;|Escapes the next character. This allows you to match the reserved characters <code>[ ] ( ) { } . * + ? ^ $ \ &#124;</code>|
|^|Matches the beginning of the input.|
|$|Matches the end of the input.|
## 2.1 Full stop
The full stop `.` is the simplest example of a meta character. The meta character `.` matches any single character. It will not match carriage return or newline characters. For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the letter `r`.
<pre>
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
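As a rough check of the `.ar` example, here is a short sketch assuming Python's `re` module (an assumption of ours; other engines may treat line terminators slightly differently). Only the newline case is exercised, since that is the behavior we are sure of in Python.

```python
import re

sentence = "The car parked in the garage."

# `.` stands for exactly one character, so `.ar` finds three-character hits.
dot_hits = re.findall(r".ar", sentence)
print(dot_hits)  # ['car', 'par', 'gar']

# By default `.` does not match a newline, so nothing is found here.
newline_hits = re.findall(r".ar", "\nar")
print(newline_hits)  # []
```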
## 2.2 Character set
Character sets are also called character classes. Square brackets are used to specify character sets. Use a hyphen inside a character set to specify a range of characters. The order of the characters inside the square brackets does not matter. For example, the regular expression `[Tt]he` means: an uppercase `T` or a lowercase `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
However, a period inside a character set means a literal period. The regular expression `ar[.]` means: a lowercase `a`, followed by the letter `r`, followed by a period `.` character.
<pre>
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
### 2.2.1 Negated character set
In general, the caret symbol represents the start of the string, but when it is typed right after the opening square bracket it negates the character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the character `a`, followed by the letter `r`.
<pre>
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/nNNlq3/1)
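The three character-set examples from this section can be run side by side. This is a minimal sketch assuming Python's `re` module; the variable names are ours.

```python
import re

sentence = "The car parked in the garage."

# A set matches exactly one character out of those listed.
either_case = re.findall(r"[Tt]he", sentence)
print(either_case)  # ['The', 'the']

# A leading caret inside the brackets negates the set.
not_after_c = re.findall(r"[^c]ar", sentence)
print(not_after_c)  # ['par', 'gar']

# Inside a set, the period is a literal period.
literal_dot = re.findall(r"ar[.]", "A garage is a good place to park a car.")
print(literal_dot)  # ['ar.']
```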
## 2.3 Repetitions
The meta characters `+`, `*` or `?` are used to specify how many times a subpattern can occur. These meta characters act differently in different situations.
### 2.3.1 The star
The symbol `*` matches zero or more repetitions of the preceding matcher. The regular expression `a*` means: zero or more repetitions of the preceding lowercase character `a`. But if it appears after a character set or class, it matches repetitions of the whole character set. For example, the regular expression `[a-z]*` means: any number of lowercase letters in a row.
<pre>
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
</pre>
[Test the regular expression](https://regex101.com/r/7m8me5/1)
The `*` symbol can be used with the meta character `.` to match any string of characters: `.*`. The `*` symbol can be used with the whitespace character `\s` to match a string of whitespace characters. For example, the expression `\s*cat\s*` means: zero or more spaces, followed by a lowercase `c`, followed by a lowercase `a`, followed by a lowercase `t`, followed by zero or more spaces.
<pre>
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the <a href="#learn-regex">con<strong>cat</strong>enation</a>.
</pre>
[Test the regular expression](https://regex101.com/r/gGrwuz/1)
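One practical consequence of "zero or more" is that `*` can also match the empty string. The sketch below, assuming Python's `re` module, filters those empty matches out so that only the visible runs remain; the filtering step is our addition, not part of the README.

```python
import re

sentence = "The car parked in the garage #21."

# `*` also allows zero repetitions, so `[a-z]*` yields empty matches
# wherever no lowercase letter is present; drop them for display.
runs = [m for m in re.findall(r"[a-z]*", sentence) if m]
print(runs)  # ['he', 'car', 'parked', 'in', 'the', 'garage']

# Optional whitespace on both sides of the literal `cat`.
padded = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")
print(padded)  # [' cat ', 'cat']
```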
### 2.3.2 The plus
The symbol `+` matches one or more repetitions of the preceding character. For example, the regular expression `c.+t` means: a lowercase letter `c`, followed by at least one character, followed by a lowercase `t`.
<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/Dzf9Aa/1)
### 2.3.3 The question mark
In regular expressions the meta character `?` makes the preceding character optional. This symbol matches zero or one instance of the preceding character. For example, the regular expression `[T]?he` means: an optional uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/cIg9zm/1)
<pre>
"[T]?he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in t<a href="#learn-regex"><strong>he</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/kPpO2x/1)
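The `+` and `?` examples can be reproduced with a few lines of code. This is a sketch assuming Python's `re` module; it also shows why `c.+t` covers most of the sentence: the quantifier is greedy and extends to the last `t` that still allows a match.

```python
import re

# `+` is greedy: `.+` runs as far as it can while still ending on a `t`.
greedy = re.search(r"c.+t", "The fat cat sat on the mat.")
print(greedy.group(0))  # cat sat on the mat

# `?` makes the `T` optional, so a bare `he` also matches.
optional_t = re.findall(r"[T]?he", "The car is parked in the garage.")
print(optional_t)  # ['The', 'he']
```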
## 2.4 Braces
In regular expressions, braces, also called quantifiers, are used to specify the number of times that a character or a group of characters can be repeated. For example, the regular expression `[0-9]{2,3}` means: match at least 2 digits but not more than 3 (characters in the range 0 to 9).
<pre>
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/juM86s/1)
We can leave out the second number. For example, the regular expression `[0-9]{2,}` means: match 2 or more digits. If we also remove the comma, the regular expression `[0-9]{3}` means: match exactly 3 digits.
<pre>
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/Gdy4w5/1)
<pre>
"[0-9]{3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to 10.0.
</pre>
[Test the regular expression](https://regex101.com/r/Sivu30/1)
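The three brace forms differ only in their bounds, which is easy to see when they are applied to the same sentence. A minimal sketch, assuming Python's `re` module:

```python
import re

sentence = "The number was 9.9997 but we rounded it off to 10.0."

# {2,3}: between two and three digits.
two_to_three = re.findall(r"[0-9]{2,3}", sentence)
print(two_to_three)  # ['999', '10']

# {2,}: two digits or more, with no upper bound.
two_or_more = re.findall(r"[0-9]{2,}", sentence)
print(two_or_more)  # ['9997', '10']

# {3}: exactly three digits.
exactly_three = re.findall(r"[0-9]{3}", sentence)
print(exactly_three)  # ['999']
```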
## 2.5 Character group
A character group is a group of sub-patterns written inside parentheses `(...)`. As discussed before, a quantifier placed after a character repeats the preceding character. But if we put a quantifier after a character group, it repeats the whole character group. For example, the regular expression `(ab)*` matches zero or more repetitions of the character sequence `ab`. We can also use the alternation meta character `|` inside a character group. For example, the regular expression `(c|g|p)ar` means: a lowercase `c`, `g` or `p`, followed by the character `a`, followed by the character `r`.
<pre>
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
## 2.6 Alternation
In regular expressions the vertical bar `|` is used to define alternation. Alternation is like a condition between multiple expressions. Now, you may be thinking that character sets and alternation work the same way. But the big difference between them is that a character set works at the character level whereas alternation works at the expression level. For example, the regular expression `(T|t)he|car` means: an uppercase `T` or a lowercase `t`, followed by a lowercase `h`, followed by a lowercase `e`; or a lowercase `c`, followed by a lowercase `a`, followed by a lowercase `r`.
<pre>
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
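Groups and alternation can be checked together. The sketch below assumes Python's `re` module; `full_matches` is a hypothetical helper of ours, introduced because `re.findall` returns the group contents rather than the whole match when a pattern contains a group.

```python
import re

def full_matches(pattern, text):
    """Return the whole text of every match (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

sentence = "The car is parked in the garage."

# A group with alternation inside: one of c, g or p, then "ar".
grouped = full_matches(r"(c|g|p)ar", sentence)
print(grouped)  # ['car', 'par', 'gar']

# Alternation at the expression level: "The"/"the" or "car".
alternated = full_matches(r"(T|t)he|car", sentence)
print(alternated)  # ['The', 'car', 'the']

# A quantifier after a group repeats the whole group.
whole_group = re.fullmatch(r"(ab)*", "ababab") is not None
partial_group = re.fullmatch(r"(ab)*", "aba") is not None
print(whole_group, partial_group)  # True False
```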
## 2.7 Escaping special characters
The backslash `\` is used in regular expressions to escape the next character. This allows us to specify a symbol as a matching character, including the reserved characters `{}[]/\+*.^|?`. For example, the regular expression `.` is used to match any character except a line break. To match a literal `.` in an input string, the regular expression `(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase `a`, followed by a lowercase `t`, followed by an optional `.` character.
<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/DOc5Nu/1)
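The difference between an escaped and an unescaped period shows up clearly on a small input. A sketch assuming Python's `re` module; the short test string `"a.b ab"` is a made-up example, not taken from the README.

```python
import re

# The README's example: an optional literal period after fat/cat/mat.
sentence = "The fat cat sat on the mat."
with_optional_dot = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", sentence)]
print(with_optional_dot)  # ['fat', 'cat', 'mat.']

# Unescaped, `.` matches any character; escaped, only a literal period.
unescaped = re.findall(r"a.", "a.b ab")
escaped = re.findall(r"a\.", "a.b ab")
print(unescaped)  # ['a.', 'ab']
print(escaped)    # ['a.']
```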
## 2.8 Anchors
In regular expressions, we use anchors to check whether the matching symbol is the first or the last symbol of the input string. There are two types of anchors: the first is the caret `^`, which checks whether the matching character is the first character of the input, and the second is the dollar `$`, which checks whether the matching character is the last character of the input string.
### 2.8.1 Caret
The caret `^` symbol is used to check whether the matching character is the first character of the input string. If we apply the regular expression `^a` (is `a` the first symbol?) to the input string `abc`, it matches `a`. But if we apply the regular expression `^b` to the same input string, it matches nothing, because in the input string `abc` the character `b` is not the first symbol. Let's take a look at another regular expression, `^(T|t)he`, which means: an uppercase `T` or a lowercase `t` is the first symbol of the input string, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/5ljjgB/1)
<pre>
"^(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/jXrKne/1)
### 2.8.2 Dollar
The dollar `$` symbol is used to check whether the matching character is the last character of the input string. For example, the regular expression `(at\.)$` means: a lowercase `a`, followed by a lowercase `t`, followed by a `.` character, and the match must be at the end of the string.
<pre>
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/y4Au4D/1)
<pre>
"(at\.)$" => The fat cat. sat. on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/t0AkOd/1)
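Both anchors narrow a pattern down to a single position, which the following sketch tries to make visible by comparing anchored and unanchored versions. It assumes Python's `re` module with no flags set.

```python
import re

# `^` pins the match to the start of the input.
print(re.search(r"^a", "abc").group(0))  # a
caret_miss = re.search(r"^b", "abc")
print(caret_miss)  # None

sentence = "The fat cat. sat. on the mat."

# Without `$` every "at." matches; with `$` only the final one does.
unanchored = re.findall(r"(at\.)", sentence)
anchored = re.findall(r"(at\.)$", sentence)
print(len(unanchored), len(anchored))  # 3 1
```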
## 3. Shorthand character sets
Regular expressions provide shorthands for commonly used character sets.
The shorthand character sets are as follows:
|Shorthand|Description|
|:----:|----|
|.|Any character except a line break|
|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
|\W|Matches non-alphanumeric characters: `[^\w]`|
|\d|Matches digits: `[0-9]`|
|\D|Matches non-digits: `[^\d]`|
|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
|\S|Matches non-whitespace characters: `[^\s]`|
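A few of the shorthands from the table can be tried on a small made-up string. This sketch assumes Python's `re` module; note that in Python 3 `\w` and `\d` are Unicode-aware on `str` patterns unless `re.ASCII` is passed, and that the `\p{Z}` notation from the table is not supported by this module.

```python
import re

# A hypothetical sample string with a space, a tab, digits and punctuation.
sample = "Room 42,\tfloor_7 !"

words = re.findall(r"\w+", sample)   # runs of letters, digits and underscore
digits = re.findall(r"\d", sample)   # individual digits
spaces = re.findall(r"\s", sample)   # whitespace characters

print(words)   # ['Room', '42', 'floor_7']
print(digits)  # ['4', '2', '7']
print(spaces)  # [' ', '\t', ' ']
```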
## 4. Lookaround
Lookahead and lookbehind (together called lookaround) are specific types of
***non-capturing group*** (used to match the pattern but not included in the
matching list). Lookarounds are used when we have the condition that a pattern is
preceded or followed by another certain pattern. For example, we want to get all
the numbers that are preceded by the `$` character from the input string
`$4.44 and $10.88`. We will use the regular expression `(?<=\$)[0-9\.]*`,
which means: get all the numbers that contain the `.` character and
are preceded by the `$` character. The following lookarounds are used in
regular expressions:
|Symbol|Description|
|:----:|----|
|?=|Positive Lookahead|
|?!|Negative Lookahead|
|?<=|Positive Lookbehind|
|?<!|Negative Lookbehind|
### 4.1 Positive lookahead
The positive lookahead asserts that the first part of the expression must be
followed by the lookahead expression. The returned match only contains the text
matched by the first part of the expression. To define a positive lookahead,
parentheses are used. Within those parentheses, a question mark with an
equals sign is used like this: `(?=...)`. The lookahead expression is
written after the equals sign inside the parentheses. For example, the
regular expression `[T|t]he(?=\sfat)` means: match a lowercase `t` or an
uppercase `T`, followed by the letter `h`, followed by the letter `e`.
In parentheses we define a positive lookahead that tells the regular expression
engine to match `The` or `the` only when it is followed by the word `fat`.
<pre>
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/IDDARt/1)
### 4.2 Negative lookahead
The negative lookahead is used when we need to get all matches from the input
string that are not followed by a certain pattern. A negative lookahead is
defined the same way as a positive lookahead; the only difference is that
instead of the equals character `=` we use the negation character `!`, i.e.
`(?!...)`. Let's take a look at the regular expression `[T|t]he(?!\sfat)`,
which means: get all `The` or `the` words from the input string that are not followed by the word `fat` preceded by a space character.
<pre>
"[T|t]he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/V32Npg/1)
### 4.3 Positive lookbehind
The positive lookbehind is used to get all the matches that are preceded
by a specific pattern. The positive lookbehind is denoted by `(?<=...)`.
For example, the regular expression `(?<=[T|t]he\s)(fat|mat)` means: get all `fat` or `mat` words
from the input string that come after the word `The` or `the`.
<pre>
"(?<=[T|t]he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/avH165/1)
### 4.4 Negative lookbehind
The negative lookbehind is used to get all the matches that are not
preceded by a specific pattern. The negative lookbehind is denoted by
`(?<!...)`. For example, the regular expression `(?<![T|t]he\s)(cat)` means:
get all `cat` words from the input string that do not come after
the word `The` or `the`.
<pre>
"(?&lt;![T|t]he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
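The four lookaround forms can be compared on the README's own sentences. The sketch below assumes Python's `re` module, whose lookbehind requires a fixed-width pattern (which `[T|t]he\s` satisfies); `spans` is a hypothetical helper that records where each match starts, since the lookaround text itself never appears in the match.

```python
import re

def spans(pattern, text):
    """List (start index, matched text) pairs; a hypothetical helper."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]

sentence = "The fat cat sat on the mat."

# Lookahead: test what follows the match without consuming it.
positive_ahead = spans(r"[T|t]he(?=\sfat)", sentence)
negative_ahead = spans(r"[T|t]he(?!\sfat)", sentence)
print(positive_ahead)  # [(0, 'The')]
print(negative_ahead)  # [(19, 'the')]

# Lookbehind: test what precedes the match without consuming it.
positive_behind = spans(r"(?<=[T|t]he\s)(fat|mat)", sentence)
negative_behind = spans(r"(?<![T|t]he\s)(cat)", "The cat sat on cat.")
print(positive_behind)  # [(4, 'fat'), (23, 'mat')]
print(negative_behind)  # [(15, 'cat')]
```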
## 5. Flags
Flags are also called modifiers because they modify the output
of a regular expression. These flags can be used in any order
or combination, and are an integral part of RegExp.
|Flag|Description|
|:----:|----|
|i|Case insensitive: sets matching to be case-insensitive.|
|g|Global search: search for a pattern throughout the input string.|
|m|Multiline: anchor meta characters work on each line.|
### 5.1 Case insensitive
The `i` modifier is used to perform case-insensitive matching.
For example, the regular expression `/The/gi` means: an uppercase
`T`, followed by a lowercase `h`, followed by the character `e`. The
`i` flag at the end of the regular expression tells the regular expression
engine to ignore case. As you can see, we also provided the `g` flag
because we want to search for the pattern in the whole input string.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/dpQyf9/1)
<pre>
"/The/gi" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
### 5.2 Global search
The `g` modifier is used to perform a global match
(find all matches rather than stopping after the first match).
For example, the regular expression `/.(at)/g` means: any character
except a line break, followed by a lowercase `a`, followed by a lowercase
`t`. Because we provided the `g` flag at the end of the regular
expression, it now finds all matches in the input string, not just the first one.
<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/jnk6gM/1)
<pre>
"/.(at)/g" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> <a href="#learn-regex"><strong>sat</strong></a> on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/dO1nef/1)
### 5.3 Multiline
The `m` modifier is used to perform a multi-line match.
As discussed earlier, the anchors `(^,$)` are used to check whether
the pattern is at the beginning or the end of the input string. But
if we want the anchors to work on each line, we use the `m` flag.
For example, the regular expression `/.at(.)?$/gm`
means: any character except a line break, followed by a lowercase `a`, followed
by a lowercase `t`, optionally followed by anything except a line break. And because of the `m` flag,
the regular expression engine now matches the pattern at the end of every line in the string.
<pre>
"/.at(.)?$/" => The fat
cat sat
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/hoGMkP/1)
<pre>
"/.at(.)?$/gm" => The <a href="#learn-regex"><strong>fat</strong></a>
cat <a href="#learn-regex"><strong>sat</strong></a>
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/E88WE2/1)
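The `/pattern/flags` notation used in this section comes from engines such as JavaScript. As a rough equivalent, here is a sketch assuming Python's `re` module, where `i` and `m` correspond to `re.IGNORECASE` and `re.MULTILINE`. Python has no `g` flag: `re.search` stops at the first match, while `re.finditer` and `re.findall` return all of them.

```python
import re

sentence = "The fat cat sat on the mat."

# i: ignore case.
case_insensitive = re.findall(r"The", sentence, flags=re.IGNORECASE)
print(case_insensitive)  # ['The', 'the']

# g: first match only versus every match.
first_only = re.search(r".(at)", sentence).group(0)
all_hits = [m.group(0) for m in re.finditer(r".(at)", sentence)]
print(first_only)  # fat
print(all_hits)    # ['fat', 'cat', 'sat', 'mat']

# m: let `$` match at the end of every line, not only at the end of the text.
lines = "The fat\ncat sat\non the mat."
end_of_text = [m.group(0) for m in re.finditer(r".at(.)?$", lines)]
end_of_lines = [m.group(0) for m in re.finditer(r".at(.)?$", lines, flags=re.MULTILINE)]
print(end_of_text)   # ['mat.']
print(end_of_lines)  # ['fat', 'sat', 'mat.']
```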
## Contribution
* Report issues
* Open pull request with improvements
* Spread the word
* Reach out to me directly at ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned)
## License
MIT © [Zeeshan Ahmed](mailto:ziishaned@gmail.com)

468
README-fr.md Normal file

@@ -0,0 +1,468 @@
<br/>
<p align="center">
<img src="https://i.imgur.com/bYwl7Vf.png" alt="Learn Regex">
</p><br/>
## Translations:
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
* [Turkish](README-tr.md)
## What is a regular expression?
> A regular expression is a group of characters or symbols used to find a specific pattern in a text.
A regular expression is a pattern that is matched against a string from left to right. "Regular expression"
is the full term, often abbreviated to "regex" or "regexp". A regular expression is used to replace text within
a string, validate a form, extract a portion of a string based on a pattern, and much more.
Imagine we are writing an application and want to set rules for choosing a username. We want to allow
the username to contain letters, numbers, underscores and hyphens. We also want to limit the number
of characters in the username so that it does not look ugly. We use the following regular expression to validate a username:
<br/><br/>
<p align="center">
<img src="./img/regexp-fr.png" alt="Expressions régulières">
</p>
The regular expression above accepts the strings `john_doe`, `jo-hn_doe` and `john12_as`. It does not match `Jo` because that string contains an uppercase letter and is too short.
## Table of Contents

- [Introduction](#1-introduction)
- [Meta Characters](#2-meta-characters)
  - [Full stop](#21-full-stop)
  - [Character set](#22-character-set)
    - [Negated character set](#221-negated-character-set)
  - [Repetitions](#23-repetitions)
    - [The Star](#231-the-star)
    - [The Plus](#232-the-plus)
    - [The Question Mark](#233-the-question-mark)
  - [Braces](#24-braces)
  - [Character Group](#25-character-group)
  - [Alternation](#26-alternation)
  - [Escaping special character](#27-escaping-special-character)
  - [Anchors](#28-anchors)
    - [Caret](#281-caret)
    - [Dollar](#282-dollar)
- [Shorthand Character Sets](#3-shorthand-character-sets)
- [Lookaround](#4-lookaround)
  - [Positive Lookahead](#41-positive-lookahead)
  - [Negative Lookahead](#42-negative-lookahead)
  - [Positive Lookbehind](#43-positive-lookbehind)
  - [Negative Lookbehind](#44-negative-lookbehind)
- [Flags](#5-flags)
  - [Case Insensitive](#51-case-insensitive)
  - [Global search](#52-global-search)
  - [Multiline](#53-multiline)
## 1. Introduction

A regular expression is a pattern of characters used to perform a search in a text. For example, the regular expression `the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/dmRygT/1)

The regular expression `123` matches the string `123`. The regular expression is matched against the input string by comparing each character of the expression with each character of the input, one after another. Regular expressions are normally case-sensitive, so the regular expression `The` does not match the string `the`.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/1paXsy/1)
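
The two matches above can be reproduced with a minimal sketch. It assumes Python and its standard `re` module, which the guide itself does not prescribe; the variable names are illustrative only.

```python
import re

sentence = "The fat cat sat on the mat."

# A literal pattern matches the same characters in the same order.
lowercase_hits = re.findall(r"the", sentence)

# Matching is case-sensitive by default, so `The` only finds the capitalised word.
capitalised_hits = re.findall(r"The", sentence)

print(lowercase_hits)    # ['the']
print(capitalised_hits)  # ['The']
```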
## 2. Meta Characters

Meta characters are the building blocks of regular expressions. Meta characters do not stand for themselves but are interpreted in a special way. Some meta characters have a special meaning when written inside square brackets. The meta characters are as follows:

|Meta character|Description|
|:----:|----|
|.|Period matches any single character except a line break.|
|[ ]|Character class. Matches any character contained between the square brackets.|
|[^ ]|Negated character class. Matches any character that is not contained between the square brackets.|
|*|Matches 0 or more repetitions of the preceding symbol.|
|+|Matches 1 or more repetitions of the preceding symbol.|
|?|Makes the preceding symbol optional.|
|{n,m}|Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.|
|(xyz)|Character group. Matches the characters xyz in that exact order.|
|&#124;|Alternation. Matches either the characters before or the characters after the symbol.|
|&#92;|Escapes the next character. This allows you to match reserved characters such as <code>[ ] ( ) { } . * + ? ^ $ \ &#124;</code>|
|^|Matches the beginning of the input.|
|$|Matches the end of the input.|
## 2.1 Full stop

The full stop `.` is the simplest example of a meta character. The meta character `.` matches any single character. By default it does not match a newline; some engines, such as JavaScript, also exclude the carriage return. For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the letter `r`.
<pre>
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
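
A small sketch of the dot meta character, again assuming Python's `re` module (an illustration choice, not something the guide mandates). The second call shows that `.` does not consume a newline.

```python
import re

sentence = "The car parked in the garage."

# `.` stands for any single character except a newline.
dot_hits = re.findall(r".ar", sentence)

# The string below starts with a newline, which `.` refuses to match.
newline_hits = re.findall(r".ar", "\nar")

print(dot_hits)      # ['car', 'par', 'gar']
print(newline_hits)  # []
```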
## 2.2 Character set

Character sets are also called character classes. Square brackets are used to specify character sets. A hyphen inside a character set specifies a range of characters. The order in which characters are listed inside the square brackets does not matter. For example, the regular expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)

A period inside a character set, however, means a literal period. The regular expression `ar[.]` means: a lowercase `a`, followed by a lowercase `r`, followed by a period `.` character.
<pre>
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/wL3xtE/1)

### 2.2.1 Negated character set

In general, the caret symbol represents the start of the string. When it is typed right after the opening square bracket, however, it negates the character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the letter `a`, followed by the letter `r`.
<pre>
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/nNNlq3/1)
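
The three character-set examples from this section can be checked side by side. This is a sketch under the assumption that Python's `re` module is used; the variable names are hypothetical.

```python
import re

# Either an uppercase or a lowercase first letter.
either_case = re.findall(r"[Tt]he", "The car parked in the garage.")

# Inside square brackets the period is a literal period.
literal_dot = re.findall(r"ar[.]", "A garage is a good place to park a car.")

# A caret right after `[` negates the set: any character except `c`.
not_c = re.findall(r"[^c]ar", "The car parked in the garage.")

print(either_case)  # ['The', 'the']
print(literal_dot)  # ['ar.']
print(not_c)        # ['par', 'gar']
```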
## 2.3 Repetitions

The meta characters `+`, `*` and `?` are used to specify how many times a subpattern can occur. These meta characters act differently in different situations.

### 2.3.1 The Star

The symbol `*` matches zero or more repetitions of the preceding matcher. The regular expression `a*` means: zero or more repetitions of the preceding lowercase character `a`. If it appears after a character set, it repeats the whole character set. For example, the regular expression `[a-z]*` means: any number of lowercase letters in a row.
<pre>
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
</pre>
[Test the regular expression](https://regex101.com/r/7m8me5/1)

The `*` symbol can be used with the meta character `.` to match any string of characters: `.*`. The `*` symbol can be used with the whitespace character `\s` to match a run of whitespace characters. For example, the expression `\s*cat\s*` means: zero or more spaces, followed by a lowercase `c`, followed by a lowercase `a`, followed by a lowercase `t`, followed by zero or more spaces.
<pre>
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the <a href="#learn-regex">con<strong>cat</strong>enation</a>.
</pre>
[Test the regular expression](https://regex101.com/r/gGrwuz/1)

### 2.3.2 The Plus

The symbol `+` matches one or more repetitions of the preceding character. For example, the regular expression `c.+t` means: a lowercase `c`, followed by at least one character, followed by a lowercase `t`. Because `+` is greedy, that `t` is the last `t` in the sentence.
<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/Dzf9Aa/1)
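
The star and plus quantifiers can be explored with the sketch below, which assumes Python's `re` module. One detail specific to this illustration: `[a-z]*` may also match the empty string, so the empty results that `re.findall` reports are filtered out here.

```python
import re

# `*` allows zero repetitions, so empty matches are dropped for readability.
lowercase_runs = [m for m in re.findall(r"[a-z]*", "The car parked in the garage #21.") if m]

# Optional whitespace on both sides of the literal `cat`.
padded_cat = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")

# `+` is greedy: `.+` runs to the end and backtracks to the last `t`.
greedy = re.findall(r"c.+t", "The fat cat sat on the mat.")

print(lowercase_runs)  # ['he', 'car', 'parked', 'in', 'the', 'garage']
print(padded_cat)      # [' cat ', 'cat']
print(greedy)          # ['cat sat on the mat']
```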
### 2.3.3 The Question Mark

In a regular expression the meta character `?` makes the preceding character optional. This symbol matches zero or one instance of the preceding character. For example, the regular expression `[T]?he` means: an optional uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/cIg9zm/1)
<pre>
"[T]?he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in t<a href="#learn-regex"><strong>he</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/kPpO2x/1)
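
To compare the required and the optional `T`, here is a short sketch assuming Python's `re` module (names are illustrative).

```python
import re

sentence = "The car is parked in the garage."

# Without `?` the uppercase `T` is mandatory.
required_t = re.findall(r"[T]he", sentence)

# With `?` the `T` may be absent, so the `he` inside `the` also matches.
optional_t = re.findall(r"[T]?he", sentence)

print(required_t)  # ['The']
print(optional_t)  # ['The', 'he']
```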
## 2.4 Braces

In a regular expression, braces, which are also called quantifiers, are used to specify the number of times a character or a group of characters can be repeated. For example, the regular expression `[0-9]{2,3}` means: match at least 2 digits but not more than 3 (characters in the range 0 to 9).
<pre>
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/juM86s/1)

We can leave out the second number. For example, the regular expression `[0-9]{2,}` means: match 2 or more digits. If we also remove the comma, the regular expression `[0-9]{3}` means: match exactly 3 digits.
<pre>
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/Gdy4w5/1)
<pre>
"[0-9]{3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to 10.0.
</pre>
[Test the regular expression](https://regex101.com/r/Sivu30/1)
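
The three brace forms can be compared on the same sentence. This is a minimal sketch that assumes Python's `re` module.

```python
import re

sentence = "The number was 9.9997 but we rounded it off to 10.0."

between_2_and_3 = re.findall(r"[0-9]{2,3}", sentence)  # at least 2, at most 3 digits
at_least_2 = re.findall(r"[0-9]{2,}", sentence)        # 2 or more digits
exactly_3 = re.findall(r"[0-9]{3}", sentence)          # exactly 3 digits

print(between_2_and_3)  # ['999', '10']
print(at_least_2)       # ['9997', '10']
print(exactly_3)        # ['999']
```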
## 2.5 Character Group

A character group is a group of subpatterns written inside parentheses `(...)`. As discussed earlier, a quantifier placed after a character repeats that preceding character. If we put a quantifier after a character group, it repeats the whole group. For example, the regular expression `(ab)*` matches zero or more repetitions of the string "ab". We can also use the alternation meta character `|` inside a character group. For example, the regular expression `(c|g|p)ar` means: a lowercase `c`, `g` or `p`, followed by the character `a`, followed by the character `r`.
<pre>
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/tUxrBG/1)

## 2.6 Alternation

In a regular expression, the vertical bar `|` is used to define alternation. Alternation is like a condition between multiple expressions. You may be thinking that a character set and alternation work the same way. The big difference is that a character set works at the character level, whereas alternation works at the expression level. For example, the regular expression `(T|t)he|car` means: an uppercase `T` or a lowercase `t`, followed by a lowercase `h`, followed by a lowercase `e`; or a lowercase `c`, followed by a lowercase `a`, followed by a lowercase `r`.
<pre>
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
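
Groups and alternation can be tried out with the sketch below, assuming Python's `re` module. The helper `full_matches` is a hypothetical name introduced only for this illustration: with a capturing group in the pattern, `re.findall` returns the group's text rather than the whole match, so the helper reads `group(0)` instead.

```python
import re

def full_matches(pattern, text):
    """Return the complete text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]

sentence = "The car is parked in the garage."

grouped = full_matches(r"(c|g|p)ar", sentence)
alternated = full_matches(r"(T|t)he|car", sentence)

# A quantifier after a group repeats the whole group.
repeated_group = re.fullmatch(r"(ab)*", "ababab") is not None

print(grouped)         # ['car', 'par', 'gar']
print(alternated)      # ['The', 'car', 'the']
print(repeated_group)  # True
```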
## 2.7 Escaping special character

The backslash `\` is used in regular expressions to escape the next character so that it is matched literally. This lets us specify a symbol as a matching character, including the reserved characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching character, prepend `\` to it. For example, the regular expression `.` matches any character except a newline. To match a literal `.` in an input string, we write `\.`: the regular expression `(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase `a`, followed by a lowercase `t`, followed by an optional `.` character.
<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/DOc5Nu/1)
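
The following sketch, which assumes Python's `re` module, shows the escaped period in action. The second half goes slightly beyond the text: `re.escape` is Python's helper for escaping arbitrary literal text, and the price strings are invented for the illustration.

```python
import re

sentence = "The fat cat sat on the mat."

# `\.` matches a literal full stop; the trailing `?` makes it optional.
hits = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", sentence)]

# re.escape adds the backslashes needed to match text literally.
escaped_price = re.escape("9.99")
price_found = re.search(escaped_price, "It costs 9.99 today") is not None
lookalike_found = re.search(escaped_price, "It costs 9x99 today") is not None

print(hits)             # ['fat', 'cat', 'mat.']
print(price_found)      # True
print(lookalike_found)  # False
```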
## 2.8 Anchors

In regular expressions, we use anchors to check whether the matching symbol is the first or the last symbol of the input string. There are two types of anchors: the caret `^`, which checks whether the matching character is the first character of the input, and the dollar `$`, which checks whether the matching character is the last character of the input string.

### 2.8.1 Caret

The caret symbol `^` is used to check whether a matching character is the first character of the input string. If we apply the regular expression `^a` (is `a` the first symbol?) to the input string `abc`, it matches. But if we apply the regular expression `^b` to the same input string, it does not match, because in the input string `abc` "b" is not the first symbol. Let's look at another regular expression, `^(T|t)he`, which means: an uppercase `T` or a lowercase `t` is the first symbol of the input string, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/5ljjgB/1)
<pre>
"^(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/jXrKne/1)

### 2.8.2 Dollar

The dollar symbol `$` is used to check whether a matching character is the last character of the input string. For example, the regular expression `(at\.)$` means: a lowercase `a`, followed by a lowercase `t`, followed by a `.` character, and the whole match must be at the end of the string.
<pre>
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/y4Au4D/1)
<pre>
"(at\.)$" => The fat cat. sat. on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/t0AkOd/1)
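
Both anchors can be checked with a few calls. The sketch assumes Python's `re` module; note that `re.findall` returns the text of the single group in `(at\.)`, which here is the whole match.

```python
import re

# `^` pins the match to the start of the input.
starts_with_a = re.search(r"^a", "abc") is not None
starts_with_b = re.search(r"^b", "abc") is not None
leading_the = [m.group(0) for m in re.finditer(r"^(T|t)he", "The car is parked in the garage.")]

# `$` pins the match to the end of the input.
sentence = "The fat cat. sat. on the mat."
unanchored = re.findall(r"(at\.)", sentence)
anchored = re.findall(r"(at\.)$", sentence)

print(starts_with_a, starts_with_b)  # True False
print(leading_the)                   # ['The']
print(unanchored)                    # ['at.', 'at.', 'at.']
print(anchored)                      # ['at.']
```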
## 3. Shorthand Character Sets

Regular expressions provide shorthands for commonly used character sets, which offer convenient shortcuts for frequently used patterns. The shorthand character sets are as follows:

|Shorthand|Description|
|:----:|----|
|.|Any character except a newline|
|\w|Matches alphanumeric characters and the underscore: `[a-zA-Z0-9_]`|
|\W|Matches non-alphanumeric characters: `[^\w]`|
|\d|Matches a digit: `[0-9]`|
|\D|Matches a non-digit: `[^\d]`|
|\s|Matches a whitespace character: `[\t\n\f\r\p{Z}]`|
|\S|Matches a non-whitespace character: `[^\s]`|
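
A few of the shorthands in use, as a sketch assuming Python's `re` module. One caveat for this choice of language: the built-in `re` does not accept the `\p{Z}` notation shown in the table, so the example sticks to `\s` itself. The sample strings are invented.

```python
import re

digits = re.findall(r"\d+", "Order 66 shipped 2 items")  # runs of digits
words = re.findall(r"\w+", "john_doe-42")                # letters, digits and underscore
separators = re.findall(r"\W", "a-b c")                  # everything \w does not match
tokens = re.split(r"\s+", "a \t b\nc")                   # split on runs of whitespace

print(digits)      # ['66', '2']
print(words)       # ['john_doe', '42']
print(separators)  # ['-', ' ']
print(tokens)      # ['a', 'b', 'c']
```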
## 4. Lookaround

Lookahead and lookbehind (together called lookaround) are a specific kind of ***non-capturing group*** (used to match a pattern without including it in the match). Lookarounds are used when a pattern must be preceded or followed by another pattern. For example, suppose we want all the numbers that are preceded by the `$` character in the input string `$4.44 and $10.88`. We will use the regular expression `(?<=\$)[0-9\.]*`, which means: match all the numbers, possibly containing a `.` character, that are preceded by the `$` character. The following lookarounds are used in regular expressions:

|Symbol|Description|
|:----:|----|
|?=|Positive Lookahead|
|?!|Negative Lookahead|
|?<=|Positive Lookbehind|
|?<!|Negative Lookbehind|
### 4.1 Positive Lookahead

A positive lookahead asserts that the first part of the expression must be followed by the lookahead expression. The returned match contains only the text matched by the first part of the expression. To define a positive lookahead, parentheses are used. Within those parentheses, a question mark and an equals sign are used like this: `(?=...)`. The lookahead expression is written after the equals sign inside the parentheses. For example, the regular expression `(T|t)he(?=\sfat)` means: a lowercase `t` or an uppercase `T`, followed by the letter `h`, followed by the letter `e`. In the parentheses we define the positive lookahead, which tells the regular expression engine to match `The` or `the` only when it is followed by a space and the word `fat`. (The group `(T|t)` is used rather than the set `[T|t]`, because the set would also accept a literal `|`.)

<pre>
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/IDDARt/1)

### 4.2 Negative Lookahead

A negative lookahead is used when we need to get all matches in the input string that are not followed by a pattern. A negative lookahead is defined in the same way as a positive lookahead; the only difference is that instead of the equals sign `=` we use the negation character `!`, i.e. `(?!...)`. Let's look at the regular expression `(T|t)he(?!\sfat)`, which means: get every `The` or `the` in the input string that is not followed by a space and the word `fat`.

<pre>
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/V32Npg/1)

### 4.3 Positive Lookbehind

A positive lookbehind is used to get all the matches that are preceded by a specific pattern. A positive lookbehind is written `(?<=...)`. For example, the regular expression `(?<=(T|t)he\s)(fat|mat)` means: get every `fat` or `mat` in the input string that comes after the word `The` or `the`.

<pre>
"(?&lt;=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/avH165/1)

### 4.4 Negative Lookbehind

A negative lookbehind is used to get all the matches that are not preceded by a specific pattern. A negative lookbehind is written `(?<!...)`. For example, the regular expression `(?<!(T|t)he\s)(cat)` means: get every `cat` in the input string that does not come after the word `The` or `the`.

<pre>
"(?&lt;!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
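
The four lookarounds can be exercised together. This sketch assumes Python's `re` module, which requires lookbehind patterns to have a fixed width (true of every pattern used here). The helper `full_matches` is a hypothetical name for this illustration.

```python
import re

def full_matches(pattern, text):
    """Return the complete text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]

sentence = "The fat cat sat on the mat."

positive_ahead = full_matches(r"(T|t)he(?=\sfat)", sentence)
negative_ahead = full_matches(r"(T|t)he(?!\sfat)", sentence)
positive_behind = full_matches(r"(?<=(T|t)he\s)(fat|mat)", sentence)

# Only the second `cat` is kept: the first one follows "The ".
negative_behind = [m.start() for m in re.finditer(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat.")]

# The amounts preceded by `$`; the `$` itself is not part of the match.
prices = re.findall(r"(?<=\$)[0-9\.]*", "$4.44 and $10.88")

print(positive_ahead)   # ['The']
print(negative_ahead)   # ['the']
print(positive_behind)  # ['fat', 'mat']
print(negative_behind)  # [15]
print(prices)           # ['4.44', '10.88']
```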
## 5. Flags

Flags are also called modifiers because they modify the output of a regular expression. These flags can be used in any order or combination, and are an integral part of the RegExp.

|Flag|Description|
|:----:|----|
|i|Case insensitive: sets matching to be case-insensitive.|
|g|Global search: searches for the pattern throughout the input string.|
|m|Multiline: anchor meta characters work on each line.|

### 5.1 Case Insensitive

The `i` modifier is used to perform case-insensitive matching. For example, the regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`. The `i` flag at the end of the regular expression tells the engine to ignore case. As you can see, we also provided the `g` flag because we want to search for the pattern in the whole input string.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/dpQyf9/1)
<pre>
"/The/gi" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/ahfiuh/1)

### 5.2 Global search

The `g` modifier is used to perform a global match (find all matches rather than stopping after the first one). For example, the regular expression `/.(at)/g` means: any character except a newline, followed by a lowercase `a`, followed by a lowercase `t`. Because we provided the `g` flag at the end of the regular expression, it now finds every match in the whole input string.
<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/jnk6gM/1)
<pre>
"/.(at)/g" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> <a href="#learn-regex"><strong>sat</strong></a> on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/dO1nef/1)

### 5.3 Multiline

The `m` modifier is used to perform a multi-line match. As discussed earlier, the anchors `(^, $)` are used to check whether a pattern is at the beginning or the end of the input string. If we want the anchors to work on each line, we use the `m` flag. For example, the regular expression `/.at(.)?$/gm` means: any character except a newline, followed by a lowercase `a`, followed by a lowercase `t`, optionally followed by any character except a newline. Because of the `m` flag, the regular expression engine now matches the pattern at the end of each line in the string.
<pre>
"/.at(.)?$/" => The fat
cat sat
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/hoGMkP/1)
<pre>
"/.at(.)?$/gm" => The <a href="#learn-regex"><strong>fat</strong></a>
cat <a href="#learn-regex"><strong>sat</strong></a>
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/E88WE2/1)
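
The flags map onto Python's `re` module roughly as sketched below; this mapping is an assumption of the illustration, since the guide writes flags in the `/pattern/flags` style. Python has no `g` flag: `re.search` stops at the first match while `re.findall` and `re.finditer` return all of them. `re.IGNORECASE` and `re.MULTILINE` play the roles of `i` and `m`.

```python
import re

sentence = "The fat cat sat on the mat."

# i: ignore case.
any_case = re.findall(r"The", sentence, re.IGNORECASE)

# g: first match only versus every match.
first_only = re.search(r".at", sentence).group(0)
every_match = re.findall(r".at", sentence)

# m: let `$` match at the end of every line, not only at the end of the input.
lines = "The fat\ncat sat\non the mat."
end_of_input = [m.group(0) for m in re.finditer(r".at(.)?$", lines)]
end_of_lines = [m.group(0) for m in re.finditer(r".at(.)?$", lines, re.MULTILINE)]

print(any_case)      # ['The', 'the']
print(first_only)    # fat
print(every_match)   # ['fat', 'cat', 'sat', 'mat']
print(end_of_input)  # ['mat.']
print(end_of_lines)  # ['fat', 'sat', 'mat.']
```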
## Contribution

* Report issues
* Open pull requests with improvements
* Spread the word
* Reach out to me directly (in English) at ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned)

## License

MIT © [Zeeshan Ahmed](mailto:ziishaned@gmail.com)

View File

@ -6,8 +6,12 @@
## Translations
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
* [Turkish](README-tr.md)
## What is a regular expression?
@ -25,7 +29,7 @@
<br/><br/>
<p align="center">
<img src="https://i.imgur.com/ekFpQUg.png" alt="Regular expression">
<img src="./img/regexp-en.png" alt="Regular expression">
</p>
This regular expression accepts strings such as `john_doe`, `jo-hn_doe` and `john12_as`.
@ -188,7 +192,8 @@
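The `+` symbol matches one or more repetitions of the preceding character.
For example, the regular expression `c.+t` means: a lowercase `c`, followed by
any single character, and then a `t`.
one or more arbitrary characters, and then a `t`.
The `t` that matches is the last `t` in the sentence.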
<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
@ -216,6 +221,7 @@
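[Test the regular expression](https://regex101.com/r/kPpO2x/1)
## 2.4 Braces
It must be made clear that this `t` is the last `t` in the sentence.
Braces in a regular expression, also called quantifiers, are used to specify how many times a string may occur.
For example, the regular expression `[0-9]{2,3}` matches a number with at least 2 and at most 3 digits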
@ -341,7 +347,7 @@
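Regular expressions provide shorthand notations for commonly used character sets,
which can be used as convenient shortcuts.
The abbreviated notations are as follows.
The shorthand notations are as follows.
|Shorthand|Description |
|:------:|-----------------------------------|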
@ -355,7 +361,7 @@
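## 4. Lookaround
Lookahead and lookbehind, sometimes also called lookaround, are a special kind of **non-capturing group**
Lookahead and lookbehind (also called lookaround) are a special kind of **non-capturing group**
(the pattern is matched but is not included in the list of matches).
Lookarounds are used to state that one pattern must be preceded or followed by another pattern.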
@ -378,12 +384,12 @@
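To define a positive lookahead, parentheses are used.
Inside the parentheses, a question mark and an equals sign are combined like this: `(?=...)`.
The lookahead pattern is written after the equals sign inside the parentheses.
For example, the regular expression `[T|t]he(?=\sfat)` means: a lowercase `t` or an uppercase `T`, followed by `h` and `e`.
For example, the regular expression `(T|t)he(?=\sfat)` means: a lowercase `t` or an uppercase `T`, followed by `h` and `e`.
The positive lookahead defined in the parentheses states that `The` or `the` must be
followed by `fat`.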
<pre>
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/IDDARt/1)
@ -393,11 +399,11 @@
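A negative lookahead is used to get all matches that are not followed by a given pattern.
A negative lookahead is defined like a positive lookahead; the only difference is that
`!` is used instead of `=`, giving `(?!...)`.
Consider the following regular expression: `[T|t]he(?!\sfat)`.
Consider the following regular expression: `(T|t)he(?!\sfat)`.
It matches every `The` or `the` that is not followed by a space and the word `fat`.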
<pre>
"[T|t]he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/V32Npg/1)
@ -406,11 +412,11 @@
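A positive lookbehind is used to get strings that are preceded by a specific pattern.
It is written `(?<=...)`.
For example, the regular expression `(?<=[T|t]he\s)(fat|mat)`
For example, the regular expression `(?<=(T|t)he\s)(fat|mat)`
matches every `fat` or `mat` that comes after `The` or `the`.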
<pre>
"(?<=[T|t]he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/avH165/1)
@ -422,7 +428,7 @@
For example, `(?<!(T|t)he\s)(cat)` matches every `cat` that does not come after `The` or `the`.
<pre>
"(?&lt;![T|t]he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
"(?&lt;!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
@ -440,7 +446,7 @@
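### 5.1 Case Insensitive
The `i` modifier is used for case-insensitive matching.
The `i` modifier is used for case-insensitive matching.
For example, the regular expression `/The/gi` means an uppercase `T` followed by a lowercase `h` and `e`,
and the trailing `i` makes the match case-insensitive.
The `g` flag is also passed because we want to find every match in the string.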
@ -462,8 +468,8 @@
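The `g` modifier is used to perform a global search (finding all matches instead of
only the first one).
For example, the regular expression `/.(at)/g` means: any character except a newline, followed by
a lowercase `a` and `t`. Passing the `g` flag at the end of the regular expression
makes it search for every match in the input string.
a lowercase `a` and `t`. Passing the `g` flag at the end of the regular expression
makes it search for every match in the input string rather than only the first one (which is the default behaviour).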
<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
@ -502,33 +508,9 @@
[Test the regular expression](https://regex101.com/r/E88WE2/1)
## Bonus
* *Positive integers*: `^\d+$`
* *Negative integers*: `^-\d+$`
* *US phone number*: `^+?[\d\s]{3,}$`
* *US phone number with code*: `^+?[\d\s]+(?[\d\s]{10,}$`
* *Integers*: `^-?\d+$`
* *Username*: `^[\w.]{4,16}$`
* *Alphanumeric characters*: `^[a-zA-Z0-9]*$`
* *Alphanumeric characters with spaces*: `^[a-zA-Z0-9 ]*$`
* *Password*: `^(?=^.{6,}$)((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.*$`
* *Email*: `^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})*$`
* *IPv4 address*: `^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*$`
* *Lowercase letters only*: `^([a-z])*$`
* *Uppercase letters only*: `^([A-Z])*$`
* *URL*: `^(((http|https|ftp):\/\/)?([[a-zA-Z0-9]\-\.])+(\.)([[a-zA-Z0-9]]){2,4}([[a-zA-Z0-9]\/+=%&_\.~?\-]*))*$`
* *VISA credit card numbers*: `^(4[0-9]{12}(?:[0-9]{3})?)*$`
* *Date (DD/MM/YYYY)*: `^(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}$`
* *Date (MM/DD/YYYY)*: `^(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}$`
* *Date (YYYY/MM/DD)*: `^(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])$`
* *MasterCard credit card numbers*: `^(5[1-5][0-9]{14})*$`
* *Hashtags*: including hashtags with preceding text (abc123#xyz456) or containing spaces within square brackets (#[foo bar]) : `\S*#(?:\[[^\]]+\]|\S+)`
* *@mentions*: `\B@[a-z0-9_-]+`
## Contributing
* Report problems
* Open issues
* Submit pull requests with fixes
* Spread the word about the documentation
* Contact the author directly: ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned)

411
README-ko.md Normal file
View File

@ -0,0 +1,411 @@
<br/>
<p align="center">
<img src="https://i.imgur.com/bYwl7Vf.png" alt="Learn Regex">
</p><br/>
## Translations:
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
* [Turkish](README-tr.md)
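## What is a regular expression?

> A regular expression is a group of characters or symbols used to find a specific pattern in a text.

A regular expression is a pattern that is matched against a subject string from left to right. Because "regular expression" is awkward to say every time, it is usually shortened to "regex", "regexp" or, in Korean, "정규식". Regular expressions are used for replacing text within a string, validating formats, extracting a substring based on a pattern match, and for many other purposes.

Imagine that you are writing an application and want to set the rules that apply when a user chooses a username. For example, we want to allow the username to contain letters, numbers, underscores (\_) and hyphens. We also want to limit the number of characters so that the username does not look untidy. We can use the following regular expression to check whether an entered username follows these rules.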
<br/><br/>
<p align="center">
<img src="./img/regexp-en.png" alt="Regular expression">
</p>
The regular expression above accepts the strings `john_doe`, `jo-hn_doe` and `john12_as`. `Jo` does not match it because it contains an uppercase letter and is too short.
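## Table of Contents

- [Basic Matchers](#1-basic-matchers)
- [Meta Characters](#2-meta-characters)
  - [Full stop](#21-full-stop)
  - [Character set](#22-character-set)
    - [Negated character set](#221-negated-character-set)
  - [Repetitions](#23-repetitions)
    - [The Star](#231-the-star)
    - [The Plus](#232-the-plus)
    - [The Question Mark](#233-the-question-mark)
  - [Braces](#24-braces)
  - [Character Group](#25-character-group)
  - [Alternation](#26-alternation)
  - [Escaping special character](#27-escaping-special-character)
  - [Anchors](#28-anchors)
    - [Caret](#281-caret)
    - [Dollar](#282-dollar)
- [Shorthand Character Sets](#3-shorthand-character-sets)
- [Lookaround](#4-lookaround)
  - [Positive Lookahead](#41-positive-lookahead)
  - [Negative Lookahead](#42-negative-lookahead)
  - [Positive Lookbehind](#43-positive-lookbehind)
  - [Negative Lookbehind](#44-negative-lookbehind)
- [Flags](#5-flags)
  - [Case Insensitive](#51-case-insensitive)
  - [Global search](#52-global-search)
  - [Multiline](#53-multiline)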
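## 1. Basic Matchers

A regular expression is just a pattern of characters used to perform a search in a text. For example, the regular expression `the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`.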
<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/dmRygT/1)
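The regular expression `123` matches the string `123`. The regular expression is matched against an input string by comparing each character of the expression with each character of the input. Regular expressions are normally case-sensitive, so the regular expression `The` does not match the string `the`.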
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/1paXsy/1)
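## 2. Meta Characters

Meta characters are the building blocks of regular expressions. Meta characters do not stand for themselves but are interpreted in a special way. Some meta characters have a special meaning when written inside square brackets. The meta characters are as follows:

|Meta character|Description|
|:----:|----|
|.|Period matches any single character except a line break.|
|[ ]|Character class. Matches any character contained between the square brackets.|
|[^ ]|Negated character class. Matches any character that is not contained between the square brackets.|
|*|Matches 0 or more repetitions of the preceding symbol.|
|+|Matches 1 or more repetitions of the preceding symbol.|
|?|Makes the preceding symbol optional.|
|{n,m}|Braces. Matches at least n but not more than m repetitions of the preceding symbol.|
|(xyz)|Character group. Matches the characters xyz in that exact order.|
|&#124;|Alternation. Matches either the characters before or the characters after the symbol.|
|&#92;|Escapes the next character. This allows the reserved characters <code>[ ] ( ) { } . * + ? ^ $ \ &#124;</code> to be matched literally.|
|^|Matches the beginning of the input.|
|$|Matches the end of the input.|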
## 2.1 마침표
마침표(`.`)는 메타 문자의 가장 간단한 예다. 메타 문자 `.`는 어떠한 단일 문자와도 매치되지만 리턴 혹은 개행 문자와는 매치되지 않는다. 예를 들어, 정규 표현식 `.ar`은 어떠한 단일 문자 다음에 문자 `a`가 오고, 그 다음에 문자 `r`이 오는 패턴을 의미한다.
<pre>
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
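## 2.2 Character Set
Character sets are also called character classes. Square brackets are used to specify character sets. Use a hyphen inside a character set to specify a range of characters. The order of the characters inside the square brackets does not matter. For example, the regular expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.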
<pre>
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
A full stop inside a character set, however, means a literal full stop. The regular expression `ar[.]` means: a lowercase `a`, followed by the letter `r`, followed by a full stop `.` character.
<pre>
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
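### 2.2.1 Negated Character Set
In general, the caret symbol `^` represents the start of the string, but when it appears immediately after the opening square bracket it negates the character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the character `a`, followed by the letter `r`.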
<pre>
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/nNNlq3/1)
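The three character-set patterns from this section can be tried side by side. The sketch below assumes Python's `re` module; the variable names are only illustrative.

```python
import re

sentence = "The car parked in the garage."

# [Tt] accepts either letter in the first position.
either_case = re.findall(r"[Tt]he", sentence)

# Inside brackets the full stop is a literal character, not a meta character.
literal_dot = re.findall(r"ar[.]", "A garage is a good place to park a car.")

# A caret right after the opening bracket negates the set.
not_c = re.findall(r"[^c]ar", sentence)

print(either_case)  # ['The', 'the']
print(literal_dot)  # ['ar.']
print(not_c)        # ['par', 'gar']
```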
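## 2.3 Repetitions
The meta characters `+`, `*` and `?` specify how many times the preceding subpattern can occur. These meta characters act differently in different situations.
### 2.3.1 The Star
The symbol `*` matches zero or more repetitions of the preceding matcher. The regular expression `a*` means: zero or more repetitions of the lowercase character `a`. If the star appears after a character set, it matches repetitions of the whole character set. For example, the regular expression `[a-z]*` means: any number of lowercase letters in a row.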
<pre>
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
</pre>
[Test the regular expression](https://regex101.com/r/7m8me5/1)
The `*` symbol can be combined with the meta character `.` to match any string of characters: `.*`. It can also be combined with the whitespace character `\s` to match a string of whitespace characters. For example, the expression `\s*cat\s*` means: zero or more whitespace characters, followed by a lowercase `c`, a lowercase `a` and a lowercase `t`, followed by zero or more whitespace characters.
<pre>
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the con<a href="#learn-regex"><strong>cat</strong></a>enation.
</pre>
[Test the regular expression](https://regex101.com/r/gGrwuz/1)
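### 2.3.2 The Plus
The symbol `+` matches one or more repetitions of the preceding character. For example, the regular expression `c.+t` means: a lowercase `c`, followed by at least one character, followed by a lowercase `t`. Note that this `t` is the last `t` in the sentence, because `.+` consumes as many characters as it can.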
<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/Dzf9Aa/1)
### 2.3.3 The Question Mark
In regular expressions, the meta character `?` makes the preceding character optional: it matches zero or one instance of it. For example, the regular expression `[T]?he` means: an optional uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/cIg9zm/1)
<pre>
"[T]?he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in t<a href="#learn-regex"><strong>he</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/kPpO2x/1)
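The three repetition operators can be compared on the sentences used in the examples. This is a small sketch assuming Python's `re` module. Because `*` also allows zero repetitions, `re.findall` can report empty matches, so they are filtered out here.

```python
import re

# `*`: zero or more. Empty matches are possible, so drop them for readability.
words = [m for m in re.findall(r"[a-z]*", "The car parked in the garage #21.") if m]

# `+`: one or more. `.+` is greedy, so the match runs to the last `t`.
greedy = re.findall(r"c.+t", "The fat cat sat on the mat.")

# `?`: zero or one. The leading `T` is optional.
optional_t = re.findall(r"[T]?he", "The car is parked in the garage.")

print(words)       # ['he', 'car', 'parked', 'in', 'the', 'garage']
print(greedy)      # ['cat sat on the mat']
print(optional_t)  # ['The', 'he']
```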
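## 2.4 Braces
In regular expressions, braces, also called quantifiers, specify the number of times that a character or a group of characters can be repeated. For example, the regular expression `[0-9]{2,3}` means: match at least 2 but not more than 3 digits (characters in the range 0 to 9).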
<pre>
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/juM86s/1)
We can leave out the second number. For example, the regular expression `[0-9]{2,}` means: match 2 or more digits. If we also remove the comma, the regular expression `[0-9]{3}` means: match exactly 3 digits.
<pre>
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/Gdy4w5/1)
<pre>
"[0-9]{3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to 10.0.
</pre>
[Test the regular expression](https://regex101.com/r/Sivu30/1)
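The three forms of the brace quantifier give different results on the same sentence. A minimal sketch, assuming Python's `re` module:

```python
import re

sentence = "The number was 9.9997 but we rounded it off to 10.0."

between_2_and_3 = re.findall(r"[0-9]{2,3}", sentence)  # at least 2, at most 3 digits
at_least_2 = re.findall(r"[0-9]{2,}", sentence)        # 2 or more digits
exactly_3 = re.findall(r"[0-9]{3}", sentence)          # exactly 3 digits

print(between_2_and_3)  # ['999', '10']
print(at_least_2)       # ['9997', '10']
print(exactly_3)        # ['999']
```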
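## 2.5 Character Group
A character group is a group of subpatterns written inside parentheses `(...)`. As discussed before, a quantifier placed after a single character repeats that character. A quantifier placed after a character group repeats the whole group. For example, the regular expression `(ab)*` matches zero or more repetitions of the characters "ab". The alternation meta character `|` can also be used inside a character group. For example, the regular expression `(c|g|p)ar` means: a lowercase `c`, `g` or `p`, followed by the character `a`, followed by the character `r`.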
<pre>
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
## 2.6 Alternation
In regular expressions, the vertical bar `|` defines alternation. Alternation is like a condition between multiple expressions. You may be thinking that character sets and alternation work the same way. The big difference is that a character set works at the character level, whereas alternation works at the expression level. For example, the regular expression `(T|t)he|car` means: an uppercase `T` or lowercase `t`, followed by `h` and then `e`; or a lowercase `c`, followed by `a` and then `r`.
<pre>
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
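Groups and alternation can be tried as follows. This sketch assumes Python's `re` module, and the helper `full_matches` is a hypothetical name introduced here. It is needed because `re.findall` returns only the captured group when a pattern contains one, which is easy to mistake for the whole match.

```python
import re


def full_matches(pattern: str, text: str) -> list:
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]


sentence = "The car is parked in the garage."

group_hits = full_matches(r"(c|g|p)ar", sentence)
alternation_hits = full_matches(r"(T|t)he|car", sentence)

# With one capture group, findall reports the group's content instead.
captured_only = re.findall(r"(c|g|p)ar", sentence)

print(group_hits)        # ['car', 'par', 'gar']
print(alternation_hits)  # ['The', 'car', 'the']
print(captured_only)     # ['c', 'p', 'g']
```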
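## 2.7 Escaping Special Characters
The backslash `\` is used in regular expressions to escape the next character. It lets you match the reserved characters `{ } [ ] / \ + * . $ ^ | ?` literally instead of as meta characters. To use a special character as a matching character, prepend `\` to it. For example, the regular expression `.` matches any character except a newline. To match a literal `.` in the input string, the regular expression `(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase `a`, followed by a lowercase `t`, followed by an optional `.` character.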
<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/DOc5Nu/1)
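The effect of escaping is easiest to see by running the same text with and without the backslash. The sketch below assumes Python's `re` module; `re.escape` is a Python helper that adds the backslashes for you, and the `price` string is made up for the illustration.

```python
import re

sentence = "The fat cat sat on the mat."

# `\.?` is an optional literal full stop, so only the last match includes it.
hits = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", sentence)]

# Unescaped, the dot in "9.99" matches any character; escaped, only a dot.
price = "9.99"
unescaped_hits = re.findall(price, "9.99 or 9x99")
escaped_hits = re.findall(re.escape(price), "9.99 or 9x99")

print(hits)            # ['fat', 'cat', 'mat.']
print(unescaped_hits)  # ['9.99', '9x99']
print(escaped_hits)    # ['9.99']
```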
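## 2.8 Anchors
In regular expressions, anchors check whether the matching symbol is the first or the last symbol of the input string. There are two types of anchors: the caret `^` checks whether the matching character is the first character of the input, and the dollar sign `$` checks whether the matching character is the last character of the input.
### 2.8.1 The Caret
The caret symbol `^` checks whether the matching character is the first character of the input string. If we apply the regular expression `^a` (is `a` the first character?) to the input string `abc`, it matches `a`. If we apply the regular expression `^b` to the same input, it matches nothing, because in the input string `abc` the "b" is not the first character. Let's look at another regular expression, `^(T|t)he`, which means: an uppercase `T` or lowercase `t` at the start of the input string, followed by a lowercase `h`, followed by a lowercase `e`.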
<pre>
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/5ljjgB/1)
<pre>
"^(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/jXrKne/1)
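### 2.8.2 The Dollar Sign
The dollar sign `$` checks whether the matching character is the last character of the input string. For example, the regular expression `(at\.)$` means: a lowercase `a`, followed by a lowercase `t`, followed by a `.` character, and the whole group must be at the end of the string.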
<pre>
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/y4Au4D/1)
<pre>
"(at\.)$" => The fat cat. sat. on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/t0AkOd/1)
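Both anchors can be checked by comparing the anchored and unanchored versions of the same pattern. A brief sketch, assuming Python's `re` module; match start positions are collected for the `$` case to show that only the final occurrence survives.

```python
import re

sentence = "The car is parked in the garage."
anywhere = [m.group(0) for m in re.finditer(r"(T|t)he", sentence)]
at_start = [m.group(0) for m in re.finditer(r"^(T|t)he", sentence)]

text = "The fat cat. sat. on the mat."
all_positions = [m.start() for m in re.finditer(r"(at\.)", text)]
end_positions = [m.start() for m in re.finditer(r"(at\.)$", text)]

print(anywhere)  # ['The', 'the']
print(at_start)  # ['The']
print(len(all_positions), len(end_positions))  # 3 1
```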
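## 3. Shorthand Character Sets
Regular expressions provide shorthands for commonly used character sets. The shorthand character sets are as follows:
|Shorthand|Description|
|:----:|----|
|.|Any character except a newline|
|\w|Matches alphanumeric characters and the underscore: `[a-zA-Z0-9_]`|
|\W|Matches non-alphanumeric characters: `[^\w]`|
|\d|Matches digits: `[0-9]`|
|\D|Matches non-digits: `[^\d]`|
|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
|\S|Matches non-whitespace characters: `[^\s]`|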
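A few of the shorthands in action, as a sketch assuming Python's `re` module. The sample string is made up. In Python, `\w`, `\d` and `\s` also match non-ASCII letters, digits and spaces in text strings by default, so the bracket equivalents in the table hold exactly only with the `re.ASCII` flag.

```python
import re

text = "Order_66 shipped\tin 3 boxes"

word_runs = re.findall(r"\w+", text)    # runs of letters, digits and underscores
digit_runs = re.findall(r"\d+", text)   # runs of digits
pieces = re.split(r"\s+", text)         # split on any run of whitespace
digits_only = re.sub(r"\D", "", text)   # delete every non-digit

print(word_runs)    # ['Order_66', 'shipped', 'in', '3', 'boxes']
print(digit_runs)   # ['66', '3']
print(pieces)       # ['Order_66', 'shipped', 'in', '3', 'boxes']
print(digits_only)  # 663
```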
## 4. Lookarounds
Lookbehinds and lookaheads, sometimes known as lookarounds, are specific types of ***non-capturing groups*** (used to match a pattern but not included in the matching list). Lookarounds are used when a pattern must be preceded or followed by another pattern. For example, suppose we want to get all numbers preceded by the `$` character from the input string `$4.44 and $10.88`. We can use the regular expression `(?<=\$)[0-9\.]*`, which means: all digits, including the `.` character, that are preceded by a `$` character. The following lookarounds are used in regular expressions:
|Symbol|Description|
|:----:|----|
|?=|Positive Lookahead|
|?!|Negative Lookahead|
|?<=|Positive Lookbehind|
|?<!|Negative Lookbehind|
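### 4.1 Positive Lookahead
A positive lookahead asserts that the first part of the expression must be followed by the lookahead expression. The returned match contains only the text matched by the first part of the expression. A positive lookahead is defined with parentheses; inside them, a question mark followed by an equals sign is used like this: `(?=...)`. The lookahead expression is written after the equals sign inside the parentheses. For example, the regular expression `(T|t)he(?=\sfat)` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`. The positive lookahead in parentheses tells the regular expression engine to match only a `The` or `the` that is immediately followed by a whitespace character and the word `fat`.
<pre>
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.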
</pre>
[Test the regular expression](https://regex101.com/r/IDDARt/1)
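### 4.2 Negative Lookahead
A negative lookahead is used when we want matches from the input string that are not followed by a certain pattern. It is defined in the same way as a positive lookahead; the only difference is that the negation character `!` is used instead of the equals sign `=`, i.e. `(?!...)`. Let's look at the regular expression `(T|t)he(?!\sfat)`. It matches every `The` or `the` in the input string that is not followed by a whitespace character and the word `fat`.
<pre>
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.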
</pre>
[Test the regular expression](https://regex101.com/r/V32Npg/1)
### 4.3 Positive Lookbehind
A positive lookbehind is used to get all matches that are preceded by a specific pattern. It is denoted by `(?<=...)`. For example, the regular expression `(?<=(T|t)he\s)(fat|mat)` means: every `fat` or `mat` in the input string that comes after `The` or `the` and a whitespace character.
<pre>
"(?&lt;=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/avH165/1)
### 4.4 Negative Lookbehind
A negative lookbehind is used to get all matches that are not preceded by a specific pattern. It is denoted by `(?<!...)`. For example, the regular expression `(?<!(T|t)he\s)(cat)` means: every `cat` in the input string that does not come directly after `The ` or `the `.
<pre>
"(?&lt;!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
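The four lookarounds can be exercised together. The sketch below assumes Python's `re` module, which accepts lookbehinds only when they have a fixed width (all of the ones here do). It writes the alternation as `(T|t)`, and `full_matches` is a hypothetical helper that returns whole matches rather than captured groups.

```python
import re


def full_matches(pattern: str, text: str) -> list:
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]


sentence = "The fat cat sat on the mat."

followed_by_fat = full_matches(r"(T|t)he(?=\sfat)", sentence)
not_followed_by_fat = full_matches(r"(T|t)he(?!\sfat)", sentence)
after_the = full_matches(r"(?<=(T|t)he\s)(fat|mat)", sentence)
cat_not_after_the = full_matches(r"(?<!(T|t)he\s)cat", "The cat sat on cat.")
prices = full_matches(r"(?<=\$)[0-9\.]*", "$4.44 and $10.88")

print(followed_by_fat)      # ['The']
print(not_followed_by_fat)  # ['the']
print(after_the)            # ['fat', 'mat']
print(cat_not_after_the)    # ['cat']
print(prices)               # ['4.44', '10.88']
```

In every case the text inside the lookaround is checked but left out of the returned match.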
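## 5. Flags
Flags are also called modifiers because they modify the output of a regular expression. These flags can be used in any order or combination, and are an integral part of the regular expression.
|Flag|Description|
|:----:|----|
|i|Case insensitive: makes matching case-insensitive.|
|g|Global search: searches for the pattern throughout the input string.|
|m|Multiline: makes the anchor meta characters work on each line.|
### 5.1 Case Insensitive
The `i` modifier performs case-insensitive matching. For example, the regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`. The `i` flag at the end of the regular expression tells the engine to ignore case, and the `g` flag is set so that the pattern is searched for in the whole input string.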
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/dpQyf9/1)
<pre>
"/The/gi" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
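### 5.2 Global Search
The `g` modifier performs a global match: it finds all matches rather than stopping after the first one. For example, the regular expression `/.(at)/g` means: any character except a newline, followed by a lowercase `a`, followed by a lowercase `t`. Because the `g` flag is set at the end of the regular expression, the pattern finds every match in the input string.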
<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/jnk6gM/1)
<pre>
"/.(at)/g" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> <a href="#learn-regex"><strong>sat</strong></a> on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/dO1nef/1)
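### 5.3 Multiline
The `m` modifier performs a multiline match. As discussed earlier, the anchors `(^, $)` check whether a pattern is at the beginning or the end of the input. To make the anchors work on each line, we set the `m` flag. For example, the regular expression `/.at(.)?$/gm` means: any character except a newline, followed by a lowercase `a`, followed by a lowercase `t`, optionally followed by any character except a newline. Because of the `m` flag, the regular expression engine now matches the pattern at the end of each line of the input string.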
<pre>
"/.at(.)?$/" => The fat
cat sat
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/hoGMkP/1)
<pre>
"/.at(.)?$/gm" => The <a href="#learn-regex"><strong>fat</strong></a>
cat <a href="#learn-regex"><strong>sat</strong></a>
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/E88WE2/1)
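The flags map onto Python's `re` module roughly as follows; treat this as a sketch of one engine, not a general rule. Python has no `g` flag: `re.search` stops at the first match and `re.findall` or `re.finditer` return all of them, while `i` and `m` correspond to `re.IGNORECASE` and `re.MULTILINE`.

```python
import re

sentence = "The fat cat sat on the mat."

# i: case-insensitive matching.
case_sensitive = re.findall(r"The", sentence)
ignore_case = re.findall(r"The", sentence, re.IGNORECASE)

# g: first match only versus every match.
first_only = re.search(r".at", sentence).group(0)
all_hits = re.findall(r".at", sentence)

# m: `$` matches at the end of each line instead of only at the end of input.
lines = "The fat\ncat sat\non the mat."
single_line = [m.group(0) for m in re.finditer(r".at(.)?$", lines)]
multi_line = [m.group(0) for m in re.finditer(r".at(.)?$", lines, re.MULTILINE)]

print(case_sensitive, ignore_case)  # ['The'] ['The', 'the']
print(first_only, all_hits)         # fat ['fat', 'cat', 'sat', 'mat']
print(single_line, multi_line)      # ['mat.'] ['fat', 'sat', 'mat.']
```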
## Contribution
* Report issues
* Open a pull request with improvements
* Spread the word
* Reach out to me directly at ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned)
## License
MIT © [Zeeshan Ahmed](mailto:ziishaned@gmail.com)

README-pt_BR.md Normal file

@ -0,0 +1,410 @@
<br/>
<p align="center">
<img src="https://i.imgur.com/bYwl7Vf.png" alt="Learn Regex">
</p><br/>
## Translations:
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [Português do Brasil](README-pt_BR.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
## What is a Regular Expression?
> A regular expression is a group of characters or symbols used to find a specific pattern in a text.
A regular expression is a pattern that is matched against a string from left to right. The phrase "regular expression" is long and awkward to say, so you will usually find it abbreviated as "regex" or "regexp". Regular expressions are used to replace text within a string, validate forms, extract a substring based on a pattern match, and much more.
Imagine you are writing an application and want to set rules for when a user chooses a username. We want to allow the username to contain letters, numbers, underscores and hyphens. We also want to limit the number of characters so that it does not look ugly. We use the following regular expression to validate a username:
<br/><br/>
<p align="center">
<img src="http://i.imgur.com/8UaOzpq.png" alt="Regular expression">
</p>
The regular expression above accepts the strings `john_doe`, `jo-hn_doe` and `john12_as`. It does not accept `Jo` because that string contains an uppercase letter and is also too short.
## Table of Contents
- [Basic Matchers](#1-combinações-básicas)
- [Meta Characters](#2-metacaracteres)
- [Full Stop](#21-ponto-final)
- [Character Set](#22-conjunto-de-caracteres)
- [Negated Character Set](#221-conjunto-de-caracteres-negados)
- [Repetitions](#23-repetições)
- [The Star](#231-o-asterisco)
- [The Plus](#232-o-sinal-de-adição)
- [The Question Mark](#233-o-ponto-de-interrogação)
- [Braces](#24-chaves)
- [Character Group](#25-grupo-de-caracteres)
- [Alternation](#26-alternância)
- [Escaping Special Characters](#27-escapando-caracteres-especiais)
- [Anchors](#28-Âncoras)
- [The Caret](#281-acento-circunflexo)
- [The Dollar Sign](#282-sinal-de-dólar)
- [Shorthand Character Sets](#3-forma-abreviada-de-conjunto-de-caracteres)
- [Lookarounds](#4-olhar-ao-redor)
- [Positive Lookahead](#41-lookahead-positivo)
- [Negative Lookahead](#42-lookahead-negativo)
- [Positive Lookbehind](#43-lookbehind-positivo)
- [Negative Lookbehind](#44-lookbehind-negativo)
- [Flags](#5-flags)
- [Case Insensitive](#51-indiferente-à-maiúsculas)
- [Global Search](#52-busca-global)
- [Multiline](#53-multilinhas)
## 1. Basic Matchers
A regular expression is just a pattern of characters that we use to search a text. For example, the regular expression `the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/dmRygT/1)
The regular expression `123` matches the string `123`. The regular expression is matched against an input string by comparing each character in the regular expression with each character in the input string, one after another. Regular expressions are normally case-sensitive, so the regular expression `The` does not match the string `the`.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/1paXsy/1)
## 2. Meta Characters
Meta characters are the building blocks of regular expressions. Meta characters do not stand for themselves but are instead interpreted in a special way. Some meta characters take on a special meaning when written inside square brackets.
The meta characters are as follows:
|Meta character|Description|
|:----:|----|
|.|Matches any single character except a line break.|
|[ ]|Character class. Matches any character contained between the square brackets.|
|[^ ]|Negated character class. Matches any character that is not contained between the square brackets.|
|*|Matches 0 or more repetitions of the preceding symbol.|
|+|Matches 1 or more repetitions of the preceding symbol.|
|?|Makes the preceding symbol optional.|
|{n,m}|Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.|
|(xyz)|Character group. Matches the characters xyz in that exact order.|
|&#124;|Alternation. Matches either the characters before or the characters after the symbol.|
|&#92;|Escapes the next character. This allows you to match the reserved characters <code>[ ] ( ) { } . * + ? ^ $ \ &#124;</code>|
|^|Matches the beginning of the input.|
|$|Matches the end of the input.|
## 2.1 Full Stop
The full stop `.` is the simplest example of a meta character. The meta character `.` matches any single character. It does not match a carriage return or a newline. For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the letter `r`.
<pre>
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
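The newline exception for the full stop can be observed directly. This is a small sketch assuming Python's `re` module; the `re.DOTALL` flag is specific to that module and is shown only to contrast with the default behaviour.

```python
import re

sentence = "The car parked in the garage."
three_letter_hits = re.findall(r".ar", sentence)

# By default the dot does not match a line break, so "a\nb" is not a match.
two_lines = "a\nb"
default_dot = re.findall(r"a.b", two_lines)
dot_matches_newline = re.findall(r"a.b", two_lines, re.DOTALL)

print(three_letter_hits)    # ['car', 'par', 'gar']
print(default_dot)          # []
print(dot_matches_newline)  # ['a\nb']
```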
## 2.2 Character Set
Character sets are also called character classes. Square brackets are used to specify character sets. Use a hyphen inside a character set to specify a range of characters. The order of the characters inside the square brackets does not matter. For example, the regular expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
A full stop inside a character set, however, means a literal full stop. The regular expression `ar[.]` means: a lowercase `a`, followed by the letter `r`, followed by a full stop `.` character.
<pre>
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
### 2.2.1 Negated Character Set
In general, the caret symbol represents the start of the string, but when it appears right after the opening square bracket it negates the character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the character `a`, followed by the letter `r`.
<pre>
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/nNNlq3/1)
## 2.3 Repetitions
The meta characters `+`, `*` and `?` are used to specify how many times a subpattern can occur. These meta characters act differently in different situations.
### 2.3.1 The Star
The symbol `*` matches zero or more repetitions of the preceding matcher. The regular expression `a*` means: zero or more repetitions of the preceding lowercase character `a`. If the star appears after a character set or class, it matches repetitions of the whole character set. For example, the regular expression `[a-z]*` means: any number of lowercase letters in a row.
<pre>
"[a-z]&ast;" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
</pre>
[Test the regular expression](https://regex101.com/r/7m8me5/1)
The `*` symbol can be used with the meta character `.` to match any string of characters: `.*`. The `*` symbol can also be used with the whitespace character `\s` to match a string of whitespace characters. For example, the expression `\s*cat\s*` means: zero or more spaces, followed by a lowercase `c`, followed by a lowercase `a`, followed by a lowercase `t`, followed by zero or more spaces.
<pre>
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the con<a href="#learn-regex"><strong>cat</strong></a>enation.
</pre>
[Test the regular expression](https://regex101.com/r/gGrwuz/1)
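Because `\s*` may match nothing at all, `\s*cat\s*` finds the standalone word together with its surrounding spaces, and also the `cat` buried inside a longer word. The sketch below assumes Python's `re` module; the substitution step is added only to make the matched spans visible.

```python
import re

sentence = "The fat cat sat on the concatenation."

# The first hit includes one space on each side; the second has none.
hits = re.findall(r"\s*cat\s*", sentence)

# Replacing each match with "-" shows exactly which characters were consumed.
replaced = re.sub(r"\s*cat\s*", "-", sentence)

print(hits)      # [' cat ', 'cat']
print(replaced)  # The fat-sat on the con-enation.
```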
### 2.3.2 The Plus
The symbol `+` matches one or more repetitions of the preceding character. For example, the regular expression `c.+t` means: a lowercase `c`, followed by at least one character, followed by a lowercase `t`. Because `.+` consumes as many characters as it can, the match runs to the last `t` in the sentence.
<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/Dzf9Aa/1)
### 2.3.3 The Question Mark
In regular expressions, the meta character `?` makes the preceding character optional. This symbol matches zero or one instance of the preceding character. For example, the regular expression `[T]?he` means: an optional uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/cIg9zm/1)
<pre>
"[T]?he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in t<a href="#learn-regex"><strong>he</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/kPpO2x/1)
## 2.4 Braces
In regular expressions, braces, also called quantifiers, are used to specify the number of times that a character or a group of characters can be repeated. For example, the regular expression `[0-9]{2,3}` means: match at least 2 digits but not more than 3 (characters in the range 0 to 9).
<pre>
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/juM86s/1)
We can leave out the second number. For example, the regular expression `[0-9]{2,}` means: match 2 or more digits. If we also remove the comma, the regular expression `[0-9]{3}` means: match exactly 3 digits.
<pre>
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/Gdy4w5/1)
<pre>
"[0-9]{3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to 10.0.
</pre>
[Test the regular expression](https://regex101.com/r/Sivu30/1)
## 2.5 Character Group
A character group is a group of subpatterns written inside parentheses `(...)`. As discussed before, if we put a quantifier after a character, it repeats the preceding character. If we put a quantifier after a character group, it repeats the whole group. For example, the regular expression `(ab)*` matches zero or more repetitions of the characters "ab". We can also use the alternation meta character `|` inside a character group. For example, the regular expression `(c|g|p)ar` means: a lowercase `c`, `g` or `p`, followed by the character `a`, followed by the character `r`.
<pre>
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
## 2.6 Alternation
In regular expressions, the vertical bar `|` is used to define alternation. Alternation is like a condition between multiple expressions. You may be thinking that character sets and alternation work the same way. The big difference is that a character set works at the character level, whereas alternation works at the expression level. For example, the regular expression `(T|t)he|car` means: an uppercase `T` or lowercase `t`, followed by a lowercase `h`, followed by a lowercase `e`; or a lowercase `c`, followed by a lowercase `a`, followed by a lowercase `r`.
<pre>
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
## 2.7 Escaping Special Characters
In regular expressions, the backslash `\` is used to escape the next character. This allows us to specify a symbol as a matching character, including the reserved characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching character, prepend `\` to it. For example, the regular expression `.` is used to match any character except a newline. To match a literal `.` in an input string, the regular expression `(f|c|m)at\.?` means: a lowercase `f`, `c` or `m`, followed by a lowercase `a`, followed by a lowercase `t`, followed by an optional `.` character.
<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/DOc5Nu/1)
## 2.8 Anchors
In regular expressions, we use anchors to check whether the matching symbol is at the start or at the end of the input string. There are two types of anchors: the caret `^` checks whether the matching character is the first character of the input, and the dollar sign `$` checks whether the matching character is the last character of the input string.
### 2.8.1 The Caret
The caret symbol `^` is used to check whether the matching character is the first character of the input string. If we apply the regular expression `^a` (is `a` the first character?) to the input string `abc`, it matches `a`. If we apply the regular expression `^b` to the same string, it matches nothing, because in the string `abc` the "b" is not the first character. Let's look at another regular expression, `^(T|t)he`, which means: an uppercase `T` or lowercase `t` as the first symbol of the input string, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/5ljjgB/1)
<pre>
"^(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/jXrKne/1)
### 2.8.2 The Dollar Sign
The dollar sign `$` is used to check whether the matching character is the last character of the input string. For example, the regular expression `(at\.)$` means: a lowercase `a`, followed by a lowercase `t`, followed by a full stop `.`, and the group must be at the end of the string.
<pre>
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/y4Au4D/1)
<pre>
"(at\.)$" => The fat cat. sat. on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/t0AkOd/1)
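The `^a` and `^b` cases from the caret section can be checked directly, and some engines also offer functions that anchor implicitly. The sketch below assumes Python's `re` module, where `re.match` only tries the start of the input and `re.fullmatch` requires the whole input to match. These two functions are Python specifics and are not part of the tutorial.

```python
import re

# Explicit anchor: `^a` matches at the start of "abc", `^b` does not.
caret_a = re.search(r"^a", "abc") is not None
caret_b = re.search(r"^b", "abc") is not None

# re.match tries only at the start, so it behaves as if the pattern began with `^`.
match_b = re.match(r"b", "abc") is not None

# re.fullmatch requires the pattern to cover the input from start to end.
whole_word = re.fullmatch(r"(T|t)he", "the") is not None
with_extra = re.fullmatch(r"(T|t)he", "the car") is not None

print(caret_a, caret_b, match_b)  # True False False
print(whole_word, with_extra)     # True False
```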
## 3. Shorthand Character Sets
Regular expressions provide shorthands for commonly used character sets, which offer convenient shortcuts for commonly used regular expressions. The shorthand character sets are as follows:
|Shorthand|Description|
|:----:|----|
|.|Any character except a newline|
|\w|Matches alphanumeric characters and the underscore: `[a-zA-Z0-9_]`|
|\W|Matches non-alphanumeric characters: `[^\w]`|
|\d|Matches digits: `[0-9]`|
|\D|Matches non-digits: `[^\d]`|
|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
|\S|Matches non-whitespace characters: `[^\s]`|
## 4. Lookarounds
Lookbehinds and lookaheads, sometimes known as lookarounds, are specific types of ***non-capturing groups*** (used to match a pattern but not included in the matching list). Lookarounds are used when we have the condition that a pattern is preceded or followed by another pattern. For example, we want to get all numbers preceded by the `$` character from the input string `$4.44 and $10.88`. We will use the regular expression `(?<=\$)[0-9\.]*`, which means: get all the numbers that contain the `.` character and are preceded by the `$` character. The following lookarounds are used in regular expressions:
|Symbol|Description|
|:----:|----|
|?=|Positive Lookahead|
|?!|Negative Lookahead|
|?<=|Positive Lookbehind|
|?<!|Negative Lookbehind|
### 4.1 Positive Lookahead
The positive lookahead asserts that the first part of the expression must be followed by the lookahead expression. The returned match contains only the text matched by the first part of the expression. To define a positive lookahead, parentheses are used. Within those parentheses, a question mark followed by an equals sign is used like this: `(?=...)`. The lookahead expression is written after the equals sign inside the parentheses. For example, the regular expression `(T|t)he(?=\sfat)` means: a lowercase `t` or uppercase `T`, followed by the letter `h`, followed by the letter `e`. In parentheses we define the positive lookahead, which tells the regular expression engine to match only a `The` or `the` that is followed by a whitespace character and the word `fat`.
<pre>
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/IDDARt/1)
### 4.2 Negative Lookahead
The negative lookahead is used when we need every match in the input string that is not followed by a given pattern. It is defined like the positive lookahead, except that the negation character `!` replaces the equals sign `=`: `(?!...)`. Consider the regular expression `[T|t]he(?!\sfat)`, which means: get every `The` or `the` in the input string that is not followed by a whitespace character and the word `fat`.
<pre>
"[T|t]he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/V32Npg/1)
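As a rough illustration, both lookaheads can be tried with Python's built-in `re` module. The choice of Python is an assumption made for this sketch; the text itself does not name an engine. The sketch writes `[Tt]` instead of `[T|t]`, because inside a character class `|` is a literal character.

```python
import re

text = "The fat cat sat on the mat."

# Positive lookahead: "The"/"the" only when followed by whitespace + "fat".
followed_by_fat = re.findall(r"[Tt]he(?=\sfat)", text)

# Negative lookahead: "The"/"the" only when NOT followed by whitespace + "fat".
not_followed_by_fat = re.findall(r"[Tt]he(?!\sfat)", text)

print(followed_by_fat)      # ['The']
print(not_followed_by_fat)  # ['the']
```

In both cases the lookahead text itself (` fat`) is checked but never becomes part of the returned match.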
### 4.3 Positive Lookbehind
The positive lookbehind is used to get every match that is preceded by a specific pattern. It is written `(?<=...)`. For example, the regular expression `(?<=[T|t]he\s)(fat|mat)` means: get every `fat` or `mat` in the input string that comes right after `The` or `the` and a whitespace character.
<pre>
"(?<=[T|t]he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/avH165/1)
### 4.4 Negative Lookbehind
The negative lookbehind is used to get every match that is not preceded by a specific pattern. It is written `(?<!...)`. For example, the regular expression `(?<!(T|t)he\s)(cat)` means: get every `cat` in the input string that does not come right after `The` or `the` and a whitespace character.
<pre>
"(?&lt;![T|t]he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
## 5. Flags
Flags are also called modifiers because they modify the output of a regular expression. These flags can be used in any order or combination and are an integral part of the RegExp.
|Flag|Description|
|:----:|----|
|i|Case insensitive: makes matching case-insensitive.|
|g|Global search: searches for the pattern throughout the input string.|
|m|Multiline: the anchor meta characters work on each line.|
### 5.1 Case Insensitive
The `i` modifier makes matching case-insensitive. For example, the regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase `h`, followed by the character `e`. The `i` flag at the end of the expression tells the regular expression engine to ignore case. As you can see, we also set the `g` flag because we want to search for the pattern in the whole input string.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/dpQyf9/1)
<pre>
"/The/gi" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
### 5.2 Global Search
The `g` modifier performs a global match (it finds all matches rather than stopping after the first one). For example, the regular expression `/.(at)/g` means: any character except a newline, followed by a lowercase `a`, followed by a lowercase `t`. Because of the `g` flag at the end of the expression, it now finds every match in the input string, not just the first.
<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/jnk6gM/1)
<pre>
"/.(at)/g" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> <a href="#learn-regex"><strong>sat</strong></a> on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/dO1nef/1)
### 5.3 Multiline
The `m` modifier performs a multi-line match. As discussed earlier, the anchors `(^, $)` check whether the pattern is at the beginning or at the end of the input string. To make the anchors work on each line, we use the `m` flag. For example, the regular expression `/.at(.)?$/gm` means: any character except a newline, followed by a lowercase `a`, followed by a lowercase `t`, optionally followed by any character except a newline. Because of the `m` flag, the regular expression engine now matches the pattern at the end of each line of the string.
<pre>
"/.at(.)?$/" => The fat
cat sat
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/hoGMkP/1)
<pre>
"/.at(.)?$/gm" => The <a href="#learn-regex"><strong>fat</strong></a>
cat <a href="#learn-regex"><strong>sat</strong></a>
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/E88WE2/1)
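The effect of the `m` flag can be sketched in Python, where the equivalent option is `re.MULTILINE`. Python is an assumption here and has no `g` flag; `re.finditer` plays that role by returning every match.

```python
import re

text = "The fat\ncat sat\non the mat."
pattern = r".at(.)?$"

# Without MULTILINE, "$" only matches at the very end of the input.
without_m = [m.group(0) for m in re.finditer(pattern, text)]

# With MULTILINE, "$" also matches at the end of every line.
with_m = [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]

print(without_m)  # ['mat.']
print(with_m)     # ['fat', 'sat', 'mat.']
```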
## Contribution
* Report issues
* Open a pull request with improvements
* Spread the word
* Reach out to me directly at ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned)
## License
MIT © [Zeeshan Ahmed](mailto:ziishaned@gmail.com)

478
README-tr.md Normal file

@ -0,0 +1,478 @@
<br/>
<p align="center">
<img src="https://i.imgur.com/bYwl7Vf.png" alt="Learn Regex">
</p><br/>
## Translations:
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
* [Turkish](README-tr.md)
## What is Regular Expression?
> A regular expression is a group of characters or symbols used to find a specific pattern in a text.
A regular expression is a pattern that is matched against a subject string from left to right. "Regular expression" is a mouthful, so you will usually find the term abbreviated as "regex" or "regexp". Regular expressions are used for replacing text within a string, validating forms, extracting a substring from a string based on a pattern match, and much more.
Imagine you are writing an application and want to set rules for when a user chooses a username. We want to allow the username to contain letters, numbers, underscores, and hyphens. We also want to limit the number of characters in the username so it does not look ugly. We use the following regular expression to validate a username:
<br/><br/>
<p align="center">
<img src="./img/regexp-tr.png" alt="Regular expression">
</p>
The regular expression above accepts inputs such as `john_doe`, `jo-hn_doe`, and `john12_as`.
It does not match `Jo`, because that string contains an uppercase letter and is also shorter than 3 characters.
## Table of Contents
- [Basic Matchers](#1-basic-matchers)
- [Meta Characters](#2-meta-characters)
- [Full Stop](#21-full-stop)
- [Character Set](#22-character-set)
- [Negated Character Set](#221-negated-character-set)
- [Repetitions](#23-repetitions)
- [The Star](#231-the-star)
- [The Plus](#232-the-plus)
- [The Question Mark](#233-the-question-mark)
- [Braces](#24-braces)
- [Character Group](#25-character-group)
- [Alternation](#26-alternation)
- [Escaping Special Characters](#27-escaping-special-characters)
- [Anchors](#28-anchors)
- [Caret](#281-caret)
- [Dollar](#282-dollar)
- [Shorthand Character Sets](#3-shorthand-character-sets)
- [Lookaround](#4-lookaround)
- [Positive Lookahead](#41-positive-lookahead)
- [Negative Lookahead](#42-negative-lookahead)
- [Positive Lookbehind](#43-positive-lookbehind)
- [Negative Lookbehind](#44-negative-lookbehind)
- [Flags](#5-flags)
- [Case Insensitive](#51-case-insensitive)
- [Global Search](#52-global-search)
- [Multiline](#53-multiline)
## 1. Basic Matchers
A regular expression is just a pattern of characters that we use to search a text.
For example, the regular expression `the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/dmRygT/1)
The regular expression `123` matches the string `123`. A regular expression is matched against an input string by comparing each character in the expression to each character in the input, one after another. Regular expressions are normally case-sensitive, so the regular expression `The` does not match the string `the`.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/1paXsy/1)
## 2. Meta Characters
Meta characters are the building blocks of regular expressions. They do not stand for themselves but are interpreted in a special way. Some meta characters take on a special meaning when written inside square brackets.
The meta characters are as follows:
|Meta character|Description|
|:----:|----|
|.|Matches any single character except a newline.|
|[ ]|Character class. Matches any character contained between the square brackets.|
|[^ ]|Negated character class. Matches any character not contained between the square brackets.|
|*|Matches zero or more repetitions of the preceding symbol.|
|+|Matches one or more repetitions of the preceding symbol.|
|?|Makes the preceding symbol optional.|
|{n,m}|Matches at least `n` but no more than `m` repetitions of the preceding symbol.|
|(xyz)|Character group. Matches the characters `xyz` in that exact order.|
|&#124;|Alternation (OR). Matches either the expression before or the expression after the symbol.|
|&#92;|Escapes the next character. This lets you match the reserved characters <code>[ ] ( ) { } . * + ? ^ $ \ &#124;</code>|
|^|Matches the beginning of the input.|
|$|Matches the end of the input.|
## 2.1 Full Stop
The full stop `.` is the simplest example of a meta character. The meta character `.` matches any single character except a newline.
For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the letter `r`.
<pre>
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/xc9GkU/1)
## 2.2 Character Set
Character sets are also called character classes. Square brackets are used to specify character sets.
Use a hyphen inside a character set to specify a range of characters. The order of the characters inside the square brackets does not matter.
For example, the regular expression `[Tt]he` means: an uppercase `T` or a lowercase `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
A full stop inside a character set, however, means a literal full stop.
The regular expression `ar[.]` means: a lowercase `a`, followed by the letter `r`, followed by a `.` character.
<pre>
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/wL3xtE/1)
### 2.2.1 Negated Character Set
In general, the caret symbol `^` represents the start of the string, but when it appears right after the opening square bracket it negates the character set.
For example, the regular expression `[^c]ar` means: any character except `c`, followed by `a`, followed by `r`.
<pre>
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/nNNlq3/1)
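The dot and the character sets described so far can be checked with a short script. This is a minimal sketch that assumes Python's standard `re` module; any engine with the same syntax should behave alike on these inputs.

```python
import re

sentence = "The car parked in the garage."

# "." matches any single character except a newline.
any_then_ar = re.findall(r".ar", sentence)

# "[Tt]" matches exactly one of the listed characters.
the_any_case = re.findall(r"[Tt]he", sentence)

# "[^c]" matches any single character except "c".
not_c_then_ar = re.findall(r"[^c]ar", sentence)

# Inside a character set, "." is a literal full stop.
literal_dot = re.findall(r"ar[.]", "A garage is a good place to park a car.")

print(any_then_ar)    # ['car', 'par', 'gar']
print(the_any_case)   # ['The', 'the']
print(not_c_then_ar)  # ['par', 'gar']
print(literal_dot)    # ['ar.']
```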
## 2.3 Repetitions
The meta characters `+`, `*`, and `?` specify how many times the preceding subpattern may occur. These meta characters behave differently in different situations.
### 2.3.1 The Star
The `*` symbol matches zero or more repetitions of the preceding matcher. If it follows a character set or class, it repeats the whole character set.
The regular expression `a*` means: zero or more repetitions of the character `a`.
The regular expression `[a-z]*` means: any number of lowercase letters in a row.
<pre>
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
</pre>
[Test the regular expression](https://regex101.com/r/7m8me5/1)
The `*` symbol can be combined with the meta character `.` to match any string of characters: `.*`. It can also be combined with the whitespace character `\s` to match a run of whitespace characters.
For example, the regular expression `\s*cat\s*` means: zero or more whitespace characters, followed by a lowercase `c`, followed by a lowercase `a`, followed by a lowercase `t`, followed by zero or more whitespace characters.
<pre>
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the <a href="#learn-regex">con<strong>cat</strong>enation</a>.
</pre>
[Test the regular expression](https://regex101.com/r/gGrwuz/1)
### 2.3.2 The Plus
The `+` symbol matches one or more repetitions of the preceding matcher.
For example, the regular expression `c.+t` means: a lowercase `c`, followed by at least one character, followed by a lowercase `t`.
Note that the `t` is the last `t` in the sentence, because `.+` is greedy and matches as many characters as it can.
<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/Dzf9Aa/1)
### 2.3.3 The Question Mark
In regular expressions the meta character `?` makes the preceding character optional: it matches zero or one instance of it.
For example, the regular expression `[T]?he` means: an optional uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/cIg9zm/1)
<pre>
"[T]?he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in t<a href="#learn-regex"><strong>he</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/kPpO2x/1)
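The three repetition operators can be compared side by side. The following is a small sketch using Python's `re` module (an assumption; the section is engine-agnostic) on the same sentences as the examples above.

```python
import re

# "+" is greedy: ".+" runs to the LAST "t" in the sentence.
greedy = re.search(r"c.+t", "The fat cat sat on the mat.").group(0)

# "?" makes the preceding "[T]" optional, so a bare "he" also matches.
optional = re.findall(r"[T]?he", "The car is parked in the garage.")

# "*" allows zero repetitions, so "cat" matches with or without
# whitespace around it.
padded = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")

print(greedy)    # cat sat on the mat
print(optional)  # ['The', 'he']
print(padded)    # [' cat ', 'cat']
```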
## 2.4 Braces
In regular expressions braces, also known as quantifiers, specify how many times a character or a group of characters can be repeated.
For example, the regular expression `[0-9]{2,3}` means: match at least 2 but no more than 3 digits in the range 0 to 9.
<pre>
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/juM86s/1)
We can leave out the second number.
For example, the regular expression `[0-9]{2,}` means: match 2 or more digits.
If we also remove the comma, as in `[0-9]{3}`, the expression matches exactly 3 digits.
<pre>
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
</pre>
[Test the regular expression](https://regex101.com/r/Gdy4w5/1)
<pre>
"[0-9]{3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to 10.0.
</pre>
[Test the regular expression](https://regex101.com/r/Sivu30/1)
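The three brace forms can be run against the same sentence to see how the bounds change the result. This sketch assumes Python's `re` module.

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

two_to_three = re.findall(r"[0-9]{2,3}", text)  # at least 2, at most 3 digits
two_or_more = re.findall(r"[0-9]{2,}", text)    # 2 or more digits
exactly_three = re.findall(r"[0-9]{3}", text)   # exactly 3 digits

print(two_to_three)   # ['999', '10']
print(two_or_more)    # ['9997', '10']
print(exactly_three)  # ['999']
```

Single digits such as the leading `9` and the trailing `0` never match, because every form requires at least two digits in a row.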
## 2.5 Character Group
A character group is a group of subpatterns written inside parentheses `(...)`. As discussed earlier, a quantifier placed after a character repeats that character. A quantifier placed after a character group repeats the whole group.
For example, the regular expression `(ab)*` matches zero or more repetitions of the sequence "ab".
We can also use the `|` meta character inside a character group.
For example, the regular expression `(c|g|p)ar` means: a lowercase `c`, `g`, or `p`, followed by the character `a`, followed by the character `r`.
<pre>
"(c|g|p)ar" => The <a href="#learn-regex"><strong>car</strong></a> is <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
</pre>
[Test the regular expression](https://regex101.com/r/tUxrBG/1)
## 2.6 Alternation
In a regular expression the vertical bar `|` defines alternation. Alternation is like a condition between multiple expressions. You may be thinking that character sets and alternation work the same way. The big difference is that a character set works at the character level, whereas alternation works at the expression level.
For example, the regular expression `(T|t)he|car` means: either an uppercase `T` or a lowercase `t` followed by a lowercase `h` and a lowercase `e`, or a lowercase `c` followed by a lowercase `a` and a lowercase `r`.
<pre>
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/fBXyX0/1)
## 2.7 Escaping Special Characters
The backslash `\` escapes the next character. This lets us specify a symbol as a matching character, including the reserved characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching character, put `\` before it.
For example, the regular expression `.` matches any character except a newline.
To match a literal `.` in an input string, we have to escape it by putting `\` before the dot.
The regular expression `(f|c|m)at\.?` means: a lowercase `f`, `c`, or `m`, followed by a lowercase `a`, followed by a lowercase `t`, followed by an optional `.` character.
<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/DOc5Nu/1)
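To see why escaping matters, compare an unescaped and an escaped dot on the same input. This sketch assumes Python's `re` module, and the sample string `1000 10.0` is made up for the illustration.

```python
import re

sample = "1000 10.0"  # hypothetical input, chosen only for this sketch

# Unescaped: "." matches ANY character, so "1000" matches as well.
unescaped = re.findall(r"10.0", sample)

# Escaped: "\." matches only a literal full stop.
escaped = re.findall(r"10\.0", sample)

# The pattern from the text; group(0) is the whole match,
# not just the parenthesised letter.
words = [m.group(0)
         for m in re.finditer(r"(f|c|m)at\.?", "The fat cat sat on the mat.")]

print(unescaped)  # ['1000', '10.0']
print(escaped)    # ['10.0']
print(words)      # ['fat', 'cat', 'mat.']
```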
## 2.8 Anchors
In regular expressions we use anchors to check whether the matching symbol is the first or the last symbol of the input string.
There are two types of anchor: the caret `^`, which checks whether the matching character is the first character of the input, and the dollar `$`, which checks whether the matching character is the last character of the input.
### 2.8.1 Caret
The caret `^` checks whether the matching character is the first character of the input string.
If we apply the regular expression `^a` to the input string `abc`, it matches `a`. But if we apply `^b`, it matches nothing, because in the string `abc` the character `b` is not the first character.
As another example,
the regular expression `^(T|t)he` means: an uppercase `T` or a lowercase `t` at the start of the input string, followed by a lowercase `h`, followed by a lowercase `e`.
<pre>
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
</pre>
[Test the regular expression](https://regex101.com/r/5ljjgB/1)
<pre>
"^(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
</pre>
[Test the regular expression](https://regex101.com/r/jXrKne/1)
### 2.8.2 Dollar
The dollar `$` checks whether the matching character is the last character of the input string.
For example, the regular expression `(at\.)$` means: a lowercase `a`, followed by a lowercase `t`, followed by a `.` character, and the match must be at the end of the string.
<pre>
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/y4Au4D/1)
<pre>
"(at\.)$" => The fat cat. sat. on the m<a href="#learn-regex"><strong>at.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/t0AkOd/1)
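Both anchors can be tried out with and without the anchor in place. This sketch assumes Python's `re` module and writes `[Tt]` for `(T|t)` so that `findall` returns whole matches.

```python
import re

# Without "^" both occurrences match; with "^" only the one at the start.
anywhere = re.findall(r"[Tt]he", "The car is parked in the garage.")
at_start = re.findall(r"^[Tt]he", "The car is parked in the garage.")

# Without "$" every "at." matches; with "$" only the one at the end.
every_at = re.findall(r"at\.", "The fat cat. sat. on the mat.")
at_end = re.findall(r"at\.$", "The fat cat. sat. on the mat.")

print(anywhere)  # ['The', 'the']
print(at_start)  # ['The']
print(every_at)  # ['at.', 'at.', 'at.']
print(at_end)    # ['at.']
```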
## 3. Shorthand Character Sets
Regular expressions provide shorthands for commonly used character sets.
The shorthand character sets are as follows:
|Shorthand|Description|
|:----:|----|
|.|Any character except a newline|
|\w|Matches alphanumeric characters: `[a-zA-Z0-9_]`|
|\W|Matches non-alphanumeric characters: `[^\w]`|
|\d|Matches digits: `[0-9]`|
|\D|Matches non-digits: `[^\d]`|
|\s|Matches whitespace characters: `[\t\n\f\r\p{Z}]`|
|\S|Matches non-whitespace characters: `[^\s]`|
## 4. Lookaround
Lookarounds are used when an expression must be preceded or followed by another expression.
For example, to get all the numbers preceded by the `$` character from the input string `$4.44 and $10.88`, we use the expression `(?<=\$)[0-9\.]*`.
The expression `(?<=\$)[0-9\.]*` means: get every run of digits and `.` characters that is preceded by the `$` character.
The lookarounds used in regular expressions are:
|Symbol|Description|
|:----:|----|
|?=|Positive Lookahead (matches only if the given expression follows)|
|?!|Negative Lookahead (matches only if the given expression does not follow)|
|?<=|Positive Lookbehind (matches only if the given expression precedes)|
|?<!|Negative Lookbehind (matches only if the given expression does not precede)|
### 4.1 Positive Lookahead
The positive lookahead asserts that the first part of the expression must be followed by the lookahead expression. The returned match contains only the text matched by the first part of the expression. A positive lookahead is written in parentheses that begin with a question mark and an equals sign: `(?=...)`. The lookahead expression is written after the equals sign inside the parentheses.
For example, the regular expression `[T|t]he(?=\sfat)` means: a lowercase `t` or an uppercase `T`, followed by the letter `h`, followed by the letter `e`. Inside the parentheses we specify that this sequence must be followed by a whitespace character and then `fat`.
<pre>
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/IDDARt/1)
### 4.2 Negative Lookahead
The negative lookahead is the opposite of the positive lookahead: it matches when the text is not followed by the given pattern. It is written like the positive lookahead, but with `!` in place of `=`: `(?!...)`.
The regular expression `[T|t]he(?!\sfat)` means: a lowercase `t` or an uppercase `T`, followed by the letter `h`, followed by the letter `e`, not followed by a whitespace character and `fat`.
<pre>
"[T|t]he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/V32Npg/1)
### 4.3 Positive Lookbehind
The positive lookbehind gets every match that is preceded by a specific pattern. It is written `(?<=...)`.
For example, the regular expression `(?<=[T|t]he\s)(fat|mat)` means: get every `fat` or `mat` that comes right after `The` or `the`.
<pre>
"(?<=[T|t]he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/avH165/1)
### 4.4 Negative Lookbehind
The negative lookbehind gets every match that is not preceded by a specific pattern. It is written `(?<!...)`.
For example, the regular expression `(?<!(T|t)he\s)(cat)` means: get every `cat` that does not come right after `The` or `the`.
<pre>
"(?&lt;![T|t]he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
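The lookbehind forms can be sketched the same way. The example assumes Python's `re` module, which requires lookbehind patterns to have a fixed width; `[Tt]he\s` is always four characters, so it qualifies. Non-capturing groups `(?:...)` are used so that `findall` returns the whole match.

```python
import re

# Positive lookbehind: digits and dots only when preceded by "$".
prices = re.findall(r"(?<=\$)[0-9.]*", "$4.44 and $10.88")

# Positive lookbehind: "fat"/"mat" only right after "The " or "the ".
after_the = re.findall(r"(?<=[Tt]he\s)(?:fat|mat)",
                       "The fat cat sat on the mat.")

# Negative lookbehind: "cat" only when NOT right after "The " or "the ".
not_after_the = [m.start()
                 for m in re.finditer(r"(?<![Tt]he\s)cat",
                                      "The cat sat on cat.")]

print(prices)         # ['4.44', '10.88']
print(after_the)      # ['fat', 'mat']
print(not_after_the)  # [15]
```

The last result is a list of start offsets, which shows that only the second `cat` (at index 15) is matched.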
## 5. Flags
Flags are also called modifiers because they modify the output of a regular expression. These flags can be used in any order or combination and are an integral part of regular expressions.
|Flag|Description|
|:----:|----|
|i|Case insensitive: makes matching case-insensitive.|
|g|Global search: searches for the pattern throughout the input string.|
|m|Multiline: the anchor meta characters work on each line.|
### 5.1 Case Insensitive
The `i` flag performs case-insensitive matching.
For example, the regular expression `/The/gi` means: an uppercase `T`, followed by a lowercase `h`, followed by a lowercase `e`. The `i` flag at the end of the expression tells the engine to ignore case. As you can see, we also set the `g` flag because we want to search for the pattern in the whole input string.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/dpQyf9/1)
<pre>
"/The/gi" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/ahfiuh/1)
### 5.2 Global Search
The `g` flag finds every match in the input. Without the `g` flag, the search stops after the first match.
<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/jnk6gM/1)
<pre>
"/.(at)/g" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> <a href="#learn-regex"><strong>sat</strong></a> on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/dO1nef/1)
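The `/.../gi` notation is the JavaScript and PCRE style. As a hedged point of comparison, Python (assumed here) passes `i` as `re.IGNORECASE` and has no `g` flag: `re.search` returns only the first match, while `re.findall` returns all of them.

```python
import re

text = "The fat cat sat on the mat."

# The "i" flag corresponds to re.IGNORECASE.
case_sensitive = re.findall(r"The", text)
case_insensitive = re.findall(r"The", text, flags=re.IGNORECASE)

# No "g" flag: search() stops at the first match, findall() keeps going.
first_only = re.search(r".at", text).group(0)
all_matches = re.findall(r".at", text)

print(case_sensitive)    # ['The']
print(case_insensitive)  # ['The', 'the']
print(first_only)        # fat
print(all_matches)       # ['fat', 'cat', 'sat', 'mat']
```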
### 5.3 Multiline
The `m` flag performs a multi-line match. As we saw in the section on anchors, the symbols `(^, $)` check whether the pattern is at the beginning or at the end of the input string. To make these anchors work on every line, we use the `m` flag.
For example, the regular expression `/.at(.)?$/gm` means: any character except a newline, followed by a lowercase `a`, followed by a lowercase `t`, optionally followed by any character except a newline. Because of the `m` flag, the pattern is matched at the end of each line of the input.
<pre>
"/.at(.)?$/" => The fat
cat sat
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/hoGMkP/1)
<pre>
"/.at(.)?$/gm" => The <a href="#learn-regex"><strong>fat</strong></a>
cat <a href="#learn-regex"><strong>sat</strong></a>
on the <a href="#learn-regex"><strong>mat.</strong></a>
</pre>
[Test the regular expression](https://regex101.com/r/E88WE2/1)
## Contribution
* Report issues
* Open pull request with improvements
* Spread the word
* Reach out to me directly at ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned)
## License
MIT © [Zeeshan Ahmed](mailto:ziishaned@gmail.com)

329
README.md

@ -10,30 +10,44 @@
## Translations:
* [English](README.md)
* [Español](README-es.md)
* [Français](README-fr.md)
* [Português do Brasil](README-pt_BR.md)
* [中文版](README-cn.md)
* [日本語](README-ja.md)
* [한국어](README-ko.md)
* [Turkish](README-tr.md)
## What is Regular Expression?
> Regular expression is a group of characters or symbols which is used to find a specific pattern from a text.
A regular expression is a pattern that is matched against a subject string from
left to right. The term "regular expression" is a mouthful, so you will usually
find it abbreviated as "regex" or "regexp". Regular expressions are used for
replacing text within a string, validating forms, extracting a substring from a
string based on a pattern match, and much more.
Imagine you are writing an application and you want to set the rules for when a
user chooses their username. We want to allow the username to contain letters,
numbers, underscores and hyphens. We also want to limit the number of characters
in username so it does not look ugly. We use the following regular expression to
validate a username:
<br/><br/>
<p align="center">
<img src="https://i.imgur.com/ekFpQUg.png" alt="Regular expression">
<img src="./img/regexp-en.png" alt="Regular expression">
</p>
The regular expression above accepts the strings `john_doe`, `jo-hn_doe` and
`john12_as`. It does not match `Jo` because that string contains an uppercase
letter and is also too short.
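The username rule can be sketched in code. The actual pattern lives in the image and is not given in the text, so the pattern below (lowercase letters, digits, underscores, and hyphens, 3 to 15 characters long) is an assumption chosen to be consistent with the examples, and `is_valid_username` is a hypothetical helper name. The sketch uses Python's `re` module.

```python
import re

# Assumed pattern: the real one is shown only in the image above.
USERNAME_PATTERN = re.compile(r"[a-z0-9_-]{3,15}")

def is_valid_username(name: str) -> bool:
    """Return True if the whole string satisfies the assumed username rule."""
    # fullmatch() requires the entire string to match, like ^...$ would.
    return USERNAME_PATTERN.fullmatch(name) is not None

for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

Under this assumed pattern the first three names are accepted and `Jo` is rejected, both for its uppercase letter and for its length.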
## Table of Contents
- [Basic Matchers](#1-basic-matchers)
- [Meta character](#2-meta-characters)
@ -61,12 +75,12 @@ contains uppercase letter and also it is too short.
- [Case Insensitive](#51-case-insensitive)
- [Global search](#52-global-search)
- [Multiline](#53-multiline)
- [Bonus](#bonus)
## 1. Basic Matchers
A regular expression is just a pattern of characters that we use to perform
search in a text. For example, the regular expression `the` means: the letter
`t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"the" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
@ -74,9 +88,11 @@ A regular expression is just a pattern of characters that we use to perform sear
[Test the regular expression](https://regex101.com/r/dmRygT/1)
The regular expression `123` matches the string `123`. The regular expression is
matched against an input string by comparing each character in the regular
expression to each character in the input string, one after another. Regular
expressions are normally case-sensitive so the regular expression `The` would
not match the string `the`.
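As a minimal sketch of the literal matching and case sensitivity described above, the snippet below uses Python's built-in `re` module. The choice of Python is an assumption made for illustration only; the guide itself is engine-agnostic.

```python
import re

text = "The fat cat sat on the mat."

# A literal pattern matches only that exact sequence of characters.
lower_matches = re.findall(r"the", text)
# Matching is case-sensitive by default, so "The" finds only the capitalised word.
upper_matches = re.findall(r"The", text)

print(lower_matches)  # ['the']
print(upper_matches)  # ['The']
```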
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
@ -86,9 +102,10 @@ case-sensitive so the regular expression `The` would not match the string `the`.
## 2. Meta Characters
Meta characters are the building blocks of the regular expressions. Meta characters do not stand for themselves but instead are
interpreted in some special way. Some meta characters have a special meaning and are written inside square brackets.
The meta characters are as follows:
Meta characters are the building blocks of regular expressions. Meta
characters do not stand for themselves but instead are interpreted in some
special way. Some meta characters have a special meaning when written inside
square brackets. The meta characters are as follows:
|Meta character|Description|
|:----:|----|
@ -107,9 +124,10 @@ The meta characters are as follows:
## 2.1 Full stop
Full stop `.` is the simplest example of meta character. The meta character `.` matches any single character. It will not match return
or newline characters. For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the
letter `r`.
Full stop `.` is the simplest example of a meta character. The meta character `.`
matches any single character. It will not match return or newline characters.
For example, the regular expression `.ar` means: any character, followed by the
letter `a`, followed by the letter `r`.
<pre>
".ar" => The <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
@ -119,9 +137,11 @@ letter `r`.
## 2.2 Character set
Character sets are also called character class. Square brackets are used to specify character sets. Use a hyphen inside a character set to
specify the characters' range. The order of the character range inside square brackets doesn't matter. For example, the regular
expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
Character sets are also called character classes. Square brackets are used to
specify character sets. Use a hyphen inside a character set to specify a
range of characters. The order of the characters inside the square brackets
doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase
`T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
<pre>
"[Tt]he" => <a href="#learn-regex"><strong>The</strong></a> car parked in <a href="#learn-regex"><strong>the</strong></a> garage.
@ -129,7 +149,9 @@ expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the le
[Test the regular expression](https://regex101.com/r/2ITLQ4/1)
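The dot and character set patterns above can be tried out with a short sketch. It assumes Python's built-in `re` module, which is only one of many engines that support this syntax.

```python
import re

text = "The car parked in the garage."

dot_matches = re.findall(r".ar", text)     # any character, then "ar"
set_matches = re.findall(r"[Tt]he", text)  # "T" or "t", then "he"

print(dot_matches)  # ['car', 'par', 'gar']
print(set_matches)  # ['The', 'the']
```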
A period inside a character set, however, means a literal period. The regular expression `ar[.]` means: a lowercase character `a`, followed by letter `r`, followed by a period `.` character.
A period inside a character set, however, means a literal period. The regular
expression `ar[.]` means: a lowercase character `a`, followed by letter `r`,
followed by a period `.` character.
<pre>
"ar[.]" => A garage is a good place to park a c<a href="#learn-regex"><strong>ar.</strong></a>
@ -139,9 +161,10 @@ A period inside a character set, however, means a literal period. The regular ex
### 2.2.1 Negated character set
In general, the caret symbol represents the start of the string, but when it is typed after the opening square bracket it negates the
character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the character `a`, followed by
the letter `r`.
In general, the caret symbol represents the start of the string, but when it is
typed after the opening square bracket it negates the character set. For
example, the regular expression `[^c]ar` means: any character except `c`,
followed by the character `a`, followed by the letter `r`.
<pre>
"[^c]ar" => The car <a href="#learn-regex"><strong>par</strong></a>ked in the <a href="#learn-regex"><strong>gar</strong></a>age.
@ -151,14 +174,17 @@ the letter `r`.
## 2.3 Repetitions
Following meta characters `+`, `*` or `?` are used to specify how many times a subpattern can occur. These meta characters act
differently in different situations.
The meta characters `+`, `*` and `?` are used to specify how many times a
subpattern can occur. These meta characters act differently in different
situations.
### 2.3.1 The Star
The symbol `*` matches zero or more repetitions of the preceding matcher. The regular expression `a*` means: zero or more repetitions
of preceding lowercase character `a`. But if it appears after a character set or class then it finds the repetitions of the whole
character set. For example, the regular expression `[a-z]*` means: any number of lowercase letters in a row.
The symbol `*` matches zero or more repetitions of the preceding matcher. The
regular expression `a*` means: zero or more repetitions of the preceding lowercase
character `a`. But if it appears after a character set or class then it finds
the repetitions of the whole character set. For example, the regular expression
`[a-z]*` means: any number of lowercase letters in a row.
<pre>
"[a-z]*" => T<a href="#learn-regex"><strong>he</strong></a> <a href="#learn-regex"><strong>car</strong></a> <a href="#learn-regex"><strong>parked</strong></a> <a href="#learn-regex"><strong>in</strong></a> <a href="#learn-regex"><strong>the</strong></a> <a href="#learn-regex"><strong>garage</strong></a> #21.
@ -166,10 +192,12 @@ character set. For example, the regular expression `[a-z]*` means: any number of
[Test the regular expression](https://regex101.com/r/7m8me5/1)
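A rough sketch of the `[a-z]*` example, assuming Python's `re` module. Because `*` also accepts zero repetitions, `re.findall` reports empty matches at positions where no lowercase letter is present, so the sketch filters them out.

```python
import re

text = "The car parked in the garage #21."

# "*" allows zero repetitions, so empty strings appear among the matches.
all_matches = re.findall(r"[a-z]*", text)
# Keep only the non-empty runs of lowercase letters.
words = [m for m in all_matches if m]

print(words)  # ['he', 'car', 'parked', 'in', 'the', 'garage']
```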
The `*` symbol can be used with the meta character `.` to match any string of characters `.*`. The `*` symbol can be used with the
whitespace character `\s` to match a string of whitespace characters. For example, the expression `\s*cat\s*` means: zero or more
spaces, followed by lowercase character `c`, followed by lowercase character `a`, followed by lowercase character `t`, followed by
zero or more spaces.
The `*` symbol can be used with the meta character `.` to match any string of
characters `.*`. The `*` symbol can be used with the whitespace character `\s`
to match a string of whitespace characters. For example, the expression
`\s*cat\s*` means: zero or more whitespace characters, followed by lowercase
character `c`, followed by lowercase character `a`, followed by lowercase
character `t`, followed by zero or more whitespace characters.
<pre>
"\s*cat\s*" => The fat<a href="#learn-regex"><strong> cat </strong></a>sat on the <a href="#learn-regex">con<strong>cat</strong>enation</a>.
@ -179,8 +207,10 @@ zero or more spaces.
### 2.3.2 The Plus
The symbol `+` matches one or more repetitions of the preceding character. For example, the regular expression `c.+t` means: lowercase
letter `c`, followed by at least one character, followed by the lowercase character `t`.
The symbol `+` matches one or more repetitions of the preceding character. For
example, the regular expression `c.+t` means: lowercase letter `c`, followed by
at least one character, followed by the lowercase character `t`. Because `+` is
greedy, the match extends to the last `t` in the sentence.
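The point about the last `t` can be checked with a small sketch, assuming Python's `re` module: `.+` consumes as much as it can, so the match runs from the first `c` to the final `t`.

```python
import re

text = "The fat cat sat on the mat."

# ".+" grabs as many characters as possible before the closing "t".
greedy_match = re.search(r"c.+t", text).group(0)

print(greedy_match)  # cat sat on the mat
```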
<pre>
"c.+t" => The fat <a href="#learn-regex"><strong>cat sat on the mat</strong></a>.
@ -190,9 +220,11 @@ letter `c`, followed by at least one character, followed by the lowercase charac
### 2.3.3 The Question Mark
In regular expression the meta character `?` makes the preceding character optional. This symbol matches zero or one instance of
the preceding character. For example, the regular expression `[T]?he` means: Optional the uppercase letter `T`, followed by the lowercase
character `h`, followed by the lowercase character `e`.
In regular expressions, the meta character `?` makes the preceding character
optional. This symbol matches zero or one instance of the preceding character.
For example, the regular expression `[T]?he` means: an optional uppercase
letter `T`, followed by the lowercase character `h`, followed by the lowercase
character `e`.
<pre>
"[T]he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in the garage.
@ -208,9 +240,10 @@ character `h`, followed by the lowercase character `e`.
## 2.4 Braces
In regular expression braces that are also called quantifiers are used to specify the number of times that a
character or a group of characters can be repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least 2 digits but not more than 3 (
characters in the range of 0 to 9).
In regular expressions, braces (also called quantifiers) are used to
specify the number of times that a character or a group of characters can be
repeated. For example, the regular expression `[0-9]{2,3}` means: Match at least
2 digits but not more than 3 (characters in the range of 0 to 9).
<pre>
"[0-9]{2,3}" => The number was 9.<a href="#learn-regex"><strong>999</strong></a>7 but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
@ -218,8 +251,9 @@ characters in the range of 0 to 9).
[Test the regular expression](https://regex101.com/r/juM86s/1)
We can leave out the second number. For example, the regular expression `[0-9]{2,}` means: Match 2 or more digits. If we also remove
the comma the regular expression `[0-9]{3}` means: Match exactly 3 digits.
We can leave out the second number. For example, the regular expression
`[0-9]{2,}` means: Match 2 or more digits. If we also remove the comma, the
regular expression `[0-9]{3}` means: Match exactly 3 digits.
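The three brace forms can be compared side by side. This is a sketch assuming Python's `re` module; the input sentence is the one used in this section's examples.

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

two_to_three = re.findall(r"[0-9]{2,3}", text)  # 2 or 3 digits
two_or_more = re.findall(r"[0-9]{2,}", text)    # at least 2 digits
exactly_three = re.findall(r"[0-9]{3}", text)   # exactly 3 digits

print(two_to_three)   # ['999', '10']
print(two_or_more)    # ['9997', '10']
print(exactly_three)  # ['999']
```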
<pre>
"[0-9]{2,}" => The number was 9.<a href="#learn-regex"><strong>9997</strong></a> but we rounded it off to <a href="#learn-regex"><strong>10</strong></a>.0.
@ -235,10 +269,13 @@ the comma the regular expression `[0-9]{3}` means: Match exactly 3 digits.
## 2.5 Character Group
Character group is a group of sub-patterns that is written inside Parentheses `(...)`. As we discussed before that in regular expression
if we put a quantifier after a character then it will repeat the preceding character. But if we put quantifier after a character group then
it repeats the whole character group. For example, the regular expression `(ab)*` matches zero or more repetitions of the character "ab".
We can also use the alternation `|` meta character inside character group. For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
A character group is a group of sub-patterns written inside parentheses `(...)`.
As we discussed before, in regular expressions a quantifier placed after a
character repeats that preceding character. But if we put a quantifier
after a character group then it repeats the whole character group. For example,
the regular expression `(ab)*` matches zero or more repetitions of the string
"ab". We can also use the alternation `|` meta character inside a character group.
For example, the regular expression `(c|g|p)ar` means: lowercase character `c`,
`g` or `p`, followed by character `a`, followed by character `r`.
<pre>
@ -249,11 +286,15 @@ We can also use the alternation `|` meta character inside character group. For e
## 2.6 Alternation
In regular expression Vertical bar `|` is used to define alternation. Alternation is like a condition between multiple expressions. Now,
you may be thinking that character set and alternation works the same way. But the big difference between character set and alternation
is that character set works on character level but alternation works on expression level. For example, the regular expression
`(T|t)he|car` means: uppercase character `T` or lowercase `t`, followed by lowercase character `h`, followed by lowercase character `e`
or lowercase character `c`, followed by lowercase character `a`, followed by lowercase character `r`.
In regular expressions, the vertical bar `|` is used to define alternation.
Alternation is like a condition between multiple expressions. Now, you may be
thinking that character sets and alternation work the same way. But the big
difference between them is that a character set works at the
character level while alternation works at the expression level. For example, the
regular expression `(T|t)he|car` means: uppercase character `T` or lowercase
`t`, followed by lowercase character `h`, followed by lowercase character `e` or
lowercase character `c`, followed by lowercase character `a`, followed by
lowercase character `r`.
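A sketch of the alternation example, assuming Python's `re` module. When a pattern contains a capturing group, `re.findall` returns the group's contents rather than the whole match, so the sketch uses `re.finditer` and reads `group(0)`.

```python
import re

text = "The car is parked in the garage."

# group(0) is the full match, regardless of the (T|t) capturing group.
alternation_matches = [m.group(0) for m in re.finditer(r"(T|t)he|car", text)]

print(alternation_matches)  # ['The', 'car', 'the']
```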
<pre>
"(T|t)he|car" => <a href="#learn-regex"><strong>The</strong></a> <a href="#learn-regex"><strong>car</strong></a> is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
@ -263,11 +304,16 @@ or lowercase character `c`, followed by lowercase character `a`, followed by low
## 2.7 Escaping special character
Backslash `\` is used in regular expression to escape the next character. This allows to to specify a symbol as a matching character
including reserved characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching character prepend `\` before it.
For example, the regular expression `.` is used to match any character except newline. Now to match `.` in an input string the regular
expression `(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by lowercase character `a`, followed by lowercase letter
`t`, followed by optional `.` character.
Backslash `\` is used in regular expressions to escape the next character. This
allows us to specify a symbol as a matching character, including the reserved
characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching
character prepend `\` before it.
For example, the regular expression `.` is used to match any character except
newline. To match a literal `.` in an input string, the regular expression
`(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by lowercase
character `a`, followed by lowercase letter `t`, followed by an optional `.`
character.
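To see what the backslash changes, the sketch below (assuming Python's `re` module) runs the pattern with and without the escape. Unescaped, the optional `.` also swallows the space after each word.

```python
import re

text = "The fat cat sat on the mat."

# Escaped: "\." matches only a literal full stop.
escaped = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]
# Unescaped: "." matches any character, including the following space.
unescaped = [m.group(0) for m in re.finditer(r"(f|c|m)at.?", text)]

print(escaped)    # ['fat', 'cat', 'mat.']
print(unescaped)  # ['fat ', 'cat ', 'mat.']
```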
<pre>
"(f|c|m)at\.?" => The <a href="#learn-regex"><strong>fat</strong></a> <a href="#learn-regex"><strong>cat</strong></a> sat on the <a href="#learn-regex"><strong>mat.</strong></a>
@ -277,18 +323,22 @@ expression `(f|c|m)at\.?` means: lowercase letter `f`, `c` or `m`, followed by l
## 2.8 Anchors
In regular expressions, we use anchors to check if the matching symbol is the starting symbol or ending symbol of the
input string. Anchors are of two types: First type is Caret `^` that check if the matching character is the start
character of the input and the second type is Dollar `$` that checks if matching character is the last character of the
input string.
In regular expressions, we use anchors to check if the matching symbol is the
starting symbol or ending symbol of the input string. Anchors are of two types:
the first type is the caret `^`, which checks if the matching character is the
first character of the input, and the second type is the dollar `$`, which checks
if the matching character is the last character of the input string.
### 2.8.1 Caret
Caret `^` symbol is used to check if matching character is the first character of the input string. If we apply the following regular
expression `^a` (if a is the starting symbol) to input string `abc` it matches `a`. But if we apply regular expression `^b` on above
input string it does not match anything. Because in input string `abc` "b" is not the starting symbol. Let's take a look at another
regular expression `^(T|t)he` which means: uppercase character `T` or lowercase character `t` is the start symbol of the input string,
followed by lowercase character `h`, followed by lowercase character `e`.
The caret `^` symbol is used to check if the matching character is the first
character of the input string. If we apply the regular expression `^a` (if a is
the starting symbol) to the input string `abc`, it matches `a`. But if we apply
the regular expression `^b` to the same input string, it does not match anything,
because in the input string `abc` "b" is not the starting symbol. Let's take a look
at another regular expression `^(T|t)he` which means: uppercase character `T` or
lowercase character `t` is the start symbol of the input string, followed by
lowercase character `h`, followed by lowercase character `e`.
<pre>
"(T|t)he" => <a href="#learn-regex"><strong>The</strong></a> car is parked in <a href="#learn-regex"><strong>the</strong></a> garage.
@ -304,9 +354,10 @@ followed by lowercase character `h`, followed by lowercase character `e`.
### 2.8.2 Dollar
Dollar `$` symbol is used to check if matching character is the last character of the input string. For example, regular expression
`(at\.)$` means: a lowercase character `a`, followed by lowercase character `t`, followed by a `.` character and the matcher
must be end of the string.
The dollar `$` symbol is used to check if the matching character is the last
character of the input string. For example, the regular expression `(at\.)$`
means: a lowercase character `a`, followed by lowercase character `t`, followed
by a `.` character, and the match must be at the end of the string.
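Both anchors can be exercised in a few lines. This is a sketch assuming Python's `re` module; `re.search` returns `None` when nothing matches, and `re.findall` returns the contents of the single capturing group.

```python
import re

# Caret: the match must begin at the start of the input.
caret_hit = re.search(r"^a", "abc")   # matches "a"
caret_miss = re.search(r"^b", "abc")  # "b" is not the first character

text = "The fat cat. sat. on the mat."

# Dollar: only the occurrence at the very end of the input qualifies.
unanchored = re.findall(r"(at\.)", text)
anchored = re.findall(r"(at\.)$", text)

print(caret_hit is not None, caret_miss is None)  # True True
print(len(unanchored), len(anchored))             # 3 1
```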
<pre>
"(at\.)" => The fat c<a href="#learn-regex"><strong>at.</strong></a> s<a href="#learn-regex"><strong>at.</strong></a> on the m<a href="#learn-regex"><strong>at.</strong></a>
@ -322,8 +373,9 @@ must be end of the string.
## 3. Shorthand Character Sets
Regular expression provides shorthands for the commonly used character sets, which offer convenient shorthands for commonly used
regular expressions. The shorthand character sets are as follows:
Regular expressions provide convenient shorthands for commonly used character
sets. The shorthand character sets are as follows:
|Shorthand|Description|
|:----:|----|
@ -337,11 +389,15 @@ regular expressions. The shorthand character sets are as follows:
## 4. Lookaround
Lookbehind and lookahead sometimes known as lookaround are specific type of ***non-capturing group*** (Use to match the pattern but not
included in matching list). Lookaheads are used when we have the condition that this pattern is preceded or followed by another certain
pattern. For example, we want to get all numbers that are preceded by `$` character from the following input string `$4.44 and $10.88`.
We will use following regular expression `(?<=\$)[0-9\.]*` which means: get all the numbers which contain `.` character and are preceded
by `$` character. Following are the lookarounds that are used in regular expressions:
Lookbehind and lookahead (also called lookaround) are specific types of
***non-capturing groups*** (used to match a pattern without including it in the
matching list). Lookarounds are used when we have the condition that a pattern is
preceded or followed by another certain pattern. For example, we want to get all
numbers that are preceded by the `$` character from the following input string
`$4.44 and $10.88`. We will use the following regular expression `(?<=\$)[0-9\.]*`
which means: get all the numbers, which may contain the `.` character, that are
preceded by the `$` character. The following lookarounds are used in regular
expressions:
|Symbol|Description|
|:----:|----|
@ -352,60 +408,70 @@ by `$` character. Following are the lookarounds that are used in regular express
### 4.1 Positive Lookahead
The positive lookahead asserts that the first part of the expression must be followed by the lookahead expression. The returned match
only contains the text that is matched by the first part of the expression. To define a positive lookahead, parentheses are used. Within
those parentheses, a question mark with equal sign is used like this: `(?=...)`. Lookahead expression is written after the equal sign inside
parentheses. For example, the regular expression `[T|t]he(?=\sfat)` means: optionally match lowercase letter `t` or uppercase letter `T`,
followed by letter `h`, followed by letter `e`. In parentheses we define positive lookahead which tells regular expression engine to match
`The` or `the` which are followed by the word `fat`.
The positive lookahead asserts that the first part of the expression must be
followed by the lookahead expression. The returned match only contains the text
that is matched by the first part of the expression. To define a positive
lookahead, parentheses are used. Within those parentheses, a question mark with
an equal sign is used like this: `(?=...)`. The lookahead expression is written
after the equal sign inside the parentheses. For example, the regular expression
`(T|t)he(?=\sfat)` means: match lowercase letter `t` or uppercase
letter `T`, followed by letter `h`, followed by letter `e`. In parentheses we
define a positive lookahead which tells the regular expression engine to match
`The` or `the` only when it is followed by the word `fat`.
<pre>
"[T|t]he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
"(T|t)he(?=\sfat)" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
</pre>
[Test the regular expression](https://regex101.com/r/IDDARt/1)
### 4.2 Negative Lookahead
Negative lookahead is used when we need to get all matches from input string that are not followed by a pattern. Negative lookahead
defined same as we define positive lookahead but the only difference is instead of equal `=` character we use negation `!` character
i.e. `(?!...)`. Let's take a look at the following regular expression `[T|t]he(?!\sfat)` which means: get all `The` or `the` words from
input string that are not followed by the word `fat` precedes by a space character.
Negative lookahead is used when we need to get all matches from an input string
that are not followed by a pattern. A negative lookahead is defined the same way
as a positive lookahead; the only difference is that instead of the equal `=`
character we use the negation `!` character, i.e. `(?!...)`. Let's take a look
at the following regular expression `(T|t)he(?!\sfat)` which means: get all
`The` or `the` words from the input string that are not followed by a space
character and the word `fat`.
<pre>
"[T|t]he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
"(T|t)he(?!\sfat)" => The fat cat sat on <a href="#learn-regex"><strong>the</strong></a> mat.
</pre>
[Test the regular expression](https://regex101.com/r/V32Npg/1)
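The two lookahead forms can be compared on the same sentence. This sketch assumes Python's `re` module; `group(0)` shows that the lookahead text itself is not part of the returned match.

```python
import re

text = "The fat cat sat on the mat."

# Positive lookahead: keep "The"/"the" only when " fat" follows.
followed_by_fat = [m.group(0) for m in re.finditer(r"(T|t)he(?=\sfat)", text)]
# Negative lookahead: keep "The"/"the" only when " fat" does not follow.
not_followed_by_fat = [m.group(0) for m in re.finditer(r"(T|t)he(?!\sfat)", text)]

print(followed_by_fat)      # ['The']
print(not_followed_by_fat)  # ['the']
```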
### 4.3 Positive Lookbehind
Positive lookbehind is used to get all the matches that are preceded by a specific pattern. Positive lookbehind is denoted by
`(?<=...)`. For example, the regular expression `(?<=[T|t]he\s)(fat|mat)` means: get all `fat` or `mat` words from input string that
are after the word `The` or `the`.
Positive lookbehind is used to get all the matches that are preceded by a
specific pattern. Positive lookbehind is denoted by `(?<=...)`. For example, the
regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words
from input string that are after the word `The` or `the`.
<pre>
"(?<=[T|t]he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
"(?<=(T|t)he\s)(fat|mat)" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the <a href="#learn-regex"><strong>mat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/avH165/1)
### 4.4 Negative Lookbehind
Negative lookbehind is used to get all the matches that are not preceded by a specific pattern. Negative lookbehind is denoted by
`(?<!...)`. For example, the regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from input string that
are not after the word `The` or `the`.
Negative lookbehind is used to get all the matches that are not preceded by a
specific pattern. Negative lookbehind is denoted by `(?<!...)`. For example, the
regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from input
string that are not after the word `The` or `the`.
<pre>
"(?&lt;![T|t]he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
"(?&lt;!(T|t)he\s)(cat)" => The cat sat on <a href="#learn-regex"><strong>cat</strong></a>.
</pre>
[Test the regular expression](https://regex101.com/r/8Efx5G/1)
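A sketch of both lookbehind forms, assuming Python's `re` module. Python requires lookbehind patterns to have a fixed width, which `(T|t)he\s` satisfies; other engines may have different restrictions.

```python
import re

# Positive lookbehind: "fat" or "mat" only when preceded by "The " or "the ".
preceded = [
    m.group(0)
    for m in re.finditer(r"(?<=(T|t)he\s)(fat|mat)", "The fat cat sat on the mat.")
]

# Negative lookbehind: "cat" only when not preceded by "The " or "the ".
# Start offsets are recorded to show which occurrence matched.
not_preceded_starts = [
    m.start() for m in re.finditer(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat.")
]

print(preceded)             # ['fat', 'mat']
print(not_preceded_starts)  # [15]
```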
## 5. Flags
Flags are also called modifiers because they modify the output of a regular expression. These flags can be used in any order or
combination, and are an integral part of the RegExp.
Flags are also called modifiers because they modify the output of a regular
expression. These flags can be used in any order or combination, and are an
integral part of the RegExp.
|Flag|Description|
|:----:|----|
@ -415,10 +481,12 @@ combination, and are an integral part of the RegExp.
### 5.1 Case Insensitive
The `i` modifier is used to perform case-insensitive matching. For example, the regular expression `/The/gi` means: uppercase letter
`T`, followed by lowercase character `h`, followed by character `e`. And at the end of regular expression the `i` flag tells the
regular expression engine to ignore the case. As you can see we also provided `g` flag because we want to search for the pattern in
the whole input string.
The `i` modifier is used to perform case-insensitive matching. For example, the
regular expression `/The/gi` means: uppercase letter `T`, followed by lowercase
character `h`, followed by character `e`. And at the end of regular expression
the `i` flag tells the regular expression engine to ignore the case. As you can
see we also provided `g` flag because we want to search for the pattern in the
whole input string.
<pre>
"The" => <a href="#learn-regex"><strong>The</strong></a> fat cat sat on the mat.
@ -434,10 +502,11 @@ the whole input string.
### 5.2 Global search
The `g` modifier is used to perform a global match (find all matches rather than stopping after the first match). For example, the
regular expression`/.(at)/g` means: any character except new line, followed by lowercase character `a`, followed by lowercase
character `t`. Because we provided `g` flag at the end of the regular expression now it will find every matches from whole input
string.
The `g` modifier is used to perform a global match (find all matches rather than
stopping after the first match). For example, the regular expression `/.(at)/g`
means: any character except a new line, followed by lowercase character `a`,
followed by lowercase character `t`. Because we provided the `g` flag at the end of
the regular expression, it will now find all matches in the input string, not
just the first one (which is the default behavior).
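The `/.../gi` notation used here is not available in every language. As a sketch under that caveat, Python's `re` module expresses `i` as the `re.IGNORECASE` flag and has no `g` flag at all: `re.search` stops at the first match, while `re.findall` and `re.finditer` return every match.

```python
import re

text = "The fat cat sat on the mat."

# Equivalent of the "i" flag: ignore case while matching.
case_insensitive = re.findall(r"The", text, flags=re.IGNORECASE)

# Without "g": only the first match is returned.
first_only = re.search(r".(at)", text).group(0)
# With "g": every match in the input is returned.
all_matches = [m.group(0) for m in re.finditer(r".(at)", text)]

print(case_insensitive)  # ['The', 'the']
print(first_only)        # fat
print(all_matches)       # ['fat', 'cat', 'sat', 'mat']
```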
<pre>
"/.(at)/" => The <a href="#learn-regex"><strong>fat</strong></a> cat sat on the mat.
@ -453,10 +522,13 @@ string.
### 5.3 Multiline
The `m` modifier is used to perform a multi-line match. As we discussed earlier anchors `(^, $)` are used to check if pattern is
the beginning of the input or end of the input string. But if we want that anchors works on each line we use `m` flag. For example, the
regular expression `/at(.)?$/gm` means: lowercase character `a`, followed by lowercase character `t`, optionally anything except new
line. And because of `m` flag now regular expression engine matches pattern at the end of each line in a string.
The `m` modifier is used to perform a multi-line match. As we discussed earlier,
anchors `(^, $)` are used to check if a pattern is at the beginning or the
end of the input string. But if we want the anchors to work on each line we use
the `m` flag. For example, the regular expression `/at(.)?$/gm` means: lowercase
character `a`, followed by lowercase character `t`, optionally followed by
anything except a new line. And because of the `m` flag, the regular expression
engine now matches the pattern at the end of each line in a string.
<pre>
"/.at(.)?$/" => The fat
@ -474,34 +546,11 @@ line. And because of `m` flag now regular expression engine matches pattern at t
[Test the regular expression](https://regex101.com/r/E88WE2/1)
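The effect of the `m` flag can be sketched with Python's `re` module, where it is spelled `re.MULTILINE`. The three-line sample string below is an assumption made for illustration.

```python
import re

# Hypothetical three-line input, chosen for illustration.
text = "The fat\ncat sat\non the mat."

# Without the flag, "$" only matches at the end of the whole input.
single_line = [m.group(0) for m in re.finditer(r"at(.)?$", text)]
# With re.MULTILINE, "$" also matches at the end of every line.
multi_line = [m.group(0) for m in re.finditer(r"at(.)?$", text, flags=re.MULTILINE)]

print(single_line)  # ['at.']
print(multi_line)   # ['at', 'at', 'at.']
```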
## Bonus
* *Positive Integers*: `^\d+$`
* *Negative Integers*: `^-\d+$`
* *US Phone Number*: `^+?[\d\s]{3,}$`
* *US Phone with code*: `^+?[\d\s]+(?[\d\s]{10,}$`
* *Integers*: `^-?\d+$`
* *Username*: `^[\w.]{4,16}$`
* *Alpha-numeric characters*: `^[a-zA-Z0-9]*$`
* *Alpha-numeric characters with spaces*: `^[a-zA-Z0-9 ]*$`
* *Password*: `^(?=^.{6,}$)((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.*$`
* *email*: `^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})*$`
* *IPv4 address*: `^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*$`
* *Lowercase letters only*: `^([a-z])*$`
* *Uppercase letters only*: `^([A-Z])*$`
* *URL*: `^(((http|https|ftp):\/\/)?([[a-zA-Z0-9]\-\.])+(\.)([[a-zA-Z0-9]]){2,4}([[a-zA-Z0-9]\/+=%&_\.~?\-]*))*$`
* *VISA credit card numbers*: `^(4[0-9]{12}(?:[0-9]{3})?)*$`
* *Date (DD/MM/YYYY)*: `^(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}$`
* *Date (MM/DD/YYYY)*: `^(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}$`
* *Date (YYYY/MM/DD)*: `^(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])$`
* *MasterCard credit card numbers*: `^(5[1-5][0-9]{14})*$`
* *Hashtags*: Including hashtags with preceding text (abc123#xyz456) or containing white spaces within square brackets (#[foo bar]) : `\S*#(?:\[[^\]]+\]|\S+)`
* *@mentions*: `\B@[a-z0-9_-]+`
## Contribution
* Report issues
* Open pull request with improvements
* Spread the word
* Spread the word
* Reach out to me directly at ziishaned@gmail.com or [![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/ziishaned.svg?style=social&label=Follow%20%40ziishaned)](https://twitter.com/ziishaned)
## License

BIN img/img_original.png, Normal file, binary file not shown (Size: 6.5 KiB)
BIN img/regexp-en.png, Normal file, binary file not shown (Size: 32 KiB)
BIN img/regexp-es.png, Normal file, binary file not shown (Size: 34 KiB)
BIN img/regexp-fr.png, Normal file, binary file not shown (Size: 32 KiB)
BIN img/regexp-tr.png, Normal file, binary file not shown (Size: 34 KiB)
397 img/regexp.svg, Normal file, file diff suppressed because one or more lines are too long (Size: 35 KiB)