Clarify findRE, replaceRE, and findRESubmatch (#2064)

Closes #2063
Joe Mooring 2023-04-19 11:24:58 -07:00 committed by Joe Mooring
parent e5eedbb5e8
commit 47a9181b51
3 changed files with 133 additions and 29 deletions


@ -9,13 +9,25 @@ keywords: [regex]
signature:
- "findRE PATTERN INPUT [LIMIT]"
- "strings.FindRE PATTERN INPUT [LIMIT]"
relatedfuncs: [replaceRE]
relatedfuncs: [findRESubmatch, replaceRE]
---
By default, the `findRE` function finds all matches. You can limit the number of matches with an optional LIMIT parameter.
By default, `findRE` finds all matches. You can limit the number of matches with an optional LIMIT parameter.
When specifying the regular expression, use a raw [string literal] (backticks) instead of an interpreted string literal (double quotes) to simplify the syntax. With an interpreted string literal you must escape backslashes.
The syntax of the regular expression is the same general syntax used by Perl, Python, and other languages. More precisely, it is the syntax accepted by [RE2] except for `\C`.
[string literal]: https://go.dev/ref/spec#String_literals
This function uses the [RE2] regular expression library. See the [RE2 syntax documentation] for details. Note that the RE2 `\C` escape sequence is not supported.
[RE2]: https://github.com/google/re2/
[RE2 syntax documentation]: https://github.com/google/re2/wiki/Syntax/
{{% note %}}
The RE2 syntax is a subset of that accepted by [PCRE], roughly speaking, and with various [caveats].
[caveats]: https://swtch.com/~rsc/regexp/regexp3.html#caveats
[PCRE]: https://www.pcre.org/
{{% /note %}}
This example returns a slice of all second level headings (`h2` elements) within the rendered `.Content`:
@ -34,25 +46,3 @@ To limit the number of matches to one:
{{% note %}}
You can write and test your regular expression using [regex101.com](https://regex101.com/). Be sure to select the Go flavor before you begin.
{{% /note %}}
## findRESubmatch
In Hugo 0.110.0 we added a variant of `findRE` that returns a slice of strings holding the text of the leftmost match of the regular expression in the input and the matches, if any, of its subexpressions.
This:
```go-html-template
{{ findRESubmatch `<a\s*href="(.+?)">(.+?)</a>` `<li><a href="#foo">Foo</a></li> <li><a href="#bar">Bar</a></li>` | print | safeHTML }}
```
Will print:
```
[[<a href=\"#foo\">Foo</a> #foo Foo] [<a href=\"#bar\">Bar</a> #bar Bar]]
```
{{< new-in "0.110.0" >}}
[RE2]: https://github.com/google/re2/wiki/Syntax
[string literal]: https://go.dev/ref/spec#String_literals


@ -0,0 +1,102 @@
---
title: findRESubmatch
description: Returns a slice of strings holding the text of the leftmost match of the regular expression and the matches, if any, of its subexpressions
categories: [functions]
menu:
docs:
parent: functions
keywords: [regex]
signature:
- "findRESubmatch PATTERN INPUT [LIMIT]"
- "strings.FindRESubmatch PATTERN INPUT [LIMIT]"
relatedfuncs: [findRE, replaceRE]
---
By default, `findRESubmatch` finds all matches. You can limit the number of matches with an optional LIMIT parameter. A return value of nil indicates no match.
When specifying the regular expression, use a raw [string literal] (backticks) instead of an interpreted string literal (double quotes) to simplify the syntax. With an interpreted string literal you must escape backslashes.
[string literal]: https://go.dev/ref/spec#String_literals
This function uses the [RE2] regular expression library. See the [RE2 syntax documentation] for details. Note that the RE2 `\C` escape sequence is not supported.
[RE2]: https://github.com/google/re2/
[RE2 syntax documentation]: https://github.com/google/re2/wiki/Syntax/
{{% note %}}
The RE2 syntax is a subset of that accepted by [PCRE], roughly speaking, and with various [caveats].
[caveats]: https://swtch.com/~rsc/regexp/regexp3.html#caveats
[PCRE]: https://www.pcre.org/
{{% /note %}}
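As a rough illustration of these limits, the following Go sketch checks a few patterns against Go's `regexp` package, on the assumption that Hugo's regular expression functions accept the same syntax. The helper name `compiles` is hypothetical. Backreferences and lookahead are used here as examples of PCRE constructs that fall outside the RE2 syntax.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles (hypothetical helper) reports whether Go's regexp package
// accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`a(x*)b`, // plain RE2 syntax
		`\C`,     // RE2 escape sequence that is not supported
		`(a)\1`,  // PCRE backreference
		`a(?=b)`, // PCRE lookahead
	}
	for _, p := range patterns {
		fmt.Printf("%q compiles: %v\n", p, compiles(p))
	}
}
```

Only the first pattern is accepted; the other three return a compile error.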
## Demonstrative examples
```go-html-template
{{ findRESubmatch `a(x*)b` "-ab-" }} → [["ab" ""]]
{{ findRESubmatch `a(x*)b` "-axxb-" }} → [["axxb" "xx"]]
{{ findRESubmatch `a(x*)b` "-ab-axb-" }} → [["ab" ""] ["axb" "x"]]
{{ findRESubmatch `a(x*)b` "-axxb-ab-" }} → [["axxb" "xx"] ["ab" ""]]
{{ findRESubmatch `a(x*)b` "-axxb-ab-" 1 }} → [["axxb" "xx"]]
```
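The same results can be reproduced in plain Go. This is a sketch under the assumption that the template function behaves like `FindAllStringSubmatch` from Go's `regexp` package; the wrapper `findSubmatches` and its use of `-1` to mean "no limit" are hypothetical and are not Hugo's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// findSubmatches (hypothetical wrapper) returns, for each match, the full
// match followed by the text of each capture group. A limit of -1 returns
// all matches. The result is nil when nothing matches.
func findSubmatches(pattern, input string, limit int) [][]string {
	re := regexp.MustCompile(pattern)
	return re.FindAllStringSubmatch(input, limit)
}

func main() {
	fmt.Printf("%q\n", findSubmatches(`a(x*)b`, "-ab-", -1))
	fmt.Printf("%q\n", findSubmatches(`a(x*)b`, "-axxb-ab-", -1))
	fmt.Printf("%q\n", findSubmatches(`a(x*)b`, "-axxb-ab-", 1))
	fmt.Println(findSubmatches(`a(x*)b`, "-xyz-", -1) == nil)
}
```

Each inner slice has one element for the whole match plus one per capture group, so a group that matches nothing still appears as an empty string.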
## Practical example
This markdown:
```text
- [Example](https://example.org)
- [Hugo](https://gohugo.io)
```
Produces this HTML:
```html
<ul>
<li><a href="https://example.org">Example</a></li>
<li><a href="https://gohugo.io">Hugo</a></li>
</ul>
```
To match the anchor elements, capturing the link destination and text:
```go-html-template
{{ $regex := `<a\s*href="(.+?)">(.+?)</a>` }}
{{ $matches := findRESubmatch $regex .Content }}
```
Viewed as JSON, the data structure of `$matches` in the code above is:
```json
[
[
"<a href=\"https://example.org\">Example</a>",
"https://example.org",
"Example"
],
[
"<a href=\"https://gohugo.io\">Hugo</a>",
"https://gohugo.io",
"Hugo"
]
]
```
To render the `href` attributes:
```go-html-template
{{ range $matches }}
{{ index . 1 }}
{{ end }}
```
Result:
```text
https://example.org
https://gohugo.io
```
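For comparison, the same extraction can be sketched in plain Go, assuming the template function behaves like Go's `regexp` package. The constant `renderedHTML` stands in for `.Content`, and the function name `linkDestinations` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// renderedHTML stands in for the rendered .Content in the example above.
const renderedHTML = `<ul>
<li><a href="https://example.org">Example</a></li>
<li><a href="https://gohugo.io">Hugo</a></li>
</ul>`

// linkDestinations (hypothetical helper) returns the first capture group,
// the href value, of every anchor element found in the input.
func linkDestinations(input string) []string {
	re := regexp.MustCompile(`<a\s*href="(.+?)">(.+?)</a>`)
	var hrefs []string
	for _, m := range re.FindAllStringSubmatch(input, -1) {
		// m[0] is the whole match, m[1] the href, m[2] the link text.
		hrefs = append(hrefs, m[1])
	}
	return hrefs
}

func main() {
	for _, href := range linkDestinations(renderedHTML) {
		fmt.Println(href)
	}
}
```

Index 1 corresponds to `{{ index . 1 }}` in the template; index 2 would give the link text instead.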
{{% note %}}
You can write and test your regular expression using [regex101.com](https://regex101.com/). Be sure to select the Go flavor before you begin.
{{% /note %}}


@ -5,17 +5,29 @@ categories: [functions]
menu:
docs:
parent: functions
keywords: [replace regex]
keywords: [regex]
signature:
- "replaceRE PATTERN REPLACEMENT INPUT [LIMIT]"
- "strings.ReplaceRE PATTERN REPLACEMENT INPUT [LIMIT]"
relatedfuncs: [replace,findRE]
relatedfuncs: [findRE, findRESubmatch, replace]
---
By default, the `replaceRE` function replaces all matches. You can limit the number of matches with an optional LIMIT parameter.
By default, `replaceRE` replaces all matches. You can limit the number of matches with an optional LIMIT parameter.
When specifying the regular expression, use a raw [string literal] (backticks) instead of an interpreted string literal (double quotes) to simplify the syntax. With an interpreted string literal you must escape backslashes.
The syntax of the regular expression is the same general syntax used by Perl, Python, and other languages. More precisely, it is the syntax accepted by [RE2] except for `\C`.
[string literal]: https://go.dev/ref/spec#String_literals
This function uses the [RE2] regular expression library. See the [RE2 syntax documentation] for details. Note that the RE2 `\C` escape sequence is not supported.
[RE2]: https://github.com/google/re2/
[RE2 syntax documentation]: https://github.com/google/re2/wiki/Syntax/
{{% note %}}
The RE2 syntax is a subset of that accepted by [PCRE], roughly speaking, and with various [caveats].
[caveats]: https://swtch.com/~rsc/regexp/regexp3.html#caveats
[PCRE]: https://www.pcre.org/
{{% /note %}}
This example replaces two or more consecutive hyphens with a single hyphen: