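Go regexp Package in Detail

Overview

The regexp package implements regular expression search. The syntax it accepts is the same general syntax used by Perl, Python, and other languages; more precisely, it is the syntax accepted by RE2.

Key features: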

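Based on RE2: a pure-Go implementation of the RE2 design, not a binding to the C++ library
Guaranteed to run in time linear in the size of the input
No backtracking, so no exponential blowup on pathological patterns
Memory-safe (pure Go)
Native UTF-8 support
All characters are UTF-8-encoded code points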

Syntax reference: https://golang.org/s/re2syntax

Importing the package

import "regexp"

Basic usage

package main

import (
    "fmt"
    "regexp"
)

func main() {
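    // Compile the regular expression (panics if the pattern is invalid)
    re := regexp.MustCompile(`\d+`)
    
    // Report whether the string contains any match
    matched := re.MatchString("abc123")
    fmt.Println("Matched:", matched)
    
    // Find the leftmost match
    found := re.FindString("abc123def456")
    fmt.Println("Found:", found)
    
    // Find all matches (n = -1 means no limit)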
    all := re.FindAllString("abc123def456", -1)
    fmt.Println("All:", all)
}

Output:

Matched: true
Found: 123
All: [123 456]

Method naming pattern

The methods of the regexp package follow a uniform naming pattern:

Find(All)?(String)?(Submatch)?(Index)?

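What each part means:

Part	Meaning
(no All)	Only the leftmost match
All	All successive non-overlapping matches; takes an extra argument n (n < 0 means no limit)
String	Takes a string and returns strings
(no String)	Takes a []byte and returns []byte
Submatch	Also returns the capture groups (submatches)
Index	Returns byte-offset pairs instead of the matched text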
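To see how the parts combine, here is a small sketch that runs several members of the family against one input. The pattern, the input, and the variable name kvRe are illustrative choices, not anything the package defines.

```go
package main

import (
    "fmt"
    "regexp"
)

// kvRe has two capture groups: a key and a numeric value.
var kvRe = regexp.MustCompile(`(\w+)=(\d+)`)

func main() {
    s := "a=1 b=22"

    fmt.Println(kvRe.FindString(s))                     // a=1 (leftmost match, string)
    fmt.Println(kvRe.FindStringIndex(s))                // [0 3] (Index: byte offsets)
    fmt.Println(kvRe.FindStringSubmatch(s))             // [a=1 a 1] (Submatch: match plus groups)
    fmt.Println(kvRe.FindAllString(s, -1))              // [a=1 b=22] (All: every match)
    fmt.Println(kvRe.FindAllStringSubmatchIndex(s, -1)) // [[0 3 0 1 2 3] [4 8 4 5 6 8]]
    fmt.Printf("%q\n", kvRe.Find([]byte(s)))            // "a=1" (no String: []byte in, []byte out)
}
```

Every method name in the reference below can be read off this pattern the same way.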

Functions

Match

func Match(pattern string, b []byte) (matched bool, err error)

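Description: Reports whether the byte slice b contains any match of the regular expression pattern. Each call compiles the pattern; for repeated or more complex queries, use Compile and the Regexp methods.

Example: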

package main

import (
    "fmt"
    "regexp"
)

func main() {
    matched, err := regexp.Match(`\d+`, []byte("abc123"))
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    fmt.Println("Matched:", matched)
}

Output:

Matched: true

MatchReader

func MatchReader(pattern string, r io.RuneReader) (matched bool, err error)

Description: Reports whether the text read from the RuneReader contains any match of pattern.

Example:

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    reader := strings.NewReader("hello 123")
    matched, err := regexp.MatchReader(`\d+`, reader)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    fmt.Println("Matched:", matched)
}

Output:

Matched: true

MatchString

func MatchString(pattern string, s string) (matched bool, err error)

Description: Reports whether the string s contains any match of pattern.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    patterns := []string{
        `^\d+$`,      // digits only
        `^[a-z]+$`,   // lowercase letters only
        `^test`,      // starts with "test"
        `end$`,       // ends with "end"
    }
    
    tests := []string{
        "12345",
        "abc",
        "test123",
        "hello end",
    }
    
    for _, pattern := range patterns {
        fmt.Printf("\nPattern: %s\n", pattern)
        for _, test := range tests {
            matched, _ := regexp.MatchString(pattern, test)
            fmt.Printf("  %q -> %v\n", test, matched)
        }
    }
}

Output:

Pattern: ^\d+$
  "12345" -> true
  "abc" -> false
  "test123" -> false
  "hello end" -> false

Pattern: ^[a-z]+$
  "12345" -> false
  "abc" -> true
  "test123" -> false
  "hello end" -> false

Pattern: ^test
  "12345" -> false
  "abc" -> false
  "test123" -> true
  "hello end" -> false

Pattern: end$
  "12345" -> false
  "abc" -> false
  "test123" -> false
  "hello end" -> true

QuoteMeta

func QuoteMeta(s string) string

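Description: Returns a copy of s with all regular expression metacharacters escaped. The result is a pattern that matches the literal text s.

Example: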

package main

import (
    "fmt"
    "regexp"
)

func main() {
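    // Text that contains characters that are special in regular expressions
    text := "Cost: $100.00 (50% off)"
    
    // Using the text directly fails: $ anchors the end of the text and . matches any character
    pattern1 := `$100.00`
    re1 := regexp.MustCompile(pattern1)
    fmt.Println("Direct match:", re1.MatchString(text))
    
    // QuoteMeta escapes the metacharacters so they match literally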
    literal := `$100.00`
    escaped := regexp.QuoteMeta(literal)
    re2 := regexp.MustCompile(escaped)
    fmt.Println("Quoted match:", re2.MatchString(text))
    
    fmt.Println("Original:", literal)
    fmt.Println("Escaped:", escaped)
}

Output:

Direct match: false
Quoted match: true
Original: $100.00
Escaped: \$100\.00

Types

Regexp

Regexp is the representation of a compiled regular expression.

type Regexp struct {
    // contains unexported fields
}

Important: A Regexp is safe for concurrent use by multiple goroutines, except for configuration methods such as Longest.
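Here is a minimal sketch of sharing one compiled Regexp across goroutines. It assumes a simple fan-out with sync.WaitGroup, and countMatches is a hypothetical helper. Only the shared counter needs a mutex. The Regexp itself needs no locking and no per-goroutine Copy.

```go
package main

import (
    "fmt"
    "regexp"
    "sync"
)

// One shared, precompiled Regexp used by every goroutine.
var digits = regexp.MustCompile(`\d+`)

// countMatches checks each input on its own goroutine against the shared
// Regexp and returns how many inputs contain at least one digit.
func countMatches(inputs []string) int {
    var (
        wg    sync.WaitGroup
        mu    sync.Mutex
        count int
    )
    for _, in := range inputs {
        wg.Add(1)
        go func(s string) {
            defer wg.Done()
            if digits.MatchString(s) { // concurrent use of the same Regexp is safe
                mu.Lock()
                count++
                mu.Unlock()
            }
        }(in)
    }
    wg.Wait()
    return count
}

func main() {
    fmt.Println(countMatches([]string{"a1", "b", "c22", "d"})) // 2
}
```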

Constructors

Compile

func Compile(expr string) (*Regexp, error)

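Description: Parses a regular expression and, if successful, returns a Regexp that can be used to match against text. An invalid pattern is reported as an error.

Example: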

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Compile returns an error instead of panicking (preferred for patterns built at run time)
    re, err := regexp.Compile(`\d+`)
    if err != nil {
        fmt.Println("Compile error:", err)
        return
    }
    
    fmt.Println("Match:", re.MatchString("abc123"))
}

Output:

Match: true

CompilePOSIX

func CompilePOSIX(expr string) (*Regexp, error)

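Description: Like Compile, but restricts the syntax to POSIX ERE (egrep) and changes the match semantics to leftmost-longest.

Example: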

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // POSIX syntax is stricter: Perl extensions such as \d and (?:...) are rejected
    re, err := regexp.CompilePOSIX(`^[0-9]+$`)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    
    fmt.Println("Match:", re.MatchString("12345"))
}

Output:

Match: true
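An alternation makes the difference in match semantics easy to see. In this sketch, firstMatches is a hypothetical helper. The default engine picks the first alternative that matches. The POSIX engine picks the longest match at the leftmost position.

```go
package main

import (
    "fmt"
    "regexp"
)

// firstMatches returns the leftmost match of expr in s under the default
// (Perl-like, leftmost-first) semantics and under POSIX (leftmost-longest).
func firstMatches(expr, s string) (perl, posix string) {
    perl = regexp.MustCompile(expr).FindString(s)
    posix = regexp.MustCompilePOSIX(expr).FindString(s)
    return perl, posix
}

func main() {
    perl, posix := firstMatches(`a|ab`, "abc")
    fmt.Printf("leftmost-first:   %q\n", perl)  // "a"
    fmt.Printf("leftmost-longest: %q\n", posix) // "ab"
}
```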

MustCompile

func MustCompile(str string) *Regexp

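Description: Like Compile, but panics if the expression cannot be parsed. It simplifies the safe initialization of global variables that hold compiled regular expressions.

Example: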

package main

import (
    "fmt"
    "regexp"
)

// Package-level variable, compiled once during package initialization
var digitRegex = regexp.MustCompile(`\d+`)

func main() {
    fmt.Println("Match:", digitRegex.MatchString("abc123"))
}

Output:

Match: true

MustCompilePOSIX

func MustCompilePOSIX(str string) *Regexp

Description: Like CompilePOSIX, but panics if the expression cannot be parsed.

Example:

package main

import (
    "fmt"
    "regexp"
)

// POSIX ERE syntax only (no \d, \w or (?:...)); panics if the pattern is invalid
var emailRegex = regexp.MustCompilePOSIX(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

func isValidEmail(email string) bool {
    return emailRegex.MatchString(email)
}

func main() {
    emails := []string{
        "test@example.com",
        "invalid.email",
        "user@domain.co.uk",
    }
    
    for _, email := range emails {
        fmt.Printf("%s -> %v\n", email, isValidEmail(email))
    }
}

Output:

test@example.com -> true
invalid.email -> false
user@domain.co.uk -> true

Regexp methods (alphabetical)

AppendText

func (re *Regexp) AppendText(b []byte) ([]byte, error)

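Description: Appends the textual form of the regular expression (the same text String returns) to b and returns the extended buffer. Added in Go 1.24.

Example: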

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    
    text, err := re.AppendText(nil)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    
    fmt.Printf("Pattern: %s\n", text)
}

Output:

Pattern: \d+

Copy

func (re *Regexp) Copy() *Regexp

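Description: Returns a new Regexp copied from re; calling Longest on one copy does not affect the other. Copy has been deprecated since Go 1.12, because copies are no longer needed to avoid lock contention between goroutines. It remains useful mainly when two copies need different Longest settings.

Example: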

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re1 := regexp.MustCompile(`\d+`)
    re2 := re1.Copy()
    
    fmt.Println("re1 Match:", re1.MatchString("123"))
    fmt.Println("re2 Match:", re2.MatchString("456"))
    fmt.Println("Same pattern:", re1.String() == re2.String())
}

Output:

re1 Match: true
re2 Match: true
Same pattern: true
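Longest changes the Regexp it is called on. Copy therefore gives you a second variant with different settings and leaves the original alone. This sketch uses the pattern a(|b), and compareCopies is a hypothetical helper.

```go
package main

import (
    "fmt"
    "regexp"
)

// compareCopies returns the leftmost match of a(|b) in "ab" from the original
// Regexp and from a copy switched to leftmost-longest matching.
func compareCopies() (original, longest string) {
    re := regexp.MustCompile(`a(|b)`)
    cp := re.Copy()
    cp.Longest() // changes only cp; re keeps leftmost-first semantics
    return re.FindString("ab"), cp.FindString("ab")
}

func main() {
    original, longest := compareCopies()
    fmt.Println(original) // a
    fmt.Println(longest)  // ab
}
```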

Expand

func (re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byte

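Description: Appends template to dst, with each variable in the template ($1, ${name}, and so on) replaced by the corresponding submatch of src. match must be an index slice such as the one FindSubmatchIndex returns.

Example: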

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+),(\w+)`)
    src := []byte("hello,world")
    match := re.FindSubmatchIndex(src)
    
    // $1 is the first capture group, $2 the second
    template := []byte("$2 $1")
    result := re.Expand(nil, template, src, match)
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: world hello
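One template rule is easy to trip over. In the $name form, the name is taken to be as long as possible, so $1x means ${1x}, not ${1}x, and a name that does not exist expands to nothing. In this small sketch, expand is a hypothetical wrapper around ExpandString. It shows the three cases, including $$ for a literal dollar sign.

```go
package main

import (
    "fmt"
    "regexp"
)

var pairRe = regexp.MustCompile(`(\w+),(\w+)`)

// expand applies template to the leftmost match of pairRe in src.
func expand(template, src string) string {
    match := pairRe.FindStringSubmatchIndex(src)
    return string(pairRe.ExpandString(nil, template, src, match))
}

func main() {
    src := "hello,world"
    fmt.Printf("%q\n", expand("$1x", src))   // "": parsed as ${1x}, which does not exist
    fmt.Printf("%q\n", expand("${1}x", src)) // "hellox": braces end the name
    fmt.Printf("%q\n", expand("$$1", src))   // "$1": $$ is a literal dollar sign
}
```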

ExpandString

func (re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byte

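Description: Like Expand, but the template and the source are strings. The result is still appended to and returned as a []byte.

Example: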

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+) (\w+)`)
    src := "John Doe"
    match := re.FindStringSubmatchIndex(src)
    
    // $1 is the first name, $2 the last name
    template := "$2, $1"
    result := re.ExpandString(nil, template, src, match)
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: Doe, John

Find

func (re *Regexp) Find(b []byte) []byte

Description: Returns the leftmost match as a []byte, or nil if there is no match.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    
    result := re.Find([]byte("abc123def456"))
    fmt.Printf("Found: %s\n", result)
}

Output:

Found: 123

FindAll

func (re *Regexp) FindAll(b []byte, n int) [][]byte

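Description: Returns at most n successive non-overlapping matches; any n < 0 (for example -1) returns all of them. Returns nil if there is no match.

Example: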

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    
    all := re.FindAll([]byte("a1b2c3d4e5"), -1)
    fmt.Printf("All: %q\n", all)
    
    two := re.FindAll([]byte("a1b2c3d4e5"), 2)
    fmt.Printf("First 2: %q\n", two)
}

Output:

All: ["1" "2" "3" "4" "5"]
First 2: ["1" "2"]
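The n argument behaves the same way across the whole All family. Here is a short sketch of the cases; findN is a hypothetical helper.

```go
package main

import (
    "fmt"
    "regexp"
)

var digitRe = regexp.MustCompile(`\d`)

// findN calls FindAllString on a fixed input so the effect of n is easy to compare.
func findN(n int) []string {
    return digitRe.FindAllString("a1b2c3", n)
}

func main() {
    fmt.Println(findN(-1))                               // [1 2 3]: n < 0 means no limit
    fmt.Println(findN(2))                                // [1 2]: at most n matches
    fmt.Println(findN(0) == nil)                         // true: n == 0 returns nil
    fmt.Println(digitRe.FindAllString("abc", -1) == nil) // true: no match also returns nil
}
```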

FindAllIndex

func (re *Regexp) FindAllIndex(b []byte, n int) [][]int

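Description: Like FindAll, but returns the [start, end) byte offsets of each match instead of the matched text.

Example: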

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    text := []byte("abc123def456789")
    
    indices := re.FindAllIndex(text, -1)
    
    for i, idx := range indices {
        fmt.Printf("Match %d: [%d:%d] = %s\n", 
            i, idx[0], idx[1], text[idx[0]:idx[1]])
    }
}

Output:

Match 0: [3:6] = 123
Match 1: [9:15] = 456789

FindAllString

func (re *Regexp) FindAllString(s string, n int) []string

Description: String version of FindAll: returns at most n matches in s (all of them if n < 0).

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    
    all := re.FindAllString("a1b2c3d4e5", -1)
    fmt.Printf("All: %v\n", all)
}

Output:

All: [1 2 3 4 5]

FindAllStringIndex

func (re *Regexp) FindAllStringIndex(s string, n int) [][]int

Description: Returns the byte offsets of at most n matches in s (all of them if n < 0).

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`[aeiou]+`)
    text := "beautiful"
    
    indices := re.FindAllStringIndex(text, -1)
    
    for _, idx := range indices {
        fmt.Printf("[%d:%d] = %s\n", 
            idx[0], idx[1], text[idx[0]:idx[1]])
    }
}

Output:

[1:4] = eau
[5:6] = i
[7:8] = u

FindAllStringSubmatch

func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string

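Description: Returns at most n matches together with their submatches. In each inner slice, element 0 is the full match and element i is capture group i.

Example: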

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)=(\d+)`)
    text := "a=1 b=2 c=3"
    
    matches := re.FindAllStringSubmatch(text, -1)
    
    for i, match := range matches {
        fmt.Printf("Match %d:\n", i)
        for j, sub := range match {
            fmt.Printf("  [%d] = %s\n", j, sub)
        }
    }
}

Output:

Match 0:
  [0] = a=1
  [1] = a
  [2] = 1
Match 1:
  [0] = b=2
  [1] = b
  [2] = 2
Match 2:
  [0] = c=3
  [1] = c
  [2] = 3

FindAllStringSubmatchIndex

func (re *Regexp) FindAllStringSubmatchIndex(s string, n int) [][]int

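Description: Index version of FindAllStringSubmatch. Each inner slice holds start/end byte-offset pairs, first for the full match and then for each group. A group that did not take part in the match gets -1, -1.

Example: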

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)=(\d+)`)
    text := "a=1 b=2"
    
    indices := re.FindAllStringSubmatchIndex(text, -1)
    
    for i, idx := range indices {
        fmt.Printf("Match %d: %v\n", i, idx)
    }
}

Output:

Match 0: [0 3 0 1 2 3]
Match 1: [4 7 4 5 6 7]
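When an optional group does not take part in a match, its index pair is -1, -1, and slicing with it would panic. This sketch uses a hypothetical describe helper and an illustrative pattern with an optional value, and it checks for -1 before slicing.

```go
package main

import (
    "fmt"
    "regexp"
)

// optRe: a word optionally followed by "=digits"; group 2 may not participate.
var optRe = regexp.MustCompile(`(\w+)(?:=(\d+))?`)

// describe converts one submatch-index slice into group texts, reporting
// non-participating groups (index -1) instead of slicing with them.
func describe(s string, loc []int) []string {
    var out []string
    for i := 0; i < len(loc); i += 2 {
        if loc[i] < 0 {
            out = append(out, "<unmatched>")
            continue
        }
        out = append(out, s[loc[i]:loc[i+1]])
    }
    return out
}

func main() {
    s := "a=1 b"
    for _, loc := range optRe.FindAllStringSubmatchIndex(s, -1) {
        fmt.Println(loc, describe(s, loc))
    }
    // [0 3 0 1 2 3] [a=1 a 1]
    // [4 5 4 5 -1 -1] [b b <unmatched>]
}
```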

FindAllSubmatch

func (re *Regexp) FindAllSubmatch(b []byte, n int) [][][]byte

Description: []byte version of FindAllStringSubmatch.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)=(\d+)`)
    text := []byte("a=1 b=2")
    
    matches := re.FindAllSubmatch(text, -1)
    
    for i, match := range matches {
        fmt.Printf("Match %d:\n", i)
        for j, sub := range match {
            fmt.Printf("  [%d] = %s\n", j, sub)
        }
    }
}

Output:

Match 0:
  [0] = a=1
  [1] = a
  [2] = 1
Match 1:
  [0] = b=2
  [1] = b
  [2] = 2

FindAllSubmatchIndex

func (re *Regexp) FindAllSubmatchIndex(b []byte, n int) [][]int

Description: []byte version of FindAllStringSubmatchIndex.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)=(\d+)`)
    text := []byte("a=1 b=2")
    
    indices := re.FindAllSubmatchIndex(text, -1)
    
    for i, idx := range indices {
        fmt.Printf("Match %d: %v\n", i, idx)
    }
}

Output:

Match 0: [0 3 0 1 2 3]
Match 1: [4 7 4 5 6 7]

FindIndex

func (re *Regexp) FindIndex(b []byte) (loc []int)

Description: Returns the [start, end) byte offsets of the leftmost match, or nil if there is no match.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    text := []byte("abc123def")
    
    loc := re.FindIndex(text)
    fmt.Printf("Found at [%d:%d] = %s\n", 
        loc[0], loc[1], text[loc[0]:loc[1]])
}

Output:

Found at [3:6] = 123

FindReaderIndex

func (re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int)

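Description: Returns the byte offsets of the leftmost match in the text read from the RuneReader, or nil if there is no match. The offsets are counted from the first byte read.

Example: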

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    reader := strings.NewReader("abc123def456")
    
    loc := re.FindReaderIndex(reader)
    fmt.Printf("Found at: %v\n", loc)
}

Output:

Found at: [3 6]

FindReaderSubmatchIndex

func (re *Regexp) FindReaderSubmatchIndex(r io.RuneReader) []int

Description: Like FindReaderIndex, but also returns the offsets of each submatch.

Example:

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`(\w+)=(\d+)`)
    reader := strings.NewReader("a=1 b=2")
    
    indices := re.FindReaderSubmatchIndex(reader)
    fmt.Printf("Indices: %v\n", indices)
}

Output:

Indices: [0 3 0 1 2 3]

FindString

func (re *Regexp) FindString(s string) string

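Description: Returns the leftmost match as a string. If there is no match, it returns an empty string, which cannot be told apart from a successful empty match; use FindStringIndex or FindStringSubmatch when the difference matters.

Example: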

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    
    result := re.FindString("abc123def456")
    fmt.Printf("Found: %s\n", result)
}

Output:

Found: 123

FindStringIndex

func (re *Regexp) FindStringIndex(s string) (loc []int)

Description: Returns the byte offsets of the leftmost match in s, or nil if there is no match.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    text := "abc123def"
    
    loc := re.FindStringIndex(text)
    fmt.Printf("Found at [%d:%d] = %s\n", 
        loc[0], loc[1], text[loc[0]:loc[1]])
}

Output:

Found at [3:6] = 123

FindStringSubmatch

func (re *Regexp) FindStringSubmatch(s string) []string

Description: Returns the leftmost match and its submatches, or nil if there is no match.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)@(\w+\.\w+)`)
    email := "user@example.com"
    
    match := re.FindStringSubmatch(email)
    
    fmt.Printf("Full match: %s\n", match[0])
    fmt.Printf("Username: %s\n", match[1])
    fmt.Printf("Domain: %s\n", match[2])
}

Output:

Full match: user@example.com
Username: user
Domain: example.com
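FindStringSubmatch returns nil when there is no match. Indexing the result without a check, as the example above does with match[1], therefore panics on bad input. Here is a sketch of a guarded version. splitEmail is a hypothetical helper, and the pattern is the simplified one used above.

```go
package main

import (
    "fmt"
    "regexp"
)

var emailParts = regexp.MustCompile(`(\w+)@(\w+\.\w+)`)

// splitEmail returns the user and domain parts, with ok == false when the
// input does not match (FindStringSubmatch returns nil in that case).
func splitEmail(s string) (user, domain string, ok bool) {
    m := emailParts.FindStringSubmatch(s)
    if m == nil {
        return "", "", false
    }
    return m[1], m[2], true
}

func main() {
    for _, s := range []string{"user@example.com", "not-an-email"} {
        user, domain, ok := splitEmail(s)
        fmt.Printf("%-18s ok=%v user=%q domain=%q\n", s, ok, user, domain)
    }
}
```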

FindStringSubmatchIndex

func (re *Regexp) FindStringSubmatchIndex(s string) []int

Description: Returns the byte-offset pairs of the leftmost match and its submatches, or nil if there is no match.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)@(\w+\.\w+)`)
    email := "user@example.com"
    
    indices := re.FindStringSubmatchIndex(email)
    
    for i := 0; i < len(indices); i += 2 {
        fmt.Printf("Group %d: [%d:%d]\n", 
            i/2, indices[i], indices[i+1])
    }
}

Output:

Group 0: [0:16]
Group 1: [0:4]
Group 2: [5:16]

FindSubmatch

func (re *Regexp) FindSubmatch(b []byte) [][]byte

Description: []byte version of FindStringSubmatch.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)=(\d+)`)
    text := []byte("key=123")
    
    match := re.FindSubmatch(text)
    
    fmt.Printf("Full: %s\n", match[0])
    fmt.Printf("Key: %s\n", match[1])
    fmt.Printf("Value: %s\n", match[2])
}

Output:

Full: key=123
Key: key
Value: 123

FindSubmatchIndex

func (re *Regexp) FindSubmatchIndex(b []byte) []int

Description: []byte version of FindStringSubmatchIndex.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)=(\d+)`)
    text := []byte("key=123")
    
    indices := re.FindSubmatchIndex(text)
    fmt.Printf("Indices: %v\n", indices)
}

Output:

Indices: [0 7 0 3 4 7]

LiteralPrefix

func (re *Regexp) LiteralPrefix() (prefix string, complete bool)

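Description: Returns the literal string that every match must begin with. complete is true if that literal makes up the entire regular expression.

Example: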

package main

import (
    "fmt"
    "regexp"
)

func main() {
    patterns := []string{
        `hello\d+`,     // literal prefix "hello"
        `world`,        // the whole pattern is a literal
        `\d+test`,      // no literal prefix
    }
    
    for _, pattern := range patterns {
        re := regexp.MustCompile(pattern)
        prefix, complete := re.LiteralPrefix()
        fmt.Printf("%s -> prefix=%q, complete=%v\n", 
            pattern, prefix, complete)
    }
}

Output:

hello\d+ -> prefix="hello", complete=false
world -> prefix="world", complete=true
\d+test -> prefix="", complete=false

Match

func (re *Regexp) Match(b []byte) bool

Description: Reports whether the byte slice b contains any match of re.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`^\d+$`)
    
    fmt.Println(re.Match([]byte("123")))   // true
    fmt.Println(re.Match([]byte("abc")))   // false
}

Output:

true
false

MatchReader

func (re *Regexp) MatchReader(r io.RuneReader) bool

Description: Reports whether the text read from the RuneReader contains any match of re.

Example:

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    reader := strings.NewReader("abc123")
    
    fmt.Println(re.MatchReader(reader))
}

Output:

true

MatchString

func (re *Regexp) MatchString(s string) bool

Description: Reports whether the string s contains any match of re.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`^\d+$`)
    
    tests := []string{"123", "abc", "123abc"}
    
    for _, test := range tests {
        fmt.Printf("%q -> %v\n", test, re.MatchString(test))
    }
}

Output:

"123" -> true
"abc" -> false
"123abc" -> false

NumSubexp

func (re *Regexp) NumSubexp() int

Description: Returns the number of parenthesized subexpressions (capture groups).

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    patterns := []string{
        `(\w+)`,                    // 1 capture group
        `(\w+)@(\w+\.\w+)`,        // 2 capture groups
        `(?:\w+)`,                 // 0 capture groups (non-capturing)
    }
    
    for _, pattern := range patterns {
        re := regexp.MustCompile(pattern)
        fmt.Printf("%s -> %d subexpressions\n", 
            pattern, re.NumSubexp())
    }
}

Output:

(\w+) -> 1 subexpressions
(\w+)@(\w+\.\w+) -> 2 subexpressions
(?:\w+) -> 0 subexpressions

ReplaceAll

func (re *Regexp) ReplaceAll(src, repl []byte) []byte

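Description: Returns a copy of src with every match replaced by repl. $ signs in repl are expanded as in Expand ($1, ${name}, and so on).

Example: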

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    
    result := re.ReplaceAll(
        []byte("a1b2c3"), 
        []byte("X"),
    )
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: aXbXcX

ReplaceAllFunc

func (re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte

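Description: Returns a copy of src with every match replaced by the return value of repl applied to the matched bytes. The returned value is inserted directly, without $ expansion.

Example: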

package main

import (
    "fmt"
    "regexp"
    "strconv"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    
    // Double every number
    result := re.ReplaceAllFunc(
        []byte("1 2 3 4 5"),
        func(match []byte) []byte {
            num, _ := strconv.Atoi(string(match))
            return []byte(strconv.Itoa(num * 2))
        },
    )
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: 2 4 6 8 10

ReplaceAllLiteral

func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte

Description: Replaces every match with repl literally. $ is not expanded.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\w+`)
    
    // Literal replacement: $1 is not expanded
    result := re.ReplaceAllLiteral(
        []byte("hello world"),
        []byte("$1"),
    )
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: $1 $1

ReplaceAllLiteralString

func (re *Regexp) ReplaceAllLiteralString(src, repl string) string

Description: String version of ReplaceAllLiteral.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\w+`)
    
    result := re.ReplaceAllLiteralString(
        "hello world",
        "[$1]",
    )
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: [$1] [$1]

ReplaceAllString

func (re *Regexp) ReplaceAllString(src, repl string) string

Description: String version of ReplaceAll: replaces every match with repl, expanding $1, ${name}, and so on.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)\s+(\w+)`)
    
    // $1 is the first capture group, $2 the second
    result := re.ReplaceAllString(
        "Hello World",
        "$2 $1",
    )
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: World Hello

ReplaceAllStringFunc

func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string

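Description: Replaces every match with the result of calling repl on the matched text. The result is inserted literally, without $ expansion.

Example: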

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`\w+`)
    
    // Convert every word to uppercase
    result := re.ReplaceAllStringFunc(
        "hello world",
        func(s string) string {
            return strings.ToUpper(s)
        },
    )
    
    fmt.Printf("Result: %s\n", result)
}

Output:

Result: HELLO WORLD

Split

func (re *Regexp) Split(s string, n int) []string

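Description: Splits s into the substrings between matches of the expression. n > 0: at most n substrings, the last one being the unsplit remainder; n == 0: nil; n < 0: all substrings.

Example: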

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Split on runs of non-letter characters
    re := regexp.MustCompile(`[^a-zA-Z]+`)
    
    text := "Hello, World! 123 Test"
    parts := re.Split(text, -1)
    
    fmt.Printf("Parts: %v\n", parts)
}

Output:

Parts: [Hello World Test]
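The input above has no separator at either end. When it does, Split returns empty strings at those ends. The n argument also limits how many pieces come back. Here is a sketch with an illustrative comma-separator pattern.

```go
package main

import (
    "fmt"
    "regexp"
)

// A comma with optional surrounding whitespace
var commaRe = regexp.MustCompile(`\s*,\s*`)

func main() {
    fmt.Printf("%q\n", commaRe.Split("a , b,c", -1)) // ["a" "b" "c"]
    fmt.Printf("%q\n", commaRe.Split("a , b,c", 2))  // ["a" "b,c"]: the last piece is the unsplit rest
    fmt.Printf("%q\n", commaRe.Split(",a,", -1))     // ["" "a" ""]: separators at both ends
    fmt.Println(commaRe.Split("a,b", 0) == nil)      // true: n == 0 returns nil
}
```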

String

func (re *Regexp) String() string

Description: Returns the source text used to compile the regular expression.

Example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    fmt.Printf("Pattern: %s\n", re.String())
}

Output:

Pattern: \d+

SubexpNames

func (re *Regexp) SubexpNames() []string

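Description: Returns the names of the capture groups. names[0] stands for the whole match and is always empty, and unnamed groups also get an empty string.

Example: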

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Named capture groups
    re := regexp.MustCompile(`(?P<user>\w+)@(?P<domain>\w+\.\w+)`)
    
    names := re.SubexpNames()
    fmt.Printf("Names: %v\n", names)
    
    match := re.FindStringSubmatch("user@example.com")
    
    for i, name := range names {
        if i != 0 && name != "" {
            fmt.Printf("%s: %s\n", name, match[i])
        }
    }
}

Output:

Names: [ user domain]
user: user
domain: example.com
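Looping over SubexpNames works, but it is often more convenient to build a name-to-value map once. Since Go 1.15, SubexpIndex also looks up a single group by name. Here is a sketch; namedGroups is a hypothetical helper.

```go
package main

import (
    "fmt"
    "regexp"
)

var emailNamed = regexp.MustCompile(`(?P<user>\w+)@(?P<domain>\w+\.\w+)`)

// namedGroups maps each named group to its text in the leftmost match of s,
// or returns nil when s does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
    match := re.FindStringSubmatch(s)
    if match == nil {
        return nil
    }
    out := make(map[string]string)
    for i, name := range re.SubexpNames() {
        if i > 0 && name != "" { // index 0 is the whole match; skip unnamed groups
            out[name] = match[i]
        }
    }
    return out
}

func main() {
    fmt.Println(namedGroups(emailNamed, "user@example.com")) // map[domain:example.com user:user]
    // SubexpIndex returns -1 for a name that is not in the pattern
    fmt.Println(emailNamed.SubexpIndex("domain"), emailNamed.SubexpIndex("nope")) // 2 -1
}
```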

Practical examples

Example 1: Email validation

package main

import (
    "fmt"
    "regexp"
)

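// Compiled once at package initialization instead of on every call.
// A practical approximation, not a full RFC 5322 validator.
var emailPattern = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

func isValidEmail(email string) bool {
    return emailPattern.MatchString(email)
}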

func main() {
    emails := []string{
        "test@example.com",
        "user.name@domain.co.uk",
        "invalid.email",
        "@missing.com",
        "missing@.com",
    }
    
    for _, email := range emails {
        valid := isValidEmail(email)
        fmt.Printf("%-30s -> %v\n", email, valid)
    }
}

Output:

test@example.com               -> true
user.name@domain.co.uk         -> true
invalid.email                  -> false
@missing.com                   -> false
missing@.com                   -> false

Example 2: Extracting URLs

package main

import (
    "fmt"
    "regexp"
)

func extractURLs(text string) []string {
    pattern := `https?://[^\s]+`
    re := regexp.MustCompile(pattern)
    return re.FindAllString(text, -1)
}

func main() {
    text := `
        Visit https://www.google.com for search
        Check https://github.com/golang/go for Go source
        Or http://example.com for examples
    `
    
    urls := extractURLs(text)
    
    for _, url := range urls {
        fmt.Println(url)
    }
}

Output:

https://www.google.com
https://github.com/golang/go
http://example.com

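Example 3: Extracting HTML tag contents

This approach suits only small, predictable snippets. Regular expressions cannot parse arbitrary HTML; for that, a real parser such as golang.org/x/net/html is the better tool.
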
package main

import (
    "fmt"
    "regexp"
)

func extractTagContent(html, tagName string) []string {
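    // QuoteMeta guards against metacharacters in tagName; \b keeps "p" from matching "<pre>"
    tag := regexp.QuoteMeta(tagName)
    pattern := fmt.Sprintf(`<%s\b[^>]*>(.*?)</%s>`, tag, tag)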
    re := regexp.MustCompile(pattern)
    
    matches := re.FindAllStringSubmatch(html, -1)
    
    results := make([]string, len(matches))
    for i, match := range matches {
        results[i] = match[1]
    }
    
    return results
}

func main() {
    html := `
        <div class="content">Hello</div>
        <div id="main">World</div>
        <p>Paragraph 1</p>
        <p>Paragraph 2</p>
    `
    
    divs := extractTagContent(html, "div")
    fmt.Println("Div contents:", divs)
    
    ps := extractTagContent(html, "p")
    fmt.Println("Paragraph contents:", ps)
}

Output:

Div contents: [Hello World]
Paragraph contents: [Paragraph 1 Paragraph 2]

Example 4: Formatting mobile phone numbers

package main

import (
    "fmt"
    "regexp"
)

func formatPhoneNumber(phone string) string {
    // Strip every non-digit character
    re := regexp.MustCompile(`\D`)
    digits := re.ReplaceAllString(phone, "")
    
    // Expect an 11-digit mobile number
    if len(digits) != 11 {
        return ""
    }
    
    // Format as 3-4-4 groups: 138-1234-5678
    formatRe := regexp.MustCompile(`(\d{3})(\d{4})(\d{4})`)
    return formatRe.ReplaceAllString(digits, "$1-$2-$3")
}

func main() {
    phones := []string{
        "13812345678",
        "138-1234-5678",
        "138 1234 5678",
        "138.1234.5678",
        "12345", // invalid
    }
    
    for _, phone := range phones {
        formatted := formatPhoneNumber(phone)
        if formatted == "" {
            fmt.Printf("%-20s -> invalid\n", phone)
        } else {
            fmt.Printf("%-20s -> %s\n", phone, formatted)
        }
    }
}

Output:

13812345678          -> 138-1234-5678
138-1234-5678        -> 138-1234-5678
138 1234 5678        -> 138-1234-5678
138.1234.5678        -> 138-1234-5678
12345                -> invalid

Example 5: Parsing log lines

package main

import (
    "fmt"
    "regexp"
)

type LogEntry struct {
    Level     string
    Timestamp string
    Message   string
}

func parseLogLine(line string) *LogEntry {
    pattern := `^\[(\w+)\]\s+(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+(.*)$`
    re := regexp.MustCompile(pattern)
    
    match := re.FindStringSubmatch(line)
    if match == nil {
        return nil
    }
    
    return &LogEntry{
        Level:     match[1],
        Timestamp: match[2],
        Message:   match[3],
    }
}

func main() {
    logLines := []string{
        "[INFO] 2024-01-15 10:30:45 Server started",
        "[ERROR] 2024-01-15 10:31:00 Connection failed",
        "[WARN] 2024-01-15 10:31:05 High memory usage",
    }
    
    for _, line := range logLines {
        entry := parseLogLine(line)
        if entry != nil {
            fmt.Printf("Level: %s, Time: %s, Message: %s\n",
                entry.Level, entry.Timestamp, entry.Message)
        }
    }
}

Output:

Level: INFO, Time: 2024-01-15 10:30:45, Message: Server started
Level: ERROR, Time: 2024-01-15 10:31:00, Message: Connection failed
Level: WARN, Time: 2024-01-15 10:31:05, Message: High memory usage

Example 6: Masking sensitive data

package main

import (
    "fmt"
    "regexp"
)

func maskSensitiveInfo(text string) string {
    // Mask 18-digit ID card numbers first: 110101********1234.
    // Running the phone rule first would also hit the first 11 digits of the ID.
    idRe := regexp.MustCompile(`(\d{6})\d{8}(\d{4})`)
    text = idRe.ReplaceAllString(text, "$1********$2")
    
    // Mask 11-digit mobile numbers: 138****5678
    phoneRe := regexp.MustCompile(`(\d{3})\d{4}(\d{4})`)
    text = phoneRe.ReplaceAllString(text, "$1****$2")
    
    // Mask email addresses: z***@example.com
    emailRe := regexp.MustCompile(`(\w)[\w.]*(@[\w.]+)`)
    text = emailRe.ReplaceAllString(text, "$1***$2")
    
    return text
}

func main() {
    text := "Phone: 13812345678\n" +
        "Email: zhangsan@example.com\n" +
        "ID number: 110101199001011234"
    
    masked := maskSensitiveInfo(text)
    fmt.Println(masked)
}

Output:

Phone: 138****5678
Email: z***@example.com
ID number: 110101********1234

Example 7: Splitting a file name into name and extension

package main

import (
    "fmt"
    "regexp"
)

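func parseFilename(filename string) (name, ext string) {
    // The lazy (.+?) lets the optional group take the text after the LAST dot;
    // a leading dot (".hidden") is treated as part of the name
    pattern := `^(.+?)(?:\.([^.]+))?$`
    re := regexp.MustCompile(pattern)
    
    match := re.FindStringSubmatch(filename)
    if match == nil {
        return "", ""
    }
    
    // A group that did not participate (no extension) yields ""
    return match[1], match[2]
}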

func main() {
    files := []string{
        "document.pdf",
        "image.png",
        "archive.tar.gz",
        "noextension",
        ".hidden",
    }
    
    for _, file := range files {
        name, ext := parseFilename(file)
        fmt.Printf("%-20s -> name: %q, ext: %q\n", file, name, ext)
    }
}

Output:

document.pdf         -> name: "document", ext: "pdf"
image.png            -> name: "image", ext: "png"
archive.tar.gz       -> name: "archive.tar", ext: "gz"
noextension          -> name: "noextension", ext: ""
.hidden              -> name: ".hidden", ext: ""

Example 8: Converting between camelCase and snake_case

package main

import (
    "fmt"
    "regexp"
    "strings"
)

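// camelToSnake converts camelCase to snake_case
func camelToSnake(s string) string {
    // Insert an underscore between a lowercase letter or digit and the uppercase letter after it
    re := regexp.MustCompile(`([a-z0-9])([A-Z])`)
    result := re.ReplaceAllString(s, "${1}_${2}")
    return strings.ToLower(result)
}

// snakeToCamel converts snake_case to lowerCamelCase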
func snakeToCamel(s string) string {
    parts := strings.Split(s, "_")
    for i := 1; i < len(parts); i++ {
        if len(parts[i]) > 0 {
            parts[i] = strings.ToUpper(string(parts[i][0])) + parts[i][1:]
        }
    }
    return strings.Join(parts, "")
}

func main() {
    tests := []string{
        "camelCase",
        "PascalCase",
        "someHTTPClient",
        "userID",
    }
    
    fmt.Println("CamelCase to snake_case:")
    for _, test := range tests {
        fmt.Printf("  %s -> %s\n", test, camelToSnake(test))
    }
    
    fmt.Println("\nSnake_case to CamelCase:")
    snakeTests := []string{
        "camel_case",
        "pascal_case",
        "some_http_client",
        "user_id",
    }
    for _, test := range snakeTests {
        fmt.Printf("  %s -> %s\n", test, snakeToCamel(test))
    }
}

Output:

CamelCase to snake_case:
  camelCase -> camel_case
  PascalCase -> pascal_case
  someHTTPClient -> some_httpclient
  userID -> user_id

Snake_case to CamelCase:
  camel_case -> camelCase
  pascal_case -> pascalCase
  some_http_client -> someHttpClient
  user_id -> userId
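camelToSnake above inserts an underscore only where a lowercase letter or digit meets an uppercase letter, so acronyms stay glued to the next word (someHTTPClient becomes some_httpclient). One common refinement first separates an uppercase run from a following capitalized word. It is sketched here as a hypothetical camelToSnakeAcronyms.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

var (
    // An uppercase run followed by an uppercase+lowercase pair: "HTTPClient" -> "HTTP_Client"
    acronymRe = regexp.MustCompile(`([A-Z]+)([A-Z][a-z])`)
    // A lowercase letter or digit followed by an uppercase letter: "someH" -> "some_H"
    boundaryRe = regexp.MustCompile(`([a-z0-9])([A-Z])`)
)

// camelToSnakeAcronyms converts camelCase to snake_case, keeping acronyms together.
func camelToSnakeAcronyms(s string) string {
    s = acronymRe.ReplaceAllString(s, "${1}_${2}")
    s = boundaryRe.ReplaceAllString(s, "${1}_${2}")
    return strings.ToLower(s)
}

func main() {
    for _, s := range []string{"camelCase", "PascalCase", "someHTTPClient", "userID"} {
        fmt.Printf("%s -> %s\n", s, camelToSnakeAcronyms(s))
    }
    // camel_case, pascal_case, some_http_client, user_id
}
```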

Best practices

1. Precompile regular expressions

// ❌ Not recommended: recompiles on every call and discards the error
func isValidEmail(email string) bool {
    re, _ := regexp.Compile(`^[a-z]+$`)
    return re.MatchString(email)
}

// ✅ Recommended: compile once at package level
var emailRegex = regexp.MustCompile(`^[a-z]+$`)

func isValidEmail(email string) bool {
    return emailRegex.MatchString(email)
}

2. Use named capture groups

// ❌ Not recommended: numeric indexes
re := regexp.MustCompile(`(\w+)@(\w+\.\w+)`)
match := re.FindStringSubmatch("user@example.com")
username := match[1]  // the meaning of 1 is not obvious
domain := match[2]

// ✅ Recommended: named capture groups
re := regexp.MustCompile(`(?P<user>\w+)@(?P<domain>\w+\.\w+)`)
match := re.FindStringSubmatch("user@example.com")
if match != nil {
    // SubexpIndex (Go 1.15+) maps a group name to its index
    username := match[re.SubexpIndex("user")]
    domain := match[re.SubexpIndex("domain")]
    fmt.Println(username, domain)
}

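3. Use ReplaceAllStringFunc for computed replacements

// ❌ ReplaceAllString only expands a fixed template, so every number becomes "X"
re := regexp.MustCompile(`\d+`)
result := re.ReplaceAllString("a1b2c3", "X")

// ✅ A function can compute each replacement from the matched text
re := regexp.MustCompile(`\d+`)
result := re.ReplaceAllStringFunc("a1b2c3", func(s string) string {
    // Double each number, producing "a2b4c6"
    n, _ := strconv.Atoi(s)
    return strconv.Itoa(n * 2)
})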

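4. Check errors

// ❌ MustCompile panics on an invalid pattern; avoid it for patterns from user input or configuration
re := regexp.MustCompile(`[invalid`)  // panics: missing closing ]

// ✅ Use Compile and handle the error
re, err := regexp.Compile(`[invalid`)
if err != nil {
    log.Printf("Invalid regex: %v", err)
    return
}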

5. Use QuoteMeta to match literal text

// ❌ Using text that contains special characters directly as a pattern
pattern := `$100.00`  // $ and . are metacharacters
re := regexp.MustCompile(pattern)

// ✅ Escape it with QuoteMeta
literal := `$100.00`
pattern := regexp.QuoteMeta(literal)
re := regexp.MustCompile(pattern)

Working with other packages

The strings package

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    text := "abc123def456"
    
    // Find the matches, then process them with strings
    matches := re.FindAllString(text, -1)
    upper := strings.ToUpper(strings.Join(matches, ","))
    
    fmt.Println("Matches:", upper)
    
    // Split around the matches
    parts := re.Split(text, -1)
    fmt.Println("Parts:", parts)
}

The bufio package

package main

import (
    "bufio"
    "fmt"
    "os"
    "regexp"
)

func main() {
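    // Print every line from standard input that starts with a digit
    re := regexp.MustCompile(`^\d+`)
    
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        line := scanner.Text()
        if re.MatchString(line) {
            fmt.Println("Matched:", line)
        }
    }
    // Report any read error that stopped the scan
    if err := scanner.Err(); err != nil {
        fmt.Println("read error:", err)
    }
}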

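The encoding/json package

package main

import (
    "encoding/json"
    "fmt"
    "regexp"
)

type User struct {
    Email    string `json:"email"`
    Username string `json:"username"`
}

// Precompiled patterns, one per field
var (
    emailRe    = regexp.MustCompile(`^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$`)
    usernameRe = regexp.MustCompile(`^[a-z][a-z0-9_]{2,19}$`)
)

// validate checks the decoded fields against their patterns
func validate(u User) error {
    if !emailRe.MatchString(u.Email) {
        return fmt.Errorf("invalid email: %q", u.Email)
    }
    if !usernameRe.MatchString(u.Username) {
        return fmt.Errorf("invalid username: %q", u.Username)
    }
    return nil
}

func main() {
    data := []byte(`{"email":"test@example.com","username":"gopher_01"}`)
    var u User
    if err := json.Unmarshal(data, &u); err != nil {
        fmt.Println("decode error:", err)
        return
    }
    fmt.Println("validate:", validate(u))
}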

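Quick reference

Functions

Function	Parameters	Returns	Description
Match	pattern string, b []byte	bool, error	Reports whether a byte slice contains a match
MatchReader	pattern string, r io.RuneReader	bool, error	Reports whether RuneReader text contains a match
MatchString	pattern string, s string	bool, error	Reports whether a string contains a match
QuoteMeta	s string	string	Escapes metacharacters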

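Regexp constructors

Function	Returns	Description
Compile	*Regexp, error	Compiles a regular expression
CompilePOSIX	*Regexp, error	Compiles with POSIX ERE syntax and leftmost-longest matching
MustCompile	*Regexp	Like Compile, but panics on error
MustCompilePOSIX	*Regexp	Like CompilePOSIX, but panics on error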

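Regexp methods

Method	Returns	Description
AppendText	[]byte, error	Appends the pattern text to a buffer (Go 1.24+)
Copy	*Regexp	Returns a copy (deprecated since Go 1.12)
Expand	[]byte	Expands a template with submatches
ExpandString	[]byte	String-template version of Expand
Find	[]byte	Leftmost match
FindAll	[][]byte	All matches
FindAllIndex	[][]int	Offsets of all matches
FindAllString	[]string	All matches as strings
FindAllStringIndex	[][]int	Offsets of all matches in a string
FindAllStringSubmatch	[][]string	All matches with submatches
FindAllStringSubmatchIndex	[][]int	Offsets of all matches and submatches in a string
FindAllSubmatch	[][][]byte	All matches with submatches ([]byte)
FindAllSubmatchIndex	[][]int	Offsets of all matches and submatches
FindIndex	[]int	Offsets of the leftmost match
FindReaderIndex	[]int	Offsets of the leftmost match in a RuneReader
FindReaderSubmatchIndex	[]int	Offsets of the leftmost match and submatches in a RuneReader
FindString	string	Leftmost match as a string
FindStringIndex	[]int	Offsets of the leftmost match in a string
FindStringSubmatch	[]string	Leftmost match with submatches
FindStringSubmatchIndex	[]int	Offsets of the leftmost match and submatches in a string
FindSubmatch	[][]byte	Leftmost match with submatches ([]byte)
FindSubmatchIndex	[]int	Offsets of the leftmost match and submatches
LiteralPrefix	string, bool	Literal prefix every match must start with
Match	bool	Reports whether a byte slice contains a match
MatchReader	bool	Reports whether RuneReader text contains a match
MatchString	bool	Reports whether a string contains a match
NumSubexp	int	Number of capture groups
ReplaceAll	[]byte	Replaces all matches ($ expanded)
ReplaceAllFunc	[]byte	Replaces all matches using a function
ReplaceAllLiteral	[]byte	Replaces all matches literally
ReplaceAllLiteralString	string	String version of ReplaceAllLiteral
ReplaceAllString	string	String version of ReplaceAll
ReplaceAllStringFunc	string	String version of ReplaceAllFunc
Split	[]string	Splits a string around matches
String	string	Source text of the pattern
SubexpNames	[]string	Names of the capture groups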

Caveats

1. Performance

// ❌ Not recommended: compiling inside the loop
for _, email := range emails {
    re, _ := regexp.Compile(`^[a-z]+$`)
    re.MatchString(email)
}

// ✅ Recommended: compile once, outside the loop
var re = regexp.MustCompile(`^[a-z]+$`)
for _, email := range emails {
    re.MatchString(email)
}

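2. Greedy vs. non-greedy

// Greedy (the default): .* matches as much as possible
greedy := regexp.MustCompile(`".*"`)
text := `"hello" "world"`
fmt.Println(greedy.FindString(text))  // "hello" "world"

// Non-greedy: .*? matches as little as possible
lazy := regexp.MustCompile(`".*?"`)
fmt.Println(lazy.FindString(text))  // "hello"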

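3. Unsupported features

Go's regexp follows RE2 and guarantees linear-time matching, so it does not support:

Backreferences (\1, \2, and so on)
Lookahead and lookbehind assertions
Conditional expressions
Recursive patterns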
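Without backreferences, a check such as "the same word twice in a row" (in Perl, \b(\w+)\s+\1\b) has to be split between the regexp and ordinary Go code. Here is a sketch of one way to do it. doubledWords is a hypothetical helper, and like \1 it compares words case-sensitively.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

var wordRe = regexp.MustCompile(`\w+`)

// doubledWords returns each word that repeats the word right before it,
// with only whitespace in between (the job \b(\w+)\s+\1\b does in Perl).
func doubledWords(s string) []string {
    var out []string
    locs := wordRe.FindAllStringIndex(s, -1)
    for i := 1; i < len(locs); i++ {
        prev := s[locs[i-1][0]:locs[i-1][1]]
        cur := s[locs[i][0]:locs[i][1]]
        gap := s[locs[i-1][1]:locs[i][0]]
        // Only whitespace may separate the pair, mirroring \s+
        if strings.TrimSpace(gap) == "" && prev == cur {
            out = append(out, cur)
        }
    }
    return out
}

func main() {
    fmt.Println(doubledWords("it is is the the end. end")) // [is the]
}
```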

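4. Concurrency safety

A Regexp is safe for concurrent use by multiple goroutines, except for configuration methods such as Longest:

var re = regexp.MustCompile(`\d+`)

// Safe to call from several goroutines at once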
go func() { re.MatchString("123") }()
go func() { re.MatchString("456") }()

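5. UTF-8 support

Patterns and input are treated as UTF-8 text, so character classes such as \p{Han} and the . operator work on whole code points:

re := regexp.MustCompile(`[\p{Han}]+`)  // matches Chinese (Han) characters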
fmt.Println(re.MatchString("你好"))  // true

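Summary

The regexp package provides full-featured regular expressions based on the RE2 design, with matching time guaranteed to be linear in the size of the input.

Key points:

Precompile regular expressions for performance
Use named capture groups for readability
Use ReplaceAllStringFunc for computed replacements
Check compile errors for patterns that are not fixed in the source
Use QuoteMeta to match text that contains special characters

Common uses:

Data validation (email addresses, phone numbers, and so on)
Text extraction
String replacement
Log parsing
Route matching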

Go Standard Library Guide