unicode 包详解

概述

unicode 包提供了测试 Unicode 码点某些属性的数据和函数。

主要用途：

测试字符的 Unicode 属性
字符大小写转换
字符分类和验证
国际化文本处理

核心功能：

以 “Is” 开头的函数用于检查 rune 属于哪个范围表
注意：rune 可能属于多个范围
提供 Unicode 类别、脚本和属性表

Go 版本要求：所有 Go 版本

包导入

import "unicode"

常量详解

基本常量

const (
    MaxRune         = '\U0010FFFF'  // 最大 Unicode 码点
    ReplacementChar = '\uFFFD'      // 替换字符（用于无效 UTF-8）
    MaxASCII        = '\u007F'      // 最大 ASCII 字符
    MaxLatin1       = '\u00FF'      // 最大 Latin-1 字符
)

说明：

MaxRune：Unicode 标准定义的最大码点值
ReplacementChar：用于替换无效 UTF-8 序列的字符（）
MaxASCII：ASCII 字符集的最大值
MaxLatin1：Latin-1（ISO-8859-1）字符集的最大值

示例：

fmt.Printf("MaxRune: %U\n", unicode.MaxRune)
fmt.Printf("ReplacementChar: %c\n", unicode.ReplacementChar)
fmt.Printf("MaxASCII: %d\n", unicode.MaxASCII)

大小写映射索引

const (
    UpperCase = iota
    LowerCase
    TitleCase
    MaxCase
)

说明：

用于 CaseRanges 内部 Delta 数组的索引
在 To 函数中使用

示例：

r := 'A'
lower := unicode.To(unicode.LowerCase, r)
fmt.Printf("%c -> %c\n", r, lower)  // A -> a

特殊 Delta 值

const UpperLower = -1

说明：

如果 CaseRange 的 Delta 字段是 UpperLower，表示该 CaseRange 表示交替的大写和小写序列
例如：Upper Lower Upper Lower

变量详解

Unicode 类别表

var (
    // C 类别：其他控制字符
    Cc     = _Cc  // Control
    Cf     = _Cf  // Format
    Cn     = _Cn  // Unassigned
    Co     = _Co  // Private use
    Cs     = _Cs  // Surrogate
    
    // L 类别：字母
    LC     = _LC  // Cased letter
    L      = _L   // Letter
    Lm     = _Lm  // Modifier letter
    Lo     = _Lo  // Other letter
    Lower  = _Ll  // Lowercase letter
    Ll     = _Ll  // Lowercase letter
    Lt     = _Lt  // Titlecase letter
    Title  = _Lt  // Titlecase letter
    Upper  = _Lu  // Uppercase letter
    Lu     = _Lu  // Uppercase letter
    
    // M 类别：标记
    Mark   = _M   // Mark
    M      = _M   // Mark
    Mc     = _Mc  // Spacing mark
    Me     = _Me  // Enclosing mark
    Mn     = _Mn  // Nonspacing mark
    
    // N 类别：数字
    Digit  = _Nd  // Decimal number
    Nd     = _Nd  // Decimal number
    Nl     = _Nl  // Letter number
    No     = _No  // Other number
    Number = _N   // Number
    N      = _N   // Number
    
    // P 类别：标点符号
    Pc     = _Pc  // Connector punctuation
    Pd     = _Pd  // Dash punctuation
    Pe     = _Pe  // Close punctuation
    Pf     = _Pf  // Final punctuation
    Pi     = _Pi  // Initial punctuation
    Po     = _Po  // Other punctuation
    Punct  = _P   // Punctuation
    P      = _P   // Punctuation
    Ps     = _Ps  // Open punctuation
    
    // S 类别：符号
    Sc     = _Sc  // Currency symbol
    Sk     = _Sk  // Modifier symbol
    Sm     = _Sm  // Math symbol
    So     = _So  // Other symbol
    Symbol = _S   // Symbol
    S      = _S   // Symbol
    
    // Z 类别：分隔符
    Space  = _Z   // Separator
    Z      = _Z   // Separator
    Zl     = _Zl  // Line separator
    Zp     = _Zp  // Paragraph separator
    Zs     = _Zs  // Space separator
    
    // 其他
    Other  = _C   // Other
    C      = _C   // Other
)

说明：

这些变量的类型都是 *RangeTable
用于 Unicode 字符分类

示例：

// 测试字符是否是大写字母
if unicode.Is(unicode.Upper, 'A') {
    fmt.Println("A is uppercase")
}

// 测试字符是否是数字
if unicode.Is(unicode.Number, '5') {
    fmt.Println("5 is a number")
}

Unicode 脚本表

var (
    Arabic              = _Arabic
    Armenian            = _Armenian
    Bengali             = _Bengali
    Bopomofo            = _Bopomofo
    Braille             = _Braille
    Canadian_Aboriginal = _Canadian_Aboriginal
    Cherokee            = _Cherokee
    Cyrillic            = _Cyrillic
    Devanagari          = _Devanagari
    Georgian            = _Georgian
    Greek               = _Greek
    Gujarati            = _Gujarati
    Gurmukhi            = _Gurmukhi
    Han                 = _Han           // 汉字
    Hangul              = _Hangul        // 韩文
    Hebrew              = _Hebrew
    Hiragana            = _Hiragana      // 平假名
    Katakana            = _Katakana      // 片假名
    Kannada             = _Kannada
    Lao                 = _Lao
    Latin               = _Latin         // 拉丁字母
    Malayalam           = _Malayalam
    Oriya               = _Oriya
    Tamil               = _Tamil
    Telugu              = _Telugu
    Thai                = _Thai
    // ... 更多脚本
)

说明：

这些变量的类型都是 *RangeTable
用于识别字符所属的书写系统

示例：

// 测试字符是否是汉字
if unicode.Is(unicode.Han, '中') {
    fmt.Println("中 is a Chinese character")
}

// 测试字符是否是拉丁字母
if unicode.Is(unicode.Latin, 'A') {
    fmt.Println("A is a Latin character")
}

Unicode 属性表

var (
    ASCII_Hex_Digit                    = _ASCII_Hex_Digit
    Bidi_Control                       = _Bidi_Control
    Dash                               = _Dash
    Deprecated                         = _Deprecated
    Diacritic                          = _Diacritic
    Extender                           = _Extender
    Hex_Digit                          = _Hex_Digit
    Hyphen                             = _Hyphen
    IDS_Binary_Operator                = _IDS_Binary_Operator
    IDS_Trinary_Operator               = _IDS_Trinary_Operator
    Ideographic                        = _Ideographic
    Join_Control                       = _Join_Control
    Logical_Order_Exception            = _Logical_Order_Exception
    Noncharacter_Code_Point            = _Noncharacter_Code_Point
    Other_Alphabetic                   = _Other_Alphabetic
    Other_Default_Ignorable_Code_Point = _Other_Default_Ignorable_Code_Point
    Other_Grapheme_Extend              = _Other_Grapheme_Extend
    Other_ID_Continue                  = _Other_ID_Continue
    Other_ID_Start                     = _Other_ID_Start
    Other_Lowercase                    = _Other_Lowercase
    Other_Math                         = _Other_Math
    Other_Uppercase                    = _Other_Uppercase
    Pattern_Syntax                     = _Pattern_Syntax
    Pattern_White_Space                = _Pattern_White_Space
    Prepended_Concatenation_Mark       = _Prepended_Concatenation_Mark
    Quotation_Mark                     = _Quotation_Mark
    Radical                            = _Radical
    Regional_Indicator                 = _Regional_Indicator
    Sentence_Terminal                  = _Sentence_Terminal
    STerm                              = _Sentence_Terminal
    Soft_Dotted                        = _Soft_Dotted
    Terminal_Punctuation               = _Terminal_Punctuation
    Unified_Ideograph                  = _Unified_Ideograph
    Variation_Selector                 = _Variation_Selector
    White_Space                        = _White_Space
)

说明：

这些变量的类型都是 *RangeTable
用于测试 Unicode 的各种属性

示例：

// 测试字符是否是空白字符
if unicode.Is(unicode.White_Space, ' ') {
    fmt.Println("Space is whitespace")
}

// 测试字符是否是十六进制数字
if unicode.Is(unicode.Hex_Digit, 'A') {
    fmt.Println("A is a hex digit")
}

映射表

// Categories：Unicode 类别表映射
var Categories = map[string]*RangeTable{
    "C":  C, "Cc": Cc, "Cf": Cf, "Cn": Cn, "Co": Co, "Cs": Cs,
    "L":  L, "LC": LC, "Ll": Ll, "Lm": Lm, "Lo": Lo, "Lt": Lt, "Lu": Lu,
    "M":  M, "Mc": Mc, "Me": Me, "Mn": Mn,
    "N":  N, "Nd": Nd, "Nl": Nl, "No": No,
    "P":  P, "Pc": Pc, "Pd": Pd, "Pe": Pe, "Pf": Pf, "Pi": Pi, "Po": Po, "Ps": Ps,
    "S":  S, "Sc": Sc, "Sk": Sk, "Sm": Sm, "So": So,
    "Z":  Z, "Zl": Zl, "Zp": Zp, "Zs": Zs,
}

// CategoryAliases：类别别名映射
var CategoryAliases = map[string]string{
    "Cased_Letter":     "LC",
    "Letter":           "L",
    "Lowercase_Letter": "Ll",
    "Uppercase_Letter": "Lu",
    // ... 更多别名
}

// Properties：Unicode 属性表映射
var Properties = map[string]*RangeTable{
    "ASCII_Hex_Digit": ASCII_Hex_Digit,
    "White_Space":     White_Space,
    // ... 更多属性
}

// Scripts：Unicode 脚本文字表
var Scripts = map[string]*RangeTable{
    "Latin":  Latin,
    "Greek":  Greek,
    "Cyrillic": Cyrillic,
    "Han":    Han,
    // ... 更多脚本
}

// CaseRanges：大小写映射表
var CaseRanges = []CaseRange{
    // ... 所有字母的大小写映射
}

函数详解（按 A-Z 分层归类）

I

In

func In(r rune, ranges ...*RangeTable) bool

作用：报告 rune 是否是其中一个范围的成员

参数说明：

r：要测试的 rune
ranges：范围表切片

返回值：

如果 rune 在任何范围中返回 true，否则返回 false

示例：

// 测试是否是字母或数字
if unicode.In('A', unicode.Letter, unicode.Number) {
    fmt.Println("A is a letter or number")
}

// 测试是否是标点符号
if unicode.In(',', unicode.Punct, unicode.Symbol) {
    fmt.Println(", is punctuation or symbol")
}

Is

func Is(rangeTab *RangeTable, r rune) bool

作用：报告 rune 是否在指定的范围表中

参数说明：

rangeTab：范围表
r：要测试的 rune

返回值：

如果 rune 在范围表中返回 true，否则返回 false

示例：

// 测试是否是大写字母
if unicode.Is(unicode.Upper, 'A') {
    fmt.Println("A is uppercase")
}

// 测试是否是汉字
if unicode.Is(unicode.Han, '中') {
    fmt.Println("中 is a Han character")
}

IsControl

func IsControl(r rune) bool

作用：报告 rune 是否是控制字符

参数说明：

r：要测试的 rune

返回值：

如果是控制字符返回 true，否则返回 false

说明：

C（其他）Unicode 类别包括更多码点（如代理对）
使用 Is(C, r) 测试它们

示例：

fmt.Println(unicode.IsControl('\n'))   // true
fmt.Println(unicode.IsControl('\t'))   // true
fmt.Println(unicode.IsControl('A'))    // false
fmt.Println(unicode.IsControl('\u0000')) // true

IsDigit

func IsDigit(r rune) bool

作用：报告 rune 是否是十进制数字

参数说明：

r：要测试的 rune

返回值：

如果是十进制数字返回 true，否则返回 false

示例：

fmt.Println(unicode.IsDigit('5'))    // true
fmt.Println(unicode.IsDigit('A'))    // false
fmt.Println(unicode.IsDigit('٠'))    // true (阿拉伯 - 印度数字)
fmt.Println(unicode.IsDigit('①'))    // false (带圈数字不是十进制数字)

IsGraphic

func IsGraphic(r rune) bool

作用：报告 rune 是否被 Unicode 定义为图形字符

参数说明：

r：要测试的 rune

返回值：

如果是图形字符返回 true，否则返回 false

说明：

图形字符包括：字母、标记、数字、标点、符号和空格
来自类别 L、M、N、P、S、Zs

示例：

fmt.Println(unicode.IsGraphic('A'))     // true
fmt.Println(unicode.IsGraphic(' '))     // true
fmt.Println(unicode.IsGraphic('\n'))    // false
fmt.Println(unicode.IsGraphic('中'))     // true

IsLetter

func IsLetter(r rune) bool

作用：报告 rune 是否是字母（类别 L）

参数说明：

r：要测试的 rune

返回值：

如果是字母返回 true，否则返回 false

示例：

fmt.Println(unicode.IsLetter('A'))     // true
fmt.Println(unicode.IsLetter('中'))     // true
fmt.Println(unicode.IsLetter('5'))     // false
fmt.Println(unicode.IsLetter('@'))     // false

IsLower

func IsLower(r rune) bool

作用：报告 rune 是否是小写字母

参数说明：

r：要测试的 rune

返回值：

如果是小写字母返回 true，否则返回 false

示例：

fmt.Println(unicode.IsLower('a'))     // true
fmt.Println(unicode.IsLower('A'))     // false
fmt.Println(unicode.IsLower('中'))     // false (汉字没有大小写)

IsMark

func IsMark(r rune) bool

作用：报告 rune 是否是标记字符（类别 M）

参数说明：

r：要测试的 rune

返回值：

如果是标记字符返回 true，否则返回 false

示例：

// 组合音符
fmt.Println(unicode.IsMark('\u0300'))  // true (重音符)
fmt.Println(unicode.IsMark('A'))       // false

IsNumber

func IsNumber(r rune) bool

作用：报告 rune 是否是数字（类别 N）

参数说明：

r：要测试的 rune

返回值：

如果是数字返回 true，否则返回 false

示例：

fmt.Println(unicode.IsNumber('5'))      // true
fmt.Println(unicode.IsNumber('Ⅳ'))      // true (罗马数字)
fmt.Println(unicode.IsNumber('A'))      // false

IsOneOf

func IsOneOf(ranges []*RangeTable, r rune) bool

作用：报告 rune 是否是其中一个范围的成员

参数说明：

ranges：范围表切片
r：要测试的 rune

返回值：

如果 rune 在任何范围中返回 true，否则返回 false

说明：

函数 “In” 提供更好的签名，应优先使用

示例：

ranges := []*unicode.RangeTable{unicode.Letter, unicode.Number}
fmt.Println(unicode.IsOneOf(ranges, 'A'))  // true
fmt.Println(unicode.IsOneOf(ranges, '5'))  // true
fmt.Println(unicode.IsOneOf(ranges, '@'))  // false

IsPrint

func IsPrint(r rune) bool

作用：报告 rune 是否被 Go 定义为可打印字符

参数说明：

r：要测试的 rune

返回值：

如果是可打印字符返回 true，否则返回 false

说明：

可打印字符包括：字母、标记、数字、标点、符号和 ASCII 空格字符
来自类别 L、M、N、P、S 和 ASCII 空格字符
与 IsGraphic 的区别在于只有 ASCII 空格被认为是可打印的

示例：

fmt.Println(unicode.IsPrint('A'))      // true
fmt.Println(unicode.IsPrint(' '))      // true
fmt.Println(unicode.IsPrint('\n'))     // false
fmt.Println(unicode.IsPrint('中'))      // true
fmt.Println(unicode.IsPrint('\u00A0')) // false (NBSP 不是 ASCII 空格)

IsPunct

func IsPunct(r rune) bool

作用：报告 rune 是否是 Unicode 标点符号字符（类别 P）

参数说明：

r：要测试的 rune

返回值：

如果是标点符号返回 true，否则返回 false

示例：

fmt.Println(unicode.IsPunct('.'))      // true
fmt.Println(unicode.IsPunct(','))      // true
fmt.Println(unicode.IsPunct('!'))      // true
fmt.Println(unicode.IsPunct('A'))      // false

IsSpace

func IsSpace(r rune) bool

作用：报告 rune 是否是 Unicode 的 White Space 属性定义的空格字符

参数说明：

r：要测试的 rune

返回值：

如果是空格字符返回 true，否则返回 false

说明：

在 Latin-1 空间中包括：‘\t’, ‘\n’, ‘\v’, ‘\f’, ‘\r’, ’ ’, U+0085 (NEL), U+00A0 (NBSP)
其他空格字符定义由类别 Z 和属性 Pattern_White_Space 设置

示例：

fmt.Println(unicode.IsSpace(' '))      // true
fmt.Println(unicode.IsSpace('\t'))     // true
fmt.Println(unicode.IsSpace('\n'))     // true
fmt.Println(unicode.IsSpace('A'))      // false
fmt.Println(unicode.IsSpace('\u00A0')) // true (NBSP)

IsSymbol

func IsSymbol(r rune) bool

作用：报告 rune 是否是符号字符

参数说明：

r：要测试的 rune

返回值：

如果是符号字符返回 true，否则返回 false

示例：

fmt.Println(unicode.IsSymbol('$'))     // true
fmt.Println(unicode.IsSymbol('€'))     // true
fmt.Println(unicode.IsSymbol('+'))     // true
fmt.Println(unicode.IsSymbol('A'))     // false

IsTitle

func IsTitle(r rune) bool

作用：报告 rune 是否是标题大小写字母

参数说明：

r：要测试的 rune

返回值：

如果是标题大小写字母返回 true，否则返回 false

示例：

fmt.Println(unicode.IsTitle('ǅ'))      // true (DŽ 的标题形式)
fmt.Println(unicode.IsTitle('A'))      // false (这是大写，不是标题)
fmt.Println(unicode.IsTitle('a'))      // false

IsUpper

func IsUpper(r rune) bool

作用：报告 rune 是否是大写字母

参数说明：

r：要测试的 rune

返回值：

如果是大写字母返回 true，否则返回 false

示例：

fmt.Println(unicode.IsUpper('A'))      // true
fmt.Println(unicode.IsUpper('a'))      // false
fmt.Println(unicode.IsUpper('中'))      // false

S

SimpleFold

func SimpleFold(r rune) rune

作用：迭代 Unicode 定义的简单大小写折叠等价的码点

参数说明：

r：起始 rune

返回值：

如果存在，返回大于 r 的最小等价 rune
否则返回大于等于 0 的最小 rune
如果 r 不是有效的 Unicode 码点，返回 r

示例：

fmt.Printf("SimpleFold('A') = %c\n", unicode.SimpleFold('A'))  // a
fmt.Printf("SimpleFold('a') = %c\n", unicode.SimpleFold('a'))  // A
fmt.Printf("SimpleFold('K') = %c\n", unicode.SimpleFold('K'))  // k
fmt.Printf("SimpleFold('k') = %c\n", unicode.SimpleFold('k'))  // K (开尔文符号)
fmt.Printf("SimpleFold('1') = %c\n", unicode.SimpleFold('1'))  // 1

T

To

func To(_case int, r rune) rune

作用：将 rune 映射到指定的大小写

参数说明：

_case：大小写类型（UpperCase、LowerCase 或 TitleCase）
r：要转换的 rune

返回值：

转换后的 rune

示例：

fmt.Printf("To(UpperCase, 'a') = %c\n", unicode.To(unicode.UpperCase, 'a'))  // A
fmt.Printf("To(LowerCase, 'A') = %c\n", unicode.To(unicode.LowerCase, 'A'))  // a
fmt.Printf("To(TitleCase, 'a') = %c\n", unicode.To(unicode.TitleCase, 'a'))  // A

ToLower

func ToLower(r rune) rune

作用：将 rune 映射到小写

参数说明：

r：要转换的 rune

返回值：

小写形式的 rune

示例：

fmt.Printf("ToLower('A') = %c\n", unicode.ToLower('A'))  // a
fmt.Printf("ToLower('中') = %c\n", unicode.ToLower('中'))  // 中 (不变)

ToTitle

func ToTitle(r rune) rune

作用：将 rune 映射到标题大小写

参数说明：

r：要转换的 rune

返回值：

标题大小写形式的 rune

示例：

fmt.Printf("ToTitle('a') = %c\n", unicode.ToTitle('a'))  // A
fmt.Printf("ToTitle('中') = %c\n", unicode.ToTitle('中'))  // 中 (不变)

ToUpper

func ToUpper(r rune) rune

作用：将 rune 映射到大写

参数说明：

r：要转换的 rune

返回值：

大写形式的 rune

示例：

fmt.Printf("ToUpper('a') = %c\n", unicode.ToUpper('a'))  // A
fmt.Printf("ToUpper('中') = %c\n", unicode.ToUpper('中'))  // 中 (不变)

类型详解（按 A-Z 分层归类）

C

CaseRange

type CaseRange struct {
    Lo    rune      // 范围起始
    Hi    rune      // 范围结束
    Delta [3]rune   // 大小写映射的增量
}

作用：表示简单大小写转换的 Unicode 码点范围

说明：

范围从 Lo 到 Hi（包含），步长为 1
Delta 是要添加到码点以达到不同大小写的数字
可能是负数
如果为零，表示字符在对应的大小写中
有一个特殊情况表示交替的大写和小写对序列

R

Range16

type Range16 struct {
    Lo     uint16  // 范围起始
    Hi     uint16  // 范围结束
    Stride uint16  // 步长
}

作用：表示 16 位 Unicode 码点的范围

说明：

范围从 Lo 到 Hi（包含），具有指定的步长

Range32

type Range32 struct {
    Lo     uint32  // 范围起始
    Hi     uint32  // 范围结束
    Stride uint32  // 步长
}

作用：表示 Unicode 码点的范围，用于一个或多个值不适合 16 位的情况

说明：

范围从 Lo 到 Hi（包含），具有指定的步长
Lo 和 Hi 必须始终 >= 1<<16

RangeTable

type RangeTable struct {
    R16         []Range16  // 16 位范围
    R32         []Range32  // 32 位范围
    LatinOffset int        // Latin-1 偏移量
}

作用：通过列出集合中的码点范围来定义一组 Unicode 码点

说明：

范围列在两个切片中以节省空间：16 位范围切片和 32 位范围切片
两个切片必须按排序顺序且不重叠
R32 应该只包含 >= 0x10000 (1<<16) 的值

S

SpecialCase

type SpecialCase []CaseRange

作用：表示特定于语言的大小写映射（如土耳其语）

说明：

SpecialCase 的方法通过覆盖标准映射来自定义映射

预定义的特殊大小写：

var TurkishCase SpecialCase = _TurkishCase
var AzeriCase SpecialCase = _TurkishCase

示例：

// 使用土耳其语特殊大小写
r := 'i'
upper := unicode.TurkishCase.ToUpper(r)
fmt.Printf("Turkish uppercase of 'i': %c\n", upper)  // İ (带点的大写 I)

SpecialCase 方法详解

T

ToLower

func (special SpecialCase) ToLower(r rune) rune

作用：将 rune 映射到小写，优先考虑特殊映射

参数说明：

r：要转换的 rune

返回值：

小写形式的 rune

ToTitle

func (special SpecialCase) ToTitle(r rune) rune

作用：将 rune 映射到标题大小写，优先考虑特殊映射

参数说明：

r：要转换的 rune

返回值：

标题大小写形式的 rune

ToUpper

func (special SpecialCase) ToUpper(r rune) rune

作用：将 rune 映射到大写，优先考虑特殊映射

参数说明：

r：要转换的 rune

返回值：

大写形式的 rune

典型示例

1. 基本字符测试

package main

import (
    "fmt"
    "unicode"
)

func main() {
    chars := []rune{'A', 'a', '5', ' ', '\n', '中', '@'}
    
    for _, r := range chars {
        fmt.Printf("'%c':\n", r)
        fmt.Printf("  IsLetter: %v\n", unicode.IsLetter(r))
        fmt.Printf("  IsDigit: %v\n", unicode.IsDigit(r))
        fmt.Printf("  IsSpace: %v\n", unicode.IsSpace(r))
        fmt.Printf("  IsPrint: %v\n", unicode.IsPrint(r))
        fmt.Printf("  IsPunct: %v\n", unicode.IsPunct(r))
        fmt.Println()
    }
}

2. 大小写转换

package main

import (
    "fmt"
    "unicode"
)

func main() {
    text := "Hello, 世界!"
    
    // 转换为大写
    for _, r := range text {
        fmt.Printf("%c", unicode.ToUpper(r))
    }
    fmt.Println()
    
    // 转换为小写
    for _, r := range text {
        fmt.Printf("%c", unicode.ToLower(r))
    }
    fmt.Println()
}

3. 字符分类统计

package main

import (
    "fmt"
    "unicode"
)

func main() {
    text := "Hello, 世界！123"
    
    var letters, digits, spaces, punctuation, others int
    
    for _, r := range text {
        switch {
        case unicode.IsLetter(r):
            letters++
        case unicode.IsDigit(r):
            digits++
        case unicode.IsSpace(r):
            spaces++
        case unicode.IsPunct(r):
            punctuation++
        default:
            others++
        }
    }
    
    fmt.Printf("Letters: %d\n", letters)
    fmt.Printf("Digits: %d\n", digits)
    fmt.Printf("Spaces: %d\n", spaces)
    fmt.Printf("Punctuation: %d\n", punctuation)
    fmt.Printf("Others: %d\n", others)
}

4. 验证输入

package main

import (
    "fmt"
    "unicode"
)

func isValidUsername(s string) bool {
    if len(s) == 0 {
        return false
    }
    
    for _, r := range s {
        // 只允许字母、数字和下划线
        if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' {
            return false
        }
    }
    
    return true
}

func main() {
    usernames := []string{"user123", "user@name", "用户_1", ""}
    
    for _, name := range usernames {
        fmt.Printf("%q: %v\n", name, isValidUsername(name))
    }
}

5. 使用脚本表

package main

import (
    "fmt"
    "unicode"
)

func main() {
    chars := []rune{'A', '中', 'ア', 'Ы', 'א'}
    
    scripts := map[string]*unicode.RangeTable{
        "Latin":    unicode.Latin,
        "Han":      unicode.Han,
        "Katakana": unicode.Katakana,
        "Cyrillic": unicode.Cyrillic,
        "Hebrew":   unicode.Hebrew,
    }
    
    for _, r := range chars {
        fmt.Printf("'%c': ", r)
        for name, table := range scripts {
            if unicode.Is(table, r) {
                fmt.Printf("%s ", name)
            }
        }
        fmt.Println()
    }
}

6. 使用 In 函数

package main

import (
    "fmt"
    "unicode"
)

func main() {
    chars := []rune{'A', '5', '@', ' '}
    
    for _, r := range chars {
        if unicode.In(r, unicode.Letter, unicode.Number) {
            fmt.Printf("'%c' is letter or number\n", r)
        } else {
            fmt.Printf("'%c' is not letter or number\n", r)
        }
    }
}

7. 使用 SpecialCase

package main

import (
    "fmt"
    "unicode"
)

func main() {
    // 标准大写
    fmt.Printf("Standard ToUpper('i'): %c\n", unicode.ToUpper('i'))
    
    // 土耳其语大写
    fmt.Printf("Turkish ToUpper('i'): %c\n", unicode.TurkishCase.ToUpper('i'))
    
    // 测试带点的 I
    fmt.Printf("Turkish ToLower('İ'): %c\n", unicode.TurkishCase.ToLower('İ'))
}

8. 使用 SimpleFold

package main

import (
    "fmt"
    "unicode"
)

func main() {
    // 展示大小写折叠循环
    r := 'A'
    for i := 0; i < 3; i++ {
        fmt.Printf("%c -> ", r)
        r = unicode.SimpleFold(r)
    }
    fmt.Printf("%c\n", r)
    
    // 开尔文符号示例
    r = 'K'
    for i := 0; i < 4; i++ {
        fmt.Printf("%c -> ", r)
        r = unicode.SimpleFold(r)
    }
    fmt.Printf("%c\n", r)
}

9. 使用类别表

package main

import (
    "fmt"
    "unicode"
)

func main() {
    chars := []rune{'A', '5', ' ', '\n', '中', '!'}
    
    for _, r := range chars {
        fmt.Printf("'%c':\n", r)
        
        // 检查主要类别
        categories := map[string]*unicode.RangeTable{
            "Letter": unicode.Letter,
            "Number": unicode.Number,
            "Space":  unicode.Space,
            "Control": unicode.C,
            "Punct":  unicode.Punct,
        }
        
        for name, table := range categories {
            if unicode.Is(table, r) {
                fmt.Printf("  %s\n", name)
            }
        }
        fmt.Println()
    }
}

10. 字符串验证器

package main

import (
    "fmt"
    "unicode"
)

func isAllLetters(s string) bool {
    for _, r := range s {
        if !unicode.IsLetter(r) {
            return false
        }
    }
    return true
}

func isAllDigits(s string) bool {
    for _, r := range s {
        if !unicode.IsDigit(r) {
            return false
        }
    }
    return true
}

func isAllPrintable(s string) bool {
    for _, r := range s {
        if !unicode.IsPrint(r) {
            return false
        }
    }
    return true
}

func main() {
    tests := []string{"Hello", "123", "Hello123", "Hello\n", "世界"}
    
    for _, s := range tests {
        fmt.Printf("%q:\n", s)
        fmt.Printf("  All letters: %v\n", isAllLetters(s))
        fmt.Printf("  All digits: %v\n", isAllDigits(s))
        fmt.Printf("  All printable: %v\n", isAllPrintable(s))
        fmt.Println()
    }
}

最佳实践

1. 使用 In 代替 IsOneOf

// 推荐
if unicode.In(r, unicode.Letter, unicode.Number) {
    // ...
}

// 不推荐（但也可以用）
ranges := []*unicode.RangeTable{unicode.Letter, unicode.Number}
if unicode.IsOneOf(ranges, r) {
    // ...
}

2. 理解 IsPrint 和 IsGraphic 的区别

// IsPrint：只有 ASCII 空格
unicode.IsPrint(' ')      // true
unicode.IsPrint('\u00A0') // false (NBSP)

// IsGraphic：所有 Unicode 空格
unicode.IsGraphic(' ')      // true
unicode.IsGraphic('\u00A0') // true

3. 使用 SpecialCase 处理特定语言

// 土耳其语
upper := unicode.TurkishCase.ToUpper('i')  // İ

// 阿塞拜疆语
upper := unicode.AzeriCase.ToUpper('i')    // İ

4. 遍历字符串测试字符

text := "Hello, 世界!"
for _, r := range text {
    if unicode.IsLetter(r) {
        // 处理字母
    }
}

5. 组合使用多个测试

func isAlphanumeric(r rune) bool {
    return unicode.IsLetter(r) || unicode.IsDigit(r)
}

func isWordChar(r rune) bool {
    return isAlphanumeric(r) || r == '_'
}

与其他包配合

unicode/utf8 包

import (
    "unicode"
    "unicode/utf8"
)

s := "Hello"
r, size := utf8.DecodeRuneInString(s)
if unicode.IsLetter(r) {
    fmt.Printf("First char %c is a letter\n", r)
}

strings 包

import (
    "strings"
    "unicode"
)

// 自定义大小写转换
func toTitle(s string) string {
    return strings.Map(unicode.ToTitle, s)
}

regexp 包

import (
    "regexp"
    "unicode"
)

// 使用 Unicode 属性
re := regexp.MustCompile(`\p{Han}+`)  // 匹配汉字

注意事项

1. rune 可能属于多个类别

// 一个字符可以同时是字母和可打印
r := 'A'
fmt.Println(unicode.IsLetter(r))   // true
fmt.Println(unicode.IsPrint(r))    // true

2. 某些字符没有大小写

// 汉字没有大小写
fmt.Println(unicode.ToUpper('中'))  // 中
fmt.Println(unicode.ToLower('中'))  // 中

3. IsSpace 包括多种空白

// 包括：\t, \n, \v, \f, \r, ' ', U+0085, U+00A0
fmt.Println(unicode.IsSpace('\t'))    // true
fmt.Println(unicode.IsSpace('\u00A0')) // true (NBSP)

4. SpecialCase 只影响特定字符

// 土耳其语特殊大小写只影响 'i' 和 'I'
fmt.Println(unicode.TurkishCase.ToUpper('i'))  // İ
fmt.Println(unicode.TurkishCase.ToUpper('a'))  // A (不受影响)

5. 没有完整大小写折叠机制

// SimpleFold 只处理简单大小写折叠
// 对于涉及多个 rune 的字符不适用

6. 使用正确的范围表

// 测试十进制数字
unicode.IsDigit('5')        // true
unicode.Is(unicode.Nd, '5') // true

// 测试所有数字（包括罗马数字等）
unicode.IsNumber('5')       // true
unicode.IsNumber('Ⅳ')       // true

快速参考

常量速查表

常量	值	说明
`MaxRune`	U+10FFFF	最大 Unicode 码点
`ReplacementChar`	U+FFFD	替换字符
`MaxASCII`	U+007F	最大 ASCII 字符
`MaxLatin1`	U+00FF	最大 Latin-1 字符

函数速查表

函数	说明
`In`	测试是否在任何范围中
`Is`	测试是否在指定范围表中
`IsControl`	测试是否是控制字符
`IsDigit`	测试是否是十进制数字
`IsGraphic`	测试是否是图形字符
`IsLetter`	测试是否是字母
`IsLower`	测试是否是小写字母
`IsMark`	测试是否是标记字符
`IsNumber`	测试是否是数字
`IsPrint`	测试是否是可打印字符
`IsPunct`	测试是否是标点符号
`IsSpace`	测试是否是空格字符
`IsSymbol`	测试是否是符号
`IsTitle`	测试是否是标题大小写
`IsUpper`	测试是否是大写字母
`SimpleFold`	大小写折叠迭代
`To`	转换到指定大小写
`ToLower`	转换为小写
`ToTitle`	转换为标题大小写
`ToUpper`	转换为大写

类别速查表

类别	说明
`L` / `Letter`	字母
`M` / `Mark`	标记
`N` / `Number`	数字
`P` / `Punct`	标点符号
`S` / `Symbol`	符号
`Z` / `Space`	分隔符
`C` / `Other`	其他

常见模式

// 测试字符类型
unicode.IsLetter(r)
unicode.IsDigit(r)
unicode.IsSpace(r)
unicode.IsPunct(r)

// 大小写转换
unicode.ToUpper(r)
unicode.ToLower(r)
unicode.ToTitle(r)

// 组合测试
unicode.In(r, unicode.Letter, unicode.Number)

// 使用类别表
unicode.Is(unicode.Upper, r)
unicode.Is(unicode.Han, r)

总结

unicode 包提供了全面的 Unicode 字符属性测试和转换功能：

核心功能：

字符属性测试（Is* 函数）
大小写转换（To* 函数）
Unicode 类别、脚本和属性表
特殊语言支持（土耳其语等）

主要类型：

RangeTable：Unicode 范围表
CaseRange：大小写映射范围
SpecialCase：特殊大小写映射

使用建议：

使用 In 函数进行多重测试
理解 IsPrint 和 IsGraphic 的区别
对特定语言使用 SpecialCase
注意某些字符没有大小写
理解 rune 可能属于多个类别

典型用法：

// 字符测试
if unicode.IsLetter(r) {
    // 处理字母
}

// 大小写转换
upper := unicode.ToUpper(r)

// 组合测试
if unicode.In(r, unicode.Letter, unicode.Number) {
    // 处理字母或数字
}

通过 unicode 包，可以方便地处理各种 Unicode 字符属性测试和转换，支持国际化应用程序的开发。

Keyboard shortcuts

Go 标准包使用指南