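go/scanner - Go source scanner

The go/scanner package implements a lexical scanner for Go source text: it turns source code into a sequence of tokens.

Overview

go/scanner is the lexer underneath go/parser and the tools built on it. It performs lexical analysis only: it reports malformed tokens, not syntax or type errors. (The gc compiler uses its own internal scanner rather than this package.)

Imports:

import (
    "fmt"
    "go/scanner"
    "go/token"
)

Basic usage:

// 1. Create a FileSet
fset := token.NewFileSet()

// 2. Register the source; the size must equal len(src)
file := fset.AddFile("", fset.Base(), len(src))

// 3. Initialize the scanner
var s scanner.Scanner
s.Init(file, src, nil, scanner.ScanComments)

// 4. Scan tokens until EOF
for {
    pos, tok, lit := s.Scan()
    if tok == token.EOF {
        break
    }
    fmt.Printf("%s: %s %q\n", fset.Position(pos), tok, lit)
}

Typical examples:

Example 1: scan source code and print each token: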

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`
package main

func main() {
    x := 42
    fmt.Println(x)
}
`)

    // Create the FileSet and register the source
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    // Initialize the scanner
    var s scanner.Scanner
    s.Init(file, src, nil, scanner.ScanComments)

    // Scan and print every token
    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        fmt.Printf("%s\t%s\t%q\n", fset.Position(pos), tok, lit)
    }
}

Run (the source starts with an empty line, so "package" is on line 2; semicolons with the literal "\n" were inserted by the scanner):

$ go run main.go
2:1	package	"package"
2:9	IDENT	"main"
2:13	;	"\n"
4:1	func	"func"
4:6	IDENT	"main"
4:10	(	""
4:11	)	""
4:13	{	""
5:5	IDENT	"x"
5:7	:=	""
5:10	INT	"42"
5:12	;	"\n"
6:5	IDENT	"fmt"
6:8	.	""
6:9	IDENT	"Println"
6:16	(	""
6:17	IDENT	"x"
6:18	)	""
6:19	;	"\n"
7:1	}	""
7:2	;	"\n"

Example 2: counting tokens:

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`
package main

import "fmt"

func Add(a, b int) int {
    return a + b
}
`)

    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    var s scanner.Scanner
    s.Init(file, src, nil, 0)

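    // Count tokens by category
    var identCount, litCount, otherCount int

    for {
        _, tok, _ := s.Scan()
        if tok == token.EOF {
            break
        }

        switch {
        case tok == token.IDENT:
            identCount++
        case tok.IsLiteral(): // INT, FLOAT, IMAG, CHAR, STRING (IDENT is handled above)
            litCount++
        default: // keywords, operators, delimiters, inserted semicolons
            otherCount++
        }
    }

    fmt.Printf("Identifiers: %d\n", identCount)
    fmt.Printf("Literals: %d\n", litCount)
    fmt.Printf("Operators/keywords: %d\n", otherCount)
}

Run:

$ go run main.go
Identifiers: 8
Literals: 1
Operators/keywords: 14

The last count includes the four semicolons that the scanner inserts at line ends. Predeclared names such as int are ordinary identifiers to the scanner.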

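I. The Scanner struct

The scanner type

Scanner

Definition:

type Scanner struct {
    // Public state; may be read or modified by the caller.
    ErrorCount int // number of errors encountered

    // All other fields are unexported.
}

Notes:

Scanner holds the scanner's state while it processes a source text.
The zero value is not usable; call Init before the first Scan.
ErrorCount is the only exported field; everything else is internal.

Methods:

Init(file *token.File, src []byte, err ErrorHandler, mode Mode)
Scan() (pos token.Pos, tok token.Token, lit string)

Fields:

ErrorCount int

Example: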

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`package main; var x int`)
    
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    
    var s scanner.Scanner
    s.Init(file, src, nil, 0)
    
    // Scan all tokens
    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        fmt.Printf("%s: %s %q\n", fset.Position(pos), tok, lit)
    }

    // ErrorCount is a field, not a method
    fmt.Printf("\nErrors: %d\n", s.ErrorCount)
}

Run:

$ go run main.go
1:1: package "package"
1:9: IDENT "main"
1:13: ; ";"
1:15: var "var"
1:19: IDENT "x"
1:21: IDENT "int"
1:24: ; "\n"

Errors: 0

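II. The Mode type

Scan mode type

Mode

Definition:

type Mode uint

Notes:

A Mode value is a set of bit flags that controls scanner behavior.
It is passed as the last argument to Init; 0 selects the default behavior.

Package-level constants

ScanComments

Definition:

const ScanComments Mode = 1 << iota

Notes:

With ScanComments set, comments are returned as token.COMMENT tokens.
Without it (the default), comments are skipped.

Example: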

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`
package main

// This is a comment
var x int
`)

    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    // Default mode: comments are skipped
    var s1 scanner.Scanner
    s1.Init(file, src, nil, 0)

    fmt.Println("Without ScanComments:")
    for {
        _, tok, _ := s1.Scan()
        if tok == token.EOF {
            break
        }
        if tok == token.COMMENT {
            fmt.Println("  found a comment")
        }
    }

    // ScanComments mode: comments are returned as tokens
    var s2 scanner.Scanner
    s2.Init(file, src, nil, scanner.ScanComments)

    fmt.Println("\nWith ScanComments:")
    for {
        pos, tok, lit := s2.Scan()
        if tok == token.EOF {
            break
        }
        if tok == token.COMMENT {
            fmt.Printf("  %s: %s\n", fset.Position(pos), lit)
        }
    }
}

Run:

$ go run main.go
Without ScanComments:

With ScanComments:
  4:1: // This is a comment

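dontInsertSemis (unexported)

Definition:

const (
    ScanComments    Mode = 1 << iota // return comments as COMMENT tokens
    dontInsertSemis                  // do not automatically insert semicolons - for testing only
)

Notes:

The second Mode bit is unexported and reserved for the package's own tests; scanner.DontInsertSemis does not exist, and code that references it does not compile.
Client code therefore always gets automatic semicolon insertion.
An inserted semicolon is returned as token.SEMICOLON with the literal "\n"; a semicolon written in the source has the literal ";".

Example:

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`
package main
var x int; var y int
`)

    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    var s scanner.Scanner
    s.Init(file, src, nil, 0)

    // Report every semicolon and whether the scanner inserted it
    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        if tok != token.SEMICOLON {
            continue
        }
        if lit == "\n" {
            fmt.Printf("%s: inserted semicolon\n", fset.Position(pos))
        } else {
            fmt.Printf("%s: explicit semicolon\n", fset.Position(pos))
        }
    }
}

Run:

$ go run main.go
2:13: inserted semicolon
3:10: explicit semicolon
3:21: inserted semicolon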

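III. Package-level types

The ErrorHandler type

Definition:

type ErrorHandler func(pos token.Position, msg string)

Notes:

An ErrorHandler may be passed to Init; the scanner calls it for every error it encounters.
The scanner reports only lexical errors (illegal characters, unterminated literals or comments, malformed numbers). Syntax errors such as an incomplete declaration are detected by go/parser, not here.

Parameters:

pos: position of the error
msg: error message

Example: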

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    // Source with a lexical error: '#' is not a valid character in Go
    src := []byte(`
package main
var x = #
`)

    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    // Custom error handler: collect the messages
    var errs []string
    handler := func(pos token.Position, msg string) {
        errs = append(errs, fmt.Sprintf("%s: %s", pos, msg))
    }

    var s scanner.Scanner
    s.Init(file, src, handler, 0)

    // Scan to EOF; the handler runs as errors are found
    for {
        _, tok, _ := s.Scan()
        if tok == token.EOF {
            break
        }
    }

    fmt.Printf("Errors: %d\n", len(errs))
    for _, e := range errs {
        fmt.Printf("  %s\n", e)
    }
}

Run:

$ go run main.go
Errors: 1
  3:9: illegal character U+0023 '#'

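IV. Scanner members (alphabetical)

Error count

ErrorCount

Definition:

ErrorCount int // exported field of Scanner

Notes:

ErrorCount is a field, not a method: read it as s.ErrorCount, without parentheses.
It is incremented once for every error the scanner encounters, whether or not an ErrorHandler is installed.
Check it after scanning to learn whether the source was lexically valid.

Type:

int: number of errors encountered so far

Example: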

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    // '#' is an illegal character, so the scanner records one error
    src := []byte(`package main; var x = #`)

    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    var s scanner.Scanner
    s.Init(file, src, nil, 0)

    // Scan all tokens
    for {
        _, tok, _ := s.Scan()
        if tok == token.EOF {
            break
        }
    }

    if s.ErrorCount > 0 {
        fmt.Printf("Scan failed: %d error(s)\n", s.ErrorCount)
    } else {
        fmt.Println("Scan succeeded")
    }
}

Run:

$ go run main.go
Scan failed: 1 error(s)

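Initializing the scanner

Init

Definition:

func (s *Scanner) Init(file *token.File, src []byte, err ErrorHandler, mode Mode)

Notes:

Init prepares s to tokenize src from its beginning and sets the scan mode.
It must be called before the first call to Scan.
Line information is recorded in file as scanning proceeds; Init panics if file.Size() differs from len(src).
Init itself may call err if the first character of the source is invalid.

Parameters:

file: the *token.File that receives position information
src: the source text
err: error handler (may be nil)
mode: scan mode

Example: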

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`package main; const Pi = 3.14`)
    
    fset := token.NewFileSet()
    file := fset.AddFile("test.go", fset.Base(), len(src))
    
    var s scanner.Scanner
    
    // Initialize: file supplies positions, nil means no error handler
    s.Init(file, src, nil, scanner.ScanComments)

    // Scan
    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        fmt.Printf("%s: %s %q\n", fset.Position(pos), tok, lit)
    }
}

Run:

$ go run main.go
test.go:1:1: package "package"
test.go:1:9: IDENT "main"
test.go:1:13: ; ";"
test.go:1:15: const "const"
test.go:1:21: IDENT "Pi"
test.go:1:24: = ""
test.go:1:26: FLOAT "3.14"
test.go:1:30: ; "\n"
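
Init is documented to panic when the size registered for the *token.File differs from len(src). The sketch below makes that visible by recovering from the panic; initPanics is a hypothetical helper written for this illustration only.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// initPanics (hypothetical helper) registers a file of the given size,
// calls Init with src, and reports whether Init panicked.
func initPanics(size int, src []byte) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()

	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), size)

	var s scanner.Scanner
	s.Init(file, src, nil, 0)
	return false
}

func main() {
	src := []byte("package main")
	fmt.Println(initPanics(len(src), src))   // sizes agree
	fmt.Println(initPanics(len(src)+1, src)) // sizes differ
}
```

Passing len(src) to AddFile, as every example in this guide does, is the simplest way to keep the two sizes in sync.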

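Scanning the next token

Scan

Definition:

func (s *Scanner) Scan() (pos token.Pos, tok token.Token, lit string)

Notes:

Scan returns the next token together with its position and literal text.
At the end of the source it returns token.EOF.

Return values:

pos: position of the token; convert it with fset.Position(pos)
tok: token kind
lit: literal text, which depends on tok:
  literals (IDENT, INT, FLOAT, IMAG, CHAR, STRING) and COMMENT: the source text of the token
  keywords: the keyword itself
  SEMICOLON: ";" if written in the source, "\n" if inserted at a newline or at EOF
  ILLEGAL: the offending character
  all other tokens: the empty string

Example: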

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`x := 42`)
    
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    
    var s scanner.Scanner
    s.Init(file, src, nil, 0)
    
    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        
        fmt.Printf("pos: %s, token: %s, literal: %q\n",
            fset.Position(pos), tok, lit)
    }
}

Run:

$ go run main.go
pos: 1:1, token: IDENT, literal: "x"
pos: 1:3, token: :=, literal: ""
pos: 1:6, token: INT, literal: "42"
pos: 1:8, token: ;, literal: "\n"
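
The lit value returned by Scan is the token's source text, so a number arrives as a string such as "0x1F" and a string literal still carries its quotes and escapes. The sketch below shows one way to turn INT and STRING literals into Go values with strconv; the helper literalValues and its return type are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strconv"
)

// literalValues (hypothetical helper) scans src and converts INT and STRING
// literals to Go values. All other tokens are skipped.
func literalValues(src []byte) ([]any, error) {
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, src, nil, 0)

	var vals []any
	for {
		_, tok, lit := s.Scan()
		switch tok {
		case token.EOF:
			return vals, nil
		case token.INT:
			// Base 0 makes ParseInt honor prefixes such as 0x, 0o, and 0b.
			n, err := strconv.ParseInt(lit, 0, 64)
			if err != nil {
				return nil, err
			}
			vals = append(vals, n)
		case token.STRING:
			// Unquote removes the quotes and decodes escape sequences.
			str, err := strconv.Unquote(lit)
			if err != nil {
				return nil, err
			}
			vals = append(vals, str)
		}
	}
}

func main() {
	vals, err := literalValues([]byte(`n := 0x1F; msg := "a\tb"`))
	fmt.Println(len(vals), err)
	fmt.Printf("%d %q\n", vals[0], vals[1])
}
```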

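Scanning comments

ScanComments (a mode, not a method)

Definition:

const ScanComments Mode = 1 << iota

Notes:

Scanner has no ScanComments method. Comments are obtained from Scan after passing the ScanComments mode to Init.
In that mode every comment is returned as a token.COMMENT token; filter on tok to keep only the comments.

Values returned by Scan for a comment:

pos: position of the comment's first character
tok: token.COMMENT
lit: the comment text, including the // or /* */ markers

Example: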

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte(`
package main

// line comment
/* block comment */
var x int // trailing comment
`)

    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    var s scanner.Scanner
    s.Init(file, src, nil, scanner.ScanComments)

    fmt.Println("All comments:")
    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        if tok == token.COMMENT {
            fmt.Printf("  %s: %s\n", fset.Position(pos), lit)
        }
    }
}

Run:

$ go run main.go
All comments:
  4:1: // line comment
  5:1: /* block comment */
  6:11: // trailing comment

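V. Quick reference

The Scanner struct

Member	Description	Result
Init(file, src, err, mode)	Initializes the scanner	-
Scan()	Scans the next token	`(Pos, Token, lit)`
ErrorCount	Field: number of errors so far	`int`

Mode constants

Constant	Meaning	Effect
ScanComments	Scan comments	Comments are returned as COMMENT tokens
dontInsertSemis (unexported)	Suppress semicolon insertion	For the package's own tests; not available to client code

Package-level types

Type	Description
Mode	Scan mode type
ErrorHandler	Error handler function type

Token categories

Category	Example tokens
Keywords	PACKAGE, FUNC, VAR, CONST
Identifiers	IDENT
Literals	INT, FLOAT, IMAG, CHAR, STRING
Operators	+, -, *, /, =
Delimiters	(, ), {, }, ;
Comments	COMMENT (requires ScanComments mode)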
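
The categories in the table above can be computed with the predicates that go/token defines on token.Token: IsLiteral, IsKeyword, and IsOperator. Note that IsLiteral is also true for IDENT, and IsOperator covers delimiters. The sketch below is one possible grouping; the category names and the helpers classify and categories are assumptions chosen for this illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// classify (hypothetical helper) maps a token to a coarse category name.
func classify(tok token.Token) string {
	switch {
	case tok == token.IDENT:
		return "identifier"
	case tok == token.COMMENT:
		return "comment"
	case tok.IsLiteral(): // INT, FLOAT, IMAG, CHAR, STRING
		return "literal"
	case tok.IsKeyword():
		return "keyword"
	case tok.IsOperator(): // operators and delimiters, including ';'
		return "operator/delimiter"
	default:
		return "special" // for example ILLEGAL
	}
}

// categories (hypothetical helper) scans src and counts tokens per category.
func categories(src []byte) map[string]int {
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, src, nil, scanner.ScanComments)

	counts := make(map[string]int)
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			return counts
		}
		counts[classify(tok)]++
	}
}

func main() {
	src := []byte("func Add(a, b int) int { return a + b } // sum")
	fmt.Println(categories(src))
}
```

The order of the cases matters: IDENT has to be tested before IsLiteral, otherwise identifiers would be counted as literals.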

Use cases

Scenario	Call	Mode
Lexical analysis	Scan()	0
Extracting comments	Scan(), keep token.COMMENT	ScanComments
Telling inserted from explicit semicolons	Scan(), check lit of token.SEMICOLON	0
Error handling	Scan(), with an ErrorHandler passed to Init	0

Common tokens

Token	Example literals
IDENT	x, fmt, Println
INT	42, 0x1F
FLOAT	3.14, 1e-10
STRING	"hello"
CHAR	'x'
COMMENT	// comment, /* block */

VI. Best practices

1. Basic scanning

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func scanSource(src []byte) {
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    
    var s scanner.Scanner
    s.Init(file, src, nil, 0)
    
    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        fmt.Printf("%s: %s %q\n", fset.Position(pos), tok, lit)
    }
}

2. Error handling

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func scanWithErrorHandling(src []byte) error {
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    
    var errors []string
    handler := func(pos token.Position, msg string) {
        errors = append(errors, fmt.Sprintf("%s: %s", pos, msg))
    }
    
    var s scanner.Scanner
    s.Init(file, src, handler, 0)
    
    for {
        _, tok, _ := s.Scan()
        if tok == token.EOF {
            break
        }
    }
    
    if s.ErrorCount > 0 { // ErrorCount is a field, not a method
        return fmt.Errorf("scan failed: %v", errors)
    }
    return nil
}

3. Extracting all comments

package main

import (
    "go/scanner"
    "go/token"
)

func extractComments(src []byte) []string {
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    
    var s scanner.Scanner
    s.Init(file, src, nil, scanner.ScanComments)
    
    var comments []string
    for {
        _, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        if tok == token.COMMENT {
            comments = append(comments, lit)
        }
    }
    
    return comments
}

4. Counting tokens

package main

import (
    "go/scanner"
    "go/token"
)

func countTokens(src []byte) map[token.Token]int {
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    
    var s scanner.Scanner
    s.Init(file, src, nil, 0)
    
    counts := make(map[token.Token]int)
    for {
        _, tok, _ := s.Scan()
        if tok == token.EOF {
            break
        }
        counts[tok]++
    }
    
    return counts
}

5. Extracting identifiers

package main

import (
    "go/scanner"
    "go/token"
)

func extractIdentifiers(src []byte) []string {
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    
    var s scanner.Scanner
    s.Init(file, src, nil, 0)
    
    var idents []string
    for {
        _, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        if tok == token.IDENT {
            idents = append(idents, lit)
        }
    }
    
    return idents
}

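VII. Caveats

1. Always call Init first

// Wrong: not initialized
var s scanner.Scanner
s.Scan() // panics: the scanner has no *token.File yet

// Right
var s scanner.Scanner
s.Init(file, src, nil, 0)
s.Scan()

2. Pass a valid *token.File

// Wrong: nil file
var s scanner.Scanner
s.Init(nil, src, nil, 0) // panics

// Right: the size given to AddFile must equal len(src)
fset := token.NewFileSet()
file := fset.AddFile("", fset.Base(), len(src))
s.Init(file, src, nil, 0)

3. Scan until EOF

// Right: loop until token.EOF
for {
    _, tok, _ := s.Scan()
    if tok == token.EOF {
        break
    }
    // handle the token
}

// Wrong: a fixed number of calls may miss tokens
for i := 0; i < 10; i++ {
    s.Scan() // may stop too early, or keep returning EOF
}

4. Using the literal

// IDENT, INT, FLOAT, IMAG, CHAR, STRING, and COMMENT carry their source text in lit
_, tok, lit := s.Scan()
if tok == token.IDENT {
    fmt.Printf("identifier: %s\n", lit)
}

// For keywords lit is the keyword itself; for operators it is empty
if tok == token.FUNC {
    fmt.Printf("keyword: %s, literal: %q\n", tok, lit)
    // Output: keyword: func, literal: "func"
}

Last updated: 2026-04-04
Go version: Go 1.23+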
