# Code generation design

Starting from v7, Ajv uses the CodeGen module, which replaced the doT templates used earlier.

The motivations for this change:

  • doT templates were difficult to maintain and change, particularly for occasional contributors.
  • They discouraged modularity within validation keyword code and led to implicit dependencies between different parts of the code.
  • They carried a risk of remote code execution when untrusted schemas were used, even though all identified issues had been patched.
  • ES6 template literals, now widely supported, offer a good alternative to both ASTs and plain string concatenation; this option was not available when Ajv started.

# Safe code generation

The CodeGen module defines two tagged templates. Use them to create the code that is passed to all code generation methods and interpolated into other tagged templates:

  • _ - creates instances of the private _Code class, which are not escaped when used in code or in other tagged templates.
  • str - creates code for string expressions.

For example, this code:

const x = 0
// Name is a subclass of _Code that can be safely used in code - it only allows valid identifiers
// gen.const creates a unique variable name with the prefix "num".
const num: Name = gen.const("num", 5)
gen.if(
  // _`...` returns the instance of _Code with safe interpolation of `num` and `x`.
  // if `x` was a string, it would be inserted into code as a quoted string value rather than as a code fragment,
  // so if `x` contained some code, it would not be executed.
  _`${num} > ${x}`,
  () => log("greater"),
  () => log("smaller or equal")
)

function log(comparison: string): void {
  // msg creates a string expression with concatenation - see generated code below
  // type Code = _Code | Name, _Code can only be constructed with template literals
  const msg: Code = str`${num} is ${comparison} than ${x}`
  // msg is _Code instance, so it will be inserted via another template without quotes
  gen.code(_`console.log(${msg})`)
}

generates this JavaScript code:

const num0 = 5
if (num0 > 0) {
  console.log(num0 + " is greater than 0")
} else {
  console.log(num0 + " is smaller or equal than 0")
}

.const, .if and .code above are methods of the CodeGen class that generate code inside the class instance gen. See the source code for all available methods and the tests for other code generation examples.

These methods only accept instances of the private class _Code; other values are rejected by the TypeScript compiler, so the risk of passing an unsafe string is mitigated at the type level.

If a string variable is used in a _ template literal, its value is safely wrapped in quotes. In many cases this is quite useful, as it allows values that can be either a string or a number to be injected via the same template. In the worst case the generated code could be invalid, but quoting prevents the code execution that an attacker could otherwise trigger by passing, via an untrusted schema, a string where a value to be inserted into code (e.g. a number) is expected. See also the comments in the example.
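The quoting behaviour can be illustrated with a minimal, self-contained sketch. It is a simplified model, not Ajv's implementation: the `SafeCode` class and the `code` tag are hypothetical stand-ins for `_Code` and `_`, and quoting with `JSON.stringify` is an assumption made for the illustration.

```typescript
// Simplified model only: SafeCode and `code` are hypothetical stand-ins
// for Ajv's _Code class and `_` tag, not the library's implementation.
class SafeCode {
  constructor(readonly text: string) {}
  toString(): string {
    return this.text
  }
}

type Interpolated = SafeCode | string | number | boolean | null

// Tagged template: SafeCode values are inserted as code fragments,
// every other value is inserted as a quoted/serialized literal.
function code(strings: TemplateStringsArray, ...values: Interpolated[]): SafeCode {
  let out = strings[0] ?? ""
  values.forEach((value, i) => {
    out += value instanceof SafeCode ? value.text : JSON.stringify(value)
    out += strings[i + 1] ?? ""
  })
  return new SafeCode(out)
}

const num = new SafeCode("num0") // plays the role of a generated variable name
const withNumber = code`${num} > ${5}`
const withHostileString = code`${num} > ${"0; process.exit(1)"}`

console.log(withNumber.toString()) // num0 > 5
console.log(withHostileString.toString()) // num0 > "0; process.exit(1)"
```

In this model the hostile string ends up as a string literal inside the comparison, so it is compared rather than executed.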

# Code optimization

The CodeGen class generates code trees and performs several optimizations before the code is rendered:

  1. Removes empty and unreachable branches (e.g. the else branch after if(true)).
  2. Removes unused variable declarations.
  3. Replaces variables that are used only once and are assigned expressions explicitly marked as "constant" (i.e. referentially transparent) with the expressions themselves.
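As a rough illustration of the first optimization, the sketch below prunes a toy code tree. The `Stmt` type and the `optimize` and `render` functions are hypothetical and far simpler than Ajv's actual code tree; they only show the idea of dropping empty and unreachable branches before rendering.

```typescript
// Toy code tree, assumed for illustration only (not Ajv's node classes).
type Stmt =
  | { kind: "expr"; code: string }
  | { kind: "if"; condition: string; then: Stmt[]; else: Stmt[] }

// Removes unreachable branches of if(true)/if(false) and drops `if`
// statements whose branches are both empty. Dropping the statement is only
// valid if the condition has no side effects.
function optimize(stmts: Stmt[]): Stmt[] {
  const out: Stmt[] = []
  for (const stmt of stmts) {
    if (stmt.kind !== "if") {
      out.push(stmt)
      continue
    }
    const thenBranch = optimize(stmt.then)
    const elseBranch = optimize(stmt.else)
    if (stmt.condition === "true") out.push(...thenBranch)
    else if (stmt.condition === "false") out.push(...elseBranch)
    else if (thenBranch.length > 0 || elseBranch.length > 0) {
      out.push({ kind: "if", condition: stmt.condition, then: thenBranch, else: elseBranch })
    }
  }
  return out
}

// Renders the tree as JavaScript-like text.
function render(stmts: Stmt[], indent = ""): string {
  return stmts
    .map((s) => {
      if (s.kind === "expr") return `${indent}${s.code}`
      const thenCode = render(s.then, indent + "  ")
      const elsePart =
        s.else.length > 0 ? ` else {\n${render(s.else, indent + "  ")}\n${indent}}` : ""
      return `${indent}if (${s.condition}) {\n${thenCode}\n${indent}}${elsePart}`
    })
    .join("\n")
}

const tree: Stmt[] = [
  {
    kind: "if",
    condition: "true",
    then: [{ kind: "expr", code: "valid = true" }],
    else: [{ kind: "expr", code: "valid = false" }],
  },
  { kind: "if", condition: "data > 0", then: [], else: [] },
]

console.log(render(optimize(tree))) // valid = true
```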

Optimization assumes no side effects

These optimizations assume that the expressions in if conditions, for statement headers and assigned expressions are free of any side effects. This is the case for all pre-defined validation keywords.
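To see why this assumption matters, consider a simplified model of unused-declaration removal. The `Declaration` type, the `removeUnused` function and the `audit(data)` call are all hypothetical; the usage check is a plain identifier search, which is much cruder than a real code tree analysis.

```typescript
// Hypothetical model of a variable declaration in generated code.
interface Declaration {
  name: string
  init: string // initializer expression, as source text
}

// Keeps only declarations whose name appears in the body.
// Assumes names are plain identifiers, so they are safe to use in a RegExp.
function removeUnused(declarations: Declaration[], body: string): Declaration[] {
  return declarations.filter(({ name }) => new RegExp(`\\b${name}\\b`).test(body))
}

const declarations: Declaration[] = [
  { name: "len0", init: "data.length" },
  { name: "logged1", init: "audit(data)" }, // initializer with a side effect
]
const body = "return len0 > 3"

const keptDeclarations = removeUnused(declarations, body)
console.log(keptDeclarations.map((d) => `const ${d.name} = ${d.init}`).join("\n"))
// const len0 = data.length
```

Because `logged1` is never read, its declaration disappears, and the `audit(data)` call disappears with it. That is harmless for side-effect-free expressions and a behaviour change otherwise.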

See these tests for examples.

By default Ajv does 1-pass optimization. Based on the test suite, it reduces the code size by 10.5% and the number of tree nodes by 16.7% (TODO: benchmark the validation time). A second optimization pass changes these figures by less than 0.1%, so you won't need it unless you have very complex schemas, or you generate standalone code and want it to pass relevant eslint rules.

Optimization mode can be changed with options:

  • {code: {optimize: false}} - disables optimization (e.g. when schema compilation time is more important).
  • {code: {optimize: 2}} - enables 2-pass optimization.
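The relationship between the option value and the number of passes described here can be sketched as follows. `CodeOptionsSketch` and `optimizationPasses` are hypothetical names that model the documented behaviour (default of one pass, `false` to disable, a number for that many passes); treating `true` as one pass is an assumption, and none of this is Ajv's own option handling.

```typescript
// Hypothetical model of the `code` options object described above.
interface CodeOptionsSketch {
  optimize?: boolean | number
}

// Maps the option to a number of optimization passes:
// omitted (or, by assumption, true) -> 1 pass, false -> no optimization,
// number -> that many passes.
function optimizationPasses(code: CodeOptionsSketch = {}): number {
  const { optimize } = code
  if (optimize === undefined || optimize === true) return 1
  if (optimize === false) return 0
  return optimize
}

console.log(optimizationPasses()) // 1
console.log(optimizationPasses({ optimize: false })) // 0
console.log(optimizationPasses({ optimize: 2 })) // 2

// With Ajv itself the same object would be passed under the `code` key,
// e.g. new Ajv({code: {optimize: 2}}).
```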

# User-defined keywords

While tagged template literals wrap passed strings based on their run-time values, CodeGen class methods rely on types to ensure the safety of passed parameters: there are no run-time checks that a passed value is an instance of the _Code class.

It is strongly recommended to define additional keywords only in TypeScript; plain JavaScript would still allow unsafe strings to be passed to code generation methods.
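The difference between type-level and run-time protection can be sketched with a small model. `SafeCode` and `TinyGen` are hypothetical stand-ins for `_Code` and `CodeGen`; the cast to an untyped object simulates what calling the method from plain JavaScript would look like.

```typescript
// Simplified model: SafeCode and TinyGen are hypothetical stand-ins,
// not Ajv's _Code and CodeGen classes.
class SafeCode {
  constructor(readonly text: string) {}
  toString(): string {
    return this.text
  }
}

class TinyGen {
  private readonly lines: string[] = []

  // The parameter type is the only safeguard: nothing is checked at run time.
  code(fragment: SafeCode): this {
    this.lines.push(String(fragment))
    return this
  }

  toString(): string {
    return this.lines.join("\n")
  }
}

const gen = new TinyGen()
gen.code(new SafeCode("let valid = true"))

// gen.code("doSomethingUnsafe()")
// ^ rejected by the TypeScript compiler: a string is not a SafeCode.

// Simulating a plain JavaScript caller by discarding the type information:
const untypedGen = gen as unknown as { code(fragment: unknown): unknown }
untypedGen.code("doSomethingUnsafe()") // accepted, because there is no run-time check

console.log(gen.toString())
// let valid = true
// doSomethingUnsafe()
```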

Optimization and side effects

If your user-defined keywords need side effects that are removed by optimization (see above), you may need to disable it.