Abstract Syntax Trees

coding

January 14, 2022

After realizing that rhythms are trees, I want to investigate a special kind of tree, that is normally used by program language compilers to represent the syntax of a program.

I have the idea to transform rhythmical objects into an AST to allow easier transformations before rendering. Before doing that, I want to look at ASTs in general, particularly by using unified.

Unified

As already described in the last post, this whole blog is only possible thanks to unified and the tool ecosystem around mdx, markdown and html. At the heart of unified, there is unist, which is a specification for syntax trees.

unist, mdast, hast

In a nutshell, unist defines types that can be seen as a basis to represent ASTs. The 2 most important are:

interface Node {
  type: string; // The variant of a node.
  data?: Data | undefined; // Information from the ecosystem.
  position?: Position | undefined; // Location of a node in a source document. Must not be present if a node is generated.
}
export interface Parent extends Node {
  children: Node[]; // List representing the children of a node.
}
show others
export interface Data {
  [key: string]: unknown;
}
export interface Position {
  start: Point; // Place of the first character of the parsed source region.
  end: Point; // Place of the first character after the parsed source region.
  indent?: number[] | undefined; // Start column at each index (plus start line) in the source region, for elements that span multiple lines.
}
export interface Point {
  line: number; // Line in a source file (1-indexed integer).
  column: number; // Column in a source file (1-indexed integer).
  offset?: number | undefined; // Character in a source file (0-indexed integer).
}
export interface Literal extends Node {
  value: unknown;
}

Also checkout this guide

A unified tool for a specific language can use those types as a basis to define its AST. For example, this blog is using:

mdast Example

Using mdast, we can parse this markdown text:

# Test

In botany, a tree is a perennial plant with

- an elongated stem, or trunk,
- supporting branches
- leaves

...into

mdast json

not a browser

These nodes are an extension of the unist standard. I omitted the position info to make it shorter. Open this tree in AST Explorer.

mdast visualization

Of course, we can also visualize this AST with a tree:

color type
root
heading
text
paragraph
list
listItem

md to mdast in code

To turn markdown into an AST, we can use remark's parse function:

import { remark } from 'remark';

const ast = await remark().parse(
  '# Test\n\nIn botany, a tree is a perennial plant with\n\n- an elongated stem, or trunk,\n- supporting branches\n- leaves\n'
);
console.log(ast);

Internally parse uses mdast-util-from-markdown to convert the string to an AST.

In practice, you will most likely parse markdown, transform the AST using a plugin and then outputting markdown again. This is what the process function does:

import remarkToc from 'remark-toc';

const file = await remark()
  .use(remarkToc)
  .process('# Hi\n\n## Table of contents\n\n## Hello\n\n*Some* ~more~ _things_.');
console.log(String(file));

This will find all headings and generate a table of contents below "Table of contents", creating this diff for the above example:

1
# Hi
1
# Hi
2
2
3
## Table of contents
3
## Table of contents
4
4
5
+
*   [Hello](#hello)
6
+
5
## Hello
7
## Hello
6
8
7
*Some* ~more~ _things_.
9
*Some* ~more~ _things_.

Writing plugins

Above remark-toc is a plugin that transforms markdown AST. To write our own plugin, we take the AST as input and output back the transformed AST.

Now having obtained an AST, we can use it to our behalf. For example, we could transform all unordered lists to ordered ones like this:

import { remark } from 'remark';
import { visit } from 'unist-util-visit';

export const bringOrder = () => (tree) => {
  visit(tree, 'list', (node: any) => {
    node.ordered = true;
  });
};
const file = await remark()
  .use(bringOrder)
  .process(
    '# Test\n\nIn botany, a tree is a perennial plant with\n\n- an elongated stem, or trunk,\n- supporting branches\n- leaves'
  );
console.log(String(file));

which produces the following diff:

1
# Test
1
# Test
2
2
3
In botany, a tree is a perennial plant with
3
In botany, a tree is a perennial plant with
4
4
5
-
- an elongated stem, or trunk,
5
+
1. an elongated stem, or trunk,
6
-
- supporting branches
6
+
2. supporting branches
7
-
- leaves
7
+
3. leaves

Also check out this egghead.io course about unified.

Outlook

In the future, I could see using unified style ASTs to represent rhythical objects. It would simplify the implementation of plugins a lot. In the end, the rhythmical renderer would talk the AST and output events.

Also having a unified AST format allows implementing a "mini notation" รก la tidal, that enables writing rhythmical in a shorter string notation.

Felix Roos 2023