4. Generate the AST

After defining the grammar, you can generate the abstract syntax tree (AST) definition of your language. The AST is a tree representation of the source code that can be used to analyze and transform the code. Its definition is generated by the Langium CLI. Run the following command in your terminal:

npm run langium:generate

This line will call langium generate on your Langium project. The Langium CLI will generate the files in the src/generated directory. It will create the following files (depending on your given Langium configuration):

a grammar file, which contains your entire grammar definition in JSON format.
a module file, which contains language-specific setup objects for the final module definition of your language.
an ast file, which contains the definition of your AST.
several syntax highlighting files, for example for PrismJS, TextMate, or Monarch.

The syntax tree

With the generated AST definition in place, documents of your language can now be parsed into an AST. One important concept in Langium is the cross-reference. With cross-references you can refer to other elements in your language. For example, a function call can reference a variable, and the AST will then contain a reference to that variable. This is useful for code analysis and transformation. Parser-only generators such as ANTLR do not support this feature. With them, you have to resolve such references by hand every time you encounter one.

After these generation steps, cross-references are not resolved yet. This is done in the next step.
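To make the idea of a reference inside the AST more concrete, here is a hand-written, simplified sketch of the kind of interfaces one could expect for the Hello-World language. `AstNodeSketch` and `ReferenceSketch` are stand-ins invented for this illustration. The real generated ast file takes its base types from Langium and contains more members than shown here.

```typescript
// Simplified stand-ins for Langium's base types (hypothetical, for illustration only).
interface AstNodeSketch {
    readonly $type: string;
}

interface ReferenceSketch<T> {
    // The text written in the source file, e.g. "John".
    readonly $refText: string;
    // The target node; undefined as long as the reference is unresolved.
    ref?: T;
}

interface Person extends AstNodeSketch {
    readonly $type: 'Person';
    name: string;
}

interface Greeting extends AstNodeSketch {
    readonly $type: 'Greeting';
    person: ReferenceSketch<Person>;
}

interface Model extends AstNodeSketch {
    readonly $type: 'Model';
    persons: Person[];
    greetings: Greeting[];
}

// A hand-built value of that shape for the input "person John" / "Hello John!".
const john: Person = { $type: 'Person', name: 'John' };
const sampleModel: Model = {
    $type: 'Model',
    persons: [john],
    greetings: [{ $type: 'Greeting', person: { $refText: 'John' } }]
};
```

In this sketch the greeting only knows the text `John`. Its `ref` property stays empty until some resolution step connects it to the `Person` node.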

Example

Imagine you are using the Hello-World example from the Yeoman generator. For an input file like this you will get the following syntax tree from Langium during runtime:

person John
person Jane

Hello John!
Hello Jane!

graph TB
  Model-->persons
  Model-->greetings
  
  persons-->P1[Person]
  P1 --> H1('person')
  P1 --> N1[name]
  N1 --> NL1('John')
  
  persons-->P2[Person]
  P2 --> H2('person')
  P2 --> N2[name]
  N2 --> NL2('Jane')

  greetings-->G1[Greeting]
  G1 --> KW1('hello')
  G1 --> PRef1[Ref]
  G1 --> EM1('!')
  PRef1 --> QM1{?}

  greetings-->G2[Greeting]
  G2 --> KW2('hello')
  G2 --> PRef2[Ref]
  G2 --> EM2('!')
  PRef2 --> QM2{?}

Mind the gaps (question marks) for the cross-references inside the greetings: at this point they do not yet point to a Person node. Filling these gaps, that is, resolving the cross-references, is a job that has to be done by the developer. Fortunately, Langium provides a default implementation for cross-reference resolution. You can also implement your own resolution strategy.
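The following sketch illustrates what filling those gaps means in the simplest possible case: a lookup by name. It is not Langium's default implementation. All types and the `linkByName` function are hypothetical and only mirror the shape of the tree shown above.

```typescript
// Hypothetical, simplified shapes; not Langium's actual types.
interface PersonNode { name: string }
interface PersonRef { $refText: string; ref?: PersonNode }
interface GreetingNode { person: PersonRef }
interface ModelNode { persons: PersonNode[]; greetings: GreetingNode[] }

// Fill each reference by looking up its text among the declared persons.
// Returns the reference texts that could not be resolved.
function linkByName(model: ModelNode): string[] {
    const byName = new Map<string, PersonNode>();
    for (const person of model.persons) {
        // Keep the first declaration if a name occurs twice.
        if (!byName.has(person.name)) {
            byName.set(person.name, person);
        }
    }
    const unresolved: string[] = [];
    for (const greeting of model.greetings) {
        const target = byName.get(greeting.person.$refText);
        if (target) {
            greeting.person.ref = target;
        } else {
            unresolved.push(greeting.person.$refText);
        }
    }
    return unresolved;
}

const model: ModelNode = {
    persons: [{ name: 'John' }, { name: 'Jane' }],
    greetings: [
        { person: { $refText: 'John' } },
        { person: { $refText: 'Jane' } },
        { person: { $refText: 'Jim' } } // no such person declared
    ]
};
const missing = linkByName(model);
```

A real resolution strategy typically also has to deal with scopes, several files, and error reporting, which this flat name lookup ignores.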

How to test the parser?

You can test the parser by comparing the generated AST with the expected AST. Here is an example:

import { createHelloWorldServices } from "./your-project/hello-world-module.js";
import { EmptyFileSystem } from "langium";
import { parseHelper } from "langium/test";
import type { Model } from "../../src/language/generated/ast.js";
// any assertion library works; the Hello-World example uses Vitest
import { expect } from "vitest";

//arrange
const services = createHelloWorldServices(EmptyFileSystem);
const parse = parseHelper<Model>(services.HelloWorld);

//act
const document = await parse(`
    person John
    person Jane
    
    Hello John!
    Hello Jane!
`);

//assert
const model = document.parseResult.value;
expect(model.persons).toHaveLength(2);
expect(model.persons[0].name).toBe("John");
expect(model.persons[1].name).toBe("Jane");
expect(model.greetings).toHaveLength(2);
// Be aware that the following checks will fail at this point, because the cross-references are not resolved yet.
expect(model.greetings[0].person.ref?.name).toBe("John");
expect(model.greetings[1].person.ref?.name).toBe("Jane");

The expect function can come from any assertion library you like. The Hello-World example uses Vitest.
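Instead of asserting field by field, one option is to project the parsed model onto plain data and compare it with the expected data in a single assertion. The sketch below assumes minimal stand-in types and a hypothetical `summarize` helper. It reads only the reference text, so it does not depend on the cross-references being resolved.

```typescript
// Hypothetical minimal shapes standing in for the generated AST types.
interface ParsedPerson { name: string }
interface ParsedGreeting { person: { $refText: string } }
interface ParsedModel { persons: ParsedPerson[]; greetings: ParsedGreeting[] }

// Project the model onto plain data that is easy to compare in one assertion.
function summarize(model: ParsedModel): { persons: string[]; greeted: string[] } {
    return {
        persons: model.persons.map(p => p.name),
        greeted: model.greetings.map(g => g.person.$refText)
    };
}

// Hand-built input mirroring the Hello-World document used above.
const summary = summarize({
    persons: [{ name: 'John' }, { name: 'Jane' }],
    greetings: [
        { person: { $refText: 'John' } },
        { person: { $refText: 'Jane' } }
    ]
});
```

With Vitest, such a summary could then be checked with a deep-equality assertion like `expect(summary).toEqual(...)`.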