3. Write the grammar

Your Langium project is now setup and ready to be used. The next step is to define the grammar of your language. The grammar is the most important part of your language definition. It defines the syntax of your language and how the language elements are structured.

The grammar is defined in a .langium file. Make sure that you have installed the VS Code extension for Langium. This extension provides syntax highlighting and code completion for .langium files. Here’s the grammar from the Hello-World example that was generated by the Yeoman generator:

grammar HelloWorld

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w]*/;

entry Model: (persons+=Person | greetings+=Greeting)*;

Person:
    'person' name=ID;

Greeting:
    'Hello' person=[Person] '!';

Let’s go through this one by one:

grammar HelloWorld

Before we tell Langium anything about our grammar contents, we first need to give it a name - in this case it’s HelloWorld. The langium-cli will pick this up to prefix any generated services with this name.

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w]*/;

Here we define our two needed terminals for this grammar: The whitespace WS and identifier ID terminals. Terminals parse a part of our document by matching it against their regular expression. The WS terminal parses any whitespace characters with the regex /\s+/. This allows us to consume whitespaces in our document. As the terminal is declared as hidden, the parser will parse any whitespace and discard the results. That way, we don’t have to care about how many whitespaces a user uses in their document. Secondly, we define our ID terminal. It parses any string that starts with an underscore or letter and continues with any amount of characters that match the \w regex token. It will match Alice, _alice, or _al1c3 but not 4lice or #alice. Langium is using the JS regex dialect for terminal definitions.

entry Model: (persons+=Person | greetings+=Greeting)*;

The Model parser rule is the entry point to our grammar. Parsing always starts with the entry rule. Here we define a repeating group of alternatives: persons+=Person | greetings+=Greeting. This will always try to parse either a Person or a Greeting and add it to the respective list of persons or greetings in the Model object. Since the alternative is wrapped in a repeating group *, the parser will continue until all input has been consumed.

Person: 'person' name=ID;

The Person rule starts off with the 'person' keyword. Keywords are like terminals, in the sense that they parse a part of the document. The set of keywords and terminals create the tokens that your language is able to parse. You can imagine that the 'person' keyword here is like an indicator to tell the parser that an object of type Person should be parsed. After the keyword, we assign the Person a name by parsing an ID.

Greeting: 'Hello' person=[Person] '!';

Like the previous rule, the Greeting starts with a keyword. With the person assignment we introduce the cross reference, indicated by the brackets []. A cross reference will allow your grammar to reference other elements that are contained in your file or workspace. By default, Langium will try to resolve this cross reference by parsing the terminal that is associated with its name property. In this case, we are looking for a Person whose name property matches the parsed ID.

That finishes the short introduction to Langium! Feel free to play around with the grammar and use npm run langium:generate to regenerate the generated TypeScript files. To go further, we suggest that you continue with our tutorials.