Before diving into Langium itself, let’s get your environment ready for development:
- Make sure you have a working Node.js environment, version 12 or higher.
- Install Yeoman and the Langium extension generator.
npm i -g yo generator-langium
For this getting-started example, we also recommend installing the latest version of VS Code.
To create your first working DSL, execute the Yeoman generator:
yo langium
Yeoman will prompt you with a few basic questions about your DSL:
- Extension name: Will be used as the folder name of your extension and its package.json name.
- Language name: Will be used as the name of the grammar and as a prefix for some generated files and service classes.
- File extensions: A comma-separated list of file extensions for your DSL.
Afterwards, it will generate a new project and start installing all dependencies, including the langium framework as well as the langium-cli command line tool, which is required for generating code based on your grammar definition.
After everything has finished running successfully, open your newly created Langium project in VS Code via the UI (File > Open Folder…) or by executing the following command, replacing hello-world with your chosen project name:
code hello-world
Press F5 or open the debug view and start the available debug configuration to launch the extension in a new Extension Development Host window. Open a folder and create a file with your chosen file extension (.hello is the default). The hello-world language accepts two kinds of entities: the person entity and the Hello entity. Here’s a quick example of how to use them both:
person Alice
Hello Alice!
person Bob
Hello Bob!
The file src/language-server/hello-world.langium in your newly created project contains your grammar.
If you’re already familiar with the terms used in parsing or DSL frameworks, you can skip this short excursion and go straight to the next part. However, anyone who is new to DSL development should carefully read the following primer on the terms we are using in our documentation:
abstract syntax tree: A tree of elements that represents a text document. Each element is a simple JS object that combines multiple input tokens into a single object. Commonly abbreviated as AST.
document: An abstract term to refer to a text file on your file system or an open editor document in your IDE.
grammar: Defines the form of your language. In Langium, a grammar is also responsible for describing how the AST is built.
parser: A program that takes a document as its input and computes an abstract syntax tree as its output.
parser rule: A parser rule describes how a certain AST element is supposed to be parsed. This is done by invoking other parser rules or terminals.
terminal: A terminal is the smallest parseable part of a document. It usually represents small pieces of text like names, numbers, keywords or comments.
token: A token is a substring of the document that matches a certain terminal. It contains information about which kind of terminal it represents as well as its location in the document.
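To make the terms terminal and token more concrete, here is a simplified, hand-written lexer for the hello-world example in TypeScript. It is a sketch under stated assumptions, not how Langium works internally: Langium builds its own lexer from the terminals and keywords in your grammar. The Token and Rule interfaces, the rule list, and the matchRule and tokenize functions are hypothetical names chosen for illustration. Each visible token records which terminal or keyword it matched and its offset in the document, while the hidden whitespace rule is consumed without producing a token.

```typescript
// A simplified, hand-written lexer for the hello-world language.
// Hypothetical illustration only: Langium builds its own lexer from the
// terminals and keywords in your grammar.

interface Token {
  kind: string;   // which terminal or keyword matched, e.g. "ID" or "'person'"
  image: string;  // the matched substring of the document
  offset: number; // where the token starts in the document
}

interface Rule {
  kind: string;
  regex: RegExp;   // anchored with ^ so it only matches at the current position
  hidden: boolean; // hidden rules are consumed but produce no token
}

// Keywords come before ID so that "person" is lexed as a keyword.
// The \b guard is a simplification so that e.g. "personal" is still an ID.
const rules: Rule[] = [
  { kind: "WS", regex: /^\s+/, hidden: true },
  { kind: "'person'", regex: /^person\b/, hidden: false },
  { kind: "'Hello'", regex: /^Hello\b/, hidden: false },
  { kind: "'!'", regex: /^!/, hidden: false },
  { kind: "ID", regex: /^[_a-zA-Z][\w_]*/, hidden: false },
];

// Returns the first rule that matches at the start of `rest`, if any.
function matchRule(rest: string): { rule: Rule; image: string } | null {
  for (const rule of rules) {
    const m = rule.regex.exec(rest);
    if (m !== null) {
      return { rule, image: m[0] };
    }
  }
  return null;
}

function tokenize(text: string): Token[] {
  const tokens: Token[] = [];
  let offset = 0;
  while (offset < text.length) {
    const match = matchRule(text.slice(offset));
    if (match === null) {
      throw new Error(`Unexpected character '${text[offset]}' at offset ${offset}`);
    }
    if (!match.rule.hidden) {
      tokens.push({ kind: match.rule.kind, image: match.image, offset });
    }
    offset += match.image.length;
  }
  return tokens;
}

const tokens = tokenize("person Alice Hello Alice!");
console.log(tokens.map(t => `${t.kind}:${t.image}@${t.offset}`).join(" "));
// prints: 'person':person@0 ID:Alice@7 'Hello':Hello@13 ID:Alice@19 '!':!@24
```

Trying keywords before the general ID terminal is the design choice that lets "person" act as a keyword while names such as "Alice" still fall through to ID.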
Here’s the grammar that parses the previous text snippet:
grammar HelloWorld

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;

entry Model: (persons+=Person | greetings+=Greeting)*;

Person:
    'person' name=ID;

Greeting:
    'Hello' person=[Person] '!';
Let’s go through this one by one:
Before we tell Langium anything about our grammar contents, we first need to give it a name; in this case, it’s HelloWorld. The langium-cli will pick this up to prefix any generated services with this name.
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
Here we define the two terminals this grammar needs: the whitespace terminal WS and the identifier terminal ID. Terminals parse a part of our document by matching it against their regular expression. The WS terminal parses any whitespace characters with the regex /\s+/. This allows us to consume whitespace in our document. As the terminal is declared as hidden, the parser will parse any whitespace and discard the results. That way, we don’t have to care about how much whitespace a user puts in their document. Secondly, we define our ID terminal. It parses any string that starts with an underscore or a letter and continues with any number of characters that match the \w regex token (letters, digits and underscores). It will match _al1c3 but not #alice. Langium uses the JavaScript regex dialect for terminal definitions.
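Because terminals use the JavaScript regex dialect, you can check their behavior directly in TypeScript. The sketch below copies the two terminal regexes from the grammar; the ^...$ anchors and the isId helper are assumptions added only for this demo, so that a test covers the whole string. They are not part of the grammar.

```typescript
// The two terminal regexes from the hello-world grammar. The ^...$ anchors are
// added only for this demo, so that test() checks the whole string; they are
// not part of the grammar's terminal definitions.
const WS = /^\s+$/;
const ID = /^[_a-zA-Z][\w_]*$/;

// Hypothetical helper: true if the entire input would be a single ID token.
function isId(text: string): boolean {
  return ID.test(text);
}

console.log(isId("_al1c3")); // true: starts with '_', then only \w characters
console.log(isId("#alice")); // false: '#' is neither a letter nor '_'
console.log(isId("1alice")); // false: an ID may not start with a digit
console.log(WS.test(" \t\n")); // true: any mix of whitespace characters
```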
entry Model: (persons+=Person | greetings+=Greeting)*;
The Model parser rule is the entry point to our grammar. Parsing always starts with the entry rule. Here we define a repeating group of alternatives: persons+=Person | greetings+=Greeting. This will always try to parse either a Person or a Greeting and add it to the respective list of persons or greetings in the Model object. Since the alternatives are wrapped in a repeating group (*), the parser will continue until all input has been consumed.
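To illustrate what these rules ask of a parser, here is a hand-written recursive-descent sketch in TypeScript. It is not Langium's parser, which is generated from your grammar; the Person, Greeting and Model interfaces and the KEYWORDS, tokenize and parseModel names are hypothetical and only approximate the AST shapes Langium would produce.

```typescript
// Hypothetical AST shapes mirroring the grammar. Langium generates its own
// AST interfaces from the grammar, which differ in detail.
interface Person { name: string }
interface Greeting { person: string } // the referenced name, not yet resolved
interface Model { persons: Person[]; greetings: Greeting[] }

const KEYWORDS = ["person", "Hello", "!"];

// Minimal tokenizer: whitespace is hidden (skipped), IDs, keywords and '!'
// become tokens, and any other character is rejected.
function tokenize(text: string): string[] {
  const tokens: string[] = [];
  const re = /(\s+)|([_a-zA-Z][\w_]*|!)|([\s\S])/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[3] !== undefined) throw new Error(`Unexpected character '${m[3]}'`);
    if (m[2] !== undefined) tokens.push(m[2]);
  }
  return tokens;
}

function parseModel(text: string): Model {
  const tokens = tokenize(text);
  let pos = 0;
  const model: Model = { persons: [], greetings: [] };

  // Consume a specific keyword or fail.
  function expect(keyword: string): void {
    if (tokens[pos] !== keyword) {
      throw new Error(`Expected '${keyword}' but found '${tokens[pos]}'`);
    }
    pos++;
  }

  // Consume an ID token; keywords cannot be used as names.
  function expectId(): string {
    const tok: string | undefined = tokens[pos];
    if (tok === undefined || KEYWORDS.indexOf(tok) >= 0) {
      throw new Error(`Expected an ID but found '${tok}'`);
    }
    pos++;
    return tok;
  }

  // entry Model: (persons+=Person | greetings+=Greeting)*;
  // The * repetition keeps going until all tokens have been consumed.
  while (pos < tokens.length) {
    if (tokens[pos] === "person") {
      // Person: 'person' name=ID;
      expect("person");
      model.persons.push({ name: expectId() });
    } else if (tokens[pos] === "Hello") {
      // Greeting: 'Hello' person=[Person] '!';
      expect("Hello");
      const name = expectId();
      expect("!");
      model.greetings.push({ person: name });
    } else {
      throw new Error(`Expected 'person' or 'Hello' but found '${tokens[pos]}'`);
    }
  }
  return model;
}

const parsed = parseModel("person Alice Hello Alice! person Bob Hello Bob!");
console.log(parsed.persons.length, parsed.greetings.length); // 2 2
```

The while loop plays the role of the * repetition, and the leading keyword decides which alternative applies. In this sketch, Greeting.person only stores the referenced name; turning it into an actual Person object is the job of cross-reference resolution, which happens after parsing.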
Person: 'person' name=ID;
The Person rule starts off with the 'person' keyword. Keywords are like terminals in the sense that they parse a part of the document. Together, the keywords and terminals make up the tokens that your language is able to parse. You can think of the 'person' keyword here as an indicator that tells the parser that an object of type Person should be parsed. After the keyword, we assign the Person a name by parsing an ID.
Greeting: 'Hello' person=[Person] '!';
Like the previous rule, the Greeting rule starts with a keyword. With the person assignment, we introduce a cross reference, indicated by the brackets []. A cross reference allows your grammar to reference other elements that are contained in your file or workspace. By default, Langium will try to resolve this cross reference by parsing the terminal that is associated with its name property. In this case, we are looking for a Person whose name property matches the parsed ID.
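Name-based resolution can be sketched in a few lines of TypeScript: collect the Person elements by their name property, then look up the ID text parsed for each Greeting. This is a hypothetical simplification; Langium handles linking through its own services and can also resolve elements from other files in the workspace. The Reference shape, its $refText and ref fields as used here, and the linkModel function are illustrative assumptions.

```typescript
// Hypothetical, simplified AST with a name-based cross reference.
interface Person { name: string }
interface Reference<T> {
  $refText: string; // the ID text that was parsed, e.g. "Alice"
  ref?: T;          // the resolved element, filled in by linkModel
}
interface Greeting { person: Reference<Person> }
interface Model { persons: Person[]; greetings: Greeting[] }

// Resolves each Greeting's reference by matching $refText against the
// name property of the Persons in the model. Returns a message for every
// reference that could not be resolved.
function linkModel(model: Model): string[] {
  const byName = new Map<string, Person>();
  for (const p of model.persons) {
    byName.set(p.name, p);
  }
  const errors: string[] = [];
  for (const g of model.greetings) {
    g.person.ref = byName.get(g.person.$refText);
    if (g.person.ref === undefined) {
      errors.push(`No Person named '${g.person.$refText}' found.`);
    }
  }
  return errors;
}

const model: Model = {
  persons: [{ name: "Alice" }, { name: "Bob" }],
  greetings: [{ person: { $refText: "Alice" } }, { person: { $refText: "Carol" } }],
};
const errors = linkModel(model);
console.log(model.greetings[0].person.ref === model.persons[0]); // true
console.log(errors.length); // 1, for the unknown name 'Carol'
```

A Map keyed by name is the simplest possible scope: if two Persons shared a name, this sketch would silently pick the last one, which a more complete implementation would likely want to report.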
That finishes the short introduction to Langium! Feel free to play around with the grammar and use npm run langium:generate to regenerate the TypeScript files derived from it. To go further, we suggest that you continue with our tutorials.