Treegenerator
The treegenerator, implemented in the module treegenerator.py located in the tool/modules subdirectory, has the task of producing a syntax tree (often referred to as “abstract syntax tree” or AST) from JavaScript source code. It can be invoked from the command line like this:
treegenerator.py <file>.js
and will print out an XML representation of the syntax tree to STDOUT.
The treegenerator parses the tokens delivered by the tokenizer (tokenizer.py) and builds up a parse tree. The nodes of this tree are defined in the module tree.py and are instances of the class “Node”. The Node class defines methods to add children to and remove children from a node, query and set node attributes, maintain the relation to the parent node, and more.
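The following is a minimal sketch of what such a node class might look like. It is not the actual code from tree.py: the method names (addChild, removeChild, set, get) and the internal fields are assumptions chosen for illustration only.

```python
class Node:
    """Hypothetical, simplified stand-in for the Node class in tree.py."""

    def __init__(self, type_):
        self.type = type_
        self.parent = None      # link back to the parent node
        self.children = []
        self.attributes = {}

    def addChild(self, child):
        # Detach from a previous parent so the parent link stays consistent.
        if child.parent is not None:
            child.parent.removeChild(child)
        child.parent = self
        self.children.append(child)
        return child

    def removeChild(self, child):
        self.children.remove(child)
        child.parent = None

    def set(self, key, value):
        self.attributes[key] = value
        return self

    def get(self, key, default=None):
        return self.attributes.get(key, default)


# Build a tiny tree: an assignment node with one variable child.
assignment = Node("assignment")
left = assignment.addChild(Node("variable").set("name", "a"))
```

In this sketch the parent link is updated by addChild and removeChild themselves, so callers never have to set it by hand.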
In order to transform the tokens of the source file into a syntax tree, treegenerator.py implements a class “TokenStream” that serves as a wrapper and OO interface to the output of the tokenizer. Its most important method is next(), which implements an iterator over the stream of tokens and delivers the next token from the stream on each invocation. This method also takes care of comments as they are encountered, generating comment nodes for the tree.
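As a rough illustration of this wrapper idea, the sketch below iterates over a list of tokens and sets comments aside instead of returning them. The token layout (dictionaries with "type" and "source" keys) and the comments list are assumptions made for this example, not the real data structures of tokenizer.py or treegenerator.py.

```python
class TokenStream:
    """Hypothetical sketch: wraps a token list and filters out comments."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self.comments = []  # comment "nodes" collected along the way

    def __iter__(self):
        return self

    def __next__(self):
        for token in self._tokens:
            if token["type"] == "comment":
                # Record the comment as a node-like dict rather than returning it.
                self.comments.append({"type": "comment", "text": token["source"]})
                continue
            return token
        raise StopIteration

    next = __next__  # the text above refers to the method as next()


tokens = [
    {"type": "comment", "source": "// counter"},
    {"type": "name", "source": "a"},
    {"type": "token", "source": "="},
    {"type": "number", "source": "1"},
]
stream = TokenStream(tokens)
sources = [token["source"] for token in stream]
```

After the loop, sources holds only the non-comment tokens, while stream.comments holds one record for the intercepted comment.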
On the parser side, the main worker function createSyntaxTree calls the basic parse routine readStatement, which in turn calls itself and various other “read*” routines to read and parse nested statements and language constructs. Each of these read routines uses the next() method of the TokenStream singleton to retrieve new tokens and integrate them into the syntax tree.
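The recursive structure described here can be sketched as follows. Only the names createSyntaxTree, readStatement, TokenStream and next() come from the text; readBlock, readExpressionStatement, peek(), the toy grammar (blocks in braces, statements ending in a semicolon) and the tuple-based tree are assumptions for illustration. The sketch also passes the stream as an argument rather than using a singleton.

```python
class TokenStream:
    """Hypothetical minimal token stream over a list of strings."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def peek(self):
        # Look at the next token without consuming it; None at end of input.
        return self._tokens[self._pos] if self._pos < len(self._tokens) else None

    def next(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self._pos += 1
        return token


def readStatement(stream):
    # Dispatch on the upcoming token to a more specific read routine.
    if stream.peek() == "{":
        return readBlock(stream)
    return readExpressionStatement(stream)


def readBlock(stream):
    stream.next()  # consume "{"
    children = []
    while stream.peek() != "}":
        children.append(readStatement(stream))  # recursion for nested statements
    stream.next()  # consume "}"
    return ("block", children)


def readExpressionStatement(stream):
    parts = []
    while True:
        token = stream.next()
        if token == ";":
            return ("statement", parts)
        parts.append(token)


def createSyntaxTree(tokens):
    stream = TokenStream(tokens)
    statements = []
    while stream.peek() is not None:
        statements.append(readStatement(stream))
    return ("file", statements)


tree = createSyntaxTree(["a", "=", "1", ";", "{", "b", ";", "{", "c", ";", "}", "}"])
```

Here the nested braces in the input lead readBlock to call readStatement, which calls readBlock again, mirroring the way nested language constructs produce nested subtrees.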
As mentioned before, special attention is given to comments. The TokenStream.next() method intercepts comments, generates tree nodes for them, and either attaches them to the current node in the tree or accumulates them in an internal property, to be retrieved for later nodes. This decision is made on the basis of the comment's “connectedness”, a token attribute that hints at whether the comment is likely to belong to the preceding lexical construct (e.g. an assignment) or the following one (e.g. a function definition). These hints are based on rather rough guesses and do not always yield the expected result, so a comment might end up out of place.
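The attach-or-accumulate decision might look roughly like the sketch below. The attribute name "connection", its values "preceding" and "following", and the dictionary-based statements are all hypothetical; the real token attribute and node handling in treegenerator.py may differ.

```python
def attachComments(tokens):
    """Distribute comments to statements based on a hypothetical connection hint."""
    statements = []
    pending = []  # comments held back for the next statement

    for token in tokens:
        if token["type"] == "comment":
            if token.get("connection") == "preceding" and statements:
                # Hint: the comment belongs to the construct just before it.
                statements[-1]["comments"].append(token["source"])
            else:
                # Hint: the comment belongs to what follows, so accumulate it.
                pending.append(token["source"])
        else:
            statements.append({"source": token["source"], "comments": pending})
            pending = []  # start a fresh list for later comments
    return statements, pending


tokens = [
    {"type": "comment", "source": "/** Adds two numbers. */", "connection": "following"},
    {"type": "statement", "source": "function add(a, b) { return a + b; }"},
    {"type": "statement", "source": "var x = 1;"},
    {"type": "comment", "source": "// initial value", "connection": "preceding"},
]
statements, leftover = attachComments(tokens)
```

In this example the doc comment is held back and attached to the function definition that follows it, while the trailing line comment is attached to the assignment before it. A wrong hint would send a comment to the neighbouring statement instead, which is the kind of misplacement mentioned above.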
