Tokenizer
The tokenizer converts JavaScript source code into an array of tokens. Each token is a JSON-style map of key-value pairs.
As an example, this small piece of source code:
if (true) {
  a = 3
  }
gets converted into the following list of tokens:
{'column': 1, 'detail': 'IF', 'source': u'if', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'}
{'column': 4, 'detail': 'LP', 'source': u'(', 'line': 1, 'type': 'token', 'id': 'x_mini.js'}
{'column': 5, 'detail': 'TRUE', 'source': u'true', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'}
{'column': 9, 'detail': 'RP', 'source': u')', 'line': 1, 'type': 'token', 'id': 'x_mini.js'}
{'column': 11, 'detail': 'LC', 'source': u'{', 'line': 1, 'type': 'token', 'id': 'x_mini.js'}
{'column': 12, 'detail': '', 'source': '', 'line': 1, 'type': 'eol', 'id': 'x_mini.js'}
{'column': 3, 'detail': 'public', 'source': u'a', 'line': 2, 'type': 'name', 'id': 'x_mini.js'}
{'column': 5, 'detail': 'ASSIGN', 'source': u'=', 'line': 2, 'type': 'token', 'id': 'x_mini.js'}
{'column': 7, 'detail': 'int', 'source': u'3', 'line': 2, 'type': 'number', 'id': 'x_mini.js'}
{'column': 8, 'detail': '', 'source': '', 'line': 2, 'type': 'eol', 'id': 'x_mini.js'}
{'column': 3, 'detail': 'RC', 'source': u'}', 'line': 3, 'type': 'token', 'id': 'x_mini.js'}
{'column': 4, 'detail': '', 'source': '', 'line': 3, 'type': 'eol', 'id': 'x_mini.js'}
{'column': 1, 'detail': '', 'source': '', 'line': 4, 'type': 'eof', 'id': 'x_mini.js'}
As you can see, each token is captured as a map of key-value pairs and represents a significant piece of the input file. The 'source' attribute of each map holds the literal string from the input that is associated with that token. Other attributes record the position of the token in the source file ('line', 'column'), its type ('type'), and its token class ('detail'). The file name of the input file is used as an 'id' shared among all tokens. To get a list of all recognized token classes, such as 'RP' (right paren) and 'LC' (left curly), see the module config.py.
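Because each token is an ordinary map, it can be inspected with plain dictionary operations. The following is a minimal sketch, not part of the tokenizer itself: it copies the token list from the listing above into a variable and assumes nothing beyond the keys shown there. The names `tokens`, `type_counts`, `significant` and `sources` are chosen for illustration only.

```python
from collections import Counter

# Token maps copied from the listing above.
tokens = [
    {'column': 1, 'detail': 'IF', 'source': 'if', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'},
    {'column': 4, 'detail': 'LP', 'source': '(', 'line': 1, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 5, 'detail': 'TRUE', 'source': 'true', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'},
    {'column': 9, 'detail': 'RP', 'source': ')', 'line': 1, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 11, 'detail': 'LC', 'source': '{', 'line': 1, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 12, 'detail': '', 'source': '', 'line': 1, 'type': 'eol', 'id': 'x_mini.js'},
    {'column': 3, 'detail': 'public', 'source': 'a', 'line': 2, 'type': 'name', 'id': 'x_mini.js'},
    {'column': 5, 'detail': 'ASSIGN', 'source': '=', 'line': 2, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 7, 'detail': 'int', 'source': '3', 'line': 2, 'type': 'number', 'id': 'x_mini.js'},
    {'column': 8, 'detail': '', 'source': '', 'line': 2, 'type': 'eol', 'id': 'x_mini.js'},
    {'column': 3, 'detail': 'RC', 'source': '}', 'line': 3, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 4, 'detail': '', 'source': '', 'line': 3, 'type': 'eol', 'id': 'x_mini.js'},
    {'column': 1, 'detail': '', 'source': '', 'line': 4, 'type': 'eof', 'id': 'x_mini.js'},
]

# Count how many tokens fall into each 'type'.
type_counts = Counter(tok['type'] for tok in tokens)

# Drop the end-of-line and end-of-file markers, which carry no source text,
# and collect the literal strings of the remaining tokens.
significant = [tok for tok in tokens if tok['type'] not in ('eol', 'eof')]
sources = [tok['source'] for tok in significant]
```

Here `sources` holds the literal text of every significant token in input order, while the 'eol' and 'eof' entries only mark line and file boundaries.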
Used from the Command Line
The tokenizer module, tokenizer.py, can be invoked on the command line. It takes an input file as its argument and writes the list of tokens to STDOUT. More help on command line options can be obtained by issuing:
tokenizer.py --help
Used from Code
You can also use tokenizer.py programmatically:
import tokenizer
tokenArray = tokenizer.parseStream(string)
# or
tokenArray = tokenizer.parseFile(filename)
This allows you to capture the list of tokens produced from a piece of source code for further processing. Within qooxdoo, this is exactly what happens in the treegenerator.
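As an illustration of such further processing, the sketch below reports where tokens of a given type occur. It assumes that the returned token array is a list of maps shaped like the ones in the listing above. The helper `locate` is hypothetical and not part of tokenizer.py, and `token_array` is a hand-written stand-in so that the example runs without the tokenizer module.

```python
def locate(token_array, token_type):
    """Return 'id:line:column source' strings for all tokens of one type.

    Hypothetical helper for illustration; not part of tokenizer.py.
    """
    return ["%s:%d:%d %s" % (t['id'], t['line'], t['column'], t['source'])
            for t in token_array if t['type'] == token_type]

# Stand-in for a tokenizer result; the shape is assumed from the token listing.
token_array = [
    {'column': 3, 'detail': 'public', 'source': 'a', 'line': 2, 'type': 'name', 'id': 'x_mini.js'},
    {'column': 5, 'detail': 'ASSIGN', 'source': '=', 'line': 2, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 7, 'detail': 'int', 'source': '3', 'line': 2, 'type': 'number', 'id': 'x_mini.js'},
]

# Find every identifier together with its position in the source file.
names = locate(token_array, 'name')
```

In real use, `token_array` would be replaced by the value returned from `tokenizer.parseStream` or `tokenizer.parseFile`.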
