Document Information

Last modified:
2007/05/31 19:10 by fjakobs

Tokenizer

The tokenizer converts JavaScript source code into an array of tokens. Each token is a map of key-value pairs, similar to a JSON object.

As an example, this tiny bit of source code

if (true) {
  a = 3
}

gets converted into the following list of tokens:

{'column': 1, 'detail': 'IF', 'source': u'if', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'}
{'column': 4, 'detail': 'LP', 'source': u'(', 'line': 1, 'type': 'token', 'id': 'x_mini.js'}
{'column': 5, 'detail': 'TRUE', 'source': u'true', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'}
{'column': 9, 'detail': 'RP', 'source': u')', 'line': 1, 'type': 'token', 'id': 'x_mini.js'}
{'column': 11, 'detail': 'LC', 'source': u'{', 'line': 1, 'type': 'token', 'id': 'x_mini.js'}
{'column': 12, 'detail': '', 'source': '', 'line': 1, 'type': 'eol', 'id': 'x_mini.js'}
{'column': 3, 'detail': 'public', 'source': u'a', 'line': 2, 'type': 'name', 'id': 'x_mini.js'}
{'column': 5, 'detail': 'ASSIGN', 'source': u'=', 'line': 2, 'type': 'token', 'id': 'x_mini.js'}
{'column': 7, 'detail': 'int', 'source': u'3', 'line': 2, 'type': 'number', 'id': 'x_mini.js'}
{'column': 8, 'detail': '', 'source': '', 'line': 2, 'type': 'eol', 'id': 'x_mini.js'}
{'column': 3, 'detail': 'RC', 'source': u'}', 'line': 3, 'type': 'token', 'id': 'x_mini.js'}
{'column': 4, 'detail': '', 'source': '', 'line': 3, 'type': 'eol', 'id': 'x_mini.js'}
{'column': 1, 'detail': '', 'source': '', 'line': 4, 'type': 'eof', 'id': 'x_mini.js'}

As you can see, each token is captured as a key-value map and represents a significant piece of the input file. The 'source' attribute of each map holds the literal string found in the input that is associated with that token. Other attributes record the position of the token in the source file ('line', 'column'), its general category ('type'), and its specific token class ('detail'). The file name of the input file is used as an 'id' shared among all tokens. To get a list of all recognized token classes, such as 'RP' (right paren) and 'LC' (left curly), see the module config.py.
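To make the attribute layout concrete, here is a minimal sketch in Python that works on a few token maps copied from the first line of the listing (with the u prefix on strings dropped). The helper names describe and find_by_detail are made up for this illustration and are not part of tokenizer.py.

```python
# Token maps copied from line 1 of the example listing.
tokens = [
    {'column': 1, 'detail': 'IF', 'source': 'if', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'},
    {'column': 4, 'detail': 'LP', 'source': '(', 'line': 1, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 5, 'detail': 'TRUE', 'source': 'true', 'line': 1, 'type': 'reserved', 'id': 'x_mini.js'},
    {'column': 9, 'detail': 'RP', 'source': ')', 'line': 1, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 11, 'detail': 'LC', 'source': '{', 'line': 1, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 12, 'detail': '', 'source': '', 'line': 1, 'type': 'eol', 'id': 'x_mini.js'},
]


def describe(token):
    """Hypothetical helper: format one token as 'line:column type/detail source'."""
    return "%d:%d %s/%s %r" % (
        token['line'], token['column'],
        token['type'], token['detail'], token['source'],
    )


def find_by_detail(token_list, detail):
    """Hypothetical helper: return all tokens whose token class equals `detail`."""
    return [t for t in token_list if t['detail'] == detail]


if __name__ == '__main__':
    for t in tokens:
        print(describe(t))
    # Look up the position of the left parenthesis via its token class.
    left_paren = find_by_detail(tokens, 'LP')[0]
    print(left_paren['line'], left_paren['column'])
```

The position comes from 'line' and 'column', the literal text from 'source', and the classification from 'type' together with 'detail'.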

Used from the Command Line

The tokenizer module, tokenizer.py, can be invoked from the command line. It takes an input file as its argument and writes the list of tokens to STDOUT. For more help on command-line options, run:

tokenizer.py --help

Used from Code

You can also use tokenizer.py programmatically:

import tokenizer

# Tokenize JavaScript source code held in a string
tokenArray = tokenizer.parseStream(string)
# or tokenize the contents of a file, given its name
tokenArray = tokenizer.parseFile(filename)

This allows you to capture the list of tokens produced from a piece of source code for further processing. Within qooxdoo, this is exactly what happens in the treegenerator.
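As a rough illustration of such further processing, the sketch below filters and counts tokens. It assumes only the token layout shown in the example listing. In real use the list would come from tokenizer.parseStream or tokenizer.parseFile; here a hand-written list (line 2 of the listing) stands in so the sketch runs by itself. The names STRUCTURAL_TYPES, significant and count_types are hypothetical and not part of the tokenizer module.

```python
from collections import Counter

# Stand-in for the result of tokenizer.parseStream(...) / parseFile(...):
# the tokens of line 2 from the example listing.
token_array = [
    {'column': 3, 'detail': 'public', 'source': 'a', 'line': 2, 'type': 'name', 'id': 'x_mini.js'},
    {'column': 5, 'detail': 'ASSIGN', 'source': '=', 'line': 2, 'type': 'token', 'id': 'x_mini.js'},
    {'column': 7, 'detail': 'int', 'source': '3', 'line': 2, 'type': 'number', 'id': 'x_mini.js'},
    {'column': 8, 'detail': '', 'source': '', 'line': 2, 'type': 'eol', 'id': 'x_mini.js'},
]

# Token types that only mark line and file boundaries (assumption based on
# the 'eol' and 'eof' entries in the example listing).
STRUCTURAL_TYPES = ('eol', 'eof')


def significant(tokens):
    """Return the tokens that carry source text, dropping eol/eof markers."""
    return [t for t in tokens if t['type'] not in STRUCTURAL_TYPES]


def count_types(tokens):
    """Count how often each value of the 'type' attribute occurs."""
    return Counter(t['type'] for t in tokens)


if __name__ == '__main__':
    print([t['source'] for t in significant(token_array)])
    print(count_types(token_array))
```

A consumer such as a tree builder would typically walk the array in order like this, deciding what to do with each token based on its 'type' and 'detail'.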
