About tokens
============

.. warning::

    The module ``token_utils`` has been made into its own library.
    It should be installed automatically when installing **ideas**.
    You can get it on its own using ``python -m pip install token-utils``.

While **ideas** aims to provide support for all kinds of transformations,
including those that affect the Abstract Syntax Tree or the bytecode,
most transformations deal with exploring alternative syntax that is not
compatible with Python's current syntax defined by its
`grammar <https://docs.python.org/3/reference/grammar.html>`_.
Such alternative syntax cannot be parsed by Python without generating a
``SyntaxError``, thus preventing the execution of the code.

For this reason, almost all of our examples transform the code prior to
letting Python parse it. We do this using a set of tools built upon Python's
`tokenize module <https://docs.python.org/3/library/tokenize.html>`_.

.. sidebar:: An excellent tutorial

    `Brown Water Python <https://www.asmeurer.com/brown-water-python/>`_,
    written by Aaron Meurer, is an excellent and very comprehensive tutorial
    about Python's tokenize module.

In our description of these tools below, we assume that you are somewhat
familiar with the concept of token objects generated by Python's tokenize
module. If you are not familiar with those, we suggest that you read at least
once through the documentation about Python's tokenize module mentioned above.

The main points to understand:

- Using the ``tokenize`` function, a source can be broken down into tokens
  which, as generated by Python, are 5-tuples carrying information about their
  **type**, their **string** content, their position in the source (identified
  by starting and ending **row**, aka line number, and **column**), as well as
  the content of the line where they are found.

- From a list of tokens, the original source can essentially be recreated by
  using the ``untokenize`` function. However, as stated in the documentation:

  *The result is guaranteed to tokenize back to match the input so that the
  conversion is lossless and round-trips are assured. The guarantee applies
  only to the token type and token string as the spacing between tokens
  (column positions) may change.*

- To ``untokenize`` using the function from the Python standard library, one
  can use either a list of 5-tuple tokens, or a list of two-tuple tokens that
  include only the **type** and **string** information.

.. sidebar:: Perfect round-trip

    Unlike Python's version, the process of tokenizing and untokenizing a
    source using ideas' own ``tokenize`` and ``untokenize`` functions is
    guaranteed to yield back an exact copy of the original source, with all
    the spacing information intact.

Experience has shown that being able to recover the original source with
spacing included is **extremely** useful when writing tests about the
expected results for some source transformation.

About Ideas' tokens
-------------------

Recently (Feb. 21, 2020), on the Python-ideas mailing list,
`Andrew Barnert wrote `_:

    *Unfortunately, the boilerplate to write an import hook is more complicated
    than you’d like (and pretty hard to figure out the first time),* **and the
    support for filtering on the token stream (the most obvious way to do this
    one)** *rather than the text stream, AST, or bytecode* **is pretty minimal
    and clumsy.** [emphasis added]

**Ideas** uses ``token-utils``, which defines its own ``Token`` class built
from Python's tokens. While they carry the same information, they are much
easier to use and manipulate. The full API of the ``token_utils`` module is
shown below.
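Before turning to that API, it may help to see the standard-library behaviour
described in the list above in action. The following is a minimal sketch using
only Python's own ``tokenize`` module; the sample source and variable names
are chosen purely for illustration.

.. code-block:: python

    import io
    import tokenize

    source = "total = 1 +    2\n"  # odd spacing on purpose

    # Break the source into TokenInfo 5-tuples: (type, string, start, end, line).
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for tok in tokens:
        print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

    # Rebuild the source from (type, string) 2-tuples: the result tokenizes
    # back to the same token types and strings, but the original spacing
    # between tokens is generally not preserved.
    rebuilt = tokenize.untokenize((tok.type, tok.string) for tok in tokens)
    print(repr(rebuilt))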
.. tip::

    While we show below the full API of the ``token_utils`` module, you might
    want to first go to the next page to see a demonstration of its usage,
    done in an actual programming session using a Jupyter notebook.

token_utils.py API extracted by Sphinx
--------------------------------------

.. automodule:: token_utils
   :members:
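As a quick illustration of the round-trip behaviour mentioned in the sidebar
above, here is a minimal sketch. It assumes that ``token_utils.tokenize``
accepts a source string and returns a list of ``Token`` objects, and that
``token_utils.untokenize`` rebuilds the source exactly, spacing included; see
the next page for a full demonstration.

.. code-block:: python

    import token_utils

    source = "if True:\n    total = 1 +    2  # oddly spaced on purpose\n"

    # Assumed usage: tokenize a source string into Token objects.
    tokens = token_utils.tokenize(source)
    for token in tokens:
        print(token)

    # Unlike the standard library version, the round trip is exact,
    # with all spacing preserved.
    assert token_utils.untokenize(tokens) == source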