About tokens
============

.. warning::

    The module ``token_utils`` has been made into its own library.
    It should be installed automatically when installing **ideas**.
    You can get it on its own using ``python -m pip install token-utils``.

While **ideas** aims to provide support for all kinds of transformations,
including those that affect the Abstract Syntax Tree or the bytecode,
most transformations deal with exploring alternative syntax that is not
compatible with Python's current syntax defined by its
`grammar <https://docs.python.org/3/reference/grammar.html>`_.
Such alternative syntax cannot be parsed by Python without generating a
``SyntaxError``, thus preventing the execution of the code.

For this reason, almost all of our examples transform the code prior to
letting Python parse it. We do this using a set of tools built upon Python's
`tokenize module <https://docs.python.org/3/library/tokenize.html>`_.

.. sidebar:: An excellent tutorial

    `Brown Water Python <https://www.asmeurer.com/brown-water-python/>`_,
    written by Aaron Meurer, is an excellent and very comprehensive tutorial
    about Python's tokenize module.

In our description of these tools below, we assume that you are somewhat
familiar with the concept of token objects generated by Python's tokenize
module. If you are not familiar with those, we suggest that you read at least
once through the documentation about Python's tokenize module mentioned above.

The main points to understand:

- Using the ``tokenize`` function, a source can be broken down into tokens
  which, as generated by Python, are 5-tuples carrying information about their
  **type**, their **string** content, their position in the source (identified
  by starting and ending **row**, aka line number, and **column**), as well as
  the content of the line where they are found.

- From a list of tokens, the original source can essentially be recreated by
  using the ``untokenize`` function. However, as stated in the documentation:

  *The result is guaranteed to tokenize back to match the input so that the
  conversion is lossless and round-trips are assured. The guarantee applies
  only to the token type and token string as the spacing between tokens
  (column positions) may change.*

- To ``untokenize`` using the function from the Python standard library, one
  can use either a list of 5-tuple tokens, or a list of two-tuple tokens that
  include only the **type** and **string** information.

.. sidebar:: Perfect round-trip

    Unlike Python's version, the process of tokenizing and untokenizing a
    source using ideas' own ``tokenize`` and ``untokenize`` functions is
    guaranteed to yield back an exact copy of the original source, with all
    the spacing information intact.

Experience has shown that being able to recover the original source with
spacing included is **extremely** useful when writing tests about the
expected results for some source transformation.

About Ideas' tokens
-------------------

Recently (Feb. 21, 2020), on the Python-ideas mailing list,
`Andrew Barnert wrote `_:

    *Unfortunately, the boilerplate to write an import hook is more complicated
    than you’d like (and pretty hard to figure out the first time),* **and the
    support for filtering on the token stream (the most obvious way to do this
    one)** *rather than the text stream, AST, or bytecode* **is pretty minimal
    and clumsy.** [emphasis added]

**Ideas** uses ``token-utils``, which defines its own ``Token`` class built
from Python's tokens. While they carry the same information, they are much
easier to use and manipulate. The full API of the ``token_utils`` module is
shown below.
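Before turning to that API, it may help to see the standard-library behaviour
described in the list above in action. The following is a minimal sketch using
only Python's own ``tokenize`` module; the sample source and variable names
are chosen purely for illustration.

.. code-block:: python

    import io
    import tokenize

    source = "total = 1 +    2\n"  # odd spacing on purpose

    # Break the source into TokenInfo 5-tuples: (type, string, start, end, line).
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for tok in tokens:
        print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

    # Rebuild the source from (type, string) 2-tuples: the result tokenizes
    # back to the same token types and strings, but the original spacing
    # between tokens is generally not preserved.
    rebuilt = tokenize.untokenize((tok.type, tok.string) for tok in tokens)
    print(repr(rebuilt))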
.. tip::

    While we show below the full API of the ``token_utils`` module, you might
    want to first go to the next page to see a demonstration of its usage,
    done in an actual programming session using a Jupyter notebook.

token_utils.py API extracted by Sphinx
--------------------------------------

.. automodule:: token_utils
   :members:
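As a quick illustration of the round-trip behaviour mentioned in the sidebar
above, here is a minimal sketch. It assumes that ``token_utils.tokenize``
accepts a source string and returns a list of ``Token`` objects, and that
``token_utils.untokenize`` rebuilds the source exactly, spacing included; see
the next page for a full demonstration.

.. code-block:: python

    import token_utils

    source = "if True:\n    total = 1 +    2  # oddly spaced on purpose\n"

    # Assumed usage: tokenize a source string into Token objects.
    tokens = token_utils.tokenize(source)
    for token in tokens:
        print(token)

    # Unlike the standard library version, the round trip is exact,
    # with all spacing preserved.
    assert token_utils.untokenize(tokens) == source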