Breaking off

Having my own version of the Python parser has proven, so far, to be clumsy and chaotic. Clumsy because it means that I need a special interpreter just to run my language (which in any case uses an interpreter!), chaotic because building that interpreter has proven unreliable across different machines. This means that currently it only works for me.

Because of this and because I wanted even more control over the parser (who said anything about allowing things like rsync(--help)?), I decided to check my options. A friend of mine, more used to playing with languages, suggested using pypy to create my own parser, but that just led me a little further: why not outright 'steal' pypy's parser? After all, they have their own, which is also generated from Python's Python.asdl.

In fact it took me one hour to port the parser and a couple more to port the AST builder. This included porting them to Python3 (both by running 2to3 and then applying some changes by hand, notably dict.iteritems -> dict.items) and trying to remove as many dependencies as possible on the rest of pypy, especially on rpython.
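To illustrate the kind of hand edit mentioned above, here is a tiny sketch of the dict.iteritems -> dict.items change. The settings dict is made up for the example and is not taken from pypy's code.

```python
# Hypothetical data, only here to have something to iterate over.
settings = {'name': 'ayrton', 'license': 'GPLv3'}

# Python 2 spelling, which raises AttributeError on Python 3:
#     for key, value in settings.iteritems(): ...

# Python 3 spelling: items() returns a view, not a list.
pairs = [(key, value) for key, value in settings.items()]

# Code that modifies the dict while looping needs a snapshot of the keys,
# because a live view cannot be iterated while the dict changes size.
for key in list(settings.keys()):
    if key == 'license':
        del settings[key]
```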

The last step was to migrate from their own AST implementation to Python's, but here's where (again) I hit the last brick wall: the ast.AST class and subclasses are very special. They're implemented in C, and their constructors do not accept the line and column info as extra positional arguments. For a moment I contemplated the option of creating another extension (that is, written in C) to make those calls, but then the obvious solution came to mind: a massive replacement from:

return ast.ASTClass ([params], foo.lineno, foo.column)

into:

new_node = ast.ASTClass ([params])
new_node.lineno = foo.lineno
new_node.col_offset = foo.column
return new_node

and some other similar changes. See here if you're really interested in all the details. I can only be grateful for regular expressions, capturing groups and editors that support both.
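The replacement pattern can be tried against the standard library's ast module. This is only a minimal sketch: the helper name with_position is made up for illustration, and it assumes a Python 3 recent enough to have ast.Constant. It is not the code that ended up in the port.

```python
import ast

def with_position(node, lineno, col_offset):
    """Attach position info to an already created node and return it."""
    node.lineno = lineno
    node.col_offset = col_offset
    return node

# Build the expression `6 * 7` by hand, one node at a time.
left = with_position(ast.Constant(value=6), 1, 0)
right = with_position(ast.Constant(value=7), 1, 4)
product = with_position(ast.BinOp(left=left, op=ast.Mult(), right=right), 1, 0)

# compile() insists on lineno and col_offset for every expression node.
tree = ast.Expression(body=product)
result = eval(compile(tree, '<sketch>', 'eval'))
print(result)
```

Wrapping the two assignments in a helper keeps each call site on one line, which also makes the regular expression replacement simpler.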

The following code is able to parse and dump a simple Python script:

#! /usr/bin/env python3
import ast

from pypy.interpreter.pyparser import pyparse
from pypy.interpreter.astcompiler import astbuilder

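# compile setup.py in 'exec' mode, that is, as a whole module
info= pyparse.CompileInfo('setup.py', 'exec')
p= pyparse.PythonParser(None)
# first get the parse tree...
with open ('setup.py') as f:
    t= p.parse_source (f.read(), info)
# ... then build the AST out of it
a= astbuilder.ast_from_node (None, t, info)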

print (ast.dump (a))

The result is the following (formatted by hand):

Module(body=[
    ImportFrom(module='distutils.core', names=[alias(name='setup', asname=None)], level=0),
    Import(names=[alias(name='ayrton', asname=None)]),
    Expr(value=Call(func=Name(id='setup', ctx=<class '_ast.Load'>), args=None, keywords=[
        keyword(arg='name', value=Str(s='ayrton')),
        keyword(arg='version', value=Attribute(value=Name(id='ayrton', ctx=<class '_ast.Load'>), attr='__version__', ctx=<class '_ast.Load'>)),
        keyword(arg='description', value=Str(s='a shell-like scripting language based on Python3.')),
        keyword(arg='author', value=Str(s='Marcos Dione')),
        keyword(arg='author_email', value=Str(s='mdione@grulic.org.ar')),
        keyword(arg='url', value=Str(s='https://github.com/StyXman/ayrton')),
        keyword(arg='packages', value=List(elts=[Str(s='ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='scripts', value=List(elts=[Str(s='bin/ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='license', value=Str(s='GPLv3')),
        keyword(arg='classifiers', value=List(elts=[
            Str(s='Development Status :: 3 - Alpha'),
            Str(s='Environment :: Console'),
            Str(s='Intended Audience :: Developers'),
            Str(s='Intended Audience :: System Administrators'),
            Str(s='License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)'),
            Str(s='Operating System :: POSIX'),
            Str(s='Programming Language :: Python :: 3'),
            Str(s='Topic :: System'),
            Str(s='Topic :: System :: Systems Administration')
        ],
        ctx=<class '_ast.Load'>))
    ], starargs=None, kwargs=None))
])
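For comparison, the standard library's own parser can produce the same kind of dump. The sketch below feeds it only the two import lines that the dump above implies setup.py starts with. The exact text that ast.dump prints varies between Python versions, so no expected output is shown.

```python
import ast

# The first two statements of setup.py, as implied by the dump above.
source = "from distutils.core import setup\nimport ayrton\n"

# ast.parse only builds the tree; it does not import or run anything.
tree = ast.parse(source, filename='setup.py', mode='exec')
dumped = ast.dump(tree)
print(dumped)
```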

The next steps are to continue removing references to pypy code, and make sure it can actually parse all possible code. Then I should revisit the hardcoded limitations in the parser (in particular in this loop) and then be able to freely format program calls :).
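One possible way to check that "it can actually parse all possible code" is to compare dumps against the standard parser over a pile of sample sources. This is just a sketch under assumptions: same_tree and reference_parse are hypothetical names, and reference_parse is a stand-in that simply calls ast.parse, where the ported parser would be plugged in.

```python
import ast

def same_tree(source, candidate_parse, filename='<test>'):
    """Return True if candidate_parse builds the same tree as ast.parse."""
    expected = ast.dump(ast.parse(source, filename=filename))
    actual = ast.dump(candidate_parse(source, filename))
    return expected == actual

def reference_parse(source, filename):
    # Stand-in for the ported parser; replace with the real entry point.
    return ast.parse(source, filename=filename)

# Hypothetical samples; a real run would walk a directory of .py files.
samples = [
    "import os\n",
    "x = [i * 2 for i in range(3)]\n",
    "def f(a, *b, **c):\n    return a\n",
]
results = [same_tree(sample, reference_parse) for sample in samples]
print(all(results))
```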

Interesting times are coming to ayrton!

Update: fixed last link. Thanks nueces!