Breaking bad

For the last two weeks I've been trying to slightly change Python's syntax. The goal was to allow keyword and positional arguments to be mixed in a function call, such as:

f (a=42, b)

I know that this sounds crazy, but (I think) I have a good reason for it:

  • in ayrton some functions act as frontends for executables;
  • keyword parameters for these functions are converted to options (a=42 gets converted to -a 42, foo=27 to --foo 27);
  • an arbitrary number of commands care about the position of their arguments, for instance by requiring that all the options come before the arguments; rsync is a notable example.
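The keyword-to-option mapping from the list above can be sketched in a few lines of Python. This is only an illustration, not ayrton's actual code: the function name `keywords_to_options` is made up, and the rule "one-letter names become short options, longer names become long options" is an assumption inferred from the two examples given.

```python
def keywords_to_options(**kwargs):
    """Turn keyword arguments into a flat list of command-line options."""
    options = []
    for name, value in kwargs.items():
        # Assumption: a=42 -> -a 42 (short option), foo=27 -> --foo 27 (long option).
        prefix = "-" if len(name) == 1 else "--"
        options.append(prefix + name)
        options.append(str(value))
    return options


print(keywords_to_options(a=42, foo=27))  # ['-a', '42', '--foo', '27']
```

Since keyword arguments always end up in a dict, a function like this cannot know where they sat relative to the positional arguments, which is exactly the problem for commands like rsync.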

So, how to accomplish this? Well, there's good news, bad news... and plain news. The good news is that the lexer allows this kind of construction. The plain news is that the keywords-after-arguments rule is enforced in the syntax checker, which is implemented in Python/ast.c; this code ends up in libpython. The bad news is that all the functions defined in this file are declared static, which means, among other things, that they're not exported by the library they're in. So, even if I only want to override ast_for_call(), I still have to copy over all the rest of the functions. Finally, there's one more piece of slightly bad news, which I already mentioned: I will have to build my own interpreter for this change to apply, which also implies copying the Python interpreter's main function.

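All that aside, how to make this syntax Python-compatible again? Well, the idea is simple: if a keyword argument is found before a positional one, I convert it into a tuple holding the parameter's name as a string and its value. Thus, a=42 becomes ('a', 42) and then the frontend function resolves that to -a 42. Note that in the patch below the decision is made per call: once a positional argument follows a keyword, every keyword argument in that call is converted this way.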
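On the receiving end, a frontend function would have to recognize those tuples among its positional arguments. Here's a minimal sketch of how that could look; `option_name` and `build_argv` are hypothetical names, not ayrton's API, and the short/long option rule is the same assumption as before.

```python
def option_name(name):
    # Assumption: one-letter names become short options, anything else long ones.
    return ("-" if len(name) == 1 else "--") + name


def build_argv(command, *args, **kwargs):
    """Build an argv list, keeping converted keywords in their original position."""
    argv = [command]
    # Ordinary keyword arguments (the ones that were not converted) go first.
    for name, value in kwargs.items():
        argv += [option_name(name), str(value)]
    for arg in args:
        if isinstance(arg, tuple) and len(arg) == 2 and isinstance(arg[0], str):
            # A ('name', value) tuple produced by the syntax change.
            name, value = arg
            argv += [option_name(name), str(value)]
        else:
            argv.append(str(arg))
    return argv


# What f(a=42, b) looks like to the function after the conversion, with b = "b".
print(build_argv("f", ("a", 42), "b"))  # ['f', '-a', '42', 'b']
```

One caveat of this approach: a genuine two-element tuple whose first item is a string can't be told apart from a converted keyword.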

Between the plain and good news camps, the patch is rather simple:

```diff
diff -r d047928ae3f6 Python/ast.c
+++ b/Python/ast.c Thu Oct 17 12:37:17 2013 +0200
@@ -2408,7 +2408,7 @@
       argument: [test '='] (test) [comp_for] # Really [keyword '='] test
     */

-    int i, nargs, nkeywords, ngens;
+    int i, nargs, nkeywords, ngens, convert_keywords;
     asdl_seq *args;
     asdl_seq *keywords;
     expr_ty vararg = NULL, kwarg = NULL;
@@ -2418,15 +2418,20 @@
     nargs = 0;
     nkeywords = 0;
     ngens = 0;
+    convert_keywords= 0;
     for (i = 0; i < NCH(n); i++) {
         node *ch = CHILD(n, i);
         if (TYPE(ch) == argument) {
-            if (NCH(ch) == 1)
+            if (NCH(ch) == 1) {
                 nargs++;
-            else if (TYPE(CHILD(ch, 1)) == comp_for)
+                if (nkeywords) {
+                    convert_keywords= 1;
+                }
+            } else if (TYPE(CHILD(ch, 1)) == comp_for)
                 ngens++;
-            else
+            else {
                 nkeywords++;
+            }
         }
     }
     if (ngens > 1 || (ngens && (nargs || nkeywords))) {
@@ -2440,6 +2445,11 @@
         return NULL;
     }

+    if (convert_keywords) {
+        nargs+= nkeywords;
+        nkeywords= 0;
+    }
+
     args = asdl_seq_new(nargs + ngens, c->c_arena);
     if (!args)
         return NULL;
@@ -2451,13 +2461,15 @@
     for (i = 0; i < NCH(n); i++) {
         node *ch = CHILD(n, i);
         if (TYPE(ch) == argument) {
-            expr_ty e;
+            expr_ty e, e1;
             if (NCH(ch) == 1) {
+                /*
                 if (nkeywords) {
                     ast_error(c, CHILD(ch, 0),
                               "non-keyword arg after keyword arg");
                     return NULL;
                 }
+                */
                 if (vararg) {
                     ast_error(c, CHILD(ch, 0),
                               "only named arguments may follow *expression");
@@ -2478,26 +2489,27 @@
                 keyword_ty kw;
                 identifier key, tmp;
                 int k;
+                asdl_seq *t;

                 /* CHILD(ch, 0) is test, but must be an identifier? */
-                e = ast_for_expr(c, CHILD(ch, 0));
-                if (!e)
+                e1 = ast_for_expr(c, CHILD(ch, 0));
+                if (!e1)
                     return NULL;
                 /* f(lambda x: x[0] = 3) ends up getting parsed with
                  * LHS test = lambda x: x[0], and RHS test = 3.
                  * SF bug 132313 points out that complaining about a keyword
                  * then is very confusing.
                  */
-                if (e->kind == Lambda_kind) {
+                if (e1->kind == Lambda_kind) {
                     ast_error(c, CHILD(ch, 0), "lambda cannot contain assignment");
                     return NULL;
-                } else if (e->kind != Name_kind) {
+                } else if (e1->kind != Name_kind) {
                     ast_error(c, CHILD(ch, 0), "keyword can't be an expression");
                     return NULL;
-                } else if (forbidden_name(c, e->v.Name.id, ch, 1)) {
+                } else if (forbidden_name(c, e1->v.Name.id, ch, 1)) {
                     return NULL;
                 }
-                key = e->v.Name.id;
+                key = e1->v.Name.id;
                 for (k = 0; k < nkeywords; k++) {
                     tmp = ((keyword_ty)asdl_seq_GET(keywords, k))->arg;
                     if (!PyUnicode_Compare(tmp, key)) {
@@ -2508,10 +2520,21 @@
                 e = ast_for_expr(c, CHILD(ch, 2));
                 if (!e)
                     return NULL;
-                kw = keyword(key, e, c->c_arena);
-                if (!kw)
-                    return NULL;
-                asdl_seq_SET(keywords, nkeywords++, kw);
+                if (!convert_keywords) {
+                    kw = keyword(key, e, c->c_arena);
+                    if (!kw)
+                        return NULL;
+                    asdl_seq_SET(keywords, nkeywords++, kw);
+                } else {
+                    /* build a tuple ('name', expr) */
+                    t = asdl_seq_new(2, c->c_arena);
+                    if (!t)
+                        return NULL;
+                    asdl_seq_SET(t, 0, Str (key, e1->lineno, e1->col_offset, c->c_arena));
+                    asdl_seq_SET(t, 1, e);
+                    /* ... and out it as an argument */
+                    asdl_seq_SET(args, nargs++, Tuple (t, Load, e1->lineno, e1->col_offset, c->c_arena));
+                }
             }
         }
         else if (TYPE(ch) == STAR) {
```

Don't you hate it when you have to modify a line just to add braces?
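For those who don't feel like reading C, here's a rough Python model of what the patched ast_for_call() decides for a single call. It's a simplification under my own assumptions: the function name `convert_call` and the `(name, value)` representation are invented for the example, and generator arguments, `*args`, `**kwargs` and the duplicate keyword check are left out.

```python
def convert_call(arguments):
    """Model the patched argument handling for one call.

    `arguments` is a list of (name, value) pairs in source order,
    where name is None for a positional argument.
    Returns (positional_args, keyword_args).
    """
    seen_keyword = False
    convert_keywords = False
    # First pass, like the counting loop: flag a positional that follows a keyword.
    for name, _ in arguments:
        if name is None:
            if seen_keyword:
                convert_keywords = True
        else:
            seen_keyword = True

    args, keywords = [], {}
    # Second pass: either keep keywords as keywords, or turn all of them into tuples.
    for name, value in arguments:
        if name is None:
            args.append(value)
        elif convert_keywords:
            args.append((name, value))
        else:
            keywords[name] = value
    return args, keywords


print(convert_call([("a", 42), (None, "b")]))  # ([('a', 42), 'b'], {})
print(convert_call([(None, "b"), ("a", 42)]))  # (['b'], {'a': 42})
```

The second call shows that regular Python calls, with the keywords at the end, are left untouched.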

But of course, this change in the syntax should only be allowed for the frontend functions. This check is not implemented yet, so technically the new syntax is also valid for the Python code `ayrton` runs. This is not ideal, but hey, this is still a WIP :) For the moment this line of development will stay out of the main line. I will integrate it when I'm really convinced that it's a good idea.