pefan

A few weeks ago I needed to do some line based manipulation that kinda went further of what you can easyly do with awk. My old-SysAdmin brain kicked in and the first though was, if you're going to use awk and sed, you might as well use perl. Thing is, I really can't remember when was the last time I wrote even a oneliner in perl, maybe 2011, in my last SysAdmin-like position.

Since then I've been using python for almost anything, so why not? Well, the python interpreter does not have an equivalent of perl's -n switch; and while we're at it, -a, -F, -p are also interesting for this.

So I wrote a little program for that. Based on those switch names, I called it pefan. As python does not have perl's special variables, and in particuar, $_ and @_, the wrapper sets the line variable for each line of the input, and if you use the -a or -F switches, the variable data with the list that's the result of splitting the line.

Meanwhile, while reading the perlrun manpage to write this post, I found out that -i and even -s sound useful, so I'll be adding support for those in the future. I'm also thinking of adding support for curly-brace-based block definitions, to make oneliners easier to write. Yes, it's a travesty, but it's all in line with my push to make python more SysAdmin friendly.

In the meantime, I added a couple of switches I find useful too. See the whole usage:

usage: pefan.py [-h] [-a] -e SCRIPT [-F SPLIT_CHAR] [-i] [-M MODULE_SPEC]
                [-m MODULE_SPEC] [-N] [-n] [-p] [--no-print] [-r RANDOM]
                [-s SETUP] [-t [FORMAT]] ...

Tries to emulate Perl's (Yikes!) -peFan switches.

positional arguments:
FILE                  Files to process. If ommited or file name is '-',
                      stdin is used. Notice you can use '-' at any point in
                      the list; f.i. "foo bar - baz".

optional arguments:
-h, --help            show this help message and exit
-a, --split           Turns on autosplit, so the line is split in elements.
                      The list of e lements go in the 'data' variable.
-e SCRIPT, --script SCRIPT
                      The script to run inside the loop.
-F SPLIT_CHAR, --split-char SPLIT_CHAR
                      The field delimiter. This implies [-a|--split].
-i, --ignore-empty    Do not print empty lines.
-M MODULE_SPEC, --import MODULE_SPEC
                      Import modules before runing any code. MODULE_SPEC can
                      be MODULE or MODULE,NAME,... The latter uses the 'from
                      MODULE import NAME, ...' variant. MODULE or NAMEs can
                      have a :AS_NAME suffix.
-m MODULE_SPEC        Same as [-M|--import]
-N, --enumerate-lines
                      Prepend each line with its line number, like less -N
                      does.
-n, --iterate         Iterate over all the lines of inputs. Each line is
                      assigned in the 'line' variable. This is the default.
-p, --print           Print the resulting line. This is the default.
--no-print            Don't automatically print the resulting line, the
                      script knows what to do with it
-r RANDOM, --random RANDOM
                      Print only a fraction of the output lines.
-s SETUP, --setup SETUP
                      Code to be run as setup. Run only once after importing
                      modules and before iterating over input.
-t [FORMAT], --timestamp [FORMAT]
                      Prepend a timestamp using FORMAT. By default prints it
                      in ISO-8601.

FORMAT can use Python's strftime()'s codes (see
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-
behavior).

Go get it here.