A set of columns is defined as follows:

columns             ::= '{' column-definition [',' column-definition]... '}'

column-definition   :: output-column: column-spec

column-spec         ::= fmt [ '->' post-processor ]

output-column       ::= An alphanumeric string, with an optional
                        trailing '='. If the trailing '=' is present,
                        the '=' is discarded, but allows for the `fmt`
                        to be blank (and any `post-processor` is not
                        invoked).

fmt                 ::= A Python format string.

post-processor      ::= A Python expresssion.

The fmt is a Python format string which can include fixed text (such as whitespace) and cell text using replacement fields. The replacement fields use the column headings from the input CSV file as the field_name, with any characters not in the set VAR_CHARS removed. In this example column-definition, the incoming CSV file has a column heading “Worker.Id”, which is transformed into field_name “WorkerId” which is used directly as the value of output-column “Works Id”:

"Works id": "{WorkerId}"

And here, “ref_t”, “row_number”, “Date”, “Client Name”, “Role Name” are transformed, combined and formatted using “[.] , ,” to form the “Display as” output-column value:

"Display as": "[{ref_t}.{row_number}] {Date}, {ClientName}, {RoleName}"

The fmt also allows limited processing of the cell text. For example:

Example fmt usage

fmt Usage

Description

{Email} address

Extract the “Email” cell and combine it with fixed text “ address”.

{Name[0]}

Extract the “Name” cell, and use its first letter only.

\{AccountNumber:>08}

Extract the “AccountNumber” cell, zero fill to a width of 8, and add a preceding (single) “\\”.

The result of the fmt is always stripped of any leading and trailing whitespace.

The optional post-processor is a Python expression used to apply more powerful transformations than the fmt supports. The expression can also use:

  • A variable called “_” initialised with the result of fmt.

  • Variables called “_1”, “_2”, etc. initialised with the result of each placeholder in fmt.

  • Most standard Python builtins, plus re, string, unicodedata, datetime from datetime and utc from datetime.timezone

  • A builtin one() which is like all() or any() but requires exactly one set value.

For example:

Example post-processor usage

post-processor Usage

Description

{AccountNumber} -> '\{:>08}'.format(int(_) * 100)

Extract the “AccountNumber” cell, convert it to an int, multiply that by 100, zero fill to a width of 8, and add a preceding (single) “\\”.

{Date} -> str(datetime.strptime(_, '%d/%m/%Y').date())

Extract the “Date” cell, parse it as %d/%m/%Y datetime, extract the date portion, and convert it into a string (in the default format, yyyy-mm-dd).

"Display as": "{StartDate} -> ['Mo','Tu','We','Th','Fr','Sa','Su'][datetime.strptime(_[:10], '%d/%m/%Y').weekday()]

Extract a day of the week from a date.

{A},{B},{C} -> 'ABC'[_.split(',').index('YES')]

Finds the first of cells A, B and C which is ‘YES’, and returns either ‘A’ or ‘B’ or ‘C’ accordingly.

{First Name} -> unicodedata.normalize('NFD', _).encode('ascii', 'ignore').decode()

This is one possible way to eliminate accented characters. NOTE: this is a lossy process, and not recommended. Also, there are many possible variations of this that you may prefer.

"email": "{Email}@@{Surname}.{Works Id}@example.org -> [e for e in _.split('@@') if e][0]"

Provide a default value if the given value is blank. Here, if “Email” is blank, a default is derived from “Surname” and “Works Id”.

"Units": "{Contract} -> '0' if _.split(',')[0] !='PAYE' else '1'"

Exclude rows without a value of ‘PAYE’ in “Contract” cell by using 1 or 0 skipped for CSV data-feed uploader.

"Display as": "£{Rate1}@{Hours1}hrs £{Rate2}@{Hours2}hrs £{Rate3}@{Hours3}hrs -> _.replace('£0.00@0.00hrs', '').strip()"

Load multiple shifts but exclude those with a value of zero: “£0.00@0.00hrs”.

"Display as": {Client}" ->  _.replace('Limited ', '').replace(' Ltd', '')"

Remove “Limited “ and “ Ltd” in Display as field.

"end_date": "{end_date}  -> _1 if (_1 and _1 <= str(datetime.utcnow().date())) else '' "

Exclude “end_date” if it is after today.

The result of the post-processor is NOT further normalised.