Keeping named files in bash

February 1st, 2016

It just took me way too long to solve this, so I’m going to write it up quickly:

The Problem

  • I have a file containing the basenames of spreadsheets that I want to keep in a directory structure. I want to delete all the other spreadsheets.
  • There are levels of nested directories and the files are distributed among them
  • The file names, and the directory names contain spaces (don’t look at me like that, it’s other people’s work on Windoze — at least I’m using bash
  • I’m using bash on Linux, so I have proper find, etc.

A Solution


find . -name \*.xls\?  | while read  f
    if ! grep -q "${f##*/}" $modelfiles
        rm "$f"

I’m sure there are more efficient ways. Contact me if you have ideas. Meanwhile I’ve got to get back to work so I’m sticking with “it works”.

UPDATE: that didn’t work when I tried to re-run it just now. So switched from grep -v to if ! …

Waiting on the platform

October 10th, 2015

A recent conversation with Lauren Serota reminded me of a challenge we often face building small-to-medium sized digital solutions for our clients: the right platform.

We were talking about building custom phone apps in the context of Myanmar’s smartphone revolution, and the challenge of finding the right partners to implement innovative mobile service delivery systems. She told a story of one such development which looked great but cost too much to make small changes to. That entailed going through another development cycle with the software company; for changes that felt like they should be under the service team’s control.
“They wrote a custom native app.”, she said, “instead of building a platform.”
This started me revisiting similar stories from Aptivate, where we were the software company.

The IT Specialists

I’m thinking about AuthorAID, which helps support researchers from developing countries preparing academic articles for publication in peer-reviewed journals. It is a fairly complex web application which functions not entirely unlike a dating site (matching mentors with mentees) with a way to track collaboration through research publishing tasks. We built it with Django, a powerful framework rapid web development.

Django was a great choice, it enabled us to build AuthorAID at a cost within INASP‘s budget; an impressive achievement because AuthorAID is essentially a fairly complex, bespoke web application serving a relatively small audience. That’s why it’s our web platform of choice. But neither AuthorAID, nor Django are the platform I think Lauren was talking about.

As we maintain and develop AuthorAID for INASP, some of the changes we make, like adding faceted search, feel like they belong squarely with the “IT specialists”. Others, like adding fields to the form that users fill when they register, feel like they belong with the team running the site.

A lot of Django’s power for rapidly building web applications comes from its focus on meta-data. To add those new fields to the sign-up form, we edit the Model file which is written in Python, the programing language Django is written in. So, you have to be a programmer to change them. But once that’s done, changes to web-forms can be generated automatically from the models. The new field values will need to be stored in Django’s relational database, but the necessary changes to the database schema can also be generated from the models, and theres a whole system to manage applying those database changes. Django is a great tool for the “IT specialists”. Which makes sense: it was created by web developers at a Newspaper.

This works out well for INASP who retain a relationship with Aptivate that enables us to act as their “IT specialists”, for the website stuff. But some of the web applications we build aren’t commissioned by organisations like INASP, but as part of change programmes with a limited lifetime and budget. They can’t afford to keep us on as IT specialists, in case their web application needs to be tweaked or extended. If they want to change the data they gather or report on, they need to be able to make those changes themselves. They don’t want a bespoke application built with a rapid-development framework; I think they need the “Platform”.

The General Solution

I’m thinking now of another project that, in the end, we didn’t do. We were written into the budget proposal for a three year programme with institutional funding, to provide a database for monitoring and evaluation (M&E) data. Budget was small and the requirements weren’t very clear until the programme started and a team assembled. We did our discovery process soon found out that our clients had high hopes for their M&E database, but they could predict neither the structure of the data they would be storing, nor the work-flows they would want it to support.

Trying to manage the risk of Building The Wrong Thing, we pushed for specifics specific examples.
“It doesn’t have to be an all singing, all dancing system.”, they told us in one meeting.
“Okay”, I replied, “but you have to tell us exactly which songs, and which dances you will want it to do.
Eventually they told us not to keep trying to understand their specific requirements, and “just deliver the generic version”.

It’s much harder (read more expensive) to build a generic tool than a specific one. It’s very hard to get the abstractions right, especially without passing through specific examples on the way. But our client had had something specific in mind: a low-cost, generic solution. I think I understand where they were coming from: Excel. The generic solution, for fifty bucks, built on billions of dollars of R&D and decades of getting the abstractions right.

Most people have never bought bespoke software, and don’t know how to. If we were one of those people, the closest thing we can imagine to bespoke software might be a customised spreadsheet; with some protected cells, nice colouring, some predefined formulae and some VBA macros behind the scenes. I’m actually working on one of these at the moment, for another client. If I don’t do all the customisation work, what will be left? The full awesome power of Excel. Users would have to work harder to use it but, with the right skills, they could do so. Excel is a cheap product and all that customization effort is expensive IT specialist consultancy. I understand it must feel frustrating When the IT specialists start asking ‘which songs and which dances?’. Just give us The Thing. The thing we can live in and work it out for ourselves. The web-based Excel* for data-driven applications.

* I know this sounds like Google sheets, but they weren’t actually looking for an on-line spreadsheet.

Compared to evolving a customised spreadsheet, building a bespoke web application in, say, Django is a completely different proposition. It takes significant work to get to an MVP and when you get there the only data and workflow it supports are those the product designers have chosen to include. There are no blank workbooks to fall back on.

I’m wondering if there is something awesome we should be using as a platform for web applications. And if there isn’t shouldn’t we be designing it?

Deadlock: BDD testing a python Tornado app with py.test and Splinter

September 23rd, 2015

I’m only a day into my new asynchronous web development project in Tornado, and I already have a deadlock. And this is remarkable, I think, because all I have done so far is a copy-and-paste the Hello World example app from the Tornado website. An example which “does not use any of Tornado’s asynchronous features”. So, how did I manage to get deadlock?

Well, on top of Hello World, I have a test. A functional test, written with pytest-bdd that uses pytest-splinter makes sure here is some sort of web server up and running. Now pytest-splinter provides a browser object as a py.test fixture. Tests can use this to access the site, so
long as a server is running. Since I’m shopping for py.test plugins, I added pytest-tornado whose http_server fixture starts a tornado HTTP server, serving the application I provide by defining an app fixture.

All set to go? I thought I was. But I got the big crunch. My splinter browser starts but no page loads. What’s up? Turns out the tests aren’t asynchronous. The splinter driver is blocking and the asynchronous web server never gets a look-in to serve the content. Pytest-tornado wants me
to write tests using Tornado’s asynchronous http client, so it and the server can play nicely together.

That’s all fair, but I want to use Splinter on this project so I’m going to want to start my server in a thread. I created an apprunner module that can start my Tornado app in a thread (and return the url to access it). And also provides a function to instruct the server to stop, and exit the

import threading
import tornado
from .application import Application

ioloop = tornado.ioloop.IOLoop.instance()

def start(TESTING_PORT=8192):
    """ Start Application runing on given port. Retrn the listening url"""
    global thread
    app = Application()
    thread = threading.Thread(target=ioloop.start)
    return "http://localhost:{}".format(TESTING_PORT)

def stop():
    """ Ask IOLoop to stop running, and wait for its thread to complete"""
    global thread

Then I define my own fixture to start and stop the server using pytest

def app():
    yield start()

And now I can write lovely clean BDD steps using splinter browser and my
own app running in a thread, like this:

@when("I visit the site")
def visit_site(browser, app):

What do chameleons eat?

June 28th, 2015

When this question comes up in a children’s book these days, what do the characters do?


Zoe (and Beans) rushes inside to Google it.

Which brings me to Mr Christmas and his flying teapot:20150627_073030

All you do is refuel every thousand miles, and you can fly up to three times the speed of sound!

Nic was sceptical.   Google couldn’t give the answer, but Wolfram Alpha did.

At three times the speed of sound, he’s going to have to refuel (with teabags) every 26 minutes and three seconds. For a trip round the world’s circumference that’s nearly eleven hours travel not counting time taken to deliver presents. On top of that he has to stop at least every half hour to brew up. Now add in loo breaks… he’s not going to make it.

Exaile as a DJ workstation

May 6th, 2015

Turning This…

Exaile out the box

Exaile, out of the box, is a smart looking multi-platform music player. The Exaile website lists “Simple interface” as its number-one feature, and mentions that it has lots of plugins. Exaile should be the first choice for Swing-dance DJs, and anyone else who wants to build play-lists on the fly.

Into This…

Screenshot from 2015-04-01 21:15:04

Exaile does stuff other music players haven’t even heard of. To get the most out of it you have to enable and configure some of its plugins. Here’s how I set up mine:
Read the rest of this entry »

Loss of language

May 30th, 2014

brang01 brang02brang03 brang04 brang05 brang06

Silly World

May 18th, 2014



May 11th, 2014


Re-thinking dates

April 6th, 2014

Thinking about how to tackle an up-coming open-data project, I found the impressive RethinkDB. Rather than convert the data on-import into a relational database, I want to load the raw data into a NoSql database, and then map to a converted table with links back to the raw source-rows. RethinkDB imports csv out of the box. And it’s ReQL query language is expressive enough even to parse arse-about-face ‘dd/mm/yyyy’ British dates (though that turns out to be a bit of a mouthful):

  newDate: r.time( 

Apache upgrade

January 1st, 2014

We use the Ubuntu biannual releases as reminder to do backups. Backup; upgrade; work through the bugs. Just upgraded to 13.10 Saucy Salamander without noticing much change. Played with Unity for an hour or so, to see if it had become amazing; then switched back to Xmonad, and back to work on my website project.

Ah! Local websites not working. I just got the default page for my local apache instance.

First off I tried to re-enable the site with ‘a2ensite’, which complained that my config file didn’t exist. But it was looking for the file with ‘.conf’ extension. Easily fixed.

Next I was told permission to access the site was denied by the server configuration. At this point I got derailed. I thought there was something wrong with my virtual host setup. Lots of websites told me I had to have a ‘NameVirtualHost’ directive. So I added one. And was then told by apache that this had no effect and would be removed in future.

Although this was a change, the conf files had been upgraded properly, as you’d hope. Problem was, of course, with my virtual host files. I have worked round the permission error by adding

<Directory "">;
Options  All
AllowOverride All
Require all granted