Thinking about how to tackle an upcoming open-data project, I found the impressive RethinkDB. Rather than convert the data on import into a relational database, I want to load the raw data into a NoSQL database, and then map it to a converted table with links back to the raw source rows. RethinkDB imports CSV out of the box. And its ReQL query language is expressive enough even to parse arse-about-face ‘dd/mm/yyyy’ British dates (though that turns out to be a bit of a mouthful):
r.table('raw').update({
  newDate: r.time(
    r.row('DATE').split('/').nth(2).coerceTo('NUMBER'),
    r.row('DATE').split('/').nth(1).coerceTo('NUMBER'),
    r.row('DATE').split('/').nth(0).coerceTo('NUMBER'),
    'Z'
  )
}).run(...)
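To sanity-check what the query should produce, here is a rough plain-JavaScript sketch of the same split-and-reorder logic. The helper name parseBritishDate is made up for illustration and is not part of RethinkDB. It assumes four-digit years and, like the 'Z' argument in the query, treats each date as midnight UTC.

```javascript
// Hypothetical helper (not a RethinkDB API): mirrors the ReQL expression above
// by splitting 'dd/mm/yyyy' on '/' and reading the parts as year, month, day.
function parseBritishDate(text) {
  const parts = text.split('/').map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error('Expected dd/mm/yyyy, got: ' + text);
  }
  const [day, month, year] = parts;

  // JavaScript months are zero-based, so subtract 1. Date.UTC matches the
  // 'Z' (UTC) timezone used in the query.
  const date = new Date(Date.UTC(year, month - 1, day));

  // Date.UTC silently rolls impossible dates over (31/02 becomes early March),
  // so confirm the parts survived the round trip.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day
  ) {
    throw new Error('Not a real calendar date: ' + text);
  }
  return date;
}

console.log(parseBritishDate('25/12/2015').toISOString());
```

Running a handful of raw DATE values through something like this before the update is a cheap way to spot rows that would not convert cleanly.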