I do a lot of work with CSV files, and up to now I’ve mostly been doing clumsy things with standard text tools like “awk -F,”. Here’s a roundup of some Unix command line tools for working with CSV files, lifted from One Thing Well.
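As a quick illustration of why plain “awk -F,” is clumsy: CSV fields may contain quoted commas, which a naive split mishandles. Python’s csv module (standing in here for any real CSV parser) handles it correctly:

```python
import csv
import io

# A quoted field containing a comma: naive splitting breaks on it.
line = '1,"Doe, Jane",NYC'

naive = line.split(",")                        # roughly what awk -F, does
parsed = next(csv.reader(io.StringIO(line)))   # quote-aware parse

print(naive)   # ['1', '"Doe', ' Jane"', 'NYC']  -- field split in two
print(parsed)  # ['1', 'Doe, Jane', 'NYC']       -- quoting respected
```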
- xsv: slice, split, summary stats, etc. Also supports creating a quick index to accelerate things. Seems pretty solid. A Rust program, which is awfully weird, but the author provides a Linux binary.
- csvkit: suite of programs for manipulating CSV. Includes import/export to JSON, Postgres, Excel, etc. Also supports SQL queries via an in-memory sqlite3 database. Python program.
- Miller: “like sed, awk, cut, join, and sort for name-indexed data such as CSV”. Its queries are more limited than xsv’s or csvkit’s, but it embraces more input file types, including CSV files whose schema changes mid-file and arbitrary keyword/value data. Seems perfect for the “long list of JSON objects” files I often work with. (Although no JSON support?!)
- csvfix: thin docs; another xsv-style tool, but with more operations than the others.
- q: SQL queries on CSV files via sqlite. Python program.
- TextQL: SQL queries via sqlite. Go program.
- fsql: SQL queries via some Perl SQL implementation. Supports JSON and YAML.
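The “SQL queries via sqlite” tools above all share the same basic trick, which is easy to sketch in plain Python: load the CSV into an in-memory sqlite3 table, then run arbitrary SQL against it. (The sample data, table name, and column names below are made up for illustration; the real tools infer them from the input file.)

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
data = "city,pop\nNYC,8000000\nSF,800000\nLA,4000000\n"

rows = list(csv.reader(io.StringIO(data)))
header, body = rows[0], rows[1:]

# Build an in-memory table whose columns come from the CSV header.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE t ({', '.join(header)})")
conn.executemany(f"INSERT INTO t VALUES ({', '.join('?' * len(header))})", body)

# Arbitrary SQL over the CSV, roughly what q or TextQL give you.
for (city,) in conn.execute("SELECT city FROM t WHERE CAST(pop AS INTEGER) > 1000000"):
    print(city)  # prints NYC, then LA
```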
I just tried xsv and it’s quite nice; it may be the closest to what I need day to day. I’ve used q before and remember it working nicely, albeit a bit slowly. Arbitrary SQL is appealing.