Friday, June 20, 2014

Adding newlines in a SQL mysqldump to split extended inserts

The official mysqldump supports more or less two output styles: separate INSERTs (one insert statement per row) or extended INSERTs (one insert per table). Extended INSERTs are much faster, but MySQL write them all in one line, the result being a SQL very hard to read. Can we get the best of both worlds ?

Separate INSERTs

INSERT INTO mytable (id) VALUES (1);
INSERT INTO mytable (id) VALUES (2);

Extended INSERTs

INSERT INTO mytable (id) VALUES (1),(2);

New-And-Improved INSERTs


Current solutions

Using sed

mysqldump --extended-insert | sed 's/),(/),\n(/g'

Only problem is, lines will be split, even in the middle of strings, altering your data.

Using net_buffer_length

mysqldump --extended-insert --net_buffer_length=5000

mysqldump will make sure lines are not longer than 5000 (or whatever), starting a new INSERT when needed. The problem is that the behaviour is kinda random, diffs are hard to analyze and it may break your data if you are storing columns longer than this.

Writing a parser

This question has been often asked without a proper reply, so I decided to write a simple parser. Precisely, we need to check for quotes, parenthesis, and escape characters.

I first wrote it in PHP:

But then I realized it was too slow, so I rewrote it in C, using strcspn to find string occurence:

The only flaw that I can think of is that the parser will fail if the 10001st character of a line is an escaped quote, it will see it as an unescaped quote.

Happy dumping !