Websites, Hosting and Friends

Monday, December 1, 2014

MySQL 5.7 and Wordpress problem

If you upgrade to MySQL 5.7, you may encounter bugs with legacy software. Wordpress, which I also consider some kind of legacy software, does not handle this very well with its default settings.

You may encounter the "Submit for review" bug where you cannot add new posts. It may be related to permissions, auto_increment and other stuff, but here is another case: bad date formats and invalid data altogether.

In MySQL <= 5.6, by default, invalid values are coalesced into valid ones when needed. For example, attempting to set a field NULL on a non-null string will result in empty string. Starting with MySQL 5.7, this is not permitted.

Hence, if you want to upgrade to 5.7 and use all the goodies, you should consider putting it in a more compatible mode, adding this to your /etc/my.cnf:

[mysqld]
# Default:
# sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION
sql_mode = ALLOW_INVALID_DATES,NO_ENGINE_SUBSTITUTION

See official documentation for complete information

Wednesday, November 5, 2014

Starting a text editor from inside a Virtual Machine

Using Vagrant, Docker or other virtual development environment is becoming quite popular. However, a drawback of this is that you cannot start a visual text editor on your main machine because heh, the files are not there. Purist may tell you that you should be using Vim or Emacs, but I like Sublime, I want to keep using it.

A simple trick I came up with it to call my text editor by SSH using path translation. Of course, you will need to translate the paths and otherwise adapt it to your needs, but this should get you started:

In this example, /media/data/projects on the Guest exists as ~/projects on the Host and ~ on the Host exists as /media/home on the Guest.

Make sure you have ~/.ssh/authorized_keys and ForwardAgent properly setup to avoid password prompts each time.

You could also use this trick to open a webpage by using the “open” program on Mac. Equivalent exists on most Linux distros.

Wednesday, October 8, 2014

Creating a static private network on VMWare Fusion with Ubuntu

This tutorial is using VMWare Fusion 7, Ubuntu 14.04.1 server and OSX 10.9

The goal here is to create a private network shared with selected VMs and the host, while offering NATing to connect to the Internet. VMWare offers some documentation, which works great with DHCP, but I needed to specify everything static for custom needs.

Creating a private network

Go to the Network tab of general settings (⌘,)
Unlock the screen by clicking on the lock.
Add a custom network by clicking on the +.
Make sure all options are checked (see screenshot).
Specify a subnet IP, I will be using 192.168.200.0.
Activating the DHCP here is needed for the host to connect to it, even though our VMs will be using static IPs.

Configure the VM’s network adapter

Make sure your VM is powered off.
Go to the VM’s settings (⌘E)
Click on Network Adapter.
Select your newly created network (for me it was vmnet2).

Configure the OS’s network adapter

Edit the network interfaces :
$ sudo nano /etc/network/interfaces

# The loopback
network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.200.100
netmask 255.255.255.0
gateway 192.168.200.2
dns-nameservers 192.168.200.2
Reboot
Check to see if Internet works :
ping google.com
Check to see if you can see your host :
ping 192.168.200.1
Try SSHing to your VM from your host :ssh 192.168.200.100

You can add more VMs to this private network, just remember to change the IP from 192.168.200.100 to something else from 192.168.200.3 to 192.168.200.253.

Friday, June 20, 2014

Adding newlines in a SQL mysqldump to split extended inserts

The official mysqldump supports more or less two output styles: separate INSERTs (one insert statement per row) or extended INSERTs (one insert per table). Extended INSERTs are much faster, but MySQL write them all in one line, the result being a SQL very hard to read. Can we get the best of both worlds ?

Separate INSERTs

INSERT INTO mytable (id) VALUES (1);

INSERT INTO mytable (id) VALUES (2);

Extended INSERTs

INSERT INTO mytable (id) VALUES (1),(2);

New-And-Improved™ INSERTs

INSERT INTO mytable (id) VALUES

(1),

(2);

Current solutions

Using sed

mysqldump --extended-insert | sed 's/),(/),\n(/g'

Only problem is, lines will be split, even in the middle of strings, altering your data.

Using net_buffer_length

mysqldump --extended-insert --net_buffer_length=5000

mysqldump will make sure lines are not longer than 5000 (or whatever), starting a new INSERT when needed. The problem is that the behaviour is kinda random, diffs are hard to analyze and it may break your data if you are storing columns longer than this.

Writing a parser

This question has been often asked without a proper reply, so I decided to write a simple parser. Precisely, we need to check for quotes, parenthesis, and escape characters.

I first wrote it in PHP:

But then I realized it was too slow, so I rewrote it in C, using strcspn to find string occurence:

The only flaw that I can think of is that the parser will fail if the 10001st character of a line is an escaped quote, it will see it as an unescaped quote.

Happy dumping !

Monday, June 16, 2014

Using PHP’s mb_detect_encoding to cleanup your data

Ever heard of iso-8859-1 ? Yeah… that nightmare… With it, my name ends up more often than not… SÃ©bastien. The computers gurus came up one day with UTF-8 and all our problems should have been solved; one encoding to rule them all.

Sweet, let’s all switch to UTF-8 ! Oh wait… legacy projects… PHP internal encoding is still not UTF-8 and functions like strlen() are still not able to properly process multi-bytes strings. It is being said that UTF-8 should land in PHP 6, but in the mean time, we still have to do something.

I am currently working on a big project with a lot of spaghetti-legacy code with a lot of entry points to the database. Almost all the data is stored in one big table, but not encoded uniformly. When I started working on it, tables had fields with a combinaison of ascii_bin, utf8_general_ci, latin_general_ci and latin_swedish_ci … and we are in Canada ! In all those fields, data was stored with absolutely no guaranty of its encoding. Data was retrieved and passed through a series of UTF-8 encode/decode and stuff like this:

if (strpos($string, 'Ã©') !== false) {
$string = utf8_decode($string);
}

I eventually managed to change every field, change all database connections and remove and traces of encode/decode. However, I still had the problem of having data not encoded properly in some rows/columns. You may or may not be familiar with mb_detect_encoding, here is a very simple trick:

Once you have detected the encoding, use iconv to convert it. This crunched through my 1GB database in no time and I was then sure that everything was in UTF-8.

Of course, this is an example with a database, it can work with any data. This script could also be faster if all updates where done at the same time for each row.

Please, save yourself some trouble, make sure all user content is in UTF-8.

Pages