Pages

Friday, June 20, 2014

Adding newlines in a SQL mysqldump to split extended inserts

The official mysqldump supports more or less two output styles: separate INSERTs (one insert statement per row) or extended INSERTs (one insert per table). Extended INSERTs are much faster, but MySQL write them all in one line, the result being a SQL very hard to read. Can we get the best of both worlds ?

Separate INSERTs

INSERT INTO mytable (id) VALUES (1);
INSERT INTO mytable (id) VALUES (2);

Extended INSERTs

INSERT INTO mytable (id) VALUES (1),(2);

New-And-Improved INSERTs

INSERT INTO mytable (id) VALUES
(1),
(2);

Current solutions

Using sed

mysqldump --extended-insert | sed 's/),(/),\n(/g'

Only problem is, lines will be split, even in the middle of strings, altering your data.

Using net_buffer_length

mysqldump --extended-insert --net_buffer_length=5000

mysqldump will make sure lines are not longer than 5000 (or whatever), starting a new INSERT when needed. The problem is that the behaviour is kinda random, diffs are hard to analyze and it may break your data if you are storing columns longer than this.

Writing a parser

This question has been often asked without a proper reply, so I decided to write a simple parser. Precisely, we need to check for quotes, parenthesis, and escape characters.

I first wrote it in PHP:



But then I realized it was too slow, so I rewrote it in C, using strcspn to find string occurence:


The only flaw that I can think of is that the parser will fail if the 10001st character of a line is an escaped quote, it will see it as an unescaped quote.

Happy dumping !

Monday, June 16, 2014

Using PHP’s mb_detect_encoding to cleanup your data

Ever heard of iso-8859-1 ? Yeah… that nightmare… With it, my name ends up more often than not… Sébastien. The computers gurus came up one day with UTF-8 and all our problems should have been solved; one encoding to rule them all.

Sweet, let’s all switch to UTF-8 ! Oh wait… legacy projects… PHP internal encoding is still not UTF-8 and functions like strlen() are still not able to properly process multi-bytes strings. It is being said that UTF-8 should land in PHP 6, but in the mean time, we still have to do something.

I am currently working on a big project with a lot of spaghetti-legacy code with a lot of entry points to the database. Almost all the data is stored in one big table, but not encoded uniformly. When I started working on it, tables had fields with a combinaison of  ascii_bin, utf8_general_ci, latin_general_ci and latin_swedish_ci … and we are in Canada ! In all those fields, data was stored with absolutely no guaranty of its encoding. Data was retrieved and passed through a series of UTF-8 encode/decode and stuff like this:

if (strpos($string, 'é') !== false) {
   $string = utf8_decode($string);
}

I eventually managed to change every field, change all database connections and remove and traces of encode/decode. However, I still had the problem of having data not encoded properly in some rows/columns. You may or may not be familiar with mb_detect_encoding, here is a very simple trick:

Once you have detected the encoding, use iconv to convert it. This crunched through my 1GB database in no time and I was then sure that everything was in UTF-8.

Of course, this is an example with a database, it can work with any data. This script could also be faster if all updates where done at the same time for each row.

Please, save yourself some trouble, make sure all user content is in UTF-8.

Friday, May 23, 2014

Simple custom CSS checkboxes with Font Awesome

Browsers make it notoriously hard to modify the default form elements like dropdowns and checkboxes. For ages, the only way to add a custom style was to use images and CSS Checkbox gives a very good example of this. However, using images is more of a hack than anything else, developers usually prefer to avoid them for good reason: hard to modify, adds to the loading time, flicker when hovering unless you preload or you use sprites, and resolution issues when scaling.

Then came the Javascript solutions. I personally like Chosen for custom dropdowns. These solutions are usually well supported cross-browser and are highly customizable and reliable, but they have the downside of being… Javascript. This means additional libraries, loading time, etc.

The holy grail is to use native CSS: you can customize it as you like, it can be built on top of your site’s stylesheet, inheriting the colours, the fonts, etc. There are a lot of examples out there that will show you working solutions, but they are usually complex. Here is a quick tutorial:

The basics

The concept is to define a custom element that will have a different style based on the state of the input element. No javascript and barely any extra markup. We will hide the original input, but it will continue to work as intended.

Base HTML

<label>
  <input type="checkbox">
  <span class="icon"></span>
  My label
</label>

Base CSS

input {
  display: none;
}
.icon {
  visibility: hidden;
}
input:checked + .icon {
  visibility: visible;
}

This way, the input is always hidden and the .icon is hidden only when the input is not checked.

** The :checked selector is not available for IE8 and lower.

Working demo

For a basic demonstration: http://codepen.io/lavoiesl/pen/rjcIx. The CSS is divided in section to clearly see what is essential. This demo uses Font Awesome to have a nice and clean checkmark.

For more complex examples, see http://cssdeck.com/labs/css-checkbox-styles.

Wednesday, April 9, 2014

Secure your SSL/TLS server

Heartbleed

Recently the Heartbleed bug came to light. It is a bug in the OpenSSL library that causes information to leak from the server. It is an undetectable backdoor that allows to gain the private key of your server. Let’s just say it is VERY important to fix it. Most distros have been very quick to propagate the OpenSSL update, so running your favorite update manager should fix it in no time.

To verify if you have protected, run this command and check for built on to be greater or equal to April 7th, 2014:
$ openssl version -a

OpenSSL 1.0.1e 11 Feb 2013
built on: Mon Apr  7 20:33:19 UTC 2014
platform: debian-amd64

Disable weak ciphers

The way SSL/TLS works is that the client and the server must agree on a cipher to use for encryption. If you were to attack a server, you would obviously use the least secure cipher. To protect against this, simply disable ciphers to be known as weak or those which flaws have been discovered.

I am using this configuration for Apache:

SSLCipherSuite ALL:!ADH:!AECDH:RC4+RSA:+HIGH:+MEDIUM:!LOW:!SSLv2:!EXPORT

For Nginx, see their configuration reference. Since 1.0.5, they are using a sensible default. Otherwise, you can use the same as above.

Do not use a too weak or too strong private key

The private key must never be discovered. Otherwise, anyone could decrypt the content and could perpetrate a MITM attack. If the private key is too weak, it could eventually be guessed given enough data. However, SSL/TLS handshakes are very CPU intensive for both the server and the client. Using a key too long will considerably slow down your website. In most cases, 2048 is perfect.

Test your own server

SSL Labs provides a free test suite that will test your ciphers and for known attacks including BEAST and Heartbleed. This is a must: https://www.ssllabs.com/ssltest/

Further reading

I am not a security expert, I simply happen to have done hosting for quite a time. I suggest you do not take my word blindly and go check this very pertinent paper from SSL Labs.


Thursday, January 30, 2014

Is Bitcoin mining profitable ?

I thought I would go for a change and talk about some Bitcoin mining. There have been some big hype around it. Some large companies are making top dollar with it. One of them, CoinTerra, has gained a very serious reputation with a team of highly skilled people.

Their latest technology, the TerraMiner IV, offers an astonishing 2 TH/s for about 6k USD. If you go at the Bitcoin wisdom calculator and select it using default settings, it shows potential profit of about 10,000 USD in about 8 months, Awesome !

The catch ?

Why are they selling money-making machines ? Why don’t they run them themselves ? Well, the bulk of the profit is made in the first few weeks. Due to the increasing difficulty in mining, hardware quickly becomes less effective so they trick into pre-order 2-3 months before you will have your machine. That is pretty insane…

If you go at the the calculator and input real, plausible shipping dates, you will see you cannot make money.

The options

I did every calculations with 25% difficulty increase and 0.05 USD/kWh. This is optimistic, it is usually considered that the increase is more around 30-35 these days. This also excludes any taxes, shipping fees, customs, etc.

I selected those 4 because it covers a large variety of hardware people often point as the most cost-effective or appealing. I encourage you to try your own.

Profit is calculated until 2015-01-01 or end of lease.

Company Gear GHs Price Start Profit
CoinTerra TerraMiner IV 2,000 6,000 $ 2014-05-15 -2,500 $
ButterflyLabs Monarch 600 2,200 $ 2014-03-01 -1,300 $
Black Arrow Rent-Some-Minions 4 months 1000 2,600 $ 2014-05-01 -800 $
Black Arrow Rent-Some-Minions 6 months 100,000 260,000 $ 2014-05-01 -70,000 $

For the infamous Monarch, the date is highly speculative due to their poor reputation. For it to be profitable, it would need to ship, at my door, within 12 days from today. When is the last time you received a ~40 lbs package by mail in less than 2 weeks ? Always excluding all taxes and shipping fees.

Need I say more ?

A huge hardware seller, TerraHash, has closed, upsetting a lot of customers and ButterflyLabs keeps getting bad press, I do not think they value their customers much.

The increase in difficulty

Not only is the difficulty increasing too quickly for casual miners to follow, the increase is itself increasing ! You have to factor the reduction of days between each change.

I compiled the data from bitcoinwisdom as % difficulty increase per day . Moore’s law dictates a linear increase in computing technology of 100% per 2 years or 0.14% per day, but we have here an increase of 2-6% per day and it seems like it is increasing. It is most probably due to the increase in miners. The computing power for mining has not followed Moore’s law so far, but this because new technologies were used, not discovered


You can’t win, the amount of power required to follow is way more than the normal increase in technology.

Less expensive options, for fun

So you just want to have some fun ?

One of the most suggested hardware for beginners is the ASICMiner Block Erupter USB 300MH/s Sapphire. It is advertised you can get one for as low as 35 USD. Pretty cheap ? Well even at that price tag, you will never reclaim more than 2-3 dollars. More expensive options are similar.

Conclusion

If you are able to build awesome hardware or buy a lot and use it quickly, you can make 2-3 times your investment, but in probably 6 months, it will be worthless. It has been profitable in the past, but now it is just a crazy technological race. Maybe it will calm down and we will be able to mine a bit some time. In the mean time, grab yourself a starter kit and join the party !