
Monday, December 1, 2014

MySQL 5.7 and Wordpress problem

If you upgrade to MySQL 5.7, you may encounter bugs with legacy software. Wordpress, which I also consider a kind of legacy software, does not handle it very well with its default settings.

You may encounter the "Submit for review" bug where you cannot add new posts. It may be related to permissions, auto_increment and other things, but here is another cause: bad date formats and otherwise invalid data.

In MySQL <= 5.6, by default, invalid values are coerced into valid ones when needed. For example, attempting to insert NULL into a NOT NULL string column results in an empty string. Starting with MySQL 5.7, strict mode is on by default and this is no longer permitted.
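For instance, here is the kind of statement that starts failing under 5.7 (a sketch; the database and table names are made up):

# 5.6 default: the invalid date is coerced to '0000-00-00' with a warning
# 5.7 strict:  ERROR 1292 (22007): Incorrect date value: '2014-02-30'
mysql -e "CREATE TABLE test.t (d DATE); INSERT INTO test.t VALUES ('2014-02-30');"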

Hence, if you want to upgrade to 5.7 and use all the goodies, you should consider putting it in a more compatible mode by adding this to your /etc/my.cnf:

[mysqld]
# Default:
# sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION 
sql_mode = ALLOW_INVALID_DATES,NO_ENGINE_SUBSTITUTION
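After restarting MySQL, you can confirm which mode is active:

mysql -e "SELECT @@GLOBAL.sql_mode;"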

See the official documentation for complete information.

Friday, June 20, 2014

Adding newlines in a SQL mysqldump to split extended inserts

The official mysqldump supports more or less two output styles: separate INSERTs (one INSERT statement per row) or extended INSERTs (one INSERT per table). Extended INSERTs are much faster, but MySQL writes them all on one line, making the resulting SQL very hard to read. Can we get the best of both worlds?

Separate INSERTs

INSERT INTO mytable (id) VALUES (1);
INSERT INTO mytable (id) VALUES (2);

Extended INSERTs

INSERT INTO mytable (id) VALUES (1),(2);

New-And-Improved INSERTs

INSERT INTO mytable (id) VALUES
(1),
(2);

Current solutions

Using sed

mysqldump --extended-insert | sed 's/),(/),\n(/g'

The only problem is that lines will be split even in the middle of strings, altering your data.

Using net_buffer_length

mysqldump --extended-insert --net_buffer_length=5000

mysqldump will make sure lines are not longer than 5000 bytes (or whatever you set), starting a new INSERT when needed. The problem is that the behaviour is somewhat unpredictable, diffs are hard to analyze, and it may break your data if you are storing values longer than this.

Writing a parser

This question has often been asked without a proper reply, so I decided to write a simple parser. Specifically, we need to track quotes, parentheses and escape characters.

I first wrote it in PHP, but then I realized it was too slow, so I rewrote it in C, using strcspn to find the next occurrence of a special character.

The only flaw I can think of is that the parser will fail if the 10001st character of a line is an escaped quote: the escape character falls at the end of one buffer and the quote at the beginning of the next, so it is seen as an unescaped quote.

Happy dumping!

Friday, December 6, 2013

GlusterFS performance on different frameworks

A couple of months ago, I did a comparison of different distributed filesystems. It came out that GlusterFS was the easiest and most feature-full, but it was slow. Since I would really like to use it, I decided to give it another chance. Instead of doing raw benchmarks with sysbench, I decided to stress test a basic installation of the three PHP frameworks/CMS I use the most, using siege.

My test environment:

  • MacBook Pro (Late 2013, Retina, i7 2.66 GHz)
  • PCIe-based Flash Storage
  • 2-4 virtual machines using VMware Fusion 4, each with 2 GB of RAM
  • Ubuntu 13.10 server edition with PHP 5.5 and OPcache enabled
  • GlusterFS running on all VMs with a volume in replica mode
  • The volume was mounted with nodiratime,noatime using the GlusterFS native driver (NFS was slower)
The test:
  1. siege -c 20 -r 5 http://localhost/foo  # Cache warming
  2. siege -c 20 -r 100 http://localhost/foo  # Actual test
I then compared the local filesystem (inside the VM) vs the Gluster volume using these setups:
  • 2 nodes, 4 cores per node
  • 2 nodes, 2 cores per node
  • 4 nodes, 2 cores per node
The compared value is the total time to serve 20 x 100 requests in parallel.
All tests were run 2-3 times while my computer was doing nothing else, and the results were very consistent.

Total time to serve 20 x 100 requests:

                      Symfony   Wordpress   Drupal    Average
2 nodes, 4 cores
  Local                2.91 s     9.92 s     5.39 s    6.07 s
  Gluster             10.84 s    23.94 s     7.81 s   14.20 s
2 nodes, 2 cores
  Local                5.41 s    19.14 s     9.67 s   11.41 s
  Gluster             25.05 s    31.91 s    15.17 s   24.04 s
4 nodes, 2 cores
  Local                5.57 s    19.60 s     9.79 s   11.65 s
  Gluster             30.56 s    35.92 s    18.36 s   28.28 s

Local vs Gluster (slowdown)
  2 nodes, 4 cores     273 %      141 %      45 %     153 %
  2 nodes, 2 cores     363 %       67 %      57 %     162 %
  4 nodes, 2 cores     449 %       83 %      88 %     206 %
  Average              361 %       97 %      63 %     174 %

2 nodes vs 4 nodes (slowdown)
  Local                  3 %        2 %       1 %       2 %
  Gluster               22 %       13 %      21 %      19 %

4 cores vs 2 cores (speedup)
  Local                 86 %       93 %      79 %      86 %
  Gluster              131 %       33 %      94 %      86 %


Observations:
  1. Red — Wordpress and Drupal take an acceptable performance hit under Gluster, but Symfony's is catastrophic.
  2. Blue — The local tests are slightly slower when using 4 nodes vs 2 nodes. This is normal: my computer had 4 VMs running.
  3. Green — The Gluster tests are 20% slower on a 4-node setup because there is more communication between the nodes to keep them all in sync. A 20% overhead for double the nodes isn’t that bad.
  4. Purple — The local tests are 85% quicker using 4 cores vs 2 cores. A bit under 100% is normal; there is always some overhead to parallel processing.
  5. Yellow — For the Gluster tests, Symfony and Drupal scale very well with the number of nodes, but Wordpress is stalling; I am not sure why.

I am still not sure why Symfony is so much slower on GlusterFS, and really, I can’t use it in production for the moment because I/O is already the weak point of my infrastructure. I am in the process of looking for a different hosting solution; maybe it will be better then.

Saturday, August 31, 2013

Service management utility for Mac OSX (launchctl helper)

Having dealt with services mostly on Linux, I grew accustomed to typing service php restart. On Mac, this is more like:

launchctl unload ~/Library/LaunchAgents/homebrew-php.josegonzalez.php55.plist
launchctl load ~/Library/LaunchAgents/homebrew-php.josegonzalez.php55.plist


This is ugly and hard to remember, and launchctl has no way of listing all available services. Plus, those plists can reside in any of these directories:
  • /System/Library/LaunchDaemons
  • /System/Library/LaunchAgents
  • /Library/LaunchDaemons
  • /Library/LaunchAgents
  • ~/Library/LaunchAgents
Those in your home directory generally don’t need sudo, while the others do.

This is why I came up with a utility to manage services. It searches all the directories above for your service, prompts for sudo if it is in a system directory and provides goodies like restart, reload and link.

Usage:

service selfupdate
    Updates the utility from the Gist.

service php
    Searches for a plist containing 'php'.

service php load|unload|reload
    Inserts or removes a plist from launchctl.

service php start|stop|restart
    Manages a daemon, but leaves it in launchctl (does not work with Agents).

service php link
    If you use Homebrew, which you should, links the plist of this Formula into ~/Library/LaunchAgents, reloading if needed. Very useful when upgrading.
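Under the hood, the idea is simply to look for a matching plist in every launchd directory. A minimal sketch (the real utility does more, like prompting for sudo):

#!/bin/bash
# Find plists matching the first argument in all launchd directories
query="$1"
for dir in /System/Library/LaunchDaemons /System/Library/LaunchAgents \
           /Library/LaunchDaemons /Library/LaunchAgents ~/Library/LaunchAgents; do
  # -maxdepth 1: plists live directly in these folders
  find "$dir" -maxdepth 1 -iname "*${query}*.plist" 2>/dev/null
done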


Manage all optional services at once

If you have several services running, especially if you are a developer, I also recommend using a script to start/stop all of them at once when you are not working. They may not be using many resources, but having them running keeps the laptop busy and can drain your battery very quickly.

Saturday, April 13, 2013

LAMP Cluster — Distributed filesystem

This post is part of: Guide to replicated LAMP stack hosting with failover


The core concept of choosing a filesystem for a Web hosting cluster is to eliminate single points of failure, but sometimes it is just not that easy. A true distributed system still needs to be performant, at least on reads. The problem lies in the fact that the bottleneck is very often I/O, so if your filesystem is not performant, you will end up spending a fortune on scaling without gaining real performance.

Making priorities

You can’t have everything, so start by making a list of priorities. Different systems will have different needs, but I figured I could afford a possibility of failure as long as the system could be restored, since I would be keeping periodic backups.
  1. Low maintenance
    • It must be possible to read/write from any folder without adding a manifest for each site.
    • The system must be completely autonomous and require no maintenance from a sysadmin. (Conflict management).
  2. Simple / Cheap
    • Must be installed on each Web node, or require at most 2 small/medium extra nodes
    • Must run on Ubuntu, without recompiling the kernel. Kernel modules are acceptable.
  3. Performant
    • Reads less than 50% slower than standard ext3 reads.
    • Writes less than 80% slower than standard ext3 writes.
    • Must be good at handling a lot of small files. Currently, my server hosts 470k files for a total of 6.8 GB. That is an average of 15 KB per file!
  4. Consistency
    • Changes must propagate to all servers within 5 seconds.
    • Uploaded files stored in the database but not yet synced may generate some errors for a short period if viewed by other users on other servers.
    • Temporary files are only relevant on the local machine, so a delay is not a big deal.
    • HTTP sessions will be sticky at the load balancer level, so user-specific information will be handled properly.
  5. Must handle ACLs
    • For permissions to be set perfectly, we will be using ACLs.
    • ACLs may not be readable within the Web node, but they must still be enforced.
  6. Durability
    • Must handle filesystem failures — be repairable very quickly.
    • File losses are acceptable in the event of a filesystem failure.
    • Filesystem must continue to function even if a Web node goes offline.
    • No single point of failure. If there is one, it must be isolated on its own machine.

A. Synchronisation

Synchronisation means that there is no real filesystem solution: all the files are stored on the local filesystem, and synchronisation with the other nodes happens periodically or by watching I/O events.

Cluster synchronisation involving replication between all the nodes is usually very hard. To improve performance and reduce the risk of conflicts, it is often a good idea to elect a replication leader and a backup. If the leader is unavailable, the backup will be used instead. This way, all the nodes will sync with only one.
  • Pros
    • Very fast read/write
    • Very simple to setup
  • Cons
    • May have troubles synchronizing ACLs
    • May generate a lot of I/O
    • Will most likely generate conflicts

Rsync

The typical tool for fast file syncing is rsync. It is highly reliable, and a bit of Bash scripting will get you started. However, as the number of files grows, it may become slow. For around a million files, it may easily take over 5 seconds. With our needs, this means it would have to run continuously, which would generate a lot of I/O and thus impact overall performance.
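For reference, a minimal sketch of such a continuous sync loop (hypothetical paths and host; a real setup needs locking and error handling):

#!/bin/bash
# Push local changes to the elected leader every few seconds
while true; do
  # -a: preserve permissions/times, -z: compress, --delete: propagate deletions
  rsync -az --delete /srv/www/ leader:/srv/www/
  sleep 5
done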

Csync2

Csync2 is a promising tool that works like rsync, but it keeps file hints in a SQLite database. When a file changes, it flags in the database that the file needs checking. This way, the full sync only needs to check marked files.

Csync2 supports multi-master replication and slaves (receive-only). However, I found while testing that it is not really suited to a lot of small files changing frequently: it tends to generate a lot of conflicts that need to be attended to manually.

It may not be the best solution for Web hosting, but for managing deployment of libraries or similar tasks, it would be awesome.

B. Simple sharing (NFS)

Even simpler than file syncing is plain old sharing. A node is responsible for hosting the files and serves them directly. Windows uses Samba/CIFS, Mac uses AFP and Linux uses NFS.

NFS is very old, like 1989 old. Even the latest version, NFSv4, came around in 2000. This means it is very stable and very good at what it does.
  • Pros
    • Supports ACLs (NFSv4)
    • Very cheap and simple setup
    • Up to a certain scale, fast read/write
  • Cons
    • Single point of failure
    • Hard to setup proper failover
    • Not scalable
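Since NFS is where I eventually landed (see the conclusion below), here is a minimal sketch of a setup (hypothetical path, host and subnet):

# /etc/exports on the file server; apply with: exportfs -ra
/srv/www 10.0.0.0/24(rw,async,no_subtree_check)

# On each Web node:
mount -t nfs filer:/srv/www /srv/www -o noatime,nodiratime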

C. Distributed / Replicated

A distributed filesystem may operate at the device, block or inode level. You can think of it a bit like a database cluster. It usually involves journals and is the most advanced solution.

  • Pros
    • Very robust
    • Scalable
  • Cons
    • Writes are often painfully slow
    • Reads can also be slow
    • Often complex to setup

GlusterFS

Gluster runs over FUSE and can also be mounted via NFS. Each node can have its own brick (storage directory) and the daemon handles replication transparently, without the need for a management node.

Overall, it is very good software: write performance is decent and it handles failures quite well. There has been a lot of recent work to improve caching, async writes, write-ahead, etc. However, in my experience, the read performance is disastrous. I really tried tuning it a lot, but I still feel like I haven’t found the true potential of this.

Ultimately, I had to set it aside for the moment for lack of time to tune it further. It has a large community and is widely deployed, so I will probably end up giving it another chance.

Lustre

Lustre seems like the Holy Grail of distributed filesystems. From Wikipedia: “At the present time, six of the top 10 and more than 60 of the top 100 supercomputers in the world have Lustre file systems in them.”

It appears to have everything I could dream of: speed, scalability, locks, ACLs, you name it. 

However, I was never able to try it. It requires dedicated machines with various roles: management, data, file servers (API). This means I would need 4-5 additional machines. On top of that, it needs custom kernel modules.

Definitely on my wish-list, but inaccessible for the moment.

DRBD

DRBD is not a cluster solution; it does live replication. Usually, it is used to make a full mirror of a server that can be swapped with the master at any moment, should it fail. This is often used to patch solutions where replication is not built-in, such as NFS or MySQL. There is a way to set up a 3-node solution, but it is far from perfect.

Conclusion

In the end, I found that synced solutions were not reliable enough and distributed solutions were too complex, so I chose NFS. My plan is to add DRBD soon to provide a durability layer, but a more serious solution will have to wait. If my cluster scales to the point where NFS is not up to the task, it will mean I have enough clients, enough money and enough reasons to consider a proper solution.


Comparison
            Maintenance  Complexity  Performance  Scalability  Durability  Consistency  ACLs
Rsync       Low          Low         Very high    Low          High        Low          Yes
Csync2      High         Medium      Very high    Low          High        Low          Yes
NFS         None         None        Medium       None         None       Very high    Enforced
GlusterFS   None         Medium      Low          High         High       Very high    Yes
Lustre      None         Very high   High         Very high    Very high  Very high    Yes
DRBD        None         Medium      n/a          2 or 3       Very high  n/a          Yes



LAMP Cluster — Choosing an Operating System

This post is part of: Guide to replicated LAMP stack hosting with failover


Besides choosing Linux vs Mac or Windows, the OS should not impact your users; it is mostly a sysadmin choice. Your users, the ones who will be connecting via SSH, will expect binaries to be available without modifying their PATH and common tools like Git or SVN to be already installed, but it does not really matter how they were installed.

The key to making sure that nobody has a hard time getting everything to work is to do things in the most standard and common way possible.

Choose between the most used distributions

This is really important. Choosing a distribution for your laptop or your development server is not the same thing as choosing a production environment. Forget Gentoo and friends; being connected directly to the bare bones of your system is nice when you are learning or building a world-class new system, but for your own setup, you want something tested by the whole community, something that works. Even if it involves a bit of magic.

A good example of some magic is what Ubuntu does with networking. I admit that since 10.x, I don’t really understand all the cooperation between /etc/resolv.conf, /etc/network/interfaces, dhclient, /etc/init.d/networking and such. At some point, they all seem to redefine each other and in a particular release, a script will start to throw some warnings, but it works. Never has the network failed me on Ubuntu, which is something quite relevant when you need to access a remote machine.

Edge vs stable

I use the infamous expression “Debian stable” when I want to refer to something configured in such a conservative way that it will work, but at the cost of using the technology of 2005. I know, Debian stable is not that bad, but I tend to have some faith in the testing procedures of the maintainers.

My rule of thumb is: when a stable version is available, use it. If I want the features of a less stable version, I make sure it has lived for at least a month and do a quick search on its stability and trustworthiness.

An example of this is that I don’t restrict myself to Ubuntu Long Term Support editions. I am happy to use them and will usually keep using them for a bit longer than the other releases, but they only come every 2 years, and sometimes this is not enough. Moreover, I tend to upgrade or reinstall every year or two, so I don’t hit the end-of-support limit.

Here are some of the top distributions, ordered by edginess:

Versions available
Distribution              Apache   PHP     MySQL    Varnish
Ubuntu 12.10              2.2.22   5.4.6   5.5.29   3.0.2
Debian wheezy             2.2.22   5.4.4   5.5.28   3.0.2
OpenSuse 12.3             2.2.22   5.3.17  5.5.30   3.0.3
Ubuntu 12.04 LTS          2.2.22   5.3.10  5.5.29   3.0.2
Debian squeeze (stable)   2.2.16   5.3.3   5.1.66   2.1.3
CentOS 6                  2.2.15   5.3.3   5.1.66   manual

PHP 5.3.3, our main concern, was released in July 2010 and important fixes have occurred since, so this is out of the question.

Varnish 2 is very different from Varnish 3, so this needs to be looked at.

It is usually possible to install newer versions, but this implies relying on third-party packaging, multiple installed binaries or even compiling yourself.

Forget benchmarks

Linux is a very low-footprint system; a common mistake is trying to over-optimize it. What will eat your CPU is PHP and MySQL, what will eat your memory is MySQL, and the number of connections you can handle mostly depends on your webserver. If your machine is spending too much time in kernel space, it is probably because you need a bigger one. Also, don’t forget to benchmark your disks. See my post on choosing hardware.

Conclusion

Your choice must be focused on stability, ease-of-use and community size. 

Personally, I prefer Debian-based solutions. Aptitude works very well and I just happen to have been more in contact with it.

All in all, I went for Ubuntu 12.10: PHP 5.4 offers really good performance improvements over 5.3, and it has been long enough since this release of Ubuntu came out.

Thursday, March 21, 2013

LAMP Cluster — Comparison of different hosting platforms

This post is part of: Guide to replicated LAMP stack hosting with failover

The first step in building a hosting service is to choose your provider. Each system has its strengths and weaknesses, and you will have to choose according to your needs and your proficiency with the provided tools. This step is crucial and, if possible, you should spend some time testing and benchmarking each of them to see if it matches your expectations.

DISCLAIMER: Below, I sometimes mention prices; they are meant as an indication rather than a real comparison. Comparing different services can get very tricky as they don’t include the same things, and their performance is difficult to compare effectively.

Platform as a service

This is basically what you want to build. Another company will provide you some services in the cloud and you will configure your applications to use them. Here, all the scalability and redundancy is done for you and you will only pay for what you use.

The thing is, though, you have absolutely no control over the Operating System and you are bound to the services the platform provides you.

This guide is all about building it, so these are more here as a comparison basis than actual alternatives.

Google App Engine


Typically, GAE runs Java and Python. It provides PHP through an emulation layer and some SQL, but there is no MySQL. Some quick research turned up people discussing it: Wordpress and Drupal. Short answer: no MySQL, can’t be done.

However, if you want to host some Django on it, please do!

Windows Azure

This does almost everything you need, they have a really wide array of services. If you need something else, you can deploy a custom VM and do what you want. This is perfect for prototyping.

However, fiddling with their price calculator, I found that it can quickly become expensive; they charge for almost everything. Yes, they have PHP/MySQL support, but the idea is to achieve some economies of scale. I did an estimate:
  • 4 small Web and Worker instances
  • 1 small Linux Virtual Machine (for testing, management, etc.)
  • 10 x 100 MB Databases
  • 100 GB bandwidth
  • 100 GB storage
This is rather conservative; you will probably need way more than 4 small instances. The only thing is, it comes to almost 500 $ per month. Hardly a bargain.

Heroku

Heroku is mostly known for its Ruby support, but it has a very wide array of add-ons; it even has MySQL support through ClearDB. Their prices tend to be lower than Azure and it is closer to open source initiatives, which I tend to use a lot. I have never actually used Heroku, so I can’t really approximate what I would need, but the sheer amount of possible configurations is incredible.

A big plus is also the deployment procedure, which is backed by Git. It involves describing a project with a configuration file and simply pushing to Heroku. There are quite a lot of examples out there on how to deploy Wordpress, Drupal, etc. If I wasn’t trying to build an infrastructure myself, I would definitely consider it strongly.

Virtual machines on physical hardware

If you plan for the long term, investing in hardware might be a good idea. Hardware is way less expensive, and providers like iWeb tend to give a lot of bandwidth (if not unlimited). Upgrades are usually way less expensive too, but they involve downtime and risk.

You still need some virtualization

For ease of management, you will almost certainly want a virtualization solution: this way you can create, back up, scale and migrate virtual machines in only a couple of steps. Among the most popular solutions, OpenStack is free and open source, while VMware has a very good reputation with vCenter. The downside is that it is yet another thing to configure.

You still need multiple servers

If you go with physical machines, you will need RAID and such, but that still means downtime when something breaks. To reduce the risks, you will need a second or third machine to provide some backup. Really, managing physical hardware is an art all by itself; if you wish to provide good quality Web hosting, you will need someone specialized in the matter.

Why not let a third party do all this for you?

Virtual private servers (VPS)

We want full control over the system, scalability, virtualization management, etc. So it all comes down to a nice in-the-middle solution: virtual machines provided by a third party. Here, all the hard stuff is already done, you will most certainly have multiple locations in the world to choose from, and you can usually trust the hardware to not fail completely. Sometimes there is downtime, but losing data is extremely rare.

Below are multiple choices I know of, but I suggest you try FindTheBest for a more thorough comparison.

For the setup I will be talking about in another post, we need this setup:
  • 1 small/medium management node
  • 3 medium/large working nodes
  • 2 small/medium utility nodes
  • 1 small dev node

Amazon Web Services (AWS)

I have been a client of Amazon EC2 for more than two years. They offer a wide array of services:
  • Virtual machines (EC2)
  • DNS services (Route53)
  • Load balancing + Auto scaling
  • Dedicated databases with automatic fallback (RDS)
  • High performance I/O (EBS)
  • Low performance, high durability I/O (S3)
  • CDN (CloudFront)
  • Highly configurable firewall
  • And much much more
A lot of websites are running on Amazon services. The problem is, it is expensive and it is built for computing, not Web hosting. This means it is perfect for rather short bursts of computing, like crunching data, but it becomes expensive if it is online all the time. Also, with the concept of pay-per-use, everything you do will end up costing you something, which can build up rather quickly. Over the last two years, the performance has been going downhill, but recently they have been lowering their prices, so it might be becoming a better alternative.

Here is an example using their calculator. (333 $/month)

Google Compute Engine (GCE)

Google also has a service that is very similar to Amazon EC2, but with fewer options, and it seems to have a better performance/price ratio. I am not familiar with their services, but I thought it was worth mentioning.

Windows Azure

As mentioned above, Azure has virtual machines as well, but you can connect them with the rest of the platform so it can be a nice hybrid solution.

However, it is still pretty pricey. For our setup, 3 medium, 2 small and 2 x-small, we are already at 478 $/month — and no bandwidth or storage is included yet.

Linode

Linode has existed since 2003, but I only discovered it last year. They are growing rapidly, new features are coming in and the amount of included things goes up every month. What I like about Linode is that I feel like I am in total control of my machines.
  • Multiple availability zones (like most other providers)
  • Very easy permission management (you can give read-only access to your clients)
  • Very powerful admin panel.
  • Powerful recovery tools
    • Remote connection via SSH or in-browser to the host so you can rescue your VM while it boots
    • Possibility to switch kernels and reboot in rescue mode
    • Possibility to reset root password from admin panel
    • Possibility to rebuild from a backup or a fresh install without destroying the VM.
  • Inexpensive load balancers
  • Support for StackScripts, a way to run scripts while deploying a new VM
  • High class (free) support. From my experience, replies typically take 1-5 minutes!
  • Unlimited DNS zones
  • Very high transfer caps
  • Unmetered disk operations
  • Unmetered Gigabit in-zone data transfer
And they are on a rampage. They recently upgraded their network, and all VMs now have 8 cores. You may wonder how it is possible to have 8 cores on a small instance: it is actually the priority on those CPUs that scales, not their power. In other words, the higher your package, the more reliable its performance is.

Seriously, the more I work with Linode, the more it feels right; it just feels like they know their thing and do the best they can to give you everything they can.

Have a try, you can use a small instance for a month. Here is my referral link. I get 20$ if you buy something.

For a setup similar to the AWS detailed above, it boils down to around 220 $/month, but you have to build the database, memcache, CDN yourself.

Performance evaluation

Whatever provider you choose, be sure to test its performance. This is especially true for CPU and disks. The number of cores and their clock speed mean little to nothing. The best tool I found was SysBench. For disk operations specifically, you can choose various profiles like read-only, sequential read/write, random read/write, or specify a ratio of reads to writes.

When benchmarking for websites, you typically want a lot of small files (10kB - 1MB) that will be read sequentially and some big files (1MB-5MB) with a read/write ratio of about 95%.
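For example, something along these lines approximates that profile with sysbench’s fileio test (the sizes, duration and ratio are assumptions to tweak; 19 reads per write is roughly 95% reads):

# Prepare a set of test files, run a combined random read/write test, clean up
sysbench --test=fileio --file-total-size=2G prepare
sysbench --test=fileio --file-total-size=2G --file-test-mode=rndrw \
         --file-rw-ratio=19 --max-time=60 --max-requests=0 run
sysbench --test=fileio --file-total-size=2G cleanup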

Maybe more on that later.


Tuesday, March 12, 2013

Guide to replicated LAMP stack hosting with failover

Motivations on building on your own hosting

For my company, I started to think about offering a hosting service. Cheap solutions exist, like 1and1.com and iweb.com, where you can rent a VPS or have some shared hosting, but they imply some setup for each client, managing credentials, analyzing the needs of everyone, etc. What about upgrades? What about failovers? What about custom services like Lucene or Rails that could be running?

Defining the needs

Before trying to find solutions, let’s try to find the correct questions. What are we trying to accomplish exactly? These are generic needs; I will provide my answers, but at some point, I had to take some shortcuts to be able to complete it and meet some profitability requirements. If you have different priorities or you are working on a different scale, your answers will most probably differ at some point.

Scalable

Upgrades must be possible without any downtime. It must be easy so we can react in a matter of minutes to an emergency load or a crash. Also, we will spend quite some time configuring everything so we would like to keep it even if we triple our load. Ideally, we want to be able to scale both horizontally (adding more machines) and vertically (upgrading the machines). 

Highly Available

This means failovers, redundancy and stability. The key is load balancing, but we want to remove single points of failure as much as possible. If there are any, we want to have complete trust in them, and they should do the least possible and be as isolated as possible.

Secure

The purpose of this guide is not to build a banking system, but we still want to be secure. We want some strong password policies, firewalls and, most importantly, backups. The whole system should be re-installable in an hour or two if something major happens, and clients’ files and databases should be restorable to hourly, daily or weekly snapshots, or something along those lines.

Compatible and flexible

We will have almost no control over the applications, but we still want to standardize some key elements. For example, having 2 database systems could be acceptable, but running 2 different web servers is a bit overzealous. Some clients may also have particular needs like a search engine or cronjobs; we need to be ready.

Performant

Between scalability and availability, we often achieve performance, but only if each of the channels is independent. In general, websites do many more reads than writes, both in the database and on the filesystem. However, because we want to be compatible and application-agnostic, we won’t be able to resort to techniques like declaring a folder read-only or having a slave database. We may not be able to constrain applications, but we can reward those that are well configured: we can provide some opt-in features like reverse proxies, shared cache, temporary folders, etc.

Profitable

And last but not least, we like profits, so the whole system must have a predictable cost that can be forwarded to the appropriate client. Scalability plays a big role here because we can scale just as much as we need, when we need.

Overview

As I am starting to write this guide, the system is already operational and in production. I have already stumbled across many problems, but I am sure some others are still to come. Here is an overview of all the parts I want to address; links will become available as they are written. The considered options are also listed to give you an idea of where I am going with all this.
  1. Hosting platform
    • Cloud virtual machines
      • Linode
      • Amazon
      • Rackspace
    • Physical virtual machines
      • iWeb (I’m in Montreal, Canada)
    • Platform as a service
      • Windows Azure
  2. Linux
    • CentOS
    • Ubuntu Server
    • Debian
  3. Filesystem
    • Synchronisation
      • csync2
      • rsync
    • Distributed
      • GlusterFS
      • Lustre
      • DRBD
    • Shared
      • NFS
  4. Load balancer
    • Amazon / Linode Load balancer
    • HAProxy
    • Nginx
  5. Reverse proxy with caching
    • Nginx
    • Varnish
  6. Web server
    • Apache
      • 2.2 / 2.4
      • Prefork / Worker / Event
    • Nginx
  7. MySQL
    • MySQL Cluster
    • Master/Master replication
    • Master/Slave replication + mysqlnd_ms
    • Percona XtraDB Cluster + Galera
  8. PHP
    • 5.2 / 5.3 / 5.4
    • Apache module
    • PHP-FPM
  9. Configuration system
    • Puppet
    • Chef
    • Custom scripts
  10. Backups
    • Full machine backups
    • Rsync to remote machine
    • Tarballs
  11. Monitoring
    • Zabbix
    • Nagios
    • Ganglia


As you can see, there is a lot to talk about.

Tuesday, January 15, 2013

Generating a unique node id per hostname in Puppet

To generate a unique integer ID in Puppet, we can use the hostname and convert it to base 10 using an ERB inline template.
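A minimal sketch of the idea (the variable name is made up; it reads the hostname’s bytes as a hexadecimal string, then as a base-10 integer):

# Unique, stable integer per hostname
$node_id = inline_template("<%= @hostname.unpack('H*').first.to_i(16) %>")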


Friday, January 4, 2013

Fixing permissions using inotify and ACLs

I was working on a shared hosting project and I noticed the permissions were getting a bit complicated.

For each website:
  • Each user must have read/write permission
  • Each user's group must have read/write permission (reseller access)
  • Apache must have read access
  • Would be nice if admins had read/write as well
New files can appear at any time, created by all those users, and it would be nice to keep track of who created them. This is where ACLs kick in.

For the case above, you need this command:
setfacl -m 'u:www-data:r-X,g:example-group:rwX,g:sudo:rwX,o::---' "/srv/www/example.org/htdocs"
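Since we also want these rules applied to new files by default (the "default rules" mentioned below), the same mask can be set as a default ACL on the directory:

setfacl -d -m 'u:www-data:r-X,g:example-group:rwX,g:sudo:rwX,o::---' "/srv/www/example.org/htdocs"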

ACLs quirks

The problem with ACLs is that they are very fragile and a lot of programs don't propagate them, even if you specify default rules. For example, programs that move files instead of creating them will keep the ACL of the directory in which the file was originally created rather than that of the destination. Hence, if you untar a whole website into the directory, the default permissions won't be applied.

Running a script

You could use a cron to run a script that loops through all your websites and forces the permissions, but this is a slow process (around 2 minutes on my current setup with ~10 websites), there is a delay before it runs (15-30 minutes, depending on how you set it up), and it is very I/O intensive.

inotifywait

inotify is a library that can warn you when some event occurs on a file or in a folder (recursively). It can be very useful for various tasks like syncing files, doing backups, recording edit progress, etc. inotifywait is simply a program that waits until the event occurs, so you can execute what you want after.

Watching specifically for CREATE and MOVE events, we can fix the permissions of only the files we need.

The ATTRIB event (chown, chmod, setfacl) was intentionally left out to avoid loops, but it could probably be worked around.

Included below is an example of watching two folders and the corresponding init script. Tested on Ubuntu. 
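A minimal sketch of the watch loop (paths and ACL rules reused from above; the init script that daemonizes it is omitted):

#!/bin/bash
# Watch web roots and fix ACLs on anything created or moved in
WATCHED=(/srv/www/example.org/htdocs /srv/www/example.com/htdocs)
ACL='u:www-data:r-X,g:example-group:rwX,g:sudo:rwX,o::---'

inotifywait -m -r -e create -e moved_to --format '%w%f' "${WATCHED[@]}" |
while read -r file; do
  setfacl -m "$ACL" "$file"
done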


Wednesday, December 26, 2012

Run a script with lowest priority

When doing low-priority tasks like backups or fixing permissions in a cronjob, it is a good idea to modify the niceness of the script. By using ionice and renice, you can ensure it won't get in the way of your important programs.
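For example (the backup script path is hypothetical), either wrap the command or demote the running script from within:

# Wrap: lowest CPU priority (19) and idle I/O class (-c 3)
nice -n 19 ionice -c 3 /usr/local/bin/backup.sh

# Or, from inside the script itself, demote the current shell and its children:
renice -n 19 -p $$ > /dev/null
ionice -c 3 -p $$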

Friday, December 21, 2012

Choosing the best wireless router for a small office

At my office, we needed a new router. With more than 50 devices using the network quite heavily, we were crushing our basic consumer router.

After some unsuccessful searches, I asked the question on Reddit and I worked from there.

Our setup:
  • 20~25 wired computers
  • 20~25 wireless computers
  • ~25 smartphones
Our needs:
  • ~500 GB download/month
  • ~300 GB upload/month
  • Google Apps (Docs, Spreadsheet, Gmail, etc.) – A lot of AJAX requests
  • SSH – Needs very low latency
  • Skype – High bandwidth usage
  • Dropbox – Moderate bandwidth usage
  • Sending and receiving >5GB files
  • Syncing thousands of files over FTP
We had these routers:
  • A D-Link DIR-655 acting as the gateway: http://goo.gl/TXXs0
  • A WRT54 acting as a router and a wifi AP, configured with DD-WRT: http://goo.gl/Wu89u
Unfortunately, it seemed like the network was always slowing down for no reason and we needed to restart the router a couple of times per day. We tried flashing the routers, switching them around and applying updates; they just couldn’t support the load.

I knew I wanted some sort of QoS to prioritize usage (ex: Skype > SSH > browsers > FTP). Also, at our location, good quality connections at a decent price are pretty hard to find, so I was looking forward to a dual-WAN setup.

Upload speed

When you are trying to improve your Internet connection for a lot of people, it will most likely be your upload speed that is the bottleneck. For 30 heavy users, try to have at least 5 Mbps of upload.

Network link speed negotiation

Routers often use a non-standard algorithm to detect the speed of the link between the router and the modem (10, 100 or 1000 Mbps), and sometimes they can’t get it right. Fail to detect the correct speed and you may end up talking at 10 Mbps to a modem trying to work at 60 Mbps. More often than not, you want this to be set at 100 Mbps.

AP

The router I chose has wireless antennas, but if you put 30 people on the same antennas, you will experience some slowdowns. Consider installing some access points around the place. You will have to wire them together, but it is much simpler than wiring everyone.

Router

I had heard good comments about Draytek, and Reddit confirmed them. They are pretty solid, easy to configure and a much cheaper alternative to their Cisco counterparts. I finally went for a Draytek 2920n. It is about 250 $, supports balancing and failover with a second Internet connection, and is managed in about the same fashion as a common router. The big difference is that it is much more powerful and can handle the traffic of a hundred devices without struggling. It was a godsend; we didn’t even need to upgrade the Internet connection. Check it out.

Other options

As @devopstom suggested to me, other great alternatives are Peplink, Netgear, HP and Meraki.

Meraki is especially cool with its remote management, very easy administration and automatic updates, but they were a bit too expensive for our needs.

Monday, December 17, 2012

Varnish 3 configuration

Shared caches are pretty strict when it comes to deciding whether or not to cache a request. This is good; otherwise they would cache requests that are not meant to be cached.

However, there are some things that can be done to improve cacheability:

  1. Ignore Google Analytics cookies
  2. Remove empty Cookie line
  3. Normalize Accept-Encoding header
  4. Allow replying with a stale response if the backend is slow
Bonus:
  1. Remove some headers to reduce header size and hide some details about the server (security).
  2. Add a debug header to help understand why a request is cached or not.
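A minimal Varnish 3 VCL sketch covering these points (not the full original config; adapt before use):

sub vcl_recv {
  if (req.http.Cookie) {
    # 1. Strip Google Analytics cookies (__utma, __utmb, ...)
    set req.http.Cookie = regsuball(req.http.Cookie, "__utm[a-z]=[^;]+(; )?", "");
    # 2. Remove the Cookie header entirely if nothing is left
    if (req.http.Cookie ~ "^ *$") {
      unset req.http.Cookie;
    }
  }
  # 3. Normalize Accept-Encoding so equivalent clients share one cache entry
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      unset req.http.Accept-Encoding;
    }
  }
  # 4. Accept serving objects up to 1h past their TTL if the backend is slow
  set req.grace = 1h;
}

sub vcl_fetch {
  # 4. Keep expired objects around so grace mode has something to serve
  set beresp.grace = 1h;
}

sub vcl_deliver {
  # Bonus 2: debug header showing whether the object came from cache
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }
}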


Thursday, November 1, 2012

Simple templating system using Bash

I often have to deploy several config files that are very similar, things like Apache VirtualHosts and PHP-FPM pools. The solution to this kind of problem is to use something like Puppet or Chef, which will apply a real template engine and do much more, like creating folders and such. However, this kind of solution is often lengthy to implement and prevents you from doing some quick editing on-the-fly.

Hence, for very simple needs, I started using simple scripts that would only replace variables and give me a basic template to start with. This is, however, not very flexible and needs to be adapted for each case. So I wrote a templater that replaces variables with values from the environment. It also supports defining default values and variable interpolation.

Example with Apache + FPM

{{LOG_DIR=/var/log/apache2}}
{{RUN_DIR=/var/run/php-fpm}}
{{FCGI=$RUN_DIR/$DOMAIN.fcgi}}
{{SOCKET=$RUN_DIR/$DOMAIN.sock}}
{{EMAIL=$USER@$DOMAIN}}
{{DOC_ROOT=/home/$USER/sites/$DOMAIN/htdocs}}
<VirtualHost *:80>
  ServerAdmin {{EMAIL}}
  ServerName {{DOMAIN}}
  ServerAlias www.{{DOMAIN}}

  DocumentRoot "{{DOC_ROOT}}"

  <Directory "{{DOC_ROOT}}">
    AllowOverride All
    Order allow,deny
    Allow From All
  </Directory>

  AddHandler php-script .php
  Action php-script /php5.fastcgi virtual
  Alias /php5.fastcgi {{FCGI}}
  FastCGIExternalServer {{FCGI}} -socket {{SOCKET}}

  LogLevel warn
  CustomLog {{LOG_DIR}}/{{DOMAIN}}.access.log combined
  ErrorLog {{LOG_DIR}}/{{DOMAIN}}.error.log
</VirtualHost>

Invocation

DOMAIN=cslavoie.com ./templater.sh examples/vhost-php.conf

Help

If you add the -h switch to the invocation, it will print all the variables and their current values.

And the code is available on GitHub.

Wednesday, October 10, 2012

Batch update modules or themes of Drupal 6/7 in command line

To update Drupal modules, you need to manually download all modules, and this can quickly become tedious, especially if you have multiple modules and Drupal installations.

I wrote a simple script that parses the module's page and installs or updates the most up-to-date version.

This works for themes as well. To use it, simply go into a modules or themes folder and run one of the tools.

Valid folders are:
  • /modules
  • /themes
  • /sites/all/modules
  • /sites/all/themes
  • /sites/*/modules
  • /sites/*/themes
However, note that you should not try to update the root folders; they are reserved for core modules and are updated with Drupal.

drupal-install-module.sh will install or update one module; drupal-update-modules.sh will batch update all modules in the current folder.

Tuesday, October 9, 2012

Verifying DNS propagation

When changing DNS settings, propagation can take from 15 minutes to two days. Clients and bosses are usually not very fond of this, so it is often a good idea to be ready to provide a better answer.

Finding your nameserver (NS)

Start by finding your nameserver; you should probably already know it. If not, registrars often make them very easy to find, and a simple Google search should get you started. You will have 2-5 nameservers, usually in the form ns1.registrar.com.

It is important to get the real information because NS propagation is part of the process.

Query your NS directly

To verify your settings, fire up a terminal and use dig. You can add MX to verify MX records. Basic dig syntax is like this:

dig [+trace] [MX] example.com [@ns1.registrar.com]

In our case, we query the NS directly so we use 

dig example.com @ns1.registrar.com

You should get an answer section giving you an A record, which is your IP address. If you get an error, your server is not configured properly, and you can wait as long as you want; it will never work.

Verifying NS propagation

When a domain name is bought, the associated NS records are sent to the root servers. This is usually fairly quick (~20 minutes). Passing the +trace option to dig bypasses any local cache and queries the root servers directly. You will see 3-5 iterations until you have your answer.

dig +trace example.com

If you get an error, it usually means your registrar has not sent the new information to the root servers yet, or the root servers have not updated their cache. Verify your NS settings with your registrar and wait a bit. More than 30 minutes is very unusual, and you should then contact your registrar.

Verifying world propagation

Online tools exist to test the propagation against several nameservers around the world. I personally like http://www.whatsmydns.net/. Verify that the information is correct; once 80% of the servers agree, you can be fairly confident that everyone near you sees the same thing as you.

Clearing local and domain cache

Most enterprises and routers have a DNS cache to speed up resolution; you can restart your router to clear it. Otherwise, fancier networks will have a means to do this cleanly.

To clear local cache, it depends on your system.
  • Windows: ipconfig /flushdns
  • Mac: dscacheutil -flushcache or lookupd -flushcache
  • Linux: Restart nscd and/or dnsmasq or equivalent
It may be tricky to get your client to do it though…

Contacting your ISP or bypassing them

If most world servers have been answering correctly for a couple of hours, you may want to contact your ISP and ask them what’s up. It is not uncommon for ISPs to have caches of a couple of hours to lower the stress on their servers. If they are not very collaborative, you can manually enter a new DNS server for your network. Two fast and safe choices:

Google:

  • 8.8.8.8
  • 8.8.4.4

OpenDNS:

  • 208.67.222.222
  • 208.67.220.220

Tuesday, September 18, 2012

Protect Webserver against DOS attacks using UFW

Ubuntu comes bundled with UFW, which is an interface to iptables. This is basically a very lightweight router/firewall inside the Linux kernel that runs way before any other application.

A typical setup of ufw is to allow HTTP(S), limit SSH and shut everything else. This is not a UFW or iptables tutorial; you may find a lot of online help to guide you through all your needs. However, I personally had a lot of difficulty finding good documentation on how to protect yourself against HTTP attacks.

A lot of HTTP requests is normal

The problem is that HTTP can get very noisy. A typical Web page can easily have up to a hundred assets, but usually, if you receive 100 requests in a second, it means you are under siege. If you really need to have 100 assets on a single Web page, you need a CDN, not a better server.

Rate limiting

These rules have been mostly guessed through trial-and-error and some searching around the Web; tweak them to fit your needs. A rate limit of x connections per y seconds means that if x connections have been initiated in the last y seconds by this profile, the packet will be dropped. Dropping is actually a nice protection against flooding because the sender won't know that you dropped it. He might think the packet was lost, that the port is closed or, even better, that the server is overloaded. Imagine how nice: your attacker thinks he succeeded, but in fact you are up and running and he is blocked.

Connections per IP
A connection is an open channel. A typical browser will open around 5 connections per page load and they should last under 5 seconds each. Firefox, for example, has a default max of 15 connections per server and 256 total.

I decided to go for 20 connections / 10 seconds / IP. 

Connections per Class C
Same as above, but this time we apply the rule to the whole Class C of the IP, because it is quite common for someone to have a bunch of available IPs. This means, for example, all IPs looking like 11.12.13.*

I decided to go for 50 simultaneous connections.

Packets per IP
This is the challenging part. Due to a limitation that is not easy to circumvent, it is only possible to keep track of the last 20 packets. At the same time, it might add considerable overhead to track 100 packets for each IP. While big websites may eventually need more than this, like I said, you should look into a proper CDN.

I decided to go for 20 packets / second / IP

Configuring UFW

The following instructions are targeted at UFW, but it is really just a wrapper so it should be easy to adapt them for a generic system.

Edit /etc/ufw/before.rules, putting each part where it belongs
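As a rough sketch of what those rules can look like, using the recent and connlimit modules with the numbers above (a starting point, not a drop-in config; port 443 would need the same treatment):

# Inside the *filter section of /etc/ufw/before.rules, before the COMMIT line

# 20 connections / 10 seconds / IP: drop the excess, otherwise record the hit
-A ufw-before-input -p tcp --dport 80 -m state --state NEW -m recent --name conn_ip --update --seconds 10 --hitcount 20 -j DROP
-A ufw-before-input -p tcp --dport 80 -m state --state NEW -m recent --name conn_ip --set

# 50 simultaneous connections / Class C (/24)
-A ufw-before-input -p tcp --dport 80 -m connlimit --connlimit-above 50 --connlimit-mask 24 -j DROP

# 20 packets / second / IP
-A ufw-before-input -p tcp --dport 80 -m recent --name pkt_ip --update --seconds 1 --hitcount 20 -j DROP
-A ufw-before-input -p tcp --dport 80 -m recent --name pkt_ip --set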

Make sure ufw runs and reload everything using ufw reload.

Testing the results

Make sure everything runs smoothly by refreshing your browser like a madman. You should start getting timeouts after ~15 refreshes, and it should come back in less than 30 seconds. This is good.

But if you want to get serious about your tests, some tools may help you bring your server to its knees. It is highly discouraged to use them on a production server, but it is still better if you do it yourself than if you wait for someone else to try.

Try those with UFW enabled and disabled to see the difference but be careful, some machines may downright crash on you or fill all available space with logs.
  • http://ha.ckers.org/slowloris/
    Written in Perl, features a lot of common attacks, including HTTPS
  • http://www.sectorix.com/2012/05/17/hulk-web-server-dos-tool/
    Written in Python, basic multi-threaded attack, very easy to use.
  • http://www.joedog.org/siege-home/
    Compiled, available in Ubuntu repositories, very good to benchmark
  • http://blitz.io/
    Online service when you can test freely with up to 250 concurrent users
To confirm that everything works perfectly, SSH into your machine and start a tail -f /var/log/ufw.log to see the packets being dropped and htop to watch the CPU have fun. 

SSH into another machine and start a script. You should see the CPU sky-rocket for a few seconds and then go back to normal. Logs will start to appear and your stress tool will have some problems. While all this is going on, you should be able to browse your website normally from your computer.

Great success.

Friday, September 14, 2012

Generate missing Nginx mime types using /usr/share/mime/globs

Nginx comes with a rather small set of mime types compared to a default Linux system.

Linux uses glob patterns to match filenames while Nginx matches only extensions, but we can still use every glob in the format *.ext.

So here is a small PHP script that converts, sorts, filters and formats everything into a nice output.
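The idea, as a rough shell equivalent (assuming the type:*.ext format of /usr/share/mime/globs), is to keep only simple *.ext globs and print them as the body of an nginx types { } block:

grep -E '^[^#].*:\*\.[A-Za-z0-9]+$' /usr/share/mime/globs \
  | sed 's/:\*\./ /' \
  | sort -u \
  | awk '{ printf "    %-55s %s;\n", $1, $2 }'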


Wednesday, September 12, 2012

Configure ElasticSearch on a single shared host and reduce memory usage

ElasticSearch is a powerful, yet easy to use, search engine based on Lucene. Compared to others, it features a JSON API and wonderful scaling capabilities via a distributed scheme, and the defaults are aimed towards such scalability.

However, you may want to use ElasticSearch on a single host, mixed with your Web server, database and everything else. The problem is that ES is quite a CPU and memory hog by default. Here’s what I found through trial and error and some heavy searching.

The idea is to give ES some power but leave some for the rest of the services. At the same time, if you tell ES it can grab half of your memory and the OS needs some, ES will get killed, which isn’t nice.

My host was configured this way:
  • ElasticSearch 0.19.9, official .deb package
  • Ubuntu 12.04
  • 1.5GB of RAM
  • Dual-core 2.6 GHz
  • LEMP stack
After installing the official package:
  1. Allow user elasticsearch to lock memory
    1. Edit /etc/security/limits.conf and add:
      elasticsearch hard memlock 100000
  2. Edit the init script: /etc/init.d/elasticsearch
    1. Change ES_HEAP_SIZE to 10-20% of your machine, I used 128m
    2. Change MAX_OPEN_FILES to something sensible.
      Default is 65536, I used 15000
      Update: I asked the question on ElasticSearch group and it may be a bad idea, without giving any advantage.
    3. Change MAX_LOCKED_MEMORY to 100000  (~100MB)
      Be sure to set it at the same value as 1.1
    4. Change JAVA_OPTS to "-server"
      I don’t exactly know why, but if you check in the logs, you will see Java telling you to do so.
  3. Edit the config file: /etc/elasticsearch/elasticsearch.yml
    1. Disable replication capabilities
      1. index.number_of_shards: 1
      2. index.number_of_replicas: 0
    2. Reduce memory usage
      1. index.term_index_interval: 256
      2. index.term_index_divisor: 5
    3. Ensure ES is bound to localhost
      network.host: 127.0.0.1
    4. Enable blocking TCP because you are always on localhost
      network.tcp.block: true
  4. Flush and restart the server
    1. curl localhost:9200/_flush
    2. /etc/init.d/elasticsearch restart

Friday, July 13, 2012

Introducing Dotfiles Builder

Managing bashrc sucks

We all have our nice little bashrc that we are proud of. It tests for files, programs and terminal features, detects your OS version, builds a PATH, etc. Various solutions exist for handling all of our OSes and different setups.

Keeping several versions

Pros:

  • Ultimate fine-tuning
  • Easy to understand
  • Usually optimized for every setup

Cons:

  • Very time consuming to manage
  • Hard to “backport” new ideas

Keep a single unusable file with everything and edit accordingly

Pros:

  • Easy to backport; you just need to remember to do it
  • Good performance
  • Since you edit at each deployment, nice fine-tuning capability

Cons:

  • The single file can become unbearably cluttered.
  • You eventually end up managing several versions.
  • Tedious to edit at each deployment

Include several subfiles

Pros:

  • Still have a lot of fine-tuning capabilities
  • If well constructed, can be easy to understand
  • Easy to deploy new features

Cons:

  • Hard to detect which files to include
  • Multiplies the number of files to manage
  • Slow performance

Until recently, this was my preferred method.

Wanted features

So, what does a good bashrc have?

Should have:

  • Good performance. On a busy server, you really don't want to wait 5 seconds for your new terminal because your IO is sky rocketing.
    • Reduce number of included files
    • Reduce tests for environment features
    • Reduce tests for programs and files
  • High flexibility
    • Cross-OS compatible
    • A lot of feature detection
    • Ideally, configuration files
  • Ease and speed of configuration
    • It should not take more than a minute to set up a new bashrc
    • If you need to specify your developer email, it would be nice to do it only once.

Yes, you read right: reduce tests AND do a lot of feature detection. You don't want to do Java-specific configuration or set an empty variable if Java is not even installed, but you do want Java to be automatically detected.

Generating a bashrc

Let's face it, you will install or remove Java way less often then you will start a new shell. Why then test for Java at each new shell?

This is where I introduce the Dotfiles Builder. The script runs in Bash and outputs the wanted bashrc.

This way, instead of doing:

if [ -d "$HOME/bin" ]; then
  PATH="$HOME/bin:$PATH"
fi

You would do:

if [ -d "$HOME/bin" ]; then
  echo "PATH=\"$HOME/bin:$PATH\""
fi

And the result would simply be

PATH="$HOME/bin:$PATH"

But constructing PATH is a rather common task, and you want to make sure the folder is not already on your PATH. Why not wrap it up?
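For instance, a helper along these lines (a sketch; the builder's real function names may differ) checks directories at build time and emits deduplicated PATH entries:

# Track what the generated bashrc will have in PATH, starting from the build-time PATH
GENERATED_PATH="$PATH"

add_path() {
  local dir="$1"
  [ -d "$dir" ] || return 0            # skip folders that don't exist now
  case ":$GENERATED_PATH:" in
    *":$dir:"*) ;;                     # already present: emit nothing
    *) GENERATED_PATH="$dir:$GENERATED_PATH"
       echo "PATH=\"$dir:\$PATH\"" ;;
  esac
}

add_path "$HOME/bin"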

Take a look at the alpha version: https://github.com/lavoiesl/dotfiles-builder/
As well as the example output.

This is a very alpha version of the intended program, but I still want to share what I have and maybe get some feedback and collaborators along the way. Currently, it only generates a bashrc, but expect more to come.