
Saturday, April 13, 2013

LAMP Cluster — Distributed filesystem

This post is part of: Guide to replicated LAMP stack hosting with failover


The core concept of choosing a filesystem for a Web hosting cluster is to eliminate single points of failure, but sometimes it is not that easy. A truly distributed system still needs to be performant, at least on reads. The problem lies in the fact that the bottleneck is very often I/O, so if your filesystem is not performant, you will end up spending a fortune on scaling without gaining real performance.

Making priorities

You can’t have everything, so start by making a list of priorities. Different systems will have different needs, but I figured I could afford a possibility of failure as long as the system could be restorable since I would be keeping periodic backups.
  1. Low maintenance
    • It must be possible to read/write from any folder without adding a manifest for each site.
    • The system must be completely autonomous and require no maintenance from a sysadmin (conflict management in particular).
  2. Simple / Cheap
    • Must be installable on each Web node, or require at most 2 small/medium extra nodes
    • Must run on Ubuntu, without recompiling the kernel. Kernel modules are acceptable.
  3. Performant
    • Reads less than 50% slower than standard ext3 reads.
    • Writes less than 80% slower than standard ext3 writes.
    • Must be good at handling a lot of small files. Currently, my server hosts 470k files for a total of 6.8 GB. That is an average of 15 KB per file!
  4. Consistency
    • Changes must propagate to all servers within 5 seconds.
    • Files already recorded in the database but not yet synced may generate errors for a short period when viewed by users on other servers.
    • Temporary files are only relevant on the local machine, so a delay is not a big deal.
    • HTTP sessions will be sticky at the load balancer level, so user-specific information will be handled properly.
  5. Must handle ACLs
    • For permissions to be set perfectly, we will be using ACLs (see the sketch after this list).
    • ACLs may not be readable within the Web node, but they must still be enforced.
  6. Durability
    • Must handle filesystem failures — be repairable very quickly.
    • File losses are acceptable in the event of a filesystem failure.
    • Filesystem must continue to function even if a Web node goes offline.
    • No single point of failure. If there is one, it must be isolated on its own machine.
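
To illustrate the ACL requirement above, here is the kind of thing we need the filesystem to carry, using the standard acl tools (paths and user names are hypothetical):

    # grant the web server read access and a client user full access, recursively
    setfacl -R -m u:www-data:rX -m u:client1:rwX /var/www/site1
    # default entries so that newly created files inherit the same permissions
    setfacl -R -d -m u:www-data:rX -m u:client1:rwX /var/www/site1
    # inspect the result
    getfacl /var/www/site1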

A. Synchronisation

Synchronisation means that there is no shared filesystem: all files are stored on each node's local filesystem and synchronised with the other nodes periodically or by watching I/O events.

Cluster synchronisation involving replication between all the nodes is usually very hard. To improve performance and reduce the risk of conflicts, it is often a good idea to elect a replication leader and a backup. If the leader is unavailable, the backup is used instead. This way, every node syncs with only one other node.
  • Pros
    • Very fast read/write
    • Very simple to setup
  • Cons
    • May have trouble synchronising ACLs
    • May generate a lot of I/O
    • Will most likely generate conflicts

Rsync

The typical tool for fast file syncing is rsync. It is highly reliable, and a bit of Bash scripting will get you started. However, as the number of files grows, it becomes slow: for around a million files, a pass can easily take over 5 seconds. Given our needs, it would have to run continuously, which would generate a lot of I/O and hurt overall performance.
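
For reference, a minimal sketch of such a continuous loop, pushing a document root from the leader to a second node (host names and paths are hypothetical):

    #!/bin/bash
    # continuously push /var/www to a second node over SSH;
    # -a preserves permissions/times, -A preserves ACLs, -X preserves xattrs
    while true; do
        rsync -aAX --delete /var/www/ web2:/var/www/
        sleep 5
    done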

Csync2

Csync2 is a promising tool that works like rsync, but it keeps file hints in a SQLite database. When a file changes, it flags in the database that the file needs checking. This way, the full sync only needs to check marked files.

Csync2 supports multi-master replication and receive-only slaves. However, I found while testing that it is not well suited to a lot of small files changing frequently: it tends to generate many conflicts that must be resolved manually.

It may not be the best solution for Web hosting, but for managing deployment of libraries or similar tasks, it would be awesome.
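
For reference, a minimal csync2 configuration for two web nodes looks something like this (host names, key path and the automatic conflict resolution policy are assumptions):

    # /etc/csync2.cfg
    group web {
        host web1 web2;
        key /etc/csync2.key_web;
        include /var/www;
        exclude *.tmp;
        # resolve conflicts automatically in favour of the newer file
        auto younger;
    }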

B. Simple sharing (NFS)

Even simpler than file syncing is plain old sharing. One node is responsible for hosting the files and serves them directly to the others. Windows uses Samba/CIFS, Mac uses AFP and Linux uses NFS.

NFS is very old, like 1989 old. Even the latest version, NFSv4, came around in 2000. This means it is very stable and very good at what it does.
  • Pros
    • Supports ACLs (NFSv4)
    • Very cheap and simple setup
    • Up to a certain scale, fast read/write
  • Cons
    • Single point of failure
    • Hard to setup proper failover
    • Not scalable
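
For completeness, here is roughly what a minimal setup looks like, with a hypothetical file server nfs1 exporting to two web nodes (exact options depend on your NFSv4 pseudo-root configuration):

    # on nfs1, in /etc/exports:
    /var/www  web1(rw,sync,no_subtree_check)  web2(rw,sync,no_subtree_check)

    # reload the export table on nfs1
    exportfs -ra

    # on each web node, mount the share
    mount -t nfs4 nfs1:/var/www /var/www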

C. Distributed / Replicated

A distributed filesystem may operate at a device, block or inode level. You can think of this a bit like a database cluster. It usually involves journals and is the most advanced solution.

  • Pros
    • Very robust
    • Scalable
  • Cons
    • Writes are often painfully slow
    • Reads can also be slow
    • Often complex to setup

GlusterFS

GlusterFS runs over FUSE (or NFS on the client side). Each node can contribute its own brick, and the daemon handles the replication transparently, without the need for a management node.
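
To give an idea of the moving parts, a two-node replicated volume can be set up with a handful of commands (host, brick and volume names below are hypothetical):

    # from web1, add the second node to the trusted pool
    gluster peer probe web2

    # create and start a volume replicated across both bricks
    gluster volume create wwwvol replica 2 web1:/data/brick web2:/data/brick
    gluster volume start wwwvol

    # mount it through the FUSE client
    mount -t glusterfs web1:/wwwvol /var/www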

Overall, it is very good software: the write performance is decent and it handles failures quite well. There has been a lot of recent work to improve caching, async writes, write-ahead, etc. However, in my experience, the read performance is disastrous. I really tried tuning it a lot, but I still feel like I haven't found its true potential.

Ultimately, I had to set it aside for the moment for lack of time to tune it further. It has a large community and is widely deployed, so I will probably end up giving it another chance.

Lustre

Lustre seems like the Holy Grail of distributed filesystems. From Wikipedia: “At the present time, six of the top 10 and more than 60 of the top 100 supercomputers in the world have Lustre file systems in them.”

It appears to have everything I could dream of: speed, scalability, locks, ACLs, you name it. 

However, I was never able to try it. It requires dedicated machines with various roles: management, data, file servers (API). This means I would need 4-5 additional machines. On top of that, it needs custom kernel modules.

Definitely on my wish-list, but inaccessible for the moment.

DRBD

DRBD is not a cluster solution; it does live replication. Usually, it is used to keep a full mirror of a server that can be swapped with the master at any moment, should it fail. This is often used to patch solutions where replication is not built-in, such as NFS or MySQL. There is a way to set up a 3-node configuration, but it is far from perfect.
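
For illustration, a DRBD resource mirroring one block device between two machines is declared roughly like this (host names, devices and addresses are assumptions):

    # /etc/drbd.d/r0.res
    resource r0 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on nfs1 {
            address 10.0.0.1:7788;
        }
        on nfs2 {
            address 10.0.0.2:7788;
        }
    }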

Conclusion

In the end, I found that synced solutions were not reliable enough and distributed solutions were too complex, so I chose NFS. My plan is to add DRBD soon to provide a durability layer, but a more serious solution will have to wait. If my cluster scales to the point where NFS is no longer up to the task, it will mean I have enough clients, enough money and enough reasons to consider a proper solution.


Comparison
Solution  | Maintenance | Complexity | Performance | Scalability  | Durability | Consistency | ACLs
Rsync     | Low         | Low        | Very high   | Low          | High       | Low         | Yes
Csync2    | High        | Medium     | Very high   | Low          | High       | Low         | Yes
NFS       | None        | None       | Medium      | None         | None       | Very high   | Enforced
GlusterFS | None        | Medium     | Low         | High         | High       | Very high   | Yes
Lustre    | None        | Very high  | High        | Very high    | Very high  | Very high   | Yes
DRBD      | None        | Medium     | n/a         | 2 or 3 nodes | Very high  | n/a         | Yes



LAMP Cluster — Choosing an Operating System

This post is part of: Guide to replicated LAMP stack hosting with failover


Besides the choice of Linux vs Mac or Windows, the OS should not impact your users; it is mostly a sysadmin choice. Your users, the ones who will be connecting via SSH, will expect binaries to be available without modifying their PATH and common tools like Git or SVN to be already installed, but it does not really matter how they were installed.

The key to making sure nobody has a hard time getting everything to work is to do things in the most standard and common way possible.

Choose between the most used distributions

This is really important. Choosing a distribution for your laptop or your development server is not the same thing as choosing a production environment. Forget Gentoo and friends: being connected directly to the bare bones of your system is nice when you are learning or building a world-class new system, but for your own setup, you want something tested by the whole community, something that works. Even if it involves a bit of magic.

A good example of such magic is what Ubuntu does with networking. I admit that since 10.x, I don't really understand all the interplay between /etc/resolv.conf, /etc/network/interfaces, dhclient, /etc/init.d/networking and friends. At some point they all seem to redefine each other, and in a given release a script will start throwing warnings, but it works. The network has never failed me on Ubuntu, which is quite relevant when you need to access a remote machine.

Edge vs stable

I use the infamous expression "Debian stable" when I want to refer to something configured in such a conservative way that it will work, but at the cost of using the technology of 2005. I know, Debian stable is not that bad, but I tend to have some faith in the testing procedures of the maintainers.

My rule of thumb is: when a stable version is available, use it. If I want the features of a less stable version, I make sure it has been out for at least a month and do a quick search on its stability and trustworthiness.

An example of this is that I don't restrict myself to Ubuntu Long Term Support releases. I am happy to use them and will usually keep using them a bit longer than the other releases, but LTS releases only come every 2 years, and sometimes that is not enough. Moreover, I tend to upgrade or reinstall every year or two, so I don't hit the end-of-support limit.

Here are some of the top distributions, ordered by edginess:

Versions available
Distribution            | Apache | PHP    | MySQL  | Varnish
Ubuntu 12.10            | 2.2.22 | 5.4.6  | 5.5.29 | 3.0.2
Debian wheezy           | 2.2.22 | 5.4.4  | 5.5.28 | 3.0.2
OpenSuse 12.3           | 2.2.22 | 5.3.17 | 5.5.30 | 3.0.3
Ubuntu 12.04 LTS        | 2.2.22 | 5.3.10 | 5.5.29 | 3.0.2
Debian squeeze (stable) | 2.2.16 | 5.3.3  | 5.1.66 | 2.1.3
CentOS 6                | 2.2.15 | 5.3.3  | 5.1.66 | manual

PHP 5.3.3, our main concern, was released in July 2010 and important fixes have occurred since, so this is out of the question.

Varnish 2 is very different from Varnish 3, so this needs to be looked at.

It is usually possible to install newer versions, but this implies relying on third-party packaging, multiple installed binaries or even compiling yourself.
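
For example, on Ubuntu, a newer PHP could hypothetically come from a third-party PPA such as ondrej/php5 (verify any PPA's maintenance and trustworthiness before relying on it in production):

    # add-apt-repository is provided by software-properties-common on recent Ubuntu
    apt-get install software-properties-common
    add-apt-repository ppa:ondrej/php5
    apt-get update
    apt-get install php5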

Forget benchmarks

Linux is a very low-footprint system; a common mistake is trying to over-optimize it. What will eat your CPU is PHP and MySQL, what will eat your memory is MySQL, and the number of connections you can handle mostly depends on your web server. If your machine is spending too much time in kernel space, it is probably because you need a bigger one. Also, don't forget to benchmark your disks. See my post on choosing hardware.

Conclusion

Your choice must be focused on stability, ease-of-use and community size. 

Personally, I prefer Debian-based solutions. Aptitude works very well and I just happen to have been more in contact with it.

All in all, I went for Ubuntu 12.10: PHP 5.4 offers a really good performance improvement over 5.3, and enough time has passed since that release of Ubuntu came out.

Tuesday, March 12, 2013

Guide to replicated LAMP stack hosting with failover

Motivations for building your own hosting

For my company, I started to think about offering a hosting service. Cheap solutions exist, like 1and1.com and iweb.com, where you can rent a VPS or get some shared hosting, but they imply some setup for each client, managing credentials, analyzing everyone's needs, etc. What about upgrades? What about failovers? What about custom services like Lucene or Rails that could be running?

Defining the needs

Before trying to find solutions, let's try to find the correct questions. What are we trying to accomplish exactly? These are generic needs; I will provide my answers, but at some point I had to take some shortcuts to be able to complete the project and meet some profitability requirements. If you have different priorities or you are working on a different scale, your answers will most probably differ at some point.

Scalable

Upgrades must be possible without any downtime. It must be easy so we can react in a matter of minutes to an emergency load or a crash. Also, we will spend quite some time configuring everything so we would like to keep it even if we triple our load. Ideally, we want to be able to scale both horizontally (adding more machines) and vertically (upgrading the machines). 

Highly Available

This means failovers, redundancy and stability. The key is load balancing, but we want to remove single points of failure as much as possible. If any remain, we want to have complete trust in them; they should do as little as possible and be as isolated as possible.

Secure

The purpose of this guide is not to build a banking system, but we still want to be secure. We want strong password policies, firewalls and, most importantly, backups. The whole system should be re-installable in an hour or two if something major happens, and clients' files and databases should be restorable hourly, daily, weekly, or something along those lines.
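
As a rough sketch of the kind of rotation I have in mind, not the final design (paths, schedule and retention are assumptions):

    # /etc/cron.d/client-backups (hypothetical)
    # nightly tarball of client files, one archive per weekday, 7-day retention
    0 3 * * *  root  tar czf /backup/www-$(date +\%u).tar.gz /var/www
    # hourly database dump, one file per hour of the day;
    # assumes credentials in /root/.my.cnf
    15 * * * * root  mysqldump --all-databases | gzip > /backup/db-$(date +\%H).sql.gz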

Compatible and flexible

We will have almost no control over the applications, but we still want to standardize some key elements. For example, having 2 database systems could be acceptable, but running 2 different web servers is a bit overzealous. Some clients may also have particular needs like a search engine or cron jobs; we need to be ready.

Performant

Between scalability and availability, we often achieve performance, but only if each of the channels is independent. In general, websites do far more reads than writes, both in the database and on the filesystem. However, because we want to be compatible and application-agnostic, we won't be able to resort to techniques like declaring a folder read-only or having a slave database. We may not be able to constrain applications, but we can reward those that are well configured: we can provide opt-in features like reverse proxies, shared caches, temporary folders, etc.
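
As a sketch of what such an opt-in feature could look like, here is a minimal reverse proxy cache for one client site using nginx (server names, addresses and cache sizes are assumptions; Varnish is the other candidate for this role):

    # hypothetical opt-in cache for one client site
    proxy_cache_path /var/cache/nginx/site1 levels=1:2 keys_zone=site1:10m max_size=1g;

    upstream site1_backend {
        server 10.0.0.11:8080;
        server 10.0.0.12:8080;
    }

    server {
        listen 80;
        server_name site1.example.com;

        location / {
            proxy_pass http://site1_backend;
            proxy_cache site1;
            # cache successful responses briefly
            proxy_cache_valid 200 301 5m;
        }
    }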

Profitable

Last but not least, we like profits, so the whole system must have a predictable cost that can be forwarded to the appropriate client. Scalability plays a big role here because we can scale just as much as we need, when we need it.

Overview

As I am starting to write this guide, the system is already operational and in production. I have already stumbled across many problems, but I am sure others are still to come. Here is an overview of all the parts I want to address; links will become available as they are written. The considered options are also listed to give you an idea of where I am going with all this.
  1. Hosting platform
    • Cloud virtual machines
      • Linode
      • Amazon
      • Rackspace
    • Physical virtual machines
      • iWeb (I’m in Montreal, Canada)
    • Platform as a service
      • Windows Azure
  2. Linux
    • CentOS
    • Ubuntu Server
    • Debian
  3. Filesystem
    • Synchronisation
      • csync2
      • rsync
    • Distributed
      • GlusterFS
      • Lustre
      • DRBD
    • Shared
      • NFS
  4. Load balancer
    • Amazon / Linode Load balancer
    • HAProxy
    • Nginx
  5. Reverse proxy with caching
    • Nginx
    • Varnish
  6. Web server
    • Apache
      • 2.2 / 2.4
      • Prefork / Worker / Event
    • Nginx
  7. MySQL
    • MySQL Cluster
    • Master/Master replication
    • Master/Slave replication + mysqlnd_ms
    • Percona XtraDB Cluster + Galera
  8. PHP
    • 5.2 / 5.3 / 5.4
    • Apache module
    • PHP-FPM
  9. Configuration system
    • Puppet
    • Chef
    • Custom scripts
  10. Backups
    • Full machine backups
    • Rsync to remote machine
    • Tarballs
  11. Monitoring
    • Zabbix
    • Nagios
    • Ganglia


As you can see, there is a lot to talk about.