
Thursday, March 21, 2013

LAMP Cluster — Comparison of different hosting platforms

This post is part of: Guide to replicated LAMP stack hosting with failover

The first step of building a hosting service is to choose your provider. Each system has its strengths and weaknesses, and you will have to choose according to your needs and your proficiency with the provided tools. This step is crucial and, if possible, you should spend some time testing and benchmarking each candidate to see if it matches your expectations.

DISCLAIMER: Below, I sometimes mention prices; they are meant as an indication rather than a real comparison. Comparing different services can get very tricky, as they don't include the same things and their performance is difficult to compare effectively.

Platform as a service

This is basically what you want to build. Another company will provide you some services in the cloud and you will configure your applications to use them. Here, all the scalability and redundancy is done for you and you will only pay for what you use.

The thing is, though, you have absolutely no control over the operating system and you are bound to the services the platform provides you.

This guide is all about building it, so these are more here as a comparison basis than actual alternatives.

Google App Engine


Typically, GAE runs Java and Python. It provides PHP through an emulation layer and some SQL, but there is no MySQL. Some quick research turned up people discussing it for Wordpress and Drupal. Short answer: no MySQL, so it can't be done.

However, if you want to host some Django on it, please do!

Windows Azure

This does almost everything you need, they have a really wide array of services. If you need something else, you can deploy a custom VM and do what you want. This is perfect for prototyping.

However, fiddling with their price calculator, I found that it can quickly become expensive; they charge for almost everything. Yes, they have PHP/MySQL support, but the idea is to achieve some economy of scale. I did an estimate:
  • 4 small Web and Worker instances
  • 1 small Linux Virtual Machine (for testing, management, etc.)
  • 10 x 100 MB Databases
  • 100 GB bandwidth
  • 100 GB storage
This is rather conservative; you will probably need way more than 4 small instances. The only thing is, it comes to almost $500 per month. Hardly a bargain.

Heroku

Heroku is mostly known for its Ruby support, but it has a very wide array of add-ons; it even has MySQL support through ClearDB. Their prices tend to be lower than Azure's and it is closer to open source initiatives, which I tend to use a lot. I have never actually used Heroku, so I can't really estimate what I would need, but the sheer number of possible configurations is impressive.

A big plus is also the deployment procedure, which is backed by Git. It involves describing a project with a configuration file and simply pushing to Heroku. There are quite a lot of examples out there on how to deploy Wordpress, Drupal, etc. If I wasn't trying to build an infrastructure myself, I would definitely consider it strongly.
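To give an idea of the workflow (I have not actually deployed to Heroku myself, so treat this as a sketch based on its documented Git-based flow; the app name and Procfile command are placeholders):

  # A Procfile at the project root describes what to run, e.g. a single web process:
  #   web: <command that starts your web server>

  heroku create my-app        # "my-app" is a made-up name
  git push heroku master      # Heroku builds and deploys on push
  heroku open                 # open the deployed app in a browser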

Virtual machines on physical hardware

If you plan for the long term, investing in hardware might be a good idea. Hardware is way less expensive, and providers like iWeb tend to give a lot of bandwidth (if not unlimited). Upgrades are usually much cheaper, but they involve downtime and risk.

You still need some virtualization

For ease of management, you will almost certainly want a virtualization solution: this way you can create, back up, scale and migrate virtual machines in only a couple of steps. Among the most popular solutions, OpenStack is free and open source, while VMware has a very good reputation with vCenter. The downside is that it means you have yet another thing to configure.

You still need multiple servers

If you go with physical machines, you will need RAID and the like, but all of that still means downtime when something breaks. To reduce the risks, you will still need a second or third machine to provide some backup. Really, managing physical hardware is an art all by itself; if you wish to provide good quality Web hosting, you will need someone who specializes in that.

Why not let a third party do all this for you?

Virtual private servers (VPS)

We want full control over the system, scalability, virtualization management, etc. So it all comes down to a nice in-between solution: virtual machines provided by a third party. Here, all the hard stuff is already done, you will most certainly have multiple locations in the world to choose from, and you can usually trust the hardware not to fail completely. Sometimes there is downtime, but losing data is extremely rare.

Below are multiple choices I know of, but I suggest you try FindTheBest for a more thorough comparison.

For the architecture I will be talking about in another post, we need this setup:
  • 1 small/medium management node
  • 3 medium/large working nodes
  • 2 small/medium utility nodes
  • 1 small dev node

Amazon Web Services (AWS)

I have been a client of Amazon EC2 for more than two years. They offer a wide array of services:
  • Virtual machines (EC2)
  • DNS services (Route53)
  • Load balancing + Auto scaling
  • Dedicated databases with automatic fallback (RDS)
  • High performance I/O (EBS)
  • Low performance, high durability I/O (S3)
  • CDN (CloudFront)
  • Highly configurable firewall
  • And much much more
A lot of websites are running on Amazon services. The problem is, it is expensive and it is built for computing, not Web hosting. This means it is perfect for rather short bursts of computing, like crunching data, but it becomes expensive if it is online all the time. Also, with the pay-per-use concept, everything you do will end up costing you something, which can build up rather quickly. Over the last two years, performance has been going downhill, but recently they have been lowering their prices, so it might be becoming a better alternative.

Here is an example using their calculator. (333 $/month)

Google Compute Engine (GCE)

Google also has a service that is very similar to Amazon EC2, but with less options and it seems to have a better performance/price ratio. I am not familiar with their services, but I thought it was worth mentioning.

Windows Azure

As mentioned above, Azure has virtual machines as well, but you can connect them with the rest of the platform so it can be a nice hybrid solution.

However, it is still pretty pricey. For our setup, 3 medium, 2 small and 2 x-small, we are already at 478 $/month — and no bandwidth or storage is included yet.

Linode

Linode has existed since 2003, but I only discovered it last year. They are growing rapidly, new features keep coming in, and the amount of included things goes up every month. What I like about Linode is that I feel like I am in total control of my machines.
  • Multiple availability zones (like most other providers)
  • Very easy permission management (you can give read-only access to your clients)
  • Very powerful admin panel.
  • Powerful recovery tools
    • Remote connection via SSH or in-browser to the host so you can rescue your VM while it boots
    • Possibility to switch kernels and reboot in rescue mode
    • Possibility to reset root password from admin panel
    • Possibility to rebuild from a backup or a fresh install without destroying the VM.
  • Inexpensive load balancers
  • Support for StackScripts, a way to run scripts while deploying a new VM
  • High class (free) support. From my experience, replies typically take 1-5 minutes!
  • Unlimited DNS zones
  • Very high transfer caps
  • Unmetered disk operations
  • Unmetered Gigabit in-zone data transfer
And they are on a roll. They recently upgraded their network, and all VMs now have 8 cores. You might wonder how it is possible to have 8 cores on a small instance: it is actually the priority on those CPUs that scales, not their raw power. In other words, the higher your package, the more reliable its performance is.

Seriously, the more I work with Linode, the more it feels right; they seem to know their stuff and do their best to give you everything they can.

Give it a try; you can use a small instance for a month. Here is my referral link; I get $20 if you buy something.

For a setup similar to the AWS one detailed above, it boils down to around 220 $/month, but you have to build the database, memcache and CDN yourself.

Performance evaluation

Whatever provider you choose, be sure to test its performance. This is especially true for CPU and disks; the number of cores and their clock speed mean little to nothing. The best tool I found was SysBench. For disk operations specifically, you can choose various profiles like read-only, sequential read/write or random read/write, or specify a read/write ratio.

When benchmarking for websites, you typically want a lot of small files (10kB - 1MB) that will be read sequentially and some big files (1MB-5MB) with a read/write ratio of about 95%.
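For example, with SysBench (a sketch using the classic sysbench 0.4 command-line syntax; file sizes and durations are arbitrary and should be scaled to your instance):

  # CPU: time how long it takes to verify primes up to 20000
  sysbench --test=cpu --cpu-max-prime=20000 run

  # Disk: prepare test files, run a random read/write workload with a
  # 95% read ratio (19 reads per write), then clean up
  sysbench --test=fileio --file-total-size=2G prepare
  sysbench --test=fileio --file-total-size=2G --file-test-mode=rndrw --file-rw-ratio=19 --max-time=60 --max-requests=0 run
  sysbench --test=fileio --file-total-size=2G cleanup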

Maybe more on that later.


Tuesday, September 18, 2012

Protect Webserver against DOS attacks using UFW

Ubuntu comes bundled with UFW, which is an interface to iptables. This is basically a very lightweight router/firewall inside the Linux kernel that runs way before any other application.

A typical setup of UFW is to allow HTTP(S), limit SSH and shut everything else. This is not a UFW or iptables tutorial; you will find a lot of online help to guide you through all your needs. However, I personally had a lot of difficulty finding good documentation on how to protect yourself against HTTP attacks.
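For reference, that baseline looks something like this with standard ufw commands (adjust to your needs):

  # Deny everything incoming by default, allow outgoing
  sudo ufw default deny incoming
  sudo ufw default allow outgoing

  # Allow Web traffic, rate-limit SSH with ufw's built-in limiting, then enable
  sudo ufw allow http
  sudo ufw allow https
  sudo ufw limit ssh
  sudo ufw enable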

A lot of HTTP requests is normal

The problem is that HTTP can get very noisy. A typical Web page can easily have up to a hundred assets, but usually, if you receive 100 requests in a second, it means you are under siege. If you really need 100 assets on a single Web page, you need a CDN, not a better server.

Rate limiting

These rules have mostly been worked out through trial and error and some searching around the Web, so tweak them to fit your needs. A rate limit of x connections per y seconds means that if x connections have been initiated in the last y seconds by a given profile, further packets will be dropped. Dropping is actually a nice protection against flooding because the sender won't know you dropped them: he might think the packet was lost, that the port is closed or, even better, that the server is overloaded. Imagine how nice: your attacker thinks he succeeded, while in fact you are up and running and he is the one being blocked.

Connections per IP
A connection is an open channel. A typical browser will open around 5 connections per page load and they should last under 5 seconds each. Firefox, for example, has a default max of 15 connections per server and 256 total.

I decided to go for 20 connections / 10 seconds / IP. 

Connections per Class C
Same as above, but this time we apply the rule to the whole Class C of the IP, because it is quite common for an attacker to have a bunch of IPs available. This means, for example, all IPs looking like 11.12.13.*

I decided to go for 50 simultaneous connections.

Packets per IP
This is the challenging part. Due to a limitation that is not easy to circumvent, it is only possible to keep track of the last 20 packets. At the same time, tracking 100 packets for each IP might add considerable overhead. While a big website may eventually need more than this, as I said, you should look into a proper CDN instead.

I decided to go for 20 packets / second / IP
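For the record, that 20-packet limit comes from the ip_pkt_list_tot parameter of the xt_recent kernel module. It can in principle be raised, but only by reloading the module with a different value, which is part of why it is not easy to circumvent on a live server. A rough sketch (the value 100 is arbitrary):

  # Persist a higher tracking limit for xt_recent, then reload the module
  echo "options xt_recent ip_pkt_list_tot=100" | sudo tee /etc/modprobe.d/xt_recent.conf
  sudo modprobe -r xt_recent && sudo modprobe xt_recent   # will fail if the module is in use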

Configuring UFW

The following instructions are targeted at UFW, but it is really just a wrapper so it should be easy to adapt them for a generic system.

Edit /etc/ufw/before.rules, putting each part where it belongs.
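Here is a sketch of the kind of rules involved (not necessarily the exact ones I use): they rely on the connlimit and recent iptables modules and hook into the standard ufw-before-input chain from the stock before.rules. Place the per-packet rules near the top of the chain, before the rule that accepts established connections, if you want them to see every packet, and keep the final COMMIT line intact.

  # Connections per IP: drop new HTTP connections beyond 20 per 10 seconds per IP
  -A ufw-before-input -p tcp --dport 80 --syn -m recent --name http_conn --set
  -A ufw-before-input -p tcp --dport 80 --syn -m recent --name http_conn --update --seconds 10 --hitcount 20 -j DROP

  # Connections per Class C: drop beyond 50 simultaneous connections per /24
  -A ufw-before-input -p tcp --dport 80 -m connlimit --connlimit-above 50 --connlimit-mask 24 -j DROP

  # Packets per IP: drop beyond 20 packets per second from a single IP
  -A ufw-before-input -p tcp --dport 80 -m recent --name http_pkt --set
  -A ufw-before-input -p tcp --dport 80 -m recent --name http_pkt --update --seconds 1 --hitcount 20 -j DROP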

Make sure ufw runs and reload everything using ufw reload.

Testing the results

Make sure everything runs smoothly by refreshing your browser like a madman. You should start getting timeouts after ~15 refreshes, and it should come back in less than 30 seconds. This is good.

But if you want to get serious about your tests, some tools can help you bring your server to its knees. Using them on a production server is highly discouraged, but it is still better to do it yourself than to wait for someone else to try.

Try these with UFW enabled and disabled to see the difference, but be careful: some machines may downright crash on you or fill all available disk space with logs.
  • http://ha.ckers.org/slowloris/
    Written in Perl, features a lot of common attacks, including HTTPS
  • http://www.sectorix.com/2012/05/17/hulk-web-server-dos-tool/
    Written in Python, basic multi-threaded attack, very easy to use.
  • http://www.joedog.org/siege-home/
    Compiled, available in Ubuntu repositories, very good to benchmark
  • http://blitz.io/
    Online service where you can test freely with up to 250 concurrent users
To confirm that everything works perfectly, SSH into your machine and start a tail -f /var/log/ufw.log to see the packets being dropped and htop to watch the CPU have fun. 

SSH into another machine and start one of the scripts. You should see the CPU skyrocket for a few seconds and then go back to normal. Logs will start to appear and your stress tool will have some problems. While all this is going on, you should still be able to browse your website normally from your own computer.
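For example, with siege from the Ubuntu repositories (the URL and numbers are placeholders):

  # 50 concurrent users hammering the site for 30 seconds
  siege -c 50 -t 30S http://www.example.com/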

Great success.

Wednesday, September 12, 2012

Configure ElasticSearch on a single shared host and reduce memory usage

ElasticSearch is a powerful, yet easy to use, search engine based on Lucene. Compared to others, it features a JSON API and wonderful scaling capabilities via a distributed scheme, and the defaults are aimed towards such scalability.

However, you may want to use ElasticSearch on a single host, mixed with your Web server, database and everything else. The problem is that ES is quite a CPU and memory hog by default. Here's what I found through trial and error and some heavy searching.

The idea is to give ES some power, but leave some for the rest of the services. At the same time, if you tell ES it can grab half of your memory and the OS ends up needing some, ES will get killed, which isn't nice.

My host was configured this way:
  • ElasticSearch 0.19.9, official .deb package
  • Ubuntu 12.04
  • 1.5GB of RAM
  • Dual-core 2.6 GHz
  • LEMP stack
After installing the official package (a consolidated sketch of the resulting configuration follows the list):
  1. Allow user elasticsearch to lock memory
    1. Edit /etc/security/limits.conf and add:
      elasticsearch hard memlock 100000
  2. Edit the init script: /etc/init.d/elasticsearch
    1. Change ES_HEAP_SIZE to 10-20% of your machine, I used 128m
    2. Change MAX_OPEN_FILES to something sensible.
      Default is 65536, I used 15000
      Update: I asked the question on the ElasticSearch group, and it may be a bad idea that brings no real advantage.
    3. Change MAX_LOCKED_MEMORY to 100000  (~100MB)
      Be sure to set it at the same value as 1.1
    4. Change JAVA_OPTS to "-server"
      I don’t exactly know why, but if you check in the logs, you will see Java telling you to do so.
  3. Edit the config file: /etc/elasticsearch/elasticsearch.yml
    1. Disable replication capabilities
      1. index.number_of_shards: 1
      2. index.number_of_replicas: 0
    2. Reduce memory usage
      1. index.term_index_interval: 256
      2. index.term_index_divisor: 5
    3. Ensure ES is bound to localhost
      network.host: 127.0.0.1
    4. Enable blocking TCP because you are always on localhost
      network.tcp.block: true
  4. Flush and restart the server
    1. curl localhost:9200/_flush
    2. /etc/init.d/elasticsearch restart
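Put together, the pieces above look roughly like this (a consolidated sketch with my values; paths are those of the official 0.19 .deb package):

  # /etc/security/limits.conf
  elasticsearch hard memlock 100000

  # /etc/init.d/elasticsearch (relevant variables)
  ES_HEAP_SIZE=128m
  MAX_OPEN_FILES=15000
  MAX_LOCKED_MEMORY=100000
  JAVA_OPTS="-server"

  # /etc/elasticsearch/elasticsearch.yml (relevant settings)
  index.number_of_shards: 1
  index.number_of_replicas: 0
  index.term_index_interval: 256
  index.term_index_divisor: 5
  network.host: 127.0.0.1
  network.tcp.block: true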