Web development all-around, including PHP, CSS, HTML, Hosting, MySQL, Symfony, Drupal and Wordpress.
Monday, December 1, 2014
MySQL 5.7 and Wordpress problem
You may encounter the "Submit for review" bug where you cannot add new posts. It may be related to permissions, auto_increment and other causes, but here is another one: bad date formats and otherwise invalid data.
In MySQL <= 5.6, by default, invalid values are coerced into valid ones when needed. For example, attempting to set NULL on a NOT NULL string column results in an empty string. Starting with MySQL 5.7, this is no longer permitted.
Hence, if you want to upgrade to 5.7 and use all the goodies, you should consider putting it in a more compatible mode by adding this to your /etc/my.cnf:
[mysqld]
# Default:
# sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION
sql_mode = ALLOW_INVALID_DATES,NO_ENGINE_SUBSTITUTION
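To see which mode a server is currently running, you can ask it directly; a quick check from the shell:

mysql -e 'SELECT @@GLOBAL.sql_mode;'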
See the official documentation for complete information.
Friday, June 20, 2014
Adding newlines in a SQL mysqldump to split extended inserts
Separate INSERTs
Extended INSERTs
New-And-Improved™ INSERTs
Current solutions
Using sed
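The classic approach is a one-liner along these lines (a sketch; mydb is a placeholder, and note that it will corrupt any stored value that itself contains the "),(" sequence, which is exactly why a real parser is safer):

mysqldump mydb | sed 's/),(/),\n(/g' > dump.sql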
Using net_buffer_length
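Alternatively, mysqldump can be told to keep each extended INSERT under a given size, which indirectly adds line breaks (sketch, placeholder database name):

mysqldump --net_buffer_length=4096 mydb > dump.sql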
Writing a parser
But then I realized it was too slow, so I rewrote it in C, using strcspn to find string occurrences:
Happy dumping!
Friday, December 6, 2013
GlusterFS performance on different frameworks
My test environment:
- MacBook Pro (Late 2013, Retina, i7 2.66 GHz)
- PCIe-based Flash Storage
- 2-4 virtual machines using VMware Fusion 4, each with 2 GB of RAM.
- Ubuntu 13.10 server edition with PHP 5.5 and OPCache enabled
- GlusterFS running on all VMs with a volume in replica mode
- The volume was mounted with noatime,nodiratime using the GlusterFS native driver (NFS was slower); see the mount sketch after this list
- siege -c 20 -r 5 http://localhost/foo # Cache warming
- siege -c 20 -r 100 http://localhost/foo # Actual test
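For reference, the mount looked something along these lines (a sketch; the volume and mount point names are made up):

mount -t glusterfs -o noatime,nodiratime gluster1:/webvol /var/www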
Tested configurations:
- 2 nodes, 4 cores per node
- 2 nodes, 2 cores per node
- 4 nodes, 2 cores per node
All tests were run 2-3 times while my computer was otherwise idle, and the results were very consistent.
| | | Symfony | Wordpress | Drupal | Average |
|---|---|---|---|---|---|
| 2 nodes, 4 cores | Local | 2.91 s | 9.92 s | 5.39 s | 6.07 s |
| | Gluster | 10.84 s | 23.94 s | 7.81 s | 14.20 s |
| 2 nodes, 2 cores | Local | 5.41 s | 19.14 s | 9.67 s | 11.41 s |
| | Gluster | 25.05 s | 31.91 s | 15.17 s | 24.04 s |
| 4 nodes, 2 cores | Local | 5.57 s | 19.6 s | 9.79 s | 11.65 s |
| | Gluster | 30.56 s | 35.92 s | 18.36 s | 28.28 s |

| Local vs Gluster | Symfony | Wordpress | Drupal | Average |
|---|---|---|---|---|
| 2 nodes, 4 cores | 273 % | 141 % | 45 % | 153 % |
| 2 nodes, 2 cores | 363 % | 67 % | 57 % | 162 % |
| 4 nodes, 2 cores | 449 % | 83 % | 88 % | 206 % |
| Average | 361 % | 97 % | 63 % | 174 % |

| 2 nodes vs 4 nodes | Symfony | Wordpress | Drupal | Average |
|---|---|---|---|---|
| Local | 3 % | 2 % | 1 % | 2 % |
| Gluster | 22 % | 13 % | 21 % | 19 % |

| 4 cores vs 2 cores | Symfony | Wordpress | Drupal | Average |
|---|---|---|---|---|
| Local | 86 % | 93 % | 79 % | 86 % |
| Gluster | 131 % | 33 % | 94 % | 86 % |
- Red: Wordpress and Drupal take an acceptable performance hit under Gluster, but Symfony's is catastrophic.
- Blue: The local tests are slightly slower with 4 nodes vs 2 nodes. This is normal; my computer had 4 VMs running.
- Green: The Gluster tests are 20 % slower on a 4-node setup because there is more communication between the nodes to keep them all in sync. 20 % overhead for double the nodes isn't that bad.
- Purple: The local tests are 85 % quicker using 4 cores vs 2 cores. A bit under 100 % is normal; there is always some overhead to parallel processing.
- Yellow: For the Gluster tests, Symfony and Drupal scale very well with the number of nodes, but Wordpress stalls; I am not sure why.
Saturday, August 31, 2013
Service management utility for Mac OSX (launchctl helper)
On Linux, restarting a service is typically as simple as service php restart. On Mac, this is more like:
launchctl unload ~/Library/LaunchAgents/homebrew-php.josegonzalez.php55.plist
launchctl load ~/Library/LaunchAgents/homebrew-php.josegonzalez.php55.plist
Which is ugly and hard to remember, and launchctl has no way of listing all available services. Plus, those plists can reside in any of these directories:
- /System/Library/LaunchDaemons
- /System/Library/LaunchAgents
- /Library/LaunchDaemons
- /Library/LaunchAgents
- ~/Library/LaunchAgents
This is why I came up with a utility to manage services. It searches all the directories above for your service, prompts for sudo if the plist is in a system directory and provides goodies like restart, reload and link.

Usage:
- service selfupdate : update the script from the Gist
- service php : searches for a plist containing 'php'
- service php load|unload|reload : insert or remove a plist from launchctl
- service php start|stop|restart : manage a daemon, but leave it in launchctl (does not work with Agents)
- service php link : if you use Homebrew, which you should, links the plist of that Formula into ~/Library/LaunchAgents, reloading if needed. Very useful when upgrading.
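Under the hood, locating a service is essentially a loop over the directories listed above; a minimal sketch of the idea, not the actual script:

for dir in /System/Library/LaunchDaemons /System/Library/LaunchAgents \
           /Library/LaunchDaemons /Library/LaunchAgents ~/Library/LaunchAgents; do
  ls "$dir" 2>/dev/null | grep -i "php"
done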
Manage all optional services at once
Saturday, April 13, 2013
LAMP Cluster — Distributed filesystem
The core concept of choosing a filesystem for a Web hosting cluster is to eliminate single points of failure, but sometimes it is just not that easy. A true distributed system still needs to be performant, at least on reads. The problem lies in the fact that the bottleneck is very often I/O, so if your filesystem is not performant, you will end up spending a fortune on scaling without gaining real performance.
Making priorities
You can’t have everything, so start by making a list of priorities. Different systems will have different needs, but I figured I could afford a possibility of failure as long as the system could be restored, since I would be keeping periodic backups.

- Low maintenance
  - It must be possible to read/write from any folder without adding a manifest for each site.
  - The system must be completely autonomous and require no maintenance from a sysadmin (conflict management).
- Simple / Cheap
  - Must be installable on each Web node, or on a maximum of 2 small/medium extra nodes.
  - Must run on Ubuntu without recompiling the kernel. Kernel modules are acceptable.
- Performant
  - Reads less than 50 % slower than standard ext3 reads.
  - Writes less than 80 % slower than standard ext3 writes.
  - Must be good at handling a lot of small files. Currently, my server hosts 470k files for a total of 6.8 GB. That is an average of 15 KB per file!
- Consistency
  - Changes must propagate to all servers within 5 seconds.
  - Uploaded files stored in the database but not yet synced may generate some errors for a short period if viewed by other users on other servers.
  - Temporary files are only relevant on the local machine, so a delay is not a big deal.
  - HTTP sessions will be sticky at the load balancer level, so user-specific information will be handled properly.
- Must handle ACLs
  - For permissions to be set perfectly, we will be using ACLs.
  - ACLs may not be readable within the Web node, but they must still be enforced.
- Durability
  - Must handle filesystem failures and be repairable very quickly.
  - File losses are acceptable in the event of a filesystem failure.
  - The filesystem must continue to function even if a Web node goes offline.
  - No single point of failure. If there is one, it must be isolated on its own machine.
A. Synchronisation
Synchronisation means that there is no filesystem solution: all the files are stored on the local filesystem, and synchronisation with the other nodes happens periodically or by watching I/O events.

Cluster synchronisation involving replication between all the nodes is usually very hard. To improve performance and reduce the risk of conflicts, it is often a good idea to elect a replication leader and a backup. If the leader is unavailable, the backup is used instead. This way, all the nodes sync with only one.
- Pros
- Very fast read/write
- Very simple to setup
- Cons
- May have trouble synchronizing ACLs
- May generate a lot of I/O
- Will most likely generate conflicts
Rsync
The typical tool for fast file syncing is rsync. It is highly reliable, and a bit of Bash scripting will get you started. However, as the number of files grows, it may become slow. For around a million files, it may easily take over 5 seconds. With our needs, it means it would have to run continuously, which would generate a lot of I/O and impact overall performance.
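As an illustration, leader-based syncing can be as simple as each follower periodically pulling from the elected leader (hypothetical host and path; -A also copies ACLs):

rsync -azA --delete leader:/var/www/ /var/www/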
Csync2

Csync2 is a promising tool that works like rsync, but it keeps file hints in a SQLite database. When a file changes, it flags in the database that the file needs checking. This way, a full sync only needs to check the marked files.

Csync2 supports multi-master replication and slaves (receive-only). However, I found while testing that it is not really suited to a lot of small files changing frequently: it tends to generate a lot of conflicts that need to be resolved manually.
It may not be the best solution for Web hosting, but for managing deployment of libraries or similar tasks, it would be awesome.
B. Simple sharing (NFS)
Even simpler than file syncing is plain old sharing. A node is responsible for hosting the files and serves them directly. Windows uses Samba/CIFS, Mac uses AFP and Linux uses NFS.

NFS is very old, like 1989 old. Even the latest version, NFSv4, came around in 2000. This means it is very stable and very good at what it does.
- Pros
- Supports ACLs (NFSv4)
- Very cheap and simple setup
- Up to a certain scale, fast read/write
- Cons
- Single point of failure
- Hard to setup proper failover
- Not scalable
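For completeness, a basic NFS setup is tiny, which is a big part of its appeal; a sketch with made-up hosts and paths:

# On the file server, in /etc/exports:
/var/www 10.0.0.0/24(rw,sync,no_subtree_check)

# Apply the export, then mount from a Web node:
exportfs -ra
mount -t nfs4 filer:/var/www /var/www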
C. Distributed / Replicated
A distributed filesystem may operate at a device, block or inode level. You can think of this a bit like a database cluster. It usually involves journals and is the most advanced solution.

- Pros
- Very robust
- Scalable
- Cons
- Writes are often painfully slow
- Reads can also be slow
- Often complex to setup
GlusterFS
Gluster runs over FUSE and NFS. Each node can have its own block, and the daemon handles the replication transparently, without the need for a management node.

Overall, it is very good software: the write performance is decent and it handles failures quite well. There has been a lot of recent work to improve caching, async writes, write-ahead, etc. However, in my experience, the read performance is disastrous. I really tried tuning it a lot, but I still feel like I haven't found its true potential.
Ultimately, I had to set it aside for the moment for lack of time to tune it further. It has a large community and is widely deployed, so I will probably end up giving it another chance.
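For the record, a replicated volume is created along these lines (a sketch; host, brick and volume names are made up):

gluster volume create wwwvol replica 2 web1:/data/brick web2:/data/brick
gluster volume start wwwvol
mount -t glusterfs web1:/wwwvol /var/www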
Lustre
Lustre seems like the Holy Grail of distributed filesystems. From Wikipedia: “At the present time, six of the top 10 and more than 60 of the top 100 supercomputers in the world have Lustre file systems in them.”

It appears to have everything I could dream of: speed, scalability, locks, ACLs, you name it.
However, I was never able to try it. It requires dedicated machines with various roles: management, data, file servers (API). This means I would need 4-5 additional machines. On top of that, it needs custom kernel modules.
Definitely on my wish-list, but inaccessible for the moment.
DRBD
DRBD is not a cluster solution, it does live backup. Usually, it is used to make a full mirror of a server that can be swapped with the master at any moment, should it fail. This is often used to patch solutions where replication is not built-in; examples are NFS and MySQL. There is a way to set up a 3-node solution, but it is far from perfect.

Conclusion
| | Maintenance | Complexity | Performance | Scalability | Durability | Consistency | ACLs |
|---|---|---|---|---|---|---|---|
| Rsync | Low | Low | Very high | Low | High | Low | Yes |
| Csync2 | High | Medium | Very high | Low | High | Low | Yes |
| NFS | None | None | Medium | None | None | Very high | Enforced |
| GlusterFS | None | Medium | Low | High | High | Very high | Yes |
| Lustre | None | Very high | High | Very high | Very high | Very high | Yes |
| DRBD | None | Medium | n/a | 2 or 3 | Very high | n/a | Yes |
LAMP Cluster — Choosing an Operating System
Besides choosing Linux vs Mac or Windows, the OS should not impact your users; it is mostly a sysadmin choice. Your users, the ones who will be connecting via SSH, will expect binaries to be available without modifying their PATH and common tools like Git or SVN to be already installed, but it does not really matter how they were installed.
The key to making sure nobody has a hard time getting everything to work is to do things in the most standard and common way possible.
Choose between the most used distributions
This is really important. Choosing a distribution for your laptop or your development server is not the same thing as choosing a production environment. Forget Gentoo and friends; being connected directly to the bare bones of your system is nice when you are learning or building a world-class new system, but for your own setup, you want something tested by the whole community, something that works. Even if it involves a bit of magic.

A good example of some magic is what Ubuntu does with networking. I admit that since 10.x, I don’t really understand all the cooperation between /etc/resolv.conf, /etc/network/interfaces, dhclient, /etc/init.d/networking and such. At some point, they all seem to redefine each other, and in a particular release, a script will start to throw some warnings, but it works. Never has the network failed me on Ubuntu, which is quite relevant when you need to access a remote machine.
Edge vs stable
Here are some of the top distributions, ordered by edginess:
| Distribution | Apache | PHP | MySQL | Varnish |
|---|---|---|---|---|
| Ubuntu 12.10 | 2.2.22 | 5.4.6 | 5.5.29 | 3.0.2 |
| Debian wheezy | 2.2.22 | 5.4.4 | 5.5.28 | 3.0.2 |
| OpenSuse 12.3 | 2.2.22 | 5.3.17 | 5.5.30 | 3.0.3 |
| Ubuntu 12.04 LTS | 2.2.22 | 5.3.10 | 5.5.29 | 3.0.2 |
| Debian squeeze (stable) | 2.2.16 | 5.3.3 | 5.1.66 | 2.1.3 |
| CentOS 6 | 2.2.15 | 5.3.3 | 5.1.66 | manual |
PHP 5.3.3, our main concern, was released in July 2010 and important fixes have occurred since, so this is out of the question.
Varnish 2 is very different from Varnish 3, so this needs to be looked at.
It is usually possible to install newer versions, but this implies relying on third-party packaging, multiple installed binaries or even compiling yourself.
Forget benchmarks
Conclusion
Thursday, March 21, 2013
LAMP Cluster — Comparison of different hosting platforms
The first step in building a hosting service is to choose your provider. Each system has its strengths and weaknesses, and you will have to choose according to your needs and your proficiency with the provided tools. This step is crucial and, if possible, you should spend some time testing and benchmarking each of them to see if it matches your expectations.
DISCLAIMER: Below, I sometimes mention prices; they are meant as an indication rather than a real comparison. Comparing different services can get very tricky as they don’t include the same things, and their performance is difficult to compare effectively.
Platform as a service
Google App Engine
Windows Azure
- 4 small Web and Worker instances
- 1 small Linux Virtual Machine (for testing, management, etc.)
- 10 x 100 MB Databases
- 100 GB bandwidth
- 100 GB storage
Heroku
A big plus is also the deployment procedure, which is backed by Git. It involves describing a project with a configuration file and simply pushing to Heroku. There are quite a lot of examples out there on how to deploy Wordpress, Drupal, etc. If I wasn’t trying to build an infrastructure myself, I would definitely consider it strongly.
Virtual machines on physical hardware
If you plan for the long term, investing in hardware might be a good idea. Hardware is way less expensive, and providers like iWeb tend to give a lot of bandwidth (if not unlimited). Upgrades are usually way less expensive, but they involve downtime and risk.

You still need some virtualization
For ease of management, you will almost certainly want a virtualization solution: this way you can create, back up, scale and migrate virtual machines in only a couple of steps. Among the most popular solutions, OpenStack is free and open source, while VMware has a very good reputation with vCenter. The downside is that it means you have yet another thing to configure.

You still need multiple servers
Why not let a third party do all this for you?
Virtual private servers (VPS)
For the setup I will be talking about in another post, we need:
- 1 small/medium management node
- 3 medium/large working nodes
- 2 small/medium utility nodes
- 1 small dev node
Amazon Web Services (AWS)
- Virtual machines (EC2)
- DNS services (Route53)
- Load balancing + Auto scaling
- Dedicated databases with automatic fallback (RDS)
- High performance I/O (EBS)
- Low performance, high durability I/O (S3)
- CDN (CloudFront)
- Highly configurable firewall
- And much much more
Google Compute Engine (GCE)
Windows Azure
- Multiple availability zones (like most other providers)
- Very easy permission management (you can give read-only access to your clients)
- Very powerful admin panel.
- Powerful recovery tools
- Remote connection via SSH or in-browser to the host so you can rescue your VM while it boots
- Possibility to switch kernels and reboot in rescue mode
- Possibility to reset root password from admin panel
- Possibility to rebuild from a backup or a fresh install without destroying the VM.
- Inexpensive load balancers
- Support for StackScripts, a way to run scripts while deploying a new VM
- High class (free) support. From my experience, replies typically take 1-5 minutes!
- Unlimited DNS zones
- Very high transfer caps
- Unmetered disk operations
- Unmetered Gigabit in-zone data transfer
Performance evaluation
When benchmarking for websites, you typically want a lot of small files (10 kB - 1 MB) that will be read sequentially, plus some big files (1 MB - 5 MB), with a read/write ratio of about 95 %.
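A quick way to mimic that file-size profile, as a rough sketch (paths and counts are arbitrary):

mkdir -p bench
for i in $(seq 1000); do
  size=$(( (RANDOM % 1015 + 10) * 1024 ))   # between 10 kB and ~1 MB
  head -c "$size" /dev/urandom > "bench/file$i"
done
time cat bench/* > /dev/null   # sequential read pass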
Maybe more on that later.
Tuesday, March 12, 2013
Guide to replicated LAMP stack hosting with failover
Motivations for building your own hosting
Defining the needs
Scalable
Highly Available
Secure
Compatible and flexible
Performant
Profitable
Last but not least, we like profits, so the whole system must have a predictable cost that can be forwarded to the appropriate client. Scalability plays a big role here because we can scale just as much as we need, when we need it.

Overview
- Hosting platform
- Cloud virtual machines
- Linode
- Amazon
- Rackspace
- Physical virtual machines
- iWeb (I’m in Montreal, Canada)
- Platform as a service
- Windows Azure
- Linux
- CentOS
- Ubuntu Server
- Debian
- Filesystem
- Synchronisation
- csync2
- rsync
- Distributed
- GlusterFS
- Lustre
- DRBD
- Shared
- NFS
- Load balancer
- Amazon / Linode Load balancer
- HAProxy
- Nginx
- Reverse proxy with caching
- Nginx
- Varnish
- Web server
- Apache
- 2.2 / 2.4
- Prefork / Worker / Event
- Nginx
- MySQL
- MySQL Cluster
- Master/Master replication
- Master/Slave replication + mysqlnd_ms
- Percona XtraDB Cluster + Galera
- PHP
- 5.2 / 5.3 / 5.4
- Apache module
- PHP-FPM
- Configuration system
- Puppet
- Chef
- Custom scripts
- Backups
- Full machine backups
- Rsync to remote machine
- Tarballs
- Monitoring
- Zabbix
- Nagios
- Ganglia
Tuesday, January 15, 2013
Generating a unique node id per hostname in Puppet
To generate a unique integer ID in Puppet, we can use the hostname and convert it to base 10 using an ERB inline template.
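Presumably this means treating the hostname as a base-36 number (my assumption); since ERB templates are Ruby, the conversion boils down to a to_i(36) call, which you can try from the shell with an illustrative hostname:

ruby -e 'puts "web01".to_i(36)'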
Friday, January 4, 2013
Fixing permissions using inotify and ACLs
For each website:
- Each user must have read/write permission
- Each user's group must have read/write permission (reseller access)
- Apache must have read access
- Would be nice if admins had read/write as well
ACLs quirks
Running a script
inotifywait
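Tying these pieces together, a minimal sketch of the idea (hypothetical users, group and path; not the full solution):

# One-time: grant the ACLs, plus default ACLs on directories so new files inherit them
setfacl -R -m u:bob:rwX,g:resellers:rwX,u:www-data:rX /home/bob/site
find /home/bob/site -type d -exec setfacl -d -m u:bob:rwX,g:resellers:rwX,u:www-data:rX {} +

# Continuously: re-apply ACLs to anything new that slips through
inotifywait -m -r -e create -e moved_to --format '%w%f' /home/bob/site |
while read -r path; do
  setfacl -m u:www-data:rX "$path"
done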
Wednesday, December 26, 2012
Run a script with lowest priority
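The usual recipe combines CPU and I/O priorities; for example (script name is just an illustration):

nice -n 19 ionice -c3 ./heavy-script.sh   # nice 19 = lowest CPU priority, ionice class 3 = idle I/O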
Friday, December 21, 2012
Choosing the best wireless router for a small office
After some unsuccessful searches, I asked the question on Reddit and I worked from there.
Our setup:
- 20~25 wired computers
- 20~25 wireless computers
- ~25 smartphones
- ~500 GB download/month
- ~300 GB upload/month
- Google Apps (Docs, Spreadsheet, Gmail, etc.) – A lot of AJAX requests
- SSH – Needs very low latency
- Skype – High bandwidth usage
- Dropbox – Moderate bandwidth usage
- Sending and receiving >5GB files
- Syncing thousands of files over FTP
- Dlink DIR-655 acting as the gateway: http://goo.gl/TXXs0
- WRT54 acting as a router and a wifi AP, configured with DD-WRT: http://goo.gl/Wu89u
I knew I wanted some sort of QoS to prioritize usage, e.g. Skype > SSH > Browsers > FTP. At our location, good quality connections at a decent price are pretty hard to find, so I was looking forward to a dual-WAN setup.
Upload speed
Network link speed negotiation
AP
Router
I had heard good comments about Draytek, and Reddit confirmed them. They are pretty solid, easy to configure and a much cheaper alternative to their Cisco counterparts. I finally went for a Draytek 2920n. It is about $250, supports balancing and failover with a second Internet connection, and it is managed in about the same fashion as a common router. The big difference is that it is much more powerful and can handle the traffic of a hundred devices without struggling. It was a godsend; we didn’t even need to upgrade the Internet connection. Check it out.

Other options
Monday, December 17, 2012
Varnish 3 configuration
However, there are some things that can be done to improve cacheability:
- Ignore Google Analytics cookies
- Remove empty Cookie line
- Normalize Accept-Encoding header
- Allow replying with a stale response if the backend is slow
- Remove some headers to reduce header size and hide some details about the server (security).
- Add a debug header to help understand why a request is cached or not.
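Once such a debug header exists, two successive requests tell the story; the header names here are assumptions, adapt them to your VCL:

curl -sI http://localhost/ | grep -iE '^(age|x-cache)'
curl -sI http://localhost/ | grep -iE '^(age|x-cache)'   # on a cache hit, Age keeps growing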
Thursday, November 1, 2012
Simple templating system using Bash
Hence, for very simple needs, I started using simple scripts that would only replace variables and give me a basic template to start with. This is however not very flexible and needs to be adapted for each case. So I wrote a templater that replaces variables with values from the environment. It also supports defining default values and variable interpolation.
Example with Apache + FPM
{{LOG_DIR=/var/log/apache2}}
{{RUN_DIR=/var/run/php-fpm}}
{{FCGI=$RUN_DIR/$DOMAIN.fcgi}}
{{SOCKET=$RUN_DIR/$DOMAIN.sock}}
{{EMAIL=$USER@$DOMAIN}}
{{DOC_ROOT=/home/$USER/sites/$DOMAIN/htdocs}}

<VirtualHost *:80>
    ServerAdmin {{EMAIL}}
    ServerName {{DOMAIN}}
    ServerAlias www.{{DOMAIN}}

    DocumentRoot "{{DOC_ROOT}}"
    <Directory "{{DOC_ROOT}}">
        AllowOverride All
        Order allow,deny
        Allow From All
    </Directory>

    AddHandler php-script .php
    Action php-script /php5.fastcgi virtual
    Alias /php5.fastcgi {{FCGI}}
    FastCGIExternalServer {{FCGI}} -socket {{SOCKET}}

    LogLevel warn
    CustomLog {{LOG_DIR}}/{{DOMAIN}}.access.log combined
    ErrorLog {{LOG_DIR}}/{{DOMAIN}}.error.log
</VirtualHost>
Invocation
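A hypothetical run (the script name, template file and destination are assumptions); variables are simply passed through the environment:

DOMAIN=example.com USER=bob ./templater.sh apache-fpm.tpl > /etc/apache2/sites-available/example.com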
Help
Wednesday, October 10, 2012
Batch update modules or themes of Drupal 6/7 in command line
I wrote a simple script that parses the module's page and installs or updates the most up-to-date version.
This works for themes as well. To use it, simply go into a modules or themes folder and run one of the tools.
Valid folders are:
- /modules
- /themes
- /sites/all/modules
- /sites/all/themes
- /sites/*/modules
- /sites/*/themes
drupal-install-module.sh will install or update one module; drupal-update-modules.sh will batch update all modules in the current folder.
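For example, assuming the script takes the module's short name (hypothetical paths):

cd /var/www/sites/all/modules
drupal-install-module.sh views    # install or update a single module
drupal-update-modules.sh          # batch update everything in this folder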
Tuesday, October 9, 2012
Verifying DNS propagation
Finding your nameserver (NS)
Query your NS directly
Verifying NS propagation
Verifying world propagation
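The whole routine fits in three dig invocations, with example.com standing in for your domain:

dig NS example.com +short                 # find the authoritative nameservers
dig @ns1.example.com example.com +short   # query your NS directly, bypassing caches
dig @8.8.8.8 example.com +short           # check what the world's resolvers return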
Clearing local and domain cache
- Windows: ipconfig /flushdns
- Mac: dscacheutil -flushcache or lookupd -flushcache
- Linux: Restart nscd and/or dnsmasq or equivalent
Contacting your ISP or bypassing them
Google:
- 8.8.8.8
- 8.8.4.4
OpenDNS:
- 208.67.222.222
- 208.67.220.220
Tuesday, September 18, 2012
Protect Webserver against DOS attacks using UFW
The typical ufw setup is to allow HTTP(S), limit SSH and deny everything else. This is not a UFW or iptables tutorial; you can find a lot of online help to guide you through all your needs. However, I personally had a lot of difficulty finding good documentation on how to protect yourself against HTTP attacks.
A lot of HTTP requests is normal
The problem is that HTTP can get very noisy. A typical Web page can easily have up to a hundred assets but usually, if you receive 100 requests in a second, it means you are under siege. If you really need to have 100 assets on a single Web page, you need a CDN, not a better server.

Rate limiting
Connections per IP
Connections per Class C
Packets per IP
Configuring UFW
The following instructions are targeted at UFW, but it is really just a wrapper, so it should be easy to adapt them to a generic iptables setup.

Edit /etc/ufw/before.rules, putting each part where it belongs.
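As a sketch of the three techniques above (thresholds are arbitrary; tune them to your traffic), the rules go into the *filter section:

# Connections per IP: drop an IP holding more than 20 simultaneous connections on port 80
-A ufw-before-input -p tcp --syn --dport 80 -m connlimit --connlimit-above 20 -j DROP

# Connections per Class C: the same, counted per /24
-A ufw-before-input -p tcp --syn --dport 80 -m connlimit --connlimit-above 50 --connlimit-mask 24 -j DROP

# Packets per IP: drop an IP opening more than 20 new connections per 10 seconds
-A ufw-before-input -p tcp --dport 80 -m state --state NEW -m recent --set
-A ufw-before-input -p tcp --dport 80 -m state --state NEW -m recent --update --seconds 10 --hitcount 20 -j DROP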
Make sure ufw runs and reload everything using ufw reload.
Testing the results
- http://ha.ckers.org/slowloris/ : written in Perl, features a lot of common attacks, including HTTPS
- http://www.sectorix.com/2012/05/17/hulk-web-server-dos-tool/ : written in Python, a basic multi-threaded attack, very easy to use
- http://www.joedog.org/siege-home/ : compiled, available in the Ubuntu repositories, very good for benchmarking
- http://blitz.io/ : an online service where you can test freely with up to 250 concurrent users
Friday, September 14, 2012
Generate missing Nginx mime types using /usr/share/mime/globs
Linux uses a glob pattern to match a filename while Nginx matches only extensions, but we can still use every glob of the form *.ext.
So here is a small PHP script that converts, sorts, filters and formats everything into a nice output.
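As a rough equivalent of the idea, a one-liner sketch (it keeps only the simple *.ext patterns; the post's PHP script is more thorough):

awk -F: '$2 ~ /^\*\.[a-z0-9]+$/ { print $1, substr($2, 3) ";" }' /usr/share/mime/globs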
Wednesday, September 12, 2012
Configure ElasticSearch on a single shared host and reduce memory usage
However, you may want to use ElasticSearch on a single host, mixed with your Web server, database and everything else. The problem is that ES is quite a CPU and memory hog by default. Here’s what I found through trial and error and some heavy searching.
The idea is to give ES some power, but leave some for the rest of the services. At the same time, if you tell ES that it can grab half of your memory and the OS needs some, ES will get killed, which isn’t nice.
My host was configured this way:
- ElasticSearch 0.19.9, official .deb package
- Ubuntu 12.04
- 1.5GB of RAM
- Dual-Core 2.6 GHz
- LEMP stack
1. Allow the user elasticsearch to lock memory:
   - Edit /etc/security/limits.conf and add:
     elasticsearch hard memlock 100000
2. Edit the init script /etc/init.d/elasticsearch:
   - Change ES_HEAP_SIZE to 10-20% of your machine's memory; I used 128m.
   - Change MAX_OPEN_FILES to something sensible. The default is 65536; I used 15000. Update: I asked the question on the ElasticSearch group and it may be a bad idea, without giving any advantage.
   - Change MAX_LOCKED_MEMORY to 100000 (~100 MB). Be sure to set it to the same value as the memlock limit in step 1.
   - Change JAVA_OPTS to "-server". I don’t exactly know why, but if you check the logs, you will see Java telling you to do so.
3. Edit the config file /etc/elasticsearch/elasticsearch.yml:
   - Disable replication capabilities:
     index.number_of_shards: 1
     index.number_of_replicas: 0
   - Reduce memory usage:
     index.term_index_interval: 256
     index.term_index_divisor: 5
   - Ensure ES is bound to localhost:
     network.host: 127.0.0.1
   - Enable blocking TCP, because you are always on localhost:
     network.tcp.block: true
4. Flush and restart the server:
   curl localhost:9200/_flush
   /etc/init.d/elasticsearch restart
Friday, July 13, 2012
Introducing Dotfiles Builder
Managing bashrc sucks
We all have our nice little bashrc that we are proud of. It tests for files, programs and terminal features, detects your OS version, builds a PATH, etc. Various solutions exist for all of our OSes and different setups.
Keeping several versions
Pros:
- Ultimate fine-tuning
- Easy to understand
- Usually optimized for every setup
Cons:
- Very time consuming to manage
- Hard to “backport” new ideas
Keep a single unusable file with everything and edit accordingly
Pros:
- Easy to backport; you just need to remember to do it
- Good performance
- Since you edit at each deployment, nice fine-tuning capability
Cons:
- The single file can become unbearably cluttered.
- You eventually end up managing several versions anyway.
- Tedious to edit at each deployment
Include several subfiles
Pros:
- Still has a lot of fine-tuning capabilities
- If well constructed, can be easy to understand
- Easy to deploy new features
Cons:
- Hard to detect which file to include
- Multiplies the number of files to manage
- Slow performance
Until recently, this was my preferred method.
Wanted features
So, what does a good bashrc have?
Should have:
- Good performance. On a busy server, you really don't want to wait 5 seconds for your new terminal because your I/O is skyrocketing.
- Reduce number of included files
- Reduce tests for environment features
- Reduce tests for program and files
- High flexibility
- Cross-OS compatible
- A lot of feature detection
- Ideally, configuration files
- Ease and speed of configuration
- It should not take more than a minute to set up a new bashrc
- If you need to specify your developer email, it would be nice to do it only once.
Yes, you read right: reduce tests AND do a lot of feature detection. You don't want to do Java-specific configuration or set an empty variable if Java is not even installed, but you do want Java to be detected automatically.
Generating a bashrc
Let's face it: you will install or remove Java way less often than you will start a new shell. Why then test for Java at each new shell?
This is where I introduce the Dotfiles Builder. The script runs in Bash and outputs the wanted bashrc.
This way, instead of doing:
if [ -d "$HOME/bin" ]; then
  PATH="$HOME/bin:$PATH"
fi
You would do:
if [ -d "$HOME/bin" ]; then
  echo 'PATH="$HOME/bin:$PATH"'
fi
And the result would simply be
PATH="$HOME/bin:$PATH"
But constructing PATH is a rather common task, and you want to make sure the folder is not already in your PATH. Why not wrap it up?
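A sketch of such a wrapper (the helper name is hypothetical), which emits the PATH line only when the directory exists and is not already present:

add_path() {
  local dir="$1"
  [ -d "$dir" ] || return 0
  case ":$PATH:" in
    *":$dir:"*) ;;                       # already in PATH: emit nothing
    *) echo "PATH=\"$dir:\$PATH\"" ;;    # emit the line for the generated bashrc
  esac
}
add_path "$HOME/bin"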
Take a look at the alpha version:
https://github.com/lavoiesl/dotfiles-builder/
As well as the example output.
This is a very alpha version of the intended program, but I still want to share what I have and maybe get some feedback and collaborators along the way. Currently, it only generates a bashrc, but expect more to come.