How to configure a distributed file system with replication using GlusterFS

A distributed file system shared between multiple servers is something I had planned for a long time, but I never got around to it because I first had to find the right file system for it.
After a lot of research, I found that GlusterFS was the right file system for me.
My plan was to use a distributed file system to share content between my webservers, so that all of them serve the same pages at all times, with some sort of high availability to make sure the content is always there.
Before I set up my GlusterFS cluster, I used a Linux tool called Lsyncd for this. It worked well and did live syncing between the servers using rsync, but there was one problem with it.
If I uploaded a lot of files (20+) to one webserver and it started syncing to the other webserver before I had finished copying, some of the files ended up corrupt, and this was a problem for me!

My distributed file system overview

Here is an overview of the setup I am making. It’s not pretty, but I’m sure the setup is a lot easier to explain with a simple mspaint drawing than with text!
[Image: GlusterFS distributed file system overview]
As you can see in the picture above, I have built this with high availability in mind, since I want my websites to always be up and running!

If webserver-1 goes down, let’s say from a hardware failure, webserver-2 gets all the website traffic from the router through a load balancer (this guide will not cover webserver load balancing) until webserver-1 is back up and running.
If GlusterFS-host-1 goes down, the files are still available to the webservers through GlusterFS-host-2, and when GlusterFS-host-1 is restored, the file system resyncs it to bring its files up to date!
This is, in my opinion, really smart and has some great advantages. I can do hardware maintenance, or have a power failure or hardware failure on one server, without my websites going offline.
Of course, this can also be used if you only have one webserver but still want redundancy for your files.
You can use it for other things besides web content too. I use mine for FTP content as well as website content.

What you will need

  • Two running Debian Wheezy servers for GlusterFS
  • Another Debian Wheezy server with a webserver, FTP server or other server software that needs to mount the distributed file system

How to setup and configure a distributed file system with high availability using GlusterFS

Servers

First you will need two or more servers.
I use Debian Wheezy 7.1 servers in my setup, and that is what I will be configuring GlusterFS on in this guide.
You can find a guide on creating Debian servers in this post
In this guide I will use two Debian Wheezy 7.1 servers with the following IP addresses (change these to your own in the steps below):
GlusterFS-host-1: 192.168.2.91
GlusterFS-host-2: 192.168.2.92
I will be installing GlusterFS 3.4 manually (the package files downloaded below are version 3.4.2-2)!

Install and configure GlusterFS for replication

Install the server software

On all your Debian GlusterFS servers (not the client servers), install the GlusterFS server software by running the following commands.
(If the links no longer work, you can find the newest packages here: http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/)

Download the packages
wget http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/glusterfs-server_3.4.2-2_amd64.deb
wget http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/glusterfs-client_3.4.2-2_amd64.deb
wget http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/glusterfs-common_3.4.2-2_amd64.deb
wget http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/glusterfs-dbg_3.4.2-2_amd64.deb
Install the dependencies
apt-get install fuse libdevmapper-event1.02.1 libaio1 libibverbs1 liblvm2app2.2 librdmacm1
Install the packages
dpkg -i glusterfs-common_3.4.2-2_amd64.deb
dpkg -i glusterfs-client_3.4.2-2_amd64.deb
dpkg -i glusterfs-dbg_3.4.2-2_amd64.deb
dpkg -i glusterfs-server_3.4.2-2_amd64.deb

Introduce the GlusterFS servers to each other

GlusterFS servers have to know about each other. To do this, you “probe” from one server to the other by running the following command on GlusterFS-host-1 (remember to change the IP address to that of your GlusterFS-host-2):

gluster peer probe 192.168.2.92

This tells GlusterFS-host-2 that GlusterFS-host-1 exists, and the other way around. If you get the output below, they are now friends and talking to each other:
[Screenshot: output of gluster peer probe]
If you have more than two GlusterFS servers, you have to probe every additional server.
For example, with 4 servers in your setup, you run the command above on server 1 once for each of the IP addresses of servers 2, 3 and 4.
You do not have to run the command on any server other than your first GlusterFS server.
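
If you have several servers to probe, the repeated commands could be wrapped in a small loop. This is only a sketch: the function name probe_peers and the GLUSTER variable are made up for illustration and are not part of GlusterFS.

```shell
# probe_peers: run "gluster peer probe" once per address given as an argument.
# GLUSTER is a hypothetical override so the commands can be previewed first,
# e.g. GLUSTER="echo gluster" prints them instead of running them.
probe_peers() {
    for peer in "$@"; do
        # Stop at the first probe that fails.
        ${GLUSTER:-gluster} peer probe "$peer" || return 1
    done
}
```

On server 1 of a 4-server setup this might be called as: probe_peers 192.168.2.92 192.168.2.93 192.168.2.94 (the last two addresses are examples only).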
Make sure that the servers are talking by running the following command (It doesn’t matter which server you do this on):

gluster peer status

You should get something like this:
[Screenshot: output of gluster peer status]
If you have more than 2 servers in your setup, you should see all the other servers in the list. With 2 servers there will only be 1 entry: on server 2 it shows the IP address of server 1, and on server 1 it shows the IP address of server 2.
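
To check the peer list from a script instead of by eye, the connected peers can be counted. This is a minimal sketch, assuming each connected peer is reported with a line containing "State: Peer in Cluster (Connected)"; the function name count_connected_peers is hypothetical.

```shell
# count_connected_peers: read "gluster peer status" output on stdin and print
# how many peers are reported as connected.
# Assumption: a connected peer shows up as "State: Peer in Cluster (Connected)".
count_connected_peers() {
    grep -c 'State: Peer in Cluster (Connected)'
}
```

With the 2-server setup from this guide, piping the status output into it (gluster peer status | count_connected_peers) should print 1 on either server.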

Create the volume

Now you have to create a volume.
A volume is a “virtual folder” where the data will be stored. The folder behind it can be anything you want; you just have to make sure that it exists on all servers running GlusterFS.
In this guide I will create a distributed file system across my 2 GlusterFS servers and replicate the data so they are always in sync.
I want the folder /www to be my volume, since this is where I want to store my web content. Replace the folder with whatever you like, but make sure it exists, or it will be silently created for you (which might leave you on the wrong disk or partition and running out of free space after a while).
I will call the volume “www-volume”. You can change this name to whatever you like, just remember it.
Run the following command to create a replicated volume across both servers, replacing my values with yours:

gluster volume create www-volume replica 2 transport tcp 192.168.2.91:/www 192.168.2.92:/www

If successful, you should see the lines in the picture below:
[Screenshot: output of gluster volume create]
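
To make the parts of the create command easier to see, here is a small sketch that assembles it from a volume name, a brick folder and a list of hosts. The function name build_volume_create is made up for illustration. It only prints the command and does not run it, and it assumes you want one replica per host, as in this guide.

```shell
# build_volume_create: print a "gluster volume create" command for a volume
# replicated across all given hosts, each using the same brick folder.
# Arguments: <volume-name> <brick-folder> <host> [<host> ...]
build_volume_create() {
    volume="$1"
    brick_dir="$2"
    shift 2
    bricks=""
    for host in "$@"; do
        bricks="$bricks $host:$brick_dir"
    done
    # After the shift, $# is the number of hosts, used here as the replica count.
    echo "gluster volume create $volume replica $# transport tcp$bricks"
}
```

Calling build_volume_create www-volume /www 192.168.2.91 192.168.2.92 prints the same command used in this section, so you can review it before running it by hand.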

Start the volume

When the volume has been created in the step above, you have to start it.
You do this by running the following command (replace www-volume with your volume name):

gluster volume start www-volume

You should see this output but with your volume name instead:
[Screenshot: output of gluster volume start]

Check if the volume is started

From this point on in the guide, my volume is called www-vol and not www-volume in the screenshots. The reason is that I made a mistake and had to start from scratch; I then used another name for my volume and forgot to take new pictures for the previous sections of this guide.
Check if the volume is started/running by using the following command on both servers:

gluster volume status

[Screenshot: output of gluster volume status]
There has to be a “Y” on every line under “Online”. It might take a minute before everything turns from “N” to “Y”; just run the command a few times.
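
Instead of reading the “Online” column by eye, the check could be scripted. This is a rough sketch under stated assumptions: that each process line in the status output starts with "Brick", "NFS Server" or "Self-heal Daemon", and that “Online” is the second-to-last column. The function name all_online is hypothetical, so compare it against your own output before relying on it.

```shell
# all_online: read "gluster volume status" output on stdin and succeed only
# if at least one process line was seen and every one of them shows "Y".
# Assumption: "Online" is the second-to-last whitespace-separated column.
all_online() {
    awk '
        /^(Brick|NFS Server|Self-heal Daemon)/ {
            seen = 1
            if ($(NF-1) != "Y") bad = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}
```

It could be used as: gluster volume status | all_online && echo "everything is online"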
You can also see information about your volume by using the following command:

gluster volume info

It should output “Status: Started” like in the picture below:
[Screenshot: output of gluster volume info]

Mount the volume on other servers

You now have a GlusterFS volume running, with replication. But for it to do you any good, you have to mount it on your servers that need the files.
In this guide I will use my Gluster distributed file system to mount a shared /www directory on my two webservers, so they always have the same files, and always have access to them (see “Overview” at the top of this post).

Install the GlusterFS client

On your webserver, FTP server or any other server where you want to use your new GlusterFS distributed file system, install the GlusterFS client by using the commands below (in this example, on Debian):
(If the links no longer work, you can find the newest packages here: http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/)

Download the packages
wget http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/glusterfs-client_3.4.2-2_amd64.deb
wget http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/glusterfs-common_3.4.2-2_amd64.deb
wget http://download.gluster.org/pub/gluster/glusterfs/3.4/LATEST/Debian/apt/pool/main/g/glusterfs/glusterfs-dbg_3.4.2-2_amd64.deb
Install dependencies

Some other software is needed to get this working. Install it all by running the following command:

apt-get install fuse libdevmapper-event1.02.1 libaio1 libibverbs1 liblvm2app2.2 librdmacm1
Install the packages
dpkg -i glusterfs-common_3.4.2-2_amd64.deb
dpkg -i glusterfs-client_3.4.2-2_amd64.deb
dpkg -i glusterfs-dbg_3.4.2-2_amd64.deb
Start Fuse

In my case, I had to start fuse manually and add it to the startup modules file. Without fuse loaded, you will not be able to mount the GlusterFS volume later. I recommend making this automatic by adding “fuse” (without the quotes) to /etc/modules using your favorite text editor, such as nano.
Fuse can also be loaded manually using the command below:

modprobe fuse
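
If you would rather not open an editor, adding “fuse” to the modules file can be sketched as a small function that only appends the line when it is missing. The function name add_fuse_module is made up for illustration; the file defaults to /etc/modules and can be overridden to try it on a copy first.

```shell
# add_fuse_module: append "fuse" to the modules file unless a line that is
# exactly "fuse" already exists. Run as root for the real /etc/modules.
# Argument: path to the modules file (defaults to /etc/modules).
add_fuse_module() {
    modules_file="${1:-/etc/modules}"
    if ! grep -qx 'fuse' "$modules_file" 2>/dev/null; then
        echo 'fuse' >> "$modules_file"
    fi
}
```

Running it twice leaves only one “fuse” line in the file, so it is safe to repeat.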
Reboot

If you added fuse to the /etc/modules file, it’s a good idea to reboot here, just to make sure it’s working correctly.
Reboot the server by using the following command:

reboot

Mounting with redundancy

Once installed, you have to mount the distributed file system. Create the file /etc/glusterfs/datastore.vol (using “nano” on Debian) on the server that needs to connect to the distributed file system, and add the following lines to the file.
Replace the following with your info:
[GlusterFSHOST1] = Your server number 1 (Where I use 192.168.2.91)
[GlusterFSHOST2] = Your server number 2 (Where I use 192.168.2.92)
[volume-name] = Your volume name (Where I use “www-volume”)

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host [GlusterFSHOST1]
  option remote-subvolume [volume-name]
end-volume
volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host [GlusterFSHOST2]
  option remote-subvolume [volume-name]
end-volume
volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume
volume writebehind
  type performance/write-behind
  option window-size 1MB
  subvolumes replicate
end-volume
volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes writebehind
end-volume
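
Filling in the three placeholders by hand is easy to get wrong, so here is a sketch that prints the same file with the values substituted. The function name render_volfile is hypothetical; the content is exactly the template above and nothing else.

```shell
# render_volfile: print the client volume file from this guide with the
# placeholders filled in.
# Arguments: <GlusterFSHOST1> <GlusterFSHOST2> <volume-name>
render_volfile() {
    host1="$1"
    host2="$2"
    volume="$3"
    cat <<EOF
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host $host1
  option remote-subvolume $volume
end-volume
volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host $host2
  option remote-subvolume $volume
end-volume
volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume
volume writebehind
  type performance/write-behind
  option window-size 1MB
  subvolumes replicate
end-volume
volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes writebehind
end-volume
EOF
}
```

With my values this would be: render_volfile 192.168.2.91 192.168.2.92 www-volume > /etc/glusterfs/datastore.vol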

Add the mount point for automounting

To make sure the server automatically mounts the distributed file system as a local mount point when rebooted, edit the file /etc/fstab and add the following line at the bottom:
(Replace [MOUNT-DIR] with the directory you want to mount it to; in my case this is also /www. It can be any valid folder on the client.)

/etc/glusterfs/datastore.vol [MOUNT-DIR] glusterfs _netdev,rw,allow_other,default_permissions,max_read=131072 0 0
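
To try the new fstab entry without rebooting, running "mount [MOUNT-DIR]" as root should pick up the line you just added. Whether the mount is really there can then be checked with a sketch like the one below. It assumes a GlusterFS mount is listed with the type fuse.glusterfs in /proc/mounts; the function name is_gluster_mounted is made up for illustration.

```shell
# is_gluster_mounted: succeed if the given directory is listed as a GlusterFS
# mount. Arguments: <mount-dir> [<mounts-file>] (defaults to /proc/mounts).
# Assumption: GlusterFS mounts show up with the type "fuse.glusterfs".
is_gluster_mounted() {
    mount_dir="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v dir="$mount_dir" '
        $2 == dir && $3 == "fuse.glusterfs" { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example: is_gluster_mounted /www && echo "/www is mounted from GlusterFS"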

Testing replication

You now have your new distributed file system with replication mounted on your server, and it’s time to make sure it’s working properly.
On the server you mounted the volume on (the client), run the following command.
Replace [MOUNT-DIR] with the directory you used in the step above to mount the file system.

touch [MOUNT-DIR]/replication-test.txt

This creates an empty text file on the mount, so now we check that it has been replicated to both GlusterFS servers.
Log in to your GlusterFS-host-1 and type the following command.
Replace [MOUNT-DIR] with the folder you told GlusterFS to use when you created the volume in the “Create the volume” step of this guide.

ls [MOUNT-DIR] -lah

You should see the file named “replication-test.txt” in the list. Don’t mind “lost+found”, “..” and “.”; they have to be there too.
If the file is in the list, do the same on GlusterFS-host-2 and make sure the same file is there too. If it is, replication is working. You can then delete the test file on the client again by using the command below:
(Replace [MOUNT-DIR] with the directory used in the “Add the mount point for automounting” step)

rm [MOUNT-DIR]/replication-test.txt
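
An empty file only proves that the file name was replicated. As a slightly stronger variant of the test above, the file could be given some content and a checksum compared on both GlusterFS servers. This is only a sketch; the function name write_test_file is hypothetical.

```shell
# write_test_file: write a small file with recognisable content into the
# given directory and print its checksum and size (as reported by cksum).
# Argument: <mount-dir> on the client.
write_test_file() {
    test_file="$1/replication-test.txt"
    # Timestamp plus host name gives the file non-empty, unique content.
    echo "replication test $(date +%s) from $(uname -n)" > "$test_file" || return 1
    cksum < "$test_file"
}
```

After running it against the mount directory on the client, running cksum < /www/replication-test.txt on each GlusterFS server (with your own brick folder in place of /www) should print the same two numbers.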

That’s it. You’re done!
Have fun with it. Remember to test redundancy by powering off one of the GlusterFS servers to see if it handles high availability correctly.

4 thoughts on “How to configure a distributed file system with replication using GlusterFS”

  1. Girish

    I have configured Gluster for replication and it works just fine, except that if I stop a volume on one of the servers, the volume gets stopped automatically on the other server too. Is there a way to prevent this?
    Thanks.

    1. Steffan Post author

      If I remember correctly, this is by design. (I can’t test it in my production environment, sorry, so if someone could verify it, that would be great.)
      If you stop the volume, the volume will be stopped completely, hence the “stopping the VOLUME”.
      If you want to take a single brick offline, try disconnecting the network, remove the iptables rules or some other “failure” like that. You will then see the redundancy working 🙂

