Did You Know?
Donate to TecMint

We are pleased to announce our new TecMint Q/A section to submit your Linux questions

Introduction to GlusterFS (File System) and Installation on RHEL/CentOS and Fedora

Download Your Free eBooks NOW - 10 Free Linux eBooks for Administrators

We are living in a world where data is growing in an unpredictable way and it our need to store this data, whether it is structured or unstructured, in an efficient manner. Distributed computing systems offer a wide array of advantages over centralized computing systems. Here data is stored in a distributed way with several nodes as servers.

GlusterFS Storage

GlusterFS Storage

The concept of a metadata server is no longer needed in a distributed file system. In distributed file systems, it offers a common view point of all the files separated among different servers. Files/directories on these storage servers are accessed in normal ways.

For example, the permissions for files/directories can be set as in usual system permission model, i.e. the owner, group and others. The access to the file system basically depends on how the particular protocol is designed to work on the same.

What is GlusterFS?

GlusterFS is a distributed file system defined to be used in user space, i.e. File System in User Space (FUSE). It is a software based file system which accounts to its own flexibility feature.

Look at the following figure which schematically represents the position of GlusterFS in a hierarchical model. By default TCP protocol will be used by GlusterFS.

GlusterFS Design

GlusterFS Design

Advantages to GlusterFS

  1. Innovation – It eliminates the metadata and can dramtically improve the performance which will help us to unify data and objects.
  2. Elasticity – Adapted to growth and reduction of size of the data.
  3. Scale Linearly – It has availability to petabytes and beyond.
  4. Simplicity – It is easy to manage and independent from kernel while running in user space.

What makes Gluster outstanding among other distributed file systems?

  1. Salable – Absence of a metadata server provides a faster file system.
  2. Affordable – It deploys on commodity hardware.
  3. Flexible – As I said earlier, GlusterFS is a software only file system. Here data is stored on native file systems like ext4, xfs etc.
  4. Open Source – Currently GlusterFS is maintained by Red Hat Inc, a billion dollar open source company, as part of Red Hat Storage.

Storage concepts in GlusterFS

  1. Brick – Brick is basically any directory that is meant to be shared among the trusted storage pool.
  2. Trusted Storage Pool – is a collection of these shared files/directories, which are based on the designed protocol.
  3. Block Storage – They are devices through which the data is being moved across systems in the form of blocks.
  4. Cluster – In Red Hat Storage, both cluster and trusted storage pool convey the same meaning of collaboration of storage servers based on a defined protocol.
  5. Distributed File System – A file system in which data is spread over different nodes where users can access the file without knowing the actual location of the file. User doesn’t experience the feel of remote access.
  6. FUSE – It is a loadable kernel module which allows users to create file systems above kernel without involving any of the kernel code.
  7. glusterd – glusterd is the GlusterFS management daemon which is the backbone of file system which will be running throughout the whole time whenever the servers are in active state.
  8. POSIX – Portable Operating System Interface (POSIX) is the family of standards defined by the IEEE as a solution to the compatibility between Unix-variants in the form of an Application Programmable Interface (API).
  9. RAID – Redundant Array of Independent Disks (RAID) is a technology that gives increased storage reliability through redundancy.
  10. Subvolume – A brick after being processed by least at one translator.
  11. Translator – A translator is that piece of code which performs the basic actions initiated by the user from the mount point. It connects one or more sub volumes.
  12. Volume – A volumes is a logical collection of bricks. All the operations are based on the different types of volumes created by the user.
Different Types of Volumes

Representations of different types of volumes and combinations among these basic volume types are also allowed as shown below.

Distributed Volume

Distributed Volume

Replicated Volume

Replicated Volume

Striped Volume

Striped Volume

Distributed Replicated Volume

Representation of a distributed-replicated volume.

Distributed Replicated Volume

Distributed Replicated Volume

Installation of GlusterFS in RHEL/CentOS and Fedora

In this article, we will be installing and configuring GlusterFS for the first time for high availability of storage. For this, we’re taking two servers to create volumes and replicate data between them.

Step :1 Have at least two nodes

  1. Install CentOS 6.5 (or any other OS) on two nodes.
  2. Set hostnames named “server1” and “server2“.
  3. A working network connection.
  4. Storage disk on both nodes named “/data/brick“.

Step 2: Enable EPEL and GlusterFS Repository

Before Installing GlusterFS on both the servers, we need to enable EPEL and GlusterFS repositories in order to satisfy external dependencies. Use the following link to install and enable epel repository under both the systems.

  1. How to Enable EPEL Repository in RHEL/CentOS

Next, we need to enable GlusterFs repository on both servers.

# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo

Step 3: Installing GlusterFS

Install the software on both servers.

# yum install glusterfs-server

Start the GlusterFS management daemon.

# service glusterd start

Now check the status of daemon.

# service glusterd status
Sample Output
service glusterd start
  service glusterd status
  glusterd.service - LSB: glusterfs server
   	  Loaded: loaded (/etc/rc.d/init.d/glusterd)
  	  Active: active (running) since Mon, 13 Aug 2012 13:02:11 -0700; 2s ago
  	 Process: 19254 ExecStart=/etc/rc.d/init.d/glusterd start (code=exited, status=0/SUCCESS)
  	  CGroup: name=systemd:/system/glusterd.service
  		  ├ 19260 /usr/sbin/glusterd -p /run/glusterd.pid
  		  ├ 19304 /usr/sbin/glusterfsd --xlator-option georep-server.listen-port=24009 -s localhost...
  		  └ 19309 /usr/sbin/glusterfs -f /var/lib/glusterd/nfs/nfs-server.vol -p /var/lib/glusterd/...

Step 4: Configure SELinux and iptables

Open ‘/etc/sysconfig/selinux‘ and change SELinux to either “permissive” or “disabled” mode on both the servers. Save and close the file.

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Next, flush the iptables in both nodes or need to allow access to the other node via iptables.

# iptables -F

Step 5: Configure the Trusted Pool

Run the following command on ‘Server1‘.

gluster peer probe server2

Run the following command on ‘Server2‘.

gluster peer probe server1

Note: Once this pool has been connected, only trusted users may probe new servers into this pool.

Step 6: Set up a GlusterFS Volume

On both server1 and server2.

# mkdir /data/brick/gv0

Create a volume On any single server and start the volume. Here, I’ve taken ‘Server1‘.

# gluster volume create gv0 replica 2 server1:/data/brick1/gv0 server2:/data/brick1/gv0
# gluster volume start gv0

Next, confirm the status of volume.

# gluster volume info

Note: If in-case volume is not started, the error messages are logged under ‘/var/log/glusterfs‘ on one or both the servers.

Step 7: Verify GlusterFS Volume

Mount the volume to a directory under ‘/mnt‘.

# mount -t glusterfs server1:/gv0 /mnt

Now you can create, edit files on the mount point as a single view of the file system.

Features of GlusterFS

  1. Self-heal – If any of the bricks in a replicated volume are down and users modify the files within the other brick, the automatic self-heal daemon will come into action as soon as the brick is up next time and the transactions occurred during the down time are synced accordingly.
  2. Rebalance – If we add a new brick to an existing volume, where large amount of data was previously residing, we can perform a rebalance operation to distribute the data among all the bricks including the newly added brick.
  3. Geo-replication – It provides back-ups of data for disaster recovery. Here comes the concept of master and slave volumes. So that if master is down whole of the data can be accessed via slave. This feature is used to sync data between geographically separated servers. Initializing a geo-replication session requires a series of gluster commands.

Here, is the following screen grab that shows the Geo-replication module.

Geo Replication

Geo Replication

Reference Links

GlusterFS Homepage

That’s it for now!. Stay updated for the detailed description on features like Self-heal and Re-balance, Geo-replication, etc in my upcoming articles.

Anoop C S

I am basically a FOSS enthusiast interested in working under GNU/Linux and system administration. Looking forward to become a part of an open source initiative. Currently pursue Computer Science & Engineering.

Linux Services & Free WordPress Setup

Our post is simply ‘DIY’ aka ‘Do It Yourself, still you may find difficulties and want us to help you out. We offer wide range of Linux and Web Hosting Solutions at fair minimum rates. Please submit your orders by Clicking Here.

7 Responses

  1. Shamimho says:

    Hi, I am a newbie.I have a question here. What would be better NAS(such as Openfiler or NAS4free) or Glusterfs. I am a little bit confused. Thanks for your nice posting.

    • Anoop C S says:

      @Shamimho,

      NAS is generally a term used for computer data storage inside a network and GlusterFS is always a way of doing the thing (as we can implement NAS via Samba or NFS etc).

      GlusterFS is a scale-out NAS file system. We can implement a NAS with GlusterFS. It is not bound to hardware-level as shown in the Design section above. It operates at user space (FUSE) which is fully software-driven.

      So it seems there is no meaning in comparing GlusterFS and NAS.

  2. Sanjay says:

    Good overview Anoop. What city are you based?

  3. Justdude says:

    You need mkdir -p if you create more than one directory at once as in your example.
    Also you created brick directory, but in gluster example lines you used brick1..
    And for gluster to work after reboot, you need to add it to startup with chkconfig glusterfsd on and chkconfig glusterd on :)

  4. Anoop C S says:

    @Justdude
    You are right. If the path does not exists, we need to provide the -p option for mkdir command. I will update the article with the necessary corrections including the ambiguity in naming the brick directories.

    And for gluster daemon to work after reboot you can also make use of the systemctl command as follows

    # systemctl enable glusterd

    Thanks for your valuable comment on the article.

  5. yan says:

    hi ,
    i am have a question?
    server:
    10.10.10.51
    10.10.10.52
    10.10.10.53
    10.10.10.54
    client:
    10.10.10.55

    server configure:
    gluster mode replica
    eg.
    gluster volume create gfs replica 4 transport tcp 10.10.10.51:/data/file1 10.10.10.52:/data/file1 10.10.10.53:/data/file1 10.10.10.54:/data/file1 force

    client :
    mount -t glusterfs 10.10.10.51:/gfs /mnt/gfs

    Now ,i am have a question

    if one of four server down; eg 10.10.10.51 down.
    client is disconnect ,I am need wait 90seconds, client is autoconnect, clinet is ok

    How to configure the server to shorten the time.

Leave a Reply

This work is licensed under a (cc) BY-NC | TecMint uses cookies. By using our services, you comply to use of our cookies. More info: Privacy Policy.
© 2012-2014 All Rights Reserved.