Real-Time Filesystem Replication

raindog308 Administrator, Veteran
edited February 2013 in Help

I have two VPSes in two different states. I'd like to set up one-way filesystem replication between them.

One method would be to run rsync on a regular basis, or even continuously via script, but that seems kind of cheesy.

What is a good tool for long-distance replication? Ideally it would sit there unused until I put a file in the filesystem on VPS #1, then it would fire up and replicate to VPS #2.

Bonus features:

  • Two-way synchronization
  • Some command-line status I could monitor ("VPS #2 is in sync", "VPS #2 is 140MB behind sync", etc.)
  • If I delete a file on VPS #2, it's smart enough to recreate it...this isn't vital though.
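The status line asked for above could be approximated by comparing file listings from both sides. What follows is only a minimal sketch, not a feature of any tool named in this thread. It assumes both trees can be listed (for example, the replica's listing is fetched separately), and it treats "same path, same size" as "in sync", which is a crude heuristic. The function names `tree_sizes`, `bytes_behind`, and `status_line` are made up for illustration.

```python
import os


def tree_sizes(root):
    """Map each file path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def bytes_behind(source_sizes, replica_sizes):
    """Total size of source files that are missing or differ in size on the replica."""
    return sum(
        size
        for path, size in source_sizes.items()
        if replica_sizes.get(path) != size
    )


def status_line(name, behind):
    """Format a human-readable sync status (decimal megabytes)."""
    if behind == 0:
        return f"{name} is in sync"
    return f"{name} is {behind / 1_000_000:.0f}MB behind sync"


if __name__ == "__main__":
    # Hypothetical listings; real ones would come from tree_sizes() on each host.
    source = {"a.iso": 140_000_000, "b.txt": 5}
    replica = {"b.txt": 5}
    print(status_line("VPS #2", bytes_behind(source, replica)))
```

A size comparison misses edits that leave the file size unchanged; comparing checksums or modification times would be stricter at the cost of more I/O.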

Comments

  • @raindog308 said: Two-way synchronization

    but

    @raindog308 said: If I delete a file on VPS #2, it's smart enough to recreate it...

    Deleting it on #2 would then delete it on #1 if you had two-way synchronization.
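The conflict pointed out above can be shown with a small sketch. It models each VPS as a set of file paths and assumes the two-way tool remembers the last synced state (`base`); that bookkeeping is an assumption made for illustration, not a description of any specific tool in this thread.

```python
def one_way(source, replica):
    """One-way mirror: the replica ends up with exactly the source's files,
    so anything deleted on the replica is recreated."""
    return set(source)


def two_way(base, a, b):
    """Two-way sync against the last synced state `base`.

    A path survives if both sides still have it, or if it is new since the
    last sync. A path that was in `base` but is now missing on one side is
    treated as a deletion and removed everywhere.
    """
    return {
        path
        for path in a | b
        if (path in a and path in b) or path not in base
    }


if __name__ == "__main__":
    base = {"photo.jpg"}   # state after the last sync
    vps1 = {"photo.jpg"}   # untouched on VPS #1
    vps2 = set()           # deleted on VPS #2

    print(one_way(vps1, vps2))        # file comes back on VPS #2
    print(two_way(base, vps1, vps2))  # deletion propagates to VPS #1
```

With the same inputs, the one-way mirror restores the file while the two-way sync removes it from both sides, which is why the two bonus features cannot both hold for the same directory.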

  • kam Member
    edited February 2013

    Check out lsyncd. It doesn't do everything you want, but it's worth a look: http://code.google.com/p/lsyncd/

    ...guess I was too slow with my suggestion!

  • GlusterFS: http://www.gluster.org/

    It is what ran my cluster for a long time.

    Also, if you want something easier to set up (although Gluster setup is pretty simple) that just uses SSH, there is Unison, a bidirectional rsync on steroids: http://www.cis.upenn.edu/~bcpierce/unison/

    Unison is a file-synchronization tool for Unix and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.
    
  • The docs state that Gluster requires a 64-bit OS and recommend 1GB of RAM. It seems to me that the amount of memory on a LEB is far too low to warrant a 64-bit OS, and most LEBs have <= 1GB of RAM anyway.

    Do you actually need to meet these requirements in practice or are the docs just being extremely liberal with the specs?

  • twain Member
    edited February 2013

    lsyncd, cool, this might work for the flat files for an Owncloud HA/failover cluster...

  • @MrOwen said: Do you actually need to meet these requirements in practice or are the docs just being extremely liberal with the specs?

    The latter. It runs fine on every VPS I've had. That was with Gluster 3.2 and 3.3, on VPSes with at most 512MB of RAM.

  • @Raymii said: The latter. It runs fine on every VPS I've had. That was with Gluster 3.2 and 3.3, on VPSes with at most 512MB of RAM.

    Oh nice. I've been thinking about putting Gluster on my machines but was holding out because of that.

    Also, did you run Gluster on 64-bit OSes and how much memory would you say Gluster eats up on average?

  • raindog308 Administrator, Veteran

    @twain said: lsyncd, cool, this might work for the flat files for an Owncloud HA/failover cluster...

    Along with MySQL replication.

    There would inevitably be some mismatch between MySQL and the filesystem if you failed over, assuming your primary was busy. Unfortunately, I don't think MySQL has a "roll the database back to point in time X" feature. You can restore a backup and roll InnoDB forward to a point in time, but you can't take a current DB and say "go back to the state you were in 8 hours ago" or whatever point in time your filesystem copy reflects.

  • @raindog308 said: Along with MySQL replication.

    I would imagine master-master replication within MySQL would be optimal instead of having to deal with the databases at a file-system level.

  • raindog308 Administrator, Veteran

    @MrOwen said: I would imagine master-master replication within MySQL would be optimal instead of having to deal with the databases at a file-system level.

    Oh yes!

  • @MrOwen @raindog308 Galera or Tungsten Replicator are good for master-master MySQL replication.

  • You can try DRBD.

    P.S. I have never used it myself.

  • DRBD will only work with Xen/KVM as it needs a block device. It's also active/passive so writes can only happen on 1 node at a time. Try MooseFS if you want a clustered FS where writes can happen on both or multiple nodes. It needs FUSE.

  • @MrOwen said: Also, did you run Gluster on 64-bit OSes and how much memory would you say Gluster eats up on average?

    32-bit. I don't have stats at the moment because I don't use it anymore, but as far as I can remember, with about 16GB of data the average was between 80 and 100MB of RAM.

    @MrOwen said: I would imagine master-master replication within MySQL would be optimal instead of having to deal with the databases at a file-system level.

    Master Master + HAProxy == win. Make sure you keep an eye on it because MM can go haywire sometimes...

    @biplab said: You can try DRBD.

    Not on OpenVZ, because it needs a block device. If you are going to run a cluster FS like OCFS2 or GFS, you also need special kernel modules.

  • @Raymii I'd really like to see some benchmarks, as real-time replication in GlusterFS is known to be heavily dependent on latency, and performance is usually really bad with nodes in different places across a WAN.

  • @Raymii if you don't mind sharing, what is the current HA setup you have for your sites? I have been trying for years to get an HA setup done for my WordPress sites (not that I need one, but I thought it would be fun to do) with little success. Even master-master MySQL replication is not that easy. @Prometeus at one point recommended a cluster setup with Percona, I believe, but the resource specs required are far beyond LEB range.

  • My vote is for Unison.

  • @sdotsen said: My vote is for Unison.

    You have personal experience using Unison? It kinda worries me that there's no more active development other than the occasional bug fix.

  • For GlusterFS you will need the FUSE filesystem module, which has to be installed by the host in most cases.

  • I have used GlusterFS and DRBD for this type of work. DRBD is nice, but it is only active/passive, so it wouldn't be as useful to you, as someone mentioned above. I don't think GlusterFS needs as much memory as the docs state, as I have used it on 256MB nodes before (pretty much OOM at that point). With Gluster, creating a simple mirrored FS with your leftover disk space is trivial.

  • raindog308 Administrator, Veteran

    The more I think about this, the more I realize I really only need one-way replication.

    The earlier conversation about HA OwnCloud intrigued me so I was thinking of setting something up. The MySQL piece is straightforward, but I was looking for something for the filesystem.

    The other problem of keeping the DB in sync with the filesystem may not be easily solvable.

  • @raindog308 said: The other problem of keeping the DB in sync with the filesystem may not be easily solvable.

    If you use MySQL M-M replication you would need a separate partition/directory for the mirrored files. You don't want to put the MySQL data on the mirrored filesystem while also doing M-M replication.

  • raindog308 Administrator, Veteran

    Correct. Filesystem replication is one track, MySQL replication is separate. The MySQL replication would be at the DB level, not the filesystem level.

    I was referring to the problem where they are out of sync, and you have entries in the DB that are not in the filesystem, or vice-versa.

  • @raindog308 said: Correct. Filesystem replication is one track, MySQL replication is separate. The MySQL replication would be at the DB level, not the filesystem level.

    I was referring to the problem where they are out of sync, and you have entries in the DB that are not in the filesystem, or vice-versa.

    I can imagine this is a bit of an edge case that is not easily solved because of the limitations of OpenVZ LEBs. I also feel that not even the best commercial products can duplicate data in absolute real time, because of latency, file sizes, and the number of files.

    With the amount of traffic that I feel most of us handle, I think rsync running a few times an hour is good enough and usually the database (which can be replicated much quicker) is more important than a missing image or two if one of your boxes happened to go down (well, maybe files are more important if you're running OwnCloud).
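For the periodic rsync approach described above, the command run from cron could be assembled roughly as in this sketch. The host name `vps2.example`, the paths, and the helper name `rsync_command` are placeholders. The flags used (`-a` archive mode, `-z` compression, `--delete`) are standard rsync options, but whether `--delete` is wanted depends on whether files removed on the source should also disappear from the replica.

```python
import shlex


def rsync_command(src_dir, dest, delete=False):
    """Build an rsync argv for a one-way push of src_dir's contents to dest.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = src_dir.rstrip("/") + "/"
    cmd = ["rsync", "-az"]
    if delete:
        # Remove files on the destination that no longer exist on the source.
        cmd.append("--delete")
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    # Placeholder host and paths, for illustration only.
    cmd = rsync_command("/srv/data", "user@vps2.example:/srv/data/", delete=True)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Without `--delete`, a file removed on the replica is simply copied again on the next run, which matches the "recreate it" bonus feature from the original post.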

  • The problem with rsync or Unison is that they do not detect file/folder renames.
    So if a file or folder was renamed on the source machine, the old file will be deleted from the destination, and the renamed file will be transferred over a potentially slow and unreliable WAN.
    lsyncd seems to handle this by monitoring the folder with inotify and thus replicating the changes on the other host. I haven't used this, but this seems the best approach.
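To illustrate the rename problem, here is a sketch of one alternative way to spot renames: compare content digests between two snapshots of a tree. This is explicitly not how lsyncd works (the post above says it relies on inotify), and the names `digest` and `detect_renames` are invented for this example. A path that disappeared and a new path with identical content are paired as a rename, so the replica could move the file locally instead of receiving it again over the WAN.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to recognise the same file under a new name."""
    return hashlib.sha256(data).hexdigest()


def detect_renames(old_index, new_index):
    """Pair vanished paths with new paths that have identical content.

    Both arguments map path -> content digest. Returns a list of
    (old_path, new_path) tuples; unmatched additions still need a transfer.
    """
    gone = {p: d for p, d in old_index.items() if p not in new_index}
    added = {p: d for p, d in new_index.items() if p not in old_index}

    gone_by_digest = {}
    for path, d in sorted(gone.items()):
        gone_by_digest.setdefault(d, []).append(path)

    renames = []
    for path, d in sorted(added.items()):
        candidates = gone_by_digest.get(d)
        if candidates:
            renames.append((candidates.pop(0), path))
    return renames


if __name__ == "__main__":
    before = {"a.jpg": digest(b"image bytes"), "b.txt": digest(b"notes")}
    after = {"pics/a.jpg": digest(b"image bytes"), "b.txt": digest(b"notes")}
    print(detect_renames(before, after))
```

Hashing every file on each pass is expensive on large trees, which is one reason event-based monitoring is attractive for this use case.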

  • sman Member
    edited July 2013

    I don't think rsync is cheesy at all. Especially since you only need one-way. You cannot use it for MySQL though.
