Distributed Filesystem over Multiple Servers
Hello
I want to know what the best solution would be to create a distributed file system.
Example of what I want:
I have 6 servers with 1TB of storage each.
I want a filesystem with 3TB of storage distributed over these servers, so that one or more servers can go offline and the files are still accessible.
The Servers are all connected over WAN and the solution should be free or cheap.
The Filesystem should be expandable dynamically.
Is this possible?
How good / bad are the iops?
Thanks
Comments
Yes, the free option is Proxmox with clustering.
Ceph
No.. Ceph over WAN? You need Ceph on a 10G LAN minimum. Ceph is very latency sensitive.
What about GlusterFS?
Haven't had much experience with it, but I don't believe it's designed for higher latency WAN.
I know https://tahoe-lafs.org/trac/tahoe-lafs is designed for more distributed setups though.
Distributed storage over WAN, why would you ever want to do that?
You can do it over different DCs with a private 10G link.
Some time ago I tried GlusterFS on a small gigabit LAN for some workstations; it wasn't very stable or fast. In the end we went with the usual big 16-disk RAID 10 server with bonded interfaces.
GlusterFS, but in general, the higher the latency, the worse the performance.
MooseFS will work if the locations are close enough.
The more latency between the various components the worse the performance will be.
Also, unless it's a private WAN circuit, use a VPN. Don't expose any of these clustered filesystems directly to the internet.
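To put the latency point above into rough numbers, here is a back-of-the-envelope sketch in Python. It assumes each write has to wait for a fixed number of network round trips before it is acknowledged, and it ignores disk time, bandwidth and protocol overhead, so treat the result as an optimistic upper bound. The function name and the round-trip times are illustrative assumptions, not measurements of any particular filesystem.

```python
def max_sync_iops(rtt_ms: float, round_trips_per_op: int = 1, queue_depth: int = 1) -> float:
    """Upper bound on operations per second when every operation must wait
    for `round_trips_per_op` network round trips of `rtt_ms` milliseconds.

    `queue_depth` is the number of operations kept in flight at once.
    """
    if rtt_ms <= 0 or round_trips_per_op < 1 or queue_depth < 1:
        raise ValueError("rtt_ms must be > 0, round trips and queue depth >= 1")
    return queue_depth * 1000.0 / (rtt_ms * round_trips_per_op)


if __name__ == "__main__":
    # Hypothetical round-trip times, chosen only for illustration.
    for label, rtt_ms in [("same rack", 0.5), ("nearby DC", 20.0), ("far away", 100.0)]:
        print(f"{label:>10}: {rtt_ms:6.1f} ms RTT -> at most {max_sync_iops(rtt_ms):8.1f} IOPS")
```

With one operation in flight, a 20 ms round trip caps you at about 50 synchronous operations per second no matter how fast the disks are. Keeping more operations in flight (a higher `queue_depth`) helps throughput but not the latency of each individual operation.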
I think storpool[dot]com would suit your needs. It is not a free solution, but for your capacity it will be affordable.
If you just want to store stuff and don't care much about the logistics behind it or how to access it, take a look at drftpd.
https://drftpd.org/
CephFS over InfiniBand (IB), LizardFS (or MooseFS, which is the same as LizardFS but not free) over InfiniBand, or GlusterFS over a 10G low-latency network, for cloud or virtual servers.
BeeGFS or Hadoop HDFS for HPC.
If you work on Microsoft, the file system in Windows Server 2019 is very nice.
How would MinIO S3 play in this?
Agree this way works!
drftpd.org is dead (broken links etc.); today it lives at https://github.com/drftpd-ng/drftpd
I guess you could use http://www.net2ftp.com/ or http://www.smoothftp.com/ as a web frontend.
Does ftpfs support the pret-list thing?
I have fond memories of using drftpd a long time ago.
No, LizardFS is based on MooseFS, which also offers a free version but some good stuff is only available in the commercial version. On the other hand MooseFS is well tested and established while LizardFS is relatively new and far less tested.
Sad story, I know.
@BigB
Forget about a simple answer because there are many factors in play some of which aren't immediately recognizable. Plus your priorities aren't clear enough.
I'm afraid you will have to look at the candidates and check them against your requirements and priorities. Possibly the hardest issue is your "one or more Servers can go offline and the files are still accessible" requirement hand in hand with "as cheap as possible".
Reason: redundancy is still a hard problem, at least beyond 2 times. One of the main reasons is that not all "ECC" algorithms are able to also correct errors (detecting an error is one thing, being able to also correct it is another thing). Plus, "correction capable error coding" beyond 2 errors gets expensive in terms of overhead. Some systems, for example, need a minimum of 8 (or even more) servers, and you might need triple (or even more) of your net capacity in raw storage.
One good solution (in fact the only one I know of) is Mojette Transform erasure coding, which however isn't widely available yet (and what is available isn't extensively tested and well established yet).
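To make the overhead trade-off above concrete, here is a small Python sketch of the arithmetic, using the 6 x 1TB example from the original question. It assumes an idealized scheme where data is split into `k` data pieces plus `m` redundancy pieces and any `k` of the `k + m` pieces are enough to read the data (plain replication is the special case `k = 1`). The class and field names are made up for illustration, and real systems add metadata and rebalancing overhead on top of this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Idealized redundancy layout: k data pieces plus m redundancy pieces,
    each on a different server, readable from any k of them (assumption)."""
    name: str
    k: int  # pieces that hold actual data
    m: int  # extra copies or parity pieces

    def servers_needed(self) -> int:
        return self.k + self.m

    def tolerated_failures(self) -> int:
        return self.m

    def overhead(self) -> float:
        # Raw storage consumed per unit of usable storage.
        return (self.k + self.m) / self.k

    def usable_tb(self, raw_tb: float) -> float:
        return raw_tb * self.k / (self.k + self.m)


if __name__ == "__main__":
    raw_tb = 6.0  # 6 servers x 1TB, as in the question
    layouts = [
        Layout("2 copies", k=1, m=1),
        Layout("3 copies", k=1, m=2),
        Layout("erasure 4+2", k=4, m=2),
        Layout("erasure 3+3", k=3, m=3),
        Layout("erasure 6+2", k=6, m=2),  # needs 8 servers, more than we have
    ]
    for lay in layouts:
        print(f"{lay.name:>12}: min servers {lay.servers_needed()}, "
              f"survives {lay.tolerated_failures()} failures, "
              f"overhead {lay.overhead():.2f}x, "
              f"usable {lay.usable_tb(raw_tb):.1f}TB of {raw_tb:.0f}TB raw")
```

Under these assumptions, both "2 copies" and "erasure 3+3" give 3TB usable out of 6TB raw, but the first only guarantees surviving one server failure while the second survives three, at the cost of more computation and more servers involved in every read and write.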
So, my advice is to clearly define your needs and to make a hard priority list ("hard" meaning that some lower priorities can be ignored if the highest priorities are met). And be prepared to not get all of what you want. Oh, and also keep the CAP theorem in mind (or look it up); it might also serve you as a guide when making your priority list.