How to set up your own distributed, redundant, and encrypted storage grid in a few easy steps
If you have a few different VPSes, you'll most likely have a significant amount of unused storage space across all of them. This guide will be a quick introduction to setting up and using Tahoe-LAFS, a distributed, redundant, and encrypted storage system - some may call it 'cloud storage'.
What are the requirements?
- At least 2 VPSes required, at least 3 VPSes recommended. More is better.
- Each VPS should have at least 256MB RAM (for OpenVZ burstable), or 128MB RAM (for OpenVZ vSwap and other virtualization technologies with proper memory accounting).
- Reading comprehension and about an hour of your time.
What is Tahoe-LAFS?
From the Tahoe-LAFS website:
Tahoe-LAFS is a Free and Open cloud storage system. It distributes your data across multiple servers. Even if some of the servers fail or are taken over by an attacker, the entire filesystem continues to function correctly, including preservation of your privacy and security.
How does Tahoe-LAFS work?
The short version: Tahoe-LAFS uses a RAID-like mechanism to store 'shares' (parts of a file) across the storage grid, according to the settings you specified. When a file is retrieved, all storage servers are asked for shares of that file, and the data is fetched from the servers that respond fastest. The requesting client then reconstructs the original file from those shares.
All shares are encrypted and checksummed; storage servers cannot read the contents of a share or of the file it derives from, and they cannot modify a share without the change being detected.
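To make the RAID-like behaviour more concrete, here is a minimal sketch of the arithmetic behind a share layout. It assumes that any `needed` shares out of `total` are enough to rebuild a file (the shares.needed and shares.total settings covered in the storage node configuration) and that each share is roughly 1/needed of the file size. The function name `share_summary` and the values 3 and 10 are purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: summarise a "needed-of-total" share layout.
# $1 = shares needed to rebuild a file, $2 = total shares created on upload.
share_summary() {
    needed=$1
    total=$2
    # Number of shares that can be lost before the file becomes unrecoverable.
    spare=$((total - needed))
    # Each share is about 1/needed of the file, so all shares together
    # occupy roughly total/needed times the original file size.
    expansion=$(awk -v k="$needed" -v n="$total" 'BEGIN { printf "%.2f", n / k }')
    echo "needed=$needed total=$total spare=$spare expansion=${expansion}x"
}

share_summary 3 10
```

With these example values the script prints `needed=3 total=10 spare=7 expansion=3.33x`: up to 7 shares may go missing, at the cost of storing about 3.33 times the file size across the grid.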
There are (roughly) two types of files: immutable (these cannot be changed afterwards) and mutable (these can be changed). Immutable files will result in a "read capability" (an encoded string that tells Tahoe-LAFS how to find it and how to decrypt it) and a "verify capability" (that can be used for verifying or repairing the file). A mutable file will also yield a "write capability" that can be used to modify the file. This way, it is possible to have a mutable file, but restrict the write capability to yourself, while sharing the read capability with others.
There is also a pseudo-filesystem with directories; while you aren't required to use it, it makes it possible to, for example, mount part of a Tahoe-LAFS filesystem via FUSE.
For more specifics, read this documentation entry.
How do I set it up?
1. Install dependencies
Follow the instructions below on all VPSes.
To install and run Tahoe-LAFS, you will need Python (with development files), setuptools, and the usual tools for compiling software. On Debian, these can be installed by running:
apt-get install python python-dev python-setuptools build-essential
If you use a different distro, your package manager or package names may differ.
Python setuptools comes with a Python package manager (or installer, rather) named easy_install. We'd rather have pip as our Python package manager, so we'll install that instead:
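This step amounts to a single command. The following is a minimal sketch, assuming `easy_install` was put on the PATH by the python-setuptools package installed earlier; the guard and the error message are illustrative additions.

```shell
# Assumption: easy_install ships with python-setuptools and is on the PATH.
if command -v easy_install >/dev/null 2>&1; then
    # Use easy_install once, only to bootstrap pip.
    easy_install pip
else
    echo "easy_install not found; install python-setuptools first" >&2
fi
```

Run this as root (or via sudo) if your Python packages are installed system-wide.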
After installing pip, install the last dependency that has to be installed manually:
pip install twisted
Then install Tahoe-LAFS itself:
pip install allmydata-tahoe
When you're done installing all of the above, you'll have to create a new user (adduser tahoe) to run Tahoe-LAFS under. From this point on, run all commands as the tahoe user.
2. Setting up an introducer
First of all, you'll need an 'introducer' - this is basically the central server that all other nodes connect to, to be made aware of other nodes in the storage grid. While the storage grid will continue to function if the introducer goes down, no new nodes will be discovered, and there will be no reconnections to nodes that went down until the introducer is back up.
Preferably, this introducer should be installed on a server that is not a storage node, but it's possible to run an introducer and a storage node alongside each other.
Run the following on the VPS you wish to use as an introducer, as the tahoe user:
tahoe create-introducer ~/.tahoe-introducer
tahoe start ~/.tahoe-introducer
Your introducer should now be running. Read the file ~/.tahoe-introducer/introducer.furl and note its entire contents down somewhere. You will need this later to connect the other nodes.
3. Setting up storage nodes
Now it's time to set up the actual storage nodes. This will involve a little more configuration than the introducer node. On each storage node, run the following command:
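Creating the node could look like the sketch below. It assumes the `tahoe` command installed via pip is on the PATH and that you are logged in as the tahoe user. Without an argument, `tahoe create-node` uses ~/.tahoe as the node directory, which matches the configuration path used in this guide. Newer Tahoe-LAFS releases may also require extra options such as `--hostname`; check `tahoe create-node --help` on your version.

```shell
# Assumption: the tahoe CLI is on the PATH and this runs as the tahoe user.
if command -v tahoe >/dev/null 2>&1; then
    # With no directory argument, the node is created in ~/.tahoe.
    tahoe create-node
else
    echo "tahoe not found on PATH; check the pip installation step" >&2
fi
```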
If all went well, a storage node should now have been created. Now edit ~/.tahoe/tahoe.cfg in your editor of choice. I will explain all the important configuration values - you can leave the rest of the values unchanged. Note that the 'shares' settings all apply to uploads from that particular server - each machine connected to the network can pick its own encoding settings.
- nickname: The name for this particular storage node, as it will appear in the web panel.
- introducer.furl: The FURL for the introducer node - this is the address that you noted down before.
- shares.needed: The number of shares needed to reconstruct a file.
- shares.happy: The number of distinct servers that must be available to store shares for an upload to succeed.
- shares.total: The total number of shares that should be created on upload. One storage node may hold more than one share, as long as this doesn't violate the shares.happy setting.
- reserved_space: The amount of space that should be reserved for other applications on this server. Read below for more information.
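Put together, the relevant parts of ~/.tahoe/tahoe.cfg might look like the fragment below. This is a sketch for a hypothetical three-node grid: the nickname, the share counts, and the reserved space are illustrative values rather than recommendations, and the introducer.furl placeholder has to be replaced with the contents of the introducer.furl file you noted down earlier.

```ini
[node]
# Hypothetical name; shown in the web panel.
nickname = vps1-storage

[client]
# Replace with the full contents of ~/.tahoe-introducer/introducer.furl.
introducer.furl = <contents of introducer.furl>
# Illustrative encoding for three storage nodes: any 2 of 3 shares rebuild a file.
shares.needed = 2
shares.happy = 2
shares.total = 3

[storage]
enabled = true
# Illustrative value; space to leave free for other applications on this server.
reserved_space = 1G
```

Keep shares.needed no larger than shares.happy, and shares.happy no larger than shares.total, or uploads cannot succeed.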