Tuesday, October 27, 2015

XtreemFS (Unstable) is Available

We are currently working on the next stable release, which will be available soon. We published updated unstable builds via the openSUSE Build Service containing the following features of the upcoming release:
  • New quota implementation: We introduced volume quotas with XtreemFS 1.5.1. However, quotas were only enforced by the MRC service, which caused the possibility to write above a quota while a client holds a valid file handle. Our new quota implementation allows an exact enforcement while protecting against malicious clients. User and group quotas are now available beside the existing volume quotas. Tests have shown that the overhead of the new protocol is negligible. Note that quotas are currently not compatible with volumes that have been created with older XtreemFS versions. This will be available with the next stable release.
  • File system access tracing: We added a policy interface the the OSD service to trace read and write requests to files. We ship policies to write an access trace to a file or a network socket. We will add a RabbitMQ based policy for the stable release.
  • JNI based libxtreemfs: The Java version of libxtreemfs was rewritten using JNI, which improves the performance for parallel access and brings missing features to Java developers. A common code-base for Java and C++ will help to ship upcoming features quickly to both of the libraries.
  • Improved Hadoop adapter: The support for multiple volumes in the Hadoop adapter was extended. Furthermore, the Hadoop integration benefits from the JNI based libxtreemfs and supports asynchronous writes now.
We are thankful for any feedback. Please use our public mailing list or the Github issue tracker to report any problems.
The development of this release has been funded by the European Union Seventh Framework Program in the HARNESS project under grant agreement number 318521. 

Tuesday, March 24, 2015

Consistency while Adding and Removing Replicas

As mentioned in the release notes for XtreemFS 1.5.1 the adding and removing of replicas got more robust. Previously there had been border cases where access based on an outdated replica set could result in data inconsistencies. With XtreemFS 1.5.1 a protocol has been established that ensures consistency in any case.

To understand why those inconsistencies could occur, you have to recall the nature of file access in distributed systems like XtreemFS where metadata and file data is separated.
Opening a file results in a call to the Metadata and Replica Catalog (MRC) which returns the set of replicas and a capability. File data access, like reading or writing, is done directly among the client and the Object Storage Devices (OSD) that are listed in the replica set. The MRC is no longer involved, since access is granted as long as the capability is valid.

Data access in XtreemFS
It is apparent that without further action, different clients could obtain different replica sets for the same file, in case replicas have been added or removed in-between. As the quorum required by the R/W replication is also established by the replicas listed in the replica set, it is possible that different clients access data on a non intersecting sub sets of the replicas.

Consider for example the case that a file has five replicas called A, B, C, D and E. Then a valid majority is A, B and C, even if D and E are not online. This could happen for example if a link between some regions is highly unstable. If the replicas A, B and C are to be removed now, it has to be ensured both, that no client is allowed to write anymore data to A, B and C based on the old replica set, and that data previously not replicated to D and E is transferred to them prior to the installation of the new replica set consisting of just D and E.

The protocol that has been introduced with XtreemFS 1.5.1 does just that. It is extending the replica set to contain a version number, denies access based on outdated versions and uses the MRC to coordinate changes to the replica set between the involved replicas.

The coordination is central to the protocol and involves three stages. First a majority of the old replicas is getting invalidated, to ensure the data does not change during the second stage. The second stage ensures that the latest file data is transferred and updated on majority of the new replicas. Third and lastly the new replica set is installed with an incremented version number.

After the installation the new replica set is returned to clients opening the file. The installation on the replicas happens implicitly if a replica set with a higher version number is encountered.
Then again, if a client tries to access a replica with an outdated replica set is is denied. In XtreemFS 1.5.1 both the libxtreemfs for Java and C++ are handling errors based on outdated replica sets transparently by reloading the replica set from the MRC and retrying the request.

The new feature allows easily adding and removing of replicas for users and guarantees data consistency. Although it is compatible with clients build from the previous versions, it is recommend to update clients and servers simultaneously to profit from the transparent reloading of outdated replica sets.

Thursday, March 12, 2015

XtreemFS 1.5.1 Released

A new stable release of the distributed file system XtreemFS is available. XtreemFS 1.5.1 comes with the following major features:
  • Improved Hadoop support: The Hadoop Adapter supports Hadoop-2.x and other applications running on the YARN platform.
  • Consistent adding and removing replicas for R/W replication: Replica consistency is ensured while adding and removing replicas, xtfs_scrub can replace failed replicas automatically.
  • Improved SSL mode: The used SSL/TLS version is selectable, strict certificate chain checks are possible, the SSL code on client and server side was improved.
  • Better support for mounting XtreemFS using /etc/fstab: All mount parameters can be passed to the client by mount.xtreemfs -o option=value.
  • Initial version of an LD_PRELOAD based client: The client comes in the form of a library that can be linked to an application via LD_PRELOAD. File system calls to XtreemFS are directly forwarded to the services without FUSE. The client is intended for systems without FUSE or performance critical applications (experimental).
  • The size of a volume can be limited: Added quota support on volume level. The capacity limits are currently checked while opening a file on the MRC.
  • OSD health monitoring: OSDs can report their health, e.g. determined by SMART values, to the DIR. The results are aggregated in the DIR web interface. The default OSD selection policy can skip unhealthy OSDs.
  • Minor bugfixes and improvements across all components: See the CHANGELOG for more details and references to the issue numbers.
Furthermore we provide Dockerfiles to run the XtreemFS services in containers. The Dockerfiles are available in a separate Git repository at https://github.com/xtreemfs/xtreemfs-docker.

To ease contributing to XtreemFS for new developers, we added a Vagrantfile to the XtreemFS Git repository that allows setting up a virtual machine having all dependencies to build XtreemFS automatically.

The development of this release was partially funded by the European Commission in the HARNESS project under Grant Agreement No. 318521, as well as the German projects FFMK, GeoMultiSens and BBDC.

Monday, January 12, 2015

XtreemFS on AWS

We found this nice tutorial on running XtreemFS in the Amazon EC2 cloud. The video gives a brief introduction about installing the XtreemFS packages, configuring the services, creating and mounting volumes.
More complex setups using the AWS Elastic IP service require some additional effort. Please consider the following discussion on our mailinglist for details.