<h1>XtreemFS</h1>
This is the project blog of XtreemFS, a distributed and replicated file system.<br />
<br />
<h2>XtreemFS 1.5.1.84 (Unstable) is Available</h2>
<i>October 27, 2015</i><br />
<br />
We are currently working on the next stable release, which will be available soon. We have published updated unstable builds via the <a href="http://download.opensuse.org/repositories/home:/xtreemfs:/unstable/" target="_blank">openSUSE Build Service</a>, containing the following features of the upcoming release:<br />
<ul>
<li><b>New quota implementation:</b> We introduced volume quotas with XtreemFS 1.5.1. However, quotas were only enforced by the MRC service, which made it possible to write beyond a quota as long as a client held a valid file handle. Our new quota implementation enforces quotas exactly while protecting against malicious clients. User and group quotas are now available besides the existing volume quotas. Tests have shown that the overhead of the new protocol is negligible. Note that quotas are currently not compatible with volumes that have been created with older XtreemFS versions; support for such volumes will follow with the next stable release.</li>
<li><b>File system access tracing:</b> We added a policy interface to the OSD service to trace read and write requests to files. We ship policies that write an access trace to a file or a network socket. A RabbitMQ-based policy will be added for the stable release.</li>
<li><b>JNI-based libxtreemfs:</b> The Java version of libxtreemfs was rewritten using JNI, which improves the performance of parallel access and brings missing features to Java developers. A common code base for Java and C++ will help us ship upcoming features quickly in both libraries (see the usage sketch below this list).</li>
<li><b>Improved Hadoop adapter:</b> The support for multiple volumes in the Hadoop adapter was extended. Furthermore, the Hadoop integration benefits from the JNI-based libxtreemfs and now supports asynchronous writes.</li>
</ul>
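<div>
For illustration, here is a minimal sketch of how a Java application writes a file through libxtreemfs; the JNI rewrite keeps this public Java API. The class and method names (ClientFactory, Client, Volume, FileHandle) and package paths are quoted from memory and should be verified against the libxtreemfs sources; the DIR address and volume name refer to our public demo server.</div>
<pre>
// Sketch only: verify class/method names against the libxtreemfs sources.
import org.xtreemfs.common.libxtreemfs.*;
import org.xtreemfs.foundation.pbrpc.generatedinterfaces.RPC.UserCredentials;
import org.xtreemfs.pbrpc.generatedinterfaces.GlobalTypes.SYSTEM_V_FCNTL;

public class XtreemFSHello {
    public static void main(String[] args) throws Exception {
        UserCredentials creds = UserCredentials.newBuilder()
                .setUsername("demo").addGroups("demo").build();
        Options options = new Options();

        // Connect to the DIR and open the demo volume (no SSL).
        Client client = ClientFactory.createClient(
                "demo.xtreemfs.org:32638", creds, null, options);
        client.start();
        Volume volume = client.openVolume("demo", null, options);

        // Create a file and write a few bytes at offset 0.
        FileHandle file = volume.openFile(creds, "/hello.txt",
                SYSTEM_V_FCNTL.SYSTEM_V_FCNTL_H_O_CREAT.getNumber()
                | SYSTEM_V_FCNTL.SYSTEM_V_FCNTL_H_O_WRONLY.getNumber(), 0644);
        byte[] data = "Hello XtreemFS!".getBytes();
        file.write(creds, data, data.length, 0);
        file.close();

        client.shutdown();
    }
}
</pre>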
<div>
We are thankful for any feedback. Please use our public <a href="https://groups.google.com/forum/?fromgroups#!forum/xtreemfs" target="_blank">mailing list</a> or the <a href="https://github.com/xtreemfs/xtreemfs/issues" target="_blank">GitHub issue tracker</a> to report any problems.</div>
<div>
</div>
<div>
The development of this release has been funded by the European Union Seventh Framework Program in the <a href="http://www.harness-project.eu/" target="_blank">HARNESS</a> project under grant agreement number 318521. </div>
<i>Posted by Christoph</i><br />
<br />
<h2>Consistency while Adding and Removing Replicas</h2>
<i>March 24, 2015</i><br />
<br />
As mentioned in the release notes for XtreemFS 1.5.1, adding and removing replicas has become more robust. Previously, there were corner cases where access based on an outdated replica set could result in data inconsistencies. XtreemFS 1.5.1 establishes a protocol that ensures consistency in all cases. <br />
<br />
To understand why those inconsistencies could occur, you have to recall the nature of file access in distributed systems like XtreemFS, where metadata and file data are separated.<br />
Opening a file results in a call to the Metadata and Replica Catalog (MRC), which returns the set of replicas and a capability. File data access, like reading or writing, happens directly between the client and the Object Storage Devices (OSDs) listed in the replica set. The MRC is no longer involved, since access is granted as long as the capability is valid. <br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7QYeQbVJW9-6E6sntfnJmMBD59CixNRIEBYCXKI4H4-bzJmpBwJDXOnM-n_TILcg8QbLXo2vmMiCbHOjeBskHkonCYaj7hyphenhyphenvj4oFQ3CLS6W7Ujo9ieN3Q3VLyrsKGgmoUkZFlPRqkgsw/s1600/doodle_architecture_white.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7QYeQbVJW9-6E6sntfnJmMBD59CixNRIEBYCXKI4H4-bzJmpBwJDXOnM-n_TILcg8QbLXo2vmMiCbHOjeBskHkonCYaj7hyphenhyphenvj4oFQ3CLS6W7Ujo9ieN3Q3VLyrsKGgmoUkZFlPRqkgsw/s1600/doodle_architecture_white.png" height="120" title="Data access in XtreemFS" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Data access in XtreemFS</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
It is apparent that without further action, different clients could obtain different replica sets for the same file if replicas have been added or removed in between. As the quorum required by the R/W replication is also established from the replicas listed in the replica set, it is possible that different clients access data on non-intersecting subsets of the replicas. <br />
<br />
Consider, for example, the case that a file has five replicas called A, B, C, D and E. A valid majority is then A, B and C, even if D and E are not online. This could happen, for example, if a link between some regions is highly unstable. If the replicas A, B and C are now to be removed, it has to be ensured both that no client is allowed to write any more data to A, B and C based on the old replica set, and that data previously not replicated to D and E is transferred to them before the new replica set, consisting of just D and E, is installed. <br />
<br />
The protocol introduced with XtreemFS 1.5.1 does just that. It extends the replica set with a version number, denies access based on outdated versions, and uses the MRC to coordinate changes to the replica set between the involved replicas.<br />
<br />
The coordination is central to the protocol and involves three stages. First, a majority of the old replicas is <i>invalidated</i>, to ensure the data does not change during the second stage. The second stage ensures that the latest file data is transferred to and <i>updated</i> on a majority of the new replicas. Third, the new replica set is <i>installed</i> with an incremented version number.<br />
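<div>
The following pseudocode sketches this coordination; it is purely illustrative and does not mirror the actual class or method names in the XtreemFS sources:</div>
<pre>
// Illustrative pseudocode of a replica set change, coordinated by the MRC.
ReplicaSet changeReplicaSet(ReplicaSet oldSet, ReplicaSet newSet) {
    // Stage 1: invalidate a majority of the old replicas, so that no
    // client can keep writing on the basis of the outdated set.
    invalidateMajority(oldSet);

    // Stage 2: transfer the latest file data to a majority of the new
    // replicas, so nothing that was written so far can be lost.
    updateMajority(newSet);

    // Stage 3: install the new replica set under an incremented version
    // number; replicas adopt it implicitly when they see the higher version.
    return install(newSet, oldSet.version() + 1);
}
</pre>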
<br />
After the installation, the new replica set is returned to clients opening the file. On the replicas, the installation happens implicitly whenever a replica set with a higher version number is encountered.<br />
If a client tries to access a replica with an outdated replica set, the access is denied. In XtreemFS 1.5.1, both the Java and the C++ libxtreemfs handle errors caused by outdated replica sets transparently by reloading the replica set from the MRC and retrying the request.<br />
<br />
The new feature makes adding and removing replicas easy for users and guarantees data consistency. Although it is compatible with clients built from previous versions, we recommend updating clients and servers simultaneously to benefit from the transparent reloading of outdated replica sets. <br />
<br />
<br />
<i>Posted by Anonymous</i><br />
<br />
<h2>XtreemFS 1.5.1 Released</h2>
<i>March 12, 2015</i><br />
<br />
<div>
A new stable release of the distributed file system XtreemFS is available. XtreemFS 1.5.1 comes with the following major features:</div>
<ul>
<li><b>Improved Hadoop support:</b> The Hadoop adapter supports Hadoop 2.x and other applications running on the YARN platform.</li>
<li><b>Consistent adding and removing of replicas for R/W replication:</b> Replica consistency is ensured while adding and removing replicas, and xtfs_scrub can replace failed replicas automatically.</li>
<li><b>Improved SSL mode:</b> The SSL/TLS version to use is selectable, strict certificate chain checks are possible, and the SSL code on the client and server side was improved.</li>
<li><b>Better support for mounting XtreemFS using /etc/fstab:</b> All mount parameters can be passed to the client via mount.xtreemfs -o option=value (see the example below this list).</li>
<li><b>Initial version of an LD_PRELOAD-based client:</b> The client comes in the form of a library that can be linked to an application via LD_PRELOAD. File system calls to XtreemFS are forwarded directly to the services without FUSE. The client is intended for systems without FUSE or for performance-critical applications (experimental).</li>
<li><b>The size of a volume can be limited:</b> Added quota support on the volume level. The capacity limits are currently checked by the MRC when a file is opened.</li>
<li><b>OSD health monitoring:</b> OSDs can report their health, e.g. determined from SMART values, to the DIR. The results are aggregated in the DIR web interface. The default OSD selection policy can skip unhealthy OSDs.</li>
<li><b>Minor bugfixes and improvements across all components: </b>See the CHANGELOG for more details and references to the issue numbers.</li>
</ul>
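<div>
As an example, an /etc/fstab entry for our public demo volume could look as follows. The mount point is a placeholder, and it is assumed that mount.xtreemfs is installed where mount(8) finds it as the helper for the "xtreemfs" file system type; see the user guide for the full list of mount options.</div>
<pre>
# DIR host/volume          mount point      type      options
demo.xtreemfs.org/demo     /mnt/xtreemfs    xtreemfs  defaults,allow_other  0 0
</pre>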
<div>
Furthermore we provide Dockerfiles to run the XtreemFS services in containers. The Dockerfiles are available in a separate Git repository at <a href="https://github.com/xtreemfs/xtreemfs-docker">https://github.com/xtreemfs/xtreemfs-docker</a>.</div>
<div>
<br /></div>
<div>
To make contributing to XtreemFS easier for new developers, we added a Vagrantfile to the <a href="https://github.com/xtreemfs/xtreemfs" target="_blank">XtreemFS Git repository</a> that automatically sets up a virtual machine with all dependencies required to build XtreemFS.</div>
<div>
<br /></div>
<div>
The development of this release was partially funded by the European Commission in the HARNESS project under Grant Agreement No. 318521, as well as the German projects FFMK, GeoMultiSens and BBDC.</div>
<i>Posted by Christoph</i><br />
<br />
<h2>XtreemFS on AWS</h2>
<i>January 12, 2015</i><br />
<br />
<iframe allowfullscreen="" frameborder="0" height="315" src="//www.youtube.com/embed/1nmISXUQ5pM" width="420"></iframe><br />
<br />
We found this nice tutorial on running XtreemFS in the Amazon EC2 cloud. The video gives a brief introduction to installing the XtreemFS packages, configuring the services, and creating and mounting volumes.<br />
More complex setups using the AWS Elastic IP service require some additional effort. Please see the <a href="https://groups.google.com/forum/#!searchin/xtreemfs/nat/xtreemfs/3pxAQe_Acxs/Luf6v0rtS_0J" target="_blank">following discussion</a> on our mailing list for details.<br />
<br />
<i>Posted by Christoph</i><br />
<br />
<h2>XtreemFS in Docker Containers</h2>
<i>October 30, 2014</i><br />
<br />
Recently, we have been running the XtreemFS services in <a href="https://www.docker.com/" target="_blank">Docker</a> containers for one of our current <a href="http://www.harness-project.eu/" target="_blank">research projects</a> and would like to share our experiences. Docker is a container-based virtualization solution that provides a certain level of isolation between applications running on the same machine.<br />
<div>
<div>
<br /></div>
<div>
Docker images are generated from a <i>Dockerfile</i>. <i>Dockerfiles</i> contain some metadata and a sequence of instructions that is executed to generate the image. Container images are derived from a base image, e.g. a standard Ubuntu Linux, and store only the changes made to this base image. As all XtreemFS services (<i>DIR</i>, <i>MRC</i> and <i>OSD</i>) are shipped in a common binary file (XtreemFS.jar), we created an <i>xtreemfs-common</i> image that contains the binaries, and service-specific images that inherit from the common image. The service-specific images (<i>xtreemfs-dir</i>, <i>xtreemfs-mrc</i>, and <i>xtreemfs-osd</i>) contain only a service-specific call to start each service.<br />
<br />
An application running in a Docker container is required to stay in the foreground during the lifetime of the container, otherwise the container terminates. For XtreemFS this means that we cannot use our service-specific init scripts to start the <i>DIRs</i>, <i>MRCs</i>, and <i>OSDs</i>. We extracted the relevant parts from the init scripts and created a <i>CMD</i> call, i.e. the command that is executed after starting a container. As the XtreemFS logs are written directly to <i>stdout</i> and no longer to a file, one can simply use the <i>docker logs</i> command to check what happens inside a container.<br />
<br />
A critical part of running a distributed file system in containers is to ensure that all file system contents are stored persistently, even beyond the lifetime of a container. Our <i>Dockerfiles</i> make use of Docker <i>volumes</i> to store file system contents. A <i>volume</i> is nothing more than a directory that is mapped from the host machine into the container. The <i>CMD</i> call of our containers expects the service configuration to be placed in <i>/xtreemfs_data</i>, which has to be mapped as a <i>volume</i> into the container. Besides the configuration file, this <i>volume</i> can also be used to store file system contents; however, any other place is possible.<br />
<br />
Mapping the XtreemFS configuration files into a container by using a <i>volume</i> also has the advantage that our Docker images are generic and reusable. As a user can specify the volumes and ports that are mapped into a container when it is started, one can create arbitrary XtreemFS service configuration files, named <i>dirconfig.properties</i>, <i>mrcconfig.properties</i>, or <i>osdconfig.properties</i>, and map all affected directories and ports at container start time.<br />
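<div>
For example, a DIR container could be started as follows. The image name and host paths follow the conventions described above; 32638 is assumed to be the default DIR port:</div>
<pre>
# dirconfig.properties is expected inside the directory mapped to /xtreemfs_data
docker run -d --name dir \
    -v /srv/xtreemfs/dir:/xtreemfs_data \
    -p 32638:32638 \
    xtreemfs-dir
</pre>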
<br />
After mapping network ports into a container, the underlying service is reachable via the IP address of the host. The XtreemFS services register themselves at the <i>directory service (DIR)</i> and propagate their own addresses. When running in containers, the services are not aware of the host address they are reachable by; each container only knows its address from an internal virtual network. We can work around this problem by setting the <i>hostname</i> parameter in the <i>MRC</i> and <i>OSD</i> configurations to the public address or name. This workaround has previously been used to run services that are reachable via <i>NAT</i>.<br />
<br />
We provide the described <i>Dockerfiles</i> on <a href="https://github.com/xtreemfs/xtreemfs-docker" target="_blank">GitHub</a>. The repository contains a <i><a href="https://github.com/xtreemfs/xtreemfs-docker/blob/master/README.md" target="_blank">README</a></i> file with usage instructions. We may publish the images in the Docker index after additional testing and evaluation. The containers are currently derived from an Ubuntu base image and take the latest XtreemFS version from our <a href="https://github.com/xtreemfs/xtreemfs" target="_blank">Git</a> repository. The <i>Dockerfiles</i> can easily be adapted to other Linux distributions or XtreemFS releases. We would be happy to get any feedback.</div>
</div>
<i>Posted by Christoph</i><br />
<br />
<h2>Mounting XtreemFS Volumes using Autofs</h2>
<i>August 11, 2014</i><br />
<br />
Autofs is a useful tool to mount networked file systems automatically on access, for instance on machines without permanent network connectivity like notebooks. We prepared a <a href="https://github.com/xtreemfs/xtreemfs/wiki/Mounting-an-XtreemFS-Volume-using-Automounter" target="_blank">short tutorial</a> that describes how to use the automounter for XtreemFS volumes.<br />
<br />
This assumes you'd like a shared directory called /scratch/xtfs/shared across all of your machines that anyone can read from and write to. While I use /scratch in this example, the more traditional /net could be used instead.<br />
<ul>
<li>Assume XtreemFS is installed and set up properly, and the volumes are created.</li>
<li>Have autofs installed (whether started or not).</li>
<li>Create an /etc/auto.master with these contents:</li>
</ul>
<div>
<blockquote class="tr_bq">
# All xtreemfs volumes will be automounted in /scratch/xtfs<br />
/scratch/xtfs /etc/auto.xtfs<br />
#<br />
# Include /etc/auto.master.d/*.autofs<br />
#<br />
+dir:/etc/auto.master.d<br />
#<br />
# Include central master map if it can be found using<br />
# nsswitch sources.<br />
#<br />
# Note that if there are entries for /net or /misc (as<br />
# above) in the included master map any keys that are the<br />
# same will not be seen as the first read key seen takes<br />
# precedence.<br />
#<br />
+auto.master</blockquote>
</div>
<ul>
<li>Then create an /etc/auto.xtfs map (which you'll have to modify for your DIR and volume):</li>
</ul>
<blockquote class="tr_bq">
shared -fstype=fuse,allow_other :mount.xtreemfs#dir.example.com/volume-0</blockquote>
<ul>
<li>Restart autofs (with a command similar to this):</li>
</ul>
<blockquote class="tr_bq">
sudo /etc/init.d/autofs restart</blockquote>
<ul>
<li>Do this for each machine on which you'd like to use autofs.</li>
</ul>
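<div>
To verify the setup, simply access the directory; autofs mounts the volume on first access:</div>
<blockquote class="tr_bq">
ls /scratch/xtfs/shared</blockquote>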
<div>
Thanks to Pete for contributing this tutorial!</div>
<i>Posted by Christoph</i><br />
<br />
<h2>XtreemFS moved to GitHub</h2>
<i>June 3, 2014</i><br />
<br />
We moved our Git repository from Google Code to GitHub. The new project page is available at <a href="https://github.com/xtreemfs/xtreemfs">https://github.com/xtreemfs/xtreemfs</a>. All tickets from the issue tracker have been migrated and are available under the same issue numbers. Other services like the public mailing list or the binary package repositories are not affected.<br />
<br />
We are looking forward to your feedback and contributions.<br />
<br />
<i>Posted by Christoph</i><br />
<br />
<h2>Public demo server updated to XtreemFS 1.5</h2>
<i>March 27, 2014</i><br />
<br />
We updated our public demo server to XtreemFS 1.5. To try out XtreemFS without setting up your own server, just install the client and mount our volume:<br />
<br />
<blockquote class="tr_bq">
mkdir ~/xtreemfs_demo<br />
mount.xtreemfs demo.xtreemfs.org/demo ~/xtreemfs_demo<br />
cd ~/xtreemfs_demo</blockquote>
<br />
For testing you can create any directories and files you like. Please do not upload anything illegal or any copyrighted material. For legal reasons, every file create/write is logged with IP address and timestamp. Files are automatically deleted every hour.<br />
<br />
<i>Posted by Christoph</i><br />
<br />
<h2>XtreemFS 1.5 released: Improved support for Hadoop and SSDs</h2>
<i>March 12, 2014</i><br />
<br />
<i>Berlin, Germany.</i> Today, we released a new stable version of the cloud file system XtreemFS.<br />
<br />
XtreemFS 1.5 (codename "Wonderful Waffles") comes with the following major changes:<br />
<br />
<ul>
<li><b>Improved Hadoop Support:</b> Read and write buffers were added to improve the performance for small requests. We also implemented support for multiple volumes, e.g., to store input and output on volumes with different replication policies.</li>
<li><b>SSD support:</b> So far, an OSD was optimized for rotating disks by using a single thread for disk accesses. Solid State Disks (SSDs) cope well with simultaneous requests and show a higher throughput with increased parallelism. To achieve more parallelism per OSD when using SSDs, multiple storage threads are now supported.</li>
<li><b>Multi-Homing Support:</b> XtreemFS can be made available on multiple networks, and clients will pick the correct address automatically.</li>
<li><b>Multiple OSDs per Machine:</b> Machines with multiple disks have to run an OSD for each disk. We simplified this process with the new xtreemfs-osd-farm init.d script.</li>
<li><b>Bugfixes for Read/Write and Read-Only Replication:</b> We fixed a problem that prevented read/write replicated files from failing over correctly. Another problem was that the on-demand read-only replication could hang and access was stalled.</li>
<li><b>Replication Status Page:</b> The DIR status page now visualizes the current replica status of open files. For example, it shows which replica is the current primary, or whether a replica is unavailable.</li>
</ul>
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBHltlJO99PQATLu-pBkbnvwv_8FXFnjdJdMea-iHd4P-xZUXPAtjNjh-Hm6e8ACV-RuLHmc1s0f145u7Vc_FeTpsywhNgwWXheeOWY5vsaL2v8xgZpLYSJl_1YCFDfTHXJZuKfc6NvkHO/s1600/XtreemFS+Replica+Status.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBHltlJO99PQATLu-pBkbnvwv_8FXFnjdJdMea-iHd4P-xZUXPAtjNjh-Hm6e8ACV-RuLHmc1s0f145u7Vc_FeTpsywhNgwWXheeOWY5vsaL2v8xgZpLYSJl_1YCFDfTHXJZuKfc6NvkHO/s1600/XtreemFS+Replica+Status.png" height="351" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Replication Status Page: "osd0" is the backup replica for the open file, "osd1" the primary and "osd2" is currently unavailable.</td></tr>
</tbody></table>
<br />
<b>Tutorial for Read/Write Replication Fail-Over</b><br />
Do you want to see the new replication status page in action? <a href="http://code.google.com/p/xtreemfs/wiki/ContrailSummerSchoolHandsOn2013">We prepared a tutorial that walks you through the setup of a read/write replicated XtreemFS volume on a single machine.</a><br />
<br />
The tutorial lets you stream a video from the volume and simulate the outage of a replica. You will learn about the details of the XtreemFS replication protocol and why the video stalls for some seconds before playback resumes.<br />
<br />
<b>XtreemFS in a Briefcase</b><br />
Our friends at AlmereGrid took the tutorial to the next level: they created a setup of eight Raspberry Pi mini-computers running XtreemFS, packaged in a briefcase! Check their website <a href="http://cloudcase.eu/">CloudCase.eu</a> for more details. Here is their video which shows the briefcase and the demonstrated fail-over:<br />
<br />
<iframe align="middle" allowfullscreen="" frameborder="0" height="281" mozallowfullscreen="" src="//player.vimeo.com/video/85959951" webkitallowfullscreen="" width="500"></iframe> <br />
<a href="http://vimeo.com/85959951">CloudCase - XtreemFS Cloud file system demonstration</a> from <a href="http://vimeo.com/contrailproject">contrail-project</a>.<br />
<br />
<b>Developing for XtreemFS</b><br />
Did you know that you can use XtreemFS directly in your application with our C++ and Java client libraries? This way you avoid any overhead due to FUSE and can access advanced XtreemFS features, e.g. adding replicas, which are otherwise only available through the maintenance tool "xtfsutil".<br />
<br />
From using XtreemFS it is only a small step to diving into the XtreemFS source code itself. We collected several introductory documents for novices in a <a href="https://drive.google.com/folderview?id=0B6CtP7wyBUZlZl85azRqMWdIdDA&usp=sharing">Google Drive folder "XtreemFS Public"</a>. For example, have a look at <a href="https://docs.google.com/document/d/1qyWixK4ajMflRAi2V_pZrPNmNqt2M7PZQYDdm_K_DR4/edit?usp=sharing">how to set up the XtreemFS server Java projects in Eclipse</a>. Have fun!<br />
<br />
<i>Posted by Michael</i><br />
<br />
<h2>Processing an MRC metadata dump with XSLT</h2>
<i>May 17, 2013</i><br />
<i>TL;DR We describe how to dump the metadata of an XtreemFS installation to an XML file. The XML dump is then filtered for files located on a specific OSD using XSLT. You can use this example for your own analyses of your file system's metadata.</i><br />
<br />
At our <a href="http://www.zib.de/PVS">institute</a> we run an XtreemFS installation for scientific users. The installation spans 16 OSDs which are hosted at our site and are regularly accessed by three other institutes throughout Germany. During recent maintenance work we lost all chunks of one OSD by human error: I accidentally deleted all chunks of that OSD because I mistook the directory for a backup whereas it was the last remaining copy. Since the installation is meant for temporary scientific data, we decided against replication and backups at deployment to maximize the available capacity. (Single-disk failures are covered by the underlying RAID5 used on each OSD.)<br />
<br />
<br />
Nonetheless, it was necessary to inform all users about their deleted files. Therefore, I had to find out which files were placed on the affected OSD. XtreemFS stores the list of replicas per file at the MRC (<i>Metadata and Replica Catalog</i>). The MRC allows dumping and restoring the metadata in XML format. To find the affected files, I filtered the XML dump using XSLT. This blog post details the required steps. You can use the provided example to run your own analyses on your file system's metadata.<br />
<br />
<b>Create an MRC database dump</b><br />
You can use the XtreemFS tool <i>xtfs_mrcdbtool</i> to dump or restore the MRC database. The MRC writes and reads the dump locally; therefore, you have to specify where on its machine the MRC should write the dump:<br />
<blockquote class="tr_bq">
xtfs_mrcdbtool -mrc mrc-host.example.com dump /tmp/dump.xml</blockquote>
This command will tell the MRC to write the database dump to the file <i>/tmp/dump.xml</i>. Make sure that the MRC has write permission for the given path. If you configured an "admin_password" for the MRC, you have to set the option <i>--admin_password</i> as well.<br />
<br />
<b>Filter the XML database dump using XSLT</b><br />
The MRC database dump is in XML format. The XML tree in the dump contains the file system tree of each volume.<br />
<br />
You can use XSLT (Extensible Stylesheet Language Transformations) to filter the dump and transform the output into an even more human-readable form. I have added an example file to our code repository: <a href="http://code.google.com/p/xtreemfs/source/browse/contrib/filter-MRC-dump-with-XSLT/filter_files.xslt" target="_blank">filter_files.xslt</a>. You have to use an XSLT processor to transform the original XML dump, for example <a href="http://xmlsoft.org/XSLT/xsltproc2.html" target="_blank">xsltproc</a>:<br />
<blockquote class="tr_bq">
xsltproc -o filtered_files_output.txt filter_files.xslt /tmp/dump.xml</blockquote>
The resulting file <i>filtered_files_output.txt</i> will have the following output format:<br />
<blockquote class="tr_bq">
<i>volume name/path on volume|creation time|file size|file's owner name </i></blockquote>
Modify the filter_files.xslt file to include or exclude other file attributes. This example handles only files that are (at least partially) placed on the OSD with the UUID "zib.mosgrid.osd15". This is realized by the following instruction in the XSLT file, which limits the set of selected "file" elements:<br />
<blockquote class="tr_bq">
<xsl:template match="file[xlocList/xloc/osd/@location='zib.mosgrid.osd15']"></blockquote>
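<div>
For illustration, a reduced template of this kind could look as follows. The attribute names used in the value-of selects are placeholders; take the real element and attribute names from your dump or from filter_files.xslt:</div>
<pre>
<xsl:template match="file[xlocList/xloc/osd/@location='zib.mosgrid.osd15']">
  <!-- attribute names below are placeholders; check your dump -->
  <xsl:value-of select="@name"/><xsl:text>|</xsl:text>
  <xsl:value-of select="@size"/><xsl:text>&#10;</xsl:text>
</xsl:template>
</pre>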
Write your own <a href="http://en.wikipedia.org/wiki/XPath" target="_blank">XPath</a> expression to realize your own filters. If you want all files, just write <i>match="file"</i> without the brackets.<br />
<br />
<i>Posted by Michael</i><br />
<br />
<h2>XtreemFS 1.4 released at Supercomputing 2012</h2>
<i>November 13, 2012</i><br />
<br />
<i>Salt Lake City, Utah.</i> Today we released XtreemFS 1.4, a new stable release of the cloud file system XtreemFS. This release is the result of almost one thousand changes ("commits") to the code repository and extensive testing throughout the year. We worked both on major improvements to the existing code and on new features:<br />
<br />
<ul>
<li><b>Improved stability:</b> Clients and servers are rock solid now. In particular, we fixed client crashes due to network timeouts and issues with the read/write file replication.</li>
</ul>
<ul>
<li><b>Asynchronous writes:</b> Once enabled (mount option "--enable-async-writes"), write() requests are executed in the background. This improves the write throughput without weakening semantics. We recommend enabling async writes (see the example below).</li>
</ul>
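<div>
For example, to mount our public demo volume with asynchronous writes enabled:</div>
<blockquote class="tr_bq">
mount.xtreemfs --enable-async-writes demo.xtreemfs.org/demo ~/xtreemfs_demo</blockquote>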
<ul>
<li><b>Windows Client (beta):</b> A complete rewrite based on the stable C++ libxtreemfs and on the Dokan alternative <a href="http://www.eldos.com/cbfs/">Callback File System</a> by EldoS corporation. <a href="http://code.google.com/p/xtreemfs/downloads/detail?name=xtreemfs_windows_client_1.4.exe">Try it</a> by mounting our public demo server!</li>
</ul>
<ul>
<li><b>Hadoop support:</b> Use XtreemFS as a replacement for HDFS in your Hadoop setup. This version of XtreemFS comes with a rewritten Hadoop client based on the libxtreemfs for Java, which also provides data locality information to Hadoop. A configuration sketch follows below this list.</li>
</ul>
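<div>
A minimal core-site.xml sketch for pointing Hadoop at XtreemFS might look as follows. The property names, the implementation class name, and the DIR address are assumptions on our part; take the exact names from the user guide:</div>
<pre>
<!-- Sketch only: property names, class name, and DIR address are assumptions. -->
<property>
  <name>fs.xtreemfs.impl</name>
  <value>org.xtreemfs.common.clients.hadoop.XtreemFSFileSystem</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>xtreemfs://dir.example.com:32638</value>
</property>
</pre>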
<ul>
<li><b>libxtreemfs for Java:</b> Access XtreemFS directly from your Java application. See the <a href="http://www.xtreemfs.org/userguide.php">user guide</a> for more information.</li>
</ul>
<ul>
<li><b>Vivaldi integration:</b> The Vivaldi replica placement and selection policies enable clients to select close-by replicas based on actual network latencies. These latencies are estimated using virtual network coordinates, which are also visualized in the DIR web interface. Check out the demonstration on the <a href="http://demo.xtreemfs.org:30638/vivaldi">web interface</a> of our public demo server.</li>
</ul>
<ul>
<li><b>Extended OSD Selection:</b> Now you can assign custom attributes to OSDs and limit the placement of files on OSDs based on those attributes.</li>
</ul>
<br />
This version also includes an updated version of the DIR/MRC replication and adds fail-over support for DIR replicas. As the DIR/MRC replication is still at a very early stage, this feature is intended as a technology preview for more experimental users.<br />
<br />
We are currently at the <b>Supercomputing 2012</b> exhibition, where we present XtreemFS at the Contrail booth #2535 as part of the Contrail project. Since the event takes place in <b>Salt Lake City, Utah</b>, we decided on <b>"Salty Sticks"</b> as the release name for the 1.4 version.<br />
<br />
<b>Request for Contributions</b><br />
As XtreemFS is an open source project, we always look forward to external contributions, and we believe that this release serves as an ideal starting point. Here is an incomplete list of things you might be interested in contributing:<br />
<ul>
<li>a Chef recipe or Puppet configuration for automatic deployment</li>
<li>a fancy Qt GUI for the client</li>
<li>an S3-compatible interface based on the client library libxtreemfs</li>
<li>direct integration with Qemu/KVM using the C++ libxtreemfs</li>
</ul>
<br />
<b>XtreemFS Survey</b><br />
Finally, do not forget to fill out <a href="http://xtreemfs.blogspot.com/2012/10/xtreemfs-user-survey.html">our survey</a> if you use, have used, or plan to use XtreemFS.<br />
<br />
<i>Posted by Michael</i><br />
<br />
<h2>XtreemFS User Survey</h2>
<i>October 11, 2012</i><br />
<br />
XtreemFS is free software with an anonymous download, and therefore we only know a fraction of our users. If you are using, have been using, or plan to use XtreemFS, we would love to hear from you!<br />
<br />
To that end, we ask you to fill out this year's XtreemFS survey:<br />
<a href="https://docs.google.com/spreadsheet/viewform?formkey=dFNzSnQyb2VqTXZOSXJhVnlkc1FPQlE6MQ" target="_blank">https://docs.google.com/spreadsheet/viewform?formkey=dFNzSnQyb2VqTXZOSXJhVnlkc1FPQlE6MQ</a><br />
<br />
We know that this will take a few minutes of your time, but your responses will help us tremendously.<br />
<br />
If you feel uncomfortable sharing specific information, just skip the question.
But be assured that your information will not be shared with anyone.<br />
<br />
For any questions or direct feedback, write to <a href="mailto:felix@xtreemfs.org">felix@xtreemfs.org</a>.<br />
<br />
<i>Posted by Unknown</i><br />
<br />
<h2>What is object-based storage (and what it is not)</h2>
<i>July 11, 2012</i><br />
<br />
<i>TL;DR Object-based storage is a term that categorizes the internal architecture of a file system; it is not a particular feature set or interface. While the internal architecture of a file system has many implications for its performance and features, its outer appearance remains that of a file system.</i><br />
<br />
We have often stressed the fact that XtreemFS is an object-based file system. While talking to our users, however, we have realized that this term causes more confusion than enlightenment. I blame this poor choice on our academic ignorance, and I hope I can clear up the confusion a bit. After all, <i>object</i> is not a very descriptive term, and most people associate it with object-oriented programming (totally unrelated) or the objects in Amazon's S3 system (only somewhat related).<br />
<br />
In storage, an <i>object</i> is a variable-sized, but limited container of bytes. You probably wonder why this trivial concept deserves its own term and became relevant to the storage community at all.
Well, this has mostly two aspects: first the name itself, then its main property, namely the fact that it is variable-sized.<br /><br /><b>Blocks, Blocks, Blocks</b><br /><br />While storage hardware keeps a series of bytes, no storage hardware exports byte-level interfaces (disks, tapes, flash, even RAM). The reason is efficiency: addressing single bytes would require many long addresses (metadata overhead), and reading and writing single bytes is inefficient (think checksums, latency, seeking, etc.). The unit that is actually used is the block, a fixed-size container of bytes.<br /><br />File systems organize blocks into larger, variable-sized containers. This is also true for distributed file systems. As many distributed file systems do not run on bare hardware, they can actually choose a certain block size. There is a wide range of file systems where the block size for all files in the file system is fixed. Such a system is not very flexible: you need to choose a block size that fits all, and in turn all your files should have a similar size. There was a saying about Google's GFS (a block-based file system with a 64MB block size): it can hold any set of files, as long as they're large and not too many.<br /><br />There is a second aspect of blocks shared between local and distributed file systems. Blocks are agnostic about files, i.e. a block does not know which file it belongs to. While that's a no-brainer for local file systems, storage servers of block-based distributed file systems are somewhat degraded because they only store anonymous blocks. Only the metadata server knows how the blocks make up files.<br /><br /><b>Here come the objects</b><br /><br />You can imagine the joy in the storage community when systems and standards arrived that allowed choosing a block size per file. This innovation deserved a new term: object. Objects also have a second aspect that makes them great for distributed file system architectures: they raise the abstraction level a bit by making the storage server aware of which file an object belongs to.
Objects are not addressed by a file-agnostic block identifier as blocks are, but by a file identifier and a sequential object number. This has many advantages for the architecture, as storage servers can host file system logic (for replication, for instance) - something they are well equipped for, since they run on commodity hardware.<br /><br />As I hinted earlier: objects are not super-relevant for the user, because all file systems make you work with files (XtreemFS even gives you POSIX files). And Amazon's S3 objects are not the objects we are talking about here, because they are not size-limited. They are rather files without a hierarchical namespace.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-1634327625052330274.post-77967562772526829052011-11-02T15:28:00.011+01:002011-11-03T14:54:20.417+01:00XtreemFS 1.3.1 Available Now!<div class="post-header"> </div>XtreemFS 1.3.1 is available now. It adds some minor features and fixes a wide range of bugs that were reported since the previous stable release.<br /><br />Bug fixes mainly relate to the new client and the replication infrastructure. We included the following features:<br /><ul><li><span style="font-weight: bold;">Metadata replication</span>. Although XtreemFS 1.3.0 already provided some preliminary support for MRC and DIR replication, the feature turned out to be fairly unstable. With XtreemFS 1.3.1, we took a big step forward in this respect. The MRC replication in particular has now been thoroughly tested and offers automatic fail-over on the client side.</li><li><span style="font-weight: bold;">Asynchronous writes</span>. We enhanced the client with preliminary support for asynchronous writes. However, the feature is currently limited to non-replicated and failure-free scenarios, as it has not yet been integrated with the client's internal retry mechanism.</li><li><span style="font-weight: bold;">Monitoring</span>. We added a service monitoring infrastructure. It is based on SNMP and provides information about the internal state of an XtreemFS service, such as the current memory consumption, I/O throughput, the number of stored files, etc. We also added a corresponding Ganglia plug-in.</li><li><span style="font-weight: bold;">OSD drain tool</span>. We included a utility to remove OSDs from an XtreemFS installation. The tool relocates all files from the respective OSD to other OSDs and gracefully shuts down the OSD.</li><li><span style="font-weight: bold;">Gentoo overlay</span>. To simplify the use of XtreemFS for our Gentoo users, we added a Gentoo overlay for installing XtreemFS on Gentoo.</li></ul>For a more detailed overview of the changes, please refer to our <a href="http://code.google.com/p/xtreemfs/source/browse/tags/XtreemFS-1.3.1/CHANGELOG">change log</a>. We also updated the list of known issues and moved them to the <a href="http://code.google.com/p/xtreemfs/issues/list?can=1&q=label%3AKnownLimitations">issue tracker</a>.<br /><br />We further noticed that we caused some confusion with the 1.3.0 release, as it was sometimes referred to as a <span style="font-style: italic;">release candidate</span>. The website as well as the servers and clients themselves always stated "1.3.0 RC1" as the version. 
However, the packaging did not allow us to release a "1.3.0-RC1" version and therefore we ended up publishing "1.3.0" packages. Blogs and news websites also referred to the released version as "1.3.0". So, we'll leave it that way: XtreemFS version 1.3.0 RC1 is regarded as 1.3.0 and now we're releasing the next version, 1.3.1.<br /><br />Since we released XtreemFS 1.3.0 in August, we got quite a lot of feedback on the mailing list - thanks a lot to our user community for consistently helping us to improve XtreemFS!<br /><br />To be able to quickly respond to the needs of our users, we decided to establish a separate repository with unstable packages, which we update frequently. Unstable packages are less thoroughly tested than stable releases, but they allow us to fix bugs and provide new features on short notice. A link to the unstable repository can now be found on our website at <a href="http://www.xtreemfs.org/download.php#unstable">http://www.xtreemfs.org/download.php#unstable</a>.Janhttp://www.blogger.com/profile/07019863053737337624noreply@blogger.com2tag:blogger.com,1999:blog-1634327625052330274.post-64829133030598273162011-08-23T16:54:00.005+02:002011-08-26T14:28:02.249+02:00Public XtreemFS demo server online againDo you want to test XtreemFS without installing the servers first?<br />
<div><br />
</div><div>No problem! Just install the <a href="http://www.xtreemfs.org/download.php">XtreemFS Client 1.3</a> (currently available for Linux and Mac OS X; Windows will follow later) and mount our public XtreemFS demo server:</div><div><br />
</div><div><span class="Apple-style-span" style="color: #333333; font-family: Arial, sans-serif; font-size: 12px; line-height: 19px;"><span style="font-family: 'courier new';"></span><span style="font-family: 'courier new';">mkdir ~/xtreemfs_demo</span> <br />
<span style="font-family: 'courier new';">mount.xtreemfs demo.xtreemfs.org/demo ~/xtreemfs_demo</span> <br />
<span style="font-family: 'courier new';">cd ~/xtreemfs_demo</span></span></div><div><span class="Apple-style-span" style="color: #333333; font-family: Arial, sans-serif; font-size: 12px; line-height: 19px;"><span style="font-family: 'courier new';"> <br />
</span></span></div><div>We placed a copy of the freely available short film "<a href="http://www.bigbuckbunny.org/index.php/download/">Big Buck Bunny</a>" on the demo server which you can watch after mounting. (Please keep in mind that your Internet connection to our demo server has to be fast enough to watch it smoothly.)</div><div><br />
</div><br />
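<div>To check the contents of the volume, and to unmount it when you are done, a minimal sketch (<span style="font-family: 'courier new';">fusermount</span> is the standard FUSE unmount tool on Linux):</div><div><span style="font-family: 'courier new';">ls ~/xtreemfs_demo &nbsp;# list the files on the demo volume<br />
cd ~ &nbsp;# leave the mount point before unmounting<br />
fusermount -u ~/xtreemfs_demo</span></div><div><br />
</div>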
<div>For testing you can create directories and files as you like. Please do not upload anything illegal or any copyrighted material. For legal reasons, every file create/write is logged with the IP address and a timestamp. Files are automatically deleted every hour.</div>Michaelhttp://www.blogger.com/profile/08512619090626443539noreply@blogger.com0tag:blogger.com,1999:blog-1634327625052330274.post-91779343884039561642011-08-11T12:49:00.001+02:002011-08-26T14:25:51.509+02:00Upgrading from XtreemFS 1.2.x to 1.3The good news is that the on-disk formats of the DIR, MRC and OSD haven't changed, which means that you just need to update the packages. The bad news is that 1.2.x and 1.3.0 are protocol-incompatible: you can't mix 1.2.x servers or clients with 1.3.0 servers or clients.<br />
<br />
To upgrade the servers, shut them down, install the 1.3 packages (or update them if you have the XtreemFS repository configured) and start the servers again. For the clients, unmount all mounted volumes and install (or update to) the 1.3.0 packages.<br />
<br />
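As a concrete sketch for a system using our openSUSE Build Service repository (the package and init script names below follow our standard packaging, but treat them as assumptions and adapt them to your distribution and package manager):<br />
<div><span style="font-family: 'Courier New', Courier, monospace;"># on every server machine: stop services, update, restart<br />
/etc/init.d/xtreemfs-dir stop &nbsp;# likewise xtreemfs-mrc and xtreemfs-osd<br />
zypper update xtreemfs-server xtreemfs-tools<br />
/etc/init.d/xtreemfs-dir start<br />
<br />
# on every client machine: unmount, update, remount<br />
fusermount -u /mount/point<br />
zypper update xtreemfs-client<br />
mount.xtreemfs my-dir-host/myvolume /mount/point</span></div><br />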
Most important changes in the user-interface:<br />
<ul><li>mount.xtreemfs has new command line options</li>
<li><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">xtfs_stat</span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">xtfs_repl</span> and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">xtfs_sp</span> have been merged into <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">xtfsutil</span></li>
<li><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">xtfsutil</span> works on files on mounted volumes, not on URLs</li>
<li>you can use <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">xtfsutil --errors /mount/point</span> to retrieve client error messages</li>
</ul>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-1634327625052330274.post-15988093381310293222011-08-10T15:28:00.000+02:002011-08-10T15:28:56.629+02:00First Release Candidate for XtreemFS 1.3After nearly one year of development, we have finished the first release candidate for XtreemFS 1.3.<br />
The most important new feature is cross-site file replication with automatic fail-over. The new replication works on mutable files, i.e. replicated files can be both read and written.<br />
<br />
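If you want to try the new replication on a mounted volume, here is a minimal sketch using xtfsutil. The option names follow our current documentation and may still change before the final release, and the paths are made up - check <span style="font-family: 'Courier New', Courier, monospace;">xtfsutil --help</span> for the authoritative list:<br />
<div><span style="font-family: 'Courier New', Courier, monospace;"># make new files on the volume read/write-replicated with a quorum policy<br />
xtfsutil --set-drp --replication-policy WqRq --replication-factor 2 /mnt/myvolume<br />
# add a replica to an existing file; AUTO lets XtreemFS pick a suitable OSD<br />
xtfsutil --add-replica AUTO /mnt/myvolume/some_file</span></div><br />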
We have created packages for Linux in <a href="http://download.opensuse.org/repositories/home:/xtreemfs/">our repositories on build.opensuse.org</a>, they should become available within a few hours. For Mac OS X, we have a <a href="http://code.google.com/p/xtreemfs/downloads/detail?name=XtreemFS_1.3RC1_Installer.dmg">packaged client with installer</a>. The sources can be downloaded at <a href="http://code.google.com/p/xtreemfs/downloads/list">http://code.google.com/p/xtreemfs/downloads/list</a>.Unknownnoreply@blogger.com7tag:blogger.com,1999:blog-1634327625052330274.post-702560054282220342011-02-17T16:09:00.000+01:002011-02-17T16:09:50.660+01:00Moving towards 1.3As you might have noticed, we are currently a full year behind schedule with the 1.3 release. The good news is that we are working heavily on the new release.<br />
<br />
We now have a new client, written from scratch, that implements important features such as replica fail-over and metadata caching. The new client already passes our complete test suite (see <a href="http://groups.google.com/group/xtreemfs-test">http://groups.google.com/group/xtreemfs-test</a>). Currently, we are cleaning up the code, working on a libxtreemfs, porting the client to Windows, and doing a lot of manual testing for the release.<br />
<br />
Most of the changes we have worked on since last year are invisible to the user. First of all, we have switched the internal protocol to a custom RPC format which is optimized for transfer of raw data (the file content) and uses Google protocol buffers for message encoding. For this new RPC protocol, we have re-written and optimized the client and server infrastructure. The new protocol is up to 2x faster when transferring objects. As a user, you'll notice that the URLs now start with "pbrpc://" instead of "oncrpc://".<br />
<br />
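For example, mounting a volume in 1.3 will look roughly like this (host and volume names are made up; 32638 is the default DIR port):<br />
<div><span style="font-family: 'Courier New', Courier, monospace;">mount.xtreemfs pbrpc://dir.example.com:32638/myvolume /mnt/xtreemfs</span></div><br />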
Over the next few weeks, we'll write short posts on the things we have been working on, e.g. the PBRPC protocol, the new client features and internals, read/write replication for files, MRC replication ...<br />
Of course, we'll keep you updated on the 1.3 release.<br />
<br />
For anyone who wants to take a first look at XtreemFS 1.3, we have set up a repository with unstable packages: <a href="http://download.opensuse.org/repositories/home:/xtreemfs:/unstable/">http://download.opensuse.org/repositories/home:/xtreemfs:/unstable/</a><br />
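To add the repository on openSUSE, something like the following should work (the exact subdirectory depends on your distribution version; the openSUSE_11.2 path here is only an example - browse the repository URL above to find yours):<br />
<div><span style="font-family: 'Courier New', Courier, monospace;">zypper ar http://download.opensuse.org/repositories/home:/xtreemfs:/unstable/openSUSE_11.2/ xtreemfs-unstable<br />
zypper refresh</span></div><br />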
Please be aware that these packages are experimental and may be changed or updated without prior notice.Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-1634327625052330274.post-67539410474168214512010-08-02T17:58:00.004+02:002010-09-20T12:55:14.327+02:00Want to work on XtreemFS?We have positions for PhD students at the <a href="http://www.zib.de/">Zuse Institute Berlin</a> (Germany), where you have the opportunity to work on XtreemFS within the <a href="http://www.contrail-project.eu/">CONTRAIL</a> project. You can email us (kolbeck<img border="0" src="http://www.zib.de/Bilder/klammeraffe.gif" />zib.de) or call us (+49-30-841-85-328) for more information. The deadline is October 15th.<br />
<br />
Here is the official job description:<br />
<br />
<br />
The Zuse Institute Berlin (ZIB) is a non-university research institute under public law of the state of Berlin. In close interdisciplinary co-operation with the Berlin universities as well as national and international scientific institutions, ZIB conducts research and development in the fields of information technology, applied mathematics, and computer science.<br /><br />To support research and development efforts in EU- and BMBF-funded projects, the department Parallel and Distributed Systems invites applications for several PhD Student or PostDoc Positions (f/m) for the duration of two years (Vgr. IIa/Ib BAT/Anwendungs-TV Land Berlin, application code WA 22/10).<br /><br />As a research assistant you will explore, design, implement and evaluate scalable, fault-tolerant and distributed algorithms and systems for processing large-scale scientific data. We have developed a range of systems including: Scalaris, a structured peer-to-peer storage system; XtreemFS, a distributed and replicated file system; and BabuDB, a replicated key-value store. In co-operation with partners from science and industry we validate, extend, and optimize our solutions in production environments. <div><br />
</div><div>Requirements<br />
<ul><li>Master's degree or Diploma in computer science </li>
<li>Solid fundamentals in distributed systems and algorithms </li>
<li>Experience with distributed file systems, databases or peer-to-peer technology </li>
<li>Demonstrated coding skills in C++, Java or Erlang </li>
<li>Familiarity with Unix/Linux </li>
<li>Ability to work in interdisciplinary and international teams </li>
<li>Fluency in English </li>
</ul><br />
You will work in an inspiring and pleasant environment and will receive adequate professional support. We offer challenging scientific tasks, a high degree of autonomy, and state-of-the-art technical infrastructure. You will have the opportunity to pursue a PhD or Habilitation supervised by Prof. Reinefeld. The position will initially be financed for a period of two years, with the possibility of extension. The salary is based upon wage group IIa/Ib as per the Berlin Collective Agreement for the Public Sector. Zuse Institute Berlin is an equal opportunity employer. We prefer to balance the number of female and male employees in our institute; thus, we kindly encourage female candidates to apply for this position. Handicapped persons will be given preference over other equally qualified candidates.<br /><br />Please send your complete application, referring to application code WA 22/10, including cover letter, CV and relevant certificates/GPA/university transcripts, by October 15, 2010 to Konrad-Zuse-Zentrum fuer Informationstechnik Berlin (ZIB), - Verwaltung -, Takustr. 7, 14195 Berlin, Germany.<br />
<br />
<br />
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-1634327625052330274.post-34232417137741564932010-06-07T12:18:00.000+02:002010-06-07T12:18:24.359+02:00XtreemFS at LinuxTag 2010We'll be at LinuxTag 2010 in Berlin, which starts in two days (June 9th to 12th). Visit us at the XtreemOS booth #206 in Halle 7.2a.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-1634327625052330274.post-16202889766581279922010-05-26T14:01:00.001+02:002010-05-27T10:50:44.655+02:00ISC, Summer School ...You can meet us at ISC 2010 in Hamburg from 31/05 to 02/06 at the XtreemOS booth (booth #121, next to Unicore and BSC).<br />
<br />
The <a href="http://xtreemos.eu/hotspot_news/register-for-our-summer-school-2010">XtreemOS summer school</a> will take place at Schloss Günzburg near Ulm from the 5th to the 9th of July. There is also a talk and a practical on XtreemFS.<br />
<br />
Finally, the <a href="http://xtreemos.eu/hotspot_news/xtreemos-computing-challenge">XtreemOS Challenge</a> offers a prize of €1,000 for the best application ported to XtreemOS.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-1634327625052330274.post-12813423662581290252010-05-07T20:00:00.011+02:002010-05-07T23:24:51.782+02:00How XtreemFS uses leases for replicationIf you implement a distributed file system like <b>XtreemFS</b>, you are dealing with many interesting problems. The most central one is probably to make files behave as if they were stored in a local file system.<br /><br />The main property of this sought-after behavior is called strong or <b>sequential consistency</b> [*]. Sequential consistency requires that reads and writes (even concurrent ones) are executed in a well-defined (but arbitrary) order. Apart from sequential consistency, the file system must also ensure that reads reflect all previous writes and that concurrent reads and writes are isolated.<br /><br />As XtreemFS supports fault-tolerant file replication, it has to maintain the local behavior of files while internally storing and coordinating multiple physically independent copies of the data. In technical terms, this translates to implementing sequential consistency for replicated data in a fault-tolerant way.<br /><br />The simplest way to implement sequential consistency is to use a central instance that defines the order in which operations change the data (a <b>sequencer</b>). And indeed, many distributed file systems realize sequential consistency by establishing a lock server that hands out locks to clients or storage servers. The lock holder receives all operations on the data and executes them serially. The result is a well-defined order. These locks can be made fault-tolerant by attaching a timeout to them: what you get is a <b>lease</b>, which can be revoked even when a client is unresponsive or dead.<br /><br />A sophisticated alternative to defining a sequencer is to use a so-called <b>replicated state machine</b>, a distributed algorithm that is executed by all replicas of the data. If you want to implement a fault-tolerant version of it, you will end up using a Paxos derivative. The problem is that all fault-tolerant algorithms in this domain require two round-trips for both reads and writes to the data to establish sequential consistency across all replicas, which is clearly too expensive for high-rate operations like the ones on files.<br /><br />So we are left with the central-instance approach, fully aware that it introduces both a single point of failure and a performance bottleneck. A quick back-of-the-envelope calculation reveals: assuming 50,000 open files in your cluster and a 30-second lease timeout, you get about 1,666 lease renewals per second, which is already quite some load for a lock server.<br /><br />Such a high lease renewal rate is even more of a problem when you consider the fault-tolerance of the lock server itself. To ensure that it is <b>highly available</b>, you need to replicate its state, and you are again faced with a sequential consistency + replication problem. The solutions: master-slave replication with some fail-over protocol (another lease problem?) or a replicated state machine for the lease server itself. The latter has been chosen by Google for their Chubby lock service. The replication of the lock server's state solves the availability problem, but worsens the performance bottleneck. Google's "Paxos Made Live" paper cites 640 ops/sec [3]. Not enough for 50k open files (although Chubby is not used for GFS' leases).<br /><br />For <b>XtreemFS</b>, we have chosen a different approach. Instead of using a lock service to manage leases, we let the object storage devices (OSDs) negotiate leases among themselves. For each file, all OSDs that host a replica of the file negotiate the lease. The lease-holding OSD acts as the sequencer and receives and executes all reads and writes. In turn, an OSD participates in the lease negotiations for all of its file replicas that are currently open.<br /><br />Negotiating leases directly on the storage servers has several advantages: it scales naturally with the number of storage servers, it saved us from implementing a lock server, and it saves the user from the headache of provisioning and managing yet another service. The only problem we needed to overcome was that we first needed an algorithm that negotiates leases in a fault-tolerant, decentralized way. Such a thing didn't exist, and judging from a <a href="http://pl.atyp.us/wordpress/?p=2729">recent blog post</a> by Jeff Darcy, the usage of fault-tolerant leases still seems to be in its infancy.<br /><br />Our efforts resulted in two algorithms, <b>FatLease</b> [1] and its successor <b>Flease</b> [2]. They scale to thousands of concurrent lease negotiations per second - for each set of participants. For XtreemFS this essentially means that the number of open files is counted per storage server and not against the whole file system. With thousands of negotiations per second, this translates to an open file count of more than 50k files per OSD.<br /><br />With a fault-tolerant lease negotiation algorithm, we have solved the problem of enforcing sequential consistency and arbitrating concurrent operations. While this is the hardest part of implementing replication, the data replicas also need to be updated for every change. How this is done in XtreemFS will be the topic of a future blog post.<br /><br />[*] POSIX actually mandates a stronger consistency model: serializability. In simple terms, it means that the file system has to take communication between its clients into account. However, this is impractical for distributed file systems, as the file system would have to control all communication channels of its clients.<br /><br />References:<br />[1] F. Hupfeld, B. Kolbeck, J. Stender, M. Högqvist, T. Cortes, J. Malo, J. Marti. "<a href="http://www.xtreemfs.org/publications.php">FaTLease: Scalable Fault-Tolerant Lease Negotiation with Paxos</a>". In: <i>Cluster Computing</i>, 2009.<br /><br />[2] B. Kolbeck, M. Högqvist, J. Stender, F. Hupfeld. "<a href="http://www.xtreemfs.org/publications.php">Fault-Tolerant and Decentralized Lease Coordination in Distributed Systems</a>". <i>Technical Report 10-02, Zuse Institute Berlin, 2010</i>.<br /><br />[3] Tushar Chandra, Robert Griesemer, and Joshua Redstone. "<a href="http://labs.google.com/papers/paxos_made_live.html" title="Paxos made live">Paxos made live</a>". In: <i>PODC '07: 26th ACM Symposium on Principles of Distributed Computing</i>.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-1634327625052330274.post-91195345142856343952010-04-06T11:26:00.002+02:002010-04-06T11:26:15.710+02:00student position in BerlinWe have a student position for XtreemFS at ZIB in Berlin: <a href="http://www.zib.de/News/Jobs/index.en.html">http://www.zib.de/News/Jobs/index.en.html</a> (in German)Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-1634327625052330274.post-45372039605623239912010-03-05T15:08:00.000+01:002010-03-05T15:08:21.171+01:00XtreemFS user surveyYou can help us make XtreemFS better. Let us know what you use XtreemFS for and which features you need. Fill out our user survey at <a href="http://www.xtreemfs.org/user_survey.php">http://www.xtreemfs.org/user_survey.php</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-1634327625052330274.post-1423993420885209272010-02-04T10:31:00.002+01:002010-02-04T10:31:56.882+01:00XtreemFS update 1.2.1We just released an update for XtreemFS (version 1.2.1). This version contains mainly bug fixes, e.g. for FreeBSD and Fedora 12, and enhanced replica management. The new scrubber will automatically replace failed replicas.<br />
<br />
Source code and packages are available for download on <a href="http://www.xtreemfs.org/">http://www.xtreemfs.org</a>.<br />
<br />
There is no change in the database or OSD storage, so an update from 1.2 should work out of the box. The usual advice: back up important data before upgrading.<br />
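A minimal sketch of such a backup (the data location /var/lib/xtreemfs and the init script names below are assumptions based on our standard packaging - adjust them to your configuration and use your distribution's package manager):<br />
<div><span style="font-family: 'courier new';"># stop the services before copying their data<br />
/etc/init.d/xtreemfs-mrc stop &nbsp;# likewise xtreemfs-dir and xtreemfs-osd<br />
tar czf xtreemfs-backup-$(date +%F).tar.gz /var/lib/xtreemfs/<br />
# then update the packages and restart the services</span></div>Unknownnoreply@blogger.com1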