Thursday, October 30, 2014

XtreemFS in Docker Containers

Recently, we ran the XtreemFS services in Docker containers for one of our current research projects, and we would like to share our experiences. Docker is a container-based virtualization solution that provides a certain level of isolation between applications running on the same machine.

Docker images are generated from a Dockerfile. A Dockerfile contains some metadata and a sequence of instructions that are executed to build the image. Container images are derived from a base image, e.g. a standard Ubuntu Linux image, and store only the changes made to this base image. As all XtreemFS services (DIR, MRC, and OSD) are shipped in a common binary file (XtreemFS.jar), we created an xtreemfs-common image that contains the binaries, plus service-specific images that inherit from the common image. The service-specific images (xtreemfs-dir, xtreemfs-mrc, and xtreemfs-osd) contain only the service-specific call that starts the respective service.
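To illustrate the inheritance, the following Python sketch renders the Dockerfiles of the three service-specific images. Each one consists of a FROM line naming xtreemfs-common and a single CMD line; everything else is inherited from the common image. The jar path /usr/share/java/XtreemFS.jar and the Java main classes are assumptions for illustration, and a real start command will likely need additional jars on the classpath and JVM options taken from the init scripts.

```python
import json

# Image names follow the post; the jar path and main classes are assumptions.
COMMON_IMAGE = "xtreemfs-common"
JAR = "/usr/share/java/XtreemFS.jar"  # hypothetical location in the common image

# service image -> (assumed Java main class, config file expected in /xtreemfs_data)
SERVICES = {
    "xtreemfs-dir": ("org.xtreemfs.dir.DIR", "dirconfig.properties"),
    "xtreemfs-mrc": ("org.xtreemfs.mrc.MRC", "mrcconfig.properties"),
    "xtreemfs-osd": ("org.xtreemfs.osd.OSD", "osdconfig.properties"),
}


def service_dockerfile(image: str) -> str:
    """Return the Dockerfile of a service image: inherit everything, add one CMD."""
    main_class, config = SERVICES[image]
    # Simplified classpath (assumption); copy the real one from the init scripts.
    cmd = ["java", "-cp", JAR, main_class, f"/xtreemfs_data/{config}"]
    # Exec form (a JSON array) runs java directly as the container's main
    # process, without a wrapping shell.
    return f"FROM {COMMON_IMAGE}\nCMD {json.dumps(cmd)}\n"


for image in SERVICES:
    print(f"# {image}/Dockerfile")
    print(service_dockerfile(image))
```

Since the service images differ only in their CMD line, rebuilding xtreemfs-common with a newer XtreemFS.jar is enough to update all three services.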

An application running in a Docker container has to stay in the foreground for the lifetime of the container; otherwise, the container terminates. For XtreemFS, this means that we cannot use our service-specific init scripts to start the DIRs, MRCs, and OSDs, since they start the services in the background. Instead, we extracted the relevant parts from the init scripts and turned them into a CMD instruction, i.e. the command that is executed when a container starts. As the XtreemFS logs are now written directly to stdout instead of to a file, one can simply use the docker logs command to check what happens inside a container.
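Our images put the start command directly into CMD. Purely to illustrate the two properties Docker relies on here, the following Python sketch is a hypothetical foreground wrapper: it blocks until the service process exits, and it forwards the service's combined stdout and stderr line by line, flushing each line so that docker logs sees it immediately. The function run_foreground and the DIR command line are hypothetical, and the classpath is simplified.

```python
import subprocess
import sys
from typing import Callable, List


def _emit_stdout(line: str) -> None:
    # Flush every line: inside a container stdout is usually a pipe, not a
    # terminal, so output would otherwise be buffered.
    sys.stdout.write(line + "\n")
    sys.stdout.flush()


def run_foreground(argv: List[str], emit: Callable[[str], None] = _emit_stdout) -> int:
    """Run argv, forward its output line by line, and return its exit code.

    The call does not return before the child terminates, so a container whose
    main process is this wrapper lives exactly as long as the service does.
    """
    with subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same log stream
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            emit(line.rstrip("\n"))
        return proc.wait()


# Hypothetical start command for a DIR (not executed here).
DIR_ARGV = ["java", "-cp", "/usr/share/java/XtreemFS.jar",
            "org.xtreemfs.dir.DIR", "/xtreemfs_data/dirconfig.properties"]
```

Returning the child's exit code lets the container's exit status reflect whether the service terminated cleanly.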

A critical part of running a distributed file system in containers is ensuring that all file system contents are stored persistently, even beyond the lifetime of a container. Our Dockerfiles use Docker volumes for this purpose. A volume is simply a directory that is mapped from the host machine into the container. The CMD instruction of our containers expects the service configuration in /xtreemfs_data, which therefore has to be mapped into the container as a volume. Besides the configuration file, this volume can also hold the file system contents, although any other location works as well.
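Before starting a container, it can help to check the host directory that will be mounted at /xtreemfs_data. The following Python sketch verifies that the expected configuration file exists and creates a subdirectory for persistent contents next to it. The helper prepare_data_dir and the data subdirectory are hypothetical layout choices; the returned container-side path could then be referenced from the configuration, e.g. as the OSD's object_dir setting, assuming that key in your configuration file.

```python
from pathlib import Path

# Config file names come from the post; the "data" subdirectory is a
# hypothetical layout choice, not something the images require.
CONFIG_NAMES = {
    "dir": "dirconfig.properties",
    "mrc": "mrcconfig.properties",
    "osd": "osdconfig.properties",
}
CONTAINER_DATA_DIR = "/xtreemfs_data"


def prepare_data_dir(host_dir: str, service: str, data_subdir: str = "data") -> str:
    """Check the host directory that will be mounted at /xtreemfs_data.

    Returns the container-side path of the persistent data directory.
    """
    root = Path(host_dir)
    config = root / CONFIG_NAMES[service]
    if not config.is_file():
        raise FileNotFoundError(
            f"{config} is missing; the container expects it in {CONTAINER_DATA_DIR}")
    # Contents stored here live on the host and survive the container.
    (root / data_subdir).mkdir(exist_ok=True)
    return f"{CONTAINER_DATA_DIR}/{data_subdir}"
```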

Mapping the XtreemFS configuration files into a container via a volume also has the advantage that our Docker images are generic and reusable. As users specify the volumes and ports to be mapped when starting a container, one can create arbitrary XtreemFS service configuration files, named dirconfig.properties, mrcconfig.properties, or osdconfig.properties, and map all affected directories and ports at container start time.
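The following Python sketch assembles a docker run command line for a given service from a host configuration directory and a port mapping, using only the standard docker run options -d, --name, -v, and -p. The helper docker_run_args and the example host paths are hypothetical, and the default ports (32638/30638 for the DIR, 32636/30636 for the MRC, 32640/30640 for the OSD) are the XtreemFS defaults as we recall them; check listen.port and http_port in your own configuration files.

```python
import shlex
from typing import Dict, List, Optional

# Assumed default ports per service (listen port, HTTP status page port).
DEFAULT_PORTS = {
    "dir": [32638, 30638],
    "mrc": [32636, 30636],
    "osd": [32640, 30640],
}


def docker_run_args(service: str, host_dir: str, name: str,
                    ports: Optional[Dict[int, int]] = None) -> List[str]:
    """Build 'docker run' arguments; ports maps host port -> container port."""
    args = ["docker", "run", "-d", "--name", name,
            "-v", f"{host_dir}:/xtreemfs_data"]  # config (and data) volume
    mapping = ports if ports is not None else {p: p for p in DEFAULT_PORTS[service]}
    for host_port, container_port in mapping.items():
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(f"xtreemfs-{service}")  # image name from the post
    return args


print(shlex.join(docker_run_args("dir", "/srv/xtreemfs/dir", "xtreemfs-dir")))
```

We would expect a service to advertise its configured listen.port to the DIR, so mapping a different host port probably also requires adjusting the configuration; keeping host and container ports identical avoids this.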

After a network port is mapped to a container, the underlying service is reachable via the IP address of the host. The XtreemFS services register themselves at the directory service (DIR) and advertise their own addresses. When running in a container, however, a service does not know the host address under which it is reachable: each container only knows its own address in an internal virtual network, and that is the address it would register. We can work around this problem by setting the hostname parameter in the MRC and OSD configurations to the public address or name of the host. We have previously used the same workaround to run services that are reachable via a NAT.
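The hostname workaround can be applied to a configuration file before the container is started. The following Python sketch (set_hostname is a hypothetical helper) replaces an existing hostname line, or appends one, in the text of an MRC or OSD properties file. It handles the "key = value" and "key: value" forms and leaves commented-out lines alone; the public name used below is a placeholder.

```python
import re

# Matches an active "hostname = ..." or "hostname: ..." line, not "#hostname".
_HOSTNAME_LINE = re.compile(r"^[ \t]*hostname[ \t]*[=:].*$", re.MULTILINE)


def set_hostname(properties: str, public_address: str) -> str:
    """Return properties text whose hostname is the address clients should use."""
    line = f"hostname = {public_address}"
    if _HOSTNAME_LINE.search(properties):
        # A function replacement avoids re.sub interpreting backslashes.
        return _HOSTNAME_LINE.sub(lambda _m: line, properties, count=1)
    sep = "" if properties.endswith("\n") or not properties else "\n"
    return f"{properties}{sep}{line}\n"


# Example with a placeholder public name:
print(set_hostname("listen.port = 32640\n", "osd1.example.org"), end="")
```

Applied to the osdconfig.properties or mrcconfig.properties file in the mapped volume, this makes the service register the host's public name at the DIR instead of its internal container address.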

We provide the described Dockerfiles on GitHub. The repository contains a README file with usage instructions. We may consider publishing them in the Docker index after further testing and evaluation. The images are currently derived from an Ubuntu base image and use the latest XtreemFS version from our Git repository. The Dockerfiles can easily be adapted to other Linux distributions or XtreemFS releases. We would be happy to receive any feedback.