Wednesday, December 9, 2009

A Hadoop file system driver for XtreemFS

We have started developing a FileSystem driver for Hadoop. The current version implements the basic functionality and supports parallel I/O for striped files, but it does not yet support advanced features such as retrieving block locations. The driver is still at an early stage of development!

If you want to try it anyway, the source is in trunk/contrib/hadoop. To build it, you need the jar from the Hadoop 0.20.1 release and the XtreemFS trunk (or release 1.2). Make sure to adjust the path to the hadoop.jar in nbproject/project.properties. To use XtreemFS with Hadoop, you need to put three jar files on the classpath: HadoopClient.jar, XtreemFS.jar, and yidl.jar. You also have to register XtreemFS in your Hadoop config:
<property>
  <name>fs.xtreemfs.impl</name>
  <value>org.xtreemfs.common.clients.hadoop.XtreemFSFileSystem</value>
  <description>The FileSystem for xtreemfs: uris.</description>
</property>
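One way to make the three jars visible to Hadoop is the HADOOP_CLASSPATH environment variable, which bin/hadoop picks up. A minimal sketch, assuming the jars were built to a hypothetical /path/to directory (adjust to your setup):

```shell
# Hypothetical install locations; point these at your actual build output.
XTREEMFS_JARS="/path/to/HadoopClient.jar:/path/to/XtreemFS.jar:/path/to/yidl.jar"
# Prepend the jars to Hadoop's JVM classpath.
export HADOOP_CLASSPATH="$XTREEMFS_JARS"
```

Alternatively, the same export can go into conf/hadoop-env.sh so that it applies to every Hadoop invocation.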

The URIs for XtreemFS follow this scheme: xtreemfs://volumeName@dirServiceHost:32638/path
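As a sketch of how such a URI decomposes, note that the volume name rides in the user-info slot of the URI authority, while the host and port address the DIR service. The volume, host, and path below are made-up examples:

```java
import java.net.URI;

public class XtreemFSUriDemo {
    public static void main(String[] args) {
        // Hypothetical volume and DIR service host, following the scheme above.
        URI uri = URI.create("xtreemfs://myVolume@dir.example.com:32638/data/input.txt");
        System.out.println("scheme: " + uri.getScheme());   // xtreemfs
        System.out.println("volume: " + uri.getUserInfo()); // myVolume (user-info part)
        System.out.println("host:   " + uri.getHost());     // dir.example.com
        System.out.println("port:   " + uri.getPort());     // 32638
        System.out.println("path:   " + uri.getPath());     // /data/input.txt
    }
}
```

Hadoop resolves the xtreemfs scheme to the driver class via the fs.xtreemfs.impl property shown above.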

The user ID and group ID can be set via the config options xtreemfs.client.userid and xtreemfs.client.groupid.
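For example, these two options can go into the same config file as the property above; the values shown here are placeholders:

<property>
  <name>xtreemfs.client.userid</name>
  <value>hadoopuser</value>
</property>
<property>
  <name>xtreemfs.client.groupid</name>
  <value>hadoopgroup</value>
</property>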
