The XtreemFS client is implemented as a FUSE driver, so the throughput of FUSE itself could be a limiting factor for the overall performance of our file system. Matthias implemented a minimal "emptyfs" FUSE driver which discards all data written to it. I used this driver to measure the bandwidth from an application through the VFS layer and FUSE to the user-level process. The machine I ran the test on has two CPUs with four cores each (Xeon E5420 @ 2.5GHz) and 16GB RAM. I used dd to transfer 2GB of data with block sizes from 4k to 64MB.
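For readers who want to repeat a similar measurement without dd, here is a minimal Python sketch of the same kind of sweep. It is not the script used for the numbers in this post: the function names and the mount path in the usage note are hypothetical, and it assumes throughput is reported in decimal MB/s (10^6 bytes per second).

```python
import os
import time


def measure_write_throughput(path, block_size, total_bytes):
    """Write total_bytes to path in block_size chunks and return MB/s.

    Assumption: 1 MB = 10**6 bytes. Only whole blocks are written.
    """
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(blocks):
            written = 0
            # os.write may write fewer bytes than requested, so loop.
            while written < block_size:
                written += os.write(fd, block[written:])
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero interval on very small runs.
    return (blocks * block_size) / max(elapsed, 1e-9) / 1e6


def sweep(path, total_bytes=2 * 1024**3):
    """Measure throughput for block sizes 4 KiB, 8 KiB, ..., 64 MiB."""
    results = {}
    for i in range(15):  # 4096 << 14 == 64 MiB
        block_size = 4096 << i
        results[block_size] = measure_write_throughput(
            path, block_size, total_bytes
        )
    return results
```

Calling `sweep("/mnt/emptyfs/testfile")` (a hypothetical mount point for the emptyfs driver) returns a dictionary mapping block size in bytes to MB/s. Python adds its own per-call overhead, so absolute numbers will likely be lower than what dd reports for small blocks.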
The results are plotted in two graphs. The first graph shows the throughput in MB/s as reported by dd. The second graph shows the CPU usage (sy = system, us = user) and the number of context switches.
graph 1 (write bandwidth in MB/s):
graph 2 (CPU usage, context switches):
With these results (2GB/s for blocks of 128k or larger), it is easy to see that FUSE is not the limiting factor for us. But they also show that FUSE without the direct_io option has real performance problems, because all write requests are split into 4k writes. So you have to choose between performance and the ability to execute files (mmap does not work when direct_io is enabled, see this FUSE mailing list entry).
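To get a feeling for what the 4k splitting means, the following back-of-the-envelope sketch counts how many write requests reach the user-level process for the 2GB transfer. It assumes that 2GB means 2 GiB and that every request is passed through at exactly the given size, which is a simplification.

```python
def request_count(total_bytes, request_size):
    """Number of requests needed to move total_bytes (ceiling division)."""
    return -(-total_bytes // request_size)


TOTAL = 2 * 1024**3  # assumption: 2GB transfer taken as 2 GiB

split_4k = request_count(TOTAL, 4 * 1024)      # writes split into 4k
whole_128k = request_count(TOTAL, 128 * 1024)  # 128k requests passed as-is

print(split_4k)                # 524288
print(whole_128k)              # 16384
print(split_4k // whole_128k)  # 32
```

Under these assumptions, the 4k splitting means 32 times as many requests crossing from the kernel to the FUSE process as with 128k requests.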