The results are plotted in two graphs. The first graph shows the throughput in MB/s as reported by dd. The second graph shows the CPU usage (sy = system, us = user) and the number of context switches.
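For readers who want to reproduce this kind of measurement, here is a minimal Python sketch of a dd-style write benchmark. It is not the setup used for the graphs. It assumes a plain sequential write of zero-filled blocks, roughly what `dd if=/dev/zero of=<file> bs=<block_size> count=<count>` does. The function name `measure_write_throughput` and its `sync` flag are hypothetical; `sync=True` loosely mirrors dd's `conv=fsync`. Throughput is reported in decimal MB/s (10^6 bytes), the same unit GNU dd prints.

```python
import os
import time


def measure_write_throughput(path: str, block_size: int, count: int,
                             sync: bool = False) -> float:
    """Hypothetical helper: write `count` zero blocks of `block_size` bytes
    to `path` and return the throughput in MB/s (1 MB = 10**6 bytes)."""
    view = memoryview(bytes(block_size))  # one zero-filled block, like /dev/zero
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            written = 0
            # os.write may write fewer bytes than requested, so loop per block
            while written < block_size:
                written += os.write(fd, view[written:])
        if sync:
            os.fsync(fd)  # assumption: rough analogue of dd conv=fsync
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return block_size * count / 1e6 / elapsed


if __name__ == "__main__":
    # Sweep block sizes the way the graphs do; the path is a placeholder
    # and should point into the FUSE mount under test.
    for bs in (4 * 1024, 32 * 1024, 128 * 1024, 1024 * 1024):
        mbps = measure_write_throughput("/tmp/fuse-test.bin", bs, (256 * 1024 * 1024) // bs)
        print(f"bs={bs // 1024}k: {mbps:.0f} MB/s")
```

Each block is issued as a single `write()` call from user space. What happens to that call below the FUSE mount is what the graphs measure.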
Graph 1: write bandwidth in MB/s

Graph 2: CPU usage and context switches

With these results (about 2 GB/s for 128k or larger blocks), it is easy to see that FUSE is not the limiting factor for us. But they also show that FUSE without the direct_io option has real performance problems, because all write requests are split into 4k writes. So you have to choose between performance and the ability to execute files: executing a file requires mmap, and mmap does not work when direct_io is enabled (see this FUSE mailing list entry).
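A back-of-the-envelope model shows why the 4k split hurts. Each write request that reaches the FUSE daemon costs a round trip between kernel and user space, which is consistent with the context-switch counts in graph 2. The sketch below simply counts requests. It assumes writes are split into 4 KiB pieces without direct_io, and into pieces of at most `max_write` bytes with direct_io. The 128 KiB default for `max_write` is an assumed value for illustration, not a measured one. The function name is hypothetical.

```python
import math

PAGE_SIZE = 4096                 # split size without direct_io, as observed above
ASSUMED_MAX_WRITE = 128 * 1024   # assumed per-request cap with direct_io


def fuse_write_requests(total_bytes: int, block_size: int, direct_io: bool,
                        max_write: int = ASSUMED_MAX_WRITE) -> int:
    """Hypothetical model: number of write requests the FUSE daemon receives
    when `total_bytes` are written in `block_size` chunks (as dd does)."""
    limit = max_write if direct_io else PAGE_SIZE
    chunk = min(block_size, limit)          # size of each request sent to the daemon
    blocks = math.ceil(total_bytes / block_size)
    return blocks * math.ceil(block_size / chunk)


if __name__ == "__main__":
    one_gib = 2 ** 30
    for bs in (4 * 1024, 128 * 1024, 1024 * 1024):
        plain = fuse_write_requests(one_gib, bs, direct_io=False)
        direct = fuse_write_requests(one_gib, bs, direct_io=True)
        print(f"bs={bs // 1024}k: {plain} requests without direct_io, {direct} with")
```

Under these assumptions, writing 1 GiB with 128k blocks produces 262144 requests without direct_io but only 8192 with it, 32 times fewer round trips. With 4k blocks the two cases are identical, which matches the observation that the difference only shows up for larger blocks.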