Programming odds and ends — InfiniBand, RDMA, and low-latency networking for now.


gcsfs, a FUSE driver for Google Cloud Storage, now available

gcsfs is a FUSE driver for Google Cloud Storage with much the same functionality as s3fuse. It isn’t quite a fork (for the moment, both drivers are very similar), but keeping it as a separate driver makes it easier to develop Google Cloud Storage-specific features in the future. Some key features:

  • Binaries for Debian 7, RHEL/CentOS 6, Ubuntu, and OS X 10.9.
  • Compatible with GCS web console.
  • Caches file/directory metadata for improved performance.
  • Verifies file consistency on upload/download.
  • Optional file content encryption.
  • Supports multi-part (resumable) uploads.
  • Allows setting Cache-Control header (via extended attributes on files).
  • Maps file extensions to known MIME types (to set Content-Type header).
  • (Mostly) POSIX compliant.

Binaries (and source packages) are in gcsfs-downloads (on Google Drive).

Users of CentOS images on Google Compute Engine can install pre-built binaries by downloading gcsfs-0.15-1.x86_64.rpm, then using yum:

[user@gce-instance-centos ~]$ sudo yum localinstall gcsfs-0.15-1.x86_64.rpm
[user@gce-instance-centos ~]$ gcsfs -V
gcsfs, 0.15 (r792M), FUSE driver for cloud object storage services
enabled services: aws, fvs, google-storage
[user@gce-instance-centos ~]$

For Debian, download gcsfs_0.15-1_amd64.deb, then:

user@gce-instance-debian:~$ sudo dpkg -i gcsfs_0.15-1_amd64.deb
... a bunch of dependency errors ...
user@gce-instance-debian:~$ sudo apt-get install -f
user@gce-instance-debian:~$ gcsfs -V
gcsfs, 0.15 (r792M), FUSE driver for cloud object storage services
enabled services: aws, fvs, google-storage

s3fuse 0.15 released

s3fuse 0.15 is now available. This release contains fairly minor fixes and packaging updates. Highlights from the change log:

Removed libxml++ dependency.
libxml++ was pulling in many unnecessary package dependencies and wasn’t really providing much added value over libxml2, so as of 0.15 it’s gone. As a bonus, it’s no longer necessary to enable the EPEL repository on RHEL/CentOS before installing s3fuse.

Fixed libcurl init/cleanup bug.
0.14 and earlier versions had a bug that sometimes prevented establishment of SSL connections if s3fuse ran in daemonized (background) mode. 0.15 addresses this.

Binaries for RHEL/CentOS, Debian, and OS X, as well as source archives, are now hosted in s3fuse-downloads (on Google Drive).

Ubuntu packages are at the s3fuse PPA.

s3fuse 0.14 released

Over the weekend I posted version 0.14 of s3fuse. Highlights from the change log:

NEW: Multipart (resumable) uploads for Google Cloud Storage.
With this most recent release, Google Cloud Storage support is on par with S3 support. Multipart/resumable uploads and downloads work reliably, and performance is similar. Many thanks to Eric J. at Google for all the help improving GCS support in 0.14.

NEW: Support for FV/S.
With the help of Hiroyuki K. at IIJ, s3fuse now supports IIJ’s FV/S cloud storage service.

NEW: Set file content type by examining extension.
s3fuse will now set the HTTP Content-Type header according to the file extension using the system MIME map.
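
To give a feel for extension-based lookup, here is a minimal sketch using Python's standard mimetypes module, which also consults the system MIME map where one is installed. This is not s3fuse's own code; the helper name content_type_for and the application/octet-stream fallback are assumptions made for the example.

```python
import mimetypes

# Fallback for unknown extensions. This default is an assumption for the sketch,
# not something the s3fuse change log specifies.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(path):
    """Guess a Content-Type value from the file extension alone."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or DEFAULT_TYPE


for name in ("index.html", "photo.jpg", "blob.unknownext"):
    print(name, "->", content_type_for(name))
```

The lookup never opens the file, so a misnamed file gets the type its extension implies.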

NEW: Set Cache-Control header with extended attribute.
If a Cache-Control header is returned for an object, it will be available in the s3fuse_cache_control extended attribute. Setting the extended attribute (with, say, setfattr) will update the Cache-Control header for the object.
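
The same read/write can be done from a script. The sketch below is Linux-only and uses os.getxattr and os.setxattr from the Python standard library. One assumption to flag loudly: it addresses the attribute as user.s3fuse_cache_control, because Linux normally expects user-settable attributes in the "user." namespace; the text above only gives the bare name, so adjust the constant if your mount exposes it differently. The helper names are made up for the example.

```python
import os

# Assumption: the attribute lives in the "user." namespace on Linux.
# The change log only gives the bare name "s3fuse_cache_control".
ATTR = "user.s3fuse_cache_control"


def get_cache_control(path, attr=ATTR):
    """Return the Cache-Control value stored on path, or None if unavailable."""
    try:
        return os.getxattr(path, attr).decode("utf-8")
    except OSError:
        # Attribute not set, or the filesystem does not support xattrs.
        return None


def set_cache_control(path, value, attr=ATTR):
    """Write value to the extended attribute (the operation setfattr performs)."""
    os.setxattr(path, attr, value.encode("utf-8"))
```

For example, set_cache_control("/mnt/bucket/index.html", "max-age=3600") on a hypothetical mount point would, per the description above, update the header on the underlying object.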

NEW: Allow creation of “special” files (FIFOs, devices, etc.).
mkfifo and mknod now work in s3fuse-mounted buckets, with semantics similar to NFS-mounted filesystems (in particular: FIFOs do not form IPC channels between hosts).

Various POSIX compliance fixes.
From the README:

s3fuse is mostly POSIX compliant, except that:

  • Last-access time (atime) is not recorded.
  • Hard links are not supported.

Some notes on testing:

All tests should pass, except:

  chown/00.t: 141, 145, 149, 153
    [FUSE doesn't call chown when uid == -1 and gid == -1]

  mkdir/00.t: 30
  mkfifo/00.t: 30
  open/00.t: 30
    [atime is not recorded]

  rename/00.t: 7-9, 11, 13-14, 27-29, 31, 33-34
  unlink/00.t: 15-17, 20-22, 51-53
    [hard links are not supported]

As with 0.13, Ubuntu packages are at the s3fuse PPA.

s3fuse 0.13 released

I’ve just uploaded version 0.13 of s3fuse, my FUSE driver for Amazon S3 (and Google Cloud Storage) buckets. 0.13 is a near-complete rewrite of s3fuse, and brings a few new features and vastly improved (I hope) robustness. From the change log:

NEW: File encryption
Operates at the file level and encrypts the contents of files with a key (or set of keys) that you control. See the README.

NEW: Glacier restore requests
Allows for the restoration of files auto-archived to Glacier. See this AWS blog post and the README for more information.

NEW: OS X binaries
A disk image (.dmg) is now available on the downloads page containing pre-built OS X binaries (built on OS X 10.8.2, so compatibility may be limited).

NEW: Size-limited object cache
The object attribute cache now has a fixed size. This addresses the memory utilization issues reported by Gregory C. and others. The default maximum size is 1,000 objects, but this can be changed by tweaking the max_objects_in_cache configuration option.
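
For readers curious what a size-limited cache looks like, here is a minimal sketch. The change log doesn't say which eviction policy s3fuse uses, so least-recently-used eviction is an assumption here, and the class name BoundedCache is invented for the example. Only the 1,000-object default comes from the text above.

```python
from collections import OrderedDict


class BoundedCache:
    """Keeps at most max_objects entries, evicting the least recently used.

    LRU eviction is an assumption of this sketch, not s3fuse's documented policy.
    """

    def __init__(self, max_objects=1000):
        self.max_objects = max_objects
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_objects:
            self._entries.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._entries)
```

Whatever the policy, the point is the same: memory use is bounded by the entry limit rather than by the number of objects in the bucket.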

IMPORTANT: Removed auth_data configuration option
For AWS, use aws_secret_file instead. For Google Storage, use gs_token_file. This will require a change to existing configuration files.

IMPORTANT: Default configuration file locations
s3fuse now searches for a configuration file in ~/.s3fuse/s3fuse.conf before trying %sysconfdir/s3fuse.conf (this is usually /etc/s3fuse.conf or /usr/local/etc/s3fuse.conf).
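
The search order amounts to "first file that exists wins". A minimal sketch of that logic follows; the function name find_config and its parameters are hypothetical, and sysconfdir is passed in explicitly because its real value is fixed when s3fuse is built.

```python
import os


def find_config(sysconfdir="/etc", home=None):
    """Return the first existing s3fuse.conf path, or None if none exists."""
    if home is None:
        home = os.path.expanduser("~")
    candidates = [
        os.path.join(home, ".s3fuse", "s3fuse.conf"),  # per-user file, tried first
        os.path.join(sysconfdir, "s3fuse.conf"),       # system-wide fallback
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

A per-user file therefore shadows the system-wide one entirely; the sketch does not merge the two.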

File Hashing
SHA256 is now used for file integrity checks. The file hash, if available, will be in the “s3fuse_sha256” extended attribute. A standalone SHA256 hash generator (“s3fuse_sha256_hash”) that uses the same hash-list method as s3fuse is now included.
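
As a rough picture of what a hash list is: split the file into chunks, hash each chunk, then hash the list of chunk hashes. The sketch below shows that general idea only. The chunk size, the use of raw digests, and the way the list is combined are all assumptions, so its output should not be expected to match s3fuse_sha256_hash or the s3fuse_sha256 attribute.

```python
import hashlib

# Assumption: chunk size picked purely for illustration.
CHUNK_SIZE = 128 * 1024


def hash_list_sha256(path, chunk_size=CHUNK_SIZE):
    """Hash each chunk with SHA-256, then hash the concatenated chunk digests."""
    root = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the raw 32-byte digest of this chunk into the root hash.
            root.update(hashlib.sha256(chunk).digest())
    return root.hexdigest()
```

One attraction of this approach is that chunk digests can be computed independently, which suits transfers that move a file in parts.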

Set the stats_file configuration option to a valid path and s3fuse will write statistics (event counters, mainly) to the given path when unmounting.

OS X default options
noappledouble and daemon_timeout=3600 are now default FUSE options on OS X.

KNOWN ISSUE: Google Cloud Storage large file uploads
Multipart GCS uploads are not implemented. Large files will time out unless the transfer_timeout_in_s configuration option is set to something very large.

OS X support and other s3fuse news

Version 0.12 of my pet project, s3fuse, now supports OS X (via FUSE4x). A few notes/caveats:

  • Only FUSE4x is supported. OSXFUSE is not.
  • -o noappledouble is your friend. It will keep OS X from filling your S3 bucket with .DS_Store files as you browse the mounted volume.
  • Set a reasonable daemon timeout (e.g., -o daemon_timeout=3600) to keep FUSE4x from timing out and aborting large uploads/downloads.

Sometime in January I’ll release version 0.13, which is a near-complete rewrite that adds support for file-level encryption. I’m also working on adding support for file retrieval from Glacier (for files archived by S3 — see this post on the AWS blog).

s3fuse, now with Google Storage support

Just posted version 0.11 of s3fuse, my FUSE driver for Amazon S3 and, now, Google Storage for Developers. 0.11 also improves stability, error handling, logging, and directory caching. Give it a try. In addition to a source tarball, packages are available for both Debian and Red Hat.

Introducing s3fuse, a FUSE driver for Amazon S3

I temporarily lost access to some data not long ago as a result of an unplanned outage, and the incident woke me up to the utility of offsite backup. I wanted something I could mount as a local file system under Linux, that I could access over the Web, and that was backed by a reasonably reliable data storage infrastructure. There are some decent options out there; I eventually settled on Jungle Disk, which has Windows, Linux, and MacOS clients, as well as a Web client. Unfortunately, after a few weeks of use, an outage resulted in the loss of a non-trivial chunk of my Jungle Disk data. This prompted me to look into using Amazon’s S3 directly (rather than through Jungle Disk). There are several FUSE drivers for S3 in existence, but I wanted something written from the ground up with support for concurrent requests and extended attributes, and with a directory structure compatible with what Amazon’s S3 Web client expects. I also wanted to learn about Amazon Web Services and libcurl.

The end result is s3fuse, my FUSE driver for Amazon S3. It’s very much alpha-level, but it has the features I need and is reliable enough for my purposes. Try it out, feel free to comment, make changes, and report bugs.