Planet Ceph

January 25, 2016

Hosting a web site in radosgw

If you’re familiar with web site hosting on Amazon S3, which is a
simple and cheap way to host a static web site, you might be
wondering whether or not you can do the same in Ceph radosgw.

The short answer is you can’t. Bucket Website is listed as Not
Supported in the radosgw S3 API support matrix, and radosgw doesn’t
have index document support either.

But the longer answer is that you can, provided you use radosgw in
combination with a front-end load-balancer — which, as it happens,
can add a few more bells and whistles, as well. You could probably do
the same thing with nginx, Varnish, or Apache in a
mod_proxy_balancer balancer setup, but in this example
configuration, we’ll use HAProxy.

Getting started: the radosgw basics

Let’s take a look at a simple radosgw configuration with virtual host
support, such that you can access your buckets as either
http://ceph.example.com/bucketname or
http://bucketname.ceph.example.com:

[client.rgw.radosgw01]
rgw_frontends = civetweb port=7480
rgw_dns_name = ceph.example.com
rgw_resolve_cname = True
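
To make the two addressing styles concrete, here is a minimal Python sketch of how a request’s Host header and path could map to a bucket and an object key. It is a simplified model for illustration, not radosgw’s actual code: the function name resolve_bucket is made up, and it ignores CNAME resolution (rgw_resolve_cname) and IPv6 literals.

```python
def resolve_bucket(host, path, dns_name="ceph.example.com"):
    """Map (Host header, request path) to (bucket, object key).

    Hypothetical helper: models S3-style addressing only, assuming
    dns_name is what rgw_dns_name is set to.
    """
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    suffix = "." + dns_name

    # Virtual-host style: http://bucketname.ceph.example.com/key
    if hostname.endswith(suffix):
        return hostname[: -len(suffix)], path.lstrip("/")

    # Path style: http://ceph.example.com/bucketname/key
    if hostname == dns_name:
        bucket, _, key = path.lstrip("/").partition("/")
        return bucket, key

    raise ValueError(f"{host!r} is not served under {dns_name!r}")


print(resolve_bucket("testwebsite.ceph.example.com:7480", "/index.html"))
print(resolve_bucket("ceph.example.com", "/testwebsite/index.html"))
```

Both calls resolve to the same bucket and key, which is why either URL form retrieves the same object.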

Suppose we use s3cmd to create a bucket and upload an HTML file to
it, setting a public ACL:

s3cmd mb s3://testwebsite
s3cmd put --acl-public index.html s3://testwebsite/

Then if you exposed your radosgw to the web, any client (without
authentication) would be able to retrieve
http://testwebsite.ceph.example.com:7480/index.html with a web
browser, or any other HTTP client application (such as curl or
wget):

curl -I http://testwebsite.ceph.example.com:7480/index.html

Which would then return something like:

HTTP/1.1 200 OK
Content-Length: 18050
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 21:28:47 GMT
ETag: "b03130a4a1fc24df0f9f336f2b6d1d90"
x-amz-request-id: tx000000000000000005a88-0056a7b7eb-312df-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:16:11 GMT

Introducing HAProxy

Now let’s start by putting HAProxy in between. Nothing special
there: radosgw listens on the conventional 7480 port, and we bind
HAProxy itself to port 80 and simply pass traffic through to
radosgw.

global
    log         /dev/log local0
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats level admin

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/haproxy/ssl

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL).
    ssl-default-bind-ciphers HIGH
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    timeout queue 1000
    timeout connect 1000
    timeout client 30000
    timeout server 30000
    option forwardfor


frontend ceph_front
    bind 0.0.0.0:80
    default_backend ceph_back

backend ceph_back
    balance source
    server radosgw01 127.0.0.1:7480 check

Index documents

So, the first thing we’ll need to add is support for index
documents. We’d like to make sure that when we retrieve
http://testwebsite.ceph.example.com/, what’s actually fetched from
the backend is /index.html. We can do that by adding an HAProxy ACL
that matches the trailing slash in the path, and an http-request
set-path directive that appends the index document name:

frontend ceph_front
    bind 0.0.0.0:80
    acl path_ends_in_slash path_end -i /
    # Append index document (index.html) to any path
    # ending in "/".
    http-request set-path %[path]index.html if path_ends_in_slash
    default_backend ceph_back

Now, that’s fine in terms of getting the index document correctly:

curl -I http://testwebsite.ceph.example.com/
HTTP/1.1 200 OK
Content-Length: 18050
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 21:28:47 GMT
ETag: "b03130a4a1fc24df0f9f336f2b6d1d90"
x-amz-request-id: tx000000000000000005a94-0056a7b9e3-312df-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:24:35 GMT

However, it of course breaks uploads and even bucket listings, or in
other words, anything that uses the S3 API. Now you could test for
some S3-specific headers in the request, but really, you should just
check whether the request is authorized, and only apply the index
document logic if it isn’t, like so:

frontend ceph_front
    bind 0.0.0.0:80
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    # Append index document (index.html) to any path
    # ending in "/", unless the request has an auth header
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    default_backend ceph_back

Great. Now we can upload using full paths without mangling, and on any
un-authenticated requests, we substitute /index.html for any trailing
/. In case you’re wondering: yes, this works for any path, not just
the root path.
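
If you want to reason about which requests get rewritten, the rule is small enough to model in a few lines of Python. This is only a sketch of the logic in the frontend above, not something HAProxy runs; the function name append_index_document is hypothetical, and has_auth_header stands in for the auth_header ACL.

```python
def append_index_document(path, has_auth_header):
    """Model of: set-path %[path]index.html if path_ends_in_slash !auth_header"""
    if path.endswith("/") and not has_auth_header:
        return path + "index.html"
    return path


# Anonymous browser requests get the index document...
print(append_index_document("/", False))         # /index.html
print(append_index_document("/my/sub/", False))  # /my/sub/index.html

# ...while authenticated S3 API calls pass through untouched.
print(append_index_document("/", True))          # /
print(append_index_document("/index.html", False))  # /index.html
```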

Directory paths

However, you may also want something else, which is the ability to
correctly handle a request like
http://testwebsite.ceph.example.com/my/sub/directory, where of
course you want the path /my/sub/directory translated into
/my/sub/directory/index.html, which means we want to append a slash
and an index document name to the request path.

So let’s do that:

frontend ceph_front
    bind 0.0.0.0:80
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    # Append trailing slash if necessary.
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    default_backend ceph_back

Note that what we’re doing here is somewhat crude. We’re assuming that
any actual file that we want to retrieve looks like name.ext,
meaning it has a dot (period, full stop) character in it. The
path_sub -i . expression in the path_has_dot ACL simply matches
any path with . in it, and we’re assuming that if a path has a dot
then it points to a file, and if it doesn’t, it points to a directory.

You could be a little more clever here and use path_reg instead of
path_sub for a full regular expression match. But regex lookups are
slower than simple substring matches, so if the substring match works
for you, go for it.
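
To see where the dot heuristic holds up and where it falls over, here is a small Python model of the two set-path rules, evaluated in the same order as in the frontend. It is a sketch under assumptions: the function names are made up, and the regular expression (a dot followed by an extension in the last path segment) is just one possible stricter test, not something taken from the configuration above.

```python
import re


def has_dot_anywhere(path):
    """Model of the path_has_dot ACL: any '.' in the path counts."""
    return "." in path


def last_segment_has_extension(path):
    """Assumed stricter alternative: only the last segment may look like name.ext."""
    return re.search(r"\.[^/.]+$", path) is not None


def rewrite_path(path, has_auth_header, looks_like_file=has_dot_anywhere):
    """Apply both rules in order; the second rule sees the first rule's result."""
    if has_auth_header:
        return path
    if path.endswith("/"):
        path += "index.html"
    if not looks_like_file(path) and not path.endswith("/"):
        path += "/index.html"
    return path


print(rewrite_path("/my/sub/directory", False))  # /my/sub/directory/index.html
print(rewrite_path("/style.css", False))         # /style.css

# A dot in a directory name fools the substring test...
print(rewrite_path("/v1.2/docs", False))         # /v1.2/docs
# ...but not a test that looks at the last segment only.
print(rewrite_path("/v1.2/docs", False, last_segment_has_extension))
```

The last call yields /v1.2/docs/index.html. Whether that edge case matters enough to pay for a regex match depends on how you name your directories.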

So now, we can do this:

s3cmd put --acl-public index.html s3://testwebsite/my/sub/directory/

And then:

# Note omitted trailing slash
curl -I http://testwebsite.ceph.example.com/my/sub/directory
HTTP/1.1 200 OK
Content-Length: 24235
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 23:57:04 GMT
ETag: "fecd005b33c0f6bfdee61b787cf54cb0"
x-amz-request-id: tx00000000000000000bc83-0056a7bd25-312cd-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:38:29 GMT

HTTPS support

So, what else might you want to do? One obvious thing that you can use
HAProxy for is SSL termination. The radosgw embedded civetweb
webserver can do that for you, but that feature is currently mildly
broken in a rather curious way. So in order to allow HTTPS
access to all your content via HAProxy instead, you would add:

frontend ceph_front_ssl
    bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
    reqadd X-Forwarded-Proto: https
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    default_backend ceph_back

But maybe you’d like to force, not merely allow, HTTPS
access. redirect to the rescue:

frontend ceph_front
    bind 0.0.0.0:80
    reqadd X-Forwarded-Proto: http
    redirect scheme https code 301 if !{ ssl_fc }

frontend ceph_front_ssl
    bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
    reqadd X-Forwarded-Proto: https
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    default_backend ceph_back

And here we go:

# Note HTTP
curl -IL http://testwebsite.ceph.example.com/my/sub/directory
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://testwebsite.ceph.example.com/my/sub/directory
Connection: close

HTTP/1.1 200 OK
Content-Length: 24235
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 23:57:04 GMT
ETag: "fecd005b33c0f6bfdee61b787cf54cb0"
x-amz-request-id: tx00000000000000000bdeb-0056a7bf9b-312cd-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:48:59 GMT

Compression

And finally, maybe you’d like to speed up access to the stuff on your
site. Why not add on-the-fly gzip compression? It’s supported by every
browser worth its salt, and will make your users happier. You’ll want
to restrict compression to specific MIME types though. In the
configuration below, we enable compression for plain text, HTML, XML,
CSS, JavaScript, and SVG images.

frontend ceph_front
    bind 0.0.0.0:80
    reqadd X-Forwarded-Proto: http
    redirect scheme https code 301 if !{ ssl_fc }

frontend ceph_front_ssl
    bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
    reqadd X-Forwarded-Proto: https
    acl path_has_dot path_sub -i .
    acl path_ends_in_slash path_end -i /
    acl auth_header hdr(Authorization) -m found
    http-request set-path %[path]index.html if path_ends_in_slash !auth_header
    http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
    compression algo gzip
    compression type text/html text/xml text/plain text/css application/javascript image/svg+xml
    default_backend ceph_back

Let’s see how that helps us. Do a request without gzip encoding
support, and observe that its total download size matches the
document’s Content-Length:

curl https://testwebsite.ceph.example.com/my/sub/directory > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24235  100 24235    0     0  94565      0 --:--:-- --:--:-- --:--:-- 94299

Now, add an Accept-Encoding header:

curl -H 'Accept-Encoding: gzip' https://testwebsite.ceph.example.com/my/sub/directory > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5237    0  5237    0     0  19243      0 --:--:-- --:--:-- --:--:-- 19324

There. Actual download size goes from 24KB down to just 5KB.
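
As for why compression is restricted to specific MIME types: text-like content shrinks dramatically, while data that is already compressed (most image, video, and archive formats) barely changes and just costs CPU time. A quick Python sketch illustrates the difference. The helper name gzip_sizes is hypothetical, the HTML snippet is made up, and random bytes stand in for already-compressed data.

```python
import gzip
import os


def gzip_sizes(body):
    """Return (original_size, gzipped_size) in bytes for a response body."""
    return len(body), len(gzip.compress(body))


# Repetitive markup, as you would find in a typical HTML page.
html = b"<li><a href='/my/sub/directory/'>A directory</a></li>\n" * 400

# Random bytes as a stand-in for JPEG- or ZIP-like payloads.
already_compressed = os.urandom(len(html))

for label, body in (("html", html), ("random", already_compressed)):
    original, compressed = gzip_sizes(body)
    print(f"{label}: {original} -> {compressed} bytes")
```

The exact numbers vary from run to run for the random payload, but the HTML body should come out at a small fraction of its original size while the random one does not shrink at all.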

Where to go from here

There are a few additional features you could add here. You
could enable CORS or HSTS, for example, and of course you could add
more backends. But if you’ve read this far, you surely get the idea.

And you’re welcome to examine the headers you can pull from this page
you’re reading, wink wink. 🙂

Source: Hastexo (Hosting a web site in radosgw)
