Chapter 8. The Web Services

This chapter explains how the web services are configured, and why. It also covers FTP and MySQL, since from the user's point of view they are closely tied to the web service.

It is important to stress that the configurations described in this chapter are primarily meant as examples: there are many ways to configure the web service on a server so as to fully exploit the potential of the distribution mechanism described in the previous chapters.

8.1. Apache

8.1.1. Web service structure

When choosing the configuration of the web servers on the various boxes, we also had to take into account the reorganization of the infrastructural domains that we serve on the web.

This was particularly true for the https protocol, which causes problems if managed as we have been doing until now, i.e. with a third-level domain for each service: webmail.domain.org, squirrel.domain.org, passwd.domain.org, etc. Unfortunately an SSL certificate allows just a single hostname (in the CN attribute) to match the URL requested by the browser, and when the names do not match the browser shows an alarming message, even if the user has already installed the CA certificate. In a way, this defeats the effort of having users install the CA certificate and use the PKI structure correctly.

This problem can be solved in at least two ways. The first is to use a wildcard certificate, i.e. a certificate for something like *.domain.org that is valid for all hosts in the domain.org namespace. While the status of this workaround in the RFC is somewhat unclear, it seems to work in most browsers.

However, we have favoured a different solution (which also gave us the chance to rearrange our websites a bit): we have put all the SSL subdomains under a single domain (www.domain.org). The servers still have to be reachable individually, so we have also set up wwwN.domain.org addresses, where N is a number. Luckily a very popular extension of the X.509 specification (subjectAltName) allows us to add further domain names to the web server certificate. In this way we can create an ad hoc certificate with which all HTTPS connections are correctly validated once the CA certificate has been installed.
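The difference between the two approaches can be illustrated with a small sketch of how a client decides whether a certificate covers a hostname. This is a simplified model, not the code of any real browser: the function names are hypothetical, and the rule that the subjectAltName list takes precedence over the CN follows RFC 2818.

```python
def hostname_matches(pattern, hostname):
    """Return True if a certificate name (possibly a wildcard) covers hostname."""
    pattern = pattern.lower()
    hostname = hostname.lower()
    if pattern.startswith("*."):
        # A wildcard covers exactly one extra label: *.domain.org matches
        # www.domain.org but neither domain.org nor a.b.domain.org.
        suffix = pattern[1:]  # e.g. ".domain.org"
        head = hostname[: -len(suffix)]
        return hostname.endswith(suffix) and head != "" and "." not in head
    return pattern == hostname


def certificate_covers(common_name, alt_names, hostname):
    """Check hostname against the subjectAltName list, falling back to the CN.

    When subjectAltName entries are present, the CN is ignored, so the
    list has to contain every name the server answers to.
    """
    names = alt_names if alt_names else [common_name]
    return any(hostname_matches(name, hostname) for name in names)


# Hypothetical certificate for the layout described above.
alt_names = ["www.domain.org", "www1.domain.org", "www2.domain.org"]
print(certificate_covers("www.domain.org", alt_names, "www2.domain.org"))
print(certificate_covers("www.domain.org", [], "www2.domain.org"))
```

With the subjectAltName list in place, both www.domain.org and the individual wwwN.domain.org names validate against the same certificate; with the CN alone, only www.domain.org does.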

8.1.2. Configuration

The Apache 2 configuration is designed to fit into the modular scheme used by Debian systems. It is structured as follows (the apache2 directory below is the one you can find in /configfiles/ring0/common):

  • apache2/apache2.conf - this is the main apache2 configuration file where you can find the generic directives to make the server work.

  • apache2/virtualhost-template - this file is used as a template for the automatic generation of virtual host configurations: a periodic script parses the LDAP database and creates a virtual host configuration for each hosted site, substituting some values in this file (for example the hostname and the logfile paths). In some cases the contents of the apache2/conf.d/extra directory are also included in the virtual host configuration.

  • apache2/include/ - this directory contains pieces of configuration that are often repeated, so that they can be included where needed and changed by editing a single file instead of many. E.g.: standard PHP directives for user sites, robot blocking, etc.:

    • include/dominio-auth.conf - this file limits access to the users of the domain.org domain. By including this file in the desired section you can make part of the site accessible only to administrators (this part of the website should preferably be served over https):

               <Location "blahblah">
                 Include /etc/apache2/include/dominio-auth.conf
               </Location>

    • include/cache.conf - this file enables the caching of contents (for locations where one has to do reverse proxying).

    • include/coral.conf - this piece of configuration enables the coral distributed cache (it might be useful when a specific site or server is overwhelmed by requests).

    • include/php-common.conf - standard PHP configurations

    • include/stop-robots.conf - this configuration blocks some particularly nasty robots.

  • apache2/mods-enabled/ - this directory contains the configuration fragments that load the various Apache modules (.load) and configure them (.conf). We do not handle this with a2enmod but rather with CFengine.

  • apache2/sites-available/ - this directory contains the configuration of the structural virtual hosts. These have to be enabled using the a2ensite command (run by hand on each server, since not all servers necessarily share the same virtual hosts). The command simply creates a link to the file in the sites-enabled directory.

  • apache2/conf.d/ - this directory contains configurations that are included by other files in the appropriate places.

    • conf.d/ssl/ - here you can find various modules to configure https services of the server (i.e. on wwwN.domain.org)

    • conf.d/virtualhosts/ - users' virtual hosts (automatically generated)

    • conf.d/subsites-domain/ - domain.org subsites (automatically generated)

    • conf.d/subsites-public/ - public.org subsites (automatically generated)

    • conf.d/extra/ - this directory contains files with additional directives to be included for particular sites (each file has to be named after the virtual host, or after its alias in the LDAP database, followed by the .conf suffix; it is automatically inserted in the appropriate configuration file).
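The generation of virtual hosts from apache2/virtualhost-template can be sketched roughly as follows. This is only an illustration of the substitution step: the template text, the placeholder names, the document root and the log directory are assumptions (the real template is not shown in this chapter), and a plain list of dictionaries stands in for the entries read from the LDAP database.

```python
from string import Template

# Hypothetical template text: the placeholder names are assumptions,
# not those of the real virtualhost-template file.
VHOST_TEMPLATE = Template("""\
<VirtualHost *:80>
    ServerName ${hostname}
    DocumentRoot ${docroot}
    CustomLog ${logdir}/${hostname}-access.log combined
    ErrorLog ${logdir}/${hostname}-error.log
${extra}
</VirtualHost>
""")


def render_vhost(site, extra_directives=""):
    """Fill the template for one hosted site.

    `site` is a dict standing in for an LDAP entry; `extra_directives`
    stands in for the contents of a matching conf.d/extra/ file.
    """
    extra = "\n".join("    " + line for line in extra_directives.splitlines())
    return VHOST_TEMPLATE.substitute(
        hostname=site["hostname"],
        docroot=site["docroot"],
        logdir=site.get("logdir", "/var/log/apache2"),
        extra=extra,
    )


# Hypothetical hosted site with one extra directive.
sites = [{"hostname": "example.domain.org", "docroot": "/srv/example"}]
for site in sites:
    print(render_vhost(site, "Include /etc/apache2/include/stop-robots.conf"))
```

A periodic job would write each rendered block to a file under conf.d/virtualhosts/ and reload Apache only when something changed.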

The main aim of this configuration is to distribute user requests more or less evenly across the different servers using a round-robin DNS record (i.e. the name server answers a query for a domain name with a set of IP addresses in varying order). This approach is perfect for static content or for pseudo-dynamic content that only needs to be read. For web applications, however, once you log in and a session is created it is a good idea to always send the user to the same server (which server it is does not matter): this way session data is accessed locally (and quickly).

To do this, after the round-robin DNS, we use HTTP redirections structured as follows:

  • The file /etc/apache2/servers_map defines on which servers a specific application is available. It contains several lines, each mapping a name to a list of servers:

          pool    www1|www2|www3
    	    

  • In the main virtualhost configuration (www.domain.org) you can already find the following directives:

          RewriteEngine On
          RewriteMap servers rnd:/etc/apache2/servers_map
          RewriteRule ^(/app.*)$ https://${servers:pool}.domain.org$1 [R,L]
    	    
    These directives redirect every request for a specific /app to a server chosen at random from the pool list. Since the redirect points at a wwwN address, the user stays on that server for the rest of the session.
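To make the effect of these directives more concrete, here is a small Python model of what the rnd: map and the rewrite rule do together. It is only a simulation for illustration (Apache does this internally), and the function names are hypothetical.

```python
import random
import re


def parse_servers_map(text):
    """Parse 'key value1|value2|...' lines, as found in servers_map."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, values = line.split(None, 1)
        mapping[key] = values.strip().split("|")
    return mapping


APP_RE = re.compile(r"^(/app.*)$")


def redirect_target(path, servers, rng=random):
    """Return the redirect URL for an application path, or None."""
    match = APP_RE.match(path)
    if match is None:
        return None  # not an application URL: serve it locally
    server = rng.choice(servers["pool"])  # random pick, like rnd:
    return "https://%s.domain.org%s" % (server, match.group(1))


servers = parse_servers_map("pool    www1|www2|www3\n")
print(redirect_target("/app/login", servers))
print(redirect_target("/index.html", servers))
```

The random choice happens only once, on the request that reaches www.domain.org; every later request in the session already carries the wwwN hostname and is never redirected again.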

8.1.3. Distributed Applications

Since we had this possibility, we decided to install certain applications (the user control panel, or our blog) on every server, so that if one server is off-line it is still possible to connect to the others.

To be used in this way, ordinary software needs to be modified, in particular to separate the read and write queries to the MySQL database (assuming the database is replicated in single-master mode) and to make it possible to transfer sessions from one box to another. The latter is needed because a service might be located on a specific server that is different from the one where the user logs in.
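The separation of read and write queries can be sketched as a thin routing layer in front of two database connections. This is a minimal illustration under stated assumptions, not the code actually used: the class names are hypothetical, the classification by leading SQL keyword is deliberately naive, and RecordingConnection is a stand-in for a real MySQL connection so that the example runs on its own.

```python
import re

# Statements that modify data have to go to the master; anything else
# can be answered by the local replica.
WRITE_RE = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|REPLACE|CREATE|ALTER|DROP)\b", re.IGNORECASE
)


class RecordingConnection:
    """Stand-in for a database connection: it only records the queries."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, sql, params=()):
        self.log.append(sql)
        return self.name


class QueryRouter:
    """Send writes to the master and reads to the local replica."""

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def target_for(self, sql):
        return self.master if WRITE_RE.match(sql) else self.replica

    def execute(self, sql, params=()):
        return self.target_for(sql).execute(sql, params)


router = QueryRouter(RecordingConnection("master"), RecordingConnection("local"))
print(router.execute("SELECT title FROM posts"))
print(router.execute("INSERT INTO posts (title) VALUES (%s)", ("hello",)))
```

A real application also has to cope with replication lag: a page that reads back data it has just written may need to read from the master as well.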

For the session problem, the most common solution (a centralized SQL database or a replicated one) was not satisfactory for us, since it would imply high internal bandwidth consumption (a remote SQL connection for each access to a webpage is out of the question). We therefore developed a specific mechanism based on authentication cookies, which act as a transparent authentication system while the user passes from the front server to the final server. By intercepting the login functions of the various applications, a new session is created on the new server when the authentication cookie matches.
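The text does not describe the format of the authentication cookie, so the following is only one possible way to build such a mechanism: a cookie carrying the username and an expiry time, signed with an HMAC key shared by all servers. The function names, the cookie layout and the shared secret are all assumptions made for the sake of the example.

```python
import base64
import hashlib
import hmac
import time


def make_auth_cookie(username, secret, ttl=300, now=None):
    """Build a signed cookie value (hypothetical 'user|expiry|signature' layout)."""
    issued = time.time() if now is None else now
    payload = "%s|%d" % (username, int(issued + ttl))
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(("%s|%s" % (payload, sig)).encode()).decode()


def check_auth_cookie(cookie, secret, now=None):
    """Return the username if the cookie is authentic and not expired, else None."""
    try:
        decoded = base64.urlsafe_b64decode(cookie.encode()).decode()
        username, expires, sig = decoded.rsplit("|", 2)
        expires = int(expires)
    except (ValueError, UnicodeDecodeError):
        return None  # malformed cookie
    payload = "%s|%d" % (username, expires)
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # signature does not match
    current = time.time() if now is None else now
    if current > expires:
        return None  # cookie has expired
    return username


# The front server issues the cookie at login; the final server checks it
# inside its (intercepted) login function and opens a local session.
secret = b"key-shared-by-all-servers"  # hypothetical shared secret
cookie = make_auth_cookie("alice", secret)
print(check_auth_cookie(cookie, secret))
```

Because the check only needs the shared key, the final server can create its local session without any remote SQL query, which is the property the mechanism described above is after.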

Figure: Single sign-on mechanism