One of the great things about the HTTP protocol, besides status code 418, is that it's stateless. A web server therefore is not required to store any information on the user or allocate resources for a user after the individual request is done. By that a single web server can handle many many many different users easily, and well if it can't anymore one can add a new server, put a simple load balancer in front and scale out. Each of those web servers then handles its requests without the need for communication which leads to linear scaling (assuming network provides enough bandwidth etc.).
Now the Web isn't used for serving static documents only anymore but we have all these fancy web apps. And those applications often have the need for a state. The most trivial information they need is the current user. HTTP is a great protocol and provides a way to do authentication which works well with its stateless nature - unfortunately this authentication is implemented badly in current clients. Ugly popups, no logout button, ... I don't have to tell more I think. For having nicer login systems people want web forms. Now the stateless nature of HTTP is a problem: The user may login and then browse around. On later requests it should still be known who that user is - with a custom html form based login alone this is not possible. A solution might be cookies. At least one might think so for a second. But setting a cookie "this is an authorized user" alone doesn't make sense as it could easily be faked. Better is to simply store a random identifier in a cookie and then keep a state information on the server. Then all session data is protected and only the user who knows this random identifier is authenticated. If this identifier is wisely chosen and hard to guess this works quite well. Luckily this is a mostly PHP- and MySQL-focused blog and as PHP is a system for building web applications this functionality is part of the core language: The PHP session module.
The session module, which was introduced in PHP 4, partly based on work on the famous phplib library, is quite a fascinating piece of code. It is open and extendable in so many directions but still so simple to use that everybody uses it, often newcomers learn about it on their first day in PHP land. Of course you can not only store the information whether the user is logged in but cache some user-specific data or keep the state on some transactions by the user, like multi-page forms or such.
In its default configuration session state will be stored on the web server's file system. Each session's data in its own file in serialized form. If the filesystem does some caching or one uses a ramdisk or something this can be quite efficient. But as we suddenly have a state on the web server we can't scale as easily as before anymore: If we add a new server and then route a user with an existing session to the new server all the session data won't be there. That is bad. This is often solved by a configuration of the load balancer to route all requests from the same user to the same web server. In some cases this works quite ok, but it is often seen that this might cause problems. Let's assume you want to take a machine down for maintenance. All sessions there will die. Or imagine there's a bunch of users who do complex and expensive tasks - then one of your servers will have a hard time, giving these users bad response times which feels like bad service, even though your other systems are mostly idle.
A nice solution for this would be to store the sessions in a central repository which can be accessed from all web servers.
Continue reading "High Performance PHP Session Storage on Scale"