
28.12 Optimizing Disk-Based Sessions

Many Web applications use HTTP sessions to retain information about specific users for the duration of their visit. The default and most common storage for session information is disk files, located in /tmp. On heavily loaded Web sites that serve a large number of users, accessing the session store on disk may become extremely inefficient, because most file systems (including Linux's ext2 and Windows' NTFS) don't handle a large number of files in the same directory very efficiently. As the number of files in /tmp grows with the number of active sessions, the time it takes to open each session file becomes longer.

A good first step is to move the session storage from /tmp into a dedicated directory in the file system. You can do that by setting the session.save_path directive in php.ini. Using a dedicated directory removes the overhead of any files in /tmp that are unrelated to PHP sessions. However, this is just a first step, and not necessarily a very big one: given enough active sessions, the number of other files in /tmp may be negligible.

As if out of habit, PHP comes to the rescue and lets you distribute the session files across multiple directories with very little effort. PHP has built-in support for treating each of the first n characters of the session key as a directory level. For those of you not familiar with this methodology, let's illustrate. Suppose we have a session with the key 3fdb6cd5748e5ef2ecc415530a3f167e. Assuming we've set session.save_path to /tmp/php_sessions, PHP stores this session in a file named /tmp/php_sessions/sess_3fdb6cd5748e5ef2ecc415530a3f167e. However, if we change php.ini to session.save_path = 2;/tmp/php_sessions, PHP stores the session information in /tmp/php_sessions/3/f/sess_3fdb6cd5748e5ef2ecc415530a3f167e. Note the extra directories separating php_sessions and the session file itself. Similarly, if we set session.save_path to 4;/tmp/php_sessions, PHP stores the session file in /tmp/php_sessions/3/f/d/b/sess_3fdb6cd5748e5ef2ecc415530a3f167e. The optional number in front of the semicolon in session.save_path is named the session save path depth. Note that PHP does not create these subdirectories itself; they must already exist before sessions are stored in them.
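The mapping from session key to file path can be sketched in a few lines. The following Python snippet is only an illustration of the scheme described above, not PHP's actual implementation; the function name session_file_path is hypothetical, and the sketch handles only the two session.save_path forms shown in this section (a plain directory, or a depth followed by a semicolon and a directory).

```python
import posixpath


def session_file_path(save_path: str, session_id: str) -> str:
    """Return the file path a session would map to for a given save_path.

    save_path is either a plain directory ("/tmp/php_sessions") or
    "N;directory", where N is the session save path depth.
    """
    depth = 0
    directory = save_path
    if ";" in save_path:
        depth_text, directory = save_path.split(";", 1)
        depth = int(depth_text)
    if depth > len(session_id):
        raise ValueError("depth exceeds the length of the session key")
    # One directory level per leading character of the session key.
    levels = list(session_id[:depth])
    return posixpath.join(directory, *levels, "sess_" + session_id)


if __name__ == "__main__":
    key = "3fdb6cd5748e5ef2ecc415530a3f167e"
    for setting in ("/tmp/php_sessions", "2;/tmp/php_sessions", "4;/tmp/php_sessions"):
        print(setting, "->", session_file_path(setting, key))
```

Running the script prints the three paths from the example above, one for each save path setting.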

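Thanks to the exponential nature of this algorithm, each additional level divides the number of files per directory by the number of distinct characters that can appear in a session identifier. For hexadecimal identifiers such as the one above, that is 16 characters, so a depth of d spreads the files over 16^d directories: 256 directories at depth 2 and 4,096 at depth 3. This means that there usually isn't a need to go beyond a depth of 2 or 3.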
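To get a feel for the numbers, the arithmetic can be worked through with a small Python sketch. It assumes session keys are spread evenly over the directories, which is only approximately true in practice. The function names are hypothetical, and the alphabet size is a parameter because it depends on how identifiers are encoded (16 for a hexadecimal key like the one in the example).

```python
def directory_count(alphabet_size: int, depth: int) -> int:
    """Number of leaf directories created by a given save path depth."""
    return alphabet_size ** depth


def average_files_per_directory(active_sessions: int,
                                alphabet_size: int,
                                depth: int) -> float:
    """Average session files per leaf directory, assuming an even spread."""
    return active_sessions / directory_count(alphabet_size, depth)


if __name__ == "__main__":
    sessions = 1_000_000  # illustrative load, not a measured figure
    for depth in range(4):
        average = average_files_per_directory(sessions, 16, depth)
        print(f"depth {depth}: {directory_count(16, depth)} directories, "
              f"about {average:.0f} files each")
```

With one million active sessions and hexadecimal keys, a depth of 2 leaves roughly 3,900 files per directory and a depth of 3 roughly 240, which shows why deeper trees rarely pay off.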

Garbage collection may be improved too. Garbage collection is the process of removing old session files after a certain expiration timeout. By default, PHP takes care of garbage collection automatically. However, due to architectural constraints, PHP's built-in garbage collection takes place inside the context of a request. This means that whichever request triggers the cleanup is blocked for its duration, which can sometimes be more than a few seconds. Moreover, PHP's automated cleanup supports only the default depth setting of 0. As soon as we use a nonzero depth, automated garbage collection no longer works, and session files begin to pile up.

The best solution for the garbage collection issue is to move it out of PHP and into a cron job. For instance, if you would like to remove sessions after 24 hours and perform collection every hour, you could add the following line to the system's crontab:

0 * * * * nobody find /tmp/php_sessions -type f -name 'sess_*' -mmin +1440 -exec rm -f {} +

This mechanism works regardless of the session.save_path depth you are using and prevents requests from getting stuck for long periods of time due to garbage collection. Of course, you may want to tune the frequency of garbage collection by using different cron settings, or change the expiration limit for session files by using different find settings.
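If you prefer to run the cleanup from a script rather than a one-line find command, the same idea can be sketched in Python. This is a minimal illustration, not a drop-in replacement: the function name remove_expired_sessions is hypothetical, and using the file modification time to measure a session's age is an assumption made for this sketch.

```python
import os
import time


def remove_expired_sessions(root, max_age_seconds, now=None):
    """Delete sess_* files under root that are older than max_age_seconds.

    Walks every subdirectory, so it works for any save path depth.
    Returns the list of paths that were removed.
    """
    if now is None:
        now = time.time()
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith("sess_"):
                continue  # leave files that are not session files alone
            path = os.path.join(dirpath, name)
            try:
                if now - os.path.getmtime(path) > max_age_seconds:
                    os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                # The file vanished between listing and removal; ignore it.
                pass
    return removed


if __name__ == "__main__":
    # Remove sessions that have not been modified for 24 hours.
    expired = remove_expired_sessions("/tmp/php_sessions", 24 * 60 * 60)
    print(f"removed {len(expired)} expired session files")
```

Such a script would be scheduled from cron in the same way as the find command, and it must run as a user that is allowed to delete the session files.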
