Posts Tagged ‘ram’

An Unfunny Thing Happened on the way to 51% Full tmpfs

I’ve run into another caveat of my SHM/tmpfs File-Based PHP Cache/Datastore in RAM technique. Fortunately, this one is much better documented than the subject of A Funny Thing Happened on the way to 100% Full tmpfs. On a 2GB tmpfs, used storage halted at 51%:

# df
Filesystem                                               1K-blocks     Used Available Use% Mounted on
none                                                       2097152  1062212   1034940  51% /mnt/ram

And the number of files stopped increasing at

# ls | wc -l
174196

Knowing that tmpfs isn’t a filesystem in the true sense of the term, and therefore shouldn’t be subject to limitations like directory size, I had a hunch it had run out of inodes.

# df -i
Filesystem                                                Inodes  IUsed   IFree IUse% Mounted on
none                                                      174206 174206       0  100% /mnt/ram

As it turns out:

The default [number of inodes] is half of the number of your physical RAM pages

We can either adjust the nr_inodes mount option to override this default, or set it to 0 to remove the limitation altogether.

# mount -o remount,nr_inodes=0 /mnt/ram

Now when we look at our inode statistics:

# df -i
Filesystem                                                Inodes  IUsed   IFree IUse% Mounted on
none                                                           0      0       0     - /mnt/ram
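
To make the change survive a reboot, nr_inodes can be baked into the fstab entry instead; a sketch, assuming a 2GB slice like the one above:

none                    /mnt/ram        tmpfs           defaults,size=2G,nr_inodes=0    0 0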

From /usr/src/linux/Documentation/filesystems/tmpfs.txt:

Tmpfs is a file system which keeps all files in virtual memory.


Everything in tmpfs is temporary in the sense that no files will be
created on your hard drive. If you unmount a tmpfs instance,
everything stored therein is lost.

tmpfs puts everything into the kernel internal caches and grows and
shrinks to accommodate the files it contains and is able to swap
unneeded pages out to swap space. It has maximum size limits which can
be adjusted on the fly via 'mount -o remount ...'

If you compare it to ramfs (which was the template to create tmpfs)
you gain swapping and limit checking. Another similar thing is the RAM
disk (/dev/ram*), which simulates a fixed size hard disk in physical
RAM, where you have to create an ordinary filesystem on top. Ramdisks
cannot swap and you do not have the possibility to resize them. 

Since tmpfs lives completely in the page cache and on swap, all tmpfs
pages currently in memory will show up as cached. It will not show up
as shared or something like that. Further on you can check the actual
RAM+swap use of a tmpfs instance with df(1) and du(1).


tmpfs has the following uses:

1) There is always a kernel internal mount which you will not see at
   all. This is used for shared anonymous mappings and SYSV shared
   memory. 

   This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
   set, the user visible part of tmpfs is not built. But the internal
   mechanisms are always present.

2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
   POSIX shared memory (shm_open, shm_unlink). Adding the following
   line to /etc/fstab should take care of this:

        tmpfs   /dev/shm        tmpfs   defaults        0 0

   Remember to create the directory that you intend to mount tmpfs on
   if necessary.

   This mount is _not_ needed for SYSV shared memory. The internal
   mount is used for that. (In the 2.3 kernel versions it was
   necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
   shared memory)

3) Some people (including me) find it very convenient to mount it
   e.g. on /tmp and /var/tmp and have a big swap partition. And now
   loop mounts of tmpfs files do work, so mkinitrd shipped by most
   distributions should succeed with a tmpfs /tmp.

4) And probably a lot more I do not know about :-)


tmpfs has three mount options for sizing:

size:      The limit of allocated bytes for this tmpfs instance. The 
           default is half of your physical RAM without swap. If you
           oversize your tmpfs instances the machine will deadlock
           since the OOM handler will not be able to free that memory.
nr_blocks: The same as size, but in blocks of PAGE_CACHE_SIZE.
nr_inodes: The maximum number of inodes for this instance. The default
           is half of the number of your physical RAM pages, or (on a
           machine with highmem) the number of lowmem RAM pages,
           whichever is the lower.

These parameters accept a suffix k, m or g for kilo, mega and giga and
can be changed on remount.  The size parameter also accepts a suffix %
to limit this tmpfs instance to that percentage of your physical RAM:
the default, when neither size nor nr_blocks is specified, is size=50%

If nr_blocks=0 (or size=0), blocks will not be limited in that instance;
if nr_inodes=0, inodes will not be limited.  It is generally unwise to
mount with such options, since it allows any user with write access to
use up all the memory on the machine; but enhances the scalability of
that instance in a system with many cpus making intensive use of it.


tmpfs has a mount option to set the NUMA memory allocation policy for
all files in that instance (if CONFIG_NUMA is enabled) - which can be
adjusted on the fly via 'mount -o remount ...'

mpol=default             use the process allocation policy
                         (see set_mempolicy(2))
mpol=prefer:Node         prefers to allocate memory from the given Node
mpol=bind:NodeList       allocates memory only from nodes in NodeList
mpol=interleave          prefers to allocate from each node in turn
mpol=interleave:NodeList allocates from each node of NodeList in turn
mpol=local               prefers to allocate memory from the local node

NodeList format is a comma-separated list of decimal numbers and ranges,
a range being two hyphen-separated decimal numbers, the smallest and
largest node numbers in the range.  For example, mpol=bind:0-3,5,7,9-15

A memory policy with a valid NodeList will be saved, as specified, for
use at file creation time.  When a task allocates a file in the file
system, the mount option memory policy will be applied with a NodeList,
if any, modified by the calling task's cpuset constraints
[See Documentation/cgroups/cpusets.txt] and any optional flags, listed
below.  If the resulting NodeList is the empty set, the effective memory
policy for the file will revert to "default" policy.

NUMA memory allocation policies have optional flags that can be used in
conjunction with their modes.  These optional flags can be specified
when tmpfs is mounted by appending them to the mode before the NodeList.
See Documentation/vm/numa_memory_policy.txt for a list of all available
memory allocation policy mode flags and their effect on memory policy.

        =static         is equivalent to        MPOL_F_STATIC_NODES
        =relative       is equivalent to        MPOL_F_RELATIVE_NODES

For example, mpol=bind=static:NodeList, is the equivalent of an
allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES.

Note that trying to mount a tmpfs with an mpol option will fail if the
running kernel does not support NUMA; and will fail if its nodelist
specifies a node which is not online.  If your system relies on that
tmpfs being mounted, but from time to time runs a kernel built without
NUMA capability (perhaps a safe recovery kernel), or with fewer nodes
online, then it is advisable to omit the mpol option from automatic
mount options.  It can be added later, when the tmpfs is already mounted
on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.


To specify the initial root directory you can use the following mount
options:

mode:   The permissions as an octal number
uid:    The user id 
gid:    The group id

These options do not have any effect on remount. You can change these
parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.


So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
will give you a tmpfs instance on /mytmpfs which can allocate 10GB
RAM/SWAP in 10240 inodes and it is only accessible by root.


Author:
   Christoph Rohland <cr@sap.com>, 1.12.01
Updated:
   Hugh Dickins, 4 June 2007
Updated:
   KOSAKI Motohiro, 16 Mar 2010

A Funny Thing Happened on the way to 100% Full tmpfs

I’ve been having trouble with XCache’s datastore lately and was prompted to revisit my SHM/tmpfs File-Based PHP Cache/Datastore in RAM strategy.

I mounted a 256MB tmpfs slice like this:

none                    /mnt/ram        tmpfs           defaults,noatime,size=256M      0 0

Somehow, after the storage was exhausted, the number of files kept increasing.

$ watch "ls | wc -l"

To my surprise, new, empty files were being created with a random string appended to their names:

36588.posts.object.ychanKJYGF
36588.posts.ychanPIUGN

Interestingly, while these non-files continued to accumulate, the reported amount of storage used by tmpfs never changed. One might expect these file entries to be taking up some amount of space, somewhere.

I’ve done as much research as time will permit but haven’t found the purpose of this behaviour. Since the altered file names and lack of content make these files void as far as my fcache implementation is concerned, this highlights the importance of thorough garbage collection.
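
My best guess at the mechanism, judging by the fcache_set() implementation in the post below: tempnam() creates an empty, uniquely-suffixed placeholder before the payload is written, and when file_put_contents() then fails for lack of space the rename() is never reached, so the placeholder lingers unless something unlinks it. Empty files consume an inode but no data blocks, which would explain why reported usage never moved. A condensed sketch of the failure path (the key name is taken from the listing above):

	$val = str_repeat('x', 1024);				// some payload
	$tmp = tempnam('/mnt/ram/', '36588.posts.ychan');	// creates an empty 36588.posts.ychanXXXXXX
	if(@file_put_contents($tmp, $val))			// fails once the tmpfs is out of space
		@rename($tmp, '/mnt/ram/36588.posts.ychan');	// never reached on failure
	// without an unlink($tmp) here, the empty placeholder remains behind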

SHM/tmpfs File-Based PHP Cache/Datastore in RAM

It seems like only yesterday XCache was my knight in shining armour, but a burst of segfaults has prompted the creation of a backup plan.

UPDATE: It turns out my problem was actually PHP’s fault. Put that armour back on!

We can use files on tmpfs to provide much the same function as XCache or APC’s shared datastore. Start by mounting a slice somewhere appropriate (this line is for fstab):

none                    /mnt/ram        tmpfs           defaults,noatime,size=256M      0 0
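
With the mount point created, the slice can be activated without a reboot (assuming the fstab entry above is in place):

# mkdir -p /mnt/ram
# mount /mnt/ram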

Next we’ll create some basic interface functions. Let $config['fcache_path'] be the path to your tmpfs mount or another writable directory, including the trailing slash, since it is concatenated directly with the key:

function fcache_isset($key)
{
	global $config;
	return @file_exists($config['fcache_path'].$key);
}

function fcache_unset($key)
{
	global $config;
	return @unlink($config['fcache_path'].$key);
}

function fcache_get($key)
{
	global $config;
	// file_get_contents() returns FALSE on a miss; test for that
	// explicitly rather than with empty(), or a legitimately cached
	// "0" would read as a miss.
	$val = @file_get_contents($config['fcache_path'].$key);
	if($val === false)
		return NULL;
	else
		return $val;
}

function fcache_set($key, $val='')
{
	global $config;
	if(!empty($val))
	{
		// tempnam() creates an empty placeholder in the cache directory
		// itself so the later rename() stays on the same filesystem.
		$tmp = tempnam($config['fcache_path'], $key);
		if(@file_put_contents($tmp, $val))
			return @rename($tmp, $config['fcache_path'].$key);
		// The write failed (e.g. the tmpfs is full); remove the empty
		// placeholder so it doesn't linger and eat an inode.
		@unlink($tmp);
		return false;
	}
	return true;
}
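
A quick usage sketch, assuming the mount above and a trailing slash on the path:

$config['fcache_path'] = '/mnt/ram/';

fcache_set('greeting', 'hello');	// atomic write via tempnam() and rename()
echo fcache_get('greeting');		// hello
fcache_unset('greeting');
var_dump(fcache_isset('greeting'));	// bool(false)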

I use rename() instead of flock() to make writes atomic because, according to the manual page:

On some operating systems flock() is implemented at the process level. When using a multithreaded server API like ISAPI you may not be able to rely on flock() to protect files against other PHP scripts running in parallel threads of the same server instance!

They mention IIS’ ISAPI specifically, but I’ve had enough problems with Apache’s mpm_worker lately that I’m not willing to take the risk. Further, I’d rather have the query run twice than suffer any lock-related hangups. On POSIX systems rename() is atomic when source and destination are on the same filesystem, so a reader always sees either the old file or the complete new one, never a partial write; this is also why the temporary file is created in the cache directory itself rather than in /tmp.

Now that we have some very basic functions to interface with, we can put them to work in something useful. The following is what I’ve whipped up to switch between XCache, this file-based cache, and no cache at all when pulling standard MySQL results. cache_set() could easily be replaced with cache_unset() preceding every update query, but I do things this way to make the code more readable to me. You could also squeeze out more performance by using only arrays instead of converting between arrays and objects, but this software was written entirely around mysql_fetch_object() and the caching was an afterthought.

Let $config['cache'] contain the cache type.
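
// cache_get() returns a row object on a cache hit or a successful query,
// TRUE when the query succeeds but matches no rows, and FALSE on a query
// error -- callers should distinguish the three with === checks.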

function cache_get($key, $query)
{
	global $config;
	if($config['cache'] == 'xcache' and function_exists('xcache_get'))
	{
		$serialized = xcache_get($key);
		if($serialized != NULL)
		{
			$unserialized = unserialize($serialized);
			$object = (object) $unserialized;
			return $object;
		}
		else
		{
			$result = mysql_query($query);
			if($result === false)
				return false;
			if(mysql_num_rows($result) > 0)
			{
				$object = mysql_fetch_object($result);
				$array = (array) $object;
				$serialized = serialize($array);
				xcache_set($key, $serialized);
				return $object;
			}
			else
			{
				return true;
			}
		}
	}
	elseif($config['cache'] == 'fcache' and function_exists('fcache_get'))
	{
		$serialized = fcache_get($key);
		if($serialized != NULL)
		{
			$unserialized = unserialize($serialized);
			$object = (object) $unserialized;
			return $object;
		}
		else
		{
			$result = mysql_query($query);
			if($result === false)
				return false;
			if(mysql_num_rows($result) > 0)
			{
				$object = mysql_fetch_object($result);
				$array = (array) $object;
				$serialized = serialize($array);
				fcache_set($key, $serialized);
				return $object;
			}
			else
			{
				return true;
			}
		}
	}
	else
	{
		$result = mysql_query($query);
		if($result === false)
			return false;
		if(mysql_num_rows($result) > 0)
		{
			$object = mysql_fetch_object($result);
			return $object;
		}
		else
		{
			return true;
		}
	}
}

function cache_set($key, $query)
{
	global $config;
	if($config['cache'] == 'xcache' and function_exists('xcache_unset'))
	{
		$result = mysql_query($query);
		if($result === false)
			return false;
		xcache_unset($key);
		return true;
	}
	elseif($config['cache'] == 'fcache' and function_exists('fcache_unset'))
	{
		$result = mysql_query($query);
		if($result === false)
			return false;
		fcache_unset($key);
		return true;
	}
	else
	{
		$result = mysql_query($query);
		if($result === false)
			return false;
		return true;
	}
}

function cache_unset($key)
{
	global $config;
	if($config['cache'] == 'xcache' and function_exists('xcache_unset'))
	{
		xcache_unset($key);
		return true;
	}
	elseif($config['cache'] == 'fcache' and function_exists('fcache_unset'))
	{
		fcache_unset($key);
		return true;
	}
	else
	{
		return true;
	}
}
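
A sketch of how these fit around a typical read and update (the table, key and queries are hypothetical):

// Read through the cache; falls back to the query on a miss.
$user = cache_get('user.42', "SELECT * FROM users WHERE id = 42");

// Run the update, then invalidate so the next cache_get() repopulates.
cache_set('user.42', "UPDATE users SET name = 'foo' WHERE id = 42");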

Please note that cache_get() checks whether the returned value is NULL; it does NOT use (x|f)cache_isset() first, because that would introduce a serious race condition: another process could unset the entry between the isset check and the get.

This implementation leaves out two important features that XCache has: garbage collection and timeouts. Garbage collection can be handled by a cron script that uses the find command to take out stale entries. Timeouts can be implemented by storing a value in the file and comparing it against the file’s time stamp and the current time – a clever idea I got from looking over http://flourishlib.com/docs/fCache.
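
For garbage collection, a minimal sketch, assuming anything not rewritten in the last hour is stale (the mount is noatime, so find must go by modification time):

# find /mnt/ram -type f -mmin +60 -delete

For timeouts, one way to do it, sketched here as hypothetical fcache_set_ttl()/fcache_get_ttl() wrappers around the functions above: prepend an expiry timestamp to the payload and compare it on read.

function fcache_set_ttl($key, $val, $ttl = 3600)
{
	// First line of the payload is the absolute expiry time.
	return fcache_set($key, (time() + $ttl)."\n".$val);
}

function fcache_get_ttl($key)
{
	$raw = fcache_get($key);
	if($raw === NULL)
		return NULL;
	list($expires, $val) = explode("\n", $raw, 2);
	if((int)$expires < time())
	{
		// Expired: treat as a miss and clean up eagerly.
		fcache_unset($key);
		return NULL;
	}
	return $val;
}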

Will Bond’s fCache is probably what you’re looking for if you want to port easily between all of the major datastores and have individual control over each item’s expiration. However, that implementation uses a rand()om number to trigger garbage collection and may be subject to the race (or minor hangup, depending on how file_put_contents() handles locking) condition we avoid here with atomic writes.

Here are some completely meaningless ApacheBench (ab) benchmarks against an AJAX app’s polling script on a live, production server:

Without datastore

Document Length:        105 bytes

Concurrency Level:      20
Time taken for tests:   122.880 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2420000 bytes
Total POSTed:           2290229
HTML transferred:       1050000 bytes
Requests per second:    81.38 [#/sec] (mean)
Time per request:       245.761 [ms] (mean)
Time per request:       12.288 [ms] (mean, across all concurrent requests)  
Transfer rate:          19.23 [Kbytes/sec] received
                        18.20 kb/s sent
                        37.43 kb/s total
 
Connection Times (ms)          
              min  mean[+/-sd] median   max   
Connect:       91  202 341.7    154    9170   
Processing:    19   43  40.3     32     746   
Waiting:       19   40  39.0     31     746   
Total:        114  245 344.1    194    9193   
 
Percentage of the requests served within a certain time (ms) 
  50%    194    
  66%    212    
  75%    224    
  80%    232    
  90%    264    
  95%    408    
  98%    480    
  99%   3170    
 100%   9193 (longest request)

XCache datastore

Document Length:        105 bytes

Concurrency Level:      20
Time taken for tests:   121.803 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2420000 bytes
Total POSTed:           2290229
HTML transferred:       1050000 bytes
Requests per second:    82.10 [#/sec] (mean)
Time per request:       243.605 [ms] (mean)
Time per request:       12.180 [ms] (mean, across all concurrent requests)
Transfer rate:          19.40 [Kbytes/sec] received
                        18.36 kb/s sent
                        37.76 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       91  201 331.6    154    3418
Processing:    19   42  40.6     32     798
Waiting:       19   39  39.2     31     788
Total:        115  243 334.1    193    3459

Percentage of the requests served within a certain time (ms)
  50%    193
  66%    210
  75%    221
  80%    228
  90%    260
  95%    405
  98%    473
  99%   3163
 100%   3459 (longest request)

fcache

Document Length:        105 bytes

Concurrency Level:      20
Time taken for tests:   121.174 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2420000 bytes
Total POSTed:           2291374
HTML transferred:       1050000 bytes
Requests per second:    82.53 [#/sec] (mean)
Time per request:       242.347 [ms] (mean)
Time per request:       12.117 [ms] (mean, across all concurrent requests)
Transfer rate:          19.50 [Kbytes/sec] received
                        18.47 kb/s sent
                        37.97 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       90  199 320.8    154    3407
Processing:    19   42  38.7     33     747
Waiting:       19   40  37.3     32     747
Total:        116  242 323.2    193    3486

Percentage of the requests served within a certain time (ms)
  50%    193
  66%    210
  75%    222
  80%    231
  90%    262
  95%    408
  98%    475
  99%   3161
 100%   3486 (longest request)

It’s interesting to see fcache narrowly beat out XCache, but since the testing environment is not perfectly controlled the results are, of course, useless.
