Friday, November 11, 2011

Faster Arrays - Davey Shafik

Arrays, long considered the work horse of PHP have one flaw: they can be incredibly slow. There is however an alternative ?�at least, for a small subset of use cases. SplFixedArray.

You use SplFixedArray like so:

 <?php $array = new SplFixedArray($size); $array[0] = 'foo'; $array[1] = 'bar'; ... $array[100000] = 'baz'; ?> 

The SplFixedArray class provides a super-fast, fixed size array implementation. There are some limitations however, first you must use numeric keys and secondly you cannot use anonymous assignment (i.e. $array[] = 'value';).

You?ll notice one requirement was missing, that it should have a fixed size. While having a fixed size is what will bring you the speed increase it?s actually not a requirement that the size be fixed. Though you must specify a size to the constructor, you can change it (and lose most ?�if not all ?�speed benefits) at any time using SplFixedArray->setSize().

So, what sort of speed increase are we talking about? In my testing of arrays 100, 100? 1,000,000 elements, you will see a speed increase about 20-25%; for arrays smaller than 100, it will actually be slower by 25-40%.

The benchmarking was very simple, a comparison of a read and write iteration for both normal and fixed arrays of different sizes like so:

 <?php $size = 1000;  // Regular Array $array = array();  // Write for($i = 0; $i < $size; $i++) {   $array[$i] = $i; }  // Read for($i = 0; $i < $size; $i++) {   $value = $array[$i]; }  // Fixed Array $fixedArray = new SplFixedArray($size);  // Write for($i = 0; $i < $size; $i++) {   $fixedArray[$i] = $i; }  // Read for($i = 0; $i < $size; $i++) {   // do nothing   $value = $fixedArray[$i]; } ?> 

Additionally, the memory usage to run the benchmarks for array vs SplFixedArray is significantly different, regular arrays clock in at 198MB while SplFixedArray uses a mere 83MB, that?s a 59% memory saving.

In practical terms, you?re only going to be worried about the speed of arrays when you?re dealing with larger arrays anyway, so the speed loss for the lower digits isn?t a big concern? but where exactly could this be useful?

There is one common scenario where you will commonly be dealing with large numerically indexed arrays of data: Your database result sets. Using PDO, you can tell how many results you have before you retrieve the row data using PDOStatement->rowCount().

 <?php $pdo = new PDO(...);  $sql = "SELECT * FROM table WHERE column = :value"; $query = $pdo->prepare($sql);  $values = array(':value' => 'foo'); $result = $query->execute($values);  $count = $query->rowCount();  $rows = $query->fetchAll(...); // It would be great if $rows could be an SplFixedArray... ?> 

Unfortunately, it is not possible to set the result set container for PDOStatement->fetchAll() to use SplFixedArray ? however, if someone wants to help (that is, someone who knows internals and? well, C), I?ve got an opening for a coach!

At the urging of my co-worker Helgi, I threw the arrays into a FilterIterator and got some pretty interesting results. Using similar code to the first benchmark, but instead of just reading out the array, we created and used a custom FilterIterator:

 <?php class EvenFilterIterator extends FilterIterator {   /**    * Accept only even-keyed values    *    * @return bool    */   public function accept()   {     // Get the actual iterator     $iterator = $this->getInnerIterator();      // Get the current key     $key = $iterator->key();      // Check for even keys     if ($key % 2 == 0) {       return true;     }      return false;   } } ?> 

For regular arrays, we must first create an iterator:

 <?php $iterator = new ArrayIterator($array);  $filter = new EvenFilterIterator($iterator);  foreach ($filter as $key => $val) {     $value = $val; } ?> 

For the SplFixedArray, we passed it straight into the EvenFilterIterator, otherwise the code is the same.

Even with the extra overhead of creating the ArrayIterator, the SplFixedArray is only marginally (1%) faster till it reaches the 10000 elements mark, and then it starts to become marginally slower (again 1%). So, I guess the take-away is: use with caution.



No comments:

Post a Comment