As some of you may know, I’m crazy about speed. So when I saw that people were happily using Predis as their choice of PHP client for Redis, I was a bit confused.
Why use a client written in PHP for something that should be ‘fast’ like Redis?
That kind of defeats the purpose - unless you don’t really care about response times and scalability.
I could understand using it if there were no alternatives such as PhpRedis, or if you wanted to add some sort of proprietary layer that you cannot add on top of a C extension.
Don’t get me wrong, if you have a valid reason to use Predis, then more power to you. I know both packages have contributors who have put tons of sweat into getting them to where they are now.
To automate and define a common benchmarking strategy for Memcache, Memcached, Predis and PhpRedis, I decided to write a small framework that automatically runs a set of tests against each client.
You can find it here: https://github.com/AlekseyKorzun/benchmarker-nosql-php
Tests were performed on VirtualBox with 2 processors and 1024MB of RAM allocated to it.
The host machine is Intel i7 2600K with 16 GB of RAM.

In a regular get/set benchmark every client except Predis performs at an equal level. Memcached edges out PhpRedis and Memcache by ~1 r/s on average at ~83 r/s, while Predis trails the pack at ~12 r/s.

This test is pretty hardcore; if you look at the benchmarking framework, we are testing get/set with a pretty huge object.
Both Memcache and Predis fail to complete the test and begin to fail once concurrency goes up to 100.
PhpRedis and Memcached are pretty much even at 50 and 100 concurrent requests, but once we go up to 150 requests PhpRedis starts to trail Memcached by ~10.5 r/s, which indicates that it is prone to fail before Memcached gives in.

Pretty even performance for everybody except Predis, which is about 6x slower than the rest of the clients.

Again, every client except Predis performs at about the same level. Predis averages 11-12 r/s in every test, which suggests it hits a limit of its own before it even starts to hand off requests to the Redis daemon.

Predis failed to complete this test, while every other client passed it at 77 r/s on average, with PhpRedis leading the pack by a small margin.


Predis once again fails to go above 13 r/s, while PhpRedis beats it six ways to Sunday.
Both Redis and Memcached clients support IgBinary, so obviously I had to test them since I’m a huge IgBinary fan.
There are two ways you can use IgBinary: natively (let the client handle it) or directly in PHP (serialize the object with the IgBinary extension before passing it to the client).
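To make the two approaches concrete, here is a minimal sketch. The function names are made up for illustration; `Memcached::OPT_SERIALIZER`, `Memcached::SERIALIZER_IGBINARY`, `Memcached::HAVE_IGBINARY` and `igbinary_serialize()` come from the respective extensions. The fallback to PHP’s own `serialize()` is an assumption added so the snippet still runs on a machine without igbinary; it is not part of the benchmark.

```php
<?php
// Native: ask the Memcached client to serialize values with igbinary.
// Only meaningful when the memcached extension was built with igbinary support.
function useNativeIgbinary(Memcached $client): bool
{
    if (!Memcached::HAVE_IGBINARY) {
        return false;
    }

    return $client->setOption(Memcached::OPT_SERIALIZER, Memcached::SERIALIZER_IGBINARY);
}

// Direct: serialize in PHP and hand the client an opaque string.
// Falls back to serialize() when igbinary is not installed (assumption).
function packValue($value): string
{
    return function_exists('igbinary_serialize')
        ? igbinary_serialize($value)
        : serialize($value);
}

function unpackValue(string $payload)
{
    return function_exists('igbinary_unserialize')
        ? igbinary_unserialize($payload)
        : unserialize($payload);
}

// With the direct approach the client only ever sees a string:
// $client->set('user:1', packValue($user), 3600);
$payload = packValue(['id' => 1, 'name' => 'test']);
$restored = unpackValue($payload);
```

PhpRedis exposes a similar switch through `Redis::OPT_SERIALIZER` and `Redis::SERIALIZER_IGBINARY` when it is compiled with igbinary support.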
I tested both approaches, let’s start with native:

Memcached performs extremely well with IgBinary but sees a minor performance drop compared with the regular serializer as we reach 150 concurrent connections.
PhpRedis sees a good jump in performance as well but it starts to even out as we increase connections and unfortunately the client was unable to complete the final 150 concurrent connections test.
And check out results when we serialize objects directly in PHP using igbinary extension:

While PhpRedis still fails to complete the final test, both clients seem to process more requests when we handle serialization ourselves.
Let’s compare native IgBinary tests with the regular PHP serializer:

Keeping in mind that you can squeeze more juice out if you use IgBinary directly in your code, I have to say IgBinary is a winner even if it shows a minor drop as we reach 150 connections.
Memcached with IgBinary is a clear winner.
The benchmarks provided might not reflect real world (tm) performance, so take everything you read below with a grain of salt.
I do not recommend using Predis if you care about performance, period. It’s a massive bottleneck, and if you are not using features unique to Redis that the traditional RDBMS you are already running cannot give you, I would not even bother introducing Redis into your stack with Predis as the client.
The Memcached client is faster than PhpRedis and will keep your site up (even if it’s slow) for a bit longer before starting to fail.
The Memcache client is not a snail by any means; while it failed the large keys test at 150 concurrent connections, it still put up a really good fight and performed quite well.
If you do not need the features Memcached has to offer and are not scaling an application that caches large objects, you should be fine.
If you can, use IgBinary. It does make a big difference.
You can view spreadsheet of benchmark data here: https://docs.google.com/spreadsheet/pub?key=0AhePUdRMAppIdHIzd2d3YU9oVE55MnctaGc3NTVvcVE&single=true&gid=0&output=html
I received a few inquiries for a technical rundown of how my Memcached wrapper handles expiration of keys.
When I first read about this concept, it was pretty hard to understand since most of the sources were way too technical (the fact that English is my second language probably did not help either) for somebody who had just entered the world of ‘holy crap! you can store stuff in memory’.
I will try my best to explain the problem with just caching, and one of the ways you can avoid it, without using too much technical jargon.
Caching, to a less experienced developer, is viewed as a tool that solves all performance problems. When Memcache was first introduced, developers would simply wrap a chunk of code in a block that does the following:
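The pattern in question can be sketched roughly like this. It is only an illustration: a plain array stands in for the cache pool, and `loadReport()` is a made-up name for whatever slow code you are hiding behind the cache.

```php
<?php
// Array standing in for the cache pool; a real setup would talk to Memcache
$cache = [];
$databaseHits = 0;

// Hypothetical slow query we want to hide behind the cache
function loadReport(int &$databaseHits): array
{
    $databaseHits++;

    return ['visitors' => 1200];
}

function cachedReport(array &$cache, int &$databaseHits): array
{
    $key = 'report';

    if (!array_key_exists($key, $cache)) {
        // Miss: run the expensive code and remember the result
        $cache[$key] = loadReport($databaseHits);
    }

    return $cache[$key];
}

$first = cachedReport($cache, $databaseHits);   // goes to the "database"
$second = cachedReport($cache, $databaseHits);  // served from the cache
```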
And when same code block was executed again it would find the data associated with that key within the cache pool and give you the data… extremely fast.
To a developer that’s a wow factor by itself. I remember how impressed I was when I implemented something like this for the first time.
Such a reaction is enough to simply put your tools down and call it a job well done. You can go home and celebrate your achievement and dream about handling thousands of users at any given time because this caching thing is awesome.
While you are still thinking that you solved all of the scalability problems you will ever have, your website gets a tsunami of new visitors from a TV PR campaign your marketing team launched.
Everything looks great, you are smirking at how well the cache is working.
Then the site goes down.
You scramble to find a solution. It can’t be the caching you just implemented; it’s just too fast to fail.
Usually (from what I have seen) people will point fingers at the database or whatever complex/slow pieces of code you were hiding behind a cache.
(And while the database might be the slowest part of your application, that’s not the reason why your web site went down.)
Since the cache is sooo fast, and your logs are telling you that your database simply died as soon as it started to see a few requests, you naturally assume that’s the main issue and it should be addressed.
So in a moment of panic you scream for more database slaves, a better tune of your cluster and perhaps a crazy last second sharding implementation.
That works out great, you are back in business and things are looking up.
Just like a paradise you might discover after walking under a burning sun in a desert, the solution you put in place to prevent another downtime due to a massive amount of traffic was simply an illusion.
The web site will still go down under the same conditions (unless of course you invested in a small data center that hosts dozens of database clusters, then it’s debatable).
The reason for such mistakes is overlooking the fact that once you cache something that takes more than x seconds to execute, it will eventually expire from the cache, either because it has a short time to live (TTL) or because it gets pushed out of the caching pool to make space for fresher data.
Let’s say you have 35 queries on your web site that you put behind a caching layer. You request the page and it flies. Absolutely no issues.
Even when the cache expires and you grab the page, it loads pretty damn fast. You can add 10 more concurrent visitors and there will be no problems. The database picks it all up easily and puts it back in the cache.
Now multiply those 10 concurrent requests by, let’s say, 20. If all of the queries are cached, the cache pool can process 200 requests without any issues.
But once the data in the cache expires, all 200 requests to the cache pool suddenly return ‘resource not found’ and get sent directly to the database at once.
And then the alarms go off.
The chain reaction is usually something along these lines:
That sucks, right? If only there was a way to prevent this from happening.
One of the approaches to this problem is fairly simple. The concept is to wrap the original resource you are caching in an array that also contains a time stamp set just a few minutes short of the time when the item is actually set to expire.
When your application unpacks the cached resource it checks that time stamp, and if the current time is greater, the item it just retrieved is going to expire relatively soon.
Once it knows the item is about to expire, it updates the cached record with a new one that contains exactly the same data it just received but with a longer expiration time.
This essentially tells anybody else who is pulling the data that the item is not going to expire any time soon.
Then you simply lie to your application and tell it that this request did not get anything back from the cache. It will now execute the database query you were caching and save the result back into the cache with a new expiration date.
All of the new requests will now have an updated version of cached data.
While it’s possible that more than one request will slip through the flood gates, this is usually really rare. When I tested with 200 concurrent connections the key was updated by a single request.
Since you are caching everything, your database and Apache should be pumping out requests fairly fast, so you can probably afford more than a single person slipping through this barrier once in a while without creating an unrecoverable request queue of death.
Let’s get a little bit more technical and try to implement this solution in our code. First we need to come up with a time interval that we will subtract from the original expiration date.
This time interval depends on your caching strategy. As a basic rule of thumb, take the median expiration interval of the most important keys in the cache, the ones that represent data from really slow database queries.
If that number is, let’s say, an hour and you can guarantee that on average, during a relatively busy day, you get more than one request every 10 minutes, it’s safe to set the time interval to 10 minutes.
If you have cached data that needs to be updated at a faster rate, always make sure you are guaranteed at least one request between the fake expiration time and the original expiration time (i.e. your safety net).
So since we now have a number in mind, let’s write a simple wrapper for the Memcached extension so we can override the set() method and wrap the resource we are caching in an array containing our ‘fake’ expiration date:
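The arithmetic behind that rule of thumb can be written down as a small sketch. The function names and the requests-per-hour figures are hypothetical; the one hour TTL and 10 minute interval are the numbers from the example above.

```php
<?php
// Absolute time stamp at which a cached item starts being treated as stale
function fakeExpiration(int $now, int $ttl, int $delay): int
{
    return $now + $ttl - $delay;
}

// True when, on average, at least one request lands inside the safety net
function delayIsSafe(float $requestsPerHour, int $delay): bool
{
    return $requestsPerHour * ($delay / 3600) >= 1.0;
}

$now = 1367283772;                        // arbitrary Unix time stamp
$fake = fakeExpiration($now, 3600, 600);  // one hour TTL, 10 minute safety net
$real = $now + 3600;

$busyEnough = delayIsSafe(12, 600);  // 12 requests per hour: 2 per window
$tooQuiet = delayIsSafe(3, 600);     // 3 requests per hour: 0.5 per window
```

If the traffic is too low for the window, either widen the interval or accept that the key will occasionally expire for real.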
use \Memcached;

class Wrapper
{
    /**
     * Time to subtract from original expiration date
     *
     * @var int
     */
    const DELAY = 600;

    /**
     * Instance of Memcached
     *
     * @var Memcached
     */
    protected $memcached;

    /**
     * Indicates that current look-up will expire shortly (dog-pile)
     *
     * @var bool
     */
    protected $isResourceExpired = false;

    /**
     * Class constructor
     */
    public function __construct()
    {
        // Remember to register your cache servers with addServer()
        $this->memcached = new Memcached();
    }

    /**
     * Add a new cached record using passed resource and key association
     *
     * @param string $key key to store passed resource under
     * @param mixed $resource resource you want to cache
     * @param int $ttl when should this key expire in seconds
     * @return bool
     */
    public function set($key, $resource, $ttl)
    {
        return (bool)$this->memcached->set($key, $this->wrap($resource, $ttl), $ttl);
    }

    /**
     * Wrap new cached resource into an array containing TTL stamp
     *
     * @param mixed $resource resource that is getting cached
     * @param int $ttl original expiration in seconds
     * @return mixed[] returns packed resource with TTL stamp to store in cache
     */
    protected function wrap($resource, $ttl)
    {
        // Set meta expiration date 10 minutes before the actual date. It is
        // stored as an absolute time stamp so it can be compared against
        // time() later; a TTL that is not longer than the delay gets no
        // meta expiration at all (zero means never).
        $expires = ($ttl > self::DELAY) ? time() + $ttl - self::DELAY : 0;

        return array(
            'ttl' => $expires,
            'resource' => $resource
        );
    }
}
As you can see, we simply intercept the set() method of the original extension so we can call the wrap() method on the resource we are caching. In return, that method takes the original expiration time we are attempting to set and subtracts 10 minutes from it prior to adding it to our array.
Now we need to intercept the get() method so we can unwrap the previously wrapped data and check the fake expiration date we set, in order to determine if we should pretend the result is no longer cached.
To do so, let’s add the following methods to our wrapper:
    /**
     * Override get method so we can unwrap the cached resource and check
     * the additional metadata stored with it
     *
     * @param string $key key that you are retrieving
     * @param mixed $resource where to store retrieved resource
     * @return bool
     */
    public function get($key, &$resource)
    {
        // Reset the flag left over from any previous look-up
        $this->isResourceExpired = false;

        // Attempt to retrieve record within cache pool
        $response = $this->memcached->get($key);
        if ($this->memcached->getResultCode() == Memcached::RES_SUCCESS) {
            // Pass record to unwrap method
            $resource = $this->unwrap($key, $response);

            // If key is marked as expired (needs to be updated within this request)
            // we will not return true, but instead fake a failure
            if (!$this->isResourceExpired) {
                return true;
            }
        }

        return false;
    }

    /**
     * Get requested data back into memory while setting a delayed cache entry
     * if data is expiring soon
     *
     * @param string $key key that you are retrieving
     * @param mixed[] $data packed data that we got back from cache pool
     * @return mixed|bool returns cached resource or false if invalid data was
     * passed for unwrapping
     */
    protected function unwrap($key, $data)
    {
        // Anything that was not stored through wrap() is treated as a miss
        if (!is_array($data) || !isset($data['ttl'])) {
            $this->isResourceExpired = true;
            return false;
        }

        // If expiration date is not set to never
        if ($data['ttl'] > 0) {
            // If current time is equal or greater than a fake expiration time
            if (time() >= $data['ttl']) {
                // Set the stale value back into cache for a short 'delay' of 10 minutes
                // so no one else tries to write the same data.
                //
                // Our set() method already calls wrap(), so we pass the raw resource
                if ($this->set($key, $data['resource'], self::DELAY)) {
                    // Flag this look-up as a miss so the caller refreshes the data
                    $this->isResourceExpired = true;
                }
            }
        }

        return $data['resource'];
    }
You now have pretty solid protection against a random flood of requests bypassing your caching layer at the same time.
As part of my Memcached wrapper, I included a simple proof of concept script that you can use to test this scenario yourself.
When I benchmarked the script in question with 200 concurrent requests the results did all the talking:
Using the technique we implemented:
[30-Apr-2013 01:02:52 UTC] Wrapper database hit: 2013-04-30 03:02:52
[30-Apr-2013 01:07:52 UTC] Wrapper database hit: 2013-04-30 03:07:52
[30-Apr-2013 01:12:51 UTC] Wrapper database hit: 2013-04-30 03:12:51
As you can see only a single request out of 200 got through to query the database and update cache.
Using raw get/set methods:
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:09 UTC] Memcached database hit: 2013-04-30 03:18:09
[30-Apr-2013 01:18:10 UTC] Memcached database hit: 2013-04-30 03:18:10
[30-Apr-2013 01:18:10 UTC] Memcached database hit: 2013-04-30 03:18:10
[30-Apr-2013 01:18:10 UTC] Memcached database hit: 2013-04-30 03:18:10
[30-Apr-2013 01:18:10 UTC] Memcached database hit: 2013-04-30 03:18:10
... and so on, until Apache could no longer keep up with requests.
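The difference between the two logs can be reproduced without a cache server. The following is a simplified simulation, not the proof of concept script mentioned above: `FakePool`, the simulated clock and the lookup functions are all made up for illustration, and every one of the 200 lookups is assumed to happen before any regeneration finishes.

```php
<?php
const DELAY = 600;

// In-memory stand-in for a cache pool with a simulated clock
final class FakePool
{
    public $now = 0;
    private $items = [];

    public function set($key, $value, $ttl)
    {
        $this->items[$key] = ['value' => $value, 'dies' => $this->now + $ttl];
    }

    // Returns null on a miss (key never set or hard expiry reached)
    public function get($key)
    {
        if (!isset($this->items[$key]) || $this->now >= $this->items[$key]['dies']) {
            return null;
        }

        return $this->items[$key]['value'];
    }
}

// Raw get: a hit until the hard expiry, then everybody misses
function rawLookup(FakePool $pool, $key)
{
    return $pool->get($key) !== null;
}

// Wrapped get: the first reader past the fake expiry re-arms the entry
// for DELAY seconds and reports a miss, everybody else gets the stale copy
function wrappedLookup(FakePool $pool, $key)
{
    $entry = $pool->get($key);
    if ($entry === null) {
        return false;
    }

    if ($entry['ttl'] > 0 && $pool->now >= $entry['ttl']) {
        $pool->set($key, ['ttl' => 0, 'resource' => $entry['resource']], DELAY);
        return false;
    }

    return true;
}

function countMisses(callable $lookup, FakePool $pool, $key, $requests)
{
    $misses = 0;
    for ($i = 0; $i < $requests; $i++) {
        if (!$lookup($pool, $key)) {
            $misses++;
        }
    }

    return $misses;
}

$raw = new FakePool();
$raw->set('page', 'html', 3600);
$raw->now = 3600; // hard expiry reached
$rawMisses = countMisses('rawLookup', $raw, 'page', 200);

$wrapped = new FakePool();
$wrapped->set('page', ['ttl' => 3600 - DELAY, 'resource' => 'html'], 3600);
$wrapped->now = 3000; // fake expiry reached, real one is 10 minutes away
$wrappedMisses = countMisses('wrappedLookup', $wrapped, 'page', 200);

echo "raw: $rawMisses, wrapped: $wrappedMisses\n"; // raw: 200, wrapped: 1
```

Real traffic is concurrent rather than sequential, so a couple of requests may still slip through between the read and the re-arm, as mentioned earlier.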
This package provides a way to store your application files in Memcached pool with automatic parsing using eval() upon retrieval.
I wrote this after trolling a friend of mine about how storing files in Memcached is faster than including them from disk, so while there are definite uses for this under the right circumstances, make sure you understand what you are doing.
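The core idea, stripped down to a sketch, looks like this. It is not the package’s actual API: a plain array stands in for the Memcached pool and the function names are made up. Keep in mind that eval() runs whatever is stored in the pool, so only trusted code should ever end up there.

```php
<?php
// Array standing in for the Memcached pool: file path => PHP source
$pool = [];

function storeSource(array &$pool, string $path, string $source): void
{
    $pool[$path] = $source;
}

function includeFromPool(array $pool, string $path)
{
    if (!isset($pool[$path])) {
        throw new RuntimeException("Not in pool: $path");
    }

    // Stored files begin with an opening tag, so we leave PHP mode first
    // and let the stored source switch it back on
    return eval('?>' . $pool[$path]);
}

storeSource($pool, 'library/greeting.php', '<?php function greeting() { return "hello"; }');
includeFromPool($pool, 'library/greeting.php');

echo greeting(), "\n"; // hello
```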
First let’s look at benchmarks running in VirtualBox on MacBook Pro:
Loaded 500 files using regular include() in 0.020308017730713
Loaded 500 files using MemFS in 0.15591597557068
Loaded 500 files using MemFS (10 files at the time) 0.018874883651733
Loaded 500 files using MemFS (250 files at the time) 0.015585899353027
Loaded 500 files using MemFS (500 files at the time) 0.016402959823608
Keep in mind that the laptop in question has SSD drive.
As you can see, when compared directly one for one you might get performance degradation, but as you scale up and request 10+ files at a time you will see a slight speed increase of about 0.00143313407 seconds.
Not much, but this benchmark is not scientific and was performed on an empty machine with an SSD drive and no I/O load. Your setup might be different, and if you multiply 0.0014 by the number of requests your application is processing, you might see a benefit of 14 seconds saved per 10000 requests.
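For reference, here is the arithmetic behind those two numbers, using the timings from the benchmark output above. It assumes every request loads the same 500 files as the benchmark run.

```php
<?php
$include = 0.020308017730713; // 500 files via regular include()
$batched = 0.018874883651733; // 500 files via MemFS, 10 files at a time

// Time saved on a single run of 500 files
$savedPerRun = $include - $batched;

// Rounded to 0.0014 and scaled up to 10000 requests
$savedPer10k = round($savedPerRun, 4) * 10000;

printf("saved per run: %.5f seconds\n", $savedPerRun);
printf("saved per 10000 requests: %.0f seconds\n", $savedPer10k);
```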
- Servers that host PHP based applications in a saturated or slow I/O environment.
- Distributed applications that constantly update their logic, can be updated by updating memory pool with new code that they will include and evaluate.
- Applications that need to load and parse more than 1 file in a consecutive order.
For optimal results, make sure the cache pool is hosted on a separate server that shares the LAN with your web servers.
Depending on your load the link between the servers must be greater than 100Mbps.
Make sure to use igBinary with the latest version of the daemon.
If you have your own autoloader, simply update the namespaces and drop the files into your framework’s library.
For people that do not have that setup, you can visit http://getcomposer.org to install Composer on your system. After installation simply run `composer install` in the parent directory of this distribution to generate a vendor/ directory with a cross system autoloader.
You can benchmark and gauge how much benefit this optimization might bring you by running:
php benchmark/run.php
Make sure to run it twice before reading the results so the application can cache the files.
You can grab the source code from my GitHub repository:
https://github.com/AlekseyKorzun/memfs-php
I recently got to play around with Magento (an eCommerce platform based on Zend Framework) and had to figure out the default billing and shipping information for all of our current clients.
Apparently the developer community built around Magento is pretty amateurish, judging by the technical discussions found on their forums.
When I looked up how to retrieve default address information, the answers were all pure PHP code that used Magento models to retrieve the data.
That’s bloated and frankly plain dumb. Not to mention that some of us do not use PHP for system specific tasks.
In my case I’m using Python, so I needed a SQL query that I could run against our Magento database to retrieve the required data.
Magento stores data with dozens of relations (very confusing if you are just starting out) and some of the relations do not make much sense. They split out entity properties by type (datetime, int, text, etc) and associate each type separately with the main entity via id.
Not to mention that address data is considered to be its own entity and has its indexing outside the customer space.
First we have to grab default billing and shipping ids associated to our user accounts:
SELECT
    `ce`.`email`,
    `default_billing_id`.`value` AS `default_billing`,
    `default_shipping_id`.`value` AS `default_shipping`
FROM
    `customer_entity` AS `ce`
    LEFT JOIN `customer_entity_int` AS `default_billing_id` ON
        (`default_billing_id`.`entity_id` = `ce`.`entity_id`) AND
        (`default_billing_id`.`attribute_id` = '14')
    LEFT JOIN `customer_entity_int` AS `default_shipping_id` ON
        (`default_shipping_id`.`entity_id` = `ce`.`entity_id`) AND
        (`default_shipping_id`.`attribute_id` = '13')
WHERE
    (`ce`.`entity_type_id` = 1) AND
    (`ce`.`is_active` = 1)
As you can see, attribute id ‘13’ represents the default shipping identifier and attribute id ‘14’ represents the default billing identifier.
Now we need to join across the address entity to retrieve the actual address information linked to the user’s default addresses. In this example I will retrieve the default zip code for both the billing and the shipping address:
SELECT
    `ce`.`email`,
    `default_billing_id`.`value` AS `default_billing`,
    `default_shipping_id`.`value` AS `default_shipping`,
    `default_billing_zipcode`.`value` AS `default_billing_zipcode`,
    `default_shipping_zipcode`.`value` AS `default_shipping_zipcode`
FROM
    `customer_entity` AS `ce`
    LEFT JOIN `customer_entity_int` AS `default_billing_id` ON
        (`default_billing_id`.`entity_id` = `ce`.`entity_id`) AND
        (`default_billing_id`.`attribute_id` = '14')
    LEFT JOIN `customer_entity_int` AS `default_shipping_id` ON
        (`default_shipping_id`.`entity_id` = `ce`.`entity_id`) AND
        (`default_shipping_id`.`attribute_id` = '13')
    LEFT JOIN `customer_address_entity_varchar` AS `default_shipping_zipcode` ON
        (`default_shipping_id`.`value` = `default_shipping_zipcode`.`entity_id`) AND
        (`default_shipping_zipcode`.`attribute_id` = '29')
    LEFT JOIN `customer_address_entity_varchar` AS `default_billing_zipcode` ON
        (`default_billing_id`.`value` = `default_billing_zipcode`.`entity_id`) AND
        (`default_billing_zipcode`.`attribute_id` = '29')
WHERE
    (`ce`.`entity_type_id` = 1) AND
    (`ce`.`is_active` = 1)
Note that attribute ids might vary in your installation. You might consider modifying this query to use attribute labels for a more portable approach.
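One possible way to do that is to look the id up by code instead of hard coding it. The sketch below only builds the SQL text. It assumes the usual Magento EAV tables `eav_attribute` and `eav_entity_type` and the attribute codes `default_billing` and `postcode`; verify those names against your own installation before relying on it. The helper name is made up.

```php
<?php
// Builds a subquery that resolves an attribute id from its code.
// Table and column names are assumptions about the Magento EAV schema.
function attributeIdLookup(string $entityTypeCode, string $attributeCode): string
{
    // Codes are interpolated into SQL, so only accept simple identifiers
    foreach ([$entityTypeCode, $attributeCode] as $code) {
        if (!preg_match('/^[a-z][a-z0-9_]*$/', $code)) {
            throw new InvalidArgumentException("Unexpected code: $code");
        }
    }

    return "(SELECT `ea`.`attribute_id` FROM `eav_attribute` AS `ea`"
        . " INNER JOIN `eav_entity_type` AS `et`"
        . " ON `et`.`entity_type_id` = `ea`.`entity_type_id`"
        . " WHERE `et`.`entity_type_code` = '$entityTypeCode'"
        . " AND `ea`.`attribute_code` = '$attributeCode')";
}

// Replaces the hard coded '14' in the billing join
$billingJoin = "LEFT JOIN `customer_entity_int` AS `default_billing_id` ON"
    . " (`default_billing_id`.`entity_id` = `ce`.`entity_id`) AND"
    . " (`default_billing_id`.`attribute_id` = "
    . attributeIdLookup('customer', 'default_billing') . ")";

// Replaces the hard coded '29' in the zip code joins
$zipcodeLookup = attributeIdLookup('customer_address', 'postcode');
```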
Feel free to reach out to me if you have any questions.
If you interact with any of Amazon’s Web Services using PHP you should be using their SDK.
It’s very powerful and pretty simple to get started with.
In my case I did not want to check out their whole repository into my framework’s namespace and decided to use the .phar file they provide as an alternative.
To me, using a .phar file for third party libraries kind of makes sense if I’m not planning to extend or overwrite any of the available functionality.
Plus if you have your own directory conventions and don’t want to bother re-factoring Amazon’s structure, .phar might come in handy.
The issue I ran into (not really an issue, but inconsistency always bothers me as an engineer) is that if you are using namespaces and your own autoloader, requiring the .phar file everywhere you need to access AWS is pretty ghetto.
The way I solved it is by using a very simple wrapper that either loads a configuration (using \library\configuration as an example) containing your key/secret or takes the key/secret as constructor parameters, and simply stores an instance of AWS internally while routing all the calls directly to it.
You can check out this approach on my GitHub page located here: https://github.com/AlekseyKorzun/aws-sdk-wrapper
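The general shape of such a wrapper is sketched below. This is not the code from the repository: `FakeSdk` is a made-up stand-in for the object that the .phar provides, and the configuration is a plain array rather than \library\configuration.

```php
<?php
// Stand-in for the SDK object loaded from the .phar (hypothetical)
final class FakeSdk
{
    private $key;
    private $secret;

    public function __construct(string $key, string $secret)
    {
        $this->key = $key;
        $this->secret = $secret;
    }

    public function describe(string $service): string
    {
        return "$service via {$this->key}";
    }
}

final class SdkWrapper
{
    private $instance;

    public function __construct(?string $key = null, ?string $secret = null, array $configuration = [])
    {
        // Explicit credentials win, otherwise fall back to configuration
        $key = $key ?? ($configuration['key'] ?? null);
        $secret = $secret ?? ($configuration['secret'] ?? null);

        if ($key === null || $secret === null) {
            throw new InvalidArgumentException('Missing key/secret');
        }

        // In a real wrapper this is where the .phar would be required once
        $this->instance = new FakeSdk($key, $secret);
    }

    // Route every unknown method call straight to the stored instance
    public function __call(string $method, array $arguments)
    {
        return $this->instance->$method(...$arguments);
    }
}

$direct = new SdkWrapper('my-key', 'my-secret');
$configured = new SdkWrapper(null, null, ['key' => 'config-key', 'secret' => 'config-secret']);

$fromDirect = $direct->describe('s3');
$fromConfigured = $configured->describe('sqs');
```

The rest of the application only ever sees the wrapper class, which your own autoloader can resolve like any other namespaced class.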
Just a quick announcement that I updated my PHP 5 client for Cloud Servers(tm) API.
Some of the major fixes include retry limiting for authentication requests, support for updated response codes (especially for server/image requests) and better examples.
The code was also updated to conform to PSR-2 and phpDocumentor 2.
Grab the latest version here: http://alekseykorzun.github.io/rackspace-open-cloud-php/