Magento Data Segmentator

Read my blog post on how we handle syncing of production data to other environments at BaubleBar.

baublebarengineering:

In a fast-moving engineering culture, the ability to quickly push production data to different environments is critical for productivity.

A slow sync quickly becomes more than just a giant roadblock.

If the process is slow, engineers will tend to avoid syncing their personal development instances and work…

Properly purging stale cart data from Magento CE and EE

If you retain cart data indefinitely, you will eventually run into table-locking issues as your customers attempt to interact with the cart or checkout within your Magento store.

Locking will result in poor store performance, followed by complete infrastructure failure as your HTTP server of choice becomes saturated.

Saturation occurs when dozens of requests are sent over to your database server, effectively locking tables.

The database will queue up requests until the table unlocks. Since the HTTP server is now behind schedule, new child processes will be spawned until you run out of resources to effectively serve your clients.

In this example we will purge any data that has not been updated in the past 30 days; the number of days you wish to retain will vary with the amount of traffic and resources you have at your disposal.

Magento stores all of the quote data within the `sales_flat_quote` table, and if you are running a release that has built-in triggers, you will only have to purge data from that table, which in turn will trigger deletion of the stale associated data.

First, put your website into maintenance mode (backing up your data is always a good idea), then log in to your Magento database and verify the number of records you have in the `sales_flat_quote_item` table by running a query along these lines:
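
```sql
SELECT COUNT(*) FROM sales_flat_quote_item;
```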

Write down the number the query returned. Now purge records that have not been updated in over 30 days by running a query along the following lines:
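
```sql
DELETE FROM sales_flat_quote
WHERE updated_at < DATE_SUB(NOW(), INTERVAL 30 DAY);
```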

If you are dealing with tons of stale data, this query might take up to 30 minutes.

Once the query completes and the number of affected rows returned is greater than 0, re-run the count query to check whether the count has been reduced.

The difference should reflect the number of records that were removed during the delete operation:
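
```sql
SELECT COUNT(*) FROM sales_flat_quote_item;
-- the count should now be lower than the number you wrote down
```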

If that’s the case, then you are using a version of Magento that automatically purges related metadata.

If the number remained the same, you will have to purge the associations by hand. Queries along these lines will do it against the default CE table names (adjust them if your installation uses a table prefix):
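
```sql
DELETE FROM sales_flat_quote_item
WHERE quote_id NOT IN (SELECT entity_id FROM sales_flat_quote);

DELETE FROM sales_flat_quote_address
WHERE quote_id NOT IN (SELECT entity_id FROM sales_flat_quote);

DELETE FROM sales_flat_quote_payment
WHERE quote_id NOT IN (SELECT entity_id FROM sales_flat_quote);

DELETE FROM sales_flat_quote_shipping_rate
WHERE address_id NOT IN (SELECT address_id FROM sales_flat_quote_address);
```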

Enterprise Edition requires additional queries for completion. In a default EE schema they look like this (verify the exact table names against your own installation before running them):
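
```sql
DELETE FROM enterprise_customer_sales_flat_quote
WHERE entity_id NOT IN (SELECT entity_id FROM sales_flat_quote);

DELETE FROM enterprise_customer_sales_flat_quote_address
WHERE entity_id NOT IN (SELECT address_id FROM sales_flat_quote_address);
```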

And you are done!

Finish up by running `mysqlcheck -o` against your Magento database and bringing your store back from maintenance mode.

Maintenr for Android — Available Now.

maintenr:


Today, we’re excited to bring you Maintenr for Android.

We’ve already seen more than 400 beta users join Maintenr to take control of their maintenance and fuel tracking, and now we’re thrilled to offer a way for Android users to use our platform while on the go.


Instafeed

The Instafeed package contains an extremely simple API adapter for Instagram that you can easily extend.

Support for the tags endpoint (http://instagram.com/developer/endpoints/tags/) is available out of the box as an example.

From the packages I saw, you would either get a huge beast of an API client that tries to do everything for you, or a half-assed bare-bones one that is poorly made and could not be extended without hacking tons of things together.

I also did not like the fact that developers thought it was okay to simply drop a random ‘callback.php’ file into their package along with an ‘index.php’ that forwards you to an authorization URI.

So, another thing this package includes is token exchange and a demo application that utilizes the tags resource to generate data.

It’s built on Silex (a micro-framework) and uses Bootstrap via the Twig engine. Super easy to deploy and play around with.

You can view demo here: http://instafeed.webfoundation.net

If you are a fan of having a huge screen with metrics at your office, you can easily tweak and integrate this to show data for tags that you want to monitor ;)

Installation

Visit http://getcomposer.org to install composer on your system.

After installation, simply run `composer install` in the parent directory of this distribution to generate the vendor/ directory that contains a cross-system autoloader and the required libraries.

You should be able to use the adapter by loading \Instafeed\Tag.
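
For example, something along these lines should work (the constructor argument and method name here are illustrative guesses, not the package’s documented API; check the source of \Instafeed\Tag for the real signatures):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

// Hypothetical usage sketch; verify the method names against the package.
$tag = new \Instafeed\Tag('your-access-token');
$media = $tag->recent('coffee'); // e.g. a wrapper around /tags/{tag}/media/recent
var_dump($media);
```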

Deploying Demo

If you want to deploy the demo site and play around with it, I recommend setting up a sub-domain that points to the www/ directory of this package, since the Twig templates use relative paths.

Register your application at http://instagram.com/developer (setting up proper callback/site URIs) and open up the index.php file to set up your client identification/secret key and redirect URI.
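
The values you are filling in look something like this (the variable names are illustrative placeholders; use whatever index.php actually defines):

```php
<?php
// Illustrative placeholders only; match them to the names in index.php.
$clientId = 'your-client-id';
$clientSecret = 'your-client-secret';
$redirectUri = 'http://instafeed.example.com/callback';
```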

I’m using .htaccess, so if you are not using Apache I’m sure you can figure out how to forward requests to index.php.

PHP Judy Array Introduction and Comparison

Judy (as explained on http://judy.sourceforge.net/) is a C library that provides a state-of-the-art core technology implementing a sparse dynamic array.

Judy arrays are declared simply with a null pointer.

A Judy array consumes memory only when it is populated, yet can grow to take advantage of all available memory if desired.

Not a lot of PHP developers are aware of this library, which is available as a C extension (PECL) for PHP: http://php.net/manual/en/intro.judy.php

I want to give you a quick rundown of the pros and cons of using Judy arrays in your application, and a brief benchmark comparison against the more common array implementations available in PHP.

Array Implementations Tested

  • Array()
  • ArrayObject()
  • SplFixedArray()
  • Judy()
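
For reference, here is a minimal sketch of the four implementations side by side (Judy() comes from the PECL extension via `pecl install judy`; the others ship with PHP):

```php
<?php
// The four array implementations being compared. SplFixedArray needs
// its size declared up front; Judy grows as needed like a plain array.
$plain  = array();
$object = new ArrayObject();
$fixed  = new SplFixedArray(10000);
$judy   = new Judy(Judy::INT_TO_MIXED); // integer keys, mixed values

for ($i = 0; $i < 10000; $i++) {
    $plain[$i]  = $i;
    $object[$i] = $i;
    $fixed[$i]  = $i;
    $judy[$i]   = $i;
}
```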

What I Analyzed

  • Execution time and the amount of memory it takes to create 100 instances of each implementation.
  • Execution time, peak memory, and memory allocation during insertion, iteration, and removal of 10000 items within each implementation.

Benchmarking Framework

As with my testing of common NoSQL databases, I wrote a simple benchmarking framework that you can use to run your own tests that mimic your application. 

You can find it here: https://github.com/AlekseyKorzun/benchmarker-array-php

Creation of 100 Instances

As you can see, when you create 100 instances, performance is pretty much identical in both execution time and memory utilization.

100 is a pretty generous number (in my opinion) unless you are dealing with a collection-driven application.

If that’s the case, as the number of instances goes up, the Array() implementation will perform slightly better.

Appending 10000 Items

Speed-wise, Array() walks away from every other implementation, which is not surprising. Its peak memory usage is slightly above the rest, and its memory consumption is almost identical to ArrayObject().

ArrayObject(), SplFixedArray() and Judy() all finish execution at the 0.06s mark.

But when it comes to memory, Judy() is the winner in this benchmark, leading SplFixedArray() by ~80K on peak usage and about ~78K on utilization.

Removing 10000 Items

During removal of items we see results identical to our append test. The only difference is that SplFixedArray() seems to be 0.1s faster than ArrayObject() and Judy().

Iteration Over 10000 Items

The iteration test produced the same results as the removal test. Execution-wise, Array() won hands down, and the memory-utilization trophy goes to Judy().

Conclusion

As you can see, on a smaller scale (when dealing with 10000 items) the difference between the approaches is not that great.

But it’s very clear that Array() is fast and to the point when it comes to storing data, and Judy() uses less memory than the memory-conscious SplFixedArray().

Based on my benchmarks, I can point out the following:

  • As your data grows, Judy() arrays store it more efficiently. No question about that.
  • Iteration and data manipulation will always be faster in the Array() implementation.
  • SplFixedArray() will iterate over and manipulate large data sets faster than Judy().
  • Compared to SplFixedArray(), you don’t have to set an initial array size with Judy(), which might be a plus for some developers.
  • When storing data, unless you are using features that ArrayObject() has to offer, it’s better to stick to a simple Array() implementation.
  • If you are not dealing with tons of data, you may still optimize as you wish, but the results will be minimal unless you are serving tons of requests per second and really need to squeeze out every bit of application performance.

As always, I recommend forking my benchmarking application and extending it to use your own data to get an idea of how much performance you might gain from switching over.

You can also switch one of your web servers to utilize Judy() instead of X() and observe the response to the change via resource monitoring over the next few days.

I hope you enjoyed reading this post!

P.S.

You can view a spreadsheet of the benchmark data here: https://docs.google.com/spreadsheet/pub?key=0AhePUdRMAppIdC0tZWNELTRsbElpZF81V28wTnpWaEE&output=html

Benchmarking Memcached and Redis clients

As some of you may know, I’m crazy about speed. So when I saw that people were happily using Predis as their PHP client of choice for Redis, I was a bit confused.

Why use a client written in PHP for something that should be ‘fast’ like Redis?

That kind of defeats the purpose - unless you don’t really care about response times and scalability. 

I could understand using it if there were no alternatives such as PhpRedis, or if you wanted to add some sort of proprietary layer that you cannot add on top of a C extension.

Don’t get me wrong, if you have a valid reason to use it, then more power to you. I know both packages have contributors who have put tons of sweat into getting them to where they are now.

What I Analyzed

  • The performance difference piqued my interest. I wanted to find out just how much performance users are sacrificing by choosing one implementation over another.
  • Since Redis is usually found in the same stack as Memcached (which I will touch upon later in this post), I included benchmarks for Memcached that demonstrate how it stacks up against Redis while performing identical operations.
  • Also, for the fans of IgBinary, I covered both native and client-level implementations of it.

Benchmarking Framework

To automate and define a common benchmarking strategy for Memcache, Memcached, Predis and PhpRedis, I decided to write a small framework that automatically runs the set of tests a client requests.

You can find it here: https://github.com/AlekseyKorzun/benchmarker-nosql-php

Hardware Set-Up

Tests were performed in VirtualBox with 2 processors and 1024MB of RAM allocated to the virtual machine.

The host machine is an Intel i7 2600K with 16 GB of RAM.

Software Set-Up

  • Ubuntu 12.04.2
  • PHP 5.3.10-1ubuntu3.6
  • Apache 2.2.22 
  • Redis server 2.2.12 
  • Memcached 1.4.13
  • Libmemcached 1.0.16
  • Memcache client 3.0.6
  • Memcached client 2.1.0
  • Redis client 2.2.2
  • Predis client 0.8
  • IgBinary extension 1.1.2-dev

Common Benchmarks

In a regular get/set benchmark, every client except Predis performs on an equal level. Memcached edges out PhpRedis and Memcache by ~1 r/s on average at ~83 r/s, while Predis trails the pack at around ~12 r/s.

This test is pretty hardcore; if you look at the benchmarking framework, we are testing get/set with a pretty huge object.
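
The shape of the test is roughly this (a simplified sketch, not the framework’s actual code; the payload size and iteration count here are illustrative):

```php
<?php
// Write and read back a large value repeatedly, then report requests
// per second. Payload size and iteration count are made-up examples.
$payload = str_repeat('x', 512 * 1024);

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $memcached->set('bench:' . $i, $payload);
    $memcached->get('bench:' . $i);
}
printf("%.2f r/s\n", 1000 / (microtime(true) - $start));
```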

Both Memcache and Predis are unable to complete the test, and begin to fail once concurrency goes up to 100.

Redis and Memcached are pretty much even at 50 and 100 concurrent requests, but once we go up to 150 requests, Redis starts to trail Memcached by ~10.5 r/s, which indicates that it is prone to failing before Memcached gives in.

Pretty even performance for everybody except Predis, which is about 6x slower than the rest of the clients.

Again, every client except Predis performs at about the same level. Predis seems to average out at 11-12 r/s in every test; this appears to be its ceiling before it even starts to hand off requests to the Redis daemon.

Predis failed to complete this test, while every other client passed it at 77 r/s on average, with PhpRedis leading the pack by a small margin.

List & Set Benchmarks

Predis, once again, fails to go above 13 r/s while PhpRedis destroys it six ways to Sunday.

IgBinary Benchmarks

Both Redis and Memcached clients support IgBinary, so obviously I had to test them since I’m a huge IgBinary fan.

There are two ways you can use IgBinary: natively (as in, let the client handle it) or directly in PHP (serialize the object prior to passing it to the client, using the IgBinary extension).
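
In code, the two approaches look something like this (a sketch; it assumes both the Memcached and PhpRedis extensions were compiled with IgBinary support):

```php
<?php
$object = new stdClass();
$object->name = 'example';

// Native: let the client serialize with igbinary internally.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);
$memcached->setOption(Memcached::OPT_SERIALIZER, Memcached::SERIALIZER_IGBINARY);
$memcached->set('user:1', $object);

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->setOption(Redis::OPT_SERIALIZER, Redis::SERIALIZER_IGBINARY);
$redis->set('user:1', $object);

// Direct: serialize in PHP yourself and store the resulting string.
$memcached->set('user:2', igbinary_serialize($object));
$restored = igbinary_unserialize($memcached->get('user:2'));
```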

I tested both approaches; let’s start with native:

Memcached performs extremely well with IgBinary, but sees a minor performance drop versus the regular serializer as we reach 150 concurrent connections.

PhpRedis sees a good jump in performance as well, but it starts to even out as we increase connections, and unfortunately the client was unable to complete the final 150-concurrent-connections test.

And check out the results when we serialize objects directly in PHP using the IgBinary extension:

While PhpRedis still fails to complete the final test, both clients seem to process more requests when we handle serialization ourselves.

Let’s compare the native IgBinary tests with the regular PHP serializer:

Keeping in mind that you can squeeze even more juice out of IgBinary by using it directly in your code, I will have to say IgBinary is the winner, even if it shows a minor drop as we reach 150 connections.

Memcached with IgBinary is a clear winner.

Conclusion

The provided benchmarks might not reflect real-world (tm) performance, so take everything you read below with a grain of salt.

I do not recommend using Predis if you care about performance, period. It’s a massive bottleneck; if you are not using features unique to Redis over the traditional RDBMS you are already running, I would not even bother introducing Redis into your stack if you are going to use Predis as the client.

The Memcached client is faster than PhpRedis and will keep your site up (even if it’s slow) for a bit longer before starting to fail.

The Memcache client is not a snail by any means; while it failed the large-keys test at 150 concurrent connections, it still put up a really good fight and performed quite well.

If you do not need the features Memcached has to offer and are not scaling an application that caches large objects, you should be fine.

If you can, use IgBinary. It does make a big difference.

Words of Wisdom

  • If your web site can only serve X requests per second under load when you do not factor in data calls, having a NoSQL client that can do XYZ requests per second will not magically solve any of your problems.
  • Not everybody can afford to maintain both Redis and Memcached servers in their stack. If you need the data retention and/or features Redis has to offer over Memcached, and can live with the performance level Redis offers, then you do not need Memcached.

    If you care about performance, I advise storing ‘unimportant’ or ‘larger’ data in Memcached: for example, template views/layouts or collections of models.
  • Always test ‘theories’ and see if they apply to your case. Evaluate your options and strategy prior to committing to anything.

You can view a spreadsheet of the benchmark data here: https://docs.google.com/spreadsheet/pub?key=0AhePUdRMAppIdHIzd2d3YU9oVE55MnctaGc3NTVvcVE&single=true&gid=0&output=html