Scalability and frameworks - Part 2 
Well, in case you were wondering, I managed to get around my scalability problem by focussing on PHP's strengths and side-stepping its weaknesses. In this particular case I was facing two primary bottlenecks - YAML parsing and Propel memory leaks.

I got around the YAML issue with some custom string hacking - PHP's string functions are pretty quick. By using split() and strpos() I could slice up my big YAML file into more digestible pieces (see previous post).
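As a rough illustration of that kind of string slicing, here is a minimal sketch. It is not the code from the original project: the function name sliceYamlList() and the file layout are assumptions, namely that the file is one flat YAML list whose top-level entries each start with "- " at the beginning of a line. It uses strpos() and substr(), and is written for current PHP versions (split() has since been removed from the language; explode() is the usual replacement).

```php
<?php
// Hypothetical helper, not the original code from the post.
// Assumes every top-level entry starts with "- " at column 0, so nested
// lines (indented with spaces) never match the "\n- " boundary marker.
function sliceYamlList(string $yaml, int $itemsPerSlice): array
{
    if ($itemsPerSlice < 1) {
        throw new InvalidArgumentException('Slice size must be at least 1.');
    }

    $slices = [];
    $start  = 0; // offset where the current slice begins
    $count  = 0; // complete entries seen in the current slice
    $offset = 0; // where the next boundary search starts

    // Every "\n- " marks the end of one entry and the start of the next.
    while (($pos = strpos($yaml, "\n- ", $offset)) !== false) {
        $count++;
        if ($count === $itemsPerSlice) {
            // Cut after the newline so each slice ends cleanly.
            $slices[] = substr($yaml, $start, $pos - $start + 1);
            $start = $pos + 1;
            $count = 0;
        }
        $offset = $pos + 1;
    }

    // Whatever is left over becomes the final (possibly shorter) slice.
    if ($start < strlen($yaml)) {
        $slices[] = substr($yaml, $start);
    }

    return $slices;
}

// Made-up sample data: three entries, sliced two at a time.
$yaml = "- email: a@example.com\n  name: A\n"
      . "- email: b@example.com\n  name: B\n"
      . "- email: c@example.com\n  name: C\n";
$slices = sliceYamlList($yaml, 2);
```

Each slice is itself a small, valid YAML list under the stated assumption, so it can be handed to the normal parser without ever parsing the whole file at once.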

The Propel issue, however, was always going to be more problematic. I considered removing all my ORM calls and using either delayed inserts or LOAD DATA INFILE statements. I have used both successfully in the past, but implementing either in this case meant a complete rewrite of over 2000 lines of tested code.

So I decided to take a similar approach with the ORM. Like the YAML parser, Propel 1.2 is very good at what it does inside small to medium processes. It just isn't very efficient in big jobs - like inside large loops. So instead of trying to process the massive array I had managed to construct from the original YAML in one go, I sliced it up and sent smaller YAML files back into the queue.

What this means is that instead of a single item of 13,000 recipients sitting in the queue, I have 130 items of 100. My queue processing task can now batch 100 emails at a time and then terminate, thereby freeing up memory for the next batch. While there is some extra overhead in writing files to disk and duplicating some of the YAML, the benefits are well worth it. The end result is a scalable, fast and robust system that can handle almost any request by simply slicing it up into manageable chunks. Time and money saved, client happy :)
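The arithmetic is easy to sketch with array_chunk(). This is only an illustration, not the project's queue code: the helper name sliceRecipients() and the example.com addresses are made up, and writing each batch out as its own YAML file and queue item is left out.

```php
<?php
// Hypothetical helper: split one big recipient list into queue-sized batches.
function sliceRecipients(array $recipients, int $batchSize = 100): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    // array_chunk() returns consecutive slices of at most $batchSize items.
    return array_chunk($recipients, $batchSize);
}

// Made-up data standing in for the 13,000 recipients.
$recipients = [];
for ($i = 1; $i <= 13000; $i++) {
    $recipients[] = "user{$i}@example.com";
}

$batches = sliceRecipients($recipients, 100);

// Each batch would become its own queue item, handled by a short-lived process.
echo count($batches), " batches of ", count($batches[0]), "\n";
```

Because each batch is processed by a fresh process that then exits, any memory leaked while handling 100 recipients is returned to the system before the next batch starts.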

Dan 
Arguably the garbage collection issues in PHP prevent it from being truly scalable. And while we've all been looking forward to using 5.3 in production, as you point out, it's not completely sorted out yet.

PHP has obviously outgrown its humble beginnings, but it has yet to shed the baggage. In the meantime we need to keep our processes short and sweet - which is actually better practice, really. A batch job that runs for several hours is harder to test, debug and manage than a queue of much smaller processes (where possible).

Alan 
Sorry, actual link to the bug I mentioned above is http://bugs.php.net/bug.php?id=47351

Alan 
Hi

Yes, the memory leaks caused by Propel are a huge pain in the ass. But the truth is that these memory leaks are entirely PHP's fault.
1. Circular references are not cleaned up properly by the so-called garbage collector.
This is solved in Propel 1.3 by using Model::clearAllReferences() together with Propel::disableInstancePooling().
2. This one is the worst: DateTime objects weren't released correctly.
Because of that, every datetime field of a model caused a memory leak. See http://bugs.php.net/bug.php?id=46108 . This bug was patched, but the patch was not included in PHP 5.3.0. Imagine how pissed I was :) But fear not - I tested that patch on a 5.3.1 snapshot. It works! A script that ran for almost 10h with ~2,000,000 rows inserted didn't even choke once :)

Cheers, Alan
