Your ORM Sucks
By:21 Oct '11
I don't like frameworks. Web application frameworks, ORMs, whatever.
I don't mean that as harshly as it probably sounds. It's something like saying, "I don't like cooking with microwaves." They have their uses, certainly - I'm not going to scrub out a pan in the morning because I want to make oatmeal, for example - but there are limits to what they can do, and I think there's a reluctance or inability to recognize that. I'm certainly not above nuking a pile of Bagel Bites, but I don't tell myself that it's haute cuisine.
Granted, like any framework, ORMs certainly have their uses, and most projects will benefit from using them in some capacity. No one likes writing the same boring INSERT, UPDATE and DELETE statements for every table. They enforce consistency - you essentially don't have a choice about naming conventions or class structure anymore, so you can't screw them up. They usually maintain relationships from the database as part of the code. Some have their own internal query cache. It's usually easy to extend them. Unfortunately, based upon my cursory research, there appears to be at least one attempt at a MongoDB ORM, so I can't use "don't support nosql" as a point in their favor anymore.
So by all means, use an ORM every time. By which I mean that every repository should probably have one, and emphatically not that it should be used for every query. Because whatever the tool at hand: Zend Framework, Class::ReluctantORM, a microwave--there always ends up being a place where it doesn't work, or doesn't work very well, and you're forced to do things the old-fashioned way. Sometimes the simple solution really is the best. Why would you bootstrap Zend and load up a bunch of model classes just to import some records? You can do that with a DBI handle and a perl script. Or flat files and sed/awk, probably. To some extent this is just a matter of opinion, and that's fair, but there are situations where my way - the ugly, hacky way - is objectively and demonstrably better. Not always. But sometimes.
By Way of Example
- In what might be the canonical case of "Why Would You Do This", a listing of articles with headlines and perhaps publication dates, with the titles linking to individual article pages containing the full text. To render this list, the ORM-written query was selecting all fields from the table. Because that's all it knew how to do; you ask for a list of articles and you get those articles, with no thought put into why you want them, or which fields you need, because it's a generic tool and that's all it knows how to do. Sometimes I think it helps to think of ORMs as the dumbest programmer you've ever worked with. Think of the query that guy would write, and that's probably similar to the inefficient unreadable gloop you're getting from the machine generation.
- A three-layer navigation menu, with almost all the items on it determined by what was, or wasn't in the database. After spending a few hours untangling what the thing was doing, it was something like this:
select('e.event_id, e.name, e.url_name, i.url, i.title, tr.title, a.article_id, a.title, ae.article_event_id, rg.title, rv.title, rm1.related_media_id, rm2.related_media_id, i.sort_order') ->from(CLASS . ' e') ->leftJoin('e.info_page i ON e.event_id = i.event_id AND i.is_deployed IS TRUE AND i.pub_date <= NOW()') ->leftJoin('e.tour_results tr') ->leftJoin('e.articles ae') ->leftJoin('ae.article a ON a.article_id = ae.article_id AND a.pub_date <= NOW() AND a.is_highlight IS TRUE AND a.is_deployed IS TRUE') ->leftJoin('e.related_media rm1') ->leftJoin('rm1.photo_galleries rg ON rg.photo_gallery_id = rm1.media_id AND rm1.media_type = 'photo_gallery' AND rg.is_highlight IS TRUE AND rg.status = 1 AND rg.pub_date <= NOW()') ->leftJoin('e.related_media rm2') ->leftJoin('rm2.videos rv ON rv.video_id = rm2.media_id AND rm2.media_type = 'video' AND rv.is_highlight IS TRUE AND rv.status = 1 AND rv.pub_date <= NOW()') ->where('e.instance_id = ?', array(...)) ->andWhere('e.deployed IS TRUE') ->orderBy('e.start_date ASC');
- A page to view poll results in a CMS admin. Either the ORM didn't support anything as simple as "SELECT COUNT(*) FROM answers GROUP BY answer_id", or the person who wrote it didn't think it was a problem to select 80,000 rows and then have PHP parse them into objects. Frankly I'm not thrilled by either alternative, and as you can probably guess, this thing ran out of memory and barfed on a pretty regular basis.
The root of the problem (as with most problems) is not thinking critically, not being aware that all this magical query dust doesn't come cheap.
You have to use the right tool for the job. It's not uncommon for the balance between generic, easy to use and quick to develop, and bespoke, laborious and highly performant, to tilt sharply towards the latter. The pain in the ass here is that it's not unusual for this sort of problem to lie dormant on a dev dataset (a dozen rows per table and just enough information to test out edge cases), and then one day rear up and slow your pages to a crawl or blow them up entirely, as soon as you hit real data. It's up to the developer to have some notion about seeing this coming. Even then, everyone gets bitten by this from time to time.
What it boils down to is that if you write a bad query, one that does "SELECT * FROM tbl_huge LEFT OUTER JOIN tbl_big_mclarge" and returns an unnecessarily wide data set full of BLOBs, or that joins across a dozen tables when it only really needs 3, or that has a big stupid slow "SELECT ... FROM ... WHERE NOT IN (SELECT ...)", or that tries to run a SQL "COUNT" in PHP, and it becomes a problem, it is your fault. I don't care if you wrote the thing yourself, or if you used an ORM and it wrote the query, it's still your fault, and you are going to have to fix it. "But that's the way the product does it" is not an acceptable response. Ever. For anything. Code is running on your servers. You are responsible for it. A microwave makes wretched chicken, so I guess it's time you learned how to work the stove, because I'm not eating that crap.
So by all means, use ORMs for your trivial cases, for basic stuff or where performance isn't an issue. But it's eventually going to hit a wall and you'll have to do your own dirty work. And when that happens, you can't say you weren't warned.