§ Seeds / 2009 / Fall


Fall 2009 an OmniTI publication

Year:

Season:

ORMs Done Right

Object-Relational Mapper (ORM) systems are one of the most contentious topics in database application development. Creating an ORM is notoriously perilous, but using them has pitfalls as well. Most ORMs provide little protection against misuse; the inexperienced developer can easily create an application that unilaterally imposes awkward database design constraints, hammers the database with innumerable queries, and is very difficult to optimize.

ORMs provide an automated link between the application object model and the database model. Practically speaking, this generally means that tables become classes and rows become objects. At the most basic level, this provides per-object persistence services. Most ORMs also handle relationships between objects, turning compositional relationships between objects into foreign key relationships in the database.

In this entry, we present the benefits and pitfalls of ORMs and introduce a new Perl ORM implementation, Class::ReluctantORM. Class::ReluctantORM is “a reluctant ORM for reluctant people.” Its design goals are to create a framework that is unambitious, scalable, and easily circumvented. These goals are not so much technological as philosophical; an approach that has value, as we shall see.

Benefits of ORMs

Why developers love them

Developers see a tremendous productivity gain: persistence no longer has to be hand-crafted into each class. Since all model classes are using the same techniques, there is a large consistency gain as well. Developers also get to keep their head in application-model space, without having to shift gears into database-model space. Context-switching is expensive, and switching between implementation languages can be especially jarring.

Consider getting a list of pirates on a ship:

  my $dbh = DBI->connect(...);
  my $sth = $dbh->prepare(>>EOSQL);
  SELECT p.* 
    FROM highseas.pirates p
      INNER JOIN highseas.ships s ON s.ship_id = p.ship_id
    WHERE s.name = ?
EOSQL
  $sth->execute('Golden Hind');
  my @pirates;
  while (my $row = $sth->fetchrow_hashref()) {
     push @pirates, Pirate->new($row);
  }
  $sth->finish();
  foreach my $pirate (@pirates) {
     # Do something with $pirate
  }

Compared to:

  my $ship = Ship->fetch_by_name('Golden Hind');
  my @pirates = Pirate->search_by_ship($ship);
  foreach my $pirate (@pirates) {
     # Do something with $pirate
  }

The second example is much more legible, and remains entirely in the application model — you don't have to think about how the database is set up, or how the tables interact. You don’t even need to know SQL.

Additionally, most modern ORMs can shield the business logic from limited changes in the database schema (such as table or column renames). While this sort of change is usually better hidden at the database logical layer using a view, organizations that have restrictive database change policies may appreciate the added flexibility.

Why leads love them

Team leads find ORMs to be very useful for several reasons. The most obvious is the reduced amount of time spent wiring up persistence layers; instead, developers can stay focused on the business problems. This enables new capabilities. For example, using an ORM, it’s much easier to knock out a quick prototype in response to an RFP, or explore an alternative design. Additionally, since the amount of SQL is dramatically reduced, developers need not have SQL skills to be productive.

Why project managers love them

Project managers like ORMs for many of the same reasons that team leads do. Because productivity is increased, bids can be lower, or more features can be delivered for the same schedule. This may lead to more contracts. The reduced skillset needs of an ORM-based project can also help solve staffing problems.

Pitfalls

DBA Gripe #1: ORMS that dictate DB design

Some ORMs dictate database design. These constraints typically center around keys. Commonly, primary keys are required to be be single-column, integer, and auto-incrementing. Foreign keys are often under the same constraints. This leads to the proliferation of artificial keys.

Naming conventions are another sore point. Some ORMs require primary keys to be named id, others require them to be named pirate_id, or even pirate. Tables may be required to be named in the plural, and one ORM’s notion of pluralization may not match that of another ORM (e.g. staff vs staffs vs staves).

These constraints are annoying but tolerable if the database design is new and the ORM-based application is the only client. But in most real-world situations, the ORM-based application is only one consumer of the database. It may be a pre-existing design with several legacy apps already using the schema. It is possible to use views and rules to appease the ORM's requirements, but that trades the developer's productivity gain with a busywork task for the DBA.

Finally, there is the issue of schema ownership. Both the ORM and the DB know about the database structure. When a change is needed, where do you make the change? Some ORMs “own” the schema, and will execute DDL to modify the database to match changes in the object model. Others don’t own the schema but instead mirror the database schema in a configuration file. Better ORMs read the database at startup, and configure the object model accordingly (though this has problems of its own, especially related to startup speed).

Class::ReluctantORM — stay agnostic

Class::ReluctantORM is firmly in the “read the schema from the database on startup” camp. Some configuration is still needed — to set up connection handles, declare classes, and to create relationships that cannot be auto-detected. Pushing more of the configuration into auto-configurators, while maintaining overridablility, is an active area of development.

DBA Gripe #2: Opaque, Baroque Query generators

An ORM, by its very nature, must contain some kind of query generation mechanism. SQL is an easy language to generate, but a very difficult language to generate well. There are many dialects. An ORM may choose to generate standards-compliant (but slow) queries, or it may attempt to optimize for the particular database engine. As the optimization increases, the query complexity often increases. Some ORMs choose to punt, generating many simple queries (see DBA gripe #3); others may generate one massive, multi-JOIN query. In either case, at some point you will get the classic complaint that “the database is slow.” A DBA wants to be able to tune and replace these queries with hand-crafted versions. This may or may not be possible. Even if it is, the query generator is buried in the ORM code itself, in developer-land, and often requires both developer and DBA to invest time to optimize a query.

Class::ReluctantORM — query monitors

It is important that the SQL generation process be as transparent as possible. To this end, Class::ReluctantORM provides a unique monitoring facility that provides hooks for several key events in the life of a query, including initiation, SQL generation, execution with bound parameters, result fetching, and teardown of the query.

Monitors may execute arbitrary Perl code at any or all of the events. A monitor may abort a query if needed, or simply log statistics or debugging data. Monitors may be attached at compile time or runtime, and may be attached to a particular class, or all classes in the model.

Class::ReluctantORM ships with six canned monitors, including those for join count, column count, data volume, timing, diagnostic, and one which executes the query under EXPLAIN ANALYZE to predict performance. The developer is free to add new monitors.

DBA Gripe #3: hidden expensive actions

Consider this expression:

  my $jewels = $ship->pirates->first->hideaways->find_by_name('Skull Island')->treasures->first->jewel_count();

While it won’t win any awards for formatting, it is fairly clear: get the number of jewels that the first pirate on my ship has stashed away on Skull Island. It’s easy to imagine a junior programmer writing this, or a journeyman programmer writing a less contrived example.

Does this code, at first glance, look like it is hammering the database? How many database queries will this result in? Depending on the ORM, it may range from 1 to 6. And depending on the database, the 1 query might be better or worse than the 6. In almost every case, the queries involved will pull back more information than they need from the database, so even when the ORM gets the queries right, it’s still likely to have unnecessary overhead.

Or this common case:

  foreach my $ship (@fleet) {
     foreach my $pirate ($ship->pirates()) {
        foreach my $hideaway ($pirate->hideaways()) {
           foreach my $loot ($hideaway->treasures()) {
              tithe_to_queen($loot);
           }
        }
     }
  }

That should have scared you.

Class::ReluctantORM — mandatory prefetching

Looking back at this example:

  # 1-6 queries
  my $jewels = $ship->pirates
                    ->first
                    ->hideaways
                    ->find_by_name('Skull Island')
                    ->treasures
                    ->first
                    ->jewel_count();

This usage is problematic because:

Class::ReluctantORM does not allow accessors to directly execute queries. Instead, each accessor looks for a cached value, and returns it if found. A cache miss throws a FetchRequired exception. A full-featured prefetching facility is available:

  # One query
  my $ship = Ship->fetch_deep(
    where => 'name' => 'Golden Hind',
    with => {
      pirates => {
        hideaways => {
          treasures => {}
        }
      }
    }
  );
  # Zero queries
  my $jewels = $ship->pirates
                    ->first
                    ->hideaways
                    ->find_by_name('Skull Island')
                    ->treasures
                    ->first
                    ->jewel_count();

This fetch_deep call executes exactly one SELECT SQL statement, JOINing against the related tables. The results are then processed to create one ship object, which has a collection of pirates, each of which has a collection of hideaways, each of which has a collection of treasures. This data is now prefetched, and a long, deep chain of method calls like above is now permissible.

Importantly, if a programmer adds a method call (say, $pirate->parrots) that is not prefetched, an exception will be thrown the first time it is executed. The developer will see this immediately in testing, and add the required clause to the prefetch. This integrates scalability directly into the development process. This feature, unique to Class::ReluctantORM, is what provides its name: it is reluctant to do database fetches.

Software engineering: impedance mismatch rabbit hole

We have a fundamental problem with ORMs: relations aren’t classes, and tuples aren’t objects. This problem is called the “impedance mismatch” between the database model and the application model, and is discussed in detail in several places on the Internet. Some of the more troubling issues include:

ORM developers are faced with a few nasty choices. Keep it simple, and let the user of the ORM know that the ORM is an unsynchronized, approximate model of the database. Or gradually add complexity, attempting to patch over the impedance mismatch. The latter path gets into diminishing returns quickly.

Class::ReluctantORM — 90% rule

Some ambitious ORMs try to solve 100% of the object-database problem. Class::ReluctantORM tries to solve the easiest 90%. That means it makes the choice that the impedance mismatch is a very hard problem, and the ORM will do its best, but you still need to be aware of its limitations. This scope limit helps exclude features that would dramatically increase complexity (for example, there is very little support for aggregates).

Skill atrophy

One of the big advantages of ORMs is also a major disadvantage: no, or little, use of SQL. We learn skills through exposure and experience, and if we are never exposed to SQL, we’ll never learn it. Or, if we know some SQL, then use ORMs exclusively, our skills will likely atrophy. In almost every application, the ORM will need to be bypassed at some point, and then SQL skills will be sorely missed.

Class::ReluctantORM — SQL pass-thru

For the remaining 10% of problems outside the scope of Class::ReluctantORM, several avenues are provided to bypass the query generator and use SQL directly. Because it was developed in a shop with a heavy mistrust of ORMs, Class::ReluctantORM is designed to make this bypass as easy as possible. The documentation mentions how to bypass the ORM early.

Avenues of SQL support, ranging from SQL-centric to object-centric:

  1. Ask the ORM-managed object or class for a database handle, and execute statements on it. Results are in raw values, not part of the object model.
  2. As above, but wrap this into a method call on an ORM object or class, thus integrating SQL into the object model. This is handy for aggregate functions.
  3. Future releases aim to provide the ability to override specific ORM-generated queries with your own SQL.
  4. Ask Class::ReluctantORM to intepret the SQL into its own representation, and execute. If the translation was successful, return values will be ORM-based objects. This is a Class::ReluctantORM-exclusive feature.
  5. Write a query directly using Class::ReluctantORM’s abstract SQL engine. You’re no longer writing SQL directly, but performing method calls on FromClause objects, for example. This is guaranteed to return ORM objects.
  6. Use ORM methods and pass SQL fragments as arguments (e.g., a WHERE clause for a search() method).

In all six cases, the developer must use SQL or SQL concepts. This may help reduce SQL atrophy. In many cases, because the SQL can be just “dropped in,” you can have a DBA or SQL expert develop SQL for a specific query with no contact with the ORM.

Framework lock-in

Like any framework, using one is often irreversible. It is very difficult to adapt an application to use a different ORM — even if the interfaces are similar, often times there will be differences among the query specifications, the DB requirements, or the semantics of operations (e.g. do inserts cascade?). ORMs are especially susceptible to lock-in because their footprint is so ubiquitous in the application code. Every time you deal with a relationship between your model objects, you interact with the ORM.

Class::ReluctantORM — unsuprising interface

While there is little that can be done to fight lock-in, Class::ReluctantORM tries to reduce the pain of switching to another ORM, or dropping ORM support altogether, by using common conventions for method names (accessors are named directly after the property, for example). When a new feature is added, the interfaces of other ORMs are studied, and similar conventions are adopted if possible.

Sometimes it’s the wrong tool

ORMs are not good for everything. ORMs by their nature are weaker at these tasks:

Class::ReluctantORM — lots of fish in the sea

If an ORM isn’t right for your project, Class::ReluctantORM won’t help you. Even if an ORM is a good fit, keep in mind it is a slow-startup ORM. There are others that are fast-startup, large-configuration ORMs, and even some that can cache their configuration.

Conclusion

For all their pitfalls, the tremendous productivity advantages of ORMs will continue to tempt developers to use them. Like any productivity booster, ORMs seem to draw a lot of hype, and it’s important to see through the hype to the realities and shortcomings of the technology. Once those shortcomings have been addressed, however, ORMs can be used conscientiously.

Class::ReluctantORM is a new ORM implementation that seeks to make it harder to fall into the traps. It avoids some “impedance mismatch” issues by narrowing its scope to the most common 90% of use cases. For the more complex situations, numerous SQL bypass avenues are available. Whether queries are ORM-generated or customized, they all pass through the query monitoring system, providing an early warning system for scalability problems. Finally, mandatory prefetching can reduce bad coding practices early in the development cycle.