<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <atom:link href="http://omniti.com/shares/seeds" rel="self" type="application/rss+xml" />
        <title>OmniTI ~ Seeds</title>
        <link>http://omniti.com/seeds</link>
        <language>en-us</language>
        <description>Seeds</description>
        <item>
            <title>Marketing Malware</title>
            <link>http://omniti.com/seeds/marketing-malware</link>
            <guid>http://omniti.com/seeds/marketing-malware</guid>
            <description><![CDATA[
Internet registrar GoDaddy.com is notorious for two things: domain names and risqu&#233; Super Bowl commercials. The infamy began in 2005, when GoDaddy paid $4.8 million for two 30-second spots during Super Bowl XXXIX. The commercial featured WWE wrestler C...]]></description>
            <content:encoded><![CDATA[<p class="first">
Internet registrar GoDaddy.com is notorious for two things: domain names and risqu&#233; Super Bowl commercials. The infamy began in 2005, when GoDaddy paid $4.8 million for two 30-second spots during Super Bowl XXXIX. The commercial featured WWE wrestler Candice Michelle experiencing a &#8220;Wardrobe Malfunction,&#8221; a clear parody of the half-time show fiasco involving Janet Jackson the previous year. After 16 variations of the storyboard were filmed and rejected by the Fox Network, version number 17 was pre-approved for broadcast during the first and fourth quarters of the game. The 30-second spot in the first quarter drove a 1,600% increase in traffic to the GoDaddy web site, and then a strange thing happened: the commercial never played a second time. NFL executives purportedly pressured Fox to pull the commercial due to its &#8220;inappropriate&#8221; nature, despite the fact that it had already been paid for, pre-approved by Fox, and initially aired. Led by GoDaddy CEO Bob Parsons, the blogosphere screamed censorship like bloody murder, which only served to fuel additional publicity. In the end, GoDaddy deemed the event so successful that they now define their brand around the &#8220;GoDaddy Girls,&#8221; airing annual Super Bowl commercials that tiptoe along the edge of broadcast acceptability.
</p>
<p>
At the center of the GoDaddy controversy is the correlation between advertising and brand identity. As a content provider, the Fox Network understands that their brand will be held responsible for the quality of both the program content and the advertising they deliver. If either aspect of the broadcast is sufficiently offensive or inept, they risk losing viewers to other stations. Consequently, Fox must provide content that is tame enough to avoid outrage from extremely conservative viewers while remaining provocative enough to satisfy the desire of third party companies seeking to push the envelope with their ad strategy. Content providers in any medium will be judged by the quality of the content they provide, and that includes the use and placement of advertising. This can be a tough balancing act, and Fox isn&#8217;t alone in walking the tightrope.
</p>
<p>
When it comes to the Web, the importance of managing advertising content takes on a new dimension. Hackers are increasingly using fraud and social engineering tactics to infiltrate advertising networks, and then using their position within this circle of trust to launch malware, browser redirect, and cross-site scripting attacks against unsuspecting visitors. If these attacks are successfully executed, hackers may steal credit cards, social security numbers, banking information, personal photos and anything else that has been digitized on the victim&#8217;s computer. Alternatively, a visitor&#8217;s computer may be turned into one of many &#8220;sleeper cell agents&#8221; in a botnet, ready to respond to a few keystrokes at any time and become an active participant in a worldwide Internet attack. It isn&#8217;t just web site visitors who are vulnerable. This same strategy can be used by black hat marketing consultants to siphon traffic from one web site to a competing web site or even to blacklist an entire site from the Google search index. The worst part? Hackers are able to target their attacks with profound granularity, making it extremely difficult for anyone within the targeted organization to know the attack is even happening.
</p>
<p>
The risk posed by vulnerable advertising mediums is not merely theoretical. In 2009, I documented two separate exploits that successfully penetrated highly trafficked and popular web sites. Both sites had full time IT teams who were previously unaware that the exploit was occurring. Furthermore, in a now highly publicized <a href="http://www.nytimes.com/2009/09/15/technology/internet/15adco.html?_r=2"><span>event</span></a> in September of 2009, the New York Times was the victim of a malware advertiser who legitimately purchased ad space from the Times while pretending to be a representative of Vonage. IT departments and technical staff are trained to watch site visitors for malicious activity, but painfully few are watching the advertisers.
</p>
<h2>Deconstructing the Hack</h2>
<p>
Client-side arbitrary code execution is the primary culprit behind advertising based attacks. Attackers first gain trust with the target by posing as a legitimate advertiser. This process may be as simple as paying for space in an automated advertising system, or as involved as calling a major corporation while posing as a sales rep or marketing executive from another company. After the attacker has been approved as an advertiser, they develop a custom script to exploit the medium being used. They may start off displaying advertising that looks legitimate, but inevitably they switch to the malicious ad that begins to infect or otherwise manipulate site visitors.
</p>
<p>
Depending on the amount of freedom granted to an advertiser, a variety of techniques may be used to deliver the hacker&#8217;s payload. Many setups allow advertisers to automatically submit a combination of HTML, CSS and JavaScript code to be embedded within the layout of the publisher&#8217;s site. In this scenario, hackers can easily embed malicious scripts by using JavaScript or by including an external Flash SWF in the markup. On the off chance that this code is reviewed at all before publication, it is likely reviewed by someone in marketing or sales who is only examining the submission based on the content currently displayed and is unable to analyze the underlying code for potential security vulnerabilities.
</p>
<p>
Allowing advertisers to place custom JavaScript or Flash files inline as part of a web site&#8217;s markup is especially dangerous as the advertiser is no longer restricted by cross-domain access policies. This leaves the advertiser with the power to do virtually anything the web site developers can do, such as posting AJAX requests or altering any element of the DOM. In an attempt to prevent this, some web sites have opted for an alternative setup that links an iframe or traditional frame to a server belonging to the advertiser. However, this approach is also flawed because changes to the advertising code may be made at any time and the publishing site is powerless to implement a pre-approval process or apply any automated content filtering.
</p>
<p>
Regardless of the method utilized, if an attacker gains the ability to execute custom code on the target site, Pandora&#8217;s box has been opened and virtually all the evil of the Internet may be unleashed. A few fictional yet plausible exploit scenarios include the following:
</p>
<h2>You Won (Malware)!</h2>
<p class="first">
Johann is a regular visitor to Finance Magazine&#8217;s web site. Like most Finance Magazine visitors, he is an investor with a diversified portfolio in mutual funds, bonds, and individual stocks. While browsing the latest financial news, a popup branded with the magazine logo suddenly appears and announces that he has won a free two-year subscription to FM and a chance to win lunch with Warren Buffett. Lunches with Buffett are normally valued in the millions, and Johann has been wanting a print subscription to the magazine for some time. He clicks &#8220;Accept&#8221; and is asked to download the registration form. At first he is a bit suspicious and doesn&#8217;t understand why the registration form is a downloadable .exe file, <em>but it is a company promotion from the official web site</em>, so he reasons it must be okay. After downloading the application, he launches it and is asked several questions about his financial net worth. He is then asked for his full name, phone number, address, e-mail address and social security number. He is again a bit suspicious and doesn&#8217;t understand why he needs to enter his social security number, but the form does look very professional. He clicks the info button next to the Social Security Number field and is shown this dialogue message:
</p>
<blockquote>
<p>
You must enter your social security number in order to ensure that &#8220;Lunch With Warren&#8221; contest participants are limited to one entry per person.
</p>
</blockquote>
<p>
With visions of the &#8220;Sage of Omaha&#8221; in his head and the promise of the next printed issue of Finance Magazine at his door step, Johann fills out the form and clicks submit.
</p>
<p>
Several months later, Johann&#8217;s financial life is in ruins. His personal information was used to register for several credit cards and a bank loan was issued for a Mercedes SLK in California. The executable file he downloaded included a custom Trojan horse that allowed the attackers to log in to his personal machine. They used this access to acquire his banking information and passwords, which they used four months later to wire over $10,000 to an account in the Cayman Islands. Although Johann is now suing for criminal negligence, his life (and his credit) will never be the same.
</p>
<h2>The Black Hat Who Stole Christmas</h2>
<p class="first">
Brianna is a successful small business owner with an online niche retail store that averages $25,000 in sales and 500,000 visitors per month. In addition to a product catalog, her site also contains high-quality articles and videos that pull traffic from Google and allow her to further monetize her online presence by selling custom advertising. Advertising is sold on a month-by-month contract basis, and advertising providers are given an account where they paste HTML, CSS, and JavaScript to be included directly in the site markup. Usually, Brianna&#8217;s sales skyrocket for the entire month of December as Christmas approaches. This year, however, she has seen a 60% reduction in traffic and sales are plummeting. After closely analyzing her site analytics, she realizes that traffic from Google has dropped drastically. She went from being in the top 10 results for her product niche to not appearing anywhere within the top 5 pages, and she can&#8217;t figure out why. What Brianna doesn&#8217;t realize is that she sold an advertising slot for December to a black hat SEO consultant working for a competitor. As part of his overall strategy for propelling his client to the top, he decided to cut the legs out from underneath the competitors by running advertising templates on their sites that carefully and skillfully violated nearly every technical SEO guideline required by Google. Brianna made a few hundred dollars for the advertising space, but unknowingly violating Google&#8217;s SEO rules would cost her tens of thousands of dollars in the year ahead.
</p>
<h2>Defending Against Malicious Advertisers</h2>
<p>Unfortunately, a foolproof technical solution to malicious advertisement doesn&#8217;t exist. However, this fact is not an excuse for apathy. The risk to both organizations and web site visitors can be mitigated by applying these technical steps:
</p>
<ol>
<li>Preview Ad Submission
<p class="first">
Within your advertising process, a system should be set up to preview all advertising before it is published live. This process should minimally include the ability to preview the way an ad looks and functions, but ideally it will also involve a quick scan of the actual advertising code to check for any unusual pieces of content (e.g. a few lines of obfuscated or encrypted JavaScript). If ad providers can automatically update their content, publishers should be sure that each new version is also approved before publication.
</p>
<p>
This tactic cannot be used if ad placement is achieved by embedding frames or iframes that link directly to a third-party server.
</p>
</li>
<li>Restrict Dynamic Advertising
<p class="first">
The single most effective method of preventing advertising abuse is to ban your advertisers from executing any dynamic advertising code. This process can be as simple as stripping any scripting tags and inline event handler attributes from advertising content and is extremely effective. However, the potency of this tactic comes at the cost of advertising flexibility as it limits all advertising to stylized images and text. In a medium where small games and other interactive content are often used to garner clicks and entertain eyeballs, this is often not a viable tactic. When scripting code is permitted, certain dynamic content should still be banned. For example, anything that calls alert() or assigns to document.location in JavaScript could be stripped while leaving other code in place.
</p>
<p>
This tactic cannot be used if ad placement is achieved by embedding frames or iframes that link directly to a third-party server.
</p>
</li>
<li>Sandbox Dynamic Advertising
<p class="first">
In scenarios where dynamic content is permitted, it is useful to place ads within frames or iframes to take advantage of cross-domain safety restrictions. Yet as we have seen, using these constructs to link directly to a third-party server is a security vulnerability as advertisers can easily change the content displayed without providing the publishers an opportunity to review the changes or strip malicious code. To obtain the best of both worlds, create an advertising sandbox by designating a domain or subdomain specifically to serve advertising content. Then place frames or iframes on your main site that link directly to the new domain. Because you control the ad domain, you will be able to preview ad submissions, restrict dynamic advertising, and benefit from the added security that cross-domain restrictions provide.
</p>
</li>
</ol>
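<p>
The preview and restriction steps above lend themselves to a simple automated pre-screen. The following is a minimal sketch in Python, not a complete filter: the pattern list, the <code>review_ad_markup</code> function, and the ad host name are all hypothetical, and regular expressions cannot reliably parse HTML, so anything flagged here should go to a person for review rather than being treated as a verdict.
</p>

```python
import re
from urllib.parse import urlparse

# Hypothetical rule set for illustration; a real deployment would tune these.
SUSPICIOUS_PATTERNS = {
    "script tag": re.compile(r"<\s*script\b", re.IGNORECASE),
    "embedded plugin": re.compile(r"<\s*(?:object|embed)\b", re.IGNORECASE),
    "inline event handler": re.compile(r"\son\w+\s*=", re.IGNORECASE),
    "javascript: URL": re.compile(r"javascript\s*:", re.IGNORECASE),
    "redirect via document.location": re.compile(r"document\s*\.\s*location", re.IGNORECASE),
    "dynamic evaluation": re.compile(r"\b(?:eval|unescape)\s*\(", re.IGNORECASE),
}

# Captures the src attribute of frame and iframe tags.
FRAME_SRC = re.compile(r"<\s*i?frame\b[^>]*?\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)


def review_ad_markup(markup, ad_host):
    """Return a sorted list of reasons to hold an ad submission for manual review."""
    findings = {
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(markup)
    }
    for src in FRAME_SRC.findall(markup):
        # Frames must point at the publisher-controlled ad host (the sandbox domain).
        # Relative URLs have no host and are therefore flagged as well.
        if urlparse(src).hostname != ad_host:
            findings.add("frame served from outside " + ad_host)
    return sorted(findings)
```

<p>
Calling <code>review_ad_markup(submission, "ads.publisher.example")</code> on each new or updated submission returns an empty list for plain image-and-link ads and a list of reasons otherwise. An empty result only means nothing obvious was found; it does not prove a submission is safe.
</p>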
<p>
Even when implementing the safeguards above, the decision to grant an advertiser space on your web site should not be taken lightly. Doing so inherently confers a degree of trust upon a third party. Ensure that trust is well placed by implementing a screening policy for all new advertising sign-ups. Such a policy could be as simple as calling companies directly to verify that the representative is authorized to purchase advertising on their behalf or as involved as requiring all advertisers to provide copies of their business incorporation license or other government-issued identification. Regardless, the threat cannot be delegated to the IT staff and forgotten. Marketing and sales play an equally important role, and the safest organizations are those that view security as the shared responsibility of all the members within them.
</p>
<p>
There was a time when good advertising meant entertaining video or amusing copy. It could be judged purely on face value and the ability to generate ROI. That time has passed. In an increasingly interactive world, it is now more important than ever for organizations and individuals to understand that advertising consists of both form and function. Ignoring this fact can result in something far worse than assaulting the sensibilities of your audience; it can devastate their lives. Let content providers and audiences beware: the age of badvertising has begun. 
</p>]]></content:encoded>
            <pubDate>Tue, 22 Dec 2009 22:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Business Metrics Too</title>
            <link>http://omniti.com/seeds/business-metrics-too</link>
            <guid>http://omniti.com/seeds/business-metrics-too</guid>
            <description><![CDATA[When I began tinkering around with web services as a hobby, it was common to fiddle with an application for days.  I would curse and grind and sputter with Apache and cobbled-together programs.  This would frequently unearth new challenges: setting up ...]]></description>
            <content:encoded><![CDATA[<p class="first">When I began tinkering around with web services as a hobby, it was common to fiddle with an application for days.  I would curse and grind and sputter with Apache and cobbled-together programs.  This would frequently unearth new challenges: setting up a mail service, creating a database to store user accounts and perhaps pulling content from a third party.  Inevitably these minor distractions would monopolize my attention and the original application would be left to gather dust, without any documentation or monitoring in place.
</p>
<p>This seems to be a common problem with many professional software development shops.  Project managers help to keep the development teams focused, but their goals are still feature-driven with an eye on the next release cycle.  The IT and Operations teams are painfully undermanned, left to maintain their systems and services without any training on the care and feeding of their new pet.  For the hobbyist or Open Source project, this becomes an annoyance.  If you&#8217;re running a business, operational neglect can have a dire impact on your bottom line.
</p>
<p>Systems Administrators are unconsciously trained to look at everything through a boolean filter.  Hosts are up or down.  Services are on or off.  Their understanding of the application stack is often superficial, limited to the same perspective as that of a typical user.  Does the website load?  Can I ping the servers?  This is a completely logical approach.  And yet, it fails to consider those "corner" cases where activity looks normal, but an internal component suffers an unexpected condition.  Stealthy failures like these can be missed for months and result in significant lost revenue or wasted overhead.
</p>
<p>Monitoring systems have improved over the years with "advanced" features like automatic discovery of hosts and services.  Scanning a network, they can identify hosts and differentiate web servers from workstations.  Resources are grouped logically.  It&#8217;s a very turnkey way to add monitoring to your infrastructure.  Unfortunately, for many companies, this is where the story ends (and the pain begins).  Attentions are subsequently focused elsewhere.  Priorities are reestablished.  One of the most important resources, the one that makes sure everything else is operating smoothly, becomes forgotten and orphaned.  It&#8217;s easy to forget about something that doesn&#8217;t make <a href="http://omniti.com/seeds/what-is-web-operations"><span>your job</span></a> easier or offer intrinsic value to your bottom line.

</p>
<p>A poor economy and high unemployment levels remind us how important it is to optimize our existing architecture.  The current trend towards Cloud Computing and <a href="http://omniti.com/seeds/virtualization-zfs-and-zetaback"><span>Virtualization</span></a> makes this even more challenging.  These technologies are useful for creating highly elastic platforms on a budget, but they complicate engineering by <a href="http://omniti.com/seeds/concepts-of-cloudish-storage"><span>outsourcing data storage</span></a> and processing to an external black box.  In turn, we&#8217;re forced to add resiliency in the form of additional processing nodes and redundant storage.  This added complexity introduces countless opportunities for disaster.  It&#8217;s a vicious cycle.
</p>
<p>As the Web has become the obvious target for fresh product development, additional layers of abstraction are introduced into the application stack.  New technologies and components offer exciting ways to communicate with the end user and from one business to another.  The higher we go, the more these layers are decoupled from traditional monitoring proficiencies.  The resulting programs are overly intricate and opaque.  We need new ways to increase visibility and derive useful data from modern business systems.
</p>
<p>Gaining visibility over business operations is probably the easiest improvement any company can make.  Quality analytics require a solid understanding of your IT operations and business processes, which come from transparency into your systems.  Once these have been established we should be equipped with the tools to streamline and simplify any infrastructure.
</p>
<ol style="font-size: 1.143em;">
<li>Key Performance Indicators
<p class="first">First and foremost, identify the external business metrics that directly affect your revenue.  Establish thresholds and put fault-detection monitors into place, just like you would for any server or application.  Alerts on business operations (e.g. new user registrations, orders per hour) are more important than the systems that support them.  Remember that revenue is an asset, and hardware is a cost.  Not the other way around.
</p>
</li>
<li>Review IT Monitors
<p class="first">Evaluate your existing IT monitoring systems.  Ensure that metrics are being gathered for every single host and service.  The breadth and depth of data collected now will directly influence the quality of the information that can be extracted later on.  It&#8217;s paramount to have the metrics to support your decisions, but you won&#8217;t know which they are until you can juxtapose them later.

</p>
</li>
<li>Stockpile Data
<p class="first">Collect as many metrics as possible, for as long as possible.  There are no good excuses for not storing metrics indefinitely.  Storage is inexpensive, and a variety of technologies allow us to scale capacity with ease.  In three years we should be able to look back on data with as much granularity as the information that was collected just yesterday.
</p>
</li>
<li>Highlight Deficiencies
<p class="first">Graph your metrics.  Study their trends and formulate a plan to address the immediate capacity limitations.  When deploying new resources, look for hints in the trends that reveal hidden relationships in your network.  But remember that this goes beyond planning for the future; this data has inherent value in supporting your ongoing decisions.
</p>
</li>
<li>Build Relationships
<p class="first">Correlate graphs in ways that represent your business systems.  Pinpoint metrics that relate towards a common goal (sales per visit, length of visit, average page size, network latency and webserver load).  You might be shocked at the patterns revealed.  If your trending application doesn&#8217;t allow you to correlate incongruent data easily, find a new one.
</p>
</li>
<li>Empower Stakeholders
<p class="first">Share the accumulated knowledge with individuals and teams within your organization that can take action towards positive change.  If possible, give them access to all of the information, not just the data that directly affects them.  For large architectures, there is rarely a single person with a holistic view of the entire stack.  Trust your partners and there&#8217;s a good chance they&#8217;ll unearth something you missed previously.

</p>
</li>
</ol>
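<p>
As one illustration of the first step, a business metric can be checked against thresholds in the same way a host or service would be. The sketch below is in Python and assumes a hypothetical <code>Threshold</code> record and metric names such as <code>orders_per_hour</code>; the bounds are invented for the example, and a real monitor would feed the alerts into whatever notification system is already in place.
</p>

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str      # hypothetical metric name, e.g. "orders_per_hour"
    minimum: float   # alert when the observed value drops below this
    maximum: float   # alert when the observed value rises above this


def check_metrics(observed, thresholds):
    """Return alert messages for metrics that are missing or out of bounds."""
    alerts = []
    for t in thresholds:
        value = observed.get(t.metric)
        if value is None:
            # A metric that stops reporting is itself a fault worth alerting on.
            alerts.append(f"{t.metric}: no data collected")
        elif value < t.minimum:
            alerts.append(f"{t.metric}: {value} is below the minimum of {t.minimum}")
        elif value > t.maximum:
            alerts.append(f"{t.metric}: {value} is above the maximum of {t.maximum}")
    return alerts
```

<p>
With a minimum of 20 orders per hour configured, an observed value of 3 produces a single alert for that metric even though every host and service may still report as up, which is the kind of stealthy failure a purely boolean view misses.
</p>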
<p>Fault-detection and trending solutions should return more on investment than high uptime or speedy notifications.  They should prepare an organization to increase capacity before limits are reached, realign resources to meet <a href="http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes"><span>unexpected traffic spikes</span></a>, help <a href="http://omniti.com/does/design-and-development"><span>Development &amp; Design teams</span></a> to better understand your customers, and decrease the maintenance and staffing necessary for normal IT operations.  Feedback should be real-time and tuned to the needs of your organization.  In a nutshell, they should pay for themselves and then some.
</p>
]]></content:encoded>
            <pubDate>Wed, 16 Dec 2009 16:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Transcending the Medium</title>
            <link>http://omniti.com/seeds/transcending-the-medium</link>
            <guid>http://omniti.com/seeds/transcending-the-medium</guid>
            <description><![CDATA[I grew up in Boyd County, Kentucky, where John Deere tractors are practically an indigenous species. This now famous brand, started by none other than Mr. John Deere himself, began life with one sole product: the steel plow. Fortunately for lawn care c...]]></description>
            <content:encoded><![CDATA[<p>I grew up in Boyd County, Kentucky, where John Deere tractors are practically an indigenous species. This now famous brand, started by none other than Mr. John Deere himself, began life with one sole product: the steel plow. Fortunately for lawn care companies and husbands everywhere, Mr. Deere was a man who understood his market. He understood that his company existed to serve the needs of the agriculture industry, and not purely to cut furrows in soil. Because of his focus on the needs of his consumers, Deere &#38; Co. now manufactures hundreds of products and parts in multiple agricultural categories, perhaps the most iconic of which is the quintessential riding lawn mower. Business leaders would do well to take a page from the Deere &#38; Co. playbook, especially when it comes to their technology strategy. If John Deere were alive today, I think he might offer the following advice: place your focus on the consumer, and then find ways for technology to serve them. I&#8217;ll say it even more forcefully: For just a moment, completely forget about the specific technology you&#8217;re using to serve your market. Instead, just think about the lives of those who use it. Why do they use it? What does it help them achieve? Is there a better way for them to reach their goals, even if it means completely stepping outside of the medium you have created?
</p>
<p>
Consider Facebook, YouTube, and Digg. These are not &#8220;web&#8221; companies. They are, respectively, communications, entertainment, and news companies. If tomorrow a new technology emerged (and inevitably it will) that allowed consumers to better achieve their goals than the services offered by these so-called &#8220;web&#8221; companies, all three organizations would be forced to either embrace that new technology or risk significant loss of market share. The kicker is that they aren&#8217;t alone: what is true for Facebook, YouTube, and Digg is also true for you.
</p>
<p>
Now for the secret: tomorrow may have come a day early. In April of 2009, Apple announced that the iPhone App Store had surpassed one billion downloads in just 9 months<sup><a href="http://omniti.com#footnotes" style="color: #990000;">1</a></sup>. Consider that for a moment. It took Firefox, the most used web browser in the world<sup><a href="http://omniti.com#footnotes" style="color: #990000;">2</a></sup>, over 4 years to reach one billion downloads<sup><a href="http://omniti.com#footnotes" style="color: #990000;">3</a></sup>, and VoIP/IM service Skype served its one billionth download after 5 years<sup><a href="http://omniti.com#footnotes" style="color: #990000;">4</a></sup>. Apple did it in 9 months, and they did it in an emerging market with an entirely unique distribution model. As impressive as that was, Apple has recently outdone themselves once again, reaching the two billion download mark in September of 2009<sup><a href="http://omniti.com#footnotes" style="color: #990000;">5</a></sup>, doubling the total number of downloads served in just 5 months.
</p>
<p>
This kind of exponential growth is unlikely to slow down anytime soon, and Apple&#8217;s success with both the iPhone hardware and the iPhone App Store has revolutionized the pace of innovation in the smart phone market. Verizon is projected to carry 18 different smart phones powered by the Google Android OS by the end of 2009<sup><a href="http://omniti.com#footnotes" style="color: #990000;">6</a></sup>, and BlackBerry now has an &#8220;App World&#8221; to provide consumers of their smart phone line with third-party applications.
</p>
<p>
What does this impressive growth mean for business owners, executives, and IT managers? It means that consumers are becoming increasingly mobile, connected, empowered, and, ultimately, accessible. As large segments of the consumer market begin to adopt mobile technology, the organizations who benefit the most will be those who evolve to service them, finding new distribution or advertising channels for existing products and in some cases utilizing this technological shift to create entirely new offerings. 
</p>
<p>
Yet the most profitable organizations will capitalize on the opportunities created by the mobile market without becoming lost within them. As smart phones are beginning to reach critical mass, it is perhaps now more important than ever to realize that much of the excitement in the mobile space is fueled by HTTP and the Web, and Internet usage  certainly isn&#8217;t going anywhere but up in the foreseeable future. In the United States alone, Internet Service Providers have reached a point of market saturation with 71&#37; of all Americans reporting consistent Internet and Web availability at home or work<sup><a href="http://omniti.com#footnotes" style="color: #990000;">7</a></sup>, and 85&#37; of iPhone users report using mobile Safari to browse the web on a regular basis<sup><a href="http://omniti.com#footnotes" style="color: #990000;">8</a></sup>. Savvy managers will apply many of the lessons learned in previous online ventures to future expansion in the mobile space while continuing to build lasting value wherever their consumers are.
</p>
<img alt="Device Penetration" src="http://images.omniti.net/omniti.com/i/b/mark-seeds%28phone%29.png" />
<p>
In the past 10 years, I&#8217;ve had the pleasure of working on a variety of projects that span multiple mediums. In varying capacities, I&#8217;ve directly contributed to television productions, radio broadcasts, print publications, trade show exhibits, stage shows and online advertising campaigns. I&#8217;ve also served as the lead engineer on a variety of desktop, web, and mobile software projects. I consider myself to be a graduate of the school of hard knocks, and when you dance with lady experience long enough, you begin to realize that the principles of success transcend all mediums. The following are just a few &#8220;Transcendent Themes&#8221; that I believe must be applied regardless of the medium an organization chooses to operate within: 
</p>
<ol style="font-size: 1.143em;">
<li>Launching is Not Enough
<p class="first">
&#8220;If you build it, they will come.&#8221; It worked great for Kevin Costner&#8217;s character in Field of Dreams, but in the real world, merely bringing a product or service to market is rarely enough for it to succeed. Good products require good marketing initiatives behind them, and what is true in conventional mediums is also true on the web and the emerging mobile market. 
</p>
<p>
From banner ads to social media, today&#8217;s marketers have more tools at their disposal than ever before. Finding the right mix can be a challenge, but the solution will likely involve a cross-comparison of proven conversion rates against perceived market presence. In English: find out where your market is digitally congregating, look at the proven effectiveness of the tools at your disposal that are able to reach them, and then start a small initial campaign to test your analysis. 
</p>
</li>
<li>Sample It to Sell It
<p class="first">
Shopping mall fast food chains face some of the fiercest competition around. Sales quantity is king in a confined space with direct competition, low profit margins, and a product lifespan measured in hours instead of days or months. The presence of free samples in food courts isn&#8217;t primarily motivated by desperation but by survival. Fast food chains continue to offer free samples month after month in shopping centers across America because they understand that the cost of giving away small amounts of their product is adequately covered by the sales those samples generate. 
</p>
<p>
This same principle is applied in virtually every industry. Clothing stores often allow customers to use in-store fitting rooms to try on outfits before purchasing. Automobile dealers allow qualified leads the famed &#8220;Test Drive,&#8221; and the movie industry produces previews of all new releases to entice the viewer to later enjoy the full experience. 
</p>
<p>
Samples sell, and this age-old axiom may be even more effective when applied to digital products than to physical goods. 
</p>
<p>
Consider the case study of iCombat, a top 100 iPhone application. The month after the makers of iCombat released a free, &#8220;lite&#8221; version of their paid application, they were rewarded with an 8.73&#37; conversion rate. The most impressive part? A conversion rate of 8.73&#37; resulted in a monthly sales revenue increase of 496&#37;<sup><a style="color: #990000;" href="http://omniti.com#footnotes">9</a></sup>. From the iCombat creator:
</p>
<blockquote> 
<p>we had waited months longer than we should have 
to launch a lite version. There was no point to waiting and sacrificing 
the initial new release buzz.
</p></blockquote>
<p>
When it comes to profiting from the power of sampling, iCombat isn&#8217;t alone. Mobile analytics company Flurry released a study of the impact sample applications had on paid application sales and found that there is on average an 85&#37; &#8220;free-to-paid&#8221; sales lift generated by application sampling<sup><a href="http://omniti.com#footnotes" style="color: #990000;">10</a></sup>. From the Flurry report:
</p>
<blockquote>
<p>
Among your strongest marketing plays in the App Store is to offer 
a free trial of your game or application. Not only is the App Store designed 
for this, but also it&#8217;s the best way to reduce consumer risk in trying your 
application, with the goal of eventually getting that user to purchase the 
full version. Think: money. And from our data, it&#8217;s among the most 
effective moves you can make.
</p>
</blockquote>
</li>
<li>Make a Good First Impression
<p class="first">
1/20th of a second. That&#8217;s how long you have to make a good impression on the typical web user, according to Dr. Gitte Lindgaard of Carleton University in Ontario, Canada. Dr. Lindgaard published an article in Behaviour and Information Technology in which she reports not only that a visitor forms a cognitive bias in the first 1/20th of a second after visiting a web site, but also that this bias, formally known in psychology as the &#8220;halo effect,&#8221; significantly influences the user&#8217;s opinions on the reliability and usability of the web site. 
</p>
<p>
In the words of Dr. Lindgaard:
</p>
<blockquote>
<p>
&#8230;the strong impact of the visual appeal of the site seemed to draw attention away from usability problems. This suggests that aesthetics, or visual appeal, factors may be detected first and that these could influence how users judge subsequent experience&#8230;. Hence, even if a website is highly usable and provides very useful information presented in a logical arrangement, this may fail to impress a user whose first impression of the site was negative.
</p>
</blockquote>
<p>
While no studies have been published to measure the impact of the &#8220;halo effect&#8221; in mobile applications, it is very likely an important aspect of the user experience there as well. A thorough discussion of design principles is beyond the scope of this article, but consider that proper use of color alone has been reported to increase brand recognition by up to 80&#37;<sup><a style="color: #990000;" href="http://omniti.com#footnotes">11</a></sup>. In visual mediums, color certainly matters. Give your users the right impression by abiding by the principles of color psychology shown below:
</p>
</li>
</ol>
<img width="500" height="325" src="http://images.omniti.net/omniti.com/i/b/mark-seeds%28color%29.jpg" alt="Color Psychology"/>
<p>
Success is not limited to any single medium, and neither are your consumers. The companies that realize and apply that to their operations today will be the companies taking consumers into the frontiers of tomorrow. Be among them by placing the needs of your consumers first and then utilizing whatever technology is necessary to serve them. 
</p>

<h3>Footnotes</h3>
<ol>
<li><a name="footnotes" href="http://www.apple.com/pr/library/2009/04/24appstore.html">http://www.apple.com/pr/library/2009/04/24appstore.html</a></li>
<li><a href="http://www.w3schools.com/browsers/browsers_stats.asp"><span>http://www.w3schools.com/browsers/browsers_stats.asp</span></a></li>
<li><a href="http://en.wikipedia.org/wiki/Firefox#Market_adoption"><span>http://en.wikipedia.org/wiki/Firefox#Market_adoption</span></a></li>
<li><a href="http://share.skype.com/sites/en/2008/09/celebrating_1_billion_download.html"><span>http://share.skype.com/sites/en/2008/09/celebrating_1_billion_download.html</span></a></li>
<li><a href="http://www.apple.com/pr/library/2009/09/28appstore.html"><span>http://www.apple.com/pr/library/2009/09/28appstore.html</span></a></li>
<li><a href="http://bits.blogs.nytimes.com/2009/05/27/google-expect-18-android-phones-by-years-end/"><span>http://bits.blogs.nytimes.com/2009/05/27/google-expect-18-android-phones-by-years-end/</span></a></li>
<li><a href="http://www.census.gov/compendia/statab/tables/09s1118.pdf"><span>http://www.census.gov/compendia/statab/tables/09s1118.pdf</span></a></li>
<li><a href="http://macdailynews.com/index.php/weblog/comments/16715/"><span>http://macdailynews.com/index.php/weblog/comments/16715/</span></a></li>
<li><a href="http://www.theapplicationfarm.com/2009/07/just-how-much-does-a-lite-version-help-boost-sales/"><span>http://www.theapplicationfarm.com/2009/07/just-how-much-does-a-lite-version-help-boost-sales/</span></a></li>
<li><a href="http://blog.flurry.com/bid/19375/iPhone-App-Store-Marketing-Give-it-Away-to-Get-Paid"><span>http://blog.flurry.com/bid/19375/iPhone-App-Store-Marketing-Give-it-Away-to-Get-Paid</span></a></li>
<li><a href="http://www.colormatters.com/market_whycolor.html"><span>http://www.colormatters.com/market_whycolor.html</span></a></li>
</ol>]]></content:encoded>
            <pubDate>Tue, 13 Oct 2009 21:32:33 GMT</pubDate>
        </item>
        <item>
            <title>When Commodity Makes Sense</title>
            <link>http://omniti.com/seeds/when-commodity-makes-sense</link>
            <guid>http://omniti.com/seeds/when-commodity-makes-sense</guid>
            <description><![CDATA[We&#8217;d all like to spend as little money as possible to get the performance we desire from our computing hardware.  When the term &#8220;commodity&#8221; is used in relation to computing, it typically refers to products that are mass-produced and w...]]></description>
            <content:encoded><![CDATA[<p>We&#8217;d all like to spend as little money as possible to get the performance we desire from our computing hardware.  When the term &#8220;commodity&#8221; is used in relation to computing, it typically refers to products that are mass-produced and widely available, with little to distinguish them other than price.  This is in contrast to &#8220;enterprise&#8221; hardware &#8212; specialized, vertically-integrated product lines such as Sun SPARC and IBM POWER that target a narrower slice of the computing market and differentiate themselves much more on features than on price.  When I say &#8220;commodity&#8221;, I don&#8217;t simply mean standardized hardware, I mean the cheapest, lowest-common-denominator gear that gets the job done. There are legitimate use cases for both types of hardware, but as with all computing solutions, there are tradeoffs that must be understood to make the wisest choices.</p>

<img alt="Disk Array" src="http://images.omniti.net/omniti.com/i/b/array.png" width="500" height="201" />

<p>The choice to design an architecture with commodity hardware in mind comes with some enticing benefits. First, instead of one expensive widget, I can afford a bunch of cheaper widgets and spread out my work load among them, which also helps isolate failures and improves the overall continuity of service to my customers. Second, it allows me to scale my solution as the demands of the business grow.  Third, money saved by avoiding pricey hardware is freed to be spent in other areas.</p>

<p>Data storage is one market where there is a stark difference between enterprise and commodity hardware.  The first time I heard the term <abbr title="redundant array of inexpensive disks">RAID</abbr>, I learned that the &#8220;I&#8221; stood for &#8220;Inexpensive&#8221;. Later, I discovered that it is often given as &#8220;Independent&#8221;. Both make sense in context, but it seems now that the former meaning has been lost when 15K-rpm drives are <em>de rigueur</em> and sit at the top end of the price range.  Lowly 7200-rpm or even 5400-rpm SATA drives occupy the low end.  This is but one area of computing where commodity hardware is not seen as capable of matching more expensive, enterprise hardware. However, a holistic systems approach reveals that there are plenty of places where commodity drives make sense in terms of delivering on business goals and ensuring a high quality of service.</p>

<p>At the high end, dedicated storage arrays with custom hardware controllers filled with 15K-rpm drives define enterprise storage. These are speed demons; the high spindle speed means low latency (around 2ms, compared to 4-5ms for 7200 rpm drives). To drive latency even lower, some arrays use a technique called &#8220;short-stroking,&#8221; which utilizes only the outermost area of each disk platter, minimizing the distance that the heads must move. Such luxury comes with a steep price. Short-stroking reduces the usable space of each drive, requiring more drives for a given amount of storage. Not only that, but 15K drives max out at 300GB, so obtaining the kind of storage sizes required by large enterprises requires entire cabinets full of disk shelves. 15K drives are also power-hungry, hot-running monsters. At a time when electricity costs and carbon footprint are increasingly important concerns, that makes the luxury of high performance ever more expensive, with costs outstripping the performance gained by adding more spindles.</p>

<p>By contrast, a commodity storage approach seeks to maximize storage space and minimize costs, both initial and ongoing.  The same budget can buy more spindles to keep latencies down and <abbr title="Input/Output Operations Per Second">IOPS</abbr> up.  These additional spindles can fit within the same (or less) space and power budget as well.</p>

<p>The commodity storage picture is not all rosy.  Enterprise drives are manufactured to withstand the higher level of vibration that comes with putting a lot of drives into the same chassis.  They also have lower bit-error rates and higher <abbr title="Mean Time Between Failures">MTBF</abbr> than their commodity cousins.  Simply put, commodity drives fail more often.  &#8220;Failure&#8221; could be anything from silent data corruption to outright mechanical malfunction.  Commodity drives also don&#8217;t spin as fast &#8212; 7200 rpm at most.  That increases latencies, especially on random read loads.  Any solution that employs commodity drives must account for these realities and work around them.</p>

<p>Thanks to some excellent, disruptive technology from Sun, namely <a href="http://www.opensolaris.org/os/community/zfs/"><span>ZFS</span></a>, I can design a system around inexpensive, 7200 rpm drives, bolstered by a few <abbr title="solid-state disk">SSD</abbr>s, which provides more capacity with fewer spindles and <a href="http://blogs.sun.com/brendan/entry/test"><span>improves read/write latencies</span></a> far beyond the capabilities of short-stroked 15K drives. ZFS features such as end-to-end checksums, guaranteed on-disk consistency, intelligent prefetch, and immense scalability make it a good fit for my motley array of cheap disks. If I run low on space, adding more disks is extremely simple and cost-effective.  Likewise, features like <a href="http://www.opensolaris.org/os/community/zfs/demos/selfheal/"><span>self-healing data</span></a> and <a href="http://blogs.sun.com/bonwick/en_US/entry/smokin_mirrors"><span>top-down, metadata-driven resilvering</span></a> mean that silent data corruption and device failures don&#8217;t have to be the ulcer-inducing events they once were.</p>

<p>Storage is just one example of how some careful thought during the design process can yield significant savings during implementation.  The same theory applies to entire server farms when deploying web applications.  An application that is designed to scale horizontally can be run on a large number of cheap servers rather than a few very expensive ones.</p>

<p>Sometimes it <em>doesn&#8217;t</em> make sense to go the commodity route. It is as important to know when you can scale horizontally as it is to know when you should not. An application may require more engineering to rearchitect it to scale out than it would cost to buy a larger single machine to scale it up. I&#8217;m not just talking about high-end RISC gear either &#8212; some relatively large x86-64 configurations are possible. For example, the upper end of <a href="http://www-03.ibm.com/systems/x/hardware/enterprise/index.html"><span>IBM&#8217;s System x</span></a> line can be configured with up to four 4U chassis linked together in an 8-socket, 48-core, 128-DIMM system. That&#8217;s a monster box (cue the Tim Allen grunts). If you can run your app today on one machine, <em>and</em> you can plan its growth to fit into a monster box, then the cost-effective approach may just be to use the larger machine.</p>
            <pubDate>Mon, 21 Sep 2009 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title>YSlow! to YFast! in 45 minutes.</title>
            <link>http://omniti.com/seeds/yslow-to-yfast-in-45-minutes</link>
            <guid>http://omniti.com/seeds/yslow-to-yfast-in-45-minutes</guid>
            <description><![CDATA[The web is a complex beast.  There are many moving parts involved in delivering a complete web application today.  For a significant portion of my career, I have focused primarily on the architecture and implementation of the parts that an end-user nev...]]></description>
            <content:encoded><![CDATA[<p>The web is a complex beast.  There are many moving parts involved in delivering a complete web application today.  For a significant portion of my career, I have focused primarily on the architecture and implementation of the parts that an end-user never sees.  Racks, servers, databases, switches, routers and load-balancers; the list goes on, but you get the point.  The goal of such an architecture, of course, is to receive a user&#8217;s HTTP request and construct and return a complete result as quickly as possible.  To say that there are &#8220;a lot of moving parts&#8221; in today&#8217;s web architectures is an understatement &#8212; they are beasts.</p>

<p>What makes this even more complicated is that once you spew forth the result to the end-user you have a daunting set of user-perceptible performance issues remaining to be addressed.  This performance challenge happens in a hostile environment: one we do not control (the user&#8217;s computer) over a long-haul network we do not control driven via a browser we did not select &#8212; a daunting challenge indeed.</p>

<p>We are very fortunate to have excellent tools at our disposal with which to tackle this challenge.  Two of my favorites are Yahoo&#8217;s <a href="http://developer.yahoo.com/yslow/"><span>YSlow!</span></a> and Google&#8217;s <a href="http://code.google.com/p/page-speed/"><span>Page Speed</span></a> tools.  Both are extensions to the most excellent <a href="http://getfirebug.com/"><span>Firebug</span></a> add-on for <a href="http://www.mozilla.com"><span>Mozilla&#8217;s Firefox web browser</span></a>.  Both tools will help you dissect the various aspects of the content you deliver to end-users and understand how each bit will contribute to perceived slowness.  On the web (as in most other things in life), perception is king.  A user&#8217;s perception drives their response.</p>

<h2>Irony: the not-so-delicious kind.</h2>

<p>I recently attended the <a href="http://en.oreilly.com/velocity2009"><span>Velocity conference</span></a> and the first workshop I attended was Steve Souders&#8217; excellent presentation on <a href="http://en.oreilly.com/velocity2009/public/schedule/detail/8807"><span>Website Performance Analysis</span></a>.  Steve Souders is the original author of YSlow!, which I use on a daily basis.  Steve used YSlow! to show how to analyze website performance (as one might have assumed from his workshop title).  I popped open YSlow! on our corporate website and&#8230; horror!</p>

<p>While OmniTI has enormous breadth in the Internet space, we are primarily known as an Internet performance and scalability company. This made the fact that we received an F on YSlow! all the more embarrassing. This was a case of the right hand not knowing what the left hand was doing &#8212; something we evangelize against. I decided that I would fix that and aim to do it by the end of Steve&#8217;s presentation.  A play-by-play follows.</p>

<h3>No Expires Headers.</h3>

<p>It turns out that our images, JavaScript, and CSS didn&#8217;t have Expires headers.  Our CSS is in a directory /c/, our JavaScript is located in /js/, and all our images are in /i/.  I could set expiry by content type, but a location-based approach gives me the flexibility of serving dynamic/uncacheable content with those content types if I choose to later:</p>

<pre><samp>
&lt;Directory "/www/sites/omniti.com/www/i"&gt;
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
&lt;/Directory&gt;
&lt;Directory "/www/sites/omniti.com/www/c"&gt;
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
&lt;/Directory&gt;
&lt;Directory "/www/sites/omniti.com/www/js"&gt;
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
&lt;/Directory&gt;
</samp></pre>
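
<p>For comparison, the content-type approach mentioned above might look something like the following.  This is only a sketch, not what we deployed: it assumes mod_expires is loaded, and the list of MIME types is a guess at what a typical site serves, so adjust it to match your own responses.</p>

<pre><samp>
# Hypothetical alternative: expire by content type instead of by directory
ExpiresActive On
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
</samp></pre>

<p>The trade-off is the one described above: anything dynamic that happens to be served with one of these types would also be marked cacheable for a month.</p>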

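<h3>Using ETags.</h3>

<p>ETags are on.  This isn&#8217;t really a problem in and of itself, but some of our static content can be served by multiple machines, and the default ETag in Apache is based in part on the file&#8217;s inode.  The same file therefore gets a different ETag on each machine, which defeats conditional requests whenever a client lands on a different server, so I turned ETags off for static content:</p>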

<pre><samp>
&lt;FilesMatch "\.(js|css|gif|png|jpe?g)$"&gt;
  FileETag None
&lt;/FilesMatch&gt;
</samp></pre>
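
<p>If you would rather keep ETags, one alternative is to build them only from attributes that can be kept identical across machines.  The following is a sketch rather than what we deployed, and it assumes that your deployment process preserves file modification times on every server:</p>

<pre><samp>
&lt;FilesMatch "\.(js|css|gif|png|jpe?g)$"&gt;
  # Leave the inode out of the ETag; derive it from mtime and size only
  FileETag MTime Size
&lt;/FilesMatch&gt;
</samp></pre>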

<h3>Uncompressed content.</h3>

<p>This is even easier.  We run Apache 2.2, so:</p>

<pre><samp>
AddOutputFilterByType DEFLATE \
       text/html text/plain text/xml \
       application/javascript text/css
</samp></pre>
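
<p>One caveat: the filter only applies to responses whose Content-Type matches one of the listed types.  If your server labels JavaScript with one of the older MIME types, a broader list along these lines may be needed.  Treat it as a sketch and check what your server actually sends before copying it:</p>

<pre><samp>
AddOutputFilterByType DEFLATE \
       text/html text/plain text/xml text/css \
       application/javascript application/x-javascript text/javascript
</samp></pre>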

<h3>No CDN.</h3>

<p>We have a fast CDN-like caching layer residing at s.omniti.net that we can leverage&#8230; so I flipped all the images over to that.  Technically, this is cheating because you have to add s.omniti.net to the YSlow! configuration to be recognized as a CDN.  I was pleased to learn that even without formally moving the images to a known CDN, we still moved to an A rating in YSlow!</p>

<h3>Assets served from a domain with cookies.</h3>

<p>The move of all static assets to s.omniti.net resolved this issue.  This goes to show that even if you don&#8217;t have a CDN, simply putting your static assets in a different domain (that has no cookies) can considerably improve performance in two ways: (1) it allows for more concurrency on the network layer, since browsers limit the number of simultaneous connections per hostname, and (2) it reduces the upstream payload for quicker requests, since no cookies are sent along with each request.</p>
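
<p>If you have no caching layer to lean on, a separate static hostname can be approximated with an ordinary virtual host.  The following is a minimal sketch under several assumptions: static.example.com is a placeholder hostname, the DocumentRoot is hypothetical, and mod_headers and mod_expires are loaded.  It is not a description of how s.omniti.net is built.</p>

<pre><samp>
&lt;VirtualHost *:80&gt;
    # Placeholder hostname and path; substitute your own
    ServerName static.example.com
    DocumentRoot "/www/sites/example.com/static"
    # Make sure nothing served from this host ever sets a cookie
    Header unset Set-Cookie
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
    FileETag None
&lt;/VirtualHost&gt;
</samp></pre>

<p>The hostname needs to live on a domain for which no cookies are set; a subdomain will not help if your application scopes its cookies to the parent domain.</p>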

<h2>The result?</h2>

<p>A noticeably faster web site in under 45 minutes.</p>

<p>Before I fixed things up, it took 486ms to render (over the conference Internet connection).</p>

<a href="http://omniti.com/i/b/yslow-visit1.png"><img alt="yslow-visit1-small" src="http://images.omniti.net/omniti.com/i/b/yslow-visit1-small.png" /></a>

<p>After a bit of work, I was able to drop the time-to-render to 315ms over the same link.  That&#8217;s a 35% reduction and it almost drops the page load time down into the &#8220;so fast it doesn&#8217;t matter&#8221; arena.</p>

<a href="http://omniti.com/i/b/yfast-visit1.png"><img alt="yfast-visit1-small" src="http://images.omniti.net/omniti.com/i/b/yfast-visit1-small.png" /></a>

<p>There are several things I&#8217;d like to do that would further improve page load/render times.  The JavaScript used could be consolidated into a single JS file (aside from the web analytics parts).  The CSS could also be consolidated from two files to one.  On our <a href="http://omniti.com/is"><span>about page</span></a> we have thumbnail photos of all our staff; they are all the same size, so we could easily combine them into a single image and use CSS sprites, which would dramatically improve the perceived performance of that page.</p>

<p>Some things we did right?  Our search is wicked fast as we pull the results in via AJAX and make a single DOM manipulation to visualize them.</p>

<h2>Next steps.</h2>

<p>Go fix your site.  Make it faster.  Make the web a better place.  It took me 45 minutes to make a significant positive impact.  Granted, if I didn&#8217;t know your web application, or if it were more complicated than our corporate site (which I believe all are), it would take a bit longer.  It&#8217;s worth it.  Do it.  Or <a href="http://omniti.com/does/scalability-and-performance"><span>hire us to do it</span></a>.</p>
            <pubDate>Tue, 07 Jul 2009 13:30:00 GMT</pubDate>
        </item>
        <item>
            <title>What is Web Operations?</title>
            <link>http://omniti.com/seeds/what-is-web-operations</link>
            <guid>http://omniti.com/seeds/what-is-web-operations</guid>
            <description><![CDATA[The field of web operations is one with which I am intimately
familiar.  For the last twelve years, I have immersed myself in this
field and have had the distinct privilege of helping define it.  Even
now, writing a job description for a web operations...]]></description>
            <content:encoded><![CDATA[<p>The field of web operations is one with which I am intimately
familiar.  For the last twelve years, I have immersed myself in this
field and have had the distinct privilege of helping define it.  Even
now, writing a job description for a web operations specialist is
nearly impossible and when I speak with colleagues about what web
operations truly is, we all seem to articulate things differently.  I
wrote an article a little over a year ago after attending the first
O&#8217;Reilly Velocity Summit.  I now sit in a hotel room preparing
my workshop for delivery at the second annual Velocity conference and
realize very little has changed.  While I still believe the definition
of web operations is in flux, I truly appreciate a forum in which it
can be explored further.  I strongly encourage anyone in the Bay Area
to swing by and partake.</p>

<p>While attending the summit that helps plan this conference, I had two
epiphanies:</p>
<ol>
<li>a realization of the lack of a career path for people who do what
we do (no standard titles, no standard roles and responsibilities and
certainly a lack of sex appeal);</li>
<li>a clear lack of terminology for the technology requirements
that are so common in these environments.</li>
</ol>
<p>Terminology is easy, in my opinion &#8212; you just argue until
someone wins.  Of course, arguing is a hobby of mine, so I am biased.
On the other hand, defining an industry-accepted career path is
hard.</p>

<h2>The Career: Web Operations</h2>
<p>The term <a href="http://en.wikipedia.org/wiki/Web_operations"><span>Web
Operations</span></a> was used a lot during this event.  While it is not
awful, I really do not like this term.  The hard part is that the
captains, superstars, or heroes in these roles are multidisciplinary
experts.  They have a deep understanding of networks, routing,
switching, firewalls, load-balancing, high availability, disaster
recovery, TCP &amp; UDP services, NOC management, hardware
specifications, several different flavors of UNIX, several web server
technologies, caching technologies, several databases, storage
infrastructure, cryptography, algorithms, trending and capacity
planning.  The issue: how can we expect to find good candidates that
have fluency in such a nimiety of technologies?  In the traditional
enterprise, you have architects who are broad and shallow and their
team of experts who are focused and deep.  However, the
expectation is that your &#8220;web operations&#8221; engineer be both
broad and deep: fix your gigabit switch, optimize your MySQL database
and guide the overall architecture design to meet scalability
requirements.</p>

<p>I struggle with this.  Not everyone can be a superstar.  More
importantly, no one can really start as a superstar.  If we use an
apprentice model (which is common in industries without institutional
support) we limit the total number of able workers in this field.  So,
how do we (re)define the requirements for a junior web operations
person?</p>

<p>We have to have a plan for hiring on people and progressing them
through a career path to make this a legitimate discipline.  During
conversation, one of my colleagues said they just hire people that
they think are agile &#8212; &#8220;If I tell them to know IOS well
enough to configure a router and troubleshoot a problem, I expect them
to show up tomorrow with a basic understanding of IOS and ready to
start typing in commands at a console.&#8221; I agree this sort of
&#8220;no boundaries&#8221; attitude is required for the job, but
where do you start?</p>

<p>Another person mentioned that the lack of sex appeal in
the position was due to popular attitude.  Many people apply for
development positions and &#8220;don&#8217;t quite make the cut&#8221;
and are instead offered system administration positions.  I personally
don&#8217;t subscribe to this philosophy and we certainly do not operate
like that at <a href="http://omniti.com/"><span>OmniTI</span></a>, but I have seen
it in other companies &#8212; I hope it is not prevalent.</p>

<p>Basically, this is one of the few positions in the organization that
has no boundaries of responsibility.  If something breaks,
it <em>is</em> your problem.  Why isn&#8217;t this the case throughout
the organization &#8212; why is it that even the most junior of
developers doesn&#8217;t wake up to fix their code when it breaks and causes
service degradation in the middle of the night?  It is uncommon that
this level of responsibility is expected of developers, while it is a
quite common expectation of the operations crew.</p>

<p>Circling back, I really do not like the term &#8220;web ops.&#8221; I
realize it is not far off, but it isn&#8217;t sexy.  Google has a few
different roles with this level of responsibility.  One I like is
called: &#8220;Site Reliability Engineer.&#8221; However, I would like
a set of job titles and a progression through them that makes this an
appealing career path for young, ambitious geeks.</p>

<p>In order to define these roles, we should think about what they are
responsible for.  In our organization I see this as a few things:</p>

<h3>Junior</h3>
<p>On the junior level, they are responsible for learning.  They are
responsible for deploying new services and documenting such
deployments.  They are responsible for instrumenting deployments to
make sure that faults are detected and trending is possible.</p>

<h3>Mid-level</h3>
<p>On the mid-level, they are responsible for all of the above, and more.
Effective and complete troubleshooting of failures.  Making sense of
trending information.  Understanding work loads that exist.  Tuning
systems to better accommodate current workloads and proactive tuning
to handle known future workloads.  One of the key differences between
mid-level and junior is the ability to correctly prioritize
remediation of issues during incident response.  Staying calm,
collected and executing with clarity of thought during an emergency.</p>

<p>What does &#8220;complete troubleshooting&#8221; mean?  I mean
troubleshooting without boundaries.  I want no shyness in cracking
open developer code and telling them what they did wrong and
why. Finger pointing at people simply doesn&#8217;t work; you have to
point your finger at implementation problems, not people.  To do that
requires the skill to track a performance problem or reliability issue
down to a specific line of code or approach.</p>

<h3>Senior</h3>
<p>On the senior side, technology research and selection is a must.
Additionally, they are responsible for incorporating new technologies in the architecture to improve availability and reduce costs, constantly analyzing systems to
improve efficiency and capacity planning to understand growth well
enough to ensure provisioning and deployment outpace need.  Donald
Knuth famously said that premature optimization is the root of all evil;
I&#8217;ve long said that the ability to accurately determine what is
premature separates senior from junior.</p>

<p>One of the core responsibilities that all engineering disciplines share is
assessing the appropriateness of the technologies at hand.  For example,
a &#8220;Web Architect&#8221; must ensure that
technology selection as well as development and deployment strategy
match the business need.  This is &#8220;hard.&#8221;</p>

<h2>Above and Beyond</h2>
<p>Web operations is a special role.  This role is in no way fitting for failed
developers; it is for developers/engineers who have outpaced their
career path.  They have a deep understanding of how things work:
&#8220;a complete systemic view of general site architecture.&#8221;
But they also want <b>more responsibility</b>; they want to make sure
that <b>all of it works all of the time</b>: the app, the stack, the
hardware, the network.  Whatever technology the business needs, it
must work, it must perform and it must be able to meet demand.
Lastly, in their heart of hearts, they must believe that all problems
are equal in their need for resolution and problem prioritization is
dictated by business impact and not by flights of fancy (how cool or
interesting the problem is).</p>

<p>It is an impossible job requirement: &#8220;Knows everything about all
technologies deployed in Internet architectures.&#8221; While no one
fills this requirement, what I want is someone whose career goal is to
find out how close they can get.</p>]]></content:encoded>
            <pubDate>Mon, 22 Jun 2009 00:39:00 GMT</pubDate>
        </item>
        <item>
            <title>Concepts of Cloud(ish) Storage</title>
            <link>http://omniti.com/seeds/concepts-of-cloudish-storage</link>
            <guid>http://omniti.com/seeds/concepts-of-cloudish-storage</guid>
            <description><![CDATA[It&#8217;s rare that I write an article simply to educate.  Most of
the time I am attempting to articulate or justify a position, or
simply rebutting someone&#8217;s nonsensical yammering.  For a
refreshing change, I thought I would take some time to e...]]></description>
            <content:encoded><![CDATA[<p>It&#8217;s rare that I write an article simply to educate.  Most of
the time I am attempting to articulate or justify a position, or
simply rebutting someone&#8217;s nonsensical yammering.  For a
refreshing change, I thought I would take some time to educate you on
the fundamentals of large-scale data storage.  Many people think of
&#8220;storage as a service&#8221;
(now being called &#8220;cloud storage&#8221;) as a magic
black box.  At the end of the day, it is just bits on disks.  And like
all things, if you use enough of it, you can more than cover the cost
of managing it yourself by simply eliminating your vendor&#8217;s
margin (insourcing).</p>

<p>There are more and more services providing outsourced storage.  The
concept is simple: you upload a digital asset to the vendor (via some
sort of API or tool), they return an identifying key of some sort
(sometimes this key is provided by you, the uploader) and they store
the asset for you.  To retrieve the asset, you use a similar method to
the one used to upload.  In the simplest terms, you can think of it as
a mapped network drive to which you can save assets, and later
reconnect to retrieve them.</p>

<p>By no means is this new technology.  However, the burden of managing
one&#8217;s own storage, combined with growing space requirements and
fear of loss due to lack of redundancy, has driven people to want to
make this particular problem someone else&#8217;s.  Making this choice
&#8212; to solve the problem yourself or to outsource &#8212; is
always the outcome of several factors: cost, convenience, and
safety.</p>

<h3>Redundancy: The basics</h3>

<p>Let&#8217;s take a look at the fundamentals of data storage.  We
all want our data to be safe.  It&#8217;s pretty obvious that storing
exactly one copy of the data isn&#8217;t safe, but it&#8217;s actually
more complex than you would think &#8212; storing two copies
doesn&#8217;t buy you much without taking a few extra steps.</p>

<p>Before we dive in and explore methods for keeping data safe across
systems, we need to realize that one of our fundamental assumptions is
invalid.  We assume that when we write data to a disk, it will have no
errors when we read it back.  <a href="http://indico.cern.ch/getFile.py/access?contribId=3&amp;sessionId=0&amp;resId=1&amp;materialId=paper&amp;confId=13797"><span>This
assumption is fundamentally wrong</span></a>.  There&#8217;s this little evil
thing called a bit error (basically, one of the zeros or ones that was
written came back inverted).  How often this type of error occurs is a
probability called bit error rate (BER).  The <abbr title="bit error
rate">BER</abbr> on modern spinning disks is usually around 10<sup>-13</sup> or
10<sup>-14</sup>.  Basically, for every 1 to 10 terabytes you write, one of the
bits &#8212; when read &#8212; won&#8217;t equal what was written.  A
single erroneous bit might not matter for some types of data, but for
others, such an error could be disastrous. We write a lot of data
these days, and bit errors are silent, so the lesson here is: write
checksums with your data.</p>
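
<p>As a minimal sketch of that lesson, assuming a system with GNU <code>sha256sum</code> (Solaris ships a different digest tool) and two hypothetical helper names, <code>store_asset</code> and <code>verify_asset</code>, a checksum can be recorded next to each asset when it is stored and compared again on every read:</p>

<pre><code>
```shell
#!/bin/sh
# Sketch: keep a SHA-256 checksum next to each stored asset and verify on read.
# Assumes GNU coreutils sha256sum; the function names are hypothetical.

store_asset() {
    # $1 = source file, $2 = destination path
    cp "$1" "$2" || return 1
    # Record the digest of the source bytes, so that a bad write is
    # caught by the first verification of the stored copy.
    sha256sum "$1" | cut -d' ' -f1 > "$2.sha256"
}

verify_asset() {
    # $1 = stored file; succeeds only if its bytes match the recorded digest
    expected=$(cat "$1.sha256") || return 1
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}
```
</code></pre>

<p>Calling <code>verify_asset</code> before serving a file turns a silent bit error into a detectable failure, at which point another copy can be used instead.</p>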

<p>The classic method of ensuring that data is safe is to store
multiple copies on different physical media.  Inside a single system,
this can be accomplished with RAID1 (mirroring), which makes sure all
data is on two physically separate disks.  With a bit of (somewhat)
clever math, we can take that same data, split it into a few pieces
and store each piece on a different drive. We can then calculate a
block of parity data, and store that on an additional drive.
Retracing the same math backwards shows that we can lose any single
disk in the set, and we&#8217;ll still be able to reconstruct our
data.  This is the basis for RAID5.  Sometimes systems need to be
resistant to multiple concurrent disk failures (hence the introduction
of RAID6, which uses an erasure code such as <a href="http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction"><span>Reed-Solomon</span></a>).</p>

<p>None of these scenarios are designed to reduce the risk of data
corruption.  Rather, they were designed to prevent data loss due to
hardware failure of one or more underlying disks.  One issue with
using RAID is the long path from file to disk: files are stored on a
set of drives, each file consists of chunks of data (blocks) that map
to physical blocks of bits on the drives, and somewhere along that
path we could lose our way.  If a specific physical block goes bad, or somehow becomes
unreadable, we can&#8217;t easily map it back to a logical object,
such as a file.  We only find out that there&#8217;s a problem when we
try to read the object.  Another problem with this general technique
is that all of these disks live in a single system and if that system
fails, all of the data is unavailable (or worse, lost).</p>

<p>So, RAID is designed to keep our data somewhat safe within a single
system, but it doesn&#8217;t address system failures.  The most
obvious design is to put all of our information on two systems.  There
are pros and cons with this approach.  On the positive side, once
we&#8217;ve identified which system holds a copy of our asset, we only
need to communicate with that single system to retrieve a copy of the
asset &#8212; simplicity.  The downside here is that we&#8217;ve used
half of our storage as redundancy, and yet if two of our nodes fail,
we&#8217;ve necessarily made unavailable (or permanently lost)
2/(N*(N-1)) of our assets, since with assets spread evenly each of the
N*(N-1)/2 possible pairs of nodes holds that fraction.  With two nodes,
this works out to 100% (of course), and with 10 nodes, it&#8217;s around 2%.</p>

<p>Taking a different approach altogether allows us to use half of our
storage for redundancy, while maintaining dramatically greater
availability.</p>

<h3>Erasure codes</h3>

<p>High availability of assets in light of system failures is achieved
by today&#8217;s peer-to-peer systems.  Their technical description is
clear-cut, yet extremely detailed.  By using erasure codes, these
systems are able to split data into many pieces (similar to RAID5),
but instead of calculating simple parity, they calculate unique
erasure codes.</p>

<p>Imagine we split our data into 5 pieces, and then calculate 5
additional pieces of data, any of which could be used to reconstruct
any of the original 5 pieces were they found to be unavailable &#8212;
these are erasure codes.  So, with the data in 5 fragments + 5 erasure
fragments, we&#8217;ve consumed twice the space but can now stand to
lose any five pieces before the data becomes unavailable and/or lost.
The main drawbacks to such a system are that calculating and
distributing erasure codes is much more complicated than simply
storing two copies of the same data, and that retrieving data requires
contacting at least 5 machines to serve an asset.</p>

<p>This erasure code approach assumes a slightly larger network of
servers.  With two copies and 100 machines we see 99.8% availability
with 5 machine failures. With a 10-fragment (5 data + 5 coded)
scenario, if 5 nodes fail, we maintain 100% availability.  In the
pathological case where 50 of our 100 nodes fail, the two-copy method
would result in an availability of approximately 75.3%, whereas the
erasure code method would achieve approximately 98.7% asset
availability.</p>
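
<p>The two-copy figures can be reproduced with a few lines of arithmetic. This sketch assumes the two copies of each asset land on a uniformly random pair of nodes, so an asset is lost only when both of its nodes are among the failed ones. The function name is hypothetical, and the erasure-code figure is not covered here because it depends on how fragments are placed.</p>

<pre><code>
```shell
#!/bin/sh
# Sketch: percentage of assets still available under the two-copy scheme when
# "failed" of "nodes" machines are down. Assumes each asset sits on a
# uniformly random pair of nodes, so:
#   lost fraction = C(failed, 2) / C(nodes, 2)
#                 = failed * (failed - 1) / (nodes * (nodes - 1))
two_copy_availability() {
    # $1 = total nodes, $2 = failed nodes
    awk -v n="$1" -v f="$2" 'BEGIN {
        lost = (f * (f - 1)) / (n * (n - 1))
        printf "%.1f\n", 100 * (1 - lost)
    }'
}

two_copy_availability 100 5    # prints 99.8
two_copy_availability 100 50   # prints 75.3
```
</code></pre>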

<h3>Back to reality</h3>

<p>In peer-to-peer systems, where clients enter and leave the network
rapidly, the use of erasure codes for high redundancy is quite
necessary.  However, in a datacenter environment, with redundancy on
each system and maintenance windows that we control, the situation is
entirely different.  Controlling the servers, their configuration and
their region of deployment gives us a landscape on which we can build
a sufficiently redundant system with all sorts of advantages.</p>

<p>Reduced system complexity and simple distributed processing are
significant advantages that result from having whole data objects like
images or documents present on a single node.  With this model,
we can offload some computational processing to the nodes that hold the
data and they can act without consuming additional resources such as the
CPU time and network bandwidth required to reconstitute whole objects 
from their distributed pieces.</p>

<p>At the end of the day, a hybrid/adaptive approach between the two
would yield the best outcome.  I see that being the next thing in
distributed storage.  Most of us that are faced with storing large
amounts of data have already thrown traditional filesystems and
POSIX-compliance to the wind and are looking for fresh, more
appropriate solutions to our specific problems.</p>

<p>For now, until these merge, the approach of redundantly storing
whole assets makes the most sense.  It is simple and easy to build,
deploy and administer.  It is also trivially easy to understand and
troubleshoot.</p>]]></content:encoded>
            <pubDate>Thu, 11 Jun 2009 20:16:48 GMT</pubDate>
        </item>
        <item>
            <title>Virtualization, ZFS and Zetaback</title>
            <link>http://omniti.com/seeds/virtualization-zfs-and-zetaback</link>
            <guid>http://omniti.com/seeds/virtualization-zfs-and-zetaback</guid>
            <description><![CDATA[It used to be the case that when you wanted to deploy a new application
you would need to buy new server hardware to host it on. Today, however, there
are many different virtualization technologies to choose from, each allowing
you to have more than one...]]></description>
            <content:encoded><![CDATA[<p>It used to be the case that when you wanted to deploy a new application
you would need to buy new server hardware to host it on. Today, however, there
are many different virtualization technologies to choose from, each allowing
you to have more than one virtual server per physical machine. Virtualization
has a number of benefits: lower cost and reduced power, space, and cooling requirements. Of course,
you need to have a machine powerful enough, but many services, especially
internal ones such as company wikis and instant messaging servers, do not
require the full resources of a physical server, and it makes sense to combine
these using virtualization.</p>

<p>In many cases, web applications can be combined on a single server using
virtual hosting facilities in Apache, but this is an imperfect solution.
Inevitably the situation arises where you have an application that doesn&#8217;t
play well in a virtual hosting situation, be it badly written, or requiring
specific versions of libraries or modules that conflict with another
application. There are also administrative concerns &#8212; anybody who has access
to one application has access to them all.</p>

<p>The virtual hosting method also eliminates one of the biggest benefits of
using virtualization on entire servers. Many virtualization technologies
provide some method of transferring a virtual machine between physical
hardware &#8212; if a particular server is behaving badly, just transfer all the
virtual machines onto replacement hardware with little to no loss of service
and without having to reinstall the operating system/applications.</p>

<p>Here at OmniTI many of our servers run Solaris, giving us two very
powerful features on which we heavily rely when it comes to making
use of virtualization: Solaris containers (Zones), and ZFS.</p>

<h2>Virtualization using Zones</h2>

<p><a href="http://www.sun.com/bigadmin/content/zones/"><span>Zones</span></a> provide
lightweight virtualization for Solaris. Unlike many other virtualization
solutions such as VMware or VirtualBox, Solaris zones don&#8217;t emulate physical
hardware on which several complete operating systems run; rather, there is
one kernel running in the system with multiple partitions (the zones) in
which user programs run.</p>

<p>This type of virtualization doesn&#8217;t force you to pick a set amount of RAM
for each virtual machine, or set up virtual disk images (although <a href="http://www.opensolaris.org/os/community/zones/faq/#rm"><span>resource
    limits</span></a> can be set for each zone). Because there is no hardware
emulation going on, zones are also incredibly <em>fast</em> &#8212; fast enough
that we are able to run multiple production services on a single machine
without any perceptible slowdown. Even for high traffic sites that can
saturate an entire (physical) server, we are still able to make use of zones
(with just one non-global zone per server) without any significant
performance hit. This allows us to benefit from the ease of moving a zone from
one machine to another, either in the event of hardware failure, or to migrate
to a more powerful machine.</p>

<p>Zones can also ease administration of multiple servers by centralizing
package management. By default, any package installed on the global zone is
automatically installed to all non-global zones. You can also specify that
certain paths are inherited from the global zone, reducing disk space
requirements per zone. The inherited paths become read only, forcing them to
be the same across all zones. If all packages are installed
from the global zone, and you make use of inherited paths, then you can be
assured that every zone has the same software configuration.</p>

<p>However, this doesn&#8217;t have to be the case &#8212; you also have the option of
installing packages in the zones themselves if different zones need different
packages installed. To do this, don&#8217;t inherit any directories. This creates a
'large' or 'whole root' zone, and you are free to install whatever is needed
inside the zone itself.</p>

<h2>ZFS and Zones</h2>

<p>ZFS has <a href="http://opensolaris.org/os/community/zfs/whatis/"><span>many
    useful features</span></a> that put it far ahead of most other filesystems that
are available.  Several of them are of particular interest in that they make virtualization better: a pooled storage model, snapshots,
and the ability to transfer filesystems via the <code>zfs send</code>
command.</p>

<p>Pooled storage does away with the idea of having filesystems on individual
partitions, and having to guess how much space will be occupied by individual
filesystems. You just create one pool across the entire disk (or set of disks)
that you want to store your data on. Any filesystems you then create in that
pool will only use up as much space as needed to hold the data.</p>

<p>In practice, this means we can create individual filesystems for each of our
zones without having to worry about how much space to assign to each. Having
each zone on its own filesystem is required to be able to snapshot, backup,
restore and transfer zones individually.</p>

<p>Snapshots give you almost instant point-in-time copies of your filesystem,
each of which takes up only enough space to hold what has changed since the
snapshot was taken. The benefits of this are numerous, including the ability to roll
back to an earlier time and consistent backups (take a backup from the snapshot,
and you won&#8217;t have files being modified while the backup is in progress). From
the point of view of virtualization however, one of the biggest benefits of
snapshots is in combination with <code>zfs send</code>.</p>

<p>The <code>zfs send</code> command allows you to send a snapshot of a ZFS
filesystem from one machine to another (or on the same machine, if you so
desire):</p>

<pre><samp># zfs send data/zones/myzone@somesnapshot | \
    ssh remote_machine zfs receive data/zones/myzone
</samp></pre>

<p>This allows you to quickly move (or copy) a zone from one machine to
another: detach your zone, zfs send the filesystem to another machine, attach
the zone, and you have your zone up and running on a completely different
machine.</p>

<p>You can also make use of incremental snapshots to minimize the amount of
time the zone is down (a zone has to be halted in order to detach it):
snapshot the zone&#8217;s filesystem and send it across while the zone is still
running, shut the zone down, detach it, snapshot the zone&#8217;s filesystem once
more and send the incremental snapshot across.</p>
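
<p>The sequence above can be sketched as a script. This is an illustration under stated assumptions rather than a tested procedure: the snapshot names (<code>@pre</code>, <code>@final</code>) and the zonepath are hypothetical, the dataset and host names are reused from the earlier example, and the script only prints each step unless <code>DRY_RUN=0</code> is set.</p>

<pre><code>
```shell
#!/bin/sh
# Sketch of the low-downtime zone move. Dry run by default: every step is
# only printed. Set DRY_RUN=0 to execute (requires Solaris zones and ZFS).
# Snapshot names and ZONEPATH are hypothetical; adjust to your layout.
DRY_RUN=${DRY_RUN:-1}
ZONE=myzone
FS=data/zones/myzone
ZONEPATH=/data/zones/myzone   # assumes the dataset's default mountpoint
REMOTE=remote_machine

run() {
    echo "+ $*"
    # In real mode, stop the whole script at the first failing step.
    if [ "$DRY_RUN" = "0" ]; then sh -c "$*" || exit 1; fi
}

migrate_zone() {
    # 1. Bulk copy while the zone is still running.
    run "zfs snapshot $FS@pre"
    run "zfs send $FS@pre | ssh $REMOTE zfs receive $FS"
    # 2. Downtime starts: halt and detach, then ship only the changes.
    run "zoneadm -z $ZONE halt"
    run "zoneadm -z $ZONE detach"
    run "zfs snapshot $FS@final"
    # -F rolls the target back to @pre in case it was touched after the copy.
    run "zfs send -i $FS@pre $FS@final | ssh $REMOTE zfs receive -F $FS"
    # 3. Configure the zone on the target from its detached state, attach, boot.
    run "ssh $REMOTE \"zonecfg -z $ZONE 'create -a $ZONEPATH'\""
    run "ssh $REMOTE zoneadm -z $ZONE attach"
    run "ssh $REMOTE zoneadm -z $ZONE boot"
}

migrate_zone
```
</code></pre>

<p>Only the second, incremental transfer happens while the zone is down, so the outage is proportional to what changed since the first snapshot rather than to the size of the zone.</p>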

<p>Until recently there were <a href="http://www.opensolaris.org/os/community/zones/faq/#cfg_zfsboot"><span>issues
    with upgrading zones that live on a ZFS filesystem</span></a>, but this has been
fixed in Solaris 10 10/08 (u6), and Live Upgrade is now supported. There is now
little reason not to use ZFS as the filesystem for zones.</p>

<h2>Backing it all up with <a href="https://labs.omniti.com/trac/zetaback"><span>Zetaback</span></a></h2>

<p>It doesn&#8217;t take much thought to realize that the snapshot/zfs send tools
can also be used to take backups of systems, especially when you make use of
incremental snapshots. At OmniTI we have developed <a href="https://labs.omniti.com/trac/zetaback"><span>Zetaback</span></a>, a backup tool
based on ZFS that automates much of the work of taking and managing backups.</p>

<p>With Zetaback, you specify a list of hosts, the retention policy, and how
often to take a full/incremental backup. Then you just let it go.  It connects
to each host via ssh, scans the host for filesystems to back up, and by
default will back up everything, automatically picking up new filesystems. You
can filter the list using regular expressions if you want to limit what is
backed up.</p>

<p>In addition to taking the backups, Zetaback provides tools to quickly restore ZFS filesystems, view the status
of backups and generate reports showing which filesystems violate the backup
policy (e.g. those that have not had a successful backup in 1 week).</p>

<p>The choice to make use of virtualization is often an easy one; the choice
of which solution to go with is somewhat harder. If Solaris meets the needs of
your applications, then it is worth considering Zones. Combined with the
features of ZFS and Zetaback, they provide a flexible and powerful solution.</p>

<h2>Some real-world numbers</h2>

<p>We&#8217;re a web infrastructure and development shop, so we run a lot of development servers.  Each environment needs the flexibility of its own software selection, including specific versions.  To accommodate that, we run 37 zones on 2 development servers.  Each development server has 8GB of RAM and two dual-core 64-bit AMD processors (in financial terms, about $2300 each).  Our production services (corporate mail, document management, version control, instant messaging, directory services, etc.) all run in zones as well.  For that we have two boxes (just like the development ones) on which 17 zones happily reside.  All of our important services run on a rather small set of machines: easy to manage, cheap to power and cool.  And for our purposes, it is far more efficient than heavy-weight virtualization like VMware ESX.</p>

<p>We&#8217;ve been running this type of light-weight virtualization for over two years now.  We&#8217;re pretty happy with it.  I suggest you give it a whirl.</p>]]></content:encoded>
            <pubDate>Fri, 10 Apr 2009 14:39:27 GMT</pubDate>
        </item>
        <item>
            <title>ORMs Done Right</title>
            <link>http://omniti.com/seeds/orms-done-right</link>
            <guid>http://omniti.com/seeds/orms-done-right</guid>
            <description><![CDATA[Object-Relational Mapper (ORM) systems are one of the most contentious topics in database application development.   Creating an ORM is notoriously perilous, but using them has pitfalls as well.  Most ORMs provide little protection against misuse; the ...]]></description>
            <content:encoded><![CDATA[<p>Object-Relational Mapper (ORM) systems are one of the most contentious topics in database application development.   Creating an ORM is <a href="http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx"><span>notoriously perilous</span></a>, but using them has pitfalls as well.  Most ORMs provide little protection against misuse; the inexperienced developer can easily create an application that unilaterally imposes awkward database design constraints, hammers the database with innumerable queries, and is very difficult to optimize.</p>

<p>ORMs provide an automated link between the application object model and the database model.  Practically speaking, this generally means that tables become classes and rows become objects.  At the most basic level, this provides per-object persistence services.  Most ORMs also handle relationships between objects, turning compositional relationships between objects into foreign key relationships in the database.</p>

<p>In this entry, we present the benefits and pitfalls of ORMs and introduce a new Perl ORM implementation, <code>Class::ReluctantORM</code>.  <code>Class::ReluctantORM</code> is &#8220;a reluctant ORM for reluctant people.&#8221;  Its design goals are to create a framework that is unambitious, scalable, and easily circumvented.  These goals are not so much technological as philosophical; an approach that has value, as we shall see.</p>


<h2>Benefits of ORMs</h2>

<h3>Why developers love them</h3>
<p>Developers see a tremendous productivity gain: persistence no longer has to be hand-crafted into each class.  Since all model classes are using the same techniques, there is a large consistency gain as well.  Developers also get to keep their head in application-model space, without having to shift gears into database-model space.   Context-switching is expensive, and switching between implementation languages can be especially jarring.</p>

<p>Consider getting a list of pirates on a ship:</p>

<pre><code>  my $dbh = DBI->connect(&#8230;);
  my $sth = $dbh->prepare(&lt;&lt;EOSQL);
  SELECT p.* 
    FROM highseas.pirates p
      INNER JOIN highseas.ships s ON s.ship_id = p.ship_id
    WHERE s.name = ?
EOSQL
  $sth->execute('Golden Hind');
  my @pirates;
  while (my $row = $sth->fetchrow_hashref()) {
     push @pirates, Pirate->new($row);
  }
  $sth->finish();
  foreach my $pirate (@pirates) {
     # Do something with $pirate
  }
</code></pre>

<p>Compared to:</p>
<pre><code>  my $ship = Ship->fetch_by_name('Golden Hind');
  my @pirates = Pirate->search_by_ship($ship);
  foreach my $pirate (@pirates) {
     # Do something with $pirate
  }
</code></pre>

<p>The second example is much more legible, and remains entirely in the application model&#8201;&#8212;&#8201;you don&#8217;t have to think about how the database is set up, or how the tables interact.  You don&#8217;t even need to know SQL.</p>

<p>Additionally, most modern ORMs can shield the business logic from limited changes in the database schema (such as table or column renames).  While this sort of change is usually better hidden at the database logical layer using a view, organizations that have restrictive database change policies may appreciate the added flexibility.</p>

<h3>Why leads love them</h3>

<p>Team leads find ORMs to be very useful for several reasons.  The most obvious is the reduced amount of time spent wiring up persistence layers; instead, developers can stay focused on the business problems.  This enables new capabilities. For example, using an ORM, it&#8217;s much easier to knock out a quick prototype in response to an <abbr title="request for proposals">RFP</abbr>, or explore an alternative design.  Additionally, since the amount of SQL is dramatically reduced, developers need not have SQL skills to be productive.</p>

<h3>Why project managers love them</h3>

<p>Project managers like ORMs for many of the same reasons that team leads do.  Because productivity is increased, bids can be lower, or more features can be delivered for the same schedule.  This may lead to more contracts.  The reduced skillset needs of an ORM-based project can also help solve staffing problems.</p>

<h2>Pitfalls</h2>

<h3>DBA Gripe #1: ORMs that dictate DB design</h3>

<p>Some ORMs dictate database design.  These constraints typically center around keys.  Commonly, primary keys are required to be single-column, integer, and auto-incrementing.  Foreign keys are often under the same constraints.  This leads to the proliferation of artificial keys.</p>

<p>Naming conventions are another sore point.  Some ORMs require primary keys to be named <code>id</code>, others require them to be named <code>pirate_id</code>, or even <code>pirate</code>.  Tables may be required to be named in the plural, and one ORM&#8217;s notion of pluralization may not match that of another ORM (e.g. <code>staff</code> vs <code>staffs</code> vs <code>staves</code>).</p>

<p>These constraints are annoying but tolerable if the database design is new and the ORM-based application is the only client.  But in most real-world situations, the ORM-based application is only one consumer of the database.  The schema may be a pre-existing design with several legacy apps already using it.  It is possible to use views and rules to appease the ORM&#8217;s requirements, but that trades the developer&#8217;s productivity gain for a busywork task for the DBA.</p>

<p>Finally, there is the issue of schema ownership.  Both the ORM and the DB know about the database structure.  When a change is needed, where do you make the change?  Some ORMs &#8220;own&#8221; the schema, and will execute DDL to modify the database to match changes in the object model.  Others don&#8217;t own the schema but instead mirror the database schema in a configuration file.  Better ORMs read the database at startup, and configure the object model accordingly (though this has problems of its own, especially related to startup speed).</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;stay agnostic</h4>

<p><code>Class::ReluctantORM</code> is firmly in the &#8220;read the schema from the database on startup&#8221; camp.  Some configuration is still needed: to set up connection handles, declare classes, and create relationships that cannot be auto-detected.  Pushing more of the configuration into auto-configurators, while maintaining overridability, is an active area of development.</p>


<h3>DBA Gripe #2: opaque, baroque query generators</h3>

<p>An ORM, by its very nature, must contain some kind of query generation mechanism.  SQL is an easy language to generate, but a very difficult language to generate well.  There are many dialects.  An ORM may choose to generate standards-compliant (but slow) queries, or it may attempt to optimize for the particular database engine.  As the optimization increases, the query complexity often increases.  Some ORMs choose to punt, generating many simple queries (see DBA gripe #3); others may generate one massive, multi-<code>JOIN</code> query. In either case, at some point you will get the classic complaint that &#8220;the database is slow.&#8221; A DBA wants to be able to tune and replace these queries with hand-crafted versions.  This may or may not be possible.  Even if it is, the query generator is buried in the ORM code itself, in developer-land, and often requires both developer and DBA to invest time to optimize a query.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;query monitors</h4>

<p>It is important that the SQL generation process be as transparent as possible.  To this end, <code>Class::ReluctantORM</code> provides a unique monitoring facility that provides hooks for several key events in the life of a query, including initiation, SQL generation, execution with bound parameters, result fetching, and teardown of the query.</p>

<p>Monitors may execute arbitrary Perl code at any or all of the events.  A monitor may abort a query if needed, or simply log statistics or debugging data.  Monitors may be attached at compile time or runtime, and may be attached to a particular class, or all classes in the model.</p>

<p><code>Class::ReluctantORM</code> ships with six canned monitors: join count, column count, data volume, timing, diagnostics, and one that executes the query under <code>EXPLAIN ANALYZE</code> to predict performance.  The developer is free to add new monitors.</p>


<h3>DBA Gripe #3: hidden expensive actions</h3>

<p>Consider this expression:</p>

<pre><code>  my $jewels = $ship->pirates->first->hideaways->find_by_name('Skull Island')->treasures->first->jewel_count();
</code></pre>

<p>While it won&#8217;t win any awards for formatting, it is fairly clear: get the number of jewels that the first pirate on my ship has stashed away on Skull Island.  It&#8217;s easy to imagine a junior programmer writing this, or a journeyman programmer writing a less contrived example.</p>

<p>Does this code, at first glance, look like it is hammering the database?  How many database queries will this result in?  Depending on the ORM, it may range from 1 to 6. And depending on the database, the 1 query might be better or worse than the 6. In almost every case, the queries involved will pull back more information than they need from the database, so even when the ORM gets the queries right, it&#8217;s still likely to have unnecessary overhead.</p>

<p>Or this common case:</p>

<pre><code>  foreach my $ship (@fleet) {
     foreach my $pirate ($ship->pirates()) {
        foreach my $hideaway ($pirate->hideaways()) {
           foreach my $loot ($hideaway->treasures()) {
              tithe_to_queen($loot);
           }
        }
     }
  }
</code></pre>

<p>That should have scared you.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;mandatory prefetching</h4>

<p>Looking back at this example:</p>

<pre><code>  # 1-6 queries
  my $jewels = $ship->pirates
                    ->first
                    ->hideaways
                    ->find_by_name('Skull Island')
                    ->treasures
                    ->first
                    ->jewel_count();
</code></pre>

<p>This usage is problematic because:</p>
<ul>
  <li>There is no indication that queries are occurring.</li>
  <li>Any performance issues will be detected in production, not in development.</li>
</ul>

<p><code>Class::ReluctantORM</code> does not allow accessors to directly execute queries.  Instead, each accessor looks for a cached value, and returns it if found.  A cache miss throws a <code>FetchRequired</code> exception.  A full-featured prefetching facility is available:</p>

<pre><code>  # One query
  my $ship = Ship->fetch_deep(
    where => 'name' => 'Golden Hind',
    with => {
      pirates => {
        hideaways => {
          treasures => {}
        }
      }
    }
  );
  # Zero queries
  my $jewels = $ship->pirates
                    ->first
                    ->hideaways
                    ->find_by_name('Skull Island')
                    ->treasures
                    ->first
                    ->jewel_count();
</code></pre>

<p>This <code>fetch_deep</code> call executes exactly one SQL <code>SELECT</code> statement, <code>JOIN</code>ing against the related tables.  The results are then processed to create one ship object, which has a collection of pirates, each of which has a collection of hideaways, each of which has a collection of treasures.  This data is now prefetched, and a long, deep chain of method calls like the one above is now permissible.</p>

<p>Importantly, if a programmer adds a method call (say, <code>$pirate->parrots</code>) that is not prefetched, an exception will be thrown the first time it is executed.  The developer will see this immediately in testing, and add the required clause to the prefetch.  This integrates scalability directly into the development process.  This feature, unique to <code>Class::ReluctantORM</code>, is what provides its name: it is <em>reluctant</em> to do database fetches.</p>


<h3>Software engineering: impedance mismatch rabbit hole</h3>

<p>We have a fundamental problem with ORMs: relations aren&#8217;t classes, and tuples aren&#8217;t objects.  This problem is called the &#8220;impedance mismatch&#8221; between the database model and the application model, and is discussed in detail in several places on the Internet.  Some of the more troubling issues include:</p>

<ul>
  <li>Identity&#8201;&#8212;&#8201;Two objects referring to the same record are distinct, though there is only one record.</li>
  <li>Partial fetches&#8201;&#8212;&#8201;Most OOP languages do not have a notion of an object that is only partially populated, but it is perfectly valid (and desirable) to select only a subset of columns from a table.</li>
  <li>Inheritance&#8201;&#8212;&#8201;There is no clear analogue of inheritance in the database world.  Several <a href="http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx"><span>approaches exist</span></a>, but they all have severe drawbacks.</li>
  <li>Caching&#8201;&#8212;&#8201;The object model will be out of date as soon as it leaves the database.  Should results be cached?  How long?</li>
</ul>
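<p>The identity issue is easy to reproduce with any naive mapping layer. The sketch below uses Python&#8217;s built-in <code>sqlite3</code> module and a hypothetical <code>Ship</code> class and <code>fetch_ship</code> helper; it does not represent any particular ORM, and assumes only that each fetch builds a fresh object from the row:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ships (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO ships (id, name) VALUES (1, 'Golden Hind')")


class Ship:
    def __init__(self, ship_id, name):
        self.id = ship_id
        self.name = name


def fetch_ship(ship_id):
    # Every call builds a brand-new object from the same row.
    row = conn.execute(
        "SELECT id, name FROM ships WHERE id = ?", (ship_id,)
    ).fetchone()
    return Ship(*row)


first = fetch_ship(1)
second = fetch_ship(1)

same_record = first.id == second.id  # both describe row 1
same_object = first is second        # yet they are distinct objects

# Renaming one copy changes neither the other copy nor the table.
first.name = "Pelican"
names_diverged = first.name != second.name
```

<p>Here <code>same_record</code> is true while <code>same_object</code> is false, and after the rename the two in-memory copies of the one record disagree with each other.</p>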

<p>ORM developers are faced with a few nasty choices.  Keep it simple, and let the user of the ORM know that the ORM is an unsynchronized, approximate model of the database.  Or gradually add complexity, attempting to patch over the impedance mismatch.  The latter path gets into diminishing returns quickly.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;90% rule</h4>

<p>Some ambitious ORMs try to solve 100% of the object-database problem.  <code>Class::ReluctantORM</code> tries to solve the easiest 90%.  It accepts that the impedance mismatch is a very hard problem: the ORM will do its best, but you still need to be aware of its limitations.  This scope limit helps exclude features that would dramatically increase complexity (for example, there is very little support for aggregates).</p>


<h3>Skill atrophy</h3>

<p>One of the big advantages of ORMs is also a major disadvantage: little or no use of SQL. We learn skills through exposure and experience, and if we are never exposed to SQL, we&#8217;ll never learn it.  Or, if we know some SQL but then use ORMs <em>exclusively</em>, our skills will likely atrophy.  In almost every application, the ORM will need to be bypassed at some point, and then SQL skills will be sorely missed.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;SQL pass-thru</h4>

<p>For the remaining 10% of problems outside the scope of <code>Class::ReluctantORM</code>, several avenues are provided to bypass the query generator and use SQL directly.  Because it was developed in a shop with a heavy mistrust of ORMs, <code>Class::ReluctantORM</code> is designed to make this bypass as easy as possible.  The documentation explains how to bypass the ORM early on.</p>

<p>Avenues of SQL support, ranging from SQL-centric to object-centric:</p>
<ol>
  <li>Ask the ORM-managed object or class for a database handle, and execute statements on it.  Results are in raw values, not part of the object model.</li>
  <li>As above, but wrap this into a method call on an ORM object or class, thus integrating SQL into the object model.  This is handy for aggregate functions.</li>
  <li>Future releases aim to provide the ability to override specific ORM-generated queries with your own SQL.</li>
  <li>Ask <code>Class::ReluctantORM</code> to interpret the SQL into its own representation, and execute it.  If the translation was successful, return values will be ORM-based objects.  This is a <code>Class::ReluctantORM</code>-exclusive feature.</li>
  <li>Write a query directly using <code>Class::ReluctantORM</code>&#8217;s abstract SQL engine.  You&#8217;re no longer writing SQL directly, but performing method calls on <code>FromClause</code> objects, for example.  This is guaranteed to return ORM objects.</li>
  <li>Use ORM methods and pass SQL fragments as arguments (e.g., a <code>WHERE</code> clause for a <code>search()</code> method).</li>
</ol>

<p>In all six cases, the developer must use SQL or SQL concepts.  This may help reduce SQL atrophy.  In many cases, because the SQL can be just &#8220;dropped in,&#8221; you can have a DBA or SQL expert develop SQL for a specific query with no contact with the ORM.</p>
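<p>The first two avenues can be sketched generically. The example below is a Python illustration using the standard <code>sqlite3</code> module, not the <code>Class::ReluctantORM</code> API; the <code>Pirate</code> class, its <code>db_handle</code> and <code>total_jewels</code> methods, and the <code>treasures</code> table are assumptions made up for the example:</p>

```python
import sqlite3


class Pirate:
    """Stand-in for an ORM-managed class (hypothetical, for illustration)."""

    _conn = sqlite3.connect(":memory:")

    @classmethod
    def db_handle(cls):
        # Avenue 1: hand the raw connection to the caller.
        return cls._conn

    @classmethod
    def total_jewels(cls):
        # Avenue 2: hide hand-written aggregate SQL behind a class method.
        row = cls._conn.execute(
            "SELECT COALESCE(SUM(jewel_count), 0) FROM treasures"
        ).fetchone()
        return row[0]


dbh = Pirate.db_handle()
dbh.execute(
    "CREATE TABLE treasures (id INTEGER PRIMARY KEY, jewel_count INTEGER)"
)
dbh.executemany(
    "INSERT INTO treasures (jewel_count) VALUES (?)", [(12,), (30,)]
)

# Avenue 1: results are plain tuples, outside the object model.
raw_rows = dbh.execute(
    "SELECT jewel_count FROM treasures ORDER BY jewel_count"
).fetchall()

# Avenue 2: the same SQL-backed answer, reached through the object model.
total = Pirate.total_jewels()
```

<p>The SQL inside <code>total_jewels</code> could be written by a DBA with no knowledge of the surrounding class, which is the &#8220;drop-in&#8221; property described above.</p>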


<h3>Framework lock-in</h3>

<p>As with any framework, adopting an ORM is often irreversible.  It is very difficult to adapt an application to use a different ORM&#8201;&#8212;&#8201;even if the interfaces are similar, there will often be differences among the query specifications, the DB requirements, or the semantics of operations (e.g. do inserts cascade?).  ORMs are especially susceptible to lock-in because their footprint is so ubiquitous in the application code.  Every time you deal with a relationship between your model objects, you interact with the ORM.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;unsurprising interface</h4>

<p>While there is little that can be done to fight lock-in, <code>Class::ReluctantORM</code> tries to reduce the pain of switching to another ORM, or dropping ORM support altogether, by using common conventions for method names (accessors are named directly after the property, for example).  When a new feature is added, the interfaces of other ORMs are studied, and similar conventions are adopted if possible.</p>


<h3>Sometimes it&#8217;s the wrong tool</h3>

<p>ORMs are not good for everything.  ORMs by their nature are weaker at these tasks:</p>

<ul>
  <li>Reporting and summarization&#8201;&#8212;&#8201;ORMs are good at treating rows as objects.  What happens when a column is an aggregate?  In this case, raw SQL is much more convenient.  Aggregate APIs are often inflexible and complex.</li>
  <li>Anything involving fast startup&#8201;&#8212;&#8201;If the ORM queries the database for its schema at startup, there will be a lag before the ORM is ready.  This isn&#8217;t a problem for long-running processes like web servers, but it can be a burden for command-line scripts.</li>
</ul>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;lots of fish in the sea</h4>

<p>If an ORM isn&#8217;t right for your project, <code>Class::ReluctantORM</code> won&#8217;t help you.  Even if an ORM is a good fit, keep in mind it is a slow-startup ORM.  There are others that are fast-startup, large-configuration ORMs, and even some that can cache their configuration.</p>

<h2>Conclusion</h2>

<p>For all their pitfalls, the tremendous productivity advantages of ORMs will continue to tempt developers to use them.  Like any productivity booster, ORMs seem to draw a lot of hype, and it&#8217;s important to see through the hype to the realities and shortcomings of the technology.  Once those shortcomings have been addressed, however, ORMs can be used conscientiously.</p>

<p><code>Class::ReluctantORM</code> is a new ORM implementation that seeks to make it harder to fall into the traps.  It avoids some &#8220;impedance mismatch&#8221; issues by narrowing its scope to the most common 90% of use cases.  For the more complex situations, numerous SQL bypass avenues are available.  Whether queries are ORM-generated or customized, they all pass through the query monitoring system, providing an early warning system for scalability problems.  Finally, mandatory prefetching can reduce bad coding practices early in the development cycle.</p>]]></content:encoded>
            <pubDate>Wed, 18 Mar 2009 14:49:49 GMT</pubDate>
        </item>
        <item>
            <title>Under the Hood</title>
            <link>http://omniti.com/seeds/under-the-hood</link>
            <guid>http://omniti.com/seeds/under-the-hood</guid>
            <description><![CDATA[My perspective on the evolution of OmniTI is somewhat like that of a mechanic on a team of race car designers. As the company changes and becomes more sophisticated, my job has been and still is to ensure we have all the necessary parts to accommodate ...]]></description>
            <content:encoded><![CDATA[<p>My perspective on the evolution of OmniTI is somewhat like that of a mechanic on a team of race car designers. As the company changes and becomes more sophisticated, my job has been and still is to ensure we have all the necessary parts to accommodate those changes and that they are incorporated into the new design. So while the techies are customizing the machine (the glamorous part of the job), I am busy working under the hood. So what changes have taken place over the years to keep the OmniTI racing machine on the track and way out in front?</p>

<p>In the beginning our headquarters was located in Theo&#8217;s house. It was very nice but small. We built out a secure data room there in order to satisfy the security requirements of one of our clients. It had plexiglass windows, rack space and a 200-pound metal door. We did the build-out ourselves, which was an onerous job, to say the least! It seems mind-boggling to think that we now manage thousands of machines in datacenters all around the globe. From that office we upgraded to a suite of three executive offices in Calverton, Maryland. While staying there we located unfinished office space in Columbia and designed it to accommodate our particular work requirements. We were growing fast and needed office space for our additional staff as well as for meeting with clients.</p>

<p>We started with two people and within three years grew to a staff of four. Our clients required services and support 24/7 so the days were long, as were the nights. Holidays, weekends and vacations were commonly workdays, and working into the wee hours of the morning after a full workday was the norm. The work scaled with the increase of staff from 2 to 4 people, so the workload remained the same across the board. Then we reached a point where we were in complete overload. At that point the engine needed overhauling and refitting to stay in the race. So we interviewed and hired 4 new staff - 3 developers and 1 system administrator. This was a major redesign.</p>

<p>The addition of new staff meant we had to make some serious changes to the shop. We needed an HR department and benefits package that would be both attractive and competitive with other companies&#8202;&#8212;&#8202;yet another upgrade. We also had to have a more comprehensive employment contract and that meant having a labor lawyer to advise us. This was in addition to the corporate counsel who helped us craft our client contracts. The pit crew was growing!</p>

<p>After the move to the new site in Columbia, we immediately increased the size of the team. Before we knew it we were fifteen strong and still growing. During this time we formally defined our product initiatives Ecelerity, Postal Engine and MultiVIP and then proceeded to trademark them. We also began to do business as a separate entity called Message Systems. Now we had two race cars on the track!</p>

<p>As the staff grew we began to understand how instrumental our culture was to our success and started to truly nurture it. This was and remains a unique work atmosphere that permeates the operations of the entire workforce. What exactly is the OmniTI culture? Ask ten people on staff and you probably will get ten different answers. I, on the other hand, have been part of that culture from the get-go and can tell you that, however it is defined, it is the heart and soul of OmniTI and the fuel that feeds the machine. The culture is derived from work principles that were instilled at OmniTI&#8217;s inception. These include providing quality services to meet clients&#8217; needs as the number one priority; standing behind our work and being accountable for our mistakes; being passionate about our work; and using mindshare and brainstorming as working tools. Our office is designed specifically to enable and encourage this sort of interactive environment. As a result, the OmniTI culture has attracted some of the best and the brightest in the industry.</p>

<p>After the Columbia office came a second larger Columbia office, and an office down under the Manhattan Bridge (DUMBO) in Brooklyn, New York. We currently have a staff of 7 working in that office.</p>

<p>What about the crew, you may ask? Over the years we have been fortunate to have a diverse staff representing a collection of ethnic and cultural backgrounds for which we are all richer. And our team has had many sponsors including trade associations, private industry, not-for-profits, political organizations, and government to name a few. As we zoom around the race track we also take time for the occasional pit stop by having pizza every Thursday, spring cookouts (with serious volleyball games), summer picnics and awesome holiday parties!</p>

<p>So as our race cars become more and more sophisticated they continue to require constant attention to maintain all the working parts. Each day brings new adjustments to the engines and frameworks to keep the motors fine-tuned and the going smooth.</p>

<p>I&#8217;ll take this opportunity to share some helpful hints with the other mechanics out there:</p>

<ol>
<li>Stay organized and, despite this digital age, keep hard copies of everything.</li>
<li>Retain legal counsel that understands your business and earns your trust.</li>
<li>Set deadlines for everything.</li>
<li>If you are going to do something, always take the time to understand how to do it right. If you don&#8217;t have time to execute it right, take the time to document the corners you cut and how that is likely to bite you later.</li>
</ol>

<p>Zoom-zoom!</p>]]></content:encoded>
            <pubDate>Thu, 12 Mar 2009 01:48:03 GMT</pubDate>
        </item>
        <item>
            <title>Stacking the Deck for Publishers</title>
            <link>http://omniti.com/seeds/stacking-the-deck-for-publishers</link>
            <guid>http://omniti.com/seeds/stacking-the-deck-for-publishers</guid>
            <description><![CDATA[

Newspapers and magazines have a unique opportunity with online publishing. They have the best content. They have the most talented writers, editors, and managers. The industry has survived everything the world has thrown at it since newspapers first ...]]></description>
            <content:encoded><![CDATA[<img alt="Daily Planet" src="http://images.omniti.net/omniti.com/i/b/458-daily-planet.jpg"  />

<p class="first">Newspapers and magazines have a unique opportunity with online publishing. They have the best content. They have the most talented writers, editors, and managers. The industry has survived everything the world has thrown at it since newspapers first emerged in the 16th century. But making the most of the Web requires specialist help. Whether publications have a dedicated digital team, or integrate print and web together, there is a distinct difference in how editorial, design, and production need to operate. On the one hand, the recent <a href="http://www.thestandard.com/news/2009/01/27/gatehouse-claims-victory-even-its-online-reputation-suffers"><span>furor about &#8220;deep linking&#8221;</span></a>&#8201;&#8212;&#8201;where the legality of sites linking to individual pages was finally put to rest&#8201;&#8212;&#8201;showed the fragility of trying to apply standards that are reasonable in print to the Web. On the other hand, <em><a href="http://telepgraph.co.uk/"><span>The Telegraph</span></a></em> has shown, with its in-house video and audio studios, evolutionary <a href="http://www.flickr.com/photos/lloyd-davis/425838238/"><span>news room</span></a> and <a href="http://advertising.telegraph.co.uk/"><span>integrated advertising solution</span></a>, that a <a href="http://www.journalism.co.uk/5/articles/531141.php"><span>radical content strategy</span></a> can <a href="http://www.journalism.co.uk/2/articles/531631.php"><span>pay</span></a> <a href="http://blogs.journalism.co.uk/editors/2007/10/04/telegraph-wins-top-aop-award-guardian-wins-three-others/"><span>dividends</span></a>. The whole approach for print media on the Web is evolving. It&#8217;s definitely a brave new world, but not everything has changed: content is still king!</p>

<h2>Doom, gloom, flourish!</h2>

<p>As some commentators, fueled by the news of falling readerships and economic woes, prematurely sound the death knell for print, <em><a href="http://newyorker.com/"><span>The New Yorker</span></a></em> published a beautifully researched article by <a href="http://www.history.fas.harvard.edu/people/faculty/lepore.php"><span>Jill Lepore</span></a> (in print and pixel) entitled <em><a href="http://www.newyorker.com/arts/critics/atlarge/2009/01/26/090126crat_atlarge_lepore"><span>Back Issues: The Day The Newspaper Died</span></a></em>. In it, we relive the last time the death of print loomed large in America: November 1st, 1765. On that day the <a href="http://en.wikipedia.org/wiki/Stamp_Act_1765"><span>Stamp Act</span></a> came into force. It required printers to affix a stamp to each of their pages, pay a halfpenny tax on each half sheet of paper, and a two shilling tax on each advertisement. Ostensibly, the tax was imposed in the &#8220;colonies&#8221; by the British Parliament to fund the French and Indian wars, but the backlash was severe. Printers were better placed than most to vent their frustrations. It has been argued that the Stamp Tax was one of the sparks that provoked revolution, bringing the issue of taxation without consent sharply into focus, with the ire and vitriol of printers like Benjamin Edes of <a href="http://www.loc.gov/rr/news/18th/140.html"><span><em>The Boston Gazette</em></span></a>, and Benjamin Franklin thrown in for good measure. They survived that calamity, and some newspapers like <em><a href="http://en.wikipedia.org/wiki/The_Hartford_Courant"><span>The Hartford Courant</span></a></em> are <a href="http://www.courant.com/"><span>still published</span></a> today both in print and pixel.</p>

<p>Jill Lepore&#8217;s article in January coincided neatly with the month the credit crunch came home to roost. It was looming, it was imminent, then suddenly it was here. About the same time we also heard that the Pulitzer Prize-winning <a href="http://www.nytimes.com/2008/10/29/business/media/29paper.html"><span><em>Christian Science Monitor</em> was going online-only</span></a>. The 27-year-old <a href="http://www.nytimes.com/2008/11/20/business/media/20mag.html"><span><em>PC Magazine</em> followed suit soon after</span></a>. Various teen magazines previously had made the switch, including <em><a href="http://www.missbehavemag.com/"><span>Missbehave</span></a></em>, <em><a href="http://www.cosmogirl.com/"><span>Cosmogirl</span></a></em>, <em><a href="http://ellegirl.elle.com/"><span>Ellegirl</span></a></em>, and <em><a href="http://www.teenmag.com/"><span>Teen</span></a></em>. In December and January other publications also went online-only, including <em><a href="http://www.impre.com/hoynyc/home.php"><span>Hoy Nueva York</span></a></em>, <em><a href="http://www.time.com/time/magazine/asia/"><span>AsiaWeek</span></a></em> (now redirecting to <em>Time</em>), and the <em><a href="http://www.kansascitykansan.com/"><span>Kansas City Kansan</span></a></em>.</p>

<p>Far from &#8220;closing&#8221;, as some sources seem to suggest, publications are <em>evolving</em> just as they always have. The shift is perhaps a little faster and more pronounced than previous changes, but only a few titles are swapping print production for online-only. The vast majority of newspapers and magazines are augmenting print with online production. How well the transition occurs is the rub. Many will struggle with a half-hearted approach; others, like <em><a href="http://thelegraph.co.uk/"><span>The Telegraph</span></a></em>, are embracing the change, investing and flourishing. It&#8217;s also worth affirming that a publication doesn&#8217;t have to appear on paper to be worthy, a fact validated by the Pulitzer Prize Board when they announced in December 2008 that they were <a href="http://www.pulitzer.org/new_eligibility_rules"><span>broadening the competition to allow online-only publications</span></a>.</p>

<h2>Content is almost enough</h2>

<p>Print publishers still have the most important advantage: great content. The Web sometimes can seem packed with titillating dross, but people don&#8217;t thirst just for a quick bit of light-hearted refreshment; they also hunger for the substantial. Satisfying both with a content strategy specifically for the Web, putting the right infrastructure in place to support it, and understanding the behavior of the Web&#8217;s audience are the keys to success.</p>
<p><a href="http://advertising.telegraph.co.uk/new%5Ftcuk/"><span><em>The Telegraph</em> is investing in user experience design</span></a> and using it to sell ad space. Get the user experience right and people will become subscribers and readers more readily than they ever have for print. The potential audience is global. The advertising revenue stream is global. The publishers who grasp the nuances of online publishing first will have the competitive edge to evolve into the first truly international news and feature sites. Those who invested early are already ahead: <em>The Guardian</em> <a href="http://www.journalism.co.uk/2/articles/530672.php"><span>launched in America in 2007</span></a>. The paper predicts its <a href="http://www.journalism.co.uk/2/articles/531498.php"><span>podcasts will be profitable by April this year</span></a>. Today, it has an almost equal split of readers, with a third from the UK, a third from the U.S., and a third from the rest of the world.</p>

<p>International audiences have very different requirements from content and advertising. Designing production and technical infrastructure that can deliver location-specific material is the future of truly international publications. The same digital content delivery channels (web site, RSS feeds, email, podcasts, and video) are still relevant. The material and style will respond to the context and the audience receiving it. Hot topics emerge much faster in the new era, and publications need the right technology in place to know what they are and be able to react. Some publishers are already providing location-specific content, and being very sophisticated in how they understand and serve their global audience. Advertisers are taking notice.</p>

<h2>&#8220;Web 2.0&#8221; and all that jazz</h2>

<p>Print publications provide an enacted narrative. Readers start at the cover, then flip, and read. They observe the story told by the publication in a pre-determined sequence, with only the words and images from the staff to tell the tale. In contrast, the narrative of the Web can be both enacted and emergent. The narrative emerges from both the published material of the professionals and the audience contributions. These contributions can be within the web site, on personal blogs aggregated by services like <a href="http://technorati.com/"><span>Technorati</span></a>, or on social networks like <a href="http://twitter.com/"><span>Twitter</span></a>. When <a href="http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html"><span>Tim O&#8217;Reilly coined the term &#8220;Web 2.0&#8221;</span></a>, user-generated content was at the core of his thoughts. Whatever we call it&#8201;&#8212;&#8201;user-generated content, Web 2.0, emergent narratives, or reader contributions&#8201;&#8212;&#8201;it requires an approach that is more sophisticated than just opening up a site to comments. There are many ways in which users can and should be able to reuse content, add their own, and participate. There are also many ways in which publications can pull in content from around the Web to help them tell the tale. In fact, editing such content is a valuable service. You only have to look at the success of sites like <em><a href="http://www.newsvine.com/"><span>Newsvine</span></a></em> and <a href="http://ffffound.com/"><span>Ffffound</span></a> to see how citizen journalism can contribute to the industry. It&#8217;s not like the idea is new.  <cite><a href="http://en.wikipedia.org/wiki/James_Franklin_(printer)"><span>James Franklin</span></a></cite>, editor of <em><a href="http://en.wikipedia.org/wiki/The_New-England_Courant"><span>The New England Courant</span></a></em>, had this to say about his editorial policy just before American independence:</p>

<blockquote><p>I hereby invite all Men, who have leisure, Inclination and Ability, to speak their Minds with Freedom, Sense and Moderation, and their Pieces shall be welcome to a Place in <span class="end-quote">my Paper.</span></p></blockquote>

<h2>Connecting the dots</h2>

<p>The overall objective for newspapers or magazines remains the same as it&#8217;s ever been: a large audience to attract and retain high-paying advertisers. Meeting that objective online means combining the content with the best user experience. User experience is not a term found in the print world. Paper is a static medium. It&#8217;s a passive experience. Letters to the editor have been the traditional interaction of the audience with print publications. The Web is neither static nor passive. It&#8217;s dynamic, with content being syndicated, read, and shared in new ways. It&#8217;s interactive, with content being augmented, reused and commented on as it&#8217;s published. Reader behavior has changed. Expectations have changed. People still want to passively read, but they also want to interact, republish, share, and comment&#8201;&#8212;&#8201;and they want these on demand. They want a different experience. It requires a different kind of strategy that understands the audience expectations and technology, and describes exactly how publications can use both to be successful.</p>

<h3>User experience design</h3>

<p>The first step is to have a clear set of business objectives in a reasonable timescale. Publications have to invest. How they invest is the big question. All the solutions are already available to bridge the gap between business objectives and audience behavior. Understanding audience behavior is the first step on the path to profitability. <em>User experience design</em> delivers just that. Rather than asking questions, it observes behavior. From that we can build a clear picture of how the audience experience can be optimized. That may involve more than just tweaking the design of the interface. It can encompass elements like content strategy: what is published, how it&#8217;s delivered to people, and how it&#8217;s written to achieve the business objectives.</p>

<h3>Web application development</h3>

<p>User experience design is nothing without the right applications to support publishing operations and deliver the content through the various channels. They should actively <em>help</em> journalists, editors, and managers do their job. Applications should make interaction for readers quick, easy and fun&#8201;&#8212;&#8201;an adventure. The software has to be secure. It has to scale well as (hopefully) increasing numbers of visitors find the great content, and keep coming back. The experience has to be fast, safe, and helpful. Anything less is a disservice to the reader in much the same way that badly printed text or art would be in print.</p>

<h3>Internet architecture and infrastructure</h3>

<p>Applications need machines to run on. That means intelligent technical architectures and infrastructure. If the audience is international then the infrastructure needs to be. That means multiple locations. It means twenty-four hour monitoring, often using <a href="https://labs.omniti.com/trac/reconnoiter"><span>tools written</span></a> <a href="https://labs.omniti.com/trac/zetaback"><span>specifically for the task</span></a>. It means rapid reaction times to fluctuations in traffic.</p>

<p>After our recent <a href="http://omniti.com/remembers/2009/two-webby-awards-for-national-geographic"><span>work with the award-winning National Geographic</span></a>, their readership went up by 500%. The applications and infrastructure behind the site also had to handle massive traffic spikes as stories spread virally around the Web. Having infrastructure that can perform when the spikes arrive is almost an art. It can get expensive, and as we all become more <a href="http://omniti.com/seeds/using-less-is-green"><span>conscious about environmental impact</span></a>, performance is the answer. <a href="http://friendster.com/"><span>Friendster</span></a>&#8217;s situation gave us a chance to show <a href="http://omniti.com/helps/friendster"><span>how to scale a site properly</span></a>. They were about to launch in China and predicted they would need twice as many servers to do so. Page loading times were already slow at 9 seconds. With a little help from us, they launched in China with the same number of servers they already had. They doubled the number of users to 60 million, but pages loaded more than twice as fast at 3.5 seconds.</p>

<h2>Stacking the deck</h2>

<p>As good as the content might be, or the perception of the brand in the real world, when <a href="http://www.upi.com/Odd_News/2008/12/05/Obamas_Zune_story_crashes_news_site/UPI-96001228524763/"><span>infrastructure or applications fail</span></a>, or are <a href="http://news.cnet.com/8301-1009_3-10041743-83.html"><span>hacked</span></a> or <a href="http://www.wired.com/politics/law/news/2003/03/58200"><span>threatened</span></a>, the brand can be irreparably harmed in the eyes of readers. How this happens is often straightforward: using different vendors for design, application development, and infrastructure turns the gaps between them into critical fault lines. Under the pressure of success or failure, the fault lines are amplified. For example, vendors have to communicate with each other, often having very different processes, and contractual obligations. Trying to fix a problem, innovate, or improve performance becomes expensive because each vendor only understands their specialist area. No single vendor can see the whole story to find the most efficient solution. While all this is going on, the experience fails and the audience falls away in frustration.</p>

<p>The business case for separating different production areas no longer exists. Business objectives are best met with a holistic approach to design, development, and infrastructure. They all fundamentally affect the user experience, which is the single most important factor affecting visitor numbers on the Web. Combining great content with holistic technology will save money and encourage innovation. Publishers who do it early and do it right will give their readers the best possible experience, and stack the deck in their favor for years to come.</p>]]></content:encoded>
            <pubDate>Thu, 12 Feb 2009 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Custom Trending and the Benefits of Source Code Availability</title>
            <link>http://omniti.com/seeds/custom-trending-and-the-benefits-of-source-code-availability</link>
            <guid>http://omniti.com/seeds/custom-trending-and-the-benefits-of-source-code-availability</guid>
            <description><![CDATA[One of the self-evident truths about system administration is that you need to
know what is going on with your systems. Monitoring - knowing that your
systems are working as expected and, more importantly, knowing when they
aren&#8217;t - is the thing ...]]></description>
            <content:encoded><![CDATA[<p>One of the self-evident truths about system administration is that you need to
know what is going on with your systems. Monitoring - knowing that your
systems are working as expected and, more importantly, knowing when they
aren&#8217;t - is the thing most people consider first when they realize that fact.
Equally important, however, is trending - knowing what your systems were doing
in the past. Trending allows you to determine if the current state of your
system is normal, or if something has changed that could signify a problem.
Trending also allows you to predict when your current systems will become
unable to handle what is expected of them. When the amount of traffic to your
website is about to outgrow your systems, you can see this and add more
capacity, either by adding servers or replacing them with something more
powerful, before the capacity problems start to occur.</p>

<p>At OmniTI, we use a number of systems for trending, including
<a href="http://www.cacti.net/"><span>Cacti</span></a> and our very own
<a href="https://labs.omniti.com/labs/reconnoiter"><span>Reconnoiter</span></a>. Each system comes
with a large number of monitors built in, allowing you to trend anything from
network traffic, to system load to disk space. Sometimes however, you need
metrics for which there is nothing currently available. This is where these
systems&#8217; extensibility comes into play.</p>

<p>The following example shows a situation where we needed information that our
current monitoring/trending systems were not able to provide,
and we needed to extend them with a custom trending solution:</p>

<p>One of our clients had a website that had become very popular, and was
suffering performance issues as a result. We suspected that at least part of
the system was I/O bound, and so we wanted to gather metrics on the I/O
performance of the system over time. The systems in question were running
Solaris 10 with the data on
<a href="http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp"><span>ZFS</span></a>. The
normal <code>iostat</code> command, for which a number of monitors exist, does not give
true values for reads and writes performed by ZFS. Iostat can only see
read/write requests from filesystems. True I/O statistics can be obtained on
the command line by running the <code>zpool iostat</code> command. This works in a
similar way, producing output similar to the following:</p>

<pre><samp># zpool iostat rpool 10 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       16.6G  57.7G      1      0  43.5K  2.06K
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
</samp></pre>

<p>The statement above that our monitoring systems were not able to provide the
information that we needed isn&#8217;t quite true. Elsewhere, we had a monitor that
obtained zpool I/O statistics using a long-running <code>zpool iostat</code> process,
taking the values out and entering them into a database, with a custom script
that fetched the values from the database and entered them into Cacti. The
system in question was a database server, so this method, while clunky, worked
well enough for its purpose. For monitoring the web servers, however, using the
same method just wasn&#8217;t practical and we needed something better. We needed
something that required neither a long-running process nor a database server
on the machine just for trending information.</p>

<p>The obvious choice here was to use <abbr title="Simple Network Management Protocol">SNMP</abbr>. Cacti (as well as pretty much every
monitoring/trending package) has built-in support for obtaining data over
SNMP, and net-snmp (the SNMP agent in use on the server) has various ways of
extending functionality to get custom metrics.</p>

<p>Having chosen SNMP, the next decision was how to get the data we needed and
present it over SNMP. The seemingly obvious choice would be to run <code>zpool
iostat</code> and parse the output as was done previously, presenting those values
over SNMP. However, that requires either the long-running <code>zpool iostat</code>
process, or running it once for a few seconds at a time to get a snapshot of
the I/O over that period, which will lead to inaccurate results (it won&#8217;t tell
us anything about the performance of the system during the time between
checks). One of the things that Cacti (or rather rrdtool, which Cacti makes
use of) is very good at is taking raw data and generating meaningful
statistics from it. If we could somehow get raw I/O values rather than
already aggregated values such as 'n KB read over the past m seconds' and pass
those to Cacti, then Cacti could do the work and we would get accurate values.</p>
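
<p>To illustrate why raw counters are more useful than pre-averaged figures, here
is a minimal sketch in C of the arithmetic a trending tool can apply to two
successive samples of a cumulative counter. The function name
<code>counter_rate</code>, its signature, and the convention of returning -1 when
the counter goes backwards are assumptions made for this illustration; this is
not rrdtool&#8217;s actual code.</p>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Turn two samples of a monotonically increasing counter into an
 * average per-second rate over the interval between them.
 * Illustrative only: the name and error convention are hypothetical.
 */
double counter_rate(uint64_t prev, uint64_t curr, double elapsed_seconds) {
    /* The two samples must have been taken at different times. */
    assert(elapsed_seconds > 0.0);

    if (curr < prev) {
        /* The counter went backwards (for example after a reboot):
         * report "no data" instead of a bogus spike. */
        return -1.0;
    }
    return (double)(curr - prev) / elapsed_seconds;
}
```

<p>With samples of 1000 and 4000 operations taken 300 seconds apart, this gives
an average of 10 operations per second for the whole interval. Because the
counter is cumulative, activity between two checks is still accounted for,
however far apart the checks are.</p>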

<p>Enter open source. The source code to OpenSolaris is available, including the
<a href="http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/zpool/zpool_main.c#1863"><span>source code to the zpool command</span></a>,
which made it possible to see how the <code>zpool iostat</code> command itself worked.  Once
you trace the various calls made by that function, it turns out that
underneath, the <code>zpool iostat</code> command uses libzfs to fetch the exact raw
values we are looking for. It was then a relatively simple matter to take that
code and print out the raw values:</p>

<pre><code>#include &#60;stdio.h&#62;
#include &#60;sys/fs/zfs.h&#62;
#include &#60;libzfs.h&#62;

/*
 * Sample code to demonstrate printing of raw zpool io stats.
 * Compile with: cc -lzfs -lnvpair zpoolio.c -o zpoolio
 */

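/*
 * Callback for zpool_iter(): print the raw, cumulative I/O counters
 * for one pool. A non-zero return value stops the iteration.
 */
int print_stats(zpool_handle_t *zhp, void *data) {
    uint_t c;
    boolean_t missing;

    nvlist_t *nv, *config;
    vdev_stat_t *vs;

    /* Refresh the cached pool configuration so the counters are current. */
    if (zpool_refresh_stats(zhp, &#38;missing) != 0)
        return 1;

    config = zpool_get_config(zhp, NULL);

    /* The statistics are attached to the root of the pool's vdev tree. */
    if (nvlist_lookup_nvlist(config, 
        ZPOOL_CONFIG_VDEV_TREE, &#38;nv) != 0) {
        return 2;
    }

    if (nvlist_lookup_uint64_array(nv, 
        ZPOOL_CONFIG_STATS, (uint64_t **)&#38;vs, &#38;c) != 0) {
        return 3;
    }

    /*
     * Despite the "bps" labels, the byte values are cumulative counts,
     * not rates. Cast explicitly so the arguments match %llu.
     */
    printf(
        "pool:%s read_ops:%llu write_ops:%llu "
            "read_bps:%llu write_bps:%llu\n",
        zpool_get_name(zhp),
        (unsigned long long)vs->vs_ops[ZIO_TYPE_READ],
        (unsigned long long)vs->vs_ops[ZIO_TYPE_WRITE],
        (unsigned long long)vs->vs_bytes[ZIO_TYPE_READ],
        (unsigned long long)vs->vs_bytes[ZIO_TYPE_WRITE]
    );
    return 0;
}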

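int main(void) {
    libzfs_handle_t *g_zfs;
    int ret;

    /* libzfs_init() returns NULL if the library cannot be initialized. */
    if ((g_zfs = libzfs_init()) == NULL) {
        fprintf(stderr, "failed to initialize libzfs\n");
        return 1;
    }

    /* Call print_stats() once for each pool on the system. */
    ret = zpool_iter(g_zfs, print_stats, NULL);
    libzfs_fini(g_zfs);
    return ret;
}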
</code></pre>

<p>Once this was done, the next step was to get the values exported over SNMP so
that Cacti could view them. Net-SNMP has a <code>pass</code> directive that allows you to
delegate an OID to an external program, and have that program print out the
values to be returned. These values are then exported over SNMP, available for any
of the above monitoring tools to make use of.</p>
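
<p>As a rough sketch of what such a program has to produce: for a GET request,
snmpd runs the <code>pass</code> program with <code>-g</code> and the requested OID,
and expects three lines back, namely the OID, a type and the value. The C function
below builds that reply for a single counter. The OID
<code>.1.3.6.1.4.1.99999.1.1</code> and the name <code>pass_get_reply</code> are
hypothetical, chosen only for this illustration, and the sketch leaves out the
GETNEXT (<code>-n</code>) handling that a program serving SNMP walks would also
need.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical OID for the pool's read operations counter. */
#define READ_OPS_OID ".1.3.6.1.4.1.99999.1.1"

/*
 * Build the three-line reply (OID, type, value) for a "-g OID" request.
 * Returns the reply length, 0 if the OID is not one we serve (printing
 * nothing tells snmpd there is no such object), or -1 if the buffer is
 * too small.
 */
int pass_get_reply(const char *oid, uint64_t read_ops,
                   char *out, size_t outlen) {
    int n;

    assert(oid != NULL && out != NULL);

    if (strcmp(oid, READ_OPS_OID) != 0)
        return 0;

    /* The "counter" type is 32 bits wide, so the value wraps at 2^32. */
    n = snprintf(out, outlen, "%s\ncounter\n%lu\n",
                 oid, (unsigned long)(uint32_t)read_ops);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return n;
}
```

<p>A small <code>main()</code> would check that its first argument is
<code>-g</code>, fetch the current counter value, call this function with the OID
it was given, and write the resulting buffer to standard output.</p>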

<p>In Cacti, it was then just a matter of creating an appropriate SNMP Data
Query, adding some Graph Templates, and waiting for the pretty pictures to come
flowing in.</p>

<p>This example shows how you might approach developing code to obtain custom
metrics, and shows the benefits of having the source code available so that
you can learn from tools that do most, but not all of what you are trying to
achieve. Sometimes, what you need just isn&#8217;t available and you just have to build a solution from the available pieces.</p>

]]></content:encoded>
            <pubDate>Tue, 10 Feb 2009 20:53:17 GMT</pubDate>
        </item>
        <item>
            <title>Increasing the Aperture on Security</title>
            <link>http://omniti.com/seeds/increasing-the-aperture-on-security</link>
            <guid>http://omniti.com/seeds/increasing-the-aperture-on-security</guid>
            <description><![CDATA[Security is good. Security is necessary. Security is someone else&#8217;s
concern. Security is for our CISSP engineers to focus on in their dimly lit
rooms with their lava lamps and empty Red Bulls stacked to the
heavens. It&#8217;s an ugly business, a...]]></description>
            <content:encoded><![CDATA[<p>Security is good. Security is necessary. Security is someone else&#8217;s
concern. Security is for our <abbr
title="Certified Information Systems Security
Professional">CISSP</abbr> engineers to focus on in their dimly lit
rooms with their lava lamps and empty Red Bulls stacked to the
heavens. It&#8217;s an ugly business, and making sense of it requires
professionals with years of experience and dog-eared certificates. It
has become an industry unto itself and frightens politicians into adopting
far-reaching policies that stoke the smoldering furnace of paranoiac
innovation. We attend Hacker conferences and Security summits to
behold the latest zero-day vulnerability and poke fingers at the
developers who fail to create secure software.</p>

<p>We are Systems Administrators. We are Web Developers. We are
Database Engineers and Storage Architects and Pointy-Haired Bosses
with hard copies of <a href="http://www.schneier.com/blog/"><span><span>Schneier&#8217;s
blog</span></span></a> littering our desks. We&#8217;ve been told before that Security is
our mandate, but to what end? We have heeded the vendor patch
announcements and updated our servers. We comply
with <a href="https://www.pcisecuritystandards.org/"><span>PCI</span></a>
requirements and pass the Nessus scans. We&#8217;ve studied the
latest <a href="http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project"><span>OWASP</span></a>
Top 10 list and applied its principles to combat the cross-site
scripting attacks and SQL injections. Haven&#8217;t we? Sure we have! Or
have we? Have I, as a Systems Administrator, considered the
implications of the AJAX interface written by the dev team? Has the
<abbr title="Database Administrator">DBA</abbr> considered the impact
of an exploit in the <code>chroot</code>ed webserver?  Have our PHP developers been
in touch with the <abbr title="Storage Area Network">SAN</abbr>
administrator to ensure that he has the capacity to withstand
unlimited file uploads, and what effect that might have on our
encrypted volume?</p>

<p>A secure infrastructure is not a zero-sum game. While applications
on the Internet have become more complex and availability has
increased, so have the attack vectors and their impact on the rest of
the application stack. We&#8217;ve long recognized that an <abbr
title="Open Systems Interconnection">OSI</abbr>-centric approach to
security is a losing proposition. Firewalls are no longer considered a
panacea for network attacks. Modern intrusions exercise multiple
layers of the defense perimeter. Engineering secure applications
becomes akin to a round
of <a href="http://www.hasbro.com/jenga/"><span>Jenga</span></a>; how many pieces
can we lose before the structure collapses? Cohesive Web
Application Security mandates a holistic approach across the entire
engineering organization. But how many of us think beyond the edges of
the envelope that defines our professional skills and aptitude?</p>

<p>Most of us gain our skills through a function-oriented approach. We
learn repeatable steps that culminate in the desired result. Often
this includes a period of trial-and-error where we assimilate aspects
of the misstep and adapt to avoid the failure condition. This is only
natural and is reinforced by our trainers or educators in order to
attain the prescribed goals. However, these tactics can also be used
to seek out the stress points within a system. When you increase your
knowledge of the entire application stack, you reveal "opportunities
for efficiency" by understanding the relationship of each component to
the whole.</p>

<p>Unfortunately, the churn of modern software development fosters a
"feature-first" mentality. We&#8217;re all familiar with the
process. Marketing or Project Management determines the feature set
that will drive purchases and upgrades for the client base. Deadlines
are aligned with revenue cycles rather than product maturity. In the
end, Developers feel the squeeze from Project Managers focused on
their calendars and from Customers, frustrated by another release as
unwitting QA test subjects. Engineers are forced into specialization,
becoming a cog with a narrowed focus. Our aperture begins to close,
decreasing our exposure to the application stack and impeding
interoperability with other project teams and departments. Although
we&#8217;ve entered the Information Age, popular software engineering
practices are still rooted in the assembly line mentality.</p>

<img alt="security across the stack" src="http://images.omniti.net/omniti.com/i/b/split-arch-v-3.jpg" style="float: right; width: 230px; height: 288px; margin: 0 0 1em 1em; border: 0;" />

<p>The birth of secure software requires more than a commitment to
correct code and elegant program design. A sort of Renaissance man is
needed, a polymath who familiarizes himself (or herself) with the
belts and pulleys of neighboring components. Someone who has a passion
for the whole system. Orthogonal studies should become a desired
trait, not a distraction. Database Administrators, Network Engineers,
Java Developers and Systems Administrators working in harmony.</p>

<p>Hackers, in the acceptable sense, are a strange breed. By
definition we enjoy the art of deconstruction. We want to know what
makes things tick. Perhaps it&#8217;s something in our biological blueprint
that drives our thirst for knowledge. We have an insatiable curiosity
for understanding the misunderstood or unknown. An expensive hobby,
for certain. Whether that price comes in the form of relationships,
money or spare parts, it matters not. The knowledge of how that system
works is compelling enough to hold our attention for hours and days
and weeks and years.</p>

<p>Or maybe we just like breaking stuff.</p>

<p>After an attack, it&#8217;s only natural to wonder what motivated the
attacker to focus on us. We comb through the evidence looking for the
exposed stress points. But to focus on the bugs is to ignore the
problem and merely serves to reinforce the broken processes. The same
vulnerability will crawl out of a different hole next time. It will
wait for the next hacker. And they will come. The hacker has already
created new tools to make it easier next time. The hacker is
generous. He likes to share his toys with his hacker friends.</p>

<p>Engineering teams with a foundation in "whole system" design stand
a better chance of resisting and recovering from attacks. By studying
different layers of the application stack they gain an understanding
of the operational complexities and attack vectors. They can predict
vulnerabilities in the design and planning phases. They can isolate
exploits faster and pinpoint failures in unfamiliar regions. New code
becomes inoculated by the changes in philosophy. Junior programmers
and administrators pass these principles on to their peers. A new
Renaissance begins.</p>
]]></content:encoded>
            <pubDate>Wed, 04 Feb 2009 21:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Embracing Failure to Rise Above Enterprise-Class Thinking</title>
            <link>http://omniti.com/seeds/embracing-failure-to-rise-above-enterprise-class-thinking</link>
            <guid>http://omniti.com/seeds/embracing-failure-to-rise-above-enterprise-class-thinking</guid>
            <description><![CDATA[Failures in technical
systems are inevitable.  Drives die, network interfaces wink out,
backhoes take out cross-country backbones, data rooms flood. The effects of such failures range from minor inconveniences to
crippling outages, but thoughtful plann...]]></description>
            <content:encoded><![CDATA[<p>Failures in technical
systems are inevitable.  Drives die, network interfaces wink out,
backhoes take out cross-country backbones, <a href="http://www.youtube.com/watch?v=t0gBReKskXQ"><span>data rooms flood</span></a>. The effects of such failures range from minor inconveniences to
crippling outages, but thoughtful planning can greatly increase the
possibility that the next failure will be the former instead of the
latter.  The fact is that 100% uptime for 100% of users is an
unrealistic goal.  Creating an information technology infrastructure that expects
failures and minimizes user exposure to those failures is critical to
preserving continuity of service to the majority of users.  This is the point of transcendence into carrier-class thinking.</p>

<p>Military planners always
factor in casualties when deciding on a plan of action.  The failure
of an individual component (a soldier, a tank, an airplane) is
expected, but the overall goal will still be achieved.  Many
enterprises do excellent risk management in their business operations
but fail to apply those same principles to their IT infrastructure. 
The mantra of smart investing is diversification; likewise, in the
insurance industry, the goal is to spread the company&#8217;s risk among a
wide population, only a few of whom will actually make a claim in a
given year.  Yet when it comes to IT planning, all the eggs go into
one large (often expensive) basket.  No amount of money can ensure
that a single point of failure will never fail.  That money would be
better spent on engineering around failures, to design systems that
fail gracefully, or that at least fail only partially, limiting the
damage to some subset of users.  Put another way, plan for the
failure of the most critical piece of infrastructure and engineer
service continuity despite that failure.</p>

<p>I like stories.  As a
systems administrator for more than 10 years, I have my fair share of
them, both good and bad, and the really memorable ones have valuable
lessons to teach us about how to construct systems that allow most
users to see little or no interruption to their service.</p>

<p>In my <abbr>ISP</abbr> (Internet Service Provider) days, the company
I worked for had one physical server hosting email.  It was a
relatively large, expensive UNIX server, but it got the job done and
had impressive reliability compared to <abbr title="Personal Computer">PC</abbr> hardware of the day.  As the ISP business expanded, the demand for email grew beyond what the
server could handle, and when the inevitable outage occurred, it
affected every single mail user in the system.  The solution was, of
course, to get more servers.  It was not cost-effective to grow with
more big UNIX servers, due to a number of factors such as rack space
and power, not to mention the capital investment.  We needed more
(and smaller) servers to store the mail, both to absorb our growing
number of users and to reduce the impact of an individual server
going down.  The system we came up with decoupled mail routing from
mail delivery and mailbox access.  This enabled us to deploy
lightweight <abbr>MX</abbr> (Mail Exchanger) servers that didn&#8217;t need much in the way of local
storage, as all incoming mail was delivered to some other host.  The
MX servers were behind a load balancer, so we could scale them
horizontally as required to keep up with demand.  The mail storage
hosts had more local storage, utilizing <abbr>RAID</abbr> (Redundant Array of Inexpensive Disks) to survive disk
failures, and had standby hosts to which all mailbox data was
replicated in case of host failure.  Gluing it all together was a set
of proxy hosts backed by <abbr>LDAP</abbr> (Lightweight Directory Access Protocol) to locate users&#8217; mail
storage host and handle mailbox access.  The directory service was
also used by the MX hosts for inbound delivery, to locate the
appropriate storage host.  Users connected to the proxies instead of
directly to their mail storage host.  We could do quick maintenance
or handle short outages without most customers ever realizing there
was a problem.  For example, <abbr>POP</abbr> (Post Office Protocol) clients checking for new mail would
be given a "no new mail" response when the backing store was
unavailable.    This architecture was much more resilient to
failures, and in the event of a failure (or even a maintenance
event), the existence of the proxy between users and the actual
server allowed us to reduce the users&#8217; exposure to the problem.</p>

<p>The next illustrative story
comes from a client who operates a large email infrastructure
supporting millions of users.  Their mail storage sits on a <abbr>SAN</abbr> (Storage Area Network), implemented on
three expensive, vertically-integrated systems from a major vendor and interconnected on a costly Fibre Channel switching fabric (which is, as ZFS author <a href="http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data"><span>Jeff Bonwick</span></a> puts it, "a network designed by disk firmware writers. God help you.")  The result is a very high ratio of spindles to control
units, so when there is a problem with one unit, that problem affects
one-third of their customers, which could run well over several
million users.  That&#8217;s a lot of eggs in one basket.  The price of the
basket does not guarantee an absence of problems-- the redundant
control heads <i>must</i> run the same firmware version, so a
firmware bug will wipe out both of them.  The cost of the storage
platform is sufficiently high that scaling horizontally becomes
prohibitively expensive, and doesn&#8217;t go very far to address the
spindle-to-control-unit ratio.  What they need is a fundamental shift
in storage planning.  More and cheaper baskets
to hold fewer eggs each, so fewer eggs are lost when a basket fails. 
In this case the baskets are commodity servers and direct-attach
storage running free software and exporting block devices over <abbr>iSCSI</abbr> (Internet Small Computer Systems Interface)
to the servers handling client connections.  For the cost of one of
the vendor-supplied storage systems, we get nine new storage nodes,
each with redundant control heads and data storage.  These nine nodes
provide the same amount of usable space as the three old units, and
have capacity to spare.  The cost savings enables more nodes to be
purchased, and facilitates horizontal scaling to meet demand.  Additional cost savings are realized on the interconnects, which can be standard 10Gb Ethernet.  The
larger number of nodes means a three-fold decrease in the number of
users exposed to a node failure, and future scaling only decreases
this number further.</p>

<p>Turning away from email, my
final story covers data warehousing for a large, web-focused marketing
company.  Their <abbr>OLTP</abbr> (Online Transaction Processing) database that backs the
website runs on <a href="http://www.oracle.com/"><span>Oracle</span></a>.  They need a separate place to run
intensive data-mining queries and transformations that are not appropriate for the
OLTP system, for which the typical solution is an <a href="http://en.wikipedia.org/wiki/Operational_data_store"><span>Operational Data Store</span></a> (<abbr>ODS</abbr>), a type of data warehouse.  Initially this was another Oracle instance on a single server.  When the size
of the dataset grew beyond the capacity of the server, a decision
had to be made.  A server with enough memory and CPU power to
handle the load would have exceeded the Oracle product license, but
purchasing additional licenses was cost-prohibitive.  The solution was
two-fold: convert the ODS to the open source <a href="http://www.postgresql.org/"><span>PostgreSQL</span></a>
server, and put it on two systems instead of one.  The conversion to
PostgreSQL is outside the scope of this article, but the decision to use
two servers provides several distinct advantages.  First, they are
not set up as master/slave, which keeps the setup simple.  They both replicate from Oracle in
parallel, having no awareness of one another.  This works fine since
the data-mining queries are essentially read-only (some jobs do data transformations, but they operate on temporary tables.)  Second, both systems are fast enough to handle
the entire operational load, so if one system is down, all its jobs can be
shifted to the other with no degradation of service to users.  Third,
upgrades to PostgreSQL can be tested with live data without disrupting
service, as jobs can again be shifted away from the instance being
upgraded.  Under normal circumstances, both servers are used for
production work, yielding the best return on investment.</p>

<p>These stories illustrate
the advantages of expecting failure and engineering around it to
create robust internet architectures.  Failure is inevitable, but
dire consequences need not be.</p>
]]></content:encoded>
            <pubDate>Tue, 27 Jan 2009 15:04:14 GMT</pubDate>
        </item>
        <item>
            <title>The Irony of Sun Database Technology</title>
            <link>http://omniti.com/seeds/the-irony-of-sun-database-technology</link>
            <guid>http://omniti.com/seeds/the-irony-of-sun-database-technology</guid>
            <description><![CDATA[It&#8217;s been just over a year since Sun announced it had agreed to purchase MySQL, the ever-popular open source database technology. At the time most people saw the move as a way for Sun to make its way into the internet space, where MySQL ha...]]></description>
            <content:encoded><![CDATA[<p>It&#8217;s been just over a year since <a href="http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2008/01/17/BU77UGDVT.DTL&#38;type=tech"><span>Sun announced it had agreed to purchase MySQL</span></a>, the ever-popular open source database technology. At the time most people saw the move as a way for <a href="http://sun.com/"><span>Sun</span></a> to make its way into the internet space, where <a href="http://mysql.com/"><span>MySQL</span></a> has made a lot of in-roads. There was also a thought that Sun might be able to help MySQL overcome some of the technical hurdles it faced when moving into enterprise-level usage.</p>

<p>So after one year, what has this marriage produced? Well, they finally pushed out the 5.1 release, which had been stuck in development for a couple of years, although it didn&#8217;t include any major changes from what was on the road-map before the Sun purchase. They also <a href="http://www.techcrunchit.com/2008/07/23/new-mysql-fork-turns-back-the-clock/"><span>pushed out a fork of the code base</span></a>, which took the interesting step of removing features, rather than adding the enterprise technology many people were looking for, leaving Sun with a bill for $1 billion but without an "Enterprise" database to call its own. The irony of all of this is that, even before the MySQL purchase, Sun already had a product containing technologies similar to today&#8217;s leading commercial database; it&#8217;s just that the technology lives in a file system, specifically <abbr title="Zettabyte Filesystem">ZFS</abbr> (Zettabyte Filesystem).</p>

<p>One of the basic tenets of a database system is that you can guarantee that data is safe on disk, and generally that any database will give you a chance to throw away changes if you need to. In the database world, you know this as <code>COMMIT</code> and <code>ROLLBACK</code>, common operations to most people, although missing from the <a href="http://en.wikipedia.org/wiki/MyISAM"><span>MyISAM</span></a> technology that Sun purchased from MySQL. In the ZFS world, while not implemented the same way as in a database, these ideas are embodied in the commands <code>zfs snapshot</code> and <code>zfs rollback</code>. Both of the commands work with active data partitions, and work so well that you can use them as protection in large batch-style operations against MyISAM; simply <code>zfs snapshot</code> your system beforehand, run your large MyISAM command, and then <code>zfs rollback</code> afterwards if you find you need to go back.</p>

<p>Of course, what good is a system, database or any other, if you cannot back it up? The backup process for MySQL is straightforward, although its use of <code>LOCK TABLES</code> makes it a second-rate solution at best. Consider that with any sufficiently large system, <code>LOCK TABLES</code> will keep you from providing five nines uptime almost by definition. ZFS, on the other hand, gives you the ability to make backups with ease. Once you have a snapshot of your system, the ability to clone, promote, or send a snapshot gives you quite a bit of flexibility for backing up your system, and it can all be done on-line.</p>

<p>But it gets even better. One of the things that many databases deal with is caching data files from the file system, using some algorithm to determine what should be kept in memory. In MySQL, the database only caches index files (data files themselves are left to the OS to handle), and it does so using a simple <abbr>LRU</abbr> (least recently used) cache: a caching mechanism where the least recently used data is purged whenever room is needed for newly requested data. Again, ZFS contains something more sophisticated. ZFS uses something known as an <abbr>ARC</abbr> (adaptive replacement cache), which improves upon the LRU idea by keeping track of not just how recently something is used, but also how frequently it is used. Again, the nature of the work being done makes for different specifics in implementation, but other databases have looked at and implemented ARC systems and seen significant improvements over the LRU method.</p>
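
<p>For readers who have not met LRU before, here is a minimal sketch of the idea in C. It is a toy with three slots and integer keys, not MySQL&#8217;s key cache or ZFS&#8217;s ARC; the names <code>lru_cache</code>, <code>lru_init</code> and <code>lru_access</code> are made up for this illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 3

struct lru_cache {
    int keys[CACHE_SLOTS];
    unsigned long last_used[CACHE_SLOTS]; /* 0 means the slot is empty */
    unsigned long clock;                  /* logical time, one tick per access */
};

void lru_init(struct lru_cache *c) {
    assert(c != NULL);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        c->keys[i] = 0;
        c->last_used[i] = 0;
    }
    c->clock = 0;
}

/*
 * Record an access to key. Returns 1 on a cache hit. On a miss, the key
 * is stored in an empty slot if there is one, otherwise in the slot that
 * was used least recently, and 0 is returned.
 */
int lru_access(struct lru_cache *c, int key) {
    size_t victim = 0;

    assert(c != NULL);
    c->clock++;

    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->last_used[i] != 0 && c->keys[i] == key) {
            c->last_used[i] = c->clock; /* hit: mark as most recent */
            return 1;
        }
        /* Track the oldest slot; empty slots (0) always win. */
        if (c->last_used[i] < c->last_used[victim])
            victim = i;
    }

    c->keys[victim] = key;
    c->last_used[victim] = c->clock;
    return 0;
}
```

<p>Because recency is the only thing this cache remembers, a single pass over a run of keys that are never requested again will push out entries that are in constant use. Tracking frequency as well as recency, as ARC does, is aimed at that weakness.</p>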

<p>And there are still other examples; take the <a href="http://blogs.sun.com/perrin/entry/the_lumberjack"><span>ZFS intent log</span></a>. The ZFS intent log is used by ZFS to gather system calls in memory and log them, both for performance (system calls can be aggregated together before execution) and for crash recovery (in the event of a crash, ZFS can examine the log and replay any system calls that did not finish execution). Of course, those familiar with databases will recognize this approach, as it is commonly implemented as a transaction log within database systems, for much the same reasons: commits to the database can be aggregated for performance, and in the event of a system crash, the commit log can be replayed to ensure all committed transactions made it to disk. Unfortunately, MyISAM, the storage engine owned by Sun, does not get these benefits.</p>

<p>Now, we must say that MySQL has been around for some time, so its users have gone through the trouble of finding workarounds for the lack of the functionality we&#8217;ve been seeing in ZFS. Luckily Oracle provides a storage engine for MySQL, known as <a href="http://www.innodb.com/"><span>InnoDB</span></a>, which implements many of the features discussed above. Also, MySQL has simplified replication support built in, which allows users to set up multiple copies of the database without significant effort. In fact, these techniques are encouraged, as you can use the slave database system for taking backups or for crash recovery in case of loss of the primary node. What we think is often overlooked is that here, the database, which should be a model of data integrity and robustness, gives you workarounds and tools like <a href="http://dev.mysql.com/doc/refman/5.0/en/repair.html"><span><code>CHECK</code> and <code>REPAIR</code></span></a>, while the filesystem, what you typically expect your database to protect you from, is so carefully designed in ZFS to ensure data integrity that <code>CHECK</code>/<code>REPAIR</code> are unnecessary.</p>

<p>Unfortunately, the cynic in us has to wonder if we will ever see some of the more sophisticated ideas from ZFS make their way into MySQL. After all, since the current workarounds tend to require running multiple instances, and Sun is in the business of selling hardware (either multiple servers, or servers large enough to house multiple virtual servers, take your pick), keeping the status quo creates a nice relationship between these two divisions of the company. Given that, maybe there is no irony after all.</p>

]]></content:encoded>
            <pubDate>Thu, 22 Jan 2009 15:39:51 GMT</pubDate>
        </item>
        <item>
            <title>Using Less is Green</title>
            <link>http://omniti.com/seeds/using-less-is-green</link>
            <guid>http://omniti.com/seeds/using-less-is-green</guid>
            <description><![CDATA[Every time I hear about green computing I feel like there is a gap&#8201;&#8212;&#8201;an enormous gap. The same thing is true in most conservation efforts I witness:


I see grand plans to make it easier and cheaper to produce foods, but no trend to a...]]></description>
            <content:encoded><![CDATA[<p>Every time I hear about green computing I feel like there is a gap&#8201;&#8212;&#8201;an enormous gap. The same thing is true in most conservation efforts I witness:</p>

<ul>
<li>I see grand plans to make it easier and <a href="http://www.borealisgroup.com/industry-solutions/base-chemicals/plant-nutrients/precision-farming/"><span>cheaper</span></a> to produce foods, but <a href="http://www.healthatoz.com/healthatoz/Atoz/common/standard/transform.jsp?requestURI=/healthatoz/Atoz/dc/caz/nutr/obes/alert08032004.jsp"><span>no trend to actually eat less</span></a>;</li>
<li>companies make hybrid vehicles and the consumers flock to them <a href="http://www.newcarpark.com/blog/?p=68"><span>without regard for the environmental manufacturing costs</span></a>, and even as these issues are solved and hybrids have less per-mile environmental impact, people will still drive too much;</li>
<li>we see the fast adoption of <a href="http://www.consumersearch.com/light-bulbs/compact-fluorescent-light-bulbs"><span>fluorescent light bulbs</span></a> (and <a href="http://www.consumersearch.com/light-bulbs/led-light-bulbs"><span>now LED</span></a>), and yet people still leave their lights on when they don&#8217;t need them.</li>
</ul>

<p>I tend to argue a point where I believe my point is right and the alternative is wrong. In this unique case, I find that the alternative is right, but just not "right enough." We could do better. We should do better. Think complete.</p>

<h2>Green Computing through Hardware Optimization</h2>

<p>So much focus is placed on making equipment (processors, RAM, storage) more energy efficient that people are losing sight of the bigger picture. Energy efficient equipment is certainly one piece of the puzzle. Unfortunately, too many people see that one piece as a <cite lang="fr">fait accompli</cite> in their energy conservation efforts.</p>

<p>At <a href="http://omniti.com/"><span>OmniTI</span></a> we&#8217;re always careful about fully understanding the power profile of the hardware we install. We are conservative and look for the most power efficient machines we can find that still meet our architectural requirements (which can vary wildly from component to component). Everyone should do this. IBM and HP and Intel are all telling you that you should do it and that they can help. Do it. Let them. But please, don&#8217;t stop there.</p>

<h2>Green Computing through Virtualization</h2>

<p>The next step that is popular in the efforts to save your wallet (and the planet) is consolidation. This is the philosophy that one of today&#8217;s machines is powerful enough to accomplish the goals of many of yesteryear&#8217;s machines. So, virtualize! Take the old machines, turn them into virtual servers and run them on one machine today. Virtualization (of one type or another) has many advantages including: ease of management, simpler disaster recovery, flexibility in technology selection, shorter provisioning times and the opportunity for consolidation.</p>

<p>Many of our engineers run <a href="http://www.virtualbox.org/"><span>VirtualBox</span></a> or <a href="http://vmware.com/"><span>VMWare</span></a> to quickly launch the platform of their choice. They are allocated one machine each, so they only have the opportunity to use a certain number of watts. Virtualization makes their job a bit faster and a bit easier despite the user experience being ever-so-slightly slower than running native. This use of virtualization does not reduce energy consumption in any significant way though it does increase individual productivity.</p>

<p>We have development environments that are managed by the operations team here that must resemble (as closely as is economically feasible) the production environment to which they deploy. We have many of these and they are all distinct, but not heavily loaded. It is feasible that consolidation could be used in this approach. Our actual situation is that we have to operate 40 isolated development environments. We do this on&#8230;</p>

<ul>
<li>Two $2300 1U machines</li>
<li><a href="http://en.wikipedia.org/wiki/Solaris_Containers"><span>Solaris Containers (Zones)</span></a> as the lightweight virtualization technology</li>
<li>at about 200W run rate each, which results in about 3.5 MW-hours per year</li>
</ul>

<p>If you considered the alternative naive implementation:</p>

<ul>
<li>40 1U machines</li>
<li>at about 180W run rate each, which results in about 63.1 MW-hours per year.</li>
</ul>

<p>We realize a savings of 59.6 megawatt-hours per year. Wow! Now, that comparison is against an utterly naive method. Instead, let&#8217;s look at a popular method like VMWare ESX:</p>

<p>To run 40 VMWare instances&#8230;</p>

<ul>
<li>I need some substantially bigger hardware at 2GB of RAM per instance (Solaris containers and other similar technologies have some memory sharing efficiencies).</li>
<li>We only have 40 instances here, so going the blade center route seems less compelling.</li>
<li>An <a href="http://www-03.ibm.com/systems/x/hardware/rack/x3650/index.html"><span>IBM x3650</span></a> should be able to manage <a href="http://www.google.com/search?q=IBM+vmware+sizing+guide&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a"><span>about six instances</span></a> (which aren&#8217;t peak and can afford some occasional performance degradation).</li>
<li>With seven of these at 230W each, we burn 14.1 MW-hours per year.</li>
<li>This assumes you use local storage. If you need a SAN, you&#8217;ll have to add that into the power profile too.</li>
</ul>
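
<p>The three energy figures above are simple arithmetic and easy to check. The following is a minimal sketch, assuming every machine draws its quoted run rate around the clock for a 365-day year and that the 200W and 180W figures are per machine (the function and variable names are illustrative, not from any tool we use):</p>

```python
HOURS_PER_YEAR = 24 * 365  # 8760; leap years ignored (assumption)


def annual_mwh(machines: int, watts_each: float) -> float:
    """Energy drawn in one year by `machines` boxes running constantly at `watts_each` watts."""
    return machines * watts_each * HOURS_PER_YEAR / 1_000_000


containers = annual_mwh(2, 200)  # two 1U machines running Solaris Containers
naive = annual_mwh(40, 180)      # one 1U machine per development environment
esx = annual_mwh(7, 230)         # seven x3650s at about six instances each

for label, mwh in [("containers", containers), ("naive", naive), ("esx", esx)]:
    print(f"{label:10} {mwh:5.1f} MWh/year")

print(f"saved vs naive: {naive - containers:.1f} MWh/year")
print(f"saved vs esx:   {esx - containers:.1f} MWh/year")
```

<p>Under those assumptions the sketch gives roughly 3.5, 63.1 and 14.1 MW-hours per year, a difference of about 59.6 against the naive layout and about 10.6 against the ESX layout.</p>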

<p>One can say they burn 14 megawatt-hours per year instead of 63! But to me, burning 3.5 is even better. Now, for those financially responsible types, I&#8217;ve only spoken to recurring operational costs. If you run the numbers on initial capital investment you&#8217;ll see an even more significant savings by simply choosing the right tool for the job (between $80k and $100k by our internal calculations).</p>

<p>This isn&#8217;t to say that you should never use VMWare or a similar heavy-weight virtualization technology. Those technologies afford you specific advantages (like the ability to run entirely different operating systems in each instance). You could also consider something slightly lighter-weight like <a href="http://xen.org/"><span>Xen</span></a>. But, if you find that your virtualization requirements on Solaris will fit in the Containers model (or your Linux needs would be satisfied by <a href="http://wiki.openvz.org/Main_Page"><span>OpenVZ</span></a>) you stand to gain a lot. We only have 40 instances, and the choice saved us more than 10 megawatt-hours per year over the next best virtualization solution. Imagine if you had 1000.</p>

<p>These concepts are not likely to be foreign to any reader. Most people have considered virtualization approaches along with hardware replacement to reduce energy costs. But please, don&#8217;t stop there.</p>

<h2>Green Computing through Performance Optimization</h2>

<p>When I look to virtualization technologies for consolidation, there is one requirement&#8201;&#8212;&#8201;a single machine has enough horsepower to power more than a single virtual instance. At OmniTI we deal with some large Internet architectures that serve millions upon millions of people. The bottom line is, I can completely saturate any piece of hardware you give me. There is no opportunity for consolidation in many of these architectures. The awful thing is that I see people choose hardware that is more energy efficient and simply leave it at that. The logical conclusion everyone has arrived at is: "if I can get the same CPU cycles and I/O operations for less watts, I win!" Yes, you win. No, this is not the conclusion of anything. It is the beginning. I hope your ultimate goal is not to spend CPU cycles, it is to service users. The obvious progression from here is: "if I can serve the same number of users with less CPU cycles and I/O operations, I win!" Now we&#8217;re getting somewhere. That statement starts with the end in mind. This is the land of performance optimization.</p>

<p>I usually try to explain concepts through metaphors and analogies, but this multi-resolutioned efficiency concept was a hard one to translate. So hard, that I&#8217;m at a loss. Those who know me well will say: "Theo without a clever analogy at hand?! That&#8217;s like Denis Leary without a vulgar rant." Alas, I&#8217;ll just give some examples.</p>

<ul>
<li>We increased both the functionality and the performance in <a href="https://labs.omniti.com/trac/fastxsl"><span>core XSLT technologies</span></a> for <a href="http://www.friendster.com/"><span>Friendster</span></a> and we enabled them to increase system performance by a factor of over 2.5. That translates to 60% less hardware or 2.5 times as many users. Armed with that, they chose to <a href="http://news.cnet.com/8301-13577_3-9783671-36.html"><span>enter China</span></a>.</li>
<li>We developed a purpose-built content publishing system for <a href="http://ngm.nationalgeographic.com/"><span>National Geographic Magazine</span></a> and were able to deploy an infrastructure of less than half the size (less than half the power) of the leading competitive offering. This architecture was able to sustain several prolonged front-page exposures on <a href="http://msn.com/"><span>msn.com</span></a>&#8201;&#8212;&#8201;delivering, at peak, as many as 3000 <i>new</i> visitors per second.</li>
<li>We developed the <a href="http://messagesystems.com/"><span>Message Systems MTA</span></a> that helps the largest of the large ISPs handle incoming mail volume with as much as 80% reduction in infrastructure when replacing competing commercial incumbents and as much as 95% reduction when replacing open source incumbents.</li>
</ul>
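
<p>The relationship between a performance factor and a hardware reduction in the examples above can be sketched as follows. This assumes that the hardware needed scales linearly with the work done per user, which is a simplification; the function names are purely illustrative:</p>

```python
def hardware_reduction(speedup: float) -> float:
    """Fraction of hardware no longer needed to serve the same users after a speedup."""
    return 1 - 1 / speedup


def speedup_for_reduction(reduction: float) -> float:
    """Performance factor implied by a given fractional reduction in infrastructure."""
    return 1 / (1 - reduction)


print(f"{hardware_reduction(2.5):.0%}")         # 60%
print(f"{speedup_for_reduction(0.80):.0f}x")    # 5x
print(f"{speedup_for_reduction(0.95):.0f}x")    # 20x
```

<p>Read this way, a factor of 2.5 is the same statement as 60% less hardware, and an 80% or 95% reduction in infrastructure corresponds to each remaining machine doing roughly five or twenty times the work.</p>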

<p>The goal is to get where you are going while spending less. Less of what? Less money, less power, less heat, less CPU cycles, less, less, less. Less of everything. Not only is it better for our planet, it&#8217;s simply cheaper. Don&#8217;t excessively or wastefully use resources. Be responsible: conserve.</p>
]]></content:encoded>
            <pubDate>Tue, 20 Jan 2009 22:13:50 GMT</pubDate>
        </item>
        <item>
            <title>Dissecting Today&#039;s Internet Traffic Spikes</title>
            <link>http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes</link>
            <guid>http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes</guid>
            <description><![CDATA[Today&#8217;s Internet has changed quite a bit from the Internet I used to know.  The Internet has always been successful because of net neutrality.  What&#8217;s net neutrality?  It&#8217;s complicated, but essentially it means that anyone anywhere ca...]]></description>
            <content:encoded><![CDATA[<p>Today&#8217;s Internet has changed quite a bit from the Internet I used to know.  The Internet has always been successful because of net neutrality.  What&#8217;s net neutrality?  It&#8217;s complicated, but essentially it means that anyone anywhere can publish with equal rights.  These aren&#8217;t the kind of rights people usually talk about&#8230; I&#8217;m not speaking of freedom of speech.  Instead, I&#8217;m talking about content being simply bits.  It doesn&#8217;t matter if it comes from <a href="http://cnn.com/"><span>CNN</span></a> or <a href="http://lethargy.org/"><span>my personal blog</span></a>, you as a reader can download the bits that make up the pages you see without bias or preferential treatment.  This makes it darn easy to be a publisher and leads to a fabulous ecosystem with an overwhelming amount of varied content.  However, with more content it is easy to recognize that much of it is utter trash.  Yes. Yes. I know that one man&#8217;s trash is another man&#8217;s treasure.  However, it presents opportunities for sites that help you navigate the wasteland.</p>

<p>Many popular sites today are popular because they link to articles and news items and photographs and movies all over the Internet; they are "interest aggregation services."  And while the Internet has (for now) a decent preservation of net neutrality when it comes to simple web content, not all publishers are on equal footing.  Not long ago, anyone could run a server anywhere (their basement) with DSL or cable or (gasp) dial-up&#8202;&#8212;&#8202;now, the challenge is coping with unexpected attention.</p>

<p>Years ago, the site <a href="http://slashdot.org/"><span>slashdot</span></a> coined a term "slashdotted" which meant that a site received so much sudden traffic that service degraded beyond an acceptable point and the site was effectively unavailable.  This often happened to sites that were at the end of small pipes (DSL, T1, etc.) and occasionally (though rarely) due to bad engineering.  While slashdot might have coined the term, they simply don&#8217;t have the viewership numbers that other large sites today have.</p>

<p>At <a href="http://omniti.com/"><span>OmniTI</span></a>, I work on sites that aren&#8217;t on the end of T1 lines.  Sites with gigabits or tens of gigabits of connectivity.  Sites with 50 million or more users.  Sites powered by thousands of machines. I also work on sites that service millions of people from just a handful of machines (efficiency certainly has its advantages sometimes).  I find it particularly interesting that already popular sites (with significant baseline bandwidth) are seeing these unexpected surges.  For a long time, my blog has been on this same machine which is a vhost for several other web sites.  I&#8217;ve had traffic spikes from places like slashdot, reddit, digg, etc.  And, no surprise, I couldn&#8217;t actually see the bandwidth jump on the graphs&#8230; 10Mbits to 11Mbits?  That&#8217;s not a spike.</p>

<p>Things are changing.  Sites like <a href="http://digg.com/"><span>Digg</span></a> are becoming ever more popular and people are drawn to them as a means of sifting the waste of the Internet.   This means as more people rely on <a href="http://digg.com/"><span>Digg</span></a> and <a href="http://reddit.com/"><span>Reddit</span></a> and other similar sites, the number of unexpected viewers of your content can rise more sharply.</p>

<p>What does all of this mean?  It means that the old rule of thumb that your infrastructure should see 70% resource utilization at peak is starting to falter.  The typical trends used to look like this (this is last week&#8217;s graph from a retail client with a user base of 3 million):</p>

<div style="text-align: center;"><img style="border: 1px solid rgb(200, 200, 200); padding: 4px; text-align: center; display: block; max-width: 800px;" src="http://images.omniti.net/omniti.com/i/b/boringtrend.png" alt="" /></div>

<p>We see a nice peak, a nice valley.  Thursday afternoon, we see a nice traffic spike.  Well, this used to be what I called a traffic spike.  Now, different services have different spike signatures.  It resembles the traffic model of classic Internet advertising, except that there is genuine interest and thus dramatically higher conversion rates.  It&#8217;s a simple combination of placement, frequency and exposure.  Because content, unlike ad banners, exists for an extended period of time (sometimes forever), the frequency is very high.  Digg and Reddit have excellent placement with very little exposure (things move out quickly).  A site like CNN or NYTimes usually provides mediocre placement (unless you are on the front page) and excellent exposure.</p>

<p>Lately, I see more sudden eyeballs and what used to be an established trend seems to fall into a more chaotic pattern that is the aggregate of different spike signatures around a smooth curve.  This graph is from two consecutive days where we have a beautiful comparison of a relatively uneventful day followed by a long-exposure spike (nytimes.com) compounded by a short-exposure spike (digg.com):</p>

<div style="text-align: center;"><img style="border: 1px solid rgb(200, 200, 200); padding: 4px; text-align: center; display: block; max-width: 800px;" src="http://images.omniti.net/omniti.com/i/b/spikesdissected.png" alt="" /></div>

<p>The disturbing part is that this occurs even on larger sites now due to the sheer magnitude of eyeballs looking at today&#8217;s already popular sites.  Long story short, this makes planning a real bitch.</p>

<p>And the interesting thing is perspective on what is large&#8230;  People think Digg is popular&#8202;&#8212;&#8202;it is.  The <a href="http://nytimes.com/"><span>New York Times</span></a> is too, as are CNN and most other major news networks&#8202;&#8212;&#8202;if they link to your site, you can expect to see a dramatic and very sudden increase in traffic. And this is just in the United States (and some other English speaking countries)&#8230; there are others&#8230; and they&#8217;re kinda big.</p>

<p>What isn&#8217;t entirely obvious in the above graphs?  These spikes happen inside 60 seconds.  The idea of provisioning more servers (virtual or not) is unrealistic.  Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope and that would assume a zero second response time.  This means it is about time to adjust what our systems architecture should support.  The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling.  At least eight times in the past month, we&#8217;ve experienced from 100% to 1000% sudden increases in traffic across many of our clients.</p>
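
<p>To put numbers on why the old rule is unraveling, here is a minimal sketch of the headroom arithmetic. It assumes resource utilization grows linearly with traffic, which real systems only approximate, and the function names are illustrative:</p>

```python
def headroom(peak_utilization: float) -> float:
    """Fractional traffic increase a system absorbs before it reaches 100% utilization."""
    return 1 / peak_utilization - 1


def max_utilization_for(spike_increase: float) -> float:
    """Highest peak utilization that still survives a sudden increase (1.0 means +100%)."""
    return 1 / (1 + spike_increase)


print(f"{headroom(0.70):.0%}")             # 43%
print(f"{max_utilization_for(1.0):.0%}")   # 50%
print(f"{max_utilization_for(10.0):.0%}")  # 9%
```

<p>Under that assumption, running at 70% at peak buys roughly the 40% cushion the rule promises, while surviving a 100% increase means peaking at 50%, and surviving a 1000% increase means peaking at about 9%.</p>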

<p>I talk about scalability a lot.  It&#8217;s my job.  It&#8217;s my passion.  I regularly emphasize that scalability and performance are truly different beasts.  One key to scalability is that a "systems design" scales.  Architectures are built to be able to scale, they are not built "at scale."  It&#8217;s just too expensive to build a system to serve a billion people (until you have a billion people).  It&#8217;s cheap to <em>design</em> a system to serve a billion people.  Once you have a billion people accessing your site, you can likely justify executing on your design.  Google is successful for this reason: their ideas scale and they can build into them as demand rises.  On the flip side, traffic anomalies in the form of spikes are unexpected (by their definition) and scaling a system out to meet the <em>unexpected</em> demand is almost unreasonable.  I would even argue that it is more of a performance-centric issue.  I want every asset I serve to be as cheap to serve as possible allowing me to handle larger and larger spikes.</p>

<p>The reason I find all of this stuff interesting is that understanding <a href="http://omniti.com/does/scalability-and-performance"><span>performance and scalability</span></a>, understanding the <a href="http://omniti.com/writes/scalable-internet-architectures"><span>principles of scalable systems design</span></a> and having <a href="http://omniti.com/does/scalability-and-performance/process"><span>sound and efficient processes for handling performance issues</span></a> is becoming crucial for sites regardless of their size.  This takes insight and practice and it reminds me of Knuth&#8217;s famous saying:</p>

<blockquote><p>We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.</p></blockquote>


<p>That&#8217;s all well and good, but which 97% of the time?  My response to Knuth&#8217;s statement (with which I completely agree) is:</p>

<blockquote><p>Understanding what is and isn&#8217;t "premature" is what separates senior engineers from junior engineers.</p></blockquote>


<p>Let&#8217;s add perspective on the word "sudden."  Most network monitoring systems poll SNMP devices (like switches, load-balancers, and hosts) once every five minutes (we do this every 30 seconds in some environments).  Some people say, "my site scales! bring it on." We see these spikes happen inside 60 seconds and they occasionally induce a ten-fold increase over trended peaks.  Oftentimes, this spike can be well underway for several minutes before your graphing tools even pick up on it.  Then, before you have time to analyze, diagnose and remediate&#8230; poof&#8230; it&#8217;s gone.  Be careful what you wish for.</p>

<p>This, in many ways, is like a tornado.  Our ability to predict them sucks.  Our responses are crude and they are quite damaging.  However, predicting these Internet traffic events isn&#8217;t even possible&#8202;&#8212;&#8202;there are no building weather patterns or early warning signs.  Instead we are forced to focus on different techniques for stability and safety.  The idea of a DoS, a DDoS or the sometimes similar signature of a sudden popularity spike doesn&#8217;t increase my heart rate anymore&#8202;&#8212;&#8202;it&#8217;s just another day on the job.  However, I thought I&#8217;d share the four guidelines that I believe are key to my sanity in these situations:</p>

<ol>
<li><em>Be Alert</em>: build automated systems to detect and pinpoint the cause of these issues quickly (in less than 60 seconds).</li>
<li><em>Be Prepared</em>: understand the bottlenecks of your service systemically.  Understand your site inside and out.  Contemplate how you would respond if a specific feature or set of features on your site were to get "suddenly popular."</li>
<li><em>Perform Triage</em>: understand the importance of the various services that make up your site.  If you find yourself in a position to sacrifice one part to ensure continued service of another, you should already know their relative importance and not hesitate in the decision.</li>
<li><em>Be Calm</em>: any action that is not analytically driven is a waste of time and energy. Be quick, not rash.</li>
</ol>
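
<p>As an illustration of the first guideline, here is a minimal sketch of a detector that compares each per-second request count against a trailing average. It is not the tooling we run; the class name, window size and threshold factor are all assumptions chosen for the example:</p>

```python
from collections import deque


class SpikeDetector:
    """Flags a sample that exceeds `factor` times the trailing average."""

    def __init__(self, window: int = 300, factor: float = 2.0) -> None:
        self.samples = deque(maxlen=window)  # trailing per-second request counts
        self.factor = factor

    def observe(self, requests_this_second: float) -> bool:
        baseline = sum(self.samples) / len(self.samples) if self.samples else None
        is_spike = baseline is not None and requests_this_second > self.factor * baseline
        # Keep spike samples out of the baseline so a sustained surge stays flagged.
        if not is_spike:
            self.samples.append(requests_this_second)
        return is_spike


detector = SpikeDetector(window=60, factor=2.0)
flags = [detector.observe(rate) for rate in [100] * 60 + [105, 250, 1000]]
print(flags[-3:])  # [False, True, True]
```

<p>Sampling every second rather than every five minutes is what makes a sub-60-second reaction possible at all. Pinpointing the cause would still need per-URL or per-referrer counts, which this sketch leaves out.</p>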

<p>Back to those other countries&#8230; Enter China and their recently lessened censorship and we have a looming tidal wave for smaller sites that achieve sudden popularity.  Spikes of several hundred megabits per second are difficult to account for when your normal trend is around twenty megabits per second.    The following graph is traffic induced from a link from a popular foreign news site (that I can&#8217;t read).  I call it "ouch":</p>

<div style="text-align: center;"><img style="border: 1px solid rgb(200, 200, 200); padding: 4px; text-align: center; display: block; max-width: 800px;" src="http://images.omniti.net/omniti.com/i/b/spikechina.png" alt="Graph showing a sharp rise in traffic with a long tail." /></div>]]></content:encoded>
            <pubDate>Thu, 15 Jan 2009 18:36:51 GMT</pubDate>
        </item>
    </channel>
</rss>
