<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <atom:link href="http://omniti.com/shares/seeds" rel="self" type="application/rss+xml" />
        <title>OmniTI ~ Seeds</title>
        <link>http://omniti.com/seeds</link>
        <language>en-us</language>
        <description>Seeds</description>
        <item>
            <title>Why NoSQL Does Not Mean NoDBA</title>
            <link>http://omniti.com/seeds/why-nosql-does-not-mean-nodba</link>
            <guid>http://omniti.com/seeds/why-nosql-does-not-mean-nodba</guid>
            <description><![CDATA[Whether you like it or not, NoSQL is changing the world. Granted, it&#8217;s not even clear what NoSQL means sometimes, but there is no doubt that, for better or worse, we are in a renaissance of non-relational database systems right now. For me person...]]></description>
            <content:encoded><![CDATA[<p>Whether you like it or not, NoSQL is changing the world. Granted, it&#8217;s not even clear what NoSQL means sometimes, but there is no doubt that, for better or worse, we are in a renaissance of non-relational database systems right now. For me personally, I tend to ignore the hype, study these systems with a critical eye, and then deploy them where traditional RDBMS software struggles. I do occasionally bump into people who babble on about how NoSQL will put DBAs out of business. When I hear this kind of comment, I just nod my head and smile: It&#8217;s hard to convince people that their beloved paradigm shift is just more of the same, and also very seldom worth the effort. However, I was recently talking to an Oracle DBA, and he made some comments about how he was concerned that new companies would have no use for a DBA because they were all switching to NoSQL. This surprised me a little, actually. I figured if there was one group of people who wouldn&#8217;t buy into the NoSQL hype, it would be the stalwart Oracle crowd. Et tu, Brute? If the hype has gotten to them, I guess it&#8217;s time for me to speak up. Whatever you think about NoSQL, the death of the DBA is a ludicrous idea. </p>

<p>One of the little known secrets of NoSQL systems is that they are used to hold data. Most NoSQL systems try to trumpet the ease of pushing data in and out of the system: "just push your JSON object to the server and your data is instantly stored, regardless of structure." It&#8217;s Oh, So Magical. The problem, of course, is that easily dumping data into a system doesn&#8217;t mean much if you can&#8217;t get it back out. This is where a data model comes into play. </p>

<p>In a traditional RDBMS environment there is usually someone who designs a data model ahead of time, breaking down information into a relational design, sometimes even drawing up an ERD diagram for people to reference, so they can find their information. This would be turned into DDL, committed to the database and then enforced rather strictly. Try to insert the wrong type of data, and you&#8217;ll get an error. Try to query for a column that doesn&#8217;t exist, and you&#8217;ll get an error. I&#8217;ve seen many a developer complain that the database is "too strict" because they couldn&#8217;t get their queries right, but the truth is you still need a <a href="http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/forum-application-data-model-conversion-td5210111.html"><span>data</span></a> <a href="http://markmail.org/message/p3h2r5dq5o2hqmod#query:+page:1+mid:5cc7yy5323b5mgkl+state:results"><span>model</span></a> even though you went to a NoSQL system. You won&#8217;t hear much about these problems from NoSQL advocates because the type of shops that use them are often smaller shops, where there are a small number of developers, and everyone knows what everyone else is doing, more or less. Or they are using NoSQL on new projects&#8202;&#8212;&#8202;where the scope of the system still fits in head-space pretty easily. However, as code bases grow, and projects stretch on for years, and developers come and go, not knowing what data is stored where means that you start to wish you had a data model. You recognize this the first time you hit a bug because you assumed that not getting a value back in the "items" key meant the user had no items in their cart, only to realize later that you should have been looking at the "item" key. If you&#8217;re lucky, this type of bug pops up on information retrieval rather than in information storage; but either way, cleaning up such a bug can be painful. 
You&#8217;ll also notice how, often, you have to pull back objects just to see what information is actually stored, because depending upon when the object was inserted, it may have a different opinion on what the data it should be holding looks like. Of course, you can go back and dig through the code that puts the data in, but this assumes you know where that code lives. Maybe you could ask the guy who originally wrote that feature, but suppose he left the company six months ago? Oh well, grep is your friend.  </p>
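<p>As a hedged illustration of the "item" versus "items" bug described above, the following sketch shows how a schema-free document lets the mistake pass silently. The document shapes and the helper names <code>cartItemCount</code> and <code>checkedItemCount</code> are hypothetical and not taken from any particular NoSQL product.</p>

```javascript
// Hypothetical documents as they might come back from a schema-free store.
// An older code path wrote the cart under "item"; newer code writes "items".
var oldDoc = { user: 'alice', item: ['book', 'pen'] };
var newDoc = { user: 'bob', items: ['lamp'] };

// Naive reader: a missing "items" key is silently treated as an empty cart.
function cartItemCount(doc) {
    return (doc.items || []).length;
}

// Defensive reader: accept both known shapes, and fail loudly on anything
// else instead of guessing.
function checkedItemCount(doc) {
    if (Array.isArray(doc.items)) { return doc.items.length; }
    if (Array.isArray(doc.item))  { return doc.item.length; }
    throw new Error('unrecognized cart document for user ' + doc.user);
}

console.log(cartItemCount(oldDoc));    // 0, although the cart holds two items
console.log(checkedItemCount(oldDoc)); // 2
```

<p>The defensive version is only a stopgap: it encodes, in application code, the data model that nobody wrote down.</p>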

<p>So let&#8217;s pretend you&#8217;ve taken the time to discuss what data you must hold up front, and someone keeps a diagram of that posted on the wall. . . awesome. Sadly, you must still implement it in a physical system. In all the hype about "schema free" databases, the fact that in order to get good performance you still have to make adjustments for physical layout, or build things like indexes into your system to make queries against the server fast enough, is often overlooked. Yes, even on NoSQL you still need to know about <a href="http://mongly.com/Speedig-Up-Queries-Understanding-Query-Plans/"><span>explain tools and indexing</span></a>.</p>

<p>Think about what this means: Someone has to recognize that a certain piece of data is going to be requested a lot or notice a performance problem on the existing server. Once you figure out the right index to build, someone has to build it in production. This means locking, potentially, and certainly means an IO hit. Is your application developer going to be responsible for this? Do they know if your index build will require backfilling, like when you build a secondary index on existing data in Riak? Don&#8217;t get me wrong, it&#8217;s not that they aren&#8217;t capable of doing this work, it&#8217;s just important to realize that this type of work must be done. </p>

<p>OK, you have a data model, and are managing the physical implementation; that&#8217;s good. But did you know that your NoSQL system still must interact with the disks on your server? It does. The better question is, do you know *how* it interacts with the disks? Actually, before we talk about the disks, do you know how it interacts with RAM? Some NoSQL systems absolutely fall over when they hit thresholds larger than RAM. For some, it&#8217;s total data set size, for others it might be the size of all index pointers in the system. Of course, maybe you have a system that doesn&#8217;t fall over, it just becomes slower, perhaps unacceptably slower. In either case, you need to be aware of these limitations. Have monitors in place for them, and then perform capacity planning accordingly. Now, let&#8217;s get back to disks. How crash-safe is your NoSQL server? Does it give you single node durability? Are writes automatically synced to disk with each put, or are they batched up and pushed out occasionally? Maybe it&#8217;s configurable; do you know how your systems are set up? If you were using Postgres, you could tune the durability guarantees for all of these cases. Your DBA knows this, and whoever is in charge of your NoSQL system needs to know this. Even if you think you are storing data you can afford to lose, chances are your business model must be aware of just how much exposure it has. Oh, but yours is a start-up, so you don&#8217;t have business concerns yet. . .? Still, the level of durability is going to have significant impact on your IO needs, and that, in turn, will impact your performance&#8202;&#8212;&#8202;and you can&#8217;t post your <a href="http://www.google.com/url?q=http%3A%2F%2Farstechnica.com%2Fbusiness%2Fnews%2F2011%2F09%2Fgoogle-devops-and-disaster-porn.ars&sa=D&sntz=1&usg=AFQjCNHwMY0pCnCygkWB0rqAtvk0gZjk4Q"><span>devops data-porn</span></a> if you can&#8217;t get decent systems performance.</p>

<p>Of course, disks are kind of unimportant these days, given that everyone runs multiple nodes, and you can have a distributed hash table running across multiple nodes with just a handful of Chef commands. That said, have you ever managed a complex distributed system? You know who probably has? Your DBA. By far the most common answer to the failover problem is to stick up a replicated database slave. It&#8217;s also common to see people putting up slaves for horizontal read scaling. DBAs understand the tradeoffs in consistency guarantees that come with these types of systems&#8202;&#8212;&#8202;not just at the node level, but from the application&#8217;s point of view as well. You&#8217;ll need a solid understanding of this on your dev team if you are going to build apps against a distributed data system. In addition, someone has to manage all of these servers and make sure they perform well. If your NoSQL system uses master-slave replication, someone with experience in this area might be handy. If you&#8217;ve ever built a Master-Master pair with individual Slave systems, you probably know what I am talking about. Oh, do you think running a clustered hash table system is easier? Just because you can add a new node to the ring doesn&#8217;t mean it&#8217;s free. You need both server level and cluster level monitoring in place. You need to make sure you can afford the IO and network strains as data is copied around, and you need to know under what circumstances locking will be involved. These things really do happen. </p>

<p>I remember when the MySQL documentation had a section devoted to explaining why foreign keys weren&#8217;t needed. Of course, once MySQL finally implemented foreign keys, it became a major headline for their release announcements. This is what happens as systems mature. Most NoSQL systems can cut down on overhead by eliminating (or more accurately, not implementing) many of the features people have come to expect from an RDBMS. Of course, which features are eliminated differs across systems. </p><p class="blockquote">Did you know you can write triggers for Riak in either Javascript or Erlang? Exactly which language you can use when differs depending upon the type of trigger.</p><p>To wrap your head around this, you need to have a good understanding of how triggers work, how asynchronous calls affect transaction semantics (or the lack thereof), and what types of work you might want to do on the server side. Some triggers are used to enforce data integrity or do data manipulation at the server level; these are the types I think work fine within a vertically scaled system. Others really are extensions of the application, and while they are sometimes frowned upon for adding overhead into a centralized resource like a typical RDBMS, in a decentralized system that scales out the arguments against them aren&#8217;t as clear cut. One thing I do know though: this is probably not something your SA wants to be involved with at all.</p>

<p>If all of this isn&#8217;t enough to make you think twice, let me mention one more thing. While you may not have a query language in your NoSQL system, that doesn&#8217;t mean you don&#8217;t query against it. Whether you are writing <a href="http://browsertoolkit.com/fault-tolerance.png"><span>distributed map-reduce queries</span></a>, trying to balance link-walking vs. secondary indexes, or trying to figure out whether the code you&#8217;ve written to pull back every key in the system is going to be a problem, there are going to be times when you will have to make these queries more efficient. This is probably going to be a more application-centric type of tuning than the traditional RDBMS, but watch as someone in your dev team becomes known as "the go-to guy" for making your map/reduce query run more efficiently. And incidentally, you should also be aware that many of today&#8217;s NoSQL systems are trying to bolt on <a href="https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ExampleQueries"><span>SQL</span></a> and <a href="http://code.google.com/appengine/docs/python/datastore/gqlreference.html"><span>SQL-like</span></a> interfaces onto their systems. Who fails to get excited when thinking about rewriting queries with subselects into join clauses against a Hadoop cluster?</p>  

<p>If you think that managing all of this sounds like an impossible task, you&#8217;re welcome. This is the job that DBAs have been doing for years. . .and yes, it can be incredibly challenging. Of course, it doesn&#8217;t have to work this way. You can draw the lines of responsibility differently right now. Make the application developers manage the data model, design the schema, and tune the queries. Let your ops people be in charge of building new nodes, managing replication, and ensuring you have valid backups. Maybe 10 years ago, you had to have a DBA to work with Oracle, but nowadays just about anyone doing software engineering can put up a Postgres database and tune their way to usability with about three wiki links; you don&#8217;t need a DBA to design good operational habits. </p>

<p>Also, some of you might think of this as some type of doom and gloom piece against NoSQL; it is not meant that way at all. It&#8217;s not that switching to NoSQL is a bad idea necessarily; there are some things that RDBMS software can&#8217;t do as well as a more dedicated solution. But, if you think that switching to NoSQL will just let you hand-wave away all of the challenges of running a database, you are terribly misguided. If you&#8217;re a DBA and you are worried about a future with NoSQL, take heart; study your product less and focus on these key architectural design points more. Those skills are critical now, and they will remain so in the future, NoSQL or not.  </p>]]></content:encoded>
            <pubDate>Wed, 18 Jan 2012 21:15:00 GMT</pubDate>
        </item>
        <item>
            <title>Infrastructure Cost Reduction</title>
            <link>http://omniti.com/seeds/infrastructure-cost-reduction</link>
            <guid>http://omniti.com/seeds/infrastructure-cost-reduction</guid>
            <description><![CDATA[Everyone would like to spend less money on their server infrastructure, but it can be difficult to figure out where money can be saved, and whether reducing the amount that you spend on infrastructure will result in a revenue drop, caused by outages an...]]></description>
            <content:encoded><![CDATA[<p>Everyone would like to spend less money on their server infrastructure, but it can be difficult to figure out where money can be saved, and whether reducing the amount that you spend on infrastructure will result in a revenue drop, caused by outages and reduced service quality. By looking at your costs, revenue, system metrics and task/time management system together, it&#8217;s possible to avoid these pitfalls, and reduce the amount of money that you spend on your infrastructure while simultaneously improving the quality of the services that you provide to your customers.</p>

<p>One of the first steps to reducing your infrastructure cost is to make certain that you have a monitoring solution in place that can help you identify underutilized parts of your infrastructure, as well as what parts of your infrastructure are costing you money.  </p>

<p>Most shops already have metrics in their monitoring system that can be used to identify underutilized equipment, but if you do not: CPU usage, disk IO, network IO, and memory usage are all good places to start looking&#8202;&#8212;&#8202;and they are all easy to monitor. When you find underutilized equipment, a money saving solution is usually pretty obvious. If the system is older, and is a good candidate for virtualization, then it should probably be virtualized. If an expensive network link is not being heavily utilized, or it&#8217;s being over utilized, reconsider whether there are cheaper (and better) options available from another vendor. Reducing costs by virtualizing, moving off of old hardware and getting network connections that are suited to your business should be second nature at this point, but it can still be difficult to isolate exactly where you can do this. Simple systems monitoring can help.</p>

<p>Figuring out which parts of your infrastructure are costing you money can be more difficult. Most businesses are not monitoring the number of customers they lose as a result of unplanned downtime, or what the cost of support on old hardware is over time. By integrating business information (like number of customer sign ups over a period of time, amount of money refunded or number of support cases opened) into your technical monitoring, error detection or trending system, you can immediately see what the results of an outage or change are; and, you&#8217;ll know how much money should be spent to fix a problem, or to build a more fault-tolerant infrastructure.</p>

<p>There are some things that are difficult to automate in monitoring, but should still be reviewed on a regular basis. Support contracts, rack space/colocation bills, bandwidth overages (or underutilized contracts for bandwidth) and power bills all fall into this category. As equipment and environments age, fixed costs become taken for granted. When this happens, you&#8217;ll frequently forget that you are paying money for rack space that is no longer in use, bandwidth that is no longer necessary, and expensive support contracts on systems that could be virtualized onto an under-utilized new system. A newer system is often already in place, but the legacy system is left running for months, if not years, in case the new system fails. When you replace something and keep the old system as a backup, set up a reminder to revisit the decision to keep the old system after a month or two. If you don&#8217;t, the legacy equipment can end up staying in use for years before someone remembers it.</p>

<p>Finally, review your own time tracking system to see how you are spending your time. It&#8217;s easy to get into the rut of documenting a manual way of taking care of a task, and then doing it that way every time. If you can automate a process (or even parts of a process), or make the documentation simple enough for anyone else to follow, you can reduce the amount of time you spend on things and have more time for setting up new clients, investigating new software and helping your users. </p>

<p>One of the things to look for in your time tracking system would be who is spending time on tasks, and what those tasks are; if senior people are using their time to do the same task over and over again, it can be a sign that the task should be better documented, so that more people in your group can take care of it (and the more expensive time of senior administrators can be used for more difficult work).</p>

<p>To summarize:</p>
<ol>
<li>Monitor everything and look for under- or over-utilized resources.</li>
<li>Track your time; if you are spending lots of time on the same procedures, automate them. If you are responding to the same problems over and over again, find a way to permanently fix them.</li>
<li>Watch your invoices. It&#8217;s easy to pay support, bandwidth and power bills month after month, or year after year, without reviewing them to see if you can get a better deal elsewhere, or to see if non-critical infrastructure is costing you more money than you would like.</li>
</ol>

<p>These ideas are all simple, but by considering them during your day-to-day operations, as well as during periodic reviews, you may find yourself spending less money, and using less time, on keeping your infrastructure running well. Reducing the cost of your existing infrastructure gives you more time and money to spend on improvements and new projects, rather than merely on maintaining what is already in place.</p>]]></content:encoded>
            <pubDate>Thu, 01 Dec 2011 20:37:47 GMT</pubDate>
        </item>
        <item>
            <title>Sometimes &#34;Sexy&#34; Can Be the Right Choice</title>
            <link>http://omniti.com/seeds/sometimes-sexy-can-be-the-right-choice</link>
            <guid>http://omniti.com/seeds/sometimes-sexy-can-be-the-right-choice</guid>
            <description><![CDATA[
Hardly a month goes by these days without some exciting new technology hitting the blogosphere, filling the imagination of CTOs all over. At OmniTI, we are often approached by people asking about the "razor&#8217;s edge" technology of the week. Freque...]]></description>
            <content:encoded><![CDATA[
<p>Hardly a month goes by these days without some exciting new technology hitting the blogosphere, filling the imagination of CTOs all over. At OmniTI, we are often approached by people asking about the "razor&#8217;s edge" technology of the week. Frequently, they are convinced that this is the technology that they need for their business, and often will try to shoe-horn their requirements to fit the new toy. We typically have to convince people that their dusty old relational database actually handles their data needs just fine, even if it isn&#8217;t <a href="http://www.mongodb-is-web-scale.com/"><span>web scale</span></a>. Tried and true typically works better than shiny and new.</p>

<p>Sometimes, however, a client&#8217;s requirements really do lend themselves nicely to the newer technologies and we are justified in playing with them during business hours instead of at home! We love our <a href="http://omniti.com/is/hiring"><span>jobs</span></a> at OmniTI.</p>

<p>The request we&#8217;ll review here was fairly simple. The client needed a highly scalable and fast web service to provide geo-location data, based upon the IP address of the requester. They also had to serve a small static file. The service would run for a few months and then would be discontinued. It didn&#8217;t require true high availability, but we had to be able to fix it quickly if something went wrong.</p>

<p>Using technologies we were already employing on the project, we wrote a simple <a href="http://labs.omniti.com/labs/mungo"><span>Mungo-based</span></a> Perl script to look up the information in the <a href="http://www.maxmind.com/app/ip-location"><span>MaxMind</span></a> city level database and return the data inside of a JSON object. Once placed on the existing Apache httpd servers along with the static document, we had a working prototype for their third party to develop against, while we looked at the more complex issues in the request.</p>

<p>In this case, there were two immediate concerns:</p>
<ul>
<li>The service had to be fast and handle a <strong>lot</strong> of requests.</li>
<li>This component should not endanger the availability of the rest of the web services.</li>
</ul>

<p>The web farm already deployed to handle the client&#8217;s business used the <a href="http://httpd.apache.org/"><span>Apache httpd server</span></a> and, leveraging the platform flexibility, it grew to support a number of legacy web services. As this setup was already tweaked for these particular needs, we didn&#8217;t really want to reconfigure it. However, we needed to know where we stood from a performance point of view, to find out how much traffic we could handle. A quick <a href="http://httpd.apache.org/docs/2.0/programs/ab.html"><span>Apache bench</span></a> test revealed:</p>
<pre><code>Document Length:        188 bytes
Requests per second:    1306.85 [#/sec] (mean)
Time per request:       38.260 [ms] (mean)
Time per request:       0.765 [ms] (mean, across all concurrent requests)
Transfer rate:          520.79 [Kbytes/sec] received
</code></pre>
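<p>As a rough guide to reading these figures, both "Time per request" lines follow from the request rate. The sketch below assumes a concurrency level of 50, which is inferred from the ratio of the two timings (38.260 / 0.765) rather than stated in the output; the function names are illustrative.</p>

```javascript
// Mean time one request takes as seen by a single client connection, in ms.
// With `concurrency` requests in flight, each waits concurrency / rate seconds.
function timePerRequestMs(requestsPerSecond, concurrency) {
    return concurrency * 1000 / requestsPerSecond;
}

// Mean time per request across all concurrent requests, in ms (1000 / rate).
function timePerRequestAcrossAllMs(requestsPerSecond) {
    return 1000 / requestsPerSecond;
}

var rate = 1306.85;    // "Requests per second" from the output above
var concurrency = 50;  // assumption: inferred, not shown in the output

console.log(timePerRequestMs(rate, concurrency).toFixed(2)); // 38.26
console.log(timePerRequestAcrossAllMs(rate).toFixed(3));     // 0.765
```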
<p>Our goal was to be able to safely handle about 5,000 requests per second. While we could sustain that traffic by scaling out across our client&#8217;s multiple web servers, if traffic volume reached worst-case expectations there would be an unsafe likelihood of the servers becoming saturated, followed by service degradation. Or worse yet, all of the web services could become completely unavailable. Needless to say, either case would be unacceptable. We had to isolate this service from the rest. However, isolating it on hardware similar to what we were using for the web farm would have been prohibitively expensive, especially considering the transient nature of a project designed to last only a few months.</p>

<p>With such requirements, a cloud deployment was the obvious choice. While there <a href="http://joyeur.com/2011/04/22/on-cascading-failures-and-amazons-elastic-block-store/ "><span>are</span></a> <a href="http://stu.mp/2011/04/the-cloud-is-not-a-silver-bullet.html"><span>plenty</span></a> of <a href="http://joyeur.com/2011/04/25/network-storage-in-the-cloud-delicious-but-deadly/"><span>reasons</span></a> to stay away from the cloud, there are some really good reasons to use it, as well. The cloud would let us use exactly as much CPU and bandwidth as we needed, and provide an easy and quick way to get more if we required it. Our service did not store persistent data, even at a session level, so if a cloud instance went "poof," there was nothing we couldn&#8217;t afford to lose. When no longer needed, we could just shut down or scale back the servers, without worrying about excess hardware&#8202;&#8212;&#8202;the exact benefit cloud supporters always want. EC2, here we come!</p>

<p>With the move to EC2, we had the option to deploy the prototype code that we had written already. However, as that code leveraged an existing ecosystem designed to service a much wider spectrum of needs, duplicating the environment would have been overkill, and attempting to strip it down to the minimum necessary would have been a rather daunting challenge with little long-term benefit. With the luxury of exploring a green field approach, we turned our attention to Node.js. At OmniTI, we had the advantage of having seen Node.js used already a few times for production services, and we had even incorporated it into a few solutions we had developed, so we knew that the type of light-weight, fast response code that we were looking to develop for this project was very well suited for Node.js. Through a bit of serendipity, <a href="http://omniti.com/is/theo-schlossnagle"><span>Theo Schlossnagle</span></a> had just recently branched, and then finished, a new version of <a href="https://github.com/postwait/node-geoip"><span>node-geoip</span></a> that was capable of reading the MaxMind City database. Add to that my personal joy for getting the chance to use Node.js in production, for a customer project, and the decision was clear.</p>

<p>Plan in hand, the Perl script was quickly converted to Node.js and placed on a small Apache EC2 instance for load testing (thanks to <a href="http://omniti.com/is/zach-malone"><span>Zach Malone</span></a> for assistance with all of the cloud benchmarking work). The entire code follows.</p>

<pre><code>var http = require('http'),
    sys  = require('sys'),
    geoip= require('geoip');

var con = new geoip.Connection('/www/geodata/GeoIPCity.dat', 0, function(){});

http.createServer(function (req, res) {
    if( req.url == '/get_city' ) {
        res.writeHead(200, {'Content-Type': 'text/plain'});
        var ip = req.headers['x-forwarded-for'] || 
                     req.connection.remoteAddress;
        con.query( ip, function(result) {
            var obj = new Object();
            if(!result){ obj.city = 'Unknown';   }
            else       { obj.city = result.city; }
            res.end(JSON.stringify(obj) + "\n");
        });
    } else if( req.url == '/crossdomain.xml' ) {
          res.writeHead(200, {'Content-Type': 'text/xml'});
          res.end("<?xml version=\"1.0\"?>\n<!DOCTYPE cross-domain-policy SYSTEM \"http://www.adobe.com/xml/dtds/cross-domain-policy.dtd\">\n<cross-domain-policy>\n<allow-access-from domain=\"*\" />\n</cross-domain-policy>\n");
    } else {
          res.writeHead(404, {'Content-Type': 'text/plain'});
          res.end("File not found.\n");
    }
}).listen(80);
</code></pre>

<p>Slightly more than twenty lines of code. This will return a JSON object with the name of the nearest city, based upon client IP address, or "Unknown" if it does not resolve. It will also serve a crossdomain.xml file to any Flash objects that need one, and return a 404 to any other requests. Where&#8217;s the web server, one may ask? Node.js takes care of all of that for you.</p>
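<p>One caveat with the lookup above: the <code>x-forwarded-for</code> header is passed to the query as-is, and when a request travels through more than one proxy that header can hold a comma-separated list of addresses. A minimal sketch of picking a single address follows; the helper name <code>clientIp</code> is hypothetical, and which entry to trust depends on the proxies in front of the service.</p>

```javascript
// Return one IP address for a request: the first entry of X-Forwarded-For
// if the header is present, otherwise the socket's remote address.
// Assumption: the left-most entry is the original client. That value is
// client-supplied unless a trusted proxy overwrites it.
function clientIp(headers, remoteAddress) {
    var forwarded = headers['x-forwarded-for'];
    if (!forwarded) { return remoteAddress; }
    return forwarded.split(',')[0].trim();
}

console.log(clientIp({ 'x-forwarded-for': '203.0.113.7, 10.0.0.1' }, '10.0.0.2')); // 203.0.113.7
console.log(clientIp({}, '198.51.100.4'));                                          // 198.51.100.4
```

<p>In the server above, a helper like this would take the place of the two-line <code>var ip = ...</code> expression.</p>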

<p>Simple Apache bench testing of this code gives between 300 and 600 requests/second on a Small EC2 instance with a single virtual CPU. </p>
<pre><code>Document Length:        223 bytes
Requests per second:    344.75 [#/sec] (mean)
Time per request:       145.033 [ms] (mean)
Time per request:       2.901 [ms] (mean, across all concurrent requests)
Transfer rate:          96.62 [Kbytes/sec] received
</code></pre>
<p>Yes, this was much slower than what we had benchmarked on our web farm, but it was much cheaper to scale out, and it gave us the service separation we wanted. To scale out, we had to load balance our instances; <a href="http://aws.amazon.com/elasticloadbalancing/"><span>Amazon&#8217;s Elastic Load Balancing</span></a> was used here.</p>

<p>It was expected that a small decline in service would occur due to the overhead, but we were pleasantly surprised to see slightly BETTER performance. Apparently, getting the IP address in Node.js from the request header is faster than getting it from the connection object, so having a load balancer in the middle actually improved performance.</p>
<pre><code>Document Length:        223 bytes
Requests per second:    367.87 [#/sec] (mean)
Time per request:       135.918 [ms] (mean)
Time per request:       2.718 [ms] (mean, across all concurrent requests)
Transfer rate:          112.40 [Kbytes/sec] received
</code></pre>
<p>In repeating the test with two, and then three, server instances behind the load balancer, each new instance added another ~350-600 requests/second to the volume we could handle. So, with only three small EC2 instances, we were able to crank out between 1,200 and 1,500 requests/sec of GeoIP lookups.</p>
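<p>As a back-of-the-envelope sketch, the per-instance range can be turned into an instance count for the 5,000 requests/second goal. This assumes throughput keeps scaling linearly with each added instance, which the tests above only show for up to three instances; the function name is illustrative.</p>

```javascript
// Number of instances needed to reach a target rate, assuming each instance
// contributes the same throughput (linear scaling is an assumption).
function instancesNeeded(targetPerSecond, perInstancePerSecond) {
    return Math.ceil(targetPerSecond / perInstancePerSecond);
}

var target = 5000; // requests/second goal stated earlier

console.log(instancesNeeded(target, 600)); // 9  (every instance is a fast one)
console.log(instancesNeeded(target, 350)); // 15 (every instance is a slow one)
```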

<p>350 to 600 requests/second is a pretty large window, and it means that some of our EC2 instances do much more work than others. This is something that you have to deal with when you are deploying a cloud-based solution. Thankfully, EC2 gives you a lot of flexibility to rapidly create and destroy instances, so if you get an especially slow instance, it can be worth throwing the instance away and creating a new one. As a bonus, if needed, it takes fewer than 15 minutes to manually get a new instance provisioned, set up, and running, without using Amazon EBS. Not relying on EBS enabled us to dodge the infamous <a href="http://aws.amazon.com/message/65648/"><span>EC2 outage</span></a>.  Our service was unaffected despite running in the unfortunate Virginia data center cloud.</p>

<p>Now, just because we were using Node.js, and we were deploying to the cloud, doesn&#8217;t mean we toss away due diligence. In order to make certain that the EC2 solution offered good performance for the money, we decided to benchmark the same code on a Joyent <a href="http://www.joyent.com/products/smartmachines/ "><span>SmartMachine</span></a> that we had available.  A single Joyent system had the performance of ~3.5 small EC2 instances:</p>
<pre><code>Document Length:        223 bytes
Requests per second:    1564.30 [#/sec] (mean)
Time per request:       31.963 [ms] (mean)
Time per request:       0.639 [ms] (mean, across all concurrent requests)
Transfer rate:          438.43 [Kbytes/sec] received
</code></pre>
<p>The cost of the Joyent system was, however, twice as much as three small EC2 instances, plus an Elastic Load Balancer. Joyent includes a generous amount of bandwidth with any instance (Amazon does not), but their large, fixed monthly cost meant that we would not have as much flexibility to scale up and down as we did with EC2, which has hourly billing.</p>

<p>So, we had a working solution at this point, but we still had to make sure it would continue to work; in short, it had to be monitored. Normal end-to-end monitors and request timing monitors were put in place on the load balancer, as well as checks on each individual server instance. But we also wanted to know how much traffic we were serving without any more fuss. Node.js could keep track of that for us as well. By simply adding:</p>
<pre><code>var cities = 0, xmls = 0, fnf = 0, status = 0;
</code></pre>
<p>&#8230; some variable++'s in the appropriate spots, and &#8230;</p>
<pre><code>
    } else if( req.url == '/status' ) {
        res.writeHead(200, {'Content-Type': 'text/plain'});
        status++;
        var obj = new Object();
        obj.cities = cities;
        obj.xmls   = xmls;
        obj.fnf    = fnf;
        obj.status = status;
        res.end(JSON.stringify(obj) + "\n");
</code></pre>

<p>. . .we could see exactly how much traffic, of each type, that each Node.js instance had served; along with whether any of them had crashed (as evidenced by a reset counter).  This was set up to be pulled by <a href="http://circonus.com/"><span>Circonus</span></a> which can consume the JSON data and graph the usage trends over time.</p>
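<p>The crash detection works because these counters only ever increase while the process is alive, so any counter that goes backwards between two polls suggests a restart. A minimal sketch of that check, assuming two successive <code>/status</code> payloads have already been fetched and parsed; the function name <code>counterReset</code> and the sample numbers are hypothetical, and this is not a description of how Circonus works internally:</p>
<pre><code>// Hypothetical helper: returns true if any counter went backwards
// between two polls of /status, which suggests the process restarted.
function counterReset(previous, current) {
  return Object.keys(previous).some((key) => previous[key] > current[key]);
}

// Made-up sample payloads from two consecutive polls.
const before = { cities: 1200, xmls: 40, fnf: 3, status: 17 };
const after = { cities: 12, xmls: 0, fnf: 0, status: 1 };
console.log(counterReset(before, after)); // true
</code></pre>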

<p>Perhaps also of interest: all of this was done almost a year ago.  "A few months" turned into much longer. The client&#8217;s required utilization has gone up and down with a corresponding number of EC2 nodes added or removed.  But this simple script hasn&#8217;t had to be modified or touched since.  It has happily run a production service without any problems for a minimal amount of time invested.</p>

<p>To be fair, this was a rather simple problem that could have been solved in a number of different ways, perhaps even more effectively. But sometimes it behooves you to explore those sexy new technologies and learn their trade-offs first-hand. That way, you can feel comfortable deploying them for critical components of an architecture. While it&#8217;s essential to remember that sexy doesn&#8217;t mean good, it&#8217;s a pleasant reminder that sometimes good can be sexy.</p>
]]></content:encoded>
            <pubDate>Tue, 22 Nov 2011 20:03:27 GMT</pubDate>
        </item>
        <item>
            <title>Thoughts on Web Application Deployment</title>
            <link>http://omniti.com/seeds/thoughts-on-web-application-deployment</link>
            <guid>http://omniti.com/seeds/thoughts-on-web-application-deployment</guid>
            <description><![CDATA[Abstract: A quick overview of various strategies to ease deployment of web applications, and some common pitfalls and failure modes to avoid.  Intended to be broadly technology-agnostic.

Introduction
Over the course of my career, I&#8217;ve worked in ...]]></description>
            <content:encoded><![CDATA[<p><i>Abstract: A quick overview of various strategies to ease deployment of web applications, and some common pitfalls and failure modes to avoid.  Intended to be broadly technology-agnostic.</i></p>

<h3>Introduction</h3>
<p>Over the course of my career, I&#8217;ve worked in a number of different environments, each with their own particular processes and procedures for deploying systems, from development to production. Over time, a number of best practice patterns and common anti-patterns have emerged, which this article will attempt to enumerate and explain. I hope this information will give you pointers and direction to improve your processes so that deployment is made both easier and less error-prone. As much as is possible, I will be technology agnostic, so of course your particular environment may vary or require additional steps, but following along the broader themes listed here should be helpful.</p>

<h3>Step Zero: Intelligent Use of Source Control</h3>
<p>This seems like something that is almost forehead-slap obvious, but you should be using source control during development. Working without it is like doing high-wire acrobatics without a safety net. There is no project so small, either in scope or staff, that it cannot benefit from having some sort of source configuration management (SCM) system in place. Selecting which SCM to use (e.g., Subversion, git, Perforce) depends upon your team&#8217;s development style and environment choice, and is beyond the scope of this article. In general, the Pragmatic Programmers, O&#8217;Reilly and APress books covering particular systems tend to be good resources.</p>
<p>Further, source control must be used intelligently to be of much use. Common anti-patterns include things like having the development environment exist in only one shared space that uses one source control checkout so people are colliding over editing the same physical bits for any given file, or never branching/tagging so that trying to determine the exact state of your system at a given point in the past is an exercise in frustration.  If you ever have to ask "is anyone else editing this file?," you are either using a supremely broken SCM or you are doing something gravely wrong.</p> 
<p>Your SCM setup should enable you to work concurrently and in isolation with a minimum of hassles, allowing easy integration of work done concurrently on the same module or set of modules; easy reproduction of the system as it was at any point in time in the past; and, ideally, easy searching of commit history because all of these scenarios will come up repeatedly in any project of significant scale. For example, if you find yourself trying to diagnose a recurrence of an issue with a particular ticket number, your diagnosis will be vastly sped up if it is easy to find all of the commits related to that ticket number in the past. Similarly, being able to tell easily who exactly added a particular feature months or years ago might make it much faster to track down the organizational knowledge required to extend or fix it in the present.</p> 
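<p>As a rough illustration of the ticket-number lookup described above, here is a minimal sketch that filters a commit log for references to one ticket. It assumes a team convention of writing <code>#NUMBER</code> in commit messages; the commit records and the function name <code>commitsForTicket</code> are made up, and a real list would come from your SCM&#8217;s own log command or web interface:</p>
<pre><code>// Hypothetical commit records; a real list would come from your SCM's log output.
const commits = [
  { id: 'r101', author: 'alice', message: 'Fix login redirect (#1234)' },
  { id: 'r102', author: 'bob', message: 'Add export button (#1240)' },
  { id: 'r103', author: 'carol', message: 'Follow-up for #1234: handle expired sessions' },
  { id: 'r104', author: 'bob', message: 'Unrelated cleanup (#12345)' },
];

// Return commits whose message references the ticket as "#NUMBER".
// The trailing \b keeps #1234 from matching #12345.
function commitsForTicket(log, ticket) {
  const pattern = new RegExp('#' + ticket + '\\b');
  return log.filter((commit) => pattern.test(commit.message));
}

console.log(commitsForTicket(commits, 1234).map((c) => c.id)); // [ 'r101', 'r103' ]
</code></pre>
<p>The same records also answer the "who added this?" question, since each match carries its author.</p>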
<p>The more isolation feasible in the environment, the less coordination overhead is required to work together as a team on a given workload, meaning that your productivity scales more linearly with additional developer resources. Good usage of an SCM can aid this by making it easy to keep individual development environments in sync; a common best practice is to make sure that each developer can run their own stand-alone copy of the execution environment based upon a frequently-updated SCM checkout. As a specific example, with the common LAMP (or similar) technology stack, it is easy for each developer to have an account on a shared machine, with a checkout in their home directory that is used as the root for a vhost/distinct port so that each developer may work in isolation talking to different ports on the same host.</p>
<p>Having a particular "branch" or "tag" devoted to staging/testing and production environments is a common best practice. The particulars of how this is done vary by SCM, but every modern SCM should have some facility corresponding to one or the other of the above if not both. Branching/merging tend to be slightly more complex operations in most SCM systems, but the effort expended in learning these operations will repay itself many times over as you begin to be able to take a more sophisticated approach to the various states and stages of your system&#8217;s evolution over time. See steps two and three below for further discussion along this theme.</p>

<h3>Step One: Deployment Planning</h3>
<p>Take the time to write out a deployment plan, even if it&#8217;s just a brief one. At a minimum, your deployment plan should include:</p>
<ul>
<li>Name and short purpose description of the project (seems obvious, but depending upon how widely this information is distributed and how big your organization is, your readers may not automatically know what you&#8217;re working on in order to tie your authorship back to what this proposed production deployment is all about)</li>
<li>Names of and contact information for the staff responsible for its development (particularly tech leads and project managers)</li>
<li>Source location (e.g., links into the SCM&#8217;s web interface or descriptions of how to retrieve the source for SCMs that don&#8217;t have a web interface)</li>
<li>List of affected systems/what resources will be used (i.e., which servers this code will be pushed to, are there any extra steps that have to take place, like running of database modification scripts or setting up of extra server software/new configuration for existing resources?)</li>
<li>Deployment and Rollback procedures (this may reference standard operating procedures in other documents if there is nothing out of the ordinary for the given deployment)</li>
</ul>
<p>A wiki is a common tool for this, but even an email to the right people or a mailing list can suffice. One advantage of a persistent document is that it can "grow" over time as the project evolves rather than being reconstructed from scratch on each deployment (i.e., things like project staff or the particular branch of the project&#8217;s source being used may change only infrequently, but dated/versioned logs might be kept of which revisions were rolled out when, or which revisions required an extra step, etc.). This is particularly important for continuous deployment environments (see next section). It is also easy to maintain a template like this, so that each one follows a common layout and helps staff remember frequently overlooked steps (e.g., helping developers remember to mention system software config changes required or database schema changes).</p>
<p>This may seem like unnecessary overhead (developers who truly enjoy writing documentation are not common), but retaining organizational knowledge about what is deployed where, when and why is crucial to keeping non-trivial systems running over time. Even if the original staff who deployed a system are still with the company (staff turnover being a fact of life for any organization), remembering precisely what was done and why potentially years after the fact is not an easy feat. The effort invested in this now will repay itself in the future.</p>

<h3>Step Two: Continuous Deployment<br/> or Phased Deployment?</h3>
<p>There are two common modes of deployment for web applications. The first models traditional software engineering by using phased deployment, where phases of release correspond to planned or scheduled bundles of additional features and new bug fixes. A variant of this is the "boxcar" or "feature train" model that ships a defined release on some set schedule ("If it&#8217;s ready in this six week window, it goes on the shipment train, if not, it waits for the next one."). This is common for environments that have rigorous quality assurance or change control requirements, as it allows a built in time period prior to each phase&#8217;s release for those processes to execute in a regular, repeatable fashion on a known schedule. In these environments it is common to "snapshot" the particular phase for deployment in some fashion, via something like a "branch" or "tag" as discussed in step zero. For example, a Q3 phased release for a system might have "prod-Q3-2011" as its source branch/tag. The state of the system so denoted might then further be used as the base for issue remediation hotfixes that must go live in between regular phased releases. For an automated deployment environment (see next section), the system would need to be aware of the correct current branch/tag to use as its deployment source (or perhaps offer the option of the currently available sources that match a given pattern to the user making the deployment).</p>
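<p>One way an automated system might find the correct current tag is to match names against the release naming pattern and pick the newest. The following is only a sketch under that assumption: it reuses the "prod-Q3-2011" convention from the example above, while the tag list and the function name <code>latestProdTag</code> are invented for illustration:</p>
<pre><code>// Tag names following the "prod-Q3-2011" convention from the text;
// the list itself is made up for illustration.
const tags = ['qa-Q3-2011', 'prod-Q2-2011', 'prod-Q3-2011', 'prod-Q4-2010', 'scratch'];

// Pick the most recent tag of the form prod-Q{quarter}-{year}, or null if none match.
function latestProdTag(names) {
  const parsed = names
    .map((name) => ({ name, match: /^prod-Q([1-4])-(\d{4})$/.exec(name) }))
    .filter((entry) => entry.match !== null)
    .map((entry) => ({
      name: entry.name,
      // Year dominates, quarter breaks ties.
      rank: Number(entry.match[2]) * 10 + Number(entry.match[1]),
    }));
  parsed.sort((a, b) => b.rank - a.rank);
  return parsed.length > 0 ? parsed[0].name : null;
}

console.log(latestProdTag(tags)); // prod-Q3-2011
</code></pre>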
<p>The second, and more recent, development is continuous deployment. With continuous deployment, new features or bug fixes may go live at any time. Some environments push live to every user at the same time, and others use a "feature flag" approach where a given user must have a given flag or set of flags in their active session or profile to be exposed to the new code.  Care must be taken for "feature flag" setups to ensure that tests of the system (see monitoring and verification section below) are using the correct flag or sets of flags to accurately capture the state of the system as the end users see it.</p>
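<p>A feature flag check can be very small. The sketch below assumes flags are stored as a list of names on the user&#8217;s profile; the flag name <code>new-checkout</code>, the user records, and both function names are hypothetical:</p>
<pre><code>// Hypothetical user records: each carries the set of flags active for that user.
const betaUser = { name: 'pat', flags: ['new-checkout'] };
const regularUser = { name: 'sam', flags: [] };

// True only if the user's profile carries the named flag.
function hasFeature(user, flag) {
  return user.flags.includes(flag);
}

// Route a request to the new or old code path depending on the flag.
function checkoutVersion(user) {
  return hasFeature(user, 'new-checkout') ? 'new checkout' : 'old checkout';
}

console.log(checkoutVersion(betaUser));    // new checkout
console.log(checkoutVersion(regularUser)); // old checkout
</code></pre>
<p>A test that logs in as <code>regularUser</code> would only ever exercise the old path, which is why the test accounts need the same flags as the users whose experience you want to verify.</p>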
<p>What I will say next has proven to be one of the more contentious parts of this essay in internal discussion, so I freely admit that this is a point worthy of further thought particularly as time provides more evidence on the ways that continuous deployment works or fails.  I do not believe that continuous deployment systems should be configured such that the source for pushes to production machines (e.g. a branch or trunk or whichever nomenclature is appropriate for your environment) is the same space that developers initially check code into.  I am willing to stipulate that things like developer mindset and discipline in concert with automated checking scripts within the commit process may eliminate many sources of error that could be introduced in such an "insta-live" system, but I&#8217;m also a big believer in the power of Murphy.  Having some separation here, however low friction, should help prevent many errors (maybe something like a code review queue or holding pen that things go through before going to production, or development in branches with deploys drawn from trunk with many and small merges vs fewer large ones).  In my mind continuous deployment is more about release automation, scope of work per released quanta, and democratization of the release process combining to empower individuals to release quickly than any particular SCM configuration litmus test.</p>
<p>Which of these approaches works best for your team may be dictated by the business/regulatory needs your application must satisfy, or may be limited only by the consensus of personal preferences involved. Generally speaking more conservative environments will trend toward phased deployment out of necessity.</p>

<h3>Step Three: Deployment Automation</h3>
<p>The easier you can make it to do the "right things" for your environment, the more likely people are to do them. What these steps might be will of course vary widely, but common examples may include things like moving files into place (static assets, interpreted code), compilation and movement/packing of resultant binaries (for compiled language environments), application of database changes, and so forth. Almost all of these various steps may be automated via some mechanism (e.g., via scripting languages either directly on a command line, or perhaps in more sophisticated environments an actual deployment manager standalone application). Particularly large, multi-server systems may choose to roll deployments out in stages to increasingly larger subsets of their total infrastructure, effectively using A/B testing with progressively larger portions of their active user base to check for any problems as scale increases or negative user experience feedback; this is obviously much better done in as automated as possible fashion as the chance for simple errors increases dramatically as the number of manual interactions increases.</p>
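<p>The staged rollout idea can be sketched as a function that splits a server list into progressively larger cumulative stages. The fractions, the server names, and the function name <code>rolloutStages</code> are assumptions chosen for illustration; a real deployment tool would also pause between stages to check for problems:</p>
<pre><code>// Split a server list into cumulative rollout stages, e.g. 10%, 50%, 100%.
// Each stage lists only the servers that are new in that stage.
function rolloutStages(servers, fractions) {
  const stages = [];
  let done = 0;
  for (const fraction of fractions) {
    const target = Math.ceil(servers.length * fraction);
    stages.push(servers.slice(done, target));
    done = Math.max(done, target);
  }
  return stages;
}

// Made-up host names.
const servers = ['web1', 'web2', 'web3', 'web4', 'web5', 'web6', 'web7', 'web8', 'web9', 'web10'];
console.log(rolloutStages(servers, [0.1, 0.5, 1]));
</code></pre>
<p>With ten servers and these fractions, the first stage touches one server, the second adds four more, and the last covers the remaining five.</p>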
<p>As a quick example of a simple implementation of this kind of setup, several years ago I worked in a PHP-based environment which used the "qa" and "prod" CVS tags to tag particular revisions of various files as being suitable for deployment to a particular environment. When a developer (with the right access privileges) accessed the deployment manager web application, he could select which tag to deploy; the system would do a CVS checkout of that tag to a scratch area and then do an rsync command to move all of the code (and associated static assets) to the appropriate server(s). This radically reduced the overhead of deploying code, although it was not perfect in the sense that database and other system config changes still required the involvement of the relevant systems teams. A variant of this in use at another organization similarly depended on conventional branches for "qa" and "prod", but instead of using rsync would use ssh to invoke svn up commands directly on each affected machine (Apache was configured to deny access to .svn directories).</p>
<p>One tool that seems often overlooked in this area is use of OS-native software packaging mechanisms to distribute content and execute scripts required for the given change set. These scripts may be either tailored to the particular release, or may be general standard scripts that by convention draw data from named portions of the source being deployed (e.g. a "db/001.sql &#8230; N.sql" file set might be iteratively applied in order if they exist, or a "etc/001.patch &#8230; N.patch" set of patch files might be applied in a similar fashion). Use of this sort of packaging system will make it much quicker to verify that a given app is installed, what files are associated with it, whether any of those files have been modified, and so forth, and also makes installation/upgrade/removal far more automated. Another example for a Java-based system might be an OS-level package that contains the compiled WAR file and pre-/post-install scripts to invoke the correct application server steps to install or update the application.</p>
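<p>For the "db/001.sql &#8230; N.sql" convention, the general-purpose script mostly needs to pick out the matching files and apply them in numeric order. A minimal sketch of the ordering step, with a made-up file list and a hypothetical function name <code>migrationOrder</code> (actually applying each file is left to whatever database client your environment uses):</p>
<pre><code>// Given file paths in a release, return the db/NNN.sql scripts in numeric order.
function migrationOrder(paths) {
  const pattern = /^db\/(\d+)\.sql$/;
  return paths
    .filter((path) => pattern.test(path))
    .sort((a, b) => Number(pattern.exec(a)[1]) - Number(pattern.exec(b)[1]));
}

// Made-up release contents.
const release = ['db/010.sql', 'README', 'db/002.sql', 'etc/001.patch', 'db/001.sql'];
console.log(migrationOrder(release)); // [ 'db/001.sql', 'db/002.sql', 'db/010.sql' ]
</code></pre>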
	
<h3>Step Four: Monitoring and Verification</h3>
<p>Being able to keep a real-time watch on your system&#8217;s performance and user behavior is extremely important during and after a deployment. If server errors surge after a push, clearly the deployment will need to roll back, but other more subtle failure modes may also be important (a change that leads to increased latency on the site might hurt conversion/activity rates of users, for example). Having a system in place to collect and monitor these technical and business metrics will go a long way toward increasing your assurance that a given deployment has not introduced any issues.</p>
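<p>As one possible shape for the "server errors surge after a push" check, the sketch below compares the error rate before and after a deployment. The sample numbers, the function names, and the 2x tolerance are all assumptions for illustration and not recommended thresholds:</p>
<pre><code>// Compare the error rate after a push against the rate before it.
// The 2x tolerance is an arbitrary illustration, not a recommendation.
function errorRate(sample) {
  return sample.requests === 0 ? 0 : sample.errors / sample.requests;
}

function shouldRollBack(before, after, tolerance = 2) {
  return errorRate(after) > errorRate(before) * tolerance;
}

// Made-up request and error counts for equal time windows.
const beforePush = { requests: 10000, errors: 20 };  // 0.2% errors
const afterPush = { requests: 9500, errors: 190 };   // 2% errors
console.log(shouldRollBack(beforePush, afterPush)); // true
</code></pre>
<p>The same comparison could be applied to latency or conversion figures, which is where the subtler failure modes tend to show up.</p>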
<p>In a related vein, having a suite of integration tests that you can run on production to quickly verify that all expected functionality is working at any given point in time can be extremely handy (so that you don&#8217;t have to wait for a user to stumble on the one out of the way use case that happens to now throw an error). This becomes particularly powerful in systems large enough that manual testing of the entire API/UI is inefficient. These integration tests must be distinguished from unit tests which are likely also part of your testing and deployment strategy, albeit at a more granular source-code level. In all cases, designing for modularity and testability will make your life much easier when it comes to verifying the behavior of your software, but that is a matter for another article.</p>
<p>The resources and further reading section below has links to a few different tools for both areas listed above. There are many other options, of course, so finding the best fit for your environment would be a matter of further research.</p>

<h3>Conclusion</h3>
<p>I hope this article has given you some insight into how to improve your deployment processes, with the goal being reduction in complexity and uncertainty related to making your system evolve to fit ever-changing business needs. The steps outlined above may be adopted/adapted to your organization in stages, but the more fully you adopt them the more synergistic benefits you will see. In all cases, the guiding principle should be to make it easier to do the right things for your environment and minimize end-user complexity. No matter what technology stack you are using, and no matter what type of application you are writing, getting deployment right can make the difference between going crazy from stress and having a happy, productive work day.</p>

<h3>Resources and Further Reading</h3>
<h4>Source Configuration Management Systems</h4>
<ul>
<li><a href="http://subversion.apache.org/"><span>Subversion</span></a> -- A common centralized SCM used by many organizations; free software.  Quality books covering "svn" are available from several publishers, and some are available online freely as well, e.g. <a href="http://svnbook.red-bean.com/"><span>the red bean svn book</span></a></li>
<li><a href="http://git-scm.com/"><span>Git</span></a> --  An increasingly popular distributed SCM, used by large projects such as the Linux kernel; free software.  As with svn above, git has several good texts in print and some are available online e.g. <a href="http://progit.org/book/"><span>Pro Git</span></a></li>
<li><a href="http://trac.edgewall.org/"><span>trac</span></a> -- A web interface to several common SCMs (svn, git, etc.); integrates a ticket management system and wiki as well as source browser, free.</li>
<li><a href="http://mtrack.wezfurlong.org/"><span>mtrack</span></a> -- Similar to trac but with several enhancements e.g. native ability to handle multiple projects per single install (trac as shipped is intended to have one instance per managed project)</li>
</ul>

<h4>Deployment Planning</h4>
<ul>
<li><a href="http://www.dokuwiki.org/dokuwiki"><span>dokuWiki</span></a> --  A common and full-featured wiki; free. PHP based so anything supporting that (apache or similar on Unix, IIS on Windows, etc.) should at least have a good chance of running it.</li>
</ul>

<h4>Deployment Automation</h4>
<ul>
<li> Scripting Languages&#8202;&#8212;&#8202;This will greatly depend on your environment, but almost any enterprise computing platform these days will have some sort of scripting mechanism, e.g. <a href="http://perl.org"><span>perl</span></a>, <a href="http://python.org"><span>python</span></a>, <a href="http://ruby-lang.org"><span>ruby</span></a>, etc.  (Windows versions in particular of things like perl and python may be obtained from <a href="http://activestate.com"><span>ActiveState</span></a> both freely and with support contracts.)</li>
<li><a href="http://rsync.samba.org/"><span>rsync</span></a> -- an intelligent method of syncing files between two computers, free software.</li>
<li><a href="http://rubyhitsquad.com/Vlad_the_Deployer.html"><span>Vlad the Deployer</span></a> --  a free, ruby-based deployment automation system. I&#8217;ve seen this used in-house in concert with additional development in ruby to produce Solaris and CentOS packages automatically as well as rolling them out to the target systems.</li>
<li><a href="http://rubyonrails.org"><span>Ruby on Rails</span></a> as a system deserves credit for thinking about deployment automation more than many other frameworks, e.g. database migrations and deployment managers like Capistrano/Bundler.</li>
<li>Your chosen operating system&#8217;s package management documentation; generally speaking any enterprise grade server operating system will have some sort of package management and documentation/guides will exist for how to make/maintain packages for that system.</li>
<li>Cloud-based deployments are another special case, as many "cloud" infrastructures are themselves scriptable to allocate/deallocate additional resources, making another level of potential automation as well as simply managing the deployment of code and config changes.  An example of this is <a href="https://github.com/nimbul"><span>Nimbul</span></a> from the New York Times (centered around Amazon&#8217;s set of elastic/cloud services).</li>
</ul>

<h4>Monitoring and Verification</h4>
<ul>
<li><a href="http://seleniumhq.org/"><span>Selenium</span></a> --  Selenium is a way to record and then play back web application interactions via browser, and is useful when constructing behavioral/integration tests to verify a site&#8217;s functioning.</li>
<li><a href="http://nagios.org"><span>Nagios</span></a> -- Commonly used infrastructure monitoring tool, can be a bit of a bear to set up the first time; free.</li>
<li><a href="http://www.cacti.net/"><span>Cacti</span></a> -- A graphing and trending application, free.</li>
<li><a href="http://circonus.com/"><span>Circonus</span></a> --  Circonus takes the setup and maintenance hassles out of monitoring and trending, available as a service.</li></ul>
]]></content:encoded>
            <pubDate>Tue, 01 Nov 2011 18:58:04 GMT</pubDate>
        </item>
        <item>
            <title>Your ORM Sucks</title>
            <link>http://omniti.com/seeds/your-orm-sucks</link>
            <guid>http://omniti.com/seeds/your-orm-sucks</guid>
            <description><![CDATA[I don&#8217;t like frameworks. Web application frameworks, ORMs, whatever.

I don&#8217;t mean that as harshly as it probably sounds. It&#8217;s something like saying, "I don&#8217;t like cooking with microwaves." They have their uses, certainly - I&#8...]]></description>
            <content:encoded><![CDATA[<p>I don&#8217;t like frameworks. Web application frameworks, ORMs, whatever.</p>

<p>I don&#8217;t mean that as harshly as it probably sounds. It&#8217;s something like saying, "I don&#8217;t like cooking with microwaves." They have their uses, certainly - I&#8217;m not going to scrub out a pan in the morning because I want to make oatmeal, for example - but there are limits to what they can do, and I think there&#8217;s a reluctance or inability to recognize that. I&#8217;m certainly not above nuking a pile of Bagel Bites, but I don&#8217;t tell myself that it&#8217;s haute cuisine.</p>

<p>Granted, like any framework, ORMs certainly have their uses, and most projects will benefit from using them in some capacity. No one likes writing the same boring INSERT, UPDATE and DELETE statements for every table. They enforce consistency - you essentially don&#8217;t have a choice about naming conventions or class structure anymore, so you can&#8217;t screw them up. They usually maintain relationships from the database as part of the code. Some have their own internal query cache. It&#8217;s usually easy to extend them. Unfortunately, based upon my cursory research, there appears to be at least one attempt at a MongoDB ORM, so I can&#8217;t use "don&#8217;t support nosql" as a point in their favor anymore.</p>

<p>So by all means, use an ORM every time. By which I mean that every repository should probably have one, and emphatically not that it should be used for every query. Because whatever the tool at hand: Zend Framework, Class::ReluctantORM, a microwave&#8202;&#8212;&#8202;there always ends up being a place where it doesn&#8217;t work, or doesn&#8217;t work very well, and you&#8217;re forced to do things the old-fashioned way. Sometimes the simple solution really is the best. Why would you bootstrap Zend and load up a bunch of model classes just to import some records? You can do that with a DBI handle and a perl script. Or flat files and sed/awk, probably. To some extent this is just a matter of opinion, and that&#8217;s fair, but there are situations where my way - the ugly, hacky way - is objectively and demonstrably better. Not always. But sometimes.</p>

<p>By Way of Example</p>
<ul>
<li>In what might be the canonical case of "Why Would You Do This", a listing of articles with headlines and perhaps publication dates, with the titles linking to individual article pages containing the full text. To render this list, the ORM-written query was selecting all fields from the table. Because that&#8217;s all it knew how to do; you ask for a list of articles and you get those articles, with no thought put into why you want them, or which fields you need, because it&#8217;s a generic tool and that&#8217;s all it knows how to do. Sometimes I think it helps to think of ORMs as the dumbest programmer you&#8217;ve ever worked with. Think of the query that guy would write, and that&#8217;s probably similar to the inefficient unreadable gloop you&#8217;re getting from the machine generation.</li>

<li>A three-layer navigation menu, with almost all the items on it determined by what was, or wasn&#8217;t, in the database. After I spent a few hours untangling what the thing was doing, it turned out to be something like this:

<pre>
<code>
select('e.event_id, e.name, e.url_name, i.url, 
 i.title, tr.title,  a.article_id, a.title, 
 ae.article_event_id, rg.title, rv.title,
 rm1.related_media_id, rm2.related_media_id, i.sort_order')
->from(CLASS . ' e')
->leftJoin('e.info_page i ON e.event_id = i.event_id 
 AND i.is_deployed IS TRUE AND i.pub_date <= NOW()')
->leftJoin('e.tour_results tr')
->leftJoin('e.articles ae')
->leftJoin('ae.article a ON a.article_id = ae.article_id 
 AND a.pub_date <= NOW() AND a.is_highlight IS TRUE 
 AND a.is_deployed IS TRUE')
->leftJoin('e.related_media rm1')
->leftJoin('rm1.photo_galleries rg 
 ON rg.photo_gallery_id = rm1.media_id 
 AND rm1.media_type = 'photo_gallery' 
 AND rg.is_highlight IS TRUE 
 AND rg.status = 1 AND rg.pub_date <= NOW()')
->leftJoin('e.related_media rm2')
->leftJoin('rm2.videos rv ON rv.video_id = rm2.media_id 
 AND rm2.media_type = 'video' AND rv.is_highlight IS TRUE 
 AND rv.status = 1 AND rv.pub_date <= NOW()')
->where('e.instance_id = ?', array(&#8230;))
->andWhere('e.deployed IS TRUE')
->orderBy('e.start_date ASC');
</code>
</pre>

This thing took around a second and a half to build and run the query, and returned 250 or so rows from the database. Then it took <i>30 more seconds</i> to parse it all into a nested structure of PHP objects. And for all that, the developer still had to write most of the SQL themselves. Given that it made an entire section of the site unusable, and that the replacement, hand-wrought query (for all of its faults) didn&#8217;t, I&#8217;m content to throw our ORM under the bus here.</li>

<li>A page to view poll results in a CMS admin. Either the ORM didn&#8217;t support anything as simple as "SELECT COUNT(*) FROM answers GROUP BY answer_id", or the person who wrote it didn&#8217;t think it was a problem to select 80,000 rows and then have PHP parse them into objects. Frankly I&#8217;m not thrilled by either alternative, and as you can probably guess, this thing ran out of memory and barfed on a pretty regular basis.</li>
</ul>

<p>The root of the problem (as with most problems) is not thinking critically, not being aware that all this magical query dust doesn&#8217;t come cheap.</p>

<p>You have to use the right tool for the job. It&#8217;s not uncommon for the balance between generic, easy to use and quick to develop, and bespoke, laborious and highly performant, to tilt sharply towards the latter. The pain in the ass here is that it&#8217;s not unusual for this sort of problem to lie dormant on a dev dataset (a dozen rows per table and just enough information to test out edge cases), and then one day rear up and slow your pages to a crawl or blow them up entirely, as soon as you hit real data. It&#8217;s up to the developer to have some notion about seeing this coming. Even then, everyone gets bitten by this from time to time.</p>

<p>What it boils down to is that if you write a bad query, one that does "SELECT * FROM tbl_huge LEFT OUTER JOIN tbl_big_mclarge" and returns an unnecessarily wide data set full of BLOBs, or that joins across a dozen tables when it only really needs 3, or that has a big stupid slow "SELECT &#8230; FROM &#8230; WHERE NOT IN (SELECT &#8230;)", or that tries to run a SQL "COUNT" in PHP, and it becomes a problem, it is your fault. I don&#8217;t care if you wrote the thing yourself, or if you used an ORM and <i>it</i> wrote the query, <i>it&#8217;s still your fault, and you are going to have to fix it</i>. "But that&#8217;s the way the product does it" is not an acceptable response. Ever. For anything. Code is running on your servers. You are responsible for it. A microwave makes wretched chicken, so I guess it&#8217;s time you learned how to work the stove, because I&#8217;m not eating that crap.</p>

<p>So by all means, use ORMs for your trivial cases, for basic stuff or where performance isn&#8217;t an issue. But it&#8217;s eventually going to hit a wall and you&#8217;ll have to do your own dirty work. And when that happens, you can&#8217;t say you weren&#8217;t warned.</p>]]></content:encoded>
            <pubDate>Fri, 21 Oct 2011 20:06:52 GMT</pubDate>
        </item>
        <item>
            <title>The Opportunity of Crises</title>
            <link>http://omniti.com/seeds/the-opportunity-of-crises</link>
            <guid>http://omniti.com/seeds/the-opportunity-of-crises</guid>
            <description><![CDATA[Nobody likes a crisis; they are difficult, troubling and sometimes dangerous. For most of us in web operations, the chances are slim that a crisis will be truly life-threatening, but when millions of dollars are on the line it can feel like a pressure ...]]></description>
            <content:encoded><![CDATA[<p>Nobody likes a crisis; they are difficult, troubling and sometimes dangerous. For most of us in web operations, the chances are slim that a crisis will be truly life-threatening, but when millions of dollars are on the line it can feel like a pressure cooker and have a negative impact on lifestyle, relationships and mental health. Most companies go to extreme lengths to avoid crises, and even when one does occur, the typical response is to first deal with it, and subsequently pretend it never happened. As if the memories are too painful to discuss, we avoid the topic altogether; talking to your customers about it is quite risky. It is probably best not to mention it at all; you should just move on. Unfortunately, for most organizations, reacting like this means missing a grand opportunity to make your company better.</p>

<p>Like any organization, we are always <a href="http://omniti.com/is/hiring"><span>on the lookout for new talent</span></a>. Of course, you want people who are "smart and get things done", but beyond that, I have found one particular personality trait to be critical to long term success at OmniTI: the ability to stay calm in a crisis. While I tend to think OmniTI does well in avoiding them, we do have a tendency to <a href="http://omniti.com/does"><span>attract customers with a lot on the line</span></a> and, apparently, with a critical mass of customers, so we are no strangers to crises. While it is pretty clear to me that composure under stress is a fundamental requirement in high-stakes jobs (like large-scale web operations), I think it is generally helpful in any situation. Contingency planning can only get you so far, and when your packets are spilling all over the floor you need to keep a clear head about you to make sure you can assess and remediate as if you&#8217;ve done so since kindergarten. If you can&#8217;t remain calm, the situation can deteriorate quickly. Turning to the blame game before solving the problem at hand is a sure sign of such deterioration. You must fight that urge. If you can&#8217;t, your team can&#8217;t be as open with communications as you need them to be, and your recovery time will suffer. Be upfront about how you want your teams to respond, ideally before problems arise. There are real crises in the world where people die at the hands of companies; walking through one of these exercises can be humbling and enlightening. James Lukaszewski takes us through a "Death by Burger" scenario in his <a href="http://www.e911.com/monos/A001.html"><span>Seven Dimensions of Crisis Communication Management</span></a>, and outlines positive and negative ways that a company can respond to such an incident.</p>

<p>That said, resolving a crisis should not only be about solving the problem at hand. When calamities occur, it&#8217;s important to recognize that your company has an opportunity for introspection. What is it about your processes that led to the crisis you&#8217;ve just survived? Do your process and tool chains do everything they need to do? Don&#8217;t just determine whether they work; ask whether they do the job in the way you would like the job to be done. Seldom will you arrive at good answers to these questions through the normal course of business. Even if you think failure is human (perhaps especially when you think so), it&#8217;s important to understand what processes failed or what information was unavailable that led to this human error. That information is crucial because, in most cases, the people on your team are acting in a manner they think is safe and appropriate&#8202;&#8212;&#8202;and in the best interest of the company. The knee-jerk reaction in these cases is often a summary dismissal, but that will often leave you with the core issue unaddressed: they thought their actions were acceptable. If you fail to gain an understanding of the underlying causes, this bleak episode is likely to become a rerun, either with a new employee, or perhaps with an existing team member who also doesn&#8217;t understand where the appropriate lines need to be drawn.</p>

<p>One thing I believe is very helpful is to look at how others handle these things. The Internet is new, Web Operations even newer; but crisis management and postmortem analysis are not. Quite often I see people lay blame on the wrong people or processes in times of (and even after) failure. Ideally you should not be trying to lay blame at all, but instead figure out where improvements are needed. Many people mistakenly assume that crises are born out of mistakes; often they are not. As businesses grow over time, it&#8217;s easy for plans that were once appropriate to become inadequate. You need to look at your systems holistically. For folks in Web Operations, a healthy understanding of <a href="http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf"><span>why complex systems fail</span></a> can help you gain a better perspective.</p>
 
<p>If you are running a team, you owe it to yourself to turn crises into dialogue. If your customers were affected, be honest with them about where things went wrong, and why what you did was the appropriate thing or how you plan to adjust course going forward. Be careful not to overreact; the goal should not be to add more process, but rather to improve process. Your next crisis is your next great opportunity to learn more about your organization and to strengthen it for the future. Don&#8217;t miss it!</p>]]></content:encoded>
            <pubDate>Mon, 12 Sep 2011 21:56:46 GMT</pubDate>
        </item>
        <item>
            <title>Security Is Not a Feature, It&#039;s a State of Mind</title>
            <link>http://omniti.com/seeds/security-is-not-a-feature-its-a-state-of-mind</link>
            <guid>http://omniti.com/seeds/security-is-not-a-feature-its-a-state-of-mind</guid>
            <description><![CDATA[ "Is our site secure?"

That is never a question you want to hear when launching a new website. And it is also an impossible question to answer. The technical definition of security is "the state of being free from danger or injury"; but you can never ...]]></description>
            <content:encoded><![CDATA[ <h3>"Is our site secure?"</h3>

<p>That is never a question you want to hear when launching a new website. And it is also an impossible question to answer. The technical definition of security is "the state of being free from danger or injury"; but you can never protect anything perfectly all the time. So the question should be, "Is our site secure enough?"</p>

<p>And it is never a question that should be asked at launch time. Security must be part of the planning, part of the programming, part of the testing, part of the deployment process and, finally, part of the monitoring and upkeep of a site. It should be part of every stage of development. However, "security at every level" doesn&#8217;t have to cost extra time or money, and in fact it shouldn&#8217;t. If "site security" means millions of dollars or weeks of extra work, then the problem lies with the website development process and its creators. When you start treating security as a "feature" instead of a necessary part of your development process, it becomes a resource-eating monster.</p>

<p>Far too many IT professionals, managers and others involved in creating new sites view security as a last-minute feature in the push to "get it out fast." In fact, generally the argument for not attending to it is: "We can&#8217;t waste time or money on security features; we have to get this site launched!"</p>

<p>You won&#8217;t have a site, or a business, for long if someone manages to retrieve your entire database of personal information or credit cards. Like most things in life, a balancing act is required. Defining what security is exactly, and what it means for a site is always the hard part. What does "security" mean in a website or a web application? It means being defensive.</p>

<p>"Defensive driving" is a term thrown at every student in drivers training. It means to drive as if every other person on the road were an idiot trying to hit you, because the majority of them are less-than-fantastic drivers and being aware of the danger is half of the solution.</p>

<p>Any developer working on a website should be thinking in the same manner: Every user is an idiot trying to break the site. However, the reaction to that constant danger should be equal to the needs of the website. When driving on a sunny, dry road in broad daylight, a driver can be far less diligent than when driving on a wet road, in a blizzard or in the dark. The conditions of the road are going to affect stopping distance, maneuverability and the ability to avoid hazards. The amount of diligence needed for a website should be equally tailored to environmental conditions. An e-commerce site has far different needs than a social networking site, or a fan site for an author or artist.</p>

<p>Having a plan&#8202;&#8212;&#8202;from the beginning&#8202;&#8212;&#8202;for the important issues with the site is a necessary first step. Implementing the plan as part of your general process shouldn&#8217;t be the end of the line, however. The other critical piece of the puzzle is ongoing maintenance. Sites and audiences change, and those changes will mean new challenges. Proper monitoring and maintenance of a site is part of the process of security.</p>

<p>Knowing a site&#8217;s operating environment and type of users will help to define what security measures are needed up front, eliminating the problems inherent with trying to "bolt on" security after the fact. Even a general overview of what kind of information a site is going to collect and distribute is enough to have an idea of what kind of audience that site will attract. It is far easier to leave room for future security enhancements than to try to plug holes in an existing system.</p>

<p>So take the time to sit down before you start creating the site and answer some of the following questions.  Record the answers in a document and put it with your code so you can refer back to the answers.</p>
<ol>
<li>What kind of data am I going to be collecting and storing?</li>
<ol style="list-style-type: lower-alpha">
<li>Basic Information (Names and email addresses)</li>
<li>Personal Information (Phone numbers, physical addresses)</li>
<li>Asset Information (Credit Card numbers, bank information)</li>
<li>Identifying information (SS#, Drivers license numbers)</li>
<li>Business Information</li>
<li>Medical Information</li>
</ol>
<li>What kind of physical system am I going to be using and who has access?  This includes backups.</li>
<li>What kind of software am I going to be using and how will it be maintained?</li>
<li>What kind of ongoing system will be put in place to maintain the system and data?</li>
</ol>

<p>These questions will give you an excellent idea of how much concern for security your site will warrant.  The higher the level of information collected, the greater security you&#8217;ll need.  The less control you have over the physical systems in place, the more diligent your security measures need to be.  The less control you have over the software in place, the more security measures you may need to put in place. If you have little budget for ongoing monitoring, you&#8217;ll need to invest more in automating more security measures up front.</p>

<p>Remember that no matter what kind of site you are creating, the basics can never be ignored.</p>
<ol>
<li>Keep your software up to date with security fixes</li>
<li>Validate all input</li>
<li>Escape all output</li>
<li>If you&#8217;re dealing with something sensitive - use SSL for logins (the industry is showing signs of adopting SSL for everything).</li>
<li>Use sftp or scp or at the very least ftps for transferring files from your server</li>
<li>Regenerate a user&#8217;s session when access permissions change</li>
<li>Validation should always be done server side, even if you have javascript checks</li>
</ol>
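
<p>As a minimal sketch of two of those basics, "validate all input" and "escape all output", the following uses Python&#8217;s standard library. The function names, the nine-digit limit and the idea of an "answer id" field are assumptions chosen for illustration; a real site would validate against its own data model and would normally lean on its framework&#8217;s templating for escaping.</p>

<pre><code>
import html


def parse_answer_id(raw):
    """Validate input: accept only a short run of ASCII digits."""
    if not (raw.isascii() and raw.isdigit() and len(raw) in range(1, 10)):
        raise ValueError("answer id must be 1 to 9 digits")
    return int(raw)


def attribute(name, value):
    """Escape output: make a value safe inside a double-quoted HTML attribute."""
    return name + '="' + html.escape(value, quote=True) + '"'


print(parse_answer_id("42"))
print(attribute("title", 'x" onmouseover="alert(1)'))
</code></pre>

<p>The validation runs on the server regardless of any JavaScript checks in the browser, and the escaping happens at the point of output, so a value that was stored before a validation rule existed is still rendered harmlessly.</p>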

<p>If security becomes part of your state of mind at every step along the way, instead of a last-minute, add-on feature, you&#8217;ll never have to answer the question "is our site secure?" because you&#8217;ll always be aware that it is secure enough.</p>]]></content:encoded>
            <pubDate>Thu, 11 Aug 2011 16:53:49 GMT</pubDate>
        </item>
        <item>
            <title>When things go wrong - a case study</title>
            <link>http://omniti.com/seeds/when-things-go-wrong-a-case-study</link>
            <guid>http://omniti.com/seeds/when-things-go-wrong-a-case-study</guid>
            <description><![CDATA[Theo Schlossnagle is very fond of pointing out that in operations, you can
never succeed in fulfilling expectations.

"Operations crews are responsible for the impossible: it must be up and
functioning all the time. This is an expectation that one can ...]]></description>
            <content:encoded><![CDATA[<p>Theo Schlossnagle is very fond of pointing out that in operations, you can
never succeed in fulfilling expectations.</p>

<p>"Operations crews are responsible for the impossible: it must be up and
functioning all the time. This is an expectation that one can never exceed."
(<a href="http://omniti.com/seeds/instrumentation-and-observability"><span>Instrumentation and observability</span></a>)</p>

<p>So, this article is about a time when things went wrong. It&#8217;s not about an
emergency situation where services were down, but more a subtle issue that
almost went unnoticed. We will review how the issue was detected, how it was
fixed&mdash;and most importantly&mdash;how a root cause was determined.</p>

<p>This issue affected a production website for one of OmniTI&#8217;s clients. They
had three web servers, all connected through a front-end load balancer
appliance.  (There were database servers as well, but they aren&#8217;t relevant for
this story.) Like any good (or even half decent) load balancer, it checks
often (every few seconds) to ensure that the web servers are up and
serving web pages. If the web server appears down or isn&#8217;t responding, then
the load balancer stops directing traffic to that web server. This gives you a
measure of redundancy in addition to load balancing, if you have enough web
servers to cover the incoming requests even with some out of commission.</p>
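
<p>The idea can be sketched in a few lines of Python. This is only an
illustration of the concept, not how the appliance is implemented: the
function names are made up, the health check is passed in as a plain callable
so no network is needed, and round-robin dispatch is an assumption.</p>

<pre><code>
def servers_in_rotation(servers, probe):
    """Return the servers whose most recent health check passed.

    probe is any callable that takes a server address and returns
    True (healthy) or False (down or not responding).
    """
    return [server for server in servers if probe(server)]


def pick_backend(request_number, rotation):
    """Choose a backend for a request by simple round-robin."""
    if not rotation:
        raise RuntimeError("no healthy web servers left in rotation")
    return rotation[request_number % len(rotation)]


web_servers = ["192.168.1.51", "192.168.1.52", "192.168.1.53"]

# Pretend the second server has just failed its check.
rotation = servers_in_rotation(web_servers, lambda s: s != "192.168.1.52")
backends = [pick_backend(n, rotation) for n in range(4)]
print(backends)
</code></pre>

<p>With one server marked down, requests alternate between the remaining two.
If every check fails, there is nothing left to send traffic to, which is why
a monitor that reports false failures is dangerous.</p>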

<p>The problem was uncovered by chance when working on the load balancer. We
spotted that the load balancer was misdetecting that a server was down, taking
it out of service, and a few seconds later on the next check, it would bring
the server back into the rotation. This cycle repeated over and over, with
each of the web servers being taken out of service for a short period. During
this time, the site was still available: at least one of the servers was
continually in service. In addition, our monitoring showed that everything
appeared to be OK, both the external checks against the main website, and the
checks against the individual web servers.</p>

<p>Having discovered the issue, the troubleshooting began. One of the first
things to look at with any issue is the log files. When set up properly, logs go
a long way in telling you what is going on; the hard part is figuring out
which logs have the information you need.</p>

<p>The first log file we checked was the load balancer log. It had entries
that looked like the following, that corresponded with the service failures:
</p>

<p><code>Monitor_http302_of_foowidgets-www1:http(192.168.1.51:80): DOWN; Last
response: Failure - TCP syn sent bad ack received with fin</code></p>

<p>So, according to the load balancer, the reason for the failure is 'TCP syn
sent bad ack received with fin'. The error message is highly technical and not
particularly helpful.</p>

<p>Here&#8217;s a quick (and incomplete) overview of TCP to explain what that
means:</p>

<p>When you open a connection, packets are sent back and forth with various
flags set - the relevant flags here are SYN, ACK, and FIN. The opening
sequence goes something like:</p>

<img alt="when-things-go-wrong_diagram-1.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-1.png" width="448" style="margin-bottom: 1em;" />

<p>And to close the connection:</p>

<img alt="when-things-go-wrong_diagram-2.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-2.png" width="448" height="288" style="margin-bottom: 1em;" />
<p>The explanation for the "syn sent bad ack received with fin" error is
likely to be:</p>

<img alt="when-things-go-wrong_diagram-3.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-3.png" width="448" height="288" style="margin-bottom: 1em;" />

<p>At this point, the load balancer gets very upset and sulks in the corner
(well, it prints the weird log message). I&#8217;m guessing that the bad ACK/FIN is
probably from a previous connection, but at a high level: "Something weird is
going on with networking".</p>

<p>When things are screwy with the networking, you need to look in detail at
what is going on across the network and try to work out what&#8217;s going wrong.
The tools to do this are tcpdump and wireshark.</p>

<p>The load balancer is an appliance with its own custom software, but
underneath it&#8217;s just Unix. You can get a shell and run tcpdump to see what is
going across the network. Wireshark is essentially a graphical version of
tcpdump and is used here to analyze the network traffic.</p>

<p>I grabbed all of the traffic, opened it with wireshark, and limited the
view to just the traffic going to the web servers exhibiting the problem. The
wireshark filter is:</p>

<pre><code>
ip.addr == 192.168.1.254 &&
    ( ip.addr == 192.168.1.51 ||
      ip.addr == 192.168.1.52 ||
      ip.addr == 192.168.1.53 ) &&
    tcp.port == 80
</code></pre>

<p>The 192.168.1.254 IP is the load balancer, and 51-53 are the web servers.
</p>

<p>The following is the output of two complete HTTP transactions on the
monitors:</p>

<img alt="when-things-go-wrong_screenshot.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_screenshot.png" width="989" style="margin-bottom: 1em;" />

<p>There is no SYN sent with bad FIN/ACK; everything is green and looks just
fine. So I rechecked the load balancer console, and it showed eight checks
went out in the same time frame as the tcpdump. However, I saw only two in the
previous tcpdump. Something wasn&#8217;t right.</p>

<p>At the same time that this was going on, I was in touch with support for
the load balancer vendor. They were very helpful (in the sense that they did
everything they could without actually getting to the bottom of the problem),
asking for several tcpdump traces, and even escalating to their engineering
team. At this point we were convinced that the monitor that checks whether the
server is down was broken, and reporting the server down, when it wasn&#8217;t.
Unfortunately, none of this shed any light on the underlying issue.</p>

<p>We had also checked the usual culprits:</p>

<ul>
    <li>Checked that the backend servers were up</li>
    <li>Checked the physical cables. Other services were transiting the same physical path and they were fine.</li>
    <li>Ran tcpdump on the server&#8217;s network interface that linked it to the load balancer; this showed the same thing as the dump from the load balancer.</li>
    <li>Checked the configuration of the monitor on the load balancer. Other services were using an identical configuration without issues.</li>
</ul>


<p>Then, the crucial discovery:</p>

<ul>
    <li>Not enough checks were being sent out (we spotted this before)</li>
    <li>The support representative casually mentioned seeing traffic going out
    of the load balancer through another MIP. He thought maybe some of the
    checks were going out of this other IP.</li>
    <li>None of us realized the significance of this at the time, and a couple
    of days went by&mdash;with support convinced there was an issue with the
    backend servers, and me running as many checks as I could to try to
    prove/disprove that the backend servers were an issue.</li>
</ul>


<p>Here&#8217;s a bit of explanation of what was going on:</p>

<p>The load balancer has 3 different types of IPs (simplifying a little):</p>
<ul>
    <li>VIP - Virtual IP</li>
    <li>MIP - Mapped IP</li>
    <li>SNIP - Subnet IP</li>
</ul>


<p>A VIP is an IP upon which you run your virtual servers. These are what
client traffic hits.</p>

<p>A MIP is described as: "You use MIP addresses to connect to the backend
servers." (from the vendor&#8217;s knowledge base).</p>

<p>A SNIP is described as: ". . .an IP address that enables you to access a
load balancer appliance from an external host that exists on another subnet."
(from the vendor&#8217;s knowledge base).</p>

<p>From the explanation above, it makes sense that you would configure a MIP
to connect to the backend server. This is what we did when originally setting
up the load balancer, and it turned out to be completely the wrong thing to
do, although it did work for a while.</p>

<p>Some more explanation - Multihomed networking 101:</p>

<ol>
    <li>A server has two interfaces on two different subnets - 192.168.1.0/24 and 192.168.2.0/24.</li>
    <li>The server wants to send a packet out to 192.168.2.10. To do this, it looks up the address in the routing table and sees that it should send the packet out of the second interface.</li>
    <li>Also, the source IP of the packet sent out is the IP address that is associated with the second interface. For example: 192.168.2.2.</li>
</ol>

<img alt="when-things-go-wrong_diagram-4.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-4.png" width="448" height="270" style="margin-bottom: 1em;" />

<p>If you&#8217;re familiar with networking, the above explanation should sound
pretty straightforward. The problem is, the MIPs on the load balancer don&#8217;t
work like that. Things work the same for steps 1 and 2 above, but instead of
matching the source IP address of the packet to the interface it&#8217;s sending out
on, it just picks one IP from the list of available IP addresses:</p>

<img alt="when-things-go-wrong_diagram-5.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-5.png" width="448" height="270" style="margin-bottom: 1em;" />

<p>Now, to be fair to the load balancer vendor, this is the correct behavior
for MIPs when you read more into what they are for. They&#8217;re 'last resort'
source IPs when nothing else is suitable, (i.e. you don&#8217;t have a matching IP
on the same subnet). Because it&#8217;s a 'last resort' IP, it just picks one.</p>

<p>We had 3 different MIPs at the time, one for an external network, one for
the client network, and one for our internal network. This meant that fully
two-thirds of traffic from the load balancer was getting sent out from the
wrong IP:</p>

<ul>
    <li>192.168.1.254</li>
    <li>192.168.2.254 - wrong network</li>
    <li>1.2.3.254 - external network</li>
</ul>
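
<p>The difference between the two behaviours can be sketched with Python&#8217;s
<code>ipaddress</code> module. This is a model of the description above, not
the vendor&#8217;s code: the function names are invented, 192.168.1.2 is an assumed
address for the first interface in the multihomed example, and handing out
MIPs in strict round-robin order is an assumption (the appliance may choose
differently).</p>

<pre><code>
import ipaddress
from itertools import cycle


def source_for(destination, local_addresses):
    """Conventional multihomed behaviour: use the local address that
    sits on the same subnet as the destination, if there is one."""
    dest = ipaddress.ip_address(destination)
    for local in local_addresses:
        iface = ipaddress.ip_interface(local)
        if dest in iface.network:
            return str(iface.ip)
    return None


def mip_sources(mips):
    """MIP-style behaviour: ignore the destination and hand out
    addresses from the list in turn (round-robin is an assumption)."""
    return cycle(mips)


# The multihomed server from the example above.
server_interfaces = ["192.168.1.2/24", "192.168.2.2/24"]
print(source_for("192.168.2.10", server_interfaces))

# The three MIPs configured on the load balancer.
mips = ["192.168.1.254", "192.168.2.254", "1.2.3.254"]
picker = mip_sources(mips)
picked = [next(picker) for _ in range(6)]

backend_net = ipaddress.ip_network("192.168.1.0/24")
wrong = [ip for ip in picked if ipaddress.ip_address(ip) not in backend_net]
print(len(wrong), "of", len(picked), "checks left from the wrong address")
</code></pre>

<p>Under these assumptions, four of every six checks leave from an address
that is not on the web servers&#8217; subnet, which is the two-thirds figure
above.</p>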

<p>Believe it or not, this shouldn&#8217;t have mattered, and in fact didn&#8217;t matter
for most of our services. The default route of the server was through the load
balancer - it had to be to answer client requests, which came from external IP
addresses.</p>

<p>However, by a horrible quirk of routing, on the backend web servers,
192.168.2.X was set to go out on a different interface, and traffic wasn&#8217;t
getting sent back to the load balancer, meaning 1 in 3 monitor responses
weren&#8217;t getting sent back.</p>

<p>This also meant that each web server was not serving traffic 33 percent of
the time and we were effectively running off of two web servers. If the right
combination of monitors went off, all three servers could be taken out of the
rotation.</p>

<p>The temporary fix was to make sure that 192.168.2.0/24 went out via the
load balancer. A single command fix gave a 50 percent capacity boost. The fix
was just in the nick of time, too &mdash; just four days after I made the fix, the
site was featured on the front page of msn.com and we got the biggest traffic
spike ever in its history.</p>
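
<p>The routing quirk and the temporary fix can be modelled in a short Python
sketch. The interface names (<code>eth0</code>, <code>eth1</code>) and the
exact routing table entries are assumptions for illustration; the article
only establishes that 192.168.2.X left by a different interface and that the
default route pointed at the load balancer.</p>

<pre><code>
import ipaddress

MIPS = ["192.168.1.254", "192.168.2.254", "1.2.3.254"]

# Hypothetical routing table on one web server; eth0 faces the load balancer.
routes_before = [
    ("192.168.1.0/24", "eth0"),
    ("192.168.2.0/24", "eth1"),  # the quirk: replies leave by another interface
    ("0.0.0.0/0", "eth0"),       # default route via the load balancer
]

# The temporary fix: send 192.168.2.0/24 via the load balancer as well.
routes_after = [
    ("192.168.1.0/24", "eth0"),
    ("192.168.2.0/24", "eth0"),
    ("0.0.0.0/0", "eth0"),
]


def outgoing_interface(destination, routes):
    """Pick the interface for a reply using longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), iface)
        for net, iface in routes
        if dest in ipaddress.ip_network(net)
    ]
    network, iface = max(matches, key=lambda match: match[0].prefixlen)
    return iface


def lost_replies(routes):
    """Return the check source addresses whose replies never reach the load balancer."""
    return [mip for mip in MIPS if outgoing_interface(mip, routes) != "eth0"]


print(lost_replies(routes_before))
print(lost_replies(routes_after))

# Going from an effective two servers back to three.
capacity_boost = (3 - 2) / 2
print(capacity_boost)
</code></pre>

<p>Before the fix, one source address in three gets no reply; afterwards none
are lost. Going from an effective two servers to three is the 50 percent
boost mentioned above.</p>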

<p>This fix was only temporary, as we were still having checks originate from
the wrong IP, and the real fix was to use the Subnet IPs which, as their name
suggests, actually respect subnetting.</p>

<p>As with all complex systems, the problem was caused by a number of
different issues combined:</p>

<ul>
    <li>The documentation was misleading, which led to the wrong IP type
    being configured on the load balancer.</li>
    <li>Multiple networks on the backend server combined with virtualization,
    which led to incorrect routing when combined with the check originating
    from the wrong address.</li>
    <li>Fault tolerant systems combined with the tiny outage duration for
    individual web servers, which masked the issue and greatly increased time
    to detection.</li>
</ul>

<p>Experience is always earned the hard way. Having things go wrong leads to
a deeper understanding of how complex systems work, as shown in the example
above&mdash;at the end, we had a much better grasp of the inner workings of the
system than before. Gaining this understanding is essential to prepare you for
new technologies and production troubleshooting.</p>

<p>When a technology "just works," it is pretty much guaranteed that you
don&#8217;t know "how" it works (at least deeply). The real challenge is building
architectures where usual, run-of-the-mill mistakes cause no disruption of
service. That is an art and the artwork is an invaluable resource to the
organization: it provides a canvas for learning and gaining hard-won
experience.</p>
]]></content:encoded>
            <pubDate>Mon, 20 Jun 2011 18:02:33 GMT</pubDate>
        </item>
        <item>
            <title>Write a Better FM</title>
            <link>http://omniti.com/seeds/write-a-better-fm</link>
            <guid>http://omniti.com/seeds/write-a-better-fm</guid>
            <description><![CDATA[If you&#8217;ve been around software for any time at all, you&#8217;ve encountered the type. You ask what seems to you as a reasonable question, and the belligerent sorts fall over themselves to be unhelpful; calling you lazy, an idiot, or worse, and t...]]></description>
            <content:encoded><![CDATA[<p>If you&#8217;ve been around software for any time at all, you&#8217;ve encountered the type. You ask what seems to you as a reasonable question, and the belligerent sorts fall over themselves to be unhelpful; calling you lazy, an idiot, or worse, and telling you to RTFM. If you&#8217;re lucky, they&#8217;ll tell you where the FM is. If you&#8217;re not, they&#8217;ll tell you to STFW for it.</p>

<p>For those not up on the acronyms, RTFM and STFW stand for Read The Manual, and Search The Web, respectively.</p>

<p>The trouble is that there&#8217;s a direct correlation between the probability that you&#8217;ll be told to RTFM, and the probability that the FM is rubbish. That&#8217;s because humility, patience and a willingness to help beginners go hand-in-hand with producing good documentation.</p>

<p>The burden falls on those of us within the software community to write a better FM.</p>

<p>For the last ten years or so, I&#8217;ve been involved in several efforts to write better documentation on several Open Source projects. I&#8217;ve noticed some trends in documentation. While some projects have stuck to the tried-and-not-so-true, RTFM, "the newbie is a loser" style of customer support, an increasing number of projects have moved to customer-centered documentation and customer-centered support.</p>

<p>If we want customers to RTFM, we are obliged to write a better FM.</p>

<h3>I&#8217;m Not a "User"</h3>

<p>Respect is very important. If you are unable to treat beginners with patience and respect, you shouldn&#8217;t be doing customer support, or writing documentation. While there are places for banter and inside jokes, technical documentation is not one of them.</p>

<p>Thinking of your audience as customers, rather than as "users", or, worse yet, "lusers",  will greatly influence how you write.</p>

<h3>Listen To the Questions</h3>

<p>While this may seem blindingly obvious, it&#8217;s clear from the documentation of many products, and in particular their so-called "Frequently Asked Questions", that they have no idea whatsoever what questions actual users of their product are asking.</p>

<p>To know what questions are being asked, you should frequent the places where these frustrated users hang out after not finding the answers in your documentation. These places tend to be web forums full of bad advice and broken examples. When you feel your irritation building, remember that they exist because your documentation wasn&#8217;t filling the need, and so these third-party sites sprang up to fill it.</p>

<p>You will quickly find that they tend to be full of people asking the same questions, again and again, and getting a variety of answers of varying quality. It is now your job to make sure that the official documentation answers these questions correctly, showing best-practice solutions to the real-world problems, and makes them easy to find. Failure to do so simply drives people to these question-and-answer sites where they will continue to get bad advice.</p>

<h3>Ask Smart Questions</h3>

<p>Several years ago, Eric Raymond wrote a document about how to ask smart questions. While this seemed like a good idea at the time, it has since become a lengthy tome that no beginner will ever actually read, and which drips with condescension.</p>

<p>The document states three things:</p>

<ol>
<li>Try to find the answer yourself before asking.</li>
<li>Provide all relevant supporting data with your question.</li>
<li>If you don&#8217;t understand the answer, it&#8217;s probably because you&#8217;re too stupid to live.</li>
</ol>

<p>Points 1 and 2 are good, right and important. Unfortunately, point 3 colors the tone of the whole document. Indeed, the word "idiots" appears first in the second paragraph. Although it seems that Eric thinks he&#8217;s being funny, instead he insults everyone who doesn&#8217;t know as much as he does.</p>

<p>Down at the very bottom of the document is a section titled "How To Answer Questions in a Helpful Way." This is the most (and perhaps the only) useful part of the document, and well worth reading.</p>

<p>While it is indeed important to ask smart questions, and not expect that someone is going to hold your hand every step of the way, as documentation authors, it&#8217;s important to cast our minds back to when we first started&#8202;&#8212;&#8202;how lost we felt, and how we didn&#8217;t even know what questions to ask. If that was too long ago, you can readily refresh your mind by picking a software product you&#8217;re unfamiliar with, in a language you don&#8217;t know, and trying to install it and get it running. It will all come back to you.</p>

<p>As Donald Rumsfeld famously remarked, there are also unknown unknowns, the ones we don&#8217;t know we don&#8217;t know. Start by assuming that your customer isn&#8217;t an imbecile, but that they may not know what questions they should be asking.</p>

<p>Help your customers know what questions to ask by structuring your documentation in terms of how they are going to use it. Segment the documentation by audience (Developer, User, Administrator), and then further by task (Installation, Reporting, Upgrading) rather than in seemingly arbitrary groupings like "How-To" and "Other Topics".</p>

<h3>No Stupid Questions</h3>

<p>You&#8217;ve often heard it said that there are no stupid questions. While this is obviously false, much documentation seems to start with the assumption that all questions are stupid. There is a middle ground.</p>

<p>You must start by assuming that questions are smart and useful.</p>

<p>Frequently when I&#8217;m watching in an IRC channel, someone will ask "How do I do X?", and the immediate response is "You don&#8217;t do X, you idiot! That&#8217;s a stupid thing to do! Did your mother ever drop you on your head?"</p>

<p>Rather than treating them like a teenager, try to imagine that the person asking the question is a professional, like yourself, working on a project that might not have been their idea, but that nevertheless they need to get working.</p>

<p>Likewise, when writing documentation, keep in mind the real-world problems that are faced when using your product. Remember that not everyone has the in-depth knowledge of the inner workings that you do. And, most importantly, remember to treat your customers with respect, all the time.</p>

<p>In practical terms, this means avoiding words like "obviously", "simply", and "just", while providing many immediately usable examples with detailed explanations of each point.</p>

<h3>Laziness, Impatience and Hubris</h3>

<p>Larry Wall once declared that the virtues of a programmer are laziness, impatience and hubris. This is often taken as license to be a jerk. I would assert, to the contrary, that the virtues of a documentation writer are laziness, patience and humility.</p>

<p>Yes, it&#8217;s important to be lazy. When someone asks a good question, answer them thoroughly, in exhaustive detail, and then publish the response so that the next time the question is asked, you can answer with a URL. Doing something well once beats doing it poorly, again and again.</p>

<p>You must be patient.  Being impatient with a customer implies that you think they are either being intentionally obtuse, or that they are just too stupid to understand what you are saying. Being patient with them shows respect. The patient, respectful answer will stick with them, while the impatient rude answer will be remembered only as an unpleasant experience to put behind them.</p>

<p>And you should be humble. I find it useful to remember that the person I&#8217;m talking to is probably an expert on something of which I am completely ignorant. I also find it helpful to remember the first time I was asking questions, and the way that I was treated at the time.</p>]]></content:encoded>
            <pubDate>Fri, 22 Apr 2011 01:51:05 GMT</pubDate>
        </item>
        <item>
            <title>Integrating Search</title>
            <link>http://omniti.com/seeds/integrating-search-into-your-application</link>
            <guid>http://omniti.com/seeds/integrating-search-into-your-application</guid>
            <description><![CDATA[

 Why Do It?

This may seem like a silly question at first, as search seems to be everywhere and its usefulness is apparent in so many contexts. The context you should consider, though,
is your application. Ask yourself how search might benefit your app...]]></description>
            <content:encoded><![CDATA[<img alt="search-image.jpg" src="http://images.omniti.net/omniti.com/i/b/search-image.jpg" width="448" height="220" class="mt-image-none" style="" />

 <h2>Why Do It?</h2>

<p>This may seem like a silly question at first, as search seems to be everywhere and its usefulness is apparent in so many contexts. The context you should consider, though, is your application. Ask yourself how search might benefit your application and your users. Keep in mind that search is not a replacement for a good user interface. A good user interface should make it easy for the user to locate what they&#8217;re looking for on your site, without typing anything into a search box.</p>

<p>There are many <a href="http://usability.gov/guidelines/"><span>browse-dominant users out there who prefer to click</span></a> rather than put their hands to the keyboard, no matter how prominently you display a search box or how well your search performs.</p>

<p>It shouldn&#8217;t be a chore for the user to find the contact information on your web site. I&#8217;ve looked at thousands of crappy Flash restaurant web sites. If you have a search input box on your website and I have to type "address" into it because I can&#8217;t find the address anywhere on the site, there is a problem.</p>

<p>A bad search implementation can hurt the usability of your website. If your search functionality is very unfriendly or unaccommodating when it comes to search terms, people will become frustrated and potentially give up on what they wanted to do at your site. This is especially true of search-dominant users of your site. Your search should be able to handle commas, apostrophes, hyphens and other punctuation.</p>
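
<p>As a rough illustration of being accommodating with punctuation, here is a minimal sketch in plain Javascript that cleans up a query before it is matched against anything. The name <code>normalizeQuery</code> is hypothetical, and the rules it applies (drop apostrophes, turn every other punctuation mark into a space, ASCII letters and digits only) are assumptions you would tune for your own content.</p>

<pre><code>// Hypothetical helper: reduce a raw query to lowercase search terms.
// Assumes ASCII content; accented letters would need extra handling.
function normalizeQuery(raw) {
    return raw
        .toLowerCase()
        .replace(/['\u2019]/g, '')       // drop straight and curly apostrophes
        .replace(/[^a-z0-9]+/g, ' ')     // commas, hyphens, etc. become spaces
        .trim()
        .split(' ')
        .filter(function (term) { return term !== ''; });
}

var terms = normalizeQuery("Grandma's slow-cooked, spaghetti!");
</code></pre>

<p>Whatever rules you pick, apply the same ones to the text you index, so that a query and the content it should match are reduced to the same terms.</p>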

<h2>Deciding What You Need</h2>

<p>When implementing search capabilities for a web application, many developers might rush into integrating a known solution without asking a few key questions first. While these questions may seem a given and lead you to the same tool, many people don&#8217;t ponder too deeply about them and the result is an adequate, but not optimal, search functionality for the application. Let&#8217;s take a look at some initial considerations.</p>

<h3>1. What should be searchable in my application?</h3>

<p>Let&#8217;s say I have an online shopping cart. It seems reasonable that I would want my users to search the products. Which products exactly? Not products that haven&#8217;t been published from the admin yet, and not discontinued products; perhaps I also don&#8217;t want to show products that are out of stock. Lay it all out: everything that you expect to be searchable, and the conditions that have to be met.</p>
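
<p>One way to lay those conditions out is as a single predicate that the indexing code runs over every candidate. The sketch below assumes hypothetical product records with <code>published</code>, <code>discontinued</code> and <code>stock</code> fields; your own schema will differ.</p>

<pre><code>// Hypothetical product records; the field names are assumptions.
var products = [
    { name: 'Red Teapot',  published: true,  discontinued: false, stock: 4 },
    { name: 'Blue Teapot', published: false, discontinued: false, stock: 9 },
    { name: 'Old Kettle',  published: true,  discontinued: true,  stock: 2 },
    { name: 'Tea Cosy',    published: true,  discontinued: false, stock: 0 }
];

// Every condition for being searchable lives in one place.
function isSearchable(product) {
    if (!product.published) { return false; }
    if (product.discontinued) { return false; }
    return product.stock > 0;   // the optional out-of-stock rule
}

var searchable = products.filter(isSearchable);
</code></pre>

<p>Keeping the rules in one function means there is only one place to change when the business decides that out-of-stock items should show up after all.</p>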

<h3>2. How do I want my users to interact with the search functionality?</h3>

<p>How flexible will the search be with what users type in? Will it show automatic suggestions as the user types? Will it attempt to fix misspellings?</p>

<h3>3. Direct linking?</h3>

<p>Should particular search terms go straight to a particular page instead of showing results? If there is only one result, it seems obvious to advance the user directly to their goal. Sometimes, in the case of certain search terms, you may still want to lead the user on a very specific journey.</p>
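
<p>A minimal sketch of that decision, assuming a hypothetical hand-maintained table of direct links and result objects that carry a <code>url</code> field, might look like this:</p>

<pre><code>// Hypothetical table of terms that skip the results page entirely.
var directLinks = {
    'contact': '/contact',
    'returns': '/help/returns'
};

// Decide whether to redirect or to show a list of results.
function resolveSearch(term, results) {
    var key = term.toLowerCase().trim();
    if (Object.prototype.hasOwnProperty.call(directLinks, key)) {
        return { redirect: directLinks[key] };   // a very specific journey
    }
    if (results.length === 1) {
        return { redirect: results[0].url };     // only one possible goal
    }
    return { results: results };
}
</code></pre>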

<h3>4. How current do search results have to be?</h3>

<p>Is it imperative that the product/blog post I just added be immediately available in the search results? If not, how much lag time is acceptable?</p>

<h3>5. What kind of search options do I want to provide the user?</h3>

<p>Are there advanced search options, including date ranges, sort order and relevancy? Note that you shouldn&#8217;t overload the user with advanced search options from the start. There should be a simplified version of search that is the default; the advanced options should nonetheless be easily accessible.</p>

<h3>6. Do certain results rank or weigh higher than others?</h3>

<p>For example, if I search for "tomato", does your blog post about your grandma&#8217;s spaghetti recipe come up before the contact page that lists your address as 123 Tomato St?</p>
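
<p>To make the tomato example concrete, here is a small sketch of field weighting in plain Javascript. The weights, the field names and the two documents are all made up for illustration; real search tools expose their own way of configuring this.</p>

<pre><code>// Assumed weights: a hit in the title counts three times a hit in the body.
var FIELD_WEIGHTS = { title: 3, body: 1 };

// Count case-insensitive occurrences of a non-empty term in some text.
function countOccurrences(text, term) {
    return text.toLowerCase().split(term.toLowerCase()).length - 1;
}

// Sum the weighted hits across every weighted field.
function score(doc, term) {
    var total = 0;
    Object.keys(FIELD_WEIGHTS).forEach(function (field) {
        total += FIELD_WEIGHTS[field] * countOccurrences(doc[field] || '', term);
    });
    return total;
}

var recipe  = { title: "Grandma's Tomato Spaghetti", body: 'Crush one tomato per person.' };
var contact = { title: 'Contact Us', body: 'Find us at 123 Tomato St' };

// Highest score first.
var ranked = [contact, recipe].sort(function (a, b) {
    return score(b, 'tomato') - score(a, 'tomato');
});
</code></pre>

<p>With these weights the recipe outranks the contact page, because its title matches as well as its body.</p>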

<h2>Initial Setup</h2>

<p>Once you&#8217;ve decided what is best for your application and its users with regard to the application&#8217;s search functionality, you can start looking around at the available tools to implement it. You will want to map your search functionality needs to the capabilities provided by the search tools. See how well these tools perform. Users expect search to be fast; they really don&#8217;t care how much information you have to go through to find what they want. More than likely, the information you want to search is stored in a database. Ideally, one does not want to do full text searches against the database. It is expensive for the database, so if you have a high-traffic site where searches are going to be performed fairly often, steer clear of it.</p>

<p>Instead, many popular search tools bring search outside of the database scope by indexing the data you want searched in your database. This usually means that I will choose the table columns that have the information I want available for search, such as product name and product description. I would then create a script that pulls this data and adds it to my index (note that index size is usually 20-30% of the initial data being indexed, depending on the search tool). You will more than likely want to run this script from a cron job to refresh your index on an interval that is dependent upon your needs. Note that you can add and delete from the index as items are inserted and deleted from your database. This means that your need to refresh the entire index may change if you use this approach.</p>
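
<p>To show the shape of the idea, here is a toy in-memory index in plain Javascript. It is only a sketch: <code>SearchIndex</code> and its methods are hypothetical names, and a real search tool stores far more than a list of ids per term. It does show the two operations described above, adding an item when a row is inserted and removing it when the row is deleted.</p>

<pre><code>// Toy inverted index: each term maps to the set of ids that contain it.
function SearchIndex() {
    this.postings = Object.create(null);
}

// Split text into lowercase ASCII terms.
SearchIndex.prototype.tokenize = function (text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(function (t) {
        return t !== '';
    });
};

// Call this when a row is inserted into the database.
SearchIndex.prototype.add = function (id, text) {
    var postings = this.postings;
    this.tokenize(text).forEach(function (term) {
        if (!postings[term]) { postings[term] = Object.create(null); }
        postings[term][id] = true;
    });
};

// Call this when a row is deleted from the database.
SearchIndex.prototype.remove = function (id) {
    var postings = this.postings;
    Object.keys(postings).forEach(function (term) {
        delete postings[term][id];
    });
};

// Return the ids of every item containing the term.
SearchIndex.prototype.search = function (term) {
    var hits = this.postings[term.toLowerCase()];
    return hits ? Object.keys(hits) : [];
};

var index = new SearchIndex();
index.add('p1', 'Red Tomato Seeds');   // product name or description
index.add('p2', 'Tomato Cage');
var before = index.search('tomato');   // both products
index.remove('p1');
var after = index.search('tomato');    // only p2 is left
</code></pre>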

<p>A big chunk of the work that you will face is the initial index setup work required to provide for the features and conditions you want in place for your search. Features such as wild-card queries, sorting, field weighting, multiple merged indexes, multi-faceting, ranking, result clustering and date ranges are some examples.</p>

<h2>Presenting the Results</h2>

<p>Depending upon your application and how advanced you want your search to be, you can make assumptions and educated guesses about the intended results. Amazon is a good example. As of the date of this article, typing "Black Swan" into the search box returns results for "The King&#8217;s Speech", "The Fighter", and "True Grit", all within the top 10 search results. Amazon is making the assumption that I may be interested in other Academy Award winning movies. Their end goal is that I will purchase those, as well. This makes sense for Amazon; does it make sense for your application? How search-centered is your application? Factoring in these types of assumptions adds a lot of complexity to your search application. How long before those Academy Award related search results fade away and I am left with only "Black Swan" results?</p>

<p>Most results are displayed by relevancy, which makes sense. Sorting by date sometimes makes sense too, and can be a helpful option for those perusing the result set. Providing match context in your results can also be helpful. Consider users whose search term matches an exact phrase in an article on the site, but in the search results, only the article title is displayed. While it may be the very article the user was searching for, the user may not realize that they found what they were looking for in that search. Showing match context for the search terms in your display results, where applicable, can be helpful to users.</p>
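
<p>A minimal sketch of match context, assuming plain-text article bodies and a hypothetical <code>matchContext</code> helper, is to cut a window of characters around the first hit:</p>

<pre><code>// Return the first match with `radius` characters of context on each side,
// or null when the term does not occur in the text at all.
function matchContext(text, term, radius) {
    var at = text.toLowerCase().indexOf(term.toLowerCase());
    if (at === -1) { return null; }
    var start = Math.max(0, at - radius);
    var end = Math.min(text.length, at + term.length + radius);
    return (start === 0 ? '' : '...') +
           text.slice(start, end) +
           (end === text.length ? '' : '...');
}

var body = 'Our grandma always used fresh tomato sauce in this recipe';
var snippet = matchContext(body, 'tomato', 6);
</code></pre>

<p>A fuller version would cut on word boundaries and highlight the term, but even this much tells the user why a result was returned.</p>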

<p>Often, a user will know meta information about what they want, such as "I know it was in a blog post" or "I know it was posted around Christmas time." While we are not mind readers, we can still make it easier for our users to find what they want, based upon the metadata they know.</p>

<p>One way to do this is by clustering your search results into relevant groups. When displaying results, instead of showing the relevant results for everything mixed together, show them grouped by a strong meta identifier. Example groups might be "Articles", "Products", and "Users", or perhaps years (2009, 2010, 2011).</p>
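
<p>Grouping of this kind can be sketched in a few lines, assuming each result already carries the field you want to group by (the <code>type</code> field and the sample results here are hypothetical):</p>

<pre><code>// Bucket results by the value of one field, keeping their relevance order.
function groupResults(results, keyName) {
    var groups = Object.create(null);
    results.forEach(function (result) {
        var key = result[keyName];
        if (!groups[key]) { groups[key] = []; }
        groups[key].push(result);
    });
    return groups;
}

var results = [
    { title: 'Tomato Cage',          type: 'Products' },
    { title: 'Growing Tomatoes',     type: 'Articles' },
    { title: 'Red Tomato Seeds',     type: 'Products' },
    { title: 'Tomato Sauce Basics',  type: 'Articles' }
];

var grouped = groupResults(results, 'type');
</code></pre>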

<p>Is there a mobile version of your application? How do your search results look there?</p>

<p>Many times, site searches are implemented with Google Site Search. This is not a bad idea, depending upon your site content and search requirements. However, keeping the search results on your own site, as opposed to sending the user to a Google results page, keeps the user engaged and is less confusing than a redirect. Google Site Search provides for this functionality.</p>

<p>Avoid styling your search results to look like Google text ads. Many people will assume that is what they are and ignore them.</p>

<h2>Post Implementation</h2>

<p>What are people actually searching for? Are you monitoring the queries that are coming through your search form? What are some of the top queries? What results are being given to the user for these queries? Are they what is expected? Most developers will implement search functionality and make sure that it functions, but fail to monitor or provide tests to ensure the search is actually useful after being implemented.</p>
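
<p>Monitoring can start very small. Assuming you already log each submitted query somewhere and can read the log back as an array of strings (an assumption; the sample data is made up), a tally of the top queries is only a few lines:</p>

<pre><code>// Count queries case-insensitively and return the most frequent ones.
function topQueries(log, limit) {
    var counts = Object.create(null);
    log.forEach(function (query) {
        var key = query.toLowerCase().trim();
        counts[key] = (counts[key] || 0) + 1;
    });
    return Object.keys(counts)
        .sort(function (a, b) { return counts[b] - counts[a]; })
        .slice(0, limit)
        .map(function (key) { return { query: key, count: counts[key] }; });
}

var queryLog = ['Address', 'tomato', 'address ', 'hours', 'address', 'tomato'];
var top = topQueries(queryLog, 2);
</code></pre>

<p>If "address" tops the list, as it does in this made-up log, that says more about your navigation than about your search.</p>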

<p>Search is an important part of the web, and the technology behind it is becoming smarter and faster. Take advantage of it, but first take the time to discover your application needs and how you can best serve your users.</p>]]></content:encoded>
            <pubDate>Tue, 12 Apr 2011 01:19:32 GMT</pubDate>
        </item>
        <item>
            <title>The Web Developer&#039;s Guide to Writing Native iOS Apps</title>
            <link>http://omniti.com/seeds/the-web-developers-guide-to-writing-native-ios-apps</link>
            <guid>http://omniti.com/seeds/the-web-developers-guide-to-writing-native-ios-apps</guid>
            <description><![CDATA[ Ever since the release of the first iPhone and the first official iPhone/iOS SDK, mobile computing has taken a huge leap into the handheld realm. Where only a few years ago, it was normal to see hipsters hunched over laptops while sitting at Starbucks...]]></description>
            <content:encoded><![CDATA[ <p>Ever since the release of the first iPhone and the first official iPhone/iOS SDK, mobile computing has taken a huge leap into the handheld realm. Where only a few years ago, it was normal to see hipsters hunched over laptops while sitting at Starbucks sipping their lattes, and surfing the web on the free wifi; nowadays they&#8217;ve broken free from their local coffee franchises (although still addicted to the caffeine) and roam the outside world, still surfing websites but now on their iPhones over 3G connections.</p>

<p>So how do you, as a veteran web developer, take advantage of this phenomenon to write really cool mobile apps that engage the user on a whole new level? Well, there are several ways. You can bite the bullet and learn Objective-C, CocoaTouch, iOS SDK, and spend several months writing a true, native iPhone app. You can stay in your comfort-zone and write mobile-sized HTML5 web apps that sit, hands-tied to your servers with very limited device-centric capabilities. You can cheat a little bit and use PhoneGap or some other library to write HTML5 mobile web pages that can be submitted as apps to the App Store. Or, you can cheat a lot and use Appcelerator to write your code in a language you&#8217;re very familiar with, on a cross-platform API with practically all the device capabilities available to you, all resulting in native apps that can be submitted to the App Store <em>and</em> the Android Market.</p>

<h2>What is Appcelerator?</h2>
<p><a href="http://www.appcelerator.com/"><span>Appcelerator</span></a> is hard to define. It&#8217;s an open source framework that acts partially like a compiler and partially like a runtime. Without going into the nitty-gritty about how it does its thing, let&#8217;s just say Appcelerator will take Javascript code built on top of its API and turn it into a native application for the iOS and the Android platforms. For the iOS platform, Appcelerator&#8217;s Javascript API maps to the Objective-C/CocoaTouch equivalents; and for the Android platform it maps to the Android Java framework. After the code is compiled, you end up with the respective platform&#8217;s native binary app. Also, Appcelerator has a third mode which lets you write cross-platform WebKit-based desktop apps. We&#8217;ll only be concentrating on the mobile side of things in this article.</p>

<p>Appcelerator comes in two parts, Titanium Mobile SDK and Titanium Developer. The Titanium Mobile SDK is the heart of the mobile app writing framework. Titanium Developer is a fancy, front-end GUI that lets you set up various environment settings, run the code in a simulator, compile it for a real mobile device and even package it for distribution to the App Store. As a side note, Titanium Developer itself is written using the Appcelerator Desktop SDK.</p>

<h2>Installing Appcelerator</h2>
<h3>Required Prerequisites:</h3>
<ul>
<li>Very good working knowledge of Javascript (knowledge of <a href="http://jibbering.com/faq/notes/closures/"><span>closures</span></a> is a plus)</li>
<li>An Intel-based Mac of some sort</li>
<li>Xcode with the iOS SDK installed (<a href="http://developer.apple.com/ios/"><span>http://developer.apple.com/ios/</span></a>)</li>
<li>Android SDK installed (<a href="http://developer.android.com/sdk"><span>http://developer.android.com/sdk</span></a>)</li>
</ul>
<h3>Prerequisites if you want to run on an actual iOS device:</h3>
<ul>
<li>Pay Apple the $99/yr fee and register as an <a href="http://developer.apple.com/programs/ios/"><span>iOS developer</span></a> </li>
<li>Create/download all the keys/certificates (just follow the directions they provide on <a href="http://developer.apple.com/devcenter/ios/"><span>http://developer.apple.com/devcenter/ios/</span></a> under the iOS Provisioning Portal)</li>
<li>Add your mobile devices to the developer account</li>
<li>Create an AppID for your app</li>
<li>Create a Provisioning Profile that binds your AppID to the various devices you registered (this is the way to get ad hoc distribution to developer iPhones for testing)</li>
<li>Download all those certificates and provisioning profiles, install them and configure your Appcelerator app to use them when compiling, so the code can be signed correctly</li>
</ul>

<p>Running the gauntlet of getting the environment set up to test on an actual iOS device requires another article entirely. However, once you get through it the first time, it gets easier. Just follow the instructions on Apple&#8217;s site. Here, we will just stick with running things in the iOS Simulator.</p>

<p>You&#8217;ll notice that an Android SDK must be installed even though we&#8217;re only doing iOS development. This seems to be a nagging requirement in order for Titanium Developer to get going (things may get fixed in later releases so try it out without it first, but if it complains, go ahead and install the Android SDK).</p>

<h2>Making a New Project:</h2>
<ol>
<li>Create a New Project
  <ol>
  <li>Open up Titanium Developer</li>
  <li>Select New Project</li>
  <li>Select Mobile as the Project Type</li>
  <li>Fill in the other fields (make sure App Id exactly matches the AppID you registered in Apple&#8217;s developer website)</li>
  <li>Click "Create Project"</li>
  </ol>
</li>

<li>You&#8217;ll be taken to the Project Settings window (the Edit tab)
  <ol>
  <li>Just click "Save Changes"</li>
  </ol>
</li>

<li>Now click the "Test &amp; Package" tab. You&#8217;ll see 3 sub tabs:
   <ol>
   <li>Run Emulator - Runs your program in the emulator</li>
   <li>Run on Device - Runs it on a developer iPhone/iPod Touch (you&#8217;ll need provisioning profiles set up)</li>
   <li>Distribute - Packages up the app for submission to the App Store (or ad hoc distribution to non-developer profiled iPhones/iPod Touches)</li>
   </ol>
</li>

<li>Click on the "Run Emulator" tab and then click on the "iPhone" subtab. Ensure that the SDK is the latest iOS SDK version and click "Launch"</li>
</ol>

<p>If all goes well you&#8217;ll see a bunch of messages scroll past while the project code is compiling, and then the iOS Simulator launches. </p>

<p>Congratulations, you&#8217;ve successfully run the skeleton code!</p>


<h2>Modifying the Code</h2>
<p>Open up the directory you told Titanium Developer to create. All your code and assets will be stored in the <code>Resources/</code> subdirectory. You&#8217;ll notice that it already contains the <code>app.js</code> skeletal code; this is essentially the entry point into your app (i.e., your <code>main()</code> routine). You&#8217;ll also notice <code>iphone/</code> and <code>android/</code> subfolders. These hold assets that override the default assets in your Resources folder when you need to target a particular platform.</p>

<p>Now let&#8217;s actually write some code. Rename/move the existing <code>app.js</code> file to <code>app.js.old</code> (be aware that any file with the <code>.js</code> extension will get compiled by Titanium Developer by default even if it isn&#8217;t used in your end product). Create a new file called <code>app.js</code> and enter the following code:</p>

<pre><code>Ti.API.info('Creating a new root window');

var w = Titanium.UI.createWindow({ backgroundColor: 'white' });
w.open();


Ti.API.info('Creating Label');

var label1 = Titanium.UI.createLabel({
    text: "Name Please",
    backgroundColor: 'gray',
    color: '#000000',
    top: 20,
    height: 'auto'
});
w.add(label1);   // add the label to the window


Ti.API.info('Creating Text Input Field');

var textfield1 = Titanium.UI.createTextField({
    backgroundColor: 'green',
    hintText: "Type Here",
    height: 35,
    top: 100,
    left: 10,
    right: 10,
    borderStyle: Titanium.UI.INPUT_BORDERSTYLE_ROUNDED
}); 
w.add(textfield1);  // add the textfield to the window


Ti.API.info('Creating Button');

var button1 = Titanium.UI.createButton({
    title: 'Click Me',
    width: 150,
    height: 30
});
w.add(button1);   // add the button to the window


Ti.API.info('Adding an eventListener to the button');


// Here&#8217;s the button 'click' event listener, 
// notice the second parameter is an anonymous function with
// a parameter 'e', this is the event dictionary.

button1.addEventListener('click', function(e) {
    Ti.API.info('Button was clicked, e is ');
    Ti.API.info(e);
    Ti.API.info('Text field has value ' + textfield1.value);

    if(textfield1.value.length <= 0) {
        alert("Enter your name please");
    }
    else {
        label1.text = "Hello " + textfield1.value;
    }
});
</code></pre>

<p>Click on the "Run Emulator" tab and launch the app. If everything works, you&#8217;ll be shown a white window with a gray background label that says <samp>"Name Please"</samp>, a text field and a button that says <samp>"Click Me"</samp>. Go ahead and try clicking the button. You&#8217;ll see an alert popup message telling you to enter a name. If you look through the <code>button1.addEventListener</code> callback function, you&#8217;ll see where that <code>alert()</code> is coming from. Now enter your name in the text field and click the button again; this time you&#8217;ll notice the gray label up top change to say <samp>"Hello xxxx"</samp> (where xxxx is what you entered in the text field).</p>

<p>You&#8217;ll notice that we don&#8217;t have any <code>main()</code> functions or event loops; all that stuff is handled by the underlying iOS SDK. We&#8217;re developing on essentially an asynchronous event-driven model (similar to the Javascript model in browsers).</p>

<p>Congratulations on writing your first app! Your next steps should be to browse through and experiment with the various API methods available in the Titanium Mobile SDK found at <a href="http://developer.appcelerator.com/documentation"><span>http://developer.appcelerator.com/documentation</span></a>. 

Also be sure to read the Getting Started guides:
<a href="http://wiki.appcelerator.org/display/guides/Getting+Started+with+Titanium"><span>http://wiki.appcelerator.org/display/guides/Getting+Started+with+Titanium</span></a>

as well as the Getting Started with the KitchenSink demo app:
<a href="http://wiki.appcelerator.org/display/guides/Getting+Started+with+Kitchen+Sink"><span>http://wiki.appcelerator.org/display/guides/Getting+Started+with+Kitchen+Sink</span></a>
</p>

<h2>Programming Notes</h2>
<h3>Old-School Debugging</h3>
<p>The best way to debug is to throw <code>Titanium.API.info('some debugging message')</code> all over your code. You can also use the name <code>Ti</code> rather than <code>Titanium</code> (i.e., <code>Ti.API.info('some message')</code>) to save you from having to type so much. It&#8217;s also worth noting that you can usually pass arbitrary objects/variables by themselves to <code>Ti.API.info</code> and it will try to print out the string equivalent of the object (if available). <code>Ti.API.warn()</code> and <code>Ti.API.error()</code> are also available for logging purposes (they&#8217;ll show up in the Titanium Developer console in different colors).</p>
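
<p>If the string equivalent of an object turns out to be unhelpful, one option is a tiny wrapper that serializes the value first. This is only a sketch: <code>debug</code> is a hypothetical helper, not part of the Titanium API, it assumes <code>JSON.stringify</code> is available in your runtime, and it will throw on objects that contain circular references. The <code>console.log</code> branch is there only so the snippet can be tried outside an app.</p>

<pre><code>// Hypothetical helper: log a labelled value as JSON.
function debug(label, value) {
    var line = label + ': ' + JSON.stringify(value);
    if (typeof Ti !== 'undefined') {
        Ti.API.info(line);     // inside a Titanium app
    } else {
        console.log(line);     // fallback when Ti is not defined
    }
    return line;
}

var logged = debug('event', { bar: 'rab', zab: 'baz' });
</code></pre>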

<h3>Subtleties of <code>createWindow()</code></h3>
<p>You can pass in a Javascript filename to <code>createWindow()</code> using the dictionary key <code>url</code>; this way you can modularize your code into smaller chunks. The <code>createWindow()</code> method will essentially fire off the new window in a separate thread and you won&#8217;t really have access to any variables or functions within it after it&#8217;s launched. You can send initial data into the new window by adding arbitrary key/values to the dictionary parameter of <code>createWindow()</code>. From within the new window script you can get access to those initial variables via the <code>Titanium.UI.currentWindow</code> variable. For example, if you had code like:</p>

<pre><code>var w = Titanium.UI.createWindow({
    url: 'newWindow.js',
    foo: 'bar',
    baz: ['zab', 'rab']
});
</code></pre>

<p>then, within the <code>newWindow.js</code> file you&#8217;ll be able to get access to <code>foo</code> and <code>baz</code> like so:</p>

<pre><code>Titanium.UI.currentWindow.foo      // gives us 'bar'
Titanium.UI.currentWindow.baz     // gives us ['zab','rab']
</code></pre>

<p>The only way to get data back out of the window is to use the event handling facilities of the framework. And on that note&#8230;</p>


<h3>Event Handling is powerful asynchronous communications stuff</h3>
<p>Make liberal use of the event mechanisms of the framework to communicate between various threads and subsystems in your app. In addition to the built-in events, <code>addEventListener</code> can listen to any arbitrary event name you want. The complementary function, <code>fireEvent</code>, allows you to fire any arbitrary event name you like, with any parameters you like, in a dictionary that gets passed on to the event callback function. Practically every object in the Titanium SDK has the ability to fire or listen to events. The most globally accessible object is the <code>Titanium.App</code> object: you can register an event listener with <code>Titanium.App.addEventListener</code> and do a <code>Titanium.App.fireEvent</code> from completely different areas of your app (see the example that follows). This allows you a great deal of power for inter-thread communications (i.e., sending messages between windows and the like). Any number of event callback functions can be added via <code>addEventListener</code>, and they&#8217;ll all be called in turn when the event is fired. You can also remove a callback function from an event by using <code>removeEventListener</code>. If you fire an event that isn&#8217;t being listened to, it will just be ignored, no harm no foul. Events aren&#8217;t queued, so if you aren&#8217;t listening for an event when it&#8217;s fired, you&#8217;ve missed it. Too bad.</p>

<p>Here&#8217;s a quick example (we&#8217;ll attach the event listener to the global <code>Titanium.App</code> object):</p>

<pre><code>Titanium.App.addEventListener('fooEvent', function(e) {
     // e will hold { bar: 'rab', zab: 'baz', 
     // and some other core event fields }
     Ti.API.info('fooEvent was called and ' +
                      'we got this event dictionary:');
     Ti.API.info(e);   
});

// &#8230; meanwhile somewhere else &#8230;
Titanium.App.fireEvent('fooEvent', { bar: 'rab', zab: 'baz' });
</code></pre>
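
<p>The rules about multiple listeners, removal and missed events are easier to see in a toy version. The <code>EventBus</code> below is a plain Javascript stand-in that I made up to mirror the behaviour described above; it is not Titanium&#8217;s implementation. Inside an app you would call the same three method names on <code>Titanium.App</code> instead.</p>

<pre><code>// Toy stand-in for an object that can fire and listen to events.
function EventBus() {
    this.listeners = Object.create(null);
}

EventBus.prototype.addEventListener = function (name, callback) {
    if (!this.listeners[name]) { this.listeners[name] = []; }
    this.listeners[name].push(callback);
};

// Removal needs the same function reference that was added.
EventBus.prototype.removeEventListener = function (name, callback) {
    this.listeners[name] = (this.listeners[name] || []).filter(function (fn) {
        return fn !== callback;
    });
};

// Call every current listener in turn; nothing is queued for later.
EventBus.prototype.fireEvent = function (name, data) {
    (this.listeners[name] || []).slice().forEach(function (fn) {
        fn(data);
    });
};

var bus = new EventBus();
var seen = [];
function first(e)  { seen.push('first:' + e.bar); }
function second(e) { seen.push('second:' + e.bar); }

bus.fireEvent('fooEvent', { bar: 'early' });   // nobody listening: ignored
bus.addEventListener('fooEvent', first);
bus.addEventListener('fooEvent', second);
bus.fireEvent('fooEvent', { bar: 'rab' });     // both run, in the order added
bus.removeEventListener('fooEvent', first);
bus.fireEvent('fooEvent', { bar: 'baz' });     // only second runs
</code></pre>

<p>Note that the listeners are named functions here rather than anonymous ones; you need to hold on to a reference if you ever want to pass it to <code>removeEventListener</code>.</p>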


<h3>Closures will be your best friends</h3>
<p>As your applications get more complicated, you&#8217;ll be writing a lot of <code>addEventListener</code> type code throughout. It is essential that you keep in mind in what context that event callback function will get called, and what variables are available in that scope. If you create/assign an event listener function within a scope and make use of a variable from that scope, then by the time the listener callback function is actually called (some time later, way outside of the scope), that variable may no longer hold the value you expect, because the surrounding code has long since moved on and changed it. This is a common enough thing in Javascript and the workaround is using <a href="http://jibbering.com/faq/notes/closures/"><span>Javascript Closures</span></a>. There are plenty of articles out there that explain it in great detail, but let&#8217;s just say it&#8217;s a nice way to bind a scope to a function so that when the function actually does get called the scope is available along with the variables. Here&#8217;s a way of writing that <code>button1 'click'</code> event listener more robustly using a closure:</p>

<pre><code>button1.addEventListener('click', (function (lbl1, txt1) {
    return function(e) {
         Ti.API.info('Button was clicked, e is ');
         Ti.API.info(e);
         Ti.API.info('Text field has value ' + txt1.value);

         if(txt1.value.length <= 0) {
             alert("Enter your name please");
         }
         else {
             lbl1.text = "Hello " + txt1.value;
         }
    };
}) (label1, textfield1) );
</code></pre>

<p>I realize that it looks a little weird, but if you study it carefully you&#8217;ll see that we are actually creating an anonymous function that takes two parameters (<code>lbl1</code> and <code>txt1</code>). Then, we&#8217;re immediately executing that anonymous function, passing in the variables (<code>label1</code> and <code>textfield1</code>) as the parameters. We do all that in one shot. Within the outer anonymous function, all we do is return another anonymous function (this will be the actual event callback function that <code>addEventListener</code> will get). Notice the inner anonymous function has the event 'e' parameter. Remember that functions are first-class citizens in Javascript and are treated like any other object (there actually is a Function type in Javascript), and that&#8217;s why we&#8217;re able to do this.</p>

<p>Notice that within the inner anonymous function we&#8217;re using <code>txt1</code> and <code>lbl1</code> instead of the <code>textfield1</code> and <code>label1</code> we used before. The outer function creates a scope having variables <code>lbl1</code> and <code>txt1</code>, and the inner function binds to those variables; now, no matter where or when the button 'click' event callback function is called, <code>txt1</code> and <code>lbl1</code> will refer to the objects that were passed in. It&#8217;s tricky at first, but if you get into the habit of using closures for such event-handling assignments it could save you a lot of painstaking debugging later.</p>

<p>On a somewhat important side note, as long as the callback still holds a reference to those objects, the garbage collector won&#8217;t reclaim their memory; that only happens once nothing references them any more (i.e., the anonymous function with the closure goes away). This may not seem like a big deal until you realize you&#8217;re developing for a small embedded system with very tight memory constraints.</p>
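
<p>The case where the extra scope makes a visible difference is when the outer variable changes before the callback runs. A loop counter is the usual example. The sketch below is plain Javascript with no Titanium calls, and the arrays of functions stand in for event callbacks that get called later:</p>

<pre><code>var i;

// Without the extra scope: every callback shares the one variable i.
var shared = [];
for (i = 0; i !== 3; i++) {
    shared.push(function () { return i; });
}

// With the extra scope: each callback gets its own copy, n.
var bound = [];
for (i = 0; i !== 3; i++) {
    bound.push((function (n) {
        return function () { return n; };
    })(i));
}

// "Some time later", long after both loops have finished:
var sharedValues = shared.map(function (fn) { return fn(); });   // [3, 3, 3]
var boundValues  = bound.map(function (fn) { return fn(); });    // [0, 1, 2]
</code></pre>

<p>The first set of callbacks all report the final value of <code>i</code>, because they look it up when they run, not when they were created. The second set each remember the value that was passed in.</p>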

<h2>Need More Help?</h2>
<p>The API documentation can be found at <a href="http://developer.appcelerator.com/apidoc/mobile/latest"><span>http://developer.appcelerator.com/apidoc/mobile/latest</span></a>. Be forewarned that the documentation is usually a few steps behind the actual code base, and quite possibly may be incomplete or incorrect. Experimentation and educated guesses are usually required when trying out new functionality. Alternatively, you can search the "Q&amp;A" section of the developer site to see if anyone else has encountered the same problem: <a href="http://developer.appcelerator.com/questions"><span>http://developer.appcelerator.com/questions</span></a>. If you get really desperate, you can start digging through the actual framework Objective-C code located in <code>YourApplicationDirectory/build/iphone/Classes/</code>.</p>

<p>Here&#8217;s a big hint that took me a very long time to figure out. Almost every class that takes the form <code>Titanium.UI.*View</code> is based on the <code>Titanium.UI.View</code> class, so they inherit all of <code>View</code>&#8217;s properties and methods. They can be used interchangeably whenever a View is required. Views are the workhorses of the Titanium GUI framework: they provide the rectangular regions upon which the GUI widgets are drawn.</p>

<p>One of the best places to get working examples for the Titanium Mobile API is the KitchenSink demo app. As its name implies, it has demo code for practically every feature offered by the API. You should be able to download it from the same servers you downloaded the SDK from (<a href="http://developer.appcelerator.com/get_started"><span>http://developer.appcelerator.com/get_started</span></a>). Simply uncompress the downloaded archive and import the <code>KitchenSink</code> directory containing the <code>tiapp.xml</code> file (the Titanium project configuration file) into Titanium Developer. Then, launch it in the Simulator. Browse through the <code>KitchenSink/Resources/examples/</code> subfolder to find the actual JavaScript code. There&#8217;s quite a bit of undocumented code/features in there.</p>

<h2>Troubleshooting Tips</h2>
<p>If you ever come across an annoying bug in code that you are fully certain should work (and you will), especially an obscure low-level exception thrown just before a crash, and you can&#8217;t seem to get past it, try clearing out the build directory. For iPhones, the build directory you want to delete will be <code>YourAppsDirectory/build/iphone/build/</code>. Do NOT delete the outer <code>build</code> directory or you&#8217;ll have to create the project from scratch again.</p>

<p>Worst case scenario: You may have to create a new project from scratch and copy the <code>Resources</code> and all the assets into it. I&#8217;ve had this happen a few times, especially when the Mobile SDK was upgraded.</p>

<h2>iPhone App Development&#8202;&#8212;&#8202;Only Faster</h2>
<p>Appcelerator is a fickle beast that&#8217;s definitely rough around the edges and ever-evolving, but when it works (and when you get used to its quirky ways), it&#8217;ll help you throw together a working iPhone app much faster than if you had to write it all from scratch in Objective-C. It&#8217;s especially useful for the lazy web developer.</p>
]]></content:encoded>
            <pubDate>Mon, 04 Apr 2011 20:53:29 GMT</pubDate>
        </item>
        <item>
            <title>Aimless Social Media: your brand deserves better</title>
            <link>http://omniti.com/seeds/aimless-social-media-your-brand-deserves-better</link>
            <guid>http://omniti.com/seeds/aimless-social-media-your-brand-deserves-better</guid>
            <description><![CDATA[

The state of social media integration today can be likened to the early stages of "Web 2.0", eight years ago. At that time many confused the phrase, assuming it meant either: a formalized change to the technology powering the World Wide Web, or a visu...]]></description>
            <content:encoded><![CDATA[<img alt="Toy plane in a tree" src="http://images.omniti.net/omniti.com/i/b/social-article-bw.png" width="448" height="220" style="margin-top: 1em;" />

<p>The state of social media integration today can be likened to the early stages of "Web 2.0", eight years ago. At that time many confused the phrase, assuming it meant either a formalized change to the technology powering the World Wide Web, or a visual design movement that focused on glossy headers, cute icons, Crayola colors and gradients galore (cringe). Lucky for us, it represented a paradigm shift in how we (marketers, developers, designers and end-users) viewed interaction on the web&#8212;a shift that ultimately spawned the first social networks. Fast forward eight years: the organic development of user-generated media through these social networks and emerging mobile technologies spawned Web 2.0's progeny&#8212;Social Media. HOORAY! But, like many of those misguided early adopters in 2005 (who added RSS icons to sites containing no feeds), similar mistakes are being made integrating social media into sites today.</p>

<p>Curious about social media&#8217;s "RSS icon"? Well, I&#8217;m sure you have seen it on sites before: the ubiquitous "Like" or "Follow Us" button. It seems like a great feature; it suggests, "these guys are cool, they are on Facebook or Twitter!" But what do these features do for your site? Do they spread the word? Sure. But to what gain? What happens when users "follow" your brand, or "like" your product and then find little substance once back on your site&#8212;finding only more "like" buttons? Odds are they will quickly move on. The web is chock-full of options, and you are not the only person dabbling in social media. So, how can your brand stand out over the "other guys"?  How can you avoid the pitfalls of aimless social media integration?</p>

<p>The trick is to create a user experience that speaks to your audience; leveraging the best that social media has to offer, while avoiding any inherited flaws. The modern web is an ever-changing beast, mercurial in its habits and powered by the "ADD generation." Social media is a marketer&#8217;s weapon for targeting users in this fickle medium. With a deeper understanding of the technologies behind social media applications and a little creativity from your team, you can create a rich, unique experience that will not only keep your users coming back, but will pave the way for a growing audience. Using these sites simply as viral launch pads is a near-sighted use of their true power&#8230;The users who give you their time&#8212;and your brand, in which you invest so much&#8212;deserve something more interesting.</p>

<p><em>How can you get started?</em></p>

<h2 class="section-head"><span>S</span>tep 1:  Understand your user base, then create features for them.</h2>

<p>Often social features are added to sites as a Hail Mary pass, fueled by notions like, "it&#8217;s what <em>they</em> did, and <em>they</em> increased conversions by 8%". But <em>they</em> are not you, and <em>their</em> users are not necessarily the same as your users. So, when starting any social media project, you must first ask yourself: Are my users into social media, and if not, why not/could they be?</p>  

<p>What if your user base is not filled with savvy social media junkies? Expecting a novice web user to leverage the full stack of social networking sites means tough sledding. For most businesses, the safe bet is to start with Facebook. The core functionality behind the massive social network has proven to appeal to a <a href="http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/"><span>wide demographic</span></a>. Facebook allows businesses to set up a <a href="https://facebook-inc.box.net/shared/9e5jiyl843"><span>branded Facebook Page</span></a> with little risk or investment. Once your account is set up, users can subscribe to your page by "liking" it. This is where the "like" button is good: It&#8217;s an easy-to-adopt viral feature that users understand. This is why it&#8217;s so widely used. However, you still must have something to offer&#8212;something enticing to engage your users enough for them to share with others. If you don&#8217;t, the virus dies. You must first create a compelling experience on your own site before you add in the "like" button and share content with the world; otherwise, the effect is minimal. It is as if you are yelling into a giant wind tunnel. With social media, you have to yell with a purpose in order to make an impact.</p>

<p>How do you yell with purpose? How do you take your branding and turn it into social media features? Here&#8217;s where you must understand the value proposition for your users. Explore which products/services/experiences you offer that could be repurposed as a social media vehicle. For example, if you operate a travel site, media (photos/videos) or shared trip diaries may be an approach to take. If you operate a product site, sales and offers would provide an enticing reason to visit. The beauty of the medium is that social networks are as diverse as the sites they serve. Knowing who you are as a brand, and what your users want, will help to solidify your social media strategy and ultimately secure your place in the space.</p>


<h2 class="section-head"><span>S</span>tep 2: The technology exists. Understand it, then use it.</h2>

<p>Under the hood, social network platforms such as Facebook, Twitter, Flickr, Foursquare and many more are powerful web applications. As they grew over the latter part of this past decade, so did the adoption of <a href="http://en.wikipedia.org/wiki/Web_service"><span>Web Services, particularly REST APIs</span></a>, in web application development. Through the use of these Web APIs, the modern web has become an integrated platform of shared functionality, services and data. Applications have emerged that combine multiple external APIs to create a more robust feature set. In development terms, this approach is referred to as a web application hybrid, or "mashup." These mashups make up today&#8217;s most popular applications including, surprisingly enough, Facebook: great for the social network. Even better for your site.</p>

<p>With a mashup of your own, you have the ability to leverage the power of multiple APIs when designing a rich user experience on your own site. It is an experience open to functionality that is as diverse as the web itself and includes applications like: <a href="http://code.google.com/apis/maps/index.html"><span>Google Maps</span></a>, <a href="http://www.last.fm/api"><span>Last.fm</span></a>, <a href="http://www.bbc.co.uk/programmes/developers"><span>BBC</span></a>, <a href="http://developer.netflix.com/"><span>Netflix</span></a>, <a href="https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html"><span>Amazon</span></a>, <a href="http://developer.etsy.com/"><span>Etsy</span></a>, <a href="http://developer.usatoday.com/"><span>USAToday</span></a>, <a href="http://weather.weatherbug.com/desktop-weather/api.html"><span>WeatherBug</span></a>, <a href="http://www.salesforce.com/us/developer/docs/api/index.htm"><span>SalesForce</span></a>, <a href="http://code.google.com/apis/youtube/overview.html"><span>YouTube</span></a>, <a href="http://instagr.am/developer/"><span>Instagram</span></a>, <a href="http://developer.ebay.com/common/api/"><span>eBay</span></a>, and the aforementioned social networks, to name a few. With all these options, we can be kids in a functionality candy store. In order to gain any traction, you must find the service that best suits your needs.</p>


<h3>Let&#8217;s start with Facebook:</h3>

<p>First, design features that take full advantage of the API to create an integrated, enhanced experience on your site with segmented features and functionality. Sort of a: "Hey, if you clicked 'like' on our Facebook page, wait until you visit our site now&#8212;it&#8217;s cooler&#8212;more intuitive and better because you are now in the community. Cool, huh?". This approach not only promotes conversion from Facebook back to your site, it also promotes sharing&#8212;viral sharing&mdash;and we all know how much fun that can be.</p>

<p>One way you can achieve this is by designing a community feature for your site. Use the objects available via the API to leverage Facebook features such as wall posts, likes, notes, comments and user profile information. You can then create an experience where users will be able to communicate on your site&#8212;about your products&#8212;while seamlessly sharing back on Facebook. This is how you gain the power of Facebook&#8217;s social sharing features without affecting the user experience on your site. This community experience is one that you control; one that you can test, modify and enhance to better suit your own user base. Your users will appreciate the extra effort and care&#8212;something you can&#8217;t achieve with a Facebook page alone.</p>

<p>This community would have to use Facebook&#8217;s authentication methods in order to gain access to user data. By allowing users to sign in using Facebook credentials, you remove the need for a registration process, making adoption easier for new users. In addition, you control the data collected, improving your business intelligence and paving the way for tightly targeted campaigns.</p>

<p>Now, say you&#8217;ve implemented this Facebook mashup; what happens if Facebook goes down? No worries! If you (or your engineers) have designed this feature intelligently, all objects will be generated and saved in your architecture, then pushed back to Facebook. In case Facebook goes down, the users on your site will remain unaware. <a href="http://omniti.com/seeds/breaking-social-dependency"><span>Breaking your dependency</span></a> on Facebook to handle operations avoids a potentially embarrassing situation for your brand.</p>
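
<p>One way to read "generated and saved in your architecture, then pushed back to Facebook" is as a local-first write followed by a best-effort push. The sketch below is an illustration under stated assumptions, not a real Facebook integration: <code>createCommentStore</code> and <code>pushToNetwork</code> are hypothetical names, an in-memory array stands in for your own database, and the push is synchronous only to keep the example short.</p>

<pre><code>// Hypothetical sketch: none of these names come from the Facebook API.
function createCommentStore(pushToNetwork) {
    var saved = [];     // stands in for your own database
    var pending = [];   // comments not yet mirrored to the social network

    // Try to push everything that is still pending; keep whatever fails.
    function flush() {
        var stillPending = [];
        pending.forEach(function (comment) {
            try {
                pushToNetwork(comment);   // assumed to throw when the service is down
                comment.synced = true;
            } catch (err) {
                stillPending.push(comment);
            }
        });
        pending = stillPending;
        return pending.length;            // number of comments still waiting
    }

    return {
        add: function (text) {
            var comment = { id: saved.length + 1, text: text, synced: false };
            saved.push(comment);          // the local save happens first
            pending.push(comment);
            flush();                      // best effort; a failure never reaches the user
            return comment;
        },
        flush: flush,
        all: function () { return saved.slice(); }
    };
}
</code></pre>

<p>With this shape, a comment posted during an outage is still stored and displayed on your site, and a later call to <code>flush()</code> (from a scheduled job, for instance) mirrors it once the network is reachable again.</p>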

<h2 class="section-head"><span>I</span> have tried Facebook and I need more. What about other social networks?</h2>

<p>That answer really depends upon the user knowledge you gained through the discovery process done before you designed your strategy. For example, if you are a media company, the content/brand exposure vehicles, such as Twitter and Flickr, could be appealing options. However, if you are a restaurateur, then the geo-location features in Foursquare are a logical choice. Not all businesses can make the best use of the same solution. If you keep that in mind, then this exercise becomes vastly easier.</p>

<div class="seeds-cs">

<h3>Retail Store Case Study:</h3>

<p>Let&#8217;s say you are a boutique retailer with seven physical storefronts in the northeast corridor:</p>

<p>In the past, your website has been used as a brochure site to introduce new product lines, present new sales and promotions, and sell your "high-end" brand image to prospective customers. It&#8217;s sexy and well designed, but it doesn&#8217;t "work" to achieve your business goals. Your promotions are falling flat and your users don&#8217;t know they are there. How do you spread the word, and ultimately get more feet in the stores? Luckily for you, this problem is at the core of some of today&#8217;s most interesting and powerful applications, like Foursquare.</p>

<p>Using <a href="http://foursquare.com/business/venues"><span>Foursquare&#8217;s Merchant Platform</span></a>, you can create your own badges, campaigns and offers for users who check into your physical locations. This allows you to incentivize customers to come into your stores, with exclusive, geo-targeted offers.</p>

<p>You could create a "Super Shopper Badge," using the Merchant Platform. Set the unlock requirements to: "Check in at all seven of our locations, or at one location 20 times." Then, you would create a series of location-<a href="http://support.foursquare.com/entries/195165-what-is-a-special"><span>specific offers that are only available to users with this achievement unlocked</span></a>. If I were shopping in a mall where one of your stores is located, I would see a call to action pointing out an offer nearby and noting that this offer is for anyone with a "Super Shopper Badge." I would want to see what type of product I could get if I qualified for this offer. You can modify the terms of these offers to target first-time shoppers (first check-in), or the "Mayor" (most check-ins at one location). These features elicit buying behavior almost immediately. They would not only prompt return visits from current customers (in hopes of unlocking achievements), but also give potential customers a powerful incentive and reward your most devoted customers.</p>

</div>

<p>Once you have customers in the store, buying your product, the next step is to enable users to share with others&#8212;spread the word to non-Foursquare users. How do you go about enabling users to share? Achieving this would, once again, require creative implementation of social media within your own architecture. Possibly a leader-board style user-tracking feature on your site, or a mobile mashup application that is a <a href="http://foursquare.com/apps/"><span>branded tool to engage new users</span></a>. Then, you could design campaigns and other social media initiatives around this "Super Shopper" campaign. Once you have a stable platform in place, the "like," "share" and "follow" buttons are back in the conversation. Now they have a job&#8212;a purpose that will give your campaign "legs" in the bigger social media market.</p>

<h2 class="section-head"><span>O</span>kay, I have some ideas, but what&#8217;s next?</h2>

<p>Unlike the "like" button (no pun intended), more robust social features are not a 20-minute task. Ultimately, this is all a pipe dream without the technical know-how for implementation. These social platform APIs can evolve rapidly, and the changes range from incremental tweaks to a switch in underlying technology (for example, <a href="http://developers.facebook.com/docs/guides/upgrade/"><span>Facebook changing from REST to Graph API</span></a>) or, in the case of <a href="http://www.guardian.co.uk/technology/blog/2011/mar/14/twitter-developers-client-warning"><span>Twitter&#8217;s recent announcement</span></a>, a drastic change to the terms of service for API use. To stay ahead, you need an experienced team (in-house or vendor) that understands the functionality available, as well as the landscape for third-party applications on these social platforms.</p>

<p>To be successful, you must have creative thinking on both the code and the design side. You need it from marketing (sets the company marketing goal) to the designer (creates the experience to realize the goal); and from the designer to the developer (tech know-how to implement). Marketers have to engage their designers and developers to be successful. At the end of the day, you are adding social media to your site to drive new business. Without that goal in mind there is no ROI and the whole initiative falls flat. It&#8217;s not a cakewalk, but the technology is out there. In this article, we have only skimmed the surface of what can be done. Marketers have to get educated about everything social media has to offer before beginning their three-way conversations with designers and developers.</p>

<p>Once you have done your homework, <em>the fun can begin!</em></p>
]]></content:encoded>
            <pubDate>Tue, 22 Mar 2011 15:17:56 GMT</pubDate>
        </item>
        <item>
            <title>Breaking Social Dependency</title>
            <link>http://omniti.com/seeds/breaking-social-dependency</link>
            <guid>http://omniti.com/seeds/breaking-social-dependency</guid>
            <description><![CDATA["OMG, Facebook is DOWN!" That was the cry from millions when Facebook was unavailable for about three hours because of network issues. Given the nature of Facebook&#8217;s service, the downtime did not have any long-lasting effects on its user base. In...]]></description>
            <content:encoded><![CDATA[<p>"OMG, Facebook is DOWN!" That was the cry from millions when Facebook was unavailable for about three hours because of network issues. Given the nature of Facebook&#8217;s service, the downtime did not have any long-lasting effects on its user base. In fact, some say that productivity significantly increased during the three-hour window without access to Facebook. The bottom line is: the unavailability of the social networking service doesn&#8217;t negatively impact its users (ego and reputation of the service aside). Does this also hold true for the companies leveraging Facebook, or other social networks such as Twitter, Flickr or Foursquare, in their daily operations?</p>

<p>Today, more and more companies operating online businesses try to break into the social media realm by leveraging existing services to increase visibility and loyalty to their brand and bring more people to their sites (and consequently, increase the conversions, visits, purchases or participation). I&#8217;ve seen many incarnations of social networking implementations, from the basic, simplified authentication with Facebook Connect augmenting the regular process (for ease of registration/login), to full-blown applications relying heavily upon multiple features available from these services&#8217; APIs. Now, personally, I am all for having these services available and used strategically throughout the applications. It provides a tremendous benefit not only in brand familiarity and content, but also in cost saving&#8202;&#8212;&#8202;you&#8217;re leveraging years of someone else&#8217;s work for your gain. Consider Flickr. The storage, CDN and REST APIs to present the assets have all been developed and tested for you by a number of smart engineers for a number of years; all you need to do is to integrate the functionality within the content of your site. The same services are available to everyone, and you make the business decision about which features would be beneficial to your company&#8217;s strategy. The implementation of the features, however, varies significantly.</p>

<p>One of the major risks when implementing a third-party service is the reliance upon the availability of that service&#8202;&#8212;&#8202;one that you have no control over. And, no matter how large or successful that service is, it will go down at one point or another. Twitter, as an example, is well-known for intermittent service degradation, often followed by noticeable outages. Now, imagine what happens during the Twitter downtime if your site&#8217;s content relies heavily and exclusively on the Twitter API.</p>

<p>Let&#8217;s examine a situation where a large online media company decided to switch to Facebook Connect as the exclusive authentication method for their site. (To prevent the discussion about the viability of this choice, let me just note that there were legitimate business reasons for choosing this approach.) This is where the fun starts. The graph below represents HTTP load time for the pages on the site at every stage of the process. Even without the captions on the graph, everyone should be able to pinpoint the exact time when the new code was deployed and the load time of the pages tripled. The project owners were notified, but since the load times were extremely low to begin with (thank you, properly implemented caching) the load speed was deemed acceptable, and the changes remained in production. Time passed. And then some more time passed. And then the dark day came: the day when Facebook went down. The page load times on the media site tripled again for a very brief period of time (while Facebook servers were just lagging), and then dropped to 0, i.e. "users are unable to see the site." Just like that, Facebook&#8217;s problem became the company&#8217;s problem.</p>

<img src="http://images.omniti.net/omniti.com/i/b/facebook-connect.jpg"/>

<p>Upon closer code investigation, the problem was identified and resolved quickly, which also brought the page load time back down to its original level as a byproduct of the change. Still, the incident shows how dependent your site can become upon third-party service availability if the features are not implemented correctly.</p>

<p>How can these issues be avoided? There are a few common sense rules that, for some reason, are often ignored during development, which should help with the integration of external services without affecting your site&#8217;s performance.</p>

<ol>
<li><em>Only connect to a third-party service where needed.</em>
<p>Don&#8217;t try to connect to Facebook on every page load to validate that the user is still the user to whom you displayed the previous page. Cache the results locally.</p>
</li>

<li><em>Don&#8217;t make connections to a third-party service in the critical path of the page load.</em>
<p>Don&#8217;t load Google Analytics as the first thing on your page; you will delay the display of the content that actually matters. Make the connections after your content is loaded, or better yet, connect asynchronously.</p>
</li>

<li><em>Trap time-outs and errors.</em>
<p>You do it with your database connections. Why would you treat external connections differently?</p>
</li>

<li><em>Create a fallback plan.</em>
<p>You have no control over external services, but you do have control over the content presented to your users. If a Flickr feed is an essential feature of your site, store the displayed history locally, so you can fall back to the latest available content in case Flickr is unavailable. Remember, sometimes stale content is better than no content at all.</p>
</li>
</ol>
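
<p>The time-out and fallback rules above can be sketched in a few lines. Treat this as a minimal illustration rather than a drop-in implementation: <code>createFeedReader</code> and <code>fetchRemote</code> are hypothetical names, <code>fetchRemote</code> is assumed to return a promise for the feed items, and the local copy lives in memory here where a real site would persist it.</p>

<pre><code>// Hypothetical sketch: fetchRemote is whatever function calls the third-party API.
function createFeedReader(fetchRemote, timeoutMs) {
    var lastGood = [];   // most recent successful response, kept locally

    // Reject if the third-party call takes longer than timeoutMs.
    function withTimeout(promise) {
        return new Promise(function (resolve, reject) {
            var timer = setTimeout(function () {
                reject(new Error("third-party call timed out"));
            }, timeoutMs);
            promise.then(function (value) {
                clearTimeout(timer);
                resolve(value);
            }, function (err) {
                clearTimeout(timer);
                reject(err);
            });
        });
    }

    return function read() {
        // Promise.resolve().then(...) also traps errors thrown synchronously.
        return withTimeout(Promise.resolve().then(fetchRemote)).then(function (items) {
            lastGood = items;                         // refresh the local copy
            return { items: items, stale: false };
        }, function () {
            // Time-out or error: serve the last known content instead of failing.
            return { items: lastGood, stale: true };
        });
    };
}
</code></pre>

<p>The <code>stale</code> flag lets the page decide whether to show a "last updated" note. Calling the reader after your own content has rendered keeps the third-party request out of the critical path.</p>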

<p>To make a blanket statement&#8202;&#8212;&#8202;don&#8217;t jump into using social media features without first identifying a need for them, and use them to support your primary business model. At the end of the day, when integrating any third-party service, you are trying to leverage the benefits of the available functionality to enliven the experience for your own users, not to inherit the services&#8217; availability problems. Integrate smartly, not blindly.</p>]]></content:encoded>
            <pubDate>Mon, 14 Mar 2011 15:47:24 GMT</pubDate>
        </item>
        <item>
            <title>On the Engineering of SaaS</title>
            <link>http://omniti.com/seeds/from-making-software-to-running-saas</link>
            <guid>http://omniti.com/seeds/from-making-software-to-running-saas</guid>
            <description><![CDATA[Software has been around for a long time in various forms: open and closed, commercial and non-commercial. The one thing that holds true about software products is that you, as a consumer, have to acquire them, install them and operate them. &nbsp;For ...]]></description>
            <content:encoded><![CDATA[<p>Software has been around for a long time in various forms: open and closed, commercial and non-commercial. The one thing that holds true about software products is that you, as a consumer, have to acquire them, install them and operate them. &nbsp;For the past several years, there has been an industry movement away from providing software in this traditional sense and instead providing the use of the software as a service (SaaS). SaaS has been around in many forms. Many companies (and investors) have recognized the <a href="http://www.cooley.com/files/uploads/KippsCooley/kipps0909.html"><span>opportunities that SaaS provides as a business model</span></a>, but transitioning to it from a standard software development model requires a lot more than an executive decision. Herein I&rsquo;ll try to lend some insight into what&rsquo;s in store for you as you transition from a software company into a SaaS company.</p> 
<h3>1. A customer of one.</h3> 
<p>Typical software engineering processes are well-evolved and quite rigorous. They are designed to ensure that the product you release and ship around the world will boast minimal defects and incur as little as possible in the way of defect handling via patching or upgrading. While it may not be extraordinarily difficult to package the next version of your product, you must deal with making the installation/upgrade process as fool-proof as possible or you risk leaving customers stranded mid-upgrade. Getting the entire customer-base to upgrade to the latest version in a reasonable fashion is intractable and the more rapidly you release your product, the more frustrated customers become and the more unique versions you have to support &ldquo;in the wild&rdquo;.</p> 
<p>SaaS engineering couldn&rsquo;t be more different. Why? The typical software product driving a SaaS architecture has exactly one customer: you. You have one version of the product in production and it has to work all the time. An upgrade process, for example, is an entirely different beast. Making it robust and repeatable is far less important than making it quick and reversible. This is because the upgrade only ever happens once: on your install. Also, it only ever has to work right in one, exact variant of the environment: yours. And while typical customers of software can schedule an outage to perform an upgrade, scheduling downtime in SaaS is nearly impossible. So, you must be able to deploy new releases quickly, if not entirely seamlessly &#8212; and in the event of failure, roll back just as rapidly.</p> 
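<p>As a rough model of "quick and reversible," a deployment can be treated as moving a single pointer to the live release while keeping the earlier releases around. The sketch below is purely illustrative: <code>createReleaseSwitch</code> and <code>healthCheck</code> are hypothetical names, and a real system would switch a symlink, load balancer pool or similar, not an in-memory array.</p>

<pre><code>// Hypothetical sketch of a reversible release switch; not a real deployment tool.
function createReleaseSwitch(initialVersion) {
    var history = [initialVersion];   // the last entry is the live release

    function live() {
        return history[history.length - 1];
    }

    return {
        live: live,
        // Switch to the new version, then revert at once if the check fails.
        deploy: function (version, healthCheck) {
            history.push(version);
            if (!healthCheck(version)) {
                history.pop();        // failed check: the previous release is live again
                return false;
            }
            return true;
        },
        // Manual rollback: drop the live release and return the one before it.
        rollback: function () {
            if (history.length === 1) {
                throw new Error("no previous release to roll back to");
            }
            history.pop();
            return live();
        }
    };
}
</code></pre>

<p>Because both the switch and the rollback are a single step in this model, the cost of a bad release is bounded by how quickly the health check notices the problem.</p>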
<p>You will find that your needs in operating the product will have a tremendous impact on the engineering roadmap. Interestingly, you will likely find that the features incorporated into the product should have been on the roadmap in the first place, but you lacked the insight or foresight, because you were not responsible for operating the product in a production setting. From here on out, while you build the service for your users, you build the underlying software products for a customer of one.</p> 
<h3>2. You aren&rsquo;t a software company anymore.</h3> 
<a href="http://omniti.com/writes/web-operations"><span><img style="margin-left:0.5em;margin-bottom:0.5em;float:right;width:198px" src="http://s.omniti.net/i/content/books/web-operations-198.gif"></span></a> 
<p>You aren&rsquo;t a software company anymore; you are an operations company. Software as a Service is much more about service than software. In fact, the users of your service will be just as satisfied thinking that magic pixies power the service they use as some complex software system. With this change comes a rather intimidating shift in expectations. Users expect software to have bugs, and they expect to schedule downtime to upgrade, install, back up or otherwise manage the software product they are operating. With a service, however, there is a strong predisposition of users to expect things to be &ldquo;always on.&rdquo; As a simple analogy, if you sell a user a diesel generator, they will expect it to need maintenance, need refueling and have the occasional service issue. Sell them electrical service and watch them come with pitchforks demanding refunds if you have an outage of any sort.</p> 
<p>While this may seem silly at first, the expectation isn&rsquo;t out of line. It&rsquo;s a simple bit of economies of scale. Your job as a SaaS company is to operate the software, so logically you should do a better job than they would. Additionally, you are operating it for a large set of users, so it is a reasonable expectation that you have refined your operational techniques. Lastly, they pay you for one thing: to operate the service &#8212; so you had better get it right.</p> 
<p>Working as an engineering company with an operations focus rather than a product focus can be a significant challenge for traditional software engineering companies. &nbsp;You should expect to see roles removed, roles introduced and organizational structure changed to add accountability for operating your service as your users expect.</p> 
<h3>3. Continuous Deployment</h3> 
<p>One of the greatest advantages of being a customer of one for your software is that you don&rsquo;t have to worry about the oddball deployment or &ldquo;that guy&rdquo; that refuses to upgrade. &nbsp;It means that once you&rsquo;ve deployed the latest version of code into production, you have no legacy copies, no troubleshooting of version differences and a definitively less complicated error reporting process. This, however, can cause a paradigm shift in development and deployment processes. It means that you can have a bug report at 8 a.m., a fix by 8:15 a.m. and a deployment by 8:20 a.m. Traditional software engineering companies have no other word to describe this but &ldquo;insane.&rdquo; It might seem reasonable to simply elect not to subscribe to that pattern of behavior due to the risks involved, but there is weakness in that stance.</p> 
<p>In the era of SaaS, companies have engineered processes to successfully manage the risks of rapid deployment schedules (<a href="http://omniti.com/seeds/online-application-deployment-reducing-risk"><span>OmniTI</span></a>, <a href="http://timothyfitz.wordpress.com/2009/02/10/continuous-deployment-at-imvu-doing-the-impossible-fifty-times-a-day/"><span>IMVU</span></a>). What was once a patch release every two weeks can now be managed as hundreds of patch releases per day (in the extreme case). By carefully engineering risk out of the deployment process, a SaaS company gains agility to launch fixes, improvements and features into production at any time. If your competition can do this and you cannot, you are disadvantaged.</p> 
<p>While it may take considerable effort to redefine your engineering processes to adequately limit risk and allow for continuous deployments, the advantages are significant. Due to the velocity of deployments that must be supported, the process of deploying itself must be engineered to be non-disruptive to services. This alone has the side effect of enabling feature launches, upgrades and triage without consequential downtime. It is the first step toward an &ldquo;always on&rdquo; architecture.</p> 
<h3>4. Quality Assurance is now a continuous process.</h3> 
<p>Quality assurance has a strong role in software engineering. While there is much effort expended automating QA, an automated QA process is sufficient only if your service is used solely by automated systems. If humans consume your service, you must also have human-driven QA processes. So, while much of the QA process can be automated and performed rapidly, it will never replace human usage of the application to detect both errors and perceived errors. In SaaS systems, the velocity of user-facing change is (at least) an order of magnitude higher than in traditional software engineering. It is inevitable that bugs will not only appear, but that they will reappear. Performing a full QA regression prior to each release is often infeasible.</p>
<p>Your users are just as much members of your QA team as your employees. By making your users aware of that, by treating their feedback, complaints, bug reports and feature requests as first-class items, you enable them to improve your QA process and, more importantly, increase their tolerance for your mistakes. John Martin has a short, but enlightening, diatribe about the <a href="http://buildingsaas.typepad.com/blog/2006/08/highmetabolism_.html"><span>quintessential difference between QA in traditional environments and SaaS</span></a>.</p> 
<p>Perhaps the single most significant change to embrace is that of QA&rsquo;s place. What was once an engineering phase, a deliverable, or a series of bars on a project manager&rsquo;s Gantt chart (ultimately leading to a celebratory day of shipping a product release) is now a continuous and critical operational role within a continually delivered and continually used service.</p>
<h3>5. Multi-tenancy design.</h3> 
<a href="http://omniti.com/writes/scalable-internet-architectures"><span><img style="margin-left:0.5em;margin-bottom:0.5em;float:right;width:198px" src="http://s.omniti.net/i/content/books/scalable-internet-architectures-198.gif" /></span></a> 
<p>So far, we&rsquo;ve discussed mostly process changes that enable transforming from a builder of software to an operator of software. The last paradigm shift is perhaps the hardest as it relates to design philosophy and design goals rather than design processes.</p> 
<p>When traditional software is designed, it runs on a system or set of systems for a single user. While a &ldquo;user&rdquo; in this sense can be an individual, or a business unit or perhaps even a whole organization, it is clearly not &ldquo;all users.&rdquo; It is the difference between engineering a car and engineering a complete metropolitan transit system. It is an issue of designing at scale.</p> 
<a href="http://www.amazon.com/gp/product/0137030428?tag=akpa-20"><span><img src="http://images.omniti.net/omniti.com/i/b/the-art-of-scalability-188.jpg" style="margin-right:0.5em;margin-bottom:0.5em;width:188px;height:250px;float:left" /></span></a>
<p>Not only does this mean designing and building software that can handle thousands of times the load that your previous design enabled, but also engineering the solution to malfunction elegantly. Malfunction elegantly? Yes. All human-engineered products will malfunction; it is a simple fact of life. In a SaaS offering, it is essential that when this happens, the malfunction is isolated to the smallest possible component of the service or to a specific customer. Back to our transit metaphor: the failure of a single bus, subway train or taxi must adversely affect as few users as possible; ideally, only those physically on the failed unit. This consideration is simply (and obviously) not present in the design of a single car.</p>
<p>The engineering paradigm shift from a single-user product to a multi-tenancy product is the most challenging metamorphosis required by a software company that intends to adapt and survive in the SaaS era. Two books that talk about the underlying mechanics of these challenges are <a href="http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X"><span>Scalable Internet Architectures</span></a> (written by me) and <a href="http://www.amazon.com/gp/product/0137030428?tag=akpa-20"><span>The Art of Scalability</span></a> by Abbott and Fisher.</p>
<h3>Making good on a promise</h3> 
<p>While you may not have made a promise about what your SaaS offering will provide, the industry has set some undeniable expectations about what SaaS generally delivers. At a minimum, you must meet these expectations or your users will abandon you. These expectations are naturally derived from the key drivers for adopting SaaS: no maintenance, no upgrades, always current, always available, no commitment (or the desire for operational expenses over capital expenses). It is imperative that you understand where the bar is set for those wishing to shift into a SaaS delivery model.</p>

 
<p>With the exception of software companies that produce software that powers SaaS, most traditional software companies must evolve into a SaaS delivery model or suffer death at the hands of competition. Evolving into a SaaS delivery model without addressing the above key points will lead to substandard service, artificially high operating costs, user attrition and eventual collapse. You have to do it. You have to do it right. Are you ready?</p>
]]></content:encoded>
            <pubDate>Tue, 01 Mar 2011 16:15:28 GMT</pubDate>
        </item>
        <item>
            <title>Maintainable Stylesheets: Can CSS Be Object-Oriented?</title>
            <link>http://omniti.com/seeds/maintainable-stylesheets-can-css-be-object-oriented</link>
            <guid>http://omniti.com/seeds/maintainable-stylesheets-can-css-be-object-oriented</guid>
            <description><![CDATA[How can CSS be object-oriented? In short: it can&rsquo;t. But if we stopped there, then this would be a pretty short article! Let&rsquo;s take a look at what is meant by the term &ldquo;Object-Oriented CSS&rdquo; and how it can help improve your styles...]]></description>
            <content:encoded><![CDATA[<p>How can CSS be object-oriented? In short: it can&rsquo;t. But if we stopped there, then this would be a pretty short article! Let&rsquo;s take a look at what is meant by the term &ldquo;Object-Oriented CSS&rdquo; and how it can help improve your stylesheets.</p>

<h3>What&rsquo;s the Problem?</h3>

<p>First, we should take a step back and look at the problems that lead to CSS being difficult to maintain. Most projects start out easily enough. We write some styles for page structure, some default content styles, then some specialty content styles, right? We always ensure that our styles are separate from our markup and that our markup is clean and semantic, thanks to the web standards movement. When the project launches, we feel good about our work and don&rsquo;t think much about code maintainability. After all, it&rsquo;ll be easy to maintain because we wrote the code.</p>

<p>The first round of updates comes and it&rsquo;s not too difficult. The stylesheets are still fairly small and easy to navigate, so the appropriate styles are found and updated. Some new styles are added because the new content is a little different than the existing content. Soon, a second round of updates is made, then a third. Each time, a few more styles are added to the stylesheets. Then we&rsquo;re taken off the project for one of a hundred reasons, and someone else begins maintaining the site. Or perhaps we simply don&rsquo;t touch the site for a year and then have to come back and make some updates; after such a long interval, we don&rsquo;t remember as much about the code as we think we do. In either case, it&rsquo;s difficult to understand the mix of ids, classes and element names, so more style declarations are added, overriding previous styles where necessary to get the desired effect.</p>

<p>Before long, some styles become defunct and other styles become unnecessarily complex as the stylesheets become overly complicated. We&rsquo;ve ended up with spaghetti code, and the site&rsquo;s stylesheets now take longer to download as they get more bloated. How do we get out of this mess? Is there a way to create truly maintainable style sheets?</p>

<p>Yes, there is a way, and it&rsquo;s recently come to be known by the term &ldquo;<a href="http://www.stubbornella.org/content/category/general/geek/css/oocss-css-geek-general//"><span>Object-Oriented CSS</span></a>,&rdquo; thanks to Nicole Sullivan. Style declarations are not objects in the programming sense of the word, but the term &ldquo;Object-Oriented&rdquo; refers to a mindset of thinking about how styles should be written and applied.</p>

<h3>Keep Your Styles Where They Belong</h3>

<p>Before we begin, we need to lay down a ground rule for keeping our styles maintainable: keep styles in the stylesheets. This means two things:</p>
<ol>
  <li><h4>Styles don&rsquo;t belong in markup.</h4> Nowadays, we usually are pretty good about this, but it&rsquo;s always a good reminder. Don&rsquo;t succumb to the temptation to insert a quick style attribute here and there. It seems innocuous now, but we may regret it later.</li>
  <li><h4>Styles don&rsquo;t belong in Javascript.</h4> This may or may not be up to us depending upon how the division of labor happens where we work. But if JS is on our plate along with CSS&hellip;great! That means we have the power to ensure that each site&rsquo;s behavior (JS) is separated properly from its styling. Javascript should <em>almost never</em> be used to manually modify styles on an element&mdash;the lone exception to this is when a style value has to be calculated based upon other information. If we set styles via JS, what happens when we need to change those styles in the future? We&rsquo;ll have to go dig through the JS and find every place those styles are being changed, instead of just making some simple stylesheet edits.</li>
</ol>

<p>If we can&rsquo;t edit styles with Javascript, how <em>should</em> JS interact with styles? The best method is using JS to modify elements&rsquo; classes. As we&rsquo;ll see below, classes are the keys to elements&rsquo; identities and run-time states. By using Javascript only for modifying class names, all of the actual styling is handled by the stylesheets, making styles easy to find and modify.</p>

<h3>And Now For Something Completely Different&hellip;Sort of</h3>

<p>When we discuss this new, &ldquo;Object-Oriented&rdquo; mindset, there&rsquo;s one key word that will be our mantra: &ldquo;reusability.&rdquo; Ok, there are really two key words: &ldquo;reusability&rdquo; and &ldquo;patterns,&rdquo; so I guess you could say we&rsquo;re looking for &ldquo;reusable patterns&rdquo; here.</p>

<p>If we start thinking about creating &ldquo;reusable patterns&rdquo; at the beginning of a project, it&rsquo;s a lot easier than going back and fixing a site that has already launched. But it still may be worthwhile to revisit existing sites&rsquo; code. Putting in the work now to make our stylesheets maintainable will not only save us future headaches, but it will also speed up our sites since the CSS files will be smaller and will download more quickly.</p>

<p>Before writing any code, we should pull out our layout comps and review them, looking for patterns. The patterns could be large and easy to spot like column arrangements or page block arrangements, or the patterns could be small like a box for entering login information. The patterns could be structural like column or block layouts, or the patterns could be stylistic like font and color choices. But finding the patterns is only the first step. When thinking about how to style these patterns, we need a paradigm shift in our approach. Instead of writing style selectors using IDs and element names, we will use classes to describe the elements&rsquo; identities and states. As a simple example of this, we&rsquo;ll look at a small toolbar of zoom controls.</p>

<p>When thinking about how to style a zoom toolbar, our first inclination might be to style the entire toolbar using its ID <code>#zoom_bar</code> and then style the individual controls using their IDs or element names, either <code>#zoom_bar #zoom_out</code> or <code>#zoom_bar button</code>. But we should stop and think about this for a moment. If we use IDs, then these styles are only applicable to this particular instance of the toolbar. If we want to reuse the styles we have to add another set of selectors, giving us <code>#zoom_bar button, #another_bar button</code>. Now we have two sets of selectors and will probably need to add more in the future. And what if we&rsquo;re styling link states? <code>#zoom_bar a:hover, #zoom_bar a:active</code> would turn into <code>#zoom_bar a:hover, #zoom_bar a:active, #another_bar a:hover, #another_bar a:active</code>. Writing styles this way quickly results in bloated stylesheets with way too many selectors.</p>

<h3>Decouple Your Styles</h3>

<p>Instead of styling elements according to &ldquo;what we call them,&rdquo; let&rsquo;s shift our thinking to &ldquo;reusable patterns&rdquo; and style elements according to two things: their <em>identities</em> and their <em>run-time states</em>. Now, what is the core identity of the zoom toolbar? Zooming is what it &ldquo;does,&rdquo; not what it &ldquo;is.&rdquo; Its core identity is that of a toolbar, or perhaps more generically, &ldquo;a group of buttons.&rdquo; So what if we give it the class <code>.button_set</code>? This refers to its generic identity and is very reusable. As for the controls inside, we don&rsquo;t use their element names either (like &ldquo;button&rdquo;), but instead we create a class based on their identity, which could be as simple as <code>.button</code>. This decouples an element&rsquo;s styling from its markup; now we can use <code>.button_set .button</code> as our selector, which avoids the specificity of a particular instance ID, which is not really reusable. This also gives us the flexibility to use buttons in our zoom toolbar or to change the elements in other (or future) instances of <code>.button_set</code>. What if we encounter an issue that makes us change all the buttons to anchors? By using an &ldquo;identity class&rdquo; and decoupling the styles from the markup, all we need to do is change the markup, not the stylesheets.</p>
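
<p>As a rough sketch of the difference, the ID-based rules and their class-based replacement might look like the following. The property values are placeholders chosen for illustration, not taken from a real design, and <code>#another_bar</code> is the hypothetical second toolbar from the earlier example.</p>

<pre><code>/* Before: tied to specific instances and to element names */
#zoom_bar button,
#another_bar button {
  padding: 0.25em 0.75em;
  border: 1px solid #999;
}

/* After: one reusable pattern based on identity */
.button_set .button {
  padding: 0.25em 0.75em;
  border: 1px solid #999;
}
</code></pre>

<p>The second rule applies to any element carrying the <code>button</code> class inside any <code>button_set</code>, whether the markup uses buttons or anchors, so a new toolbar needs no new selectors.</p>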

<p>The other type of classes we will create for our <code>.button_set</code> is for the elements&rsquo; <em>run-time states</em>. In this instance we may have three different states: the default state, an active state and an inactive state. Creating classes for these states is as easy as creating <code>.active</code> and <code>.inactive</code> classes. Notice that these class names only describe the elements&rsquo; states, not what the elements look like or where they are positioned. Again, this decouples the markup from the styles, making the class selectors generic and extremely reusable.</p>
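
<p>A minimal sketch of how these state classes could sit on top of the identity classes is shown here. The colors are illustrative assumptions, and the state rules come after the default rule so that they win at equal specificity.</p>

<pre><code>/* Default state: the identity class alone */
.button_set .button {
  color: #333;
  background: #eee;
}

/* Run-time states: they name what the element is doing, not how it looks */
.button_set .active {
  color: #fff;
  background: #369;
}

.button_set .inactive {
  color: #999;
  cursor: default;
}
</code></pre>

<p>With rules like these in place, a script only has to add or remove <code>active</code> or <code>inactive</code> on an element; the appearance stays entirely in the stylesheet.</p>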

<p>When making this paradigm shift, we need to be careful to only use classes when defining our patterns&mdash;don&rsquo;t fall back into the trap of using IDs for styling. It&rsquo;s easy to relapse, but because of IDs&rsquo; <a href="http://www.stuffandnonsense.co.uk/archives/css_specificity_wars.html"><span>high specificity</span></a>, we should be especially vigilant about not using them for styling except to define occasional exceptions. And even then we should always ask ourselves, &ldquo;Can I take this exception and make it reusable?&rdquo; If we have to make an exception once, chances are good that we&#39;ll have to make it again later, and then we&#39;ll have two exceptions instead of one reusable pattern.</p>

<p>Please note that one ramification of decoupling our styles is that we may have to use multiple classes on a single element, making Internet Explorer 6 tricky to work with. In general, we should be able to work around this issue by adding concurrent classes onto different elements. But if we <em>must</em> support IE6 with multi-class selectors, we have two options: we can either use a Javascript fix (like <a href="http://code.google.com/p/ie7-js/"><span>IE7-JS</span></a>), or we can create &quot;joining classes&quot; which join together the styles from the individual declarations (e.g. create <code>.one_two</code> instead of using <code>.one.two</code>).</p>
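
<p>For illustration, using the placeholder class names <code>one</code> and <code>two</code> from above, the two forms could look like this. The declarations themselves are assumptions made up for the sketch.</p>

<pre><code>/* Multi-class selector: matches only elements that have both classes.
   IE6 effectively reads this as .two alone, so it over-matches there. */
.one.two {
  font-weight: bold;
}

/* Joining class: a single class name that IE6 handles correctly */
.one_two {
  font-weight: bold;
}
</code></pre>

<p>The trade-off is that the markup has to carry the extra <code>one_two</code> class wherever the combination occurs.</p>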

<h3>Our Future Selves Will Thank Us</h3>

<p>So to recap: Object-Oriented CSS is a phenomenal approach to writing stylesheets, even if it has very little to do with traditional Object-Oriented-ness. However, one thing that all CSS <em>does</em> have in common with object-oriented programming is inheritance. Always remember to take advantage of inheritance! If we find ourselves wanting to add the same class or style to several elements, we should make sure it is absolutely necessary; we may be able to get the same results by adding the class or style to a parent element.</p>
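
<p>As a small sketch of that idea, inherited properties such as <code>color</code> and <code>font-family</code> can be set once on a parent instead of on each child. The class name <code>sidebar</code> and the values are hypothetical.</p>

<pre><code>/* Set once on the parent... */
.sidebar {
  color: #444;
  font-family: Georgia, serif;
}
/* ...and the paragraphs and list items inside it inherit both values,
   so they need no classes or declarations of their own. */
</code></pre>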

<p>Taking advantage of inheritance&hellip;creating reusable patterns&hellip;decoupling our styles&hellip;it&#39;s pretty easy to see the benefits of using this approach on an entire site. Gone are the layered styles, on top of styles, on top of styles. Gone is the need to overwrite some previous style to get the effect we need. With a little careful planning, we all can have lean, fast stylesheets that make maintenance a breeze.</p>]]></content:encoded>
            <pubDate>Thu, 24 Feb 2011 16:53:29 GMT</pubDate>
        </item>
        <item>
            <title>Instrumentation and Observability</title>
            <link>http://omniti.com/seeds/instrumentation-and-observability</link>
            <guid>http://omniti.com/seeds/instrumentation-and-observability</guid>
            <description><![CDATA[

There has been considerable momentum established behind a movement called devops. This momentum is good.  There does not appear to be anyone coming out and saying "this whole devops movement is bad and ignorant." So, as one can assume with no notable...]]></description>
            <content:encoded><![CDATA[<img src="http://s.omniti.net/i/content/seasons/engineer-gears.png" width="450" height="250" alt="Gears and stuff"/>

<p>There has been considerable momentum established behind a movement called devops. This momentum is good.  There does not appear to be anyone coming out and saying "this whole devops movement is bad and ignorant." So, as one can assume with no notable adversaries, it stands to reason that the movement is a "good thing."</p>

<p>The devops movement is often thought of as an effort to bring the operations world into the development (software engineering) world. Statements like "A into B" are vague.  Let&#8217;s be clear on the concept: introduce the wisdom and experience from software engineering into the operations realm.  The software engineering world, over its brief history, has established many excellent paradigms including testing, version control, release management, quality control, quality assurance and code review (just to name a few).  These concepts, while they exist in good operations groups, are admittedly far less formalized and could stand some rigor.</p>

<p>So, while one might think we&#8217;d discuss the merits of software engineering principles in operations in this seed, we&#8217;re happy to disappoint. There are plenty of people talking about this already; they are making excellent points and getting their points across.</p>

<p>We&#8217;re here to speak to the other side of the coin. This is Theo at <a href="http://www.devopsdays.org/2010-us/"><span>DevOps Days 2010-US</span></a>, hosted by <a href="http://www.linkedin.com"><span>LinkedIn</span></a> in Mountain View, California:</p>

<video id="video_player" style="margin-bottom:20px;" src="http://images.omniti.net/omniti.com/video/media-assets/infrastructureascode-opt-1.mp4" controls=""  name="media" height="338" width="450"></video>

<p>Operations is not, and has never been, a janitorial service.  Operations crews are responsible for the impossible: keeping everything up and functioning all the time.  This is an expectation that one can never exceed.  One can argue that we establish SLAs (service level agreements) to bring these expectations within reason, but SLAs are legal terms that articulate allowable downtime, not desired downtime.  Users want services available all the time.  As a result, operations is faced with an impossible task and, amazingly, makes good on unpromised availability more often than not.  Let&#8217;s talk about the <em>not</em>.</p>

<p>Operations is, by definition, the group that operates things.  These "things" encompass the entire technology stack: networking and systems hardware, operating systems, COTS (commercial, off-the-shelf) and open source application software and in-house tools.  Consider the following statement, which seems obvious, but is commonly overlooked: "It is easy to operate software and hardware that is operable."  Many common components in the information technology stack are simply inoperable by our definition.  This is where we get into the meat of things: how does one define operable?</p>

<p>Defining a component as operable is quite simple.  Inevitably, things go wrong.  When things go wrong, they must be understood to be repaired.  Troubleshooting is a zetetic process. To progress, one must ask questions.  These questions must be answered.  This should be plain and obvious to anyone who has ever experienced an unexpected outcome to a situation (technical or not).  So, why is this complicated? To be effective, one must not change the situation in the course of asking the question. This caveat is where things get complicated and fortunes are made.</p>

<p>To observe a situation without changing it is the ultimate achievement. While Heisenberg believed this to be impossible (and we agree), one can achieve a reasonably small disturbance during observation. An excellent example is the classic philosophical question, "If a tree falls and no one is there to hear it, does it make a sound?"  Let&#8217;s think about that question for a moment to better understand impact and side-effect.  Is the sound of the tree falling more or less likely to affect the overall situation than the actual destruction and subsequent felling of the tree?  The problem with many observation systems is that, in order to observe the sound of the tree, they must hew a tree during every instance of observation. We suggest a different approach.</p>

<p>Many systems have critical metrics, which are diverse and specific to the business in question.  For the purposes of this discussion, consider a system where advertisements are shown.  We, of course, track every advertisement displayed in the system and that information is available for query.  Herein the problem lies.  Most systems put that information in a data store that is designed to answer marketing-oriented questions: who clicked on what, what was shown where, etc.  Answering the question, "How many were shown?" is possible but is not particularly efficient.  In order to answer the question, one must hew the tree and wait to hear the sound of its fall.</p>

<p>Instead of asking analytic questions, applications should expose this information as a consequence of normal behavior.  Just as the sound of the tree falling is a natural consequence of the act of hewing, the ad serving system is responsible for tallying the total impressions and exposing that information to those that care.  No significant work need be performed by the application to answer this question, just a pre-calculated response to a simple question. This enables a new way of application observation where witnessing metrics and their changes requires no substantial work by the application.  This paves the way to new types of application monitors (for example high-frequency monitors) that need not worry about altering the situation by observing it.</p>

<p>Not all questions can be asked before a problem occurs.  This is where observation ends and instrumentation begins.  Instrumenting code allows new questions to be asked and subsequently answered in a running environment.  A system admin or developer may look at a malfunctioning system and think, "How do I recreate this situation in a test environment?"  We ask that question because debugging in production is taboo. If a developer instruments code well, profound knowledge of the problem may be derived without the risk of altering its state.  <a href="http://dtrace.org"><span>DTrace</span></a> is the king of these systems and its adoption across various operating environments is growing.  Nevertheless, no one should argue that they should throw in the towel just because they don&#8217;t have DTrace available to them.  While powerful instrumentation might elude those without DTrace, we&#8217;ve found that we can get most of the way there with careful logging (a poor-man&#8217;s instrumentation) and continuously exploring critical metrics to expose for observation.</p>

<p>Many architectural components today provide an HTTP interface, primarily via a REST API.  Use it! Extend the HTTP server to expose critical component metrics via HTTP.  Use JSON, or use the <a href="https://labs.omniti.com/resmon/trunk/resources/resmon.dtd"><span>Resmon XML DTD</span></a>.  In Java, expose metrics via a Bean accessible via JMX. This can be a bit frustrating because Java-centric tools must be used to observe it, so instead, just expose those metrics via a servlet. There is even some free code for that: <a href="http://labs.omniti.com/labs/reconnoiter/browser/trunk/src/java/com/omniti/jezebel"><span>Resmon Java Servlet</span></a> (see Resmon.java and ResmonResult.java).  Exposed metrics can be tracked, trended and alerted on easily using tools like <a href="http://labs.omniti.com/labs/reconnoiter"><span>Reconnoiter</span></a> or <a href="http://circonus.com/"><span>Circonus</span></a>.</p>

<p>Making applications operable means that never again should operations personnel be stuck on the question, "The application appears hung, I wonder what it is doing?" All production code should be prepared to answer questions such as these at any time. "What are you doing?" and "How long is it taking?" are perfectly reasonable questions to ask of any piece of production code and you should demand a prompt and accurate answer.  The resulting metric data is consumable by both dev and ops teams, and even by those teams&#8217; managers.  After all, trending metrics is not just about detecting problems.  It is also fundamental to quantifying success.  This is what it means to be operable.  Software engineers everywhere, please make your software operable!</p>]]></content:encoded>
            <pubDate>Wed, 08 Sep 2010 20:30:52 GMT</pubDate>
        </item>
        <item>
            <title>Fast by default?</title>
            <link>http://omniti.com/seeds/fast-by-default</link>
            <guid>http://omniti.com/seeds/fast-by-default</guid>
            <description><![CDATA[ I recently attended the O&#8217;Reilly Velocity 2010 conference in Santa Clara, CA. For the past two years this conference attracted some of the smartest minds in web performance and web operations; this year did not disappoint.


 ~ James Duncan Davi...]]></description>
            <content:encoded><![CDATA[ <p>I recently attended the O&#8217;Reilly <a href="http://en.oreilly.com/velocity2010"><span>Velocity 2010</span></a> conference in Santa Clara, CA. For the past two years this conference attracted some of the smartest minds in web performance and web operations; this year did not disappoint.</p>

<img src="http://images.omniti.net/omniti.com/i/b/circonus-at-velocity.jpg" alt="Velocity 2010 Exhibit Floor"/>
<p class="cite" style="text-align:right;font-size:0.75em"> ~ James Duncan Davidson (<a href="http://www.flickr.com/photos/oreillyconf/4729116486/"><span><span>original</span></span></a>)</p>

<p>Several exciting things debuted including <a href="http://opscode.com"><span>Opscode</span></a>'s hosted platform, Yahoo!'s <a href="http://hacks.bluesmoon.info/boomerang/doc/"><span>Boomerang</span></a> and our very own <a href="http://circonus.com/"><span>Circonus Enterprise Platform</span></a>.</p>

<p>Each year, the conference has adopted the mantra: "fast by default." This statement, largely applying to the web operations track, is an excellent theme.  The concept is that speed is feature number one and that your success as an online company is intrinsically tied to how users perceive the performance of your online presence.  This is true; the numbers tell us so.</p>

<p>The interesting part about web performance is that user-perceived performance comes from three separate elements: computation done by the service, computation done by the user and the act of getting data between the two.  Velocity really focuses on the latter two: how do I optimize how content is delivered to my users and optimize how it performs once they&#8217;ve got it? This perspective is incomplete.  Should Velocity change to address all three elements? I say no. The audiences are different, the problems are different and there is no need to mess with a good thing. <a href="http://omniti.com/surge"><span>Surge</span></a>, on the other hand, only concentrates on the first element: server side performance and scalability.</p>

<p>Let&#8217;s face it, the server-side architectures that power today&#8217;s web services are as unique as the services they power.  Each site has its own unique challenges that come with its size, technologies, audience, offering and promises. Not to trivialize the web performance challenge, but the techniques used to increase user-perceived performance in transit and on the client side are largely the same from site to site (clean, small and effective DOM, CSS and Javascript, correct caching, image sprites, HTTP compression, etc.).  However, the server side is where the unique magic happens.</p>

<p>Do you really think that the technology powering Google&#8217;s new Caffeine search indexer could be leveraged easily to help your internal service delivery platform? No. For a user to use your service, in an over-simplified form, they provide some input and receive some output.  Each time they ask a question and expect a result, you must "do some work."  Herein lies the challenge.</p>

<p>In a previous installment "<a href="http://omniti.com/seeds/yslow-to-yfast-in-45-minutes"><span>YSlow to YFast in 45 Minutes</span></a>", I explored reaping low-hanging fruit to achieve user-perceived speed-ups on this very site. The main effort there was to shorten the event horizon to render by removing, shortening and/or parallelizing various assets on which a page depends.  The obvious, but often ignored, part is that it all starts with a single request: "the page."</p>

<p>On the OmniTI web site, there is very little going on and as such, you&#8217;d expect that very little time is spent on our end "doing work" to give you the page content.  If you <a href="http://omniti.com/i/b/yfast-visit1.png"><span>look at the details</span></a>, you can see that to be true: 44 milliseconds waiting for data to start and 25 additional milliseconds waiting for the data to come down the pipe. This is relatively fast. This is not always the case; in fact, it often is not the case.</p>

<p>I was quite interested in this division of time and asked my helpful friends at <a href="http://keynote.com/"><span>Keynote</span></a> for some aggregate information. That information paints a rather interesting picture.  The average speed of a "web page load" comes in at over 2 seconds.  Obviously, these 2 seconds are split in some fashion amongst our three buckets.  What may be quite surprising is that, on average, 290ms is spent server-side.  I speculate this is due to one of two reasons.  Most commonly, it is due to a lack of attention to how the architecture internally operates resulting in sloppy code and data architecture. To me, this is the better of the two reasons.  The other reason is a focus on "scale-out" with a blatant disregard for a maximum acceptable service time.</p>

<p>One web performance company, who shall remain nameless, actually spends as much as 2 seconds "thinking" before sending data to the client, producing an awful waterfall.  Note that the client-side performance is quite excellent, but still the user waits uncomfortably long.</p>

<a href="http://omniti.com/i/b/site-slow.png"><span><img src="http://omniti.com/i/b/site-slow.png" alt="Waterfall of Painful Initial Asset"></span></a>

<p style="margin-top: 1em">To put this in some perspective, a processor today can operate around 2.9GHz (that&#8217;s 2.9 billion cycles per second, or roughly that many instructions). 290ms sans a conservative 90ms of round-trip latency is 200ms of operating time or 580 million CPU instructions. The disturbing part of this is that most of what Keynote monitors is landing pages or specific hot paths, so many other pages on these websites are slower.  We all know that most websites today are more complicated than a single machine serving information, so a direct correlation of service time to CPU cycles is deeply flawed; however, I still believe it is illustrative, useful and compelling.</p>

<p>Furthermore, if your system is spending 200ms servicing a single request, you can do the simple math to find that even on an 8-way system, you can still only serve 40 requests/second. As your demand increases, you must add more and more machines. While provisioning these machines used to be challenging, the cloud has played on general performance-optimization delinquency and made this approach seem acceptable by making massive machine provisioning easy.  I&#8217;m here to tell you it is not acceptable.  Not only is it environmentally wasteful (using power and generating unnecessary heat), it is also wasteful of shareholder investment.  Faster sites running more optimally generate shareholder value.</p>
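
<p>The arithmetic above is easy to reproduce. What follows is a rough sketch, not a benchmark: it assumes one instruction per clock cycle and that each request ties up one core for its full service time, neither of which holds exactly on real hardware. The function names are made up for illustration.</p>

```python
# Back-of-the-envelope capacity math using the figures quoted in the text.
# Assumption: one instruction per clock cycle, which real CPUs only approximate.

CLOCK_HZ = 2.9e9        # 2.9GHz processor
SERVER_SIDE_S = 0.290   # average server-side time reported by Keynote
ROUND_TRIP_S = 0.090    # conservative network round-trip latency
CORES = 8               # an "8-way" system


def service_time(server_side_s, round_trip_s):
    """Time the server actually spends working on one request."""
    return server_side_s - round_trip_s


def instructions_per_request(clock_hz, busy_s):
    """Instructions available to one request at one instruction per cycle."""
    return clock_hz * busy_s


def max_requests_per_second(cores, busy_s):
    """Upper bound on throughput when every request occupies one core."""
    return cores / busy_s


busy = service_time(SERVER_SIDE_S, ROUND_TRIP_S)
million = instructions_per_request(CLOCK_HZ, busy) / 1e6
print(f"service time: {busy * 1000:.0f} ms")
print(f"instructions per request: {million:.0f} million")
print(f"max throughput: {max_requests_per_second(CORES, busy):.0f} requests/second")
```

<p>Halve the service time and the same box serves twice the traffic, which is the whole point.</p>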

<p>The pervasive focus on front-end performance is explained by the easy gains that can be seen from relatively little investment.  However, as the numbers show, for most sites this simply isn&#8217;t enough to compete. <a href="http://www.scribd.com/doc/16877317/Shopzillas-Site-Redo-You-Get-What-You-Measure"><span>Shopzilla recently completed a 12-month engineering effort</span></a> to rearchitect their application because the server-side was too slow (pushing 8 seconds).  Now that it is blazingly fast, they have less infrastructure to maintain per dollar of revenue and an increase in revenue of 7-12%.</p>

<p>Attention to internal performance is fundamental to the success of online businesses.   Many of the larger web-based companies have smart people on staff that take performance seriously.  If you need help, this is what we work on for <a href="http://omniti.com/does"><span>our clients</span></a> every day at <a href="http://omniti.com/"><span>OmniTI</span></a>.</p>]]></content:encoded>
            <pubDate>Tue, 13 Jul 2010 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title>The cloud is great. Stop the hype.</title>
            <link>http://omniti.com/seeds/the-cloud-is-great-stop-the-hype</link>
            <guid>http://omniti.com/seeds/the-cloud-is-great-stop-the-hype</guid>
            <description><![CDATA[Cloud computing isn&#8217;t new, though I&#8217;m sure you&#8217;ve heard more about
it in the last few months than you did previously. The cloud is an
amazing thing, but one that is poorly understood. I believe this lack
of understanding stems from te...]]></description>
            <content:encoded><![CDATA[<p>Cloud computing isn&#8217;t new, though I&#8217;m sure you&#8217;ve heard more about
it in the last few months than you did previously. The cloud is an
amazing thing, but one that is poorly understood. I believe this lack
of understanding stems from technology confusion which is trumpeted by
corporations that have identified "the cloud"
as <a href="http://www.wikinvest.com/concept/Cloud_Computing"><span>a medium
for expansion and profit</span></a>. Don&#8217;t get me wrong, the cloud is useful
&mdash; but I hear some of the dumbest reasons why.</p>

<p>Before I launch my rant, I&#8217;ll qualify that <abbr title="Software as a Service">SaaS</abbr> existed before "the
cloud," yet in many definitions (like the link above) it is considered
a cloud service. I consider the cloud to be <em>only the infrastructure</em> because the software and the platform
have been provided by third parties successfully before the term
"cloud" arrived. It isn&#8217;t fair to legitimize your concept by
repackaging two successfully proven technologies under your brand.</p>

<h2>The Cloud</h2>

<p>The cloud&#8230; what is it? A cloud is an infrastructure in which I
can provision computing systems. What makes this different from a
rack of servers? Very little, actually. The most important
difference is that provisioning of these systems is made convenient.
When a system is needed, the requester can programmatically start a
new one and need not be concerned with network infrastructure,
machine specifications, power, cooling, etc. The cloud is built by
someone who cares about all of those things, but then it is packaged
in an easily consumable fashion. How does this happen? Well, this is
where people get confused.</p>

<h3>Virtualization</h3>

<p>This simple provisioning is empowered by some sort of
virtualization technology like <a href="http://www.xen.org/"><span>Xen</span></a>
(likely one of the commercial
implementations), <a href="http://vmware.com/"><span>VMWare</span></a>, <a href="http://www.sun.com/software/solaris/containers/index.jsp"><span>Solaris
Containers</span></a>
(Zones), <a href="http://www.parallels.com/products/pvc45/"><span>Virtuozzo</span></a>/<a href="http://wiki.openvz.org/Main_Page"><span>OpenVZ</span></a>,
etc. Why is this confusing? Beats me, but I see people listing the
advantages of virtualization as advantages of the cloud. As with most
technologies, you inherit the advantages of your foundation.
Virtualization brings a lot to the table, but you don&#8217;t need "the
cloud" to get it. Period.</p>

<p>The concept of private and public clouds is also poorly defined.
Some people hate the two terms, while others define them in useless
ways. I&#8217;ll define them in a very practical way in which the
differences have deep business meaning.</p>

<h3>Public Clouds</h3>

<p>The public cloud is Amazon&#8217;s EC2 and other similar "cloud
providers" where the owner of the underlying physical infrastructure
and the owner of the services running on the provisioned systems are
not the same. In this environment, your services run on someone
else&#8217;s equipment. What does this mean?</p>

<p>If they don&#8217;t pay their bills, the equipment can be seized. Other
companies may be running virtual environments on the same hardware,
same disks, same network. This means bugs in virtualization and
data isolation could result in information disclosure &mdash; the
really bad kind. At this point in time, I can&#8217;t envision a way to
make public cloud
infrastructure <a href="https://www.pcisecuritystandards.org/security_standards/pci_dss.shtml"><span>PCI-DSS</span></a>
compliant &mdash; and even if you could, I believe it increases the
possibility of compromise.</p>

<p>No virtualization is perfect (yet) in resource provisioning. This
means that defining a reliable performance expectation for a node in
the cloud can be very challenging.</p>

<p>It&#8217;s not all negative though. Because public clouds are popular,
they tend to have ample resources, which means more room for growth,
and a provisioning request is "less likely" to result in a message
that says "better luck next time, I&#8217;m flat out of horses."</p>

<h3>Private Clouds</h3>

<p>Private clouds are not shared. A private cloud is deployed by an
organization that wants the benefits of a cloud, but wants the
processes and premise controls over the infrastructure that powers it.
The key differences between a private cloud and public one are control
and size.</p>

<p>In a private cloud, you have fine-grained control over geographic
location. This can be important for meeting data availability and/or
redundancy guarantees made to clients. It can also be useful for
ensuring that at least part of your infrastructure is in a country
whose laws more closely align with your business needs.</p>

<p>There are clearly enormous advantages in the private cloud, in that
data security exists and the design and operation of the private cloud
can be congruent with business requirements providing more aligned
availability and consistent performance. The downside is that it is
likely to have more limited resources &mdash; provisioning 1000 new
instances is far more likely to result in a failure due to
insufficient resources.</p>

<p>So how big is big? In my experience, when you hit a run rate of 40
instances, build yourself a private cloud. That&#8217;s the point at which
it becomes undeniably cheaper.</p>

<h3>Resources</h3>

<p>One distinct and measurable difference between how private and
public clouds can be run is seen in the choice of virtualization
technology. Public clouds, by their nature, must isolate resources
between customers as extensively as possible to achieve acceptable
quality of service. There is no trust or cooperation between
virtualized customers.</p>

<p>No virtualization technology does this perfectly, but some do a
better job than others. Xen-based and VMware-like solutions are some of the
more capable in this arena. Because both implementations run
completely separate operating system environments from a hypervisor,
they tend to segregate the guests more thoroughly by sharing fewer
resources.</p>

<p>This is good for guests, but bad for resource utilization. If I
need as much as 16GB of RAM for my instance and I&#8217;d like to run 8 of
them, that means I need 128GB of RAM in my host machine &mdash; that&#8217;s
an expensive box. On the other hand, if I need very little RAM (say
256MB on average of which 128MB is kernel and OS related processes)
the hypervised virtualization becomes quite bulky.</p>
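
<p>Both ends of that trade-off come down to simple multiplication and division. Here is a minimal sketch using the numbers from the paragraph above; the function names are mine, and it assumes every guest carries its own full kernel and OS footprint with no memory sharing between guests.</p>

```python
# Memory arithmetic for hypervisor-style virtualization, using the numbers
# from the text. Assumption: no memory is shared between guests.

def host_ram_gb(guest_ram_gb, guests):
    """RAM the host needs when each guest gets a dedicated allocation."""
    return guest_ram_gb * guests


def overhead_fraction(guest_ram_mb, kernel_os_mb):
    """Share of a guest's RAM consumed by its own kernel and OS processes."""
    return kernel_os_mb / guest_ram_mb


print(host_ram_gb(16, 8))            # eight 16GB guests -> 128
print(overhead_fraction(256, 128))   # 256MB guest, 128MB kernel/OS -> 0.5
```

<p>When half of every small guest is a duplicate copy of the operating system, "bulky" is the polite word for it.</p>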

<p>On the other side of the virtualization field are technologies like
OpenVZ and Solaris Containers (a.k.a. Zones). These technologies
share a kernel (and usually a filesystem buffer cache) across
guests. CPU resources can be sliced up, but memory (as it is shared)
is a challenge to dedicate cleanly to individual guests. While this
is clearly a bad (or at least challenging) thing for public cloud
providers, it is often completely acceptable for private cloud
needs.</p>

<p>The advantage of this "lightweight" virtualization is that you can
pack more guests onto a single host. We regularly run 40 Solaris
Zones on a single commodity server without issue. It is particularly
useful for applications that are low-powered, but in need of multiple
instances to meet their availability commitments.</p>

<h2>Burning the Straw Man</h2>

<p>Now that we know what clouds are, what&#8217;s the problem? The hype. The
hype is the problem. With hype come straw man arguments that delay or
hold back the healthy evolution and incorporation of this
technological paradigm.</p>

<h3>Argument 1</h3>

<blockquote><p>I need the cloud. In the cloud, if I need to deploy 50
machines, I can just do it. Without the cloud, I have to buy servers
and wait weeks for install and spend hours <span class="end-quote">installing
them.</span></p></blockquote>

<p>Deploying 50 new instances in a cloud is easier than 50 new
physical machines. But just because you can, doesn&#8217;t mean you should.
If it takes hours to install new machines, then you are doing your job
wrong. If it takes weeks to get your machines, then you are using the
wrong vendor. And <em>most importantly</em> if you suddenly realize
that you need 50 new machines, then you simply didn&#8217;t do your job
well. The cloud is not an excuse to avoid a business model. A
business model includes a budget and a solid, implementable plan for
growth based on thorough capacity planning. With that, you should see
it coming.</p>

<p>There are two reasons I hear when people justify the need to deploy
a large number of new machines, and both arguments fall apart when you
take a closer look.</p>

<h4>Argument 1a</h4>

<blockquote><p>Holy cow! Look at that traffic! I need fifty new instances. <span class="end-quote">Now!</span></p></blockquote>

<p><a href="http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes"><span>I
know a bit about sudden traffic spikes.</span></a> If you need 50 machines suddenly to
handle a traffic spike, then, in all likelihood, you have built something wrong
and no amount of
provisioning will help. I&#8217;ve had the privilege of working with some
of the largest sites on the planet. I&#8217;ve seen traffic spikes of
10000% happen inside 30 seconds, but then again I&#8217;ve also seen more
than a gigabit of production traffic served to the masses off two $3k
USD boxes. If you are in that situation, you need a plan &mdash; and
it likely shouldn&#8217;t include "Oh shit! Bring 50 more instances
online!"</p>

<p>If you are providing a service that is unavoidably computationally
intensive, you actually have a solid argument. This is rare and I&#8217;ll
touch on that later.</p>

<h4>Argument 1b</h4>

<blockquote><p>I have a lot of developers and they each need their own
instance <span class="end-quote">quickly and easily.</span></p></blockquote>

<p>This is actually an awesome argument for the cloud. However, since
these are development instances, they don&#8217;t consume resources in the
same way that production instances do. We give out instances like
candy at OmniTI and typically can sustain about 40 instances on a
single $3k USD box using lightweight virtualization. CapEx and OpEx on
that are basically non-existent compared to an EC2 bill for the
same. As you can see, this is an argument for virtualization, not the
cloud.</p>

<h3>Argument 2</h3>

<blockquote><p>I want to use the cloud because that way I don&#8217;t have
to worry about networking and hardware <span class="end-quote">management.</span></p></blockquote>

<p>Network management has to happen. Hardware management has to
happen. You pay for it one way or you pay for it another. I&#8217;ve heard
people say that it takes countless hours per month to run 40 systems
including servers, switching equipment, routing, firewalls, etc. We
<a href="http://omniti.com/does/architecture-and-infrastructure"><span>manage
around 1000 servers at OmniTI</span></a> and from our immaculately maintained
time tracking system I can tell you that less than 35 hours per month
are spent on hardware provisioning, systems installation and concerns
of space/power/cooling. That comes out to about 2 minutes per machine
per month. Furthermore, I don&#8217;t have any reason to believe
that a cloud provider can do a significantly better job.</p>

<p>So, if so little time is spent on hardware and infrastructure
management, why does OmniTI have a busy ops team? Because
we&#8217;re doing all the <em>other</em> stuff. Configuring software,
performance tuning, and monitoring systems; monitoring systems to an
egregious and offensive level. I&#8217;m not speaking of CPU temperature
and disk failures (everyone monitors those). I&#8217;m talking about
realized I/O ops per spindle, network packets per interface, HTTP
response times, SSH keys, ICMP response latency, DNS, database health,
application-level correctness and, most importantly, business level
metrics. If you find this intimidating, look
at <a href="http://circonus.com"><span>Circonus</span></a> as an enablement
platform. If you like the cloud and/or SaaS, you&#8217;ll love this
service.</p>

<p>The operations team is the one place with access to data and
traffic that is "real-time enough" to detect business issues before
they manifest in significant monetary loss. Traffic anomalies,
chargeback rates, visitor retention&#8230; all these translate into money.
This is what ops does; they make things work; they make the business
work. And they spend a lot more time trending, investigating
and analyzing than they do replacing hard drives and network
cards.</p>

<h3>Argument 3</h3>

<blockquote><p>I can provision quickly in <span class="end-quote">the cloud.</span></p></blockquote>

<p>Yes. Yes you can. This is due to virtualization, not the cloud.
Download a virtualization technology and provision quickly outside the
cloud. I suppose that if my OS natively supports virtualization (like
all modern OSs do), and my operations team leverages that to deploy
new instances quickly and easily, then we&#8217;ve created a cloud whether
we like it or not. Damn terminology. While it is now called a
"private cloud," I tend to just call it infrastructure operations.</p>

<h3>Argument 4</h3>

<blockquote><p>Operating in the cloud makes your environment more
resilient because you have to accommodate <span class="end-quote">unexpected
failures.</span></p></blockquote>

<p>What? This has to be the most back-assward statement I&#8217;ve heard on
cloud computing. Eagerly adopting an environment with a higher
failure rate because it forces you to be a better engineer? Well,
that&#8217;s not an engineer I&#8217;d hire. Good engineers have always known
that things can fail and have always had to design to accommodate that
truth &mdash; incessant reinforcement by some public cloud providers
is unwelcome and unneeded in this case. Assuming a well engineered
system (which should be an expected outcome of any engineering group)
the goal should always be to minimize the likelihood of failure within
budget.</p>

<h2>What the Cloud Lacks</h2>

<p>In addition to dismantling poorly constructed arguments for the
cloud, I thought I&#8217;d detail some of the things I find completely
missing in the cloud.</p>

<p>Generalization is the root of all evil when it comes to
performance. Just because you know how to use MySQL or PostgreSQL
doesn&#8217;t mean it is the right tool for every data storage need. People
have learned this lesson fairly well. In cloud infrastructures, there
is a goal to make systems alike to improve price points for capital
expenditure, reduce operation expenditure (slightly) by learning one
type of system well, and make the provisioning system simplistic. This
leads to the abomination that is "small," "large," and "huge" instance
sizes at some cloud providers.</p>

<p>As an engineer, when I have to build a system for a purpose I
specify as much as possible. AMD vs. Intel vs. Sparc? How many gigs
of RAM? What <em>speed</em> should the RAM be? How much storage
do I need? How many I/O operations per second are required? Should I
use SSDs? How many networks must the system be on? Should we be using
link aggregation or not? VLANs? No VLANs? These are all important
things. If you need these things sometimes and everything has to be
the same, then you get these things all the time &mdash; paying for it
when you don&#8217;t need it.</p>

<p>It is a reality that when systems are specified, compromises are
made due to vendor relationships and part availability. However, the
requirements that drive these specifications still exist and are at
the root of the decisions: for instance I need 16GB of non disk-buffer
memory for working sets and 10,000 I/O operations per second. That
simply doesn&#8217;t translate to three cookie-cutter sizes.</p>

<p>Data is a big issue. There are a lot of companies out there working
on solving the data security issues that exist in public clouds
&mdash; let&#8217;s assume for a second that this is no longer an issue. A
follow-on issue is that the cloud is "out there" and the only way to
get data into and out of it is via the drinking-straw that is its
uplink. Drinking-straw you ask? Yes. The internet is, even today, not
as fast as a tractor-trailer full of tapes. If I have 10 TB of data
(which is extremely reasonable for any business intelligence system
these days), how do I back it up? I need a copy of that data off-site
and secure. We have some creative solutions around this using ZFS, but
still &mdash; I am contractually obligated to have my tapes (or some
other off-site and <em>off-line</em> storage medium). Private clouds
do not have this issue.</p>
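
<p>To get a feel for the drinking-straw, here is a rough sketch of how long 10 TB takes to move. The link speeds are illustrative assumptions of mine, not measurements, and the calculation pretends the link is fully saturated with zero protocol overhead, so real transfers will be slower.</p>

```python
# Rough transfer time for a data set over an uplink.
# Assumptions: 1 TB = 10**12 bytes, the link speeds are illustrative,
# and the link is fully saturated with no protocol overhead.

def transfer_hours(data_tb, link_mbps):
    """Hours needed to move data_tb terabytes over a link_mbps megabit/s link."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


for mbps in (100, 1000):
    hours = transfer_hours(10, mbps)
    print(f"10 TB over {mbps} Mbit/s: {hours:.1f} hours ({hours / 24:.1f} days)")
```

<p>Under these assumptions, a 100 Mbit/s uplink needs more than nine days for a single full copy, and that is before anyone else wants to use the link.</p>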

<h3>Scaling Out or Scaling Up</h3>

<p>So many people talk about scaling out. Scaling out. Scaling out.
Scaling out. Scaling out is an excellent approach to tackling
requirements that cannot be easily accomplished on today&#8217;s hardware.
Not everything needs to be scaled out. I hear people say "I&#8217;m going to
have millions of records, I need to make sure my design can operate on
many machines." Millions? You&#8217;re going to go through the effort of
tackling distributed systems problems for a million rows? You have
priority issues. A single machine (with failover) is enough to do most
jobs. People lose sight of this too often. Making things redundant
(hot failover) is a lot easier than making them actively
distributed. So, if you can get away with scaling something
vertically, do it.</p>

<p>There are many cases where the growth of a specific system
component simply outpaces the availability of reasonably priced
hardware to scale it vertically. In these cases, you should make your
problems smaller. (You&#8217;d be surprised what can be accomplished over
beers with an expert in the field). If that fails, then you roll up
your sleeves and design your system to scale horizontally. Very few
systems require horizontal scalability from soup to nuts.</p>

<h2>Where It Works</h2>

<p>I said before that if you need to spin up 50 instances you clearly
didn&#8217;t do a good job planning. I&#8217;ll recant that and better qualify
where that is acceptable. That is acceptable when that is your well
thought-out plan. When would you need to spin up 50 new instances?
Let&#8217;s say you need to transcode a ton of video, let&#8217;s say you need to
sequence some DNA, let&#8217;s say you need to use a lot of computational
resources for a brief period of time and that is essential to your
business model. This is where the cloud shines like a super-star.</p>

<p>For computationally intensive tasks that are irregular, the idea of
batching work into a cloud of compute nodes is an excellent one.
Here, the advantages are clear. Given that each job can really gobble
up CPU resources, you can&#8217;t leverage the consolidation that
virtualization offers. At this point, the disadvantages are purely
the outcome of an equation of economics. How much does a CPU-second
cost and how much does it cost me to move the input for my job into
the cloud and extract the output from the cloud: instance costs and
bandwidth costs.</p>
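
<p>That equation fits in a few lines. The sketch below is only a template: every price in it is a hypothetical placeholder, not the rate of any real provider, and the parameter names are my own.</p>

```python
# Cost of a batch job in a public cloud: instance costs plus bandwidth costs.
# All prices are hypothetical placeholders, not any provider's real rates.

def job_cost(cpu_seconds, price_per_cpu_second,
             gb_in, gb_out, price_per_gb_in, price_per_gb_out):
    """Total cost = compute time + moving input in + moving output out."""
    instance_cost = cpu_seconds * price_per_cpu_second
    bandwidth_cost = gb_in * price_per_gb_in + gb_out * price_per_gb_out
    return instance_cost + bandwidth_cost


# Example: 50 instances for 2 hours, 500GB of input, 50GB of output.
cost = job_cost(
    cpu_seconds=50 * 2 * 3600,
    price_per_cpu_second=0.00003,
    gb_in=500, gb_out=50,
    price_per_gb_in=0.10, price_per_gb_out=0.15,
)
print(f"estimated job cost: ${cost:.2f}")
```

<p>Plug in your own numbers and compare the result with what the same CPU-seconds cost on hardware you already own. Note how quickly the bandwidth term can dominate when the input is large.</p>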

<h2>The Honest Truth</h2>

<p>While it may appear that I hate the cloud, it simply isn&#8217;t so. I
hate the half-baked arguments for it. I hate the hype. It is a
perfectly legitimate tool in the already large arsenal of engineering
tools. Use the cloud where it makes sense, but please stop bludgeoning
me with the hype.</p>]]></content:encoded>
            <pubDate>Tue, 23 Mar 2010 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Online Application Deployment: Reducing Risk</title>
            <link>http://omniti.com/seeds/online-application-deployment-reducing-risk</link>
            <guid>http://omniti.com/seeds/online-application-deployment-reducing-risk</guid>
            <description><![CDATA[Version control systems are nothing new to the world of software
development.  I&#8217;ll take the time now to unapologetically call you
an idiot if you don&#8217;t already have all your code and configurations in
a version control system. Once you sta...]]></description>
            <content:encoded><![CDATA[<p>Version control systems are nothing new to the world of software
development.  I&#8217;ll take the time now to unapologetically call you
an idiot if you don&#8217;t already have all your code and configurations in
a version control system. Once you start using version control, there
are several approaches available and, interestingly, online
application work turns out to be profoundly different from
shrink-wrapped software.</p>

<h2>Traditional Software Development</h2>

<p>With shrink-wrapped software, you have features and fixes that are
integrated into the product and effectively queued up into what is
called a release.  Development of features is performed in version
control on what is commonly called a "branch" that allows isolation of
the developed feature until it is in an acceptable state to be
"integrated" back into a main line of development (also a branch) that
is used for integration testing.  Eventually, the features are merged
into a release branch and find their way to clients.  Bug fixes and
security-related issues are addressed in a similar fashion (sometimes
backwards in the process to fast-track their release to consumers of
the product).  This is a fly-by, over-simplified description of the
typical software development life-cycle.</p>

<p>A lot of people believe that how one manages to a release can
profoundly affect the product; two common strategies being "agile" and
"waterfall."  I&#8217;ll argue that both are valid, both have their place,
and both work in traditional software development.  The end goal is
the same: ship a quality product within the bounds of expectations set
by product management with the clients.  Typically, product releases
are made available to clients on regular intervals.  I&#8217;ve commonly
seen three, six, twelve and even 18 month release cycles.  Bug fixes,
security updates, patches, hot-fixes (they have many names) are
released more frequently (monthly, or for problematic products
weekly).  The client is responsible for upgrading their systems and,
if either feature or fix releases happen too frequently, the process
can become overly burdensome.</p>

<p>I&#8217;ll come out and make a rather unconventional claim: the approach
described thus far only works well when the number of clients using
the software is larger than one.  The larger the user-base, the better
this model works.  It might seem at first that large web sites that
have millions of users would be able to use this model to develop
their service, but now we&#8217;ve just exposed the crux of the issue.
Millions of users use their service, not their product. In fact, in
most cases, the only user of the actual software product is that
single web site. This alone shakes the foundation of traditional
development paradigms. Online environments have many parameters that
make this approach untenable.</p>

<h2>Get it on(line)</h2>

<p>Developing software for an online service, and developing
traditional software, have some fundamental differences.</p>

<p>In the online world, a software product drives a service to which
users have access. There is, most often, a single copy of the actual
software product in use. There is one consumer of the software:
you. The users are consumers of the service built atop the
software.</p>

<p>Most online services have thousands, if not millions, of users, and
as such the tolerance for disruptive upgrades is reduced (often
eliminated). You are forced into an environment where each production
upgrade happens only once, there are no practice runs and it simply
has to work.</p>

<p>In a traditional software model, new features can be distributed to
clients that are less risk-averse as a part of an early-adopter
program (a.k.a. beta program or tech preview program).  This approach
allows varied real-world tests of the new features so that when they
are made generally available in the product, the confidence in their
correctness and performance is sufficiently high. This simply doesn&#8217;t
work when the software you write for your service is only used by one
client.</p>

<p>Perhaps most challenging is the pace at which competition moves.
In the online world, I can have an idea this morning, an
implementation this afternoon and every client of my service that
shows up tomorrow will see it.  In fact, things can and do happen much
faster than that.  You might think that rapid concept-to-availability
push is reckless.  You might be right. But, your competition is doing
it.</p>

<p>The question is, how do you maintain a competitive pace and conquer
all these challenges, when the odds are stacked against you? The real
problem here is that the traditional software model bundles many
changes into a release and even the tiniest mistake can result in a
failure of the entire release (one mistake can break the whole
product). Each change should always be accompanied by a reversion
plan. Sometimes those plans are as easy as redeploying the product
sans the change, sometimes they are more involved. When hundreds of
changes are combined into a single release, the reversion to a
previous release becomes the intricate mess of hundreds of change
reversion procedures. When posed in these terms, the answer becomes a
bit more clear.</p>

<p>Each change could contain a mistake that could cripple the product.
However, if we make each change its own release, then the failure is
isolated to a micro-release that can be reverted with much less
disruption.</p>

<p>This leads to the very controversial technique of "deploying from
trunk."  Trunk (or HEAD or tip) is a version control term describing
the bleeding edge of the product. As people work on fixing regressions
and other bugs, as well as add new features into the product, they are
adding, modifying and removing code and configuration from version
control. If these changes are applied continuously and micro-releases
are done continuously, when the inevitable mistake occurs the
reversion process is isolated and prevents rollback casualties.</p>

<p>What&#8217;s a rollback casualty? If I make change A and you make change
B and they make their way into a single release, we have a casualty if
either (but not both) change has a bug requiring reversion. Due to my
mistake in A, we need to downgrade the product to the previous release
inducing a rollback of your perfectly functioning work on B. What&#8217;s
worse is that you could have put a lot of work into B ensuring that it
was done perfectly because you know that rolling it back would be
painful, but I knew that rolling back A would not be disruptive so I
was much less careful. This is just a nasty mess all around.</p>

<p>Big changes are scary, there&#8217;s a lot to test and a lot to plan. By
making micro-releases you amortize the risk by investing in deployment
efforts in a highly granular fashion.</p>
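
<p>A little probability makes the case for micro-releases concrete. This is a sketch under one loud assumption: each change independently carries the same probability <em>p</em> of a bug that forces a reversion. Real changes are neither independent nor equally risky, and the function names are mine.</p>

```python
# How release size affects the chance of a reversion and the collateral damage.
# Assumption: each change independently has probability p of a breaking bug.

def release_failure_probability(changes, p):
    """Probability that at least one of the bundled changes is bad."""
    return 1 - (1 - p) ** changes


def expected_casualties(changes, p):
    """Expected number of good changes rolled back because of a bad one.

    A good change (probability 1 - p) becomes a casualty when at least one
    of the other (changes - 1) changes in the same release is bad.
    """
    return changes * (1 - p) * (1 - (1 - p) ** (changes - 1))


for n in (1, 10, 100):
    risk = release_failure_probability(n, 0.01)
    lost = expected_casualties(n, 0.01)
    print(f"{n:3d} changes: {risk:.1%} chance of reversion, "
          f"{lost:.2f} good changes lost on average")
```

<p>With one change per release the casualty count is zero by construction: there is nothing else in the release to take down with it.</p>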

<p>So the real question is: How do you make this safe? Online
applications are not just a piece of code being run. They consist of
many moving parts that each change (often independently), but all
depend upon each other for correct operation; this is what makes
rolling back certain failed deployments so challenging. It might be challenging, but success is sweet, as <a href="http://docs.google.com/viewer?a=v&q=cache:M3l1zbSSaRkJ:qconlondon.com/london-2008/file%3Fpath%3D/qcon-london-2008/slides/RandyShoup_eBaysArchitecturalPrinciples.pdf+ebay+%22wired+off%22&hl=en&gl=us&pid=bl&srcid=ADGEESiS2p2wu7dq6DMLDdaX0wqQtSXFRiDRUiWVJ8awjF3V4tm5pch8g5YKIOaIu675YRNrn0HtYxOzvfc82SKJwsY8uvmRTE8z_1MywgSCcB2FQM2VxXhIs2lCV9cF9bJ_ZXeEUaDd&sig=AHIEtbSXi5BdmtnIMI-fcx9RhwkNsKRpFg"><span>eBay</span></a>, <a href="http://codeascraft.etsy.com/category/operations/"><span><span>Etsy</span></span></a>, and <a href="http://code.flickr.com/blog/2009/12/02/flipping-out/"><span>Flickr</span></a> have shown.  It&#8217;s a tricky
balance that combines various philosophies:</p>

<ul>
 <li>"devops": engineering and operations are married and need to
     collaborate</li>
 <li>micro-releases: releases must never get too large; instead, amortize
     risk with small, controlled releases</li>
 <li>dark launching features: building the feature out over time in a
     deployed and operational form to be simply "turned on" when
     properly qualified</li>
 <li>wired off: the approach that features should have on/off
     switches to provide an alternative to rolling back
     deployments</li>
 <li>fail forward: when things go wrong, have a solid plan to work
     forward to success (within your SLAs) instead of rolling back and
     trying again later.</li>
</ul>
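
<p>The "wired off" and dark launching ideas can be illustrated with a small sketch. This is only an illustration under stated assumptions, not how any of the sites above implement it: the <code>FeatureSwitches</code> class, the <code>new_checkout</code> feature name, and the checkout functions are all hypothetical, and a real system would load switch state from shared configuration rather than keep it in memory.</p>

```python
class FeatureSwitches:
    """Hypothetical in-memory on/off switches for deployed features."""

    def __init__(self, defaults):
        self._state = dict(defaults)

    def is_on(self, name):
        # Unknown features count as off, so dark-launched code stays dark.
        return self._state.get(name, False)

    def turn_on(self, name):
        self._state[name] = True

    def turn_off(self, name):
        self._state[name] = False


# The new code path ships to production switched off (dark launched).
switches = FeatureSwitches({"new_checkout": False})


def legacy_checkout(cart):
    return f"legacy checkout: {len(cart)} items"


def new_checkout(cart):
    return f"new checkout: {len(cart)} items"


def checkout(cart):
    # Both paths are deployed; the switch decides which one runs.
    if switches.is_on("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)


print(checkout(["book", "pen"]))      # legacy path while the switch is off
switches.turn_on("new_checkout")      # "turned on" once properly qualified
print(checkout(["book", "pen"]))      # new path
switches.turn_off("new_checkout")     # wired off instead of rolling back
print(checkout(["book", "pen"]))      # legacy path again
```

<p>If the new path misbehaves, flipping the switch off restores the old behavior without redeploying anything, so other changes in the same release are left alone.</p>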

<p>Each of these techniques requires its own in-depth description,
so we&#8217;ll leave that for future Seeds articles.  For now, just consider
that a traditional software engineering mindset can put you at a
desperate disadvantage in the world of online software
engineering.</p>]]></content:encoded>
            <pubDate>Wed, 17 Mar 2010 13:30:08 GMT</pubDate>
        </item>
        <item>
            <title>Marketing Malware</title>
            <link>http://omniti.com/seeds/marketing-malware</link>
            <guid>http://omniti.com/seeds/marketing-malware</guid>
            <description><![CDATA[
Internet registrar GoDaddy.com is notorious for two things: domain names and risque super bowl commercials. The infamy began in 2005, when GoDaddy paid $4.8 Million for two 30 second spots during Superbowl XXXIX. The commercial featured WWE wrestler C...]]></description>
            <content:encoded><![CDATA[<p class="first">
Internet registrar GoDaddy.com is notorious for two things: domain names and risqu&#233; Super Bowl commercials. The infamy began in 2005, when GoDaddy paid $4.8 million for two 30-second spots during Super Bowl XXXIX. The commercial featured WWE wrestler Candice Michelle experiencing a &#8220;Wardrobe Malfunction,&#8221; a clear parody of the half-time show fiasco involving Janet Jackson the previous year. After 16 variations of the storyboard had been filmed and rejected by the Fox Network, version number 17 was pre-approved for broadcast during the first and fourth quarters of the game. The 30-second spot in the first quarter drove a 1,600% increase in traffic to the GoDaddy site, and then a strange thing happened: the commercial never played a second time. NFL executives purportedly pressured Fox to pull the commercial due to its &#8220;inappropriate&#8221; nature, despite the fact that it had already been paid for, pre-approved by Fox, and initially aired. Led by GoDaddy CEO Bob Parsons, the blogosphere screamed censorship like bloody murder, which only served to fuel additional publicity. In the end, GoDaddy deemed the event so successful that they now define their brand around the &#8220;GoDaddy Girls,&#8221; airing annual Super Bowl commercials that tiptoe along the edge of broadcast acceptability.
</p>
<p>
At the center of the GoDaddy controversy is the correlation between advertising and brand identity. As a content provider, the Fox Network understands that their brand will be held responsible for the quality of both the program content and the advertising they deliver. If either aspect of the broadcast is sufficiently offensive or inept, they risk losing viewers to other stations. Consequently, Fox must provide content that is tame enough to avoid outrage from extremely conservative viewers while remaining provocative enough to satisfy the desire of third party companies seeking to push the envelope with their ad strategy. Content providers in any medium will be judged by the quality of the content they provide, and that includes the use and placement of advertising. This can be a tough balancing act, and Fox isn&#8217;t alone in walking the tightrope.
</p>
<p>
When it comes to the Web, the importance of managing advertising content takes on a new dimension. Hackers are increasingly using fraud and social engineering tactics to infiltrate advertising networks, and then utilizing their position within this circle of trust to launch malware, browser redirects, and cross-site scripting attacks against unsuspecting visitors. If these attacks are successfully executed, hackers may steal credit cards, social security numbers, banking information, personal photos and anything else that has been digitized on the victim&#8217;s computer. Alternatively, a visitor&#8217;s computer may be turned into one of many &#8220;sleeper cell agents&#8221; in a botnet, ready to respond to a few keystrokes at any time and become an active participant in a worldwide Internet attack. It isn&#8217;t just web site visitors who are vulnerable. This same strategy can be used by black hat marketing consultants to siphon traffic from one web site to a competing web site or even to blacklist an entire site from the Google search index. The worst part? Hackers are able to target their attacks with profound granularity, making it extremely difficult for anyone within the targeted organization to know the attack is even happening.
</p>
<p>
The risk posed by vulnerable advertising media is not merely theoretical. In 2009, I documented two separate exploits that successfully penetrated highly trafficked and popular web sites. Both sites had full-time IT teams who were previously unaware that the exploit was occurring. Furthermore, in a now highly publicized <a href="http://www.nytimes.com/2009/09/15/technology/internet/15adco.html?_r=2"><span>event</span></a> in September of 2009, the New York Times was the victim of a malware advertiser who legitimately purchased ad space from the Times while pretending to be a representative of Vonage. IT departments and technical staff are trained to watch site visitors for malicious activity, but painfully few are watching the advertisers.
</p>
<h2>Deconstructing the Hack</h2>
<p>
Client-side arbitrary code execution is the primary culprit behind advertising based attacks. Attackers first gain trust with the target by posing as a legitimate advertiser. This process may be as simple as paying for space in an automated advertising system, or as involved as calling a major corporation while posing as a sales rep or marketing executive from another company. After the attacker has been approved as an advertiser, they develop a custom script to exploit the medium being used. They may start off displaying advertising that looks legitimate, but inevitably they switch to the malicious ad that begins to infect or otherwise manipulate site visitors.
</p>
<p>
Depending on the amount of freedom granted to an advertiser, a variety of techniques may be used to deliver the hacker&#8217;s payload. Many setups allow advertisers to automatically submit a combination of HTML, CSS and JavaScript code to be embedded within the layout of the publisher&#8217;s site. In this scenario, hackers can easily embed malicious scripts by using JavaScript or by including an external Flash SWF in the markup. On the off chance that this code is reviewed at all before publication, it is likely reviewed by someone in marketing or sales who is only examining the submission based on the content currently displayed and is unable to analyze the underlying code for potential security vulnerabilities.
</p>
<p>
Allowing advertisers to place custom JavaScript or Flash files inline as part of a web site&#8217;s markup is especially dangerous as the advertiser is no longer restricted by cross-domain access policies. This leaves the advertiser with the power to do virtually anything the web site developers can do, such as posting AJAX requests or altering any element of the DOM. In an attempt to prevent this, some web sites have opted for an alternative setup that links an iframe or traditional frame to a server belonging to the advertiser. However, this approach is also flawed because changes to the advertising code may be made at any time and the publishing site is powerless to implement a pre-approval process or apply any automated content filtering.
</p>
<p>
Regardless of the method utilized, if an attacker gains the ability to execute custom code on the target site, Pandora&#8217;s box has been opened and virtually all the evil of the Internet may be unleashed. A few fictional yet plausible exploit scenarios include the following:
</p>
<h2>You Won (Malware)!</h2>
<p class="first">
Johann is a regular visitor to Finance Magazine&#8217;s web site. Like most Finance Magazine visitors, he is an investor with a diversified portfolio in mutual funds, bonds, and individual stocks. While he is browsing the latest financial news, a popup branded with the magazine logo suddenly appears and announces that he has won a free two-year subscription to FM and a chance to win lunch with Warren Buffett. Lunches with Buffett are normally valued in the millions, and Johann has been wanting a print subscription to the magazine for some time. He clicks &#8220;Accept&#8221; and is asked to download the registration form. At first he is a bit suspicious and doesn&#8217;t understand why the registration form is a downloadable .exe file, <em>but it is a company promotion from the official web site</em>, so he reasons it must be okay. After downloading the application, he launches it and is asked several questions about his financial net worth. He is then asked for his full name, phone number, address, e-mail address and social security number. He is again a bit suspicious and doesn&#8217;t understand why he needs to enter his social security number, but the form does look very professional. He clicks the info button next to the Social Security Number field and is shown this dialogue message:
</p>
<blockquote>
<p>
You must enter your social security number in order to ensure that &#8220;Lunch With Warren&#8221; contest participants are limited to one entry per person.
</p>
</blockquote>
<p>
With visions of the &#8220;Sage of Omaha&#8221; in his head and the promise of the next printed issue of Finance Magazine at his door step, Johann fills out the form and clicks submit.
</p>
<p>
Several months later, Johann&#8217;s financial life is in ruins. His personal information was used to register for several credit cards and a bank loan was issued for a Mercedes SLK in California. The executable file he downloaded included a custom Trojan Horse virus that allowed the attackers to log in to his personal machine. They used this access to acquire his banking information and passwords, which they used four months later to wire over $10,000 to an account in the Cayman Islands. Although Johann is now suing for criminal negligence, his life (and his credit) will never be the same.
</p>
<h2>The Black Hat Who Stole Christmas</h2>
<p class="first">
Brianna is a successful small business owner with an online niche retail store that averages $25,000 in sales and 500,000 visitors per month. In addition to a product catalog, her site also contains high-quality articles and videos that pull traffic from Google and allow her to further monetize her online presence by selling custom advertising. Advertising is sold on a month-by-month contract basis, and advertisers are given an account into which they paste HTML, CSS, and JavaScript so it can be included directly in the site markup. Usually, Brianna&#8217;s sales skyrocket for the entire month of December as Christmas approaches. This year, however, she has seen a 60% reduction in traffic and sales are plummeting. After closely analyzing her site analytics, she realizes that traffic from Google has dropped drastically. She went from being in the top 10 results for her product niche to not appearing anywhere within the top 5 pages, and she can&#8217;t figure out why. What Brianna doesn&#8217;t realize is that she sold an advertising slot for December to a black hat SEO consultant working for a competitor. As part of his overall strategy for propelling his client to the top, he decided to cut the legs out from underneath the competitors by running advertising templates on their sites that carefully and skillfully violated nearly every technical SEO guideline published by Google. Brianna made a few hundred dollars for the advertising space, but unknowingly violating Google&#8217;s SEO rules would cost her tens of thousands of dollars in the year ahead.
</p>
<h2>Defending Against Malicious Advertisers</h2>
<p>Unfortunately, a foolproof technical solution to malicious advertisement doesn&#8217;t exist. However, this fact is not an excuse for apathy. The risk to both organizations and web site visitors can be mitigated by applying these technical steps:
</p>
<ol>
<li>Preview Ad Submission
<p class="first">
Within your advertising process, a system should be set up to preview all advertising before it is published live. This process should minimally include the ability to preview the way an ad looks and functions, but ideally it will also involve a quick scan of the actual advertising code to check for any unusual pieces of content (e.g. a few lines of obfuscated or encrypted JavaScript). If ad providers can automatically update their content, be sure that each new version is also approved before publication.
</p>
<p>
This tactic cannot be used if ad placement is achieved by embedding frames or iframes that link directly to a third-party server.
</p>
</li>
<li>Restrict Dynamic Advertising
<p class="first">
The single most effective method of preventing advertising abuse is to ban your advertisers from executing any dynamic advertising code. This process is as simple as stripping any scripting tags (along with inline event-handler attributes and javascript: URLs, which can also run code) from advertising content and is extremely effective. However, the potency of this tactic comes at the cost of advertising flexibility as it limits all advertising to stylized images and text. In a medium where small games and other interactive content are often used to garner clicks and entertain eyeballs, this is often not a viable tactic. When scripting code is permitted, certain dynamic content should still be banned. For example, anything that involves alert() calls or assignments to document.location in JavaScript could be stripped while leaving other code in place.
</p>
<p>
This tactic cannot be used if ad placement is achieved by embedding frames or iframes that link directly to a third-party server.
</p>
</li>
<li>Sandbox Dynamic Advertising
<p class="first">
In scenarios where dynamic content is permitted, it is useful to place ads within frames or iframes to take advantage of cross-domain safety restrictions. Yet as we have seen, using these constructs to link directly to a third party server is a security vulnerability as advertisers can easily change the content displayed without providing the publishers an opportunity to review the changes or strip malicious code. To obtain the best of both worlds, create an advertising sandbox by designating a domain or subdomain specifically to serve advertising content. Then place frames or iframes on your main site that link directly to the new domain. Because you control the ad domain, you will be able to preview ad submission, restrict dynamic advertising, and benefit from the added security that cross-domain security restrictions provide.
</p>
</li>
</ol>
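<p>The steps above can be sketched in code. The following Python example is a rough illustration and not a complete sanitizer: the names <code>AdSanitizer</code>, <code>sanitize_ad</code> and <code>sandbox_iframe</code>, the blocked-tag policy, and the host <code>ads.example.net</code> are all assumptions made for this sketch. It strips script-capable elements and attributes from submitted ad markup, records what it removed so a reviewer can look at the submission, and builds an iframe that points at a publisher-controlled ad domain.</p>

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote

# Hypothetical policy for this sketch; it is not an exhaustive list.
BLOCKED_CONTAINERS = {"script", "iframe", "object"}  # dropped with their contents
BLOCKED_VOID = {"embed", "frame"}                    # have no closing tag


class AdSanitizer(HTMLParser):
    """Rebuilds ad markup without script-capable elements or attributes."""

    def __init__(self):
        super().__init__()
        self.out = []          # pieces of the cleaned markup
        self.removed = []      # what was stripped, for a human reviewer
        self._skip_depth = 0   # greater than 0 while inside a blocked container

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_VOID:
            self.removed.append(tag)
            return
        if tag in BLOCKED_CONTAINERS:
            self.removed.append(tag)
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        kept = []
        for name, value in attrs:
            value = value or ""
            # Drop inline handlers (onclick, onload, ...) and javascript: URLs.
            if name.startswith("on") or value.strip().lower().startswith("javascript:"):
                self.removed.append(f"{tag}[{name}]")
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in BLOCKED_CONTAINERS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag not in BLOCKED_VOID and not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_ad(markup):
    """Return (cleaned_markup, removed_items) for one ad submission."""
    parser = AdSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out), parser.removed


def sandbox_iframe(ad_id, ad_host="ads.example.net"):
    """Build an iframe pointing at a publisher-controlled ad domain."""
    src = f"https://{ad_host}/slot/{quote(ad_id, safe='')}"
    return f'<iframe src="{escape(src, quote=True)}" width="300" height="250"></iframe>'


clean, removed = sanitize_ad(
    '<div onclick="steal()"><script>alert(1)</script><b>Sale</b></div>'
)
print(clean)    # <div><b>Sale</b></div>
print(removed)  # ['div[onclick]', 'script']
```

<p>A non-empty <code>removed</code> list is a reasonable trigger for manual review of the submission. A blocklist like this one is easy to bypass, so a production system would more likely rely on a strict allowlist of tags and attributes from a well-tested sanitizing library, served from the separate ad domain described in step 3.</p>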
<p>
Even when implementing the safeguards above, the decision to grant an advertiser space on your web site should not be taken lightly. Doing so inherently confers a degree of trust upon a third party. Ensure that trust is well placed by implementing a screening policy for all new advertising sign-ups. Such a policy could be as simple as calling companies directly to verify that the representative is authorized to purchase advertising on their behalf or as involved as requiring all advertisers to provide copies of their business incorporation license or other government-issued identification. Regardless, the threat cannot be delegated to the IT staff and forgotten. Marketing and sales play an equally important role, and the safest organizations are those that view security as the shared responsibility of all the members within them.
</p>   
<p>
There was a time when good advertising meant entertaining video or amusing copy. It could be judged purely on face value and the ability to generate ROI. That time has passed. In an increasingly interactive world, it is now more important than ever for organizations and individuals to understand that advertising consists of both form and function. Ignoring this fact can result in something far worse than assaulting the sensibilities of your audience; it can devastate their lives. Let content providers and audiences beware: the age of badvertising has begun. 
</p>]]></content:encoded>
            <pubDate>Tue, 22 Dec 2009 22:00:00 GMT</pubDate>
        </item>
    </channel>
</rss>

