Surge — A Scalability & Performance Conference, presented by OmniTI.

Surge 2010 Keynote & Speaker List

Discussing Scalability Matters…

…because scalability matters. Surge is more than an event; it's a chance to identify emerging trends and meet the architects behind established technologies. Learn from their mistakes and see how their victories can power your business forward.


Mike Malone Infrastructure Engineer, SimpleGeo

Mike Malone is an infrastructure engineer at SimpleGeo where he works on building and integrating scalable systems that power the company's location platform. Since joining SimpleGeo, Mike has been working to ensure operational continuity in the face of rapid growth, partial system failures, and traffic bursts. Recently, he's been working on building an efficient multi-dimensional complex query overlay on top of an eventually consistent distributed hash table. Before joining SimpleGeo, Mike helped build the microblogging web site Pownce, where he learned a lot about the technical and social difficulties of scaling an online community. After Pownce's acquisition by Six Apart in 2008, Mike worked on the TypePad platform team, where he gained a great deal of experience building RESTful web services. In his spare time Mike enjoys tinkering with new technologies. When he's not on the computer, you can probably find him hanging out with his girlfriend, Katie, and their friends at a good bar.

Mike's Talks

Working with Dimensional Data in a Distributed Hash Table

Day 1 - 4:00 pm

Location: Marble

  • Data Storage, Scalability

Oops! Due to technical difficulties during the conference, the audio for this video is of substandard quality. We apologize—and we'll be looking to redeem ourselves next year.


Recently, a new class of database technologies has emerged that offers massively scalable distributed hash table functionality. Relative to more traditional relational database systems, these systems are simple to operate and capable of managing massive data sets. These characteristics come at a cost, though: an impoverished query language that, in practice, can handle little more than exact-match lookups at scale.

This talk will explore the real-world technical challenges we faced at SimpleGeo while building a web-scale spatial database on top of Apache Cassandra. Cassandra is a distributed database that falls into the broad category of second-generation systems described above. We chose Cassandra after carefully considering desirable database characteristics based on our prior experiences building large-scale web applications. Cassandra offers operational simplicity, decentralized operations, no single points of failure, online load balancing and re-balancing, and linear horizontal scalability.

Unfortunately, Cassandra fell far short of providing the sort of sophisticated spatial queries we needed. We developed a short-term solution that was good enough for most use cases, but far from optimal. Long term, our challenge was to bridge the gap without compromising any of the desirable qualities that led us to choose Cassandra in the first place.

The result is a robust, general-purpose mechanism for overlaying sophisticated data structures on top of distributed hash tables. By overlaying a spatial tree, for example, we're able to durably persist massive amounts of spatial data and service complex nearest-neighbor and multidimensional range queries across billions of rows fast enough for an online, consumer-facing application. We continue to improve and evolve the system, but we're eager to share what we've learned so far.
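
To make the overlay idea concrete, here is a minimal sketch, not SimpleGeo's implementation. It assumes a point quadtree as the "spatial tree" (the abstract does not say which tree was used) and models the distributed hash table as an in-memory store that supports only exact-match get and put. The names KVStore, QuadTreeOverlay, the "node:<path>" key scheme, and the capacity and max_depth parameters are all hypothetical. Each tree node lives under its own key, so a range query needs nothing beyond exact-match lookups.

```python
class KVStore:
    """Stand-in for a distributed hash table: exact-match get/put only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def quadrants(bounds):
    """Split (x0, y0, x1, y1) into four child rectangles keyed '0'..'3'."""
    x0, y0, x1, y1 = bounds
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return {"0": (x0, y0, mx, my), "1": (mx, y0, x1, my),
            "2": (x0, my, mx, y1), "3": (mx, my, x1, y1)}


def quadrant_of(bounds, x, y):
    """Return the key of the child rectangle that contains (x, y)."""
    x0, y0, x1, y1 = bounds
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return str(int(x >= mx) + 2 * int(y >= my))


class QuadTreeOverlay:
    """Quadtree whose nodes are rows in the store, keyed by their path."""

    def __init__(self, store, bounds=(-180.0, -90.0, 180.0, 90.0),
                 capacity=4, max_depth=16):
        self.store, self.bounds = store, bounds
        self.capacity, self.max_depth = capacity, max_depth
        if store.get("node:") is None:
            store.put("node:", {"leaf": True, "points": []})

    def insert(self, x, y, item):
        path, bounds = "", self.bounds
        node = self.store.get("node:" + path)
        while not node["leaf"]:  # one exact-match lookup per tree level
            quad = quadrant_of(bounds, x, y)
            path, bounds = path + quad, quadrants(bounds)[quad]
            node = self.store.get("node:" + path)
        points = node["points"] + [(x, y, item)]
        if len(points) > self.capacity and len(path) < self.max_depth:
            self._split(path, bounds, points)
        else:
            self.store.put("node:" + path, {"leaf": True, "points": points})

    def _split(self, path, bounds, points):
        # A child may briefly exceed capacity; it splits on its next insert.
        children = {quad: [] for quad in "0123"}
        for x, y, item in points:
            children[quadrant_of(bounds, x, y)].append((x, y, item))
        for quad, pts in children.items():
            self.store.put("node:" + path + quad, {"leaf": True, "points": pts})
        self.store.put("node:" + path, {"leaf": False})

    def range_query(self, qx0, qy0, qx1, qy1):
        results, stack = [], [("", self.bounds)]
        while stack:
            path, bounds = stack.pop()
            x0, y0, x1, y1 = bounds
            if x0 > qx1 or x1 < qx0 or y0 > qy1 or y1 < qy0:
                continue  # rectangle cannot overlap the query: skip subtree
            node = self.store.get("node:" + path)
            if node["leaf"]:
                results.extend(p for p in node["points"]
                               if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1)
            else:
                stack.extend((path + quad, child)
                             for quad, child in quadrants(bounds).items())
        return results


store = KVStore()
index = QuadTreeOverlay(store, capacity=2)
for name, lon, lat in [("sf", -122.42, 37.77), ("oakland", -122.27, 37.80),
                       ("la", -118.24, 34.05), ("nyc", -74.01, 40.71),
                       ("baltimore", -76.61, 39.29), ("london", -0.13, 51.51)]:
    index.insert(lon, lat, name)

hits = index.range_query(-123.0, 37.0, -122.0, 38.0)
print(sorted(item for _, _, item in hits))
```

The sketch ignores everything that makes the real problem hard: concurrent writers, eventual consistency during a split, and nearest-neighbor search. Its only purpose is to show how a tree can be expressed entirely in terms of key lookups.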
