SCADS: Scalable, Consistency-Adjustable Document Store.
Collaborative web applications such as Facebook, Flickr and Yelp present new challenges for storing and querying large amounts of data. Because users and developers of these applications care more about performance than about single-copy consistency or the ability to run ad-hoc queries, there is an opportunity for a highly scalable system tailored to relaxed consistency and precomputed queries, one that scales automatically with the number of users while providing a richer structured data model than key-value stores. SCADS is an elastic storage layer built to satisfy high-percentile latency requirements. PIQL (Performance-Insightful Query Language) is a SQL-like language and set of query planning algorithms that bound the work done by any given query, so as to provide a solid guarantee of response time in such applications.
- PhD Students: Michael Armbrust, Beth Trushkowsky, Kristal Curtis
- MS alumni: Jesse Trutna
- Undergraduate alumni: Haruki Oh, Stephen Tu
Recent papers: (PDF files and abstracts can be found here)
- Michael Armbrust, Kristal Curtis, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson. Success-Tolerant Query Processing in the Cloud. Proc. VLDB 2012, Istanbul, Turkey. Describes the methodology, query algorithms and SQL language extensions that place hard upper bounds on the amount of work done by any one query. These bounds allow an expressive relational store that satisfies high-percentile latency constraints to be built on top of an elastic distributed storage system such as the one described in our FAST 2011 paper (next bullet below).
- Beth Trushkowsky, Peter Bodík, Armando Fox, Michael J. Franklin, Michael I. Jordan, David A. Patterson. Scaling a Distributed Storage System Under Stringent Performance Requirements. Proc. FAST 2011, San Jose, CA. Describes the low-level storage layer used by SCADS, and how it is “Directable”: using public cloud computing, we can add and remove machines and model the effects of data copying so that low-level storage performance remains stable and predictable, which in turn provides predictability for the PIQL query planner.
- Michael Armbrust, Nick Lanham, Stephen Tu, Armando Fox, Michael J. Franklin, David A. Patterson. The Case for PIQL, a Performance-Insightful Query Language. Proc. First ACM Symp. on Cloud Computing (SOCC 2010), Indianapolis, IN, June 2010. Motivates the need for a performance-safe query language whose expressiveness is comparable to that of full relational queries, yet which bounds at compile time the amount of work required to execute a query. The paper describes a language that fulfills these requirements and how a query planner/optimizer for it could be implemented in terms of a lower-level storage system with predictable performance.
- Michael Armbrust, Armando Fox, David A. Patterson, Nick Lanham, Beth Trushkowsky, Jesse Trutna, Haruki Oh. SCADS: Scale-Independent Storage for Social Computing Applications. Proc. CIDR 2009.
More Detail:
While relational algebra created a revolution in data management and a new industry around relational database management systems (RDBMS’s), there is little disagreement that today’s Web applications have different needs. The ACID (atomicity, consistency, isolation, durability) guarantees provided by RDBMS’s are stronger than needed for most Web applications, and fully-general relational queries are more expressive than needed by many Web applications; yet the engineering required to combine those properties in conventional RDBMS’s means that they scale less well and cost more than special-purpose storage systems that sacrifice one or more of the properties. For example, Amazon’s Dynamo and Google’s BigTable make such sacrifices in order to achieve far greater scale and higher throughput than any existing RDBMS, and even our ROC storage system prototypes relaxed ACID to facilitate crash-only design and SML-based automated monitoring.
While many “one-off” specialized storage systems have been built, each requires rewriting the application to use that storage system, which explains the longevity of SQL as an implementation-independent abstraction for describing operations on stored data. With SCADS (Scalable, Consistency-Adjustable Document Store), we are working with Facebook, the Internet Movie Database, Amazon, and eBay to capture use cases for their large-scale distributed databases. The goal is to develop both a formalism comparable to SQL for reasoning about such applications’ storage needs and a prototype of a “SCADS engine” that can scale to 1,000 machines on Amazon EC2. We see an opportunity for a new abstraction with the advent of Ruby on Rails, whose Active Record middleware layer provides an object-graph model that fits the needs of many Web applications. Given the uptake of Ruby on Rails, a new abstraction that is near-compatible with Active Record would be much less disruptive than a completely new programming model. We believe a formalism is needed in which to ground this abstraction, both because it would facilitate the kinds of optimizations that today’s query optimizers perform on SQL queries (by applying relational algebra transformations) and because it could provide an implementation-independent specification for building future consistency-adjustable storage systems.
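As a rough illustration of the work-bounding idea described in the overview, the sketch below derives an upper bound on the number of records a query plan can read without executing the query, and rejects plans for which no such bound exists. It is a toy model under stated assumptions, not PIQL's actual language or planner: the operator classes `IndexScan` and `LookupJoin`, the functions `max_records` and `max_operations`, and the table names and limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IndexScan:
    """Hypothetical operator: reads at most `limit` records from one index range."""
    table: str
    limit: Optional[int]  # None means no cardinality limit was declared


@dataclass(frozen=True)
class LookupJoin:
    """Hypothetical operator: for each child record, fetches at most `fanout` records."""
    child: "Plan"
    table: str
    fanout: Optional[int]  # None means the relationship has no declared bound


Plan = Union[IndexScan, LookupJoin]


class UnboundedQueryError(Exception):
    """Raised when no static bound on the work of a plan can be derived."""


def max_records(plan: Plan) -> int:
    """Upper bound on the number of records the plan can return."""
    if isinstance(plan, IndexScan):
        if plan.limit is None:
            raise UnboundedQueryError(f"scan of {plan.table} has no limit")
        return plan.limit
    if plan.fanout is None:
        raise UnboundedQueryError(f"join with {plan.table} has no fan-out bound")
    return max_records(plan.child) * plan.fanout


def max_operations(plan: Plan) -> int:
    """Upper bound on records read by the whole plan, computed statically."""
    if isinstance(plan, IndexScan):
        return max_records(plan)
    # A join reads everything its child reads, plus its own fetched records.
    return max_operations(plan.child) + max_records(plan)


if __name__ == "__main__":
    # Assumed limits: at most 5000 friends per user, 10 recent posts per friend.
    friends = IndexScan(table="friendships", limit=5000)
    recent_posts = LookupJoin(child=friends, table="posts", fanout=10)
    print(max_operations(recent_posts))
```

With the assumed limits, the bound is 5000 reads for the scan plus 5000 × 10 for the join. A plan containing a scan or join without a declared limit raises `UnboundedQueryError` instead of being run, which mirrors the idea of refusing queries whose work cannot be bounded ahead of time.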