Reliable Adaptive Distributed Systems

Our goal was to use machine learning to respond automatically to changing demand in datacenters and to detect and recover from both performance failures and crash failures.

Co-PIs: David Patterson, Michael Jordan, Mike Franklin.

Papers and project history

Automatic Workload Evaluation (AWE) simultaneously predicts several aspects of a system's performance under a previously unseen workload, through a novel application of Kernel Canonical Correlation Analysis. Our approach achieves predictions within 20% of measured values more than 80% of the time on a real customer workload, even in cases where the database's built-in query optimizer gives poor estimates. We also explored extending the technique to Hadoop jobs and to autotuned concurrent scientific codes.

Fingerprinting the Datacenter characterizes past datacenter performance crises using a compact, searchable representation of datacenter state. It builds on our earlier work with such representations of per-machine state, improving on it both in scale and in the retrieval accuracy the new representation achieves.

  • Alumni: Kristal Curtis, Yanpei Chen, Kaushik Datta (with Par Lab), Peter Bodik (now at Microsoft Research), Rean Griffith (postdoc, now at VMware), Archana Ganapathi (now at Splunk), Charles Sutton (RAD Lab postdoc, now professor at Univ. of Edinburgh), Michael Armbrust (now at Databricks)
  • Collaborators: Moises Goldszmidt (Microsoft Research Silicon Valley, a RAD Lab Affiliate); Harumi Kuno, Umeshwar Dayal, Janet Wiener (HP Labs, a RAD Lab Affiliate)