Possibly more useful than this page:

Selective, Embedded Just-in-Time Specialization (SEJITS).  Application writers require 3-10x fewer lines of code and can develop 3-5x faster when using productivity-layer languages (PLLs) like Ruby or Python rather than efficiency-layer languages (ELLs) like C or C++, and they don’t need to internalize new programming models like those of GPUs or hybrid architectures to do so.  But PLL implementations can be 1-3 orders of magnitude slower than carefully-coded ELL implementations. SEJITS combines the best of both approaches by maintaining a catalog of hand-crafted specializers, each of which can generate ELL code to perform a particular computation pattern on a particular hardware programming model; for example, FFT on a CUDA GPU, or stencil computations on MPI.  Modern scripting languages such as Python and Ruby include the necessary facilities to allow late binding of a particular specializer to execute a function in the PLL application. In particular, introspection allows a function to inspect its own abstract syntax tree (AST) when first called, and determine whether the AST matches a specializer in the catalog. If it does, the metaprogramming support of the PLL is used to generate ELL code, compile it, link it into the running interpreter, and call the ELL implementation of the function; if it does not, the unmodified source program continues executing as usual in the PLL, albeit with lower performance. Our approach does not require modifying the source PLL program, which would wreck portability.
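The first-call flow described above (inspect the AST, look for a matching specializer, generate code or fall back) can be sketched in plain Python. This is only an illustration under stated assumptions, not the SEJITS implementation: the names `specialize`, `CATALOG`, `is_sum_reduction` and `generate_sum` are hypothetical, and the "generated" code is Python that calls the built-in `sum` as a stand-in for compiled ELL code.

```python
import ast
import functools
import inspect
import textwrap


def is_sum_reduction(tree):
    """Match `def f(xs): acc = 0; for x in xs: acc += x; return acc`."""
    fn = tree.body[0] if tree.body else None
    if not isinstance(fn, ast.FunctionDef) or len(fn.body) != 3:
        return False
    init, loop, ret = fn.body
    try:
        acc = init.targets[0].id
        step = loop.body[0]
        return (
            len(fn.args.args) == 1
            and isinstance(init, ast.Assign)
            and isinstance(init.value, ast.Constant) and init.value.value == 0
            and isinstance(loop, ast.For) and len(loop.body) == 1
            and loop.iter.id == fn.args.args[0].arg
            and isinstance(step, ast.AugAssign) and isinstance(step.op, ast.Add)
            and step.target.id == acc and step.value.id == loop.target.id
            and isinstance(ret, ast.Return) and ret.value.id == acc
        )
    except (AttributeError, IndexError):
        return False  # some node had a different shape: no match


def generate_sum(tree):
    """Emit source for the specialized version (stand-in for ELL code)."""
    fn = tree.body[0]
    arg = fn.args.args[0].arg
    return f"def {fn.name}({arg}):\n    return sum({arg})\n"


# Hypothetical catalog: (AST matcher, code generator) pairs.
CATALOG = [(is_sum_reduction, generate_sum)]


def select_implementation(fn):
    try:
        tree = ast.parse(textwrap.dedent(inspect.getsource(fn)))
    except (OSError, TypeError, SyntaxError):
        return fn  # source unavailable: keep running in the PLL
    for matches, generate in CATALOG:
        if matches(tree):
            namespace = {}
            exec(compile(generate(tree), "<specialized>", "exec"), namespace)
            return namespace[fn.__name__]
    return fn  # no specializer matched: unmodified function


def specialize(fn):
    chosen = []  # filled in on the first call, not at definition time

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if not chosen:
            chosen.append(select_implementation(fn))
        return chosen[0](*args, **kwargs)

    return wrapper


@specialize
def total(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc


@specialize
def largest(xs):  # no matching specializer, so this stays in Python
    return max(xs)


print(total([1, 2, 3, 4]), largest([3, 9, 2]))
```

Both decorated functions return the same results whether or not a specializer is found; only the execution path differs, which is the property that lets the source program stay unmodified.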

While just-in-time (JIT) code generation and specialization is well established due to the popularity of Java and Microsoft .NET, our approach selectively specializes only those functions for which we have a matching ELL code generator, rather than having to be able to JIT-specialize the entire PLL.  In addition, the introspection and metaprogramming facilities of modern scripting languages allow the specialization machinery to be embedded in the PLL directly rather than having to modify the PLL interpreter. Hence we call our approach SEJITS: selective, embedded, just-in-time specialization.
SEJITS eases the job of efficiency programmers who create new modules specialized for particular computations: they can “drop them into” the SEJITS framework, which decides at runtime whether to use each module.  By separating the concerns of ELL and PLL programmers, SEJITS allows independent progress in both areas.  We expect that over time, as more code generators are added to the SEJITS catalog, the main role of the PLL will be runtime specialization, dispatching, and serving as the (usually non-performance-critical) “glue” that joins expensive specialized computations together.  Ousterhout predicted in 1998 that scripting languages would evolve to serve just this purpose, but the scripting languages then popular (Tcl and Perl) lacked the introspection and metaprogramming facilities of modern scripting languages that make it possible to embed the specialization machinery in the PLL without modifying the PLL interpreter. While those languages could have been extended with specialization, making life easier for PLL programmers, it would have been much more cumbersome for the ELL programmers to incrementally add their new code modules to such a system.  We believe a key benefit of SEJITS is that it frees both the PLL and ELL programmers to concentrate on their respective specialties.

Without this separation, achieving high performance requires the programmer to be not only a domain expert, but also intimately familiar with the programming model exposed by the target hardware (e.g. nontraditional architectures such as GPGPUs or hybrid architectures like the Cell processor).  Consider a catalog of modules, each of which can generate ELL code to perform a particular computation pattern on a particular hardware programming model; for example, generating code for an FFT on a CUDA GPU, or generating code for stencil computations on MPI.  Modern scripting languages such as Python and Ruby include the necessary facilities to allow late binding of a particular module to execute a function in the PLL.
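One way to picture such a catalog is a registry keyed by (computation pattern, hardware programming model), into which an ELL programmer drops a new code generator without touching anything else. The sketch below is an assumption-laden illustration: `register`, `bind`, `CATALOG` and the placeholder generators are hypothetical names, and the generators return comment strings rather than real CUDA or MPI code.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A code generator takes a kernel name and returns ELL source text.
CodeGenerator = Callable[[str], str]

# Hypothetical catalog keyed by (computation pattern, programming model).
CATALOG: Dict[Tuple[str, str], CodeGenerator] = {}


def register(pattern: str, platform: str):
    """Decorator that drops a code generator into the catalog."""
    def add(generator: CodeGenerator) -> CodeGenerator:
        CATALOG[(pattern, platform)] = generator
        return generator
    return add


@register("fft", "cuda")
def fft_cuda(name: str) -> str:
    return f"/* placeholder: CUDA FFT kernel for {name} */"


@register("stencil", "mpi")
def stencil_mpi(name: str) -> str:
    return f"/* placeholder: MPI stencil kernel for {name} */"


def bind(pattern: str, available: Sequence[str]) -> Optional[CodeGenerator]:
    """Late binding: pick a generator only once the platforms are known.

    `available` lists the programming models present at runtime, in order
    of preference. Returns None when nothing matches, in which case the
    caller keeps running the original PLL function.
    """
    for platform in available:
        generator = CATALOG.get((pattern, platform))
        if generator is not None:
            return generator
    return None


chosen = bind("fft", ["cuda", "mpi"])
print(chosen("my_fft") if chosen else "no specializer; stay in the PLL")
```

Adding support for a new pattern or platform is then a matter of writing one more decorated generator; the lookup in `bind` and the PLL application code stay unchanged.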

Unlike generic JIT, SEJITS selectively specializes only those functions for which we have a matching ELL code generator, making the decision at the latest possible moment and leaving the rest to the PLL. The introspection and metaprogramming facilities of modern scripting languages allow the specialization machinery to be embedded directly in the PLL rather than requiring changes to the PLL interpreter, which makes life easier for the author of a specializer (i.e. for ELL programmers). In this way, SEJITS frees both the PLL and ELL programmers to concentrate on their respective specialties.

  • Alumni: Bryan Catanzaro (with Kurt Keutzer)
  • Graduate Students: Scott Beamer, Erin Carson (with Jim Demmel), Derrick Coetzee, Ekaterina Gonina, Shoaib Kamil (with Kathy Yelick), Jeffrey Morlan, Richard Xia
  • Undergraduates: Peter Birsinger, David Howard, Kevin Liang, Jessica Miller, Aakash Prasad

Recent papers: (PDF files and abstracts can be found here)

  • Shoaib Kamil, Derrick Coetzee, Armando Fox. Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization. In Proceedings of the 10th Python in Science Conference (SCIPY 2011), July 2011.
  • Henry Cook, Ekaterina Gonina, Gerald Friedland (ICSI), Armando Fox, David Patterson, Shoaib Kamil.  CUDA-Level Performance With Python-Level Productivity for Gaussian Mixture Model Applications.  Proc. HotPar ’11.
  • Bryan Catanzaro, Shoaib Kamil, Yunsup Lee, Krste Asanovic, James Demmel, Kurt Keutzer, John Shalf (LBL), Kathy Yelick, Armando Fox.  SEJITS: Getting Productivity And Performance With Selective, Just-In-Time Specialization. Proc. 1st Workshop on Programming Models for Emerging Architectures (PMEA’09), Raleigh, NC, Sept. 2009.
  • Informal presentation on PySKI at Py4Science Python Users’ Group, UC Berkeley