Archive for category SML

Opportunities for parallelism and multicore in SML

Parallelizing/decomposing big models while trading off accuracy, precision, etc. (similar to how storage systems trade consistency for availability/scalability). E.g., in EM training with Markov models there is a single big data structure (the translation table) that every worker uses and that then has to be globally updated in the M-step. A NIPS paper (described as “hacky” by Alex Smola) partitions the model and uses peer-to-peer anti-entropy to periodically try to sync the models.
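To make the anti-entropy idea concrete, here is a minimal sketch of gossip-style averaging. It is not the NIPS paper's algorithm, and for simplicity it replicates the table rather than partitioning it. It assumes each worker runs the E-step on its own data shard and holds a replica of the expected-count table c(f, e); in each round every replica averages its table with one randomly chosen peer. Pairwise averaging preserves the total counts, so the replicas drift toward the global mean, and because the M-step only normalizes counts per source word, normalizing the mean gives the same translation table as normalizing the global sum. Stopping after a few rounds trades accuracy for less synchronization. The function names and toy counts are hypothetical.

```python
import random
from collections import defaultdict

def gossip_round(replicas, rng):
    """One anti-entropy round: each replica picks a random peer, and the pair
    both replace their counts with the element-wise average of the two."""
    n = len(replicas)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # shift past i so the peer is never the replica itself
        a, b = replicas[i], replicas[j]
        for key in set(a) | set(b):
            # A key missing from one replica counts as zero there.
            avg = (a.get(key, 0.0) + b.get(key, 0.0)) / 2.0
            a[key] = avg
            b[key] = avg

def m_step(counts):
    """Normalize expected counts c(f, e) into t(f | e) for each source word e."""
    totals = defaultdict(float)
    for (f, e), c in counts.items():
        totals[e] += c
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}

# Hypothetical expected counts from the E-step on three data shards.
replicas = [
    {("maison", "house"): 3.0, ("domicile", "house"): 1.0},
    {("maison", "house"): 1.0, ("chat", "cat"): 2.0},
    {("domicile", "house"): 2.0, ("chat", "cat"): 1.0},
]
rng = random.Random(0)
for _ in range(40):
    gossip_round(replicas, rng)

# Each replica now holds (approximately) the global mean counts, so a local
# M-step yields roughly the same table a centralized M-step would.
t = m_step(replicas[0])
```

Running fewer rounds leaves the replicas only loosely in sync, which is exactly the accuracy-for-scalability knob described above.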

In general, one avenue of opportunity is to improve the performance or power of the most sophisticated models. Another avenue is to ask what we can do with yesterday’s, less sophisticated models. These may be perfectly adequate for some application domains, especially if they can run in real time or be made portable, and they could also be used in a layered approach alongside more sophisticated models within a particular domain.
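One reading of the layered approach is a confidence-gated cascade: a cheap, older model answers the inputs it is sure about and defers the rest to a more sophisticated one. The minimal sketch below assumes each model returns a (label, confidence) pair; `cascade`, the threshold, and both toy models are hypothetical stand-ins.

```python
from typing import Callable, Tuple

# Assumed interface: a model maps an input to (label, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]

def cascade(cheap: Model, expensive: Model, threshold: float) -> Model:
    """Layered prediction: answer with the cheap model when it is confident
    enough, and fall back to the expensive model otherwise."""
    def predict(x: str) -> Tuple[str, float]:
        label, conf = cheap(x)
        if conf >= threshold:
            return label, conf
        return expensive(x)
    return predict

calls = {"expensive": 0}

def keyword_model(text: str) -> Tuple[str, float]:
    # Hypothetical "yesterday's model": a keyword rule, confident only on obvious cases.
    if "free money" in text:
        return "spam", 0.95
    return "ham", 0.5

def big_model(text: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a slower, more sophisticated model.
    calls["expensive"] += 1
    return ("spam", 0.8) if "winner" in text else ("ham", 0.9)

classify = cascade(keyword_model, big_model, threshold=0.9)
print(classify("claim your free money now"))  # ('spam', 0.95), big_model not called
print(classify("you are a winner"))           # ('spam', 0.8)
print(calls["expensive"])                     # 1
```

Raising `threshold` sends more inputs to the expensive model, trading speed and portability for accuracy.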

Universal data interchange format for SML data structures & models

Given that interesting apps will use multiple languages and frameworks (if not at the productivity layer, then at the efficiency layer), we should be working on portable in-memory and on-disk data formats for various types of ML models, with fast swizzling/unswizzling. Perhaps use Google Protocol Buffers and define some standard schemata?
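As a rough illustration of swizzling/unswizzling, the sketch below flattens a translation table keyed by string pairs into a vocabulary list plus integer-indexed entries, then writes that with an explicit little-endian, fixed-width layout using Python's `struct` module. Protocol Buffers would normally replace the hand-rolled layout, but it needs a compiled `.proto` schema and generated code, so this self-contained sketch avoids it. The `TTAB` tag, the version field, and the byte layout are hypothetical, not an existing standard.

```python
import struct

MAGIC = b"TTAB"  # hypothetical 4-byte tag for this illustrative format
VERSION = 1
ENTRY = "<IId"   # src index, tgt index, probability; little-endian, no padding

def swizzle(table):
    """Replace string keys with integer indices into a shared vocabulary,
    giving a flat, pointer-free representation that can be written as-is."""
    vocab, index = [], {}
    def intern(word):
        if word not in index:
            index[word] = len(vocab)
            vocab.append(word)
        return index[word]
    entries = [(intern(f), intern(e), p) for (f, e), p in sorted(table.items())]
    return vocab, entries

def unswizzle(vocab, entries):
    """Inverse of swizzle: turn integer indices back into string keys."""
    return {(vocab[f], vocab[e]): p for f, e, p in entries}

def dump(table):
    vocab, entries = swizzle(table)
    out = [MAGIC, struct.pack("<II", VERSION, len(vocab))]
    for word in vocab:
        data = word.encode("utf-8")
        out.append(struct.pack("<I", len(data)))  # length-prefixed UTF-8 string
        out.append(data)
    out.append(struct.pack("<I", len(entries)))
    for f, e, p in entries:
        out.append(struct.pack(ENTRY, f, e, p))
    return b"".join(out)

def load(buf):
    if buf[:4] != MAGIC:
        raise ValueError("not a TTAB buffer")
    version, nvocab = struct.unpack_from("<II", buf, 4)
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    pos = 12
    vocab = []
    for _ in range(nvocab):
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        vocab.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
    (nentries,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    entries = []
    for _ in range(nentries):
        entries.append(struct.unpack_from(ENTRY, buf, pos))
        pos += struct.calcsize(ENTRY)
    return unswizzle(vocab, entries)

# Hypothetical round trip.
table = {("maison", "house"): 0.75, ("domicile", "house"): 0.25, ("été", "summer"): 1.0}
buf = dump(table)
restored = load(buf)
```

Because integer widths and byte order are fixed, a reader in C, Java, or another language could parse the same bytes without sharing any in-memory pointers.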