Monday, July 16, 2007

Brewster Kahle’s talk on “Universal access to all knowledge”

I love Brewster’s talks and his enthusiasm. Makes me think there are good things going on in the world despite others’ efforts to thwart them.

Question he asks is: In our generation/lifetime, can we provide universal (online) access to all knowledge ever?

Can we put all the world’s text online?

LoC ~26M volumes, costs about $30/book (10c/pg) to do the whole chain
from scanning to putting up on spinning storage w/ metadata
=> ~$800M scans the entire LoC and puts it on spinning storage; would
require <100TB in ASCII format.
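
A quick sanity check of those numbers, as a rough sketch. The volume count and per-book/per-page costs are the figures from the talk; the ~3 KB of plain text per page is my own assumption (not something Brewster stated), and `loc_estimate` is just a hypothetical helper name.

```python
# Back-of-envelope check of the Library of Congress figures quoted above.
VOLUMES = 26_000_000           # ~26M volumes (from the talk)
COST_PER_BOOK_CENTS = 3_000    # ~$30/book (from the talk)
COST_PER_PAGE_CENTS = 10       # ~10c/page (from the talk)
ASSUMED_BYTES_PER_PAGE = 3_000 # ASSUMPTION, not from the talk: ~3 KB of ASCII per page
BYTES_PER_TB = 10**12


def loc_estimate():
    """Return (pages_per_book, total_cost_usd, total_storage_tb)."""
    # $30/book at 10c/page implies an average book length in pages.
    pages_per_book = COST_PER_BOOK_CENTS // COST_PER_PAGE_CENTS
    # Total scanning cost in whole dollars.
    total_cost_usd = VOLUMES * COST_PER_BOOK_CENTS // 100
    # Plain-text storage under the assumed bytes-per-page figure.
    total_bytes = VOLUMES * pages_per_book * ASSUMED_BYTES_PER_PAGE
    return pages_per_book, total_cost_usd, total_bytes / BYTES_PER_TB


if __name__ == "__main__":
    pages, cost, tb = loc_estimate()
    print(f"pages/book: {pages}")
    print(f"total cost: ${cost / 1e6:.0f}M")
    print(f"ASCII storage: {tb:.1f} TB")
```

Under that assumption the cost comes out a little under the rounded $800M, and the text fits comfortably inside the <100TB figure.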

Audio? biggest cooperation from (eg) Grateful Dead and their tribute
bands! Open audio costs $10/disc, or $10/hr for vinyl or cassettes =>
2-3M discs = $20-30M to digitize.

Video? archival films and some old films. Eg, govt propaganda/ads, old
classroom documentaries, “social training” videos (“Duck and Cover”),
stuff in the TV archive… $15/video-hour to digitize.

Software? about 50k commercial SW titles ever. Threatened by DMCA, not

Web? lifetime of a page (before change/del) is ~100 days. Wayback
snapshots every ~2mo.

Open Content Alliance building open collections among universities, with
support from MS, HP, Adobe, others. Internet Archive datacenters being
set up in Europe to build up their own collections and then swap among
themselves. (Lots of classical recordings that are legal to download in
Europe are blocked to US visitors “due to copyright laws”.)

Opportunity they are looking for help with: front ends for searching,
browsing, etc. of the Archive. “Can 1 person build a whole search engine
given the underlying infrastructure of the Internet Archive?” We should
take him up on this challenge in the RoR class and RAD Lab apps!