Monday, May 13, 2013

Keyword Search, Plus a Little Magic

From Lingua Franca 
Google relies on at least four facts. All of them are crucial, but the fourth most of all.
  1. Computer memory chips have become so cheap and so tiny that in an office-sized space you can pack enough random-access-memory units to store an utterly gigantic automatically maintained concordance to the whole Web, augmented with copies of huge portions of what is on those sites.
  2. Networks and processors have become so fast that your search command can be delivered to a server far away and checked against the gigantic index in just hundredths of a second.
  3. The number of sites containing all of the words on a list (rather than just some of them) goes down rapidly with the length of the list, and much more rapidly when the words have low probabilities of occurrence.
  4. Humans looking for a certain piece of information can on the whole be trusted to be smart enough to supply a list of words with the crucial property of having low probability in most texts but being guaranteed to occur in texts containing the desired information.
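
Facts 3 and 4 can be seen in miniature with a toy concordance. The sketch below is only an illustration and not a description of how Google actually works: the four sample documents and the function names `build_index` and `search_all` are invented for this example. It maps each word to the set of documents containing it, then answers a query by intersecting those sets.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, words):
    """Return the ids of documents that contain every word in the list."""
    postings = [index.get(w.lower(), set()) for w in words]
    if not postings:
        return set()
    # Start from the rarest word: the smallest set bounds the final result.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical sample documents, standing in for Web pages.
docs = {
    1: "the cat sat on the mat",
    2: "the dog chased the cat",
    3: "the quokka is a small marsupial",
    4: "the cat and the quokka met a marsupial",
}
index = build_index(docs)

print(search_all(index, ["the"]))                   # every document matches
print(search_all(index, ["the", "cat"]))            # fewer documents match
print(search_all(index, ["the", "cat", "quokka"]))  # only one document is left
```

A common word such as "the" excludes nothing, while a rare word such as "quokka" does most of the narrowing. If word occurrences were independent (a simplifying assumption that real text does not satisfy), the expected number of matching pages would be the total number of pages multiplied by each query word's probability of occurrence, which is why a short list of improbable words is enough.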
