Stephen Wolfram's: A New Kind of Science | Online
Jump to Page
Look Up in Index

Chapter 10 Notes > Section 12 > Page 1100 > Note (e) Previous note-----Next note
Notes for: Processes of Perception and Analysis | Human Thinking

*[Identifying] similar words

The soundex system for hashing names according to sound was first used on 1880 U.S. census data, and is still today widely used by telephone information services. The system works essentially by dropping vowels and assigning consonants to six possible groups. More sophisticated systems along the same lines can be set up using finite automata.

Natural language query systems usually work by stripping words to their linguistic roots (e.g. "stripping"->"strip") before looking them up. Spell-checking systems typically find suggested corrections by doing a succession of lookups after applying transformations based on common errors.

Even given two specific words it can be difficult to find out whether they should be considered similar. Fairly efficient algorithms are known for cases such as genetic sequences where small numbers of insertions, deletions and substitutions are expected. But if more complicated transformations are allowed--say corresponding to rules in a multiway system--the problem rapidly becomes intractable (see page 765).


Page image


Pages related to this note:


All notes on this page:

* History [of ideas about thinking]
* The future [of machine thinking]
* Sleep
* Pointer encoding [and memory]
* Hashing
* [Identifying] similar words
* All notes for this section
* Downloadable programs for this page
* Downloadable images
* Search Forum for this page
* Post a comment
* NKS | Online FAQs
From Stephen Wolfram: A New Kind of Science [citation] Previous note-----Next note