
Margarita Zeitlin
Bio [2008]
Margarita is a current linguistics and German student at New York
University, a longtime Beatles fanatic, and the owner of one 6-pound
copy of A New Kind of Science, which inspired her to attend
the NKS Summer School 2008. Fascinated by the realization that simple
rules can create incredibly complex behavior, she set out to
join the world of Wolfram in an attempt to apply NKS to the field of
linguistics. She enjoys spending her time in the
Documentation Center, sorting
through her errors, and, once in a while,
writing a successful piece of code.
Project Title
Languages, Letters, and "Loids"
Project
Linguists estimate that there are 5,000-6,000 languages spoken in the
world today, most of which belong to one of ten general language
families. The Indo-European language family is home to English,
German, Spanish, French, and Italian, and these languages act as the
focus of this project. The goal is to visually represent the
frequencies of letters in all positions of 2- to 10-letter words in
these five languages; by breaking up corpora of texts into characters,
it is possible to isolate those characters that appear most often in a
language. Although the letters of a language do not enumerate all its
sounds, orthography still permits a visual evaluation and comparison of
several languages. Based on this analysis, the project was extended to
create a random word generator for the five tongues based on the
probabilities given by the corpora. The final goal was to see whether any
of the randomly generated English words could pass for trademarks or
perhaps some new slang.
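The counting and generation steps described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function names `positional_counts` and `random_word` are hypothetical, the toy corpus is invented, and keying the counts by both word length and position is an assumption about how "all positions of 2- to 10-letter words" was handled.

```python
import random
from collections import Counter, defaultdict

def positional_counts(words, min_len=2, max_len=10):
    """Count letters per (word length, position) for words of 2 to 10 letters.

    Assumption: frequencies are kept separately for each word length.
    """
    counts = defaultdict(Counter)
    for word in words:
        word = word.lower()
        if word.isalpha() and min_len <= len(word) <= max_len:
            for pos, letter in enumerate(word):
                counts[(len(word), pos)][letter] += 1
    return counts

def random_word(counts, length, rng=random):
    """Draw each letter independently, weighted by its count at that position."""
    letters = []
    for pos in range(length):
        table = counts.get((length, pos))
        if not table:
            raise ValueError(f"no {length}-letter words in the corpus")
        choices, weights = zip(*table.items())
        letters.append(rng.choices(choices, weights=weights, k=1)[0])
    return "".join(letters)

# Toy corpus, invented for illustration; a real run would use a large text.
corpus = "the cat sat on the mat and the dog ran to the man".split()
counts = positional_counts(corpus)
rng = random.Random(0)  # fixed seed so repeated runs give the same words
print([random_word(counts, 3, rng) for _ in range(5)])
```

Because each position is sampled independently, the output only reflects which letters are common where, not which letters tend to follow one another, which is consistent with much of the generated text not resembling any language.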
This project began as an attempt to look at five languages and
discover something about them only by studying their words. This goal
didn't pan out, but instead reinforced what was already known about
these languages. It is nevertheless satisfying to reach in three weeks
the same conclusions that linguists have reached after decades
of research. Besides, this method of research is unusual for a
linguist, so it is remarkable to have reinforced the knowledge of the
field from such an unorthodox perspective.
The random word generation was definitely the most amusing aspect of
this project. The filter is not the finest; looking at the blocks of
generated text, it is evident that most of the text does not resemble
any language. Some of the words are real words in their language,
some can pass for real words, and others are just gibberish. The
generator works relatively well considering that roughly 10% of 1,000
words were actual English words, versus roughly 0.4% when the words
were generated without the word frequency data. The German text
came out the worst, while the Italian text came out the best,
due to its unusually high frequency of vowels in all positions of a
word. However, the project doesn't end here. There is work to be done
on the filter, and it will be possible to create a better generator by
limiting where certain letters can appear (e.g., in English, "x" rarely
appears in the first position), creating groups of letters that often
occur together in a language (e.g., German's "sch"), and extending the
letters to include all phonemes, or sounds, of a language.
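Two of the proposed improvements, restricting where letters may appear and favoring letters that occur together, could be approximated along the following lines. This is a rough sketch under stated assumptions rather than the planned implementation: it swaps in a simple letter-to-letter transition table (a first-order Markov chain) for the idea of letter groups, and the names `transition_counts`, `chained_word`, and `banned_first`, as well as the training words, are hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical word-boundary markers

def transition_counts(words):
    """Count how often each letter follows another, including word boundaries."""
    counts = defaultdict(Counter)
    for word in words:
        chars = [START] + list(word.lower()) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def chained_word(counts, banned_first=frozenset(), max_len=10, rng=random):
    """Build a word letter by letter; each letter depends on the one before it."""
    word, prev = [], START
    while len(word) < max_len:
        # Drop banned letters only when choosing the first position.
        options = {c: n for c, n in counts[prev].items()
                   if not (prev == START and c in banned_first)}
        if not options:
            break
        nxt = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if nxt == END:
            break
        word.append(nxt)
        prev = nxt
    return "".join(word)

# Toy training words, invented for illustration.
words = ["schule", "schnee", "tisch", "xenon"]
table = transition_counts(words)
rng = random.Random(1)
print([chained_word(table, banned_first={"x"}, rng=rng) for _ in range(5)])
```

With this table, "c" is always followed by "h" and "s" is usually followed by "c", so clusters such as "sch" emerge on their own, while the `banned_first` set keeps "x" out of the first position.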
Favorite Radius 3/2 Rule
Rule chosen: 1442
This cellular automaton is definitely one of my favorites. It reminds me
of a Hokusai painting, a beautiful mountain covered in snow. There is
some similarity to ECA 30 in its central complexity, but it is a much
more eccentric CA, having jagged edges and various patterns of tracks
throughout it.
Out of the thousands of possibilities, CA 1442 certainly has some very
interesting characteristics. I would be curious to find out what kind of
relation it has to ECA 30, as well as whether it is found in nature. Besides,
1442 was probably a good year.