# Bioinformatics Part Two and Wicked Fleas

First off, the second part of the title: in CIS 730: Principles of Artificial Intelligence, we are starting to go through First-Order Logic (FOL) and FOL theorem proving by resolution. One year (2002, IIRC) I put up the proverb:
Only the wicked flee when no one pursueth.

and asked the students to translate this into FOL, e.g.,
\forall x, y . (Flees(x, y) \wedge \neg \exists z . Pursues(z, x)) \rightarrow isWicked(x)

Every now and then I get a student who hears the above proverb and defines isWickedFlea.

When the chuckles subsided, a couple of things occurred to me.

One was that archaic forms such as "pursueth" were alien to some international students (e.g., those from India and China), about as unrecognizable as cursive handwriting.

A second revelation, the main topic of this post, was that the entropy of typos, word-sense ambiguities, and spelling ambiguities varies and is not easily bounded by a constant. The predictive entropy would be an interesting quantity to measure where possible. Someday, this might aid in recognizing double entendres and intended puns.
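To make "predictive entropy" concrete, here is a minimal sketch, assuming a plain bigram model and a made-up toy corpus (both are illustrative assumptions, not anything from the course). The helper names `bigram_counts` and `predictive_entropy` are hypothetical. It computes the Shannon entropy, in bits, of the next-word distribution after a given context word.

```python
import math
from collections import Counter, defaultdict

def bigram_counts(tokens):
    """Count how often each word follows each context word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predictive_entropy(counts, context):
    """Shannon entropy (bits) of the next-word distribution after `context`."""
    followers = counts.get(context)
    if not followers:
        return None  # context never seen, so entropy is undefined under this model
    total = sum(followers.values())
    return -sum((c / total) * math.log2(c / total) for c in followers.values())

# Toy corpus, invented for illustration only.
toy_corpus = "the wicked flee when no one pursues the wicked flea bites the dog".split()
counts = bigram_counts(toy_corpus)

for word in ("wicked", "the", "flee"):
    print(word, predictive_entropy(counts, word))
```

In this toy corpus, "wicked" is followed equally often by "flee" and "flea", so the model is maximally unsure between the two, which is roughly the situation a pun or a real-word typo exploits. A real measurement would need a far larger corpus and a smoothed model.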

But how might one use this information in general? Specifically, how would one hook a quantitative analyzer of text- or speech-based discourse to a training corpus and discover the highest-impact typos?
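One possible way to start on that question, sketched under loud assumptions: "impact" is taken here to mean the change in sentence surprisal (in bits) when a word is swapped for another vocabulary word one character substitution away, scored by an add-one-smoothed bigram model. That definition, the toy corpus, and every function name below are hypothetical choices for illustration, not an established method.

```python
import math
from collections import Counter

def train(tokens):
    """Collect unigram and bigram counts from a token list."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def surprisal(sentence, unigrams, bigrams):
    """Total surprisal in bits under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    bits = 0.0
    for prev, word in zip(sentence, sentence[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        bits -= math.log2(p)
    return bits

def one_substitution_apart(a, b):
    """True if a and b have equal length and differ in exactly one character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def rank_real_word_typos(sentence, unigrams, bigrams):
    """Rank single-substitution, in-vocabulary typos by absolute surprisal shift."""
    base = surprisal(sentence, unigrams, bigrams)
    results = []
    for i, word in enumerate(sentence):
        for candidate in unigrams:
            if one_substitution_apart(word, candidate):
                corrupted = sentence[:i] + [candidate] + sentence[i + 1:]
                shift = surprisal(corrupted, unigrams, bigrams) - base
                results.append((shift, i, word, candidate))
    return sorted(results, key=lambda r: abs(r[0]), reverse=True)

# Toy corpus, invented for illustration only.
corpus = "the wicked flee when no one pursues the wicked flea bites the dog".split()
unigrams, bigrams = train(corpus)
ranked = rank_real_word_typos("the wicked flee when".split(), unigrams, bigrams)
for shift, position, original, typo in ranked:
    print(f"{original} -> {typo} at position {position}: {shift:+.3f} bits")
```

This only considers substitutions that yield another known word (the flee/flea case); insertions, deletions, transpositions, and out-of-vocabulary typos would each need their own handling, and whether a large or a small surprisal shift counts as "high impact" depends on what one wants to detect.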

Banazir
