Friday, May 23, 2008
Assault and Ambiguity
Overall, I had managed to scour out all the corners of ambiguity concerning when, where, who and how, leaving only the strange question of why in fuzziness. Why was this 24-year-old still living at home, jogging at midday and preying on middle-aged women? Why was he living in this neighborhood, where even a Megan’s Law offender is fairly hard to find?
Sunday, January 13, 2008
Codes and Fervor
Here are some examples borrowed from Brendan McKay showing ELS patterns predicting the assassination of Martin Luther King, Jr. in the text of Moby Dick:
More are here.
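The search itself is mechanically trivial, which is part of McKay's point: ELS hits are easy to find in any long text. A minimal brute-force sketch (my own illustration, not McKay's actual tooling) just tries every start position and every letter skip:

```python
def find_els(text, word, max_skip=50):
    """Find `word` as an equidistant letter sequence in `text`.

    Returns (start, skip) pairs: the word read off every `skip`-th
    letter beginning at index `start` of the letters-only text.
    """
    t = "".join(c for c in text.lower() if c.isalpha())
    w = word.lower()
    hits = []
    for skip in range(1, max_skip + 1):
        for start in range(len(t) - (len(w) - 1) * skip):
            if all(t[start + i * skip] == w[i] for i in range(len(w))):
                hits.append((start, skip))
    return hits
```

Run over a book-length text with a generous skip range, this turns up "hidden" words by the thousands, which is exactly why the patterns carry no evidential weight.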
A slightly less crazy speculation comes from Clifford Pickover (although it's not clear what the original source is): given a (countably) infinite digit sequence for pi, Shakespeare must inevitably be encoded in those numbers given a suitable representation scheme. While that might be, ELSs in pi would be rarer than in human-language texts, I think, because the digit probabilities for pi are very uniform after a few thousand digits, unlike most languages, which tend to have a more skewed distribution of letters. So ELS words would be way down deep in the code, though not quite as far along as The Bard.
Friday, December 7, 2007
Dimensional Folding and Coherence
There is some rather interesting work in cognitive psychology and psycholinguistics on the relationship between writing styles and reader uptake. Specifically, the construction-integration model by Walter Kintsch and others tries to tease out how information learning is modulated by the learner's pre-existing cognitive model. There is a bit of parallelism with "constructivism" in educational circles that postulates that learning is a process of building, tearing down and re-engineering cognitive frameworks over time, requiring each student to be uniquely understood as bringing pre-existing knowledge systems to the classroom.
In construction-integration theory, an odd fact has been observed: if a text has lots of linking words bridging concepts from paragraph to paragraph, those with limited understanding of a field can get up to speed with greater fluidity than if the text is high-level and not written with that expectation in mind. In turn, those who are advanced in the subject matter actually learn faster when the text is more sparse and the learner bridges the gaps with their pre-existing mental model.
We can even measure this notion of textual coherence or cohesion using some pretty mathematics. If we count all the shared terms from one paragraph to the next in a text and then try to eliminate the noisy outliers, we can get a good estimate of the relative cohesion between paragraphs. These outliers arise due to lexical ambiguity or because many terms are less semantically significant than they are syntactically valuable.
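Before any de-noising, the raw count is simple. A rough sketch of the shared-term measure (my own toy formulation; real cohesion metrics normalize and weight terms more carefully) might look like this:

```python
def cohesion(paragraphs):
    """Rough paragraph-to-paragraph cohesion: for each adjacent pair,
    the fraction of word types in the smaller paragraph that also
    appear in the other paragraph."""
    scores = []
    prev = None
    for p in paragraphs:
        words = set(p.lower().split())
        if prev is not None:
            overlap = len(words & prev)
            scores.append(overlap / max(1, min(len(words), len(prev))))
        prev = words
    return scores
```

It is exactly this kind of raw count that suffers from the noisy outliers, which is where the SVD comes in.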
A singular value decomposition (SVD) is one way to do the de-noising. In essence, the SVD factors a large matrix of counts into a product of three matrices, one of which contains "singular values" along its diagonal. We can then order those values by magnitude, eliminate the small ones, and reconstitute a version of the original matrix. By doing this, we are in effect asking which of the original counts contribute little or nothing to the original matrix and eliminating those less influential terms.
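The decompose/truncate/reconstitute cycle is a few lines with a numerical library. A sketch with a made-up term-by-paragraph count matrix:

```python
import numpy as np

# Toy term-by-paragraph count matrix (rows = terms, cols = paragraphs).
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

# Factor into U * diag(s) * Vt; s holds the singular values,
# already sorted from largest to smallest.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)

# Keep only the k largest singular values and reconstitute.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

The reconstituted `approx` is the best rank-k approximation of the original counts, with the least influential structure (the small singular values) thrown away.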
There are some other useful applications of this same principle (broadly called "Latent Semantic Analysis" or LSA). For instance, we can automatically discover terms that are related to one another even though they may not co-occur in texts. The reduction and reconstitution approach, when applied to the contexts in which the terms occur, will tend to "fold" together contexts that are similar, exposing the contextual similarity of terms. This has applications in information retrieval, automatic essay grading and even machine translation. For the latter, if we take "parallel" texts (texts that are translations of one another by human translators), we can fold them all into the same reduced subspace and get semantically-similar terms usefully joined together.
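The "folding" effect is easy to see on a toy example. Below, "car" and "automobile" never co-occur, so their raw count vectors are orthogonal; after reducing to one dimension their shared context ("engine") pulls them together (an illustration of the LSA idea, not any particular system):

```python
import numpy as np

# Term-by-context matrix: car and automobile never co-occur,
# but both appear alongside engine.
M = np.array([
    [1.0, 0.0],   # car: context 1 only
    [0.0, 1.0],   # automobile: context 2 only
    [1.0, 1.0],   # engine: both contexts
])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 1
reduced = U[:, :k] * s[:k]   # term vectors folded into the reduced space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

raw_sim = cos(M[0], M[1])                  # 0: no co-occurrence at all
folded_sim = cos(reduced[0], reduced[1])   # ~1: contexts folded together
```

The same trick applied across parallel texts is what joins semantically similar terms from different languages in a shared subspace.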
Terence Tao's presentation is clearly aimed at grad students, advanced undergrads and other mathematics professionals, so his language tends to be fairly non-cohering (not, I note, that I think him incoherent!), and much of the background is left out or connected via Wikipedia links. The links are a nice addition, helpful to those of us not active in the field, and a technique that provides a little more textual cohesion without unduly bothering the expert.
Sunday, July 29, 2007
Semantics and Sonny Bono
"Bono and the tree became one"
That sentence has been an object of scrutiny for me over the past several weeks. It is short enough and the meaning seems fairly easy to digest: Sonny Bono died in a skiing accident. It might have shown up in a blog back when the event transpired, or in casual conversation around the same time.
So what is so fascinating about it? It is the range of semantic tools that are needed to resolve Bono to Sonny Bono and not to U2's Bono or any of the thousands of other Bonos that likely exist. First, we need background knowledge that Sonny Bono died in a skiing accident. Next we need either the specific knowledge that a tree was involved or the inference that skiing accidents sometimes involve trees. Finally, we need a choice preference that rates notable people as more likely to be the object of the discussion than everyday folk.
We could still be wrong, of course. The statement might be about Frank Bono, a guy from down the street who likes to commune with nature. It might be, but for a statement in isolation the notability preference serves a de facto role as a disambiguator.
How, then, can we design technology to assign the correct referent to occurrences like Bono in the text above? We have several choices, and they overlap to varying degrees. We could, for instance, collect together all of the contexts that contain the term Bono (with or without Sonny), label them as to their referent, and try to infer statistical models that use the term context to partition our choices. This could be as simple as using a feature vector of counts of terms that co-occur with Bono and then looking at the vector distance between a new context vector (formed from the sentence above) and the existing assignments.
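That simplest version fits in a few lines. The profile word lists below are invented for illustration; a real system would derive them from the labeled contexts:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical co-occurrence profiles for each candidate referent,
# standing in for counts aggregated from labeled contexts.
profiles = {
    "Sonny Bono": Counter("sonny cher skiing singer congressman tree".split()),
    "Bono (U2)": Counter("u2 band vocalist dublin activist".split()),
}

def disambiguate(sentence):
    """Pick the referent whose profile is closest to the sentence context."""
    ctx = Counter(sentence.lower().split())
    return max(profiles, key=lambda name: cosine(ctx, profiles[name]))
```

Here the lone word "tree" is enough context to pull the sentence toward the skiing-accident referent.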
We could also try to create a model that recreates our selection preferences and the skiing <-> tree relationship, and that does some matching combined with some inferencing to try to identify the correct referent. That is fairly tricky to do over the vast sea of possible names, but it is easy enough for a single one, like Bono.
It turns out all of these approaches have been tried, as well as interesting hybridizations of them: for instance, expressing the notability preference as a probability weighting based on web-search mentions while adding in the distance between different concepts in a tree-based ontology, exploiting human-created semantic knowledge to assist in the process. Fairly simple statistics do pretty well over large sets of names (just choose the most likely assignment all the time), but they don't really capture the kinds of semantic processing that we believe we undertake in our own "folk psychologies" as described above.
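One way to picture such a hybrid is a log-linear mix of a notability prior and a context score. All of the numbers below are made up for illustration (the prior standing in for normalized web-mention counts, the context score for a similarity model's output):

```python
import math

# Hypothetical notability priors (e.g. normalized web-mention counts)
# and context-similarity scores from some disambiguation model.
priors = {"Sonny Bono": 0.3, "Bono (U2)": 0.7}
context_sim = {"Sonny Bono": 0.60, "Bono (U2)": 0.05}

def hybrid_score(name, weight=2.0):
    """Log-linear combination: prior evidence plus weighted context evidence."""
    return math.log(priors[name]) + weight * context_sim[name]

best = max(priors, key=hybrid_score)
```

With no usable context the prior wins (matching the "just choose the most likely assignment" baseline); strong context evidence, like the tree in our sentence, can override it.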
Still, I see the limited success of knowledge resources as an opportunity rather than a source of discouragement. We definitely have not exhausted the well.
