Friday, December 7, 2007

Dimensional Folding and Coherence

I was reading Terence Tao's blog on mathematics earlier today, enjoying some measure of understanding since his recent posts and lectures focus on combinatorics. Beyond the subject matter, though, I was interested in the way he uses his blog to communicate complex ideas. The method is unusual in that it is less formal than a book presentation, less holographic than a journal article written for professional publication, more technical than an article in a popular science magazine, and yet not as sketchy as a series of PowerPoint slides. And, of course, there is interaction with readers as well.

There is some rather interesting work in cognitive psychology and psycholinguistics on the relationship between writing style and reader uptake. Specifically, the construction-integration model developed by Walter Kintsch and others tries to tease out how learning from a text is modulated by the learner's pre-existing cognitive model. There is a parallel with "constructivism" in educational circles, which holds that learning is a process of building, tearing down and re-engineering cognitive frameworks over time, so that each student must be understood as bringing a unique, pre-existing knowledge system to the classroom.

In construction-integration research, an odd effect has been observed: when a text has lots of linking words bridging concepts from paragraph to paragraph, readers with limited understanding of a field get up to speed more fluidly than when the text is high-level and not written with them in mind. Conversely, readers who are advanced in the subject matter actually learn faster when the text is sparser and they must bridge the gaps with their own pre-existing mental model.

We can even measure this notion of textual coherence, or cohesion, using some pretty mathematics. If we count the terms shared between each paragraph and the next, and then try to eliminate the noisy outliers, we get a good estimate of the relative cohesion between paragraphs. The outliers arise from lexical ambiguity, or because many terms (function words such as "the" and "of") are more syntactically useful than semantically significant.
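A minimal sketch of the raw counting step, before any de-noising, might look like the following. It assumes a naive lowercase regex tokenizer and scores each pair of consecutive paragraphs by the cosine similarity of their term-count vectors; the function names are hypothetical, and real cohesion tools use more careful tokenization. Note that matches on words like "the" and "is" inflate the score, which is exactly the kind of noise described above.

```python
import math
import re
from collections import Counter


def term_counts(paragraph: str) -> Counter:
    """Count lowercase word tokens in one paragraph (naive regex tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z]+", paragraph.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors; shared terms drive the dot product."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjacent_cohesion(paragraphs: list[str]) -> list[float]:
    """Raw (not de-noised) overlap score for each pair of consecutive paragraphs."""
    counts = [term_counts(p) for p in paragraphs]
    return [cosine(counts[i], counts[i + 1]) for i in range(len(counts) - 1)]


# Usage on three made-up paragraphs: the first pair shares more terms than the second.
paras = [
    "The matrix of counts is decomposed.",
    "Each count in the matrix is a term frequency.",
    "Readers bring their own models to a text.",
]
print(adjacent_cohesion(paras))
```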

A singular value decomposition (SVD) is one way to do the de-noising. In essence, the SVD factors a large matrix of counts into a product of three matrices, the middle one of which holds the "singular values" along its diagonal. We can order those values by magnitude, discard the small ones, and then reconstitute an approximation of the original matrix. In effect, we are asking which underlying dimensions of the data contribute little or nothing to the structure of the original matrix and dropping them, so that the reconstituted counts reflect the dominant patterns of co-occurrence rather than incidental ones.
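The truncate-and-reconstitute step can be sketched with NumPy's `np.linalg.svd`. This is a minimal illustration, assuming a small made-up term-by-paragraph matrix and a hand-picked number of kept dimensions `k`; the helper name `truncated_reconstruction` is hypothetical.

```python
import numpy as np


def truncated_reconstruction(counts: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation of `counts`: keep the k largest singular values, zero the rest."""
    # Thin SVD: counts == U @ np.diag(s) @ Vt, with s returned largest-first by NumPy
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    s_kept = np.where(np.arange(s.size) < k, s, 0.0)  # discard the small singular values
    return U @ np.diag(s_kept) @ Vt


# Toy term-by-paragraph counts (rows: terms, columns: paragraphs); values are invented.
counts = np.array([
    [2, 0, 1],
    [1, 1, 0],
    [0, 3, 1],
    [1, 0, 2],
], dtype=float)

smoothed = truncated_reconstruction(counts, k=2)
print(np.round(smoothed, 2))
```

Keeping every singular value returns the original matrix unchanged; keeping fewer yields the closest lower-rank matrix in the least-squares sense, which is why zero counts can become small nonzero values after reconstitution.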

There are other useful applications of this same principle (broadly called "Latent Semantic Analysis", or LSA). For instance, we can automatically discover terms that are related to one another even though they may not co-occur in the same texts. The reduction and reconstitution approach, when applied to the contexts in which terms occur, tends to "fold" similar contexts together, exposing the contextual similarity of the terms. This has applications in information retrieval, automatic essay grading and even machine translation. For the latter, if we take "parallel" texts (texts that are human translations of one another), we can fold them all into the same reduced subspace and usefully join semantically similar terms across the languages.
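Here is a minimal sketch of the term-similarity idea, assuming an invented five-term, three-context matrix in which "car" and "automobile" never appear together but both appear alongside "engine". Each term is projected into a `k`-dimensional latent space by scaling the left singular vectors by the singular values; the helper names and the choice `k=2` are illustrative assumptions, not a prescribed LSA configuration.

```python
import numpy as np


def term_vectors(counts: np.ndarray, k: int) -> np.ndarray:
    """Project each term (row of a term-by-context matrix) into a k-dimensional latent space."""
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Row i is U[i, :k] scaled by the top-k singular values (equivalently counts @ V_k)
    return U[:, :k] * s[:k]


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


terms = ["car", "automobile", "engine", "flower", "petal"]
# Columns are three toy contexts; "car" and "automobile" never share one.
counts = np.array([
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # flower
    [0, 0, 1],  # petal
], dtype=float)

raw = cosine(counts[0], counts[1])   # no shared context, so 0.0
vecs = term_vectors(counts, k=2)
folded = cosine(vecs[0], vecs[1])    # the shared "engine" context folds them together
print(f"raw: {raw:.3f}  folded: {folded:.3f}")
```

For the parallel-text case, one common arrangement (cross-language LSI) stacks the vocabularies of both languages as rows, so that each column is a document together with its translation; terms from both languages then land in the same reduced space and can be compared the same way.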

Terence Tao's presentation is clearly aimed at graduate students, advanced undergraduates and other mathematics professionals, so his prose tends to be fairly low in cohesion (which is not, I note, to say that he is incoherent!), and much of the background is either left out or connected via Wikipedia links. The links are a nice addition for those of us not active in the field, and a technique that provides a little more textual cohesion without unduly bothering the expert.
