
Friday, May 23, 2008

Assault and Ambiguity

It began with a chance encounter. I was walking through a room with a TV on, and a news caption was running during a commercial break. The newscaster intoned with provocative seriousness that a woman had followed the man who had sexually assaulted her weeks earlier back to his home, and that he had been arrested. The location was my town, and it rang a bell.

Three weeks back my wife and I went to a doctor’s appointment in Lafayette and, as we returned and entered our neighborhood, we were surprised to see a half-dozen police cars in our otherwise perfectly tame patchwork of planned homes and parks. During a walk two hours later there was still a squad car on one side street. A scan of the police blotter turned up the cause: a woman walking with her toddler had been assaulted and groped by a man who ran away when struck in the groin with a sippy cup.

Weeks passed until I overheard the news of the arrest. Good for her! The next phase of data mining impressed me with the thoroughness of the picture. Using the television station archives combined with Google, I was able to find the mug shots, the original police sketch of the suspect, the suspect’s arrest status in the county courts system, the location of the alleged perpetrator’s house, the suspect’s father’s name and place of business, the suspect’s mother’s name and place of business, a previous citation of the suspect for a moving violation (infraction) in a neighboring city, the county records concerning the amount and type of mortgage held on the suspect’s home, and a satellite view of the home as well.

Amusing, also, was that the reporter in the news piece actually drove by our house and coincidentally filmed our various vehicles. I could likely have read the license plates from the footage if I had wanted to.

Overall, I had managed to scour out all the corners of ambiguity concerning when, where, who and how, leaving only the strange question of why in fuzziness. Why was this 24-year-old still living at home, jogging at midday and preying on middle-aged women? Why was he living in this neighborhood, where even a Megan’s Law offender is fairly hard to find?

But strangely, it was the suspect’s last name that was the key to developing the search picture, because the last name was so unusual. Had he been “Jim Smith” or “Joe Sanchez” or “Mike White” it would have been virtually impossible to make as much headway in extinguishing ambiguity. George Miller is quoted as saying something like “There is only one problem in Artificial Intelligence: words have more than one meaning.” (And I can’t resolve the ambiguity of the source of that quote because George Miller is too ambiguous.) This problem is amplified when searching across identities of places and people, or when special identifiers are introduced as placeholders in a single document (this happens quite often in technical literature, where an acronym is used locally as a technical shorthand but is ambiguous outside that document or domain). Moving to the level of folksonomies for, say, labeling pictures on the web, we see the problem exacerbated by the natural telegraphic shorthand that any labeling scheme suggests to the user purely by dint of the size of the entry fields.

Clever approaches to trying to apply context to help address these limitations start with statistical co-occurrence-based disambiguation and linkage analysis, and then run all the way through to using complex ontologies to try to infer the best relabeling of the ambiguous entity or concept as a canonical identifier. None of these methods can hope to achieve any level of perfection but a basket of them can enhance the process of information discovery and disambiguation.
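For the curious, here is a toy sketch of the co-occurrence idea in Python. The candidate profiles and the test sentence are invented for illustration, not drawn from the actual case, and a real system would build its profiles from large labeled corpora.

```python
from collections import Counter

def context_counts(text):
    """Bag-of-words counts for the words surrounding a mention."""
    return Counter(w.lower().strip(".,!?\"'") for w in text.split())

# Hypothetical co-occurrence profiles gathered from documents whose referent is known.
profiles = {
    "suspect in the local case": context_counts("arrest mug shot county court assault neighborhood suspect"),
    "unrelated namesake":        context_counts("restaurant owner chamber of commerce ribbon cutting"),
}

def disambiguate(mention_context, profiles):
    """Score each candidate by the overlap between its profile and the new context."""
    ctx = context_counts(mention_context)
    scores = {name: sum((ctx & prof).values()) for name, prof in profiles.items()}
    return max(scores, key=scores.get), scores

print(disambiguate("police made an arrest in the assault case in our neighborhood", profiles))
```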

Sunday, January 13, 2008

Codes and Fervor

I stumbled onto the continuing saga of the so-called “bible code” the other day and was amused to see that the issue continues to percolate along fourteen years after the original effort appeared in Statistical Science as a “puzzle.” My own involvement came briefly in the early years, when I developed some code for performing searches in documents that matched the original effort. I handed the code off to Dave Thomas of New Mexicans for Science and Reason, though I believe he was already using the results from Brendan McKay of the Australian National University for his work.

A brief description may help.

Some Israeli researchers, following a Kabbalah-like speculation about hidden codes in the Hebrew OT, looked for words and word relationships hidden in the characters. They thought the words were hidden as what they called equidistant letter sequences (ELSs). An ELS is a word whose successive letters are separated in the text by a fixed number of other characters. When they found an ELS, they then looked in the immediate area of the text around the word for other ELSs that said something interesting about the original word. They paired these together as questions and answers.

Needless to say, it is pretty easy to take any text, find interesting short words as ELSs and then find interesting words as ELSs around them. With my original code, I used the system to decide what to have for lunch. I would find the word “lunch” as an ELS, then look around and get words like “taco,” “steak” and “fish.” As one can imagine, shorter words tend to have greater representation!
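Here is a rough sketch of that kind of ELS search (in Python, not my original code); the input file name and the target word are just placeholders, and the word is assumed to be lowercase letters only.

```python
def find_els(text, word, max_skip=50):
    """Yield (start, skip) pairs where `word` occurs as an equidistant letter sequence."""
    letters = [c for c in text.lower() if c.isalpha()]
    n, m = len(letters), len(word)
    for skip in range(1, max_skip + 1):
        for start in range(n - (m - 1) * skip):
            if all(letters[start + i * skip] == word[i] for i in range(m)):
                yield start, skip

# "moby_dick.txt" is a placeholder; any long text will do.
text = open("moby_dick.txt").read()
for start, skip in find_els(text, "lunch"):
    print(f"'lunch' as an ELS starting at letter {start} with skip {skip}")
```

Finding a second word as an ELS "near" the first is just a matter of re-running the search over a window of letters around each hit.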

Here are some examples borrowed from Brendan McKay showing ELS patterns predicting the assassination of Martin Luther King, Jr. in the text of Moby Dick:



More are here.

The whole episode demonstrates the surprising vitality of craziness impregnated with religious fervor, considering the start in 1994, the analyses and counter-analyses from various fronts, the publication of a best-selling book, and the availability of commercial systems that now help you do your own bible code analyses.

A slightly less crazy speculation comes from Clifford Pickover (although it's not clear what the original source is): given a (countably) infinite digit sequence for pi, Shakespeare must inevitably be coded within those digits given a suitable representation scheme. While that might be, ELSs in pi would be rarer than in human-language texts, I think, because the digit probabilities of pi are very uniform after a few thousand digits, unlike most languages, which tend to have a more skewed distribution of letters. So ELS words would be way down deep in the code, though not quite as far along as The Bard.

Friday, December 7, 2007

Dimensional Folding and Coherence

I was reading Terence Tao's blog on mathematics earlier today, enjoying some measure of understanding since his recent posts and lectures focus on combinatorics. In addition to the subject matter, though, I was interested in the way he is using his blog to communicate complex ideas. The method is rather unique in that it is less formal than a book presentation, less holographic than a journal article for professional publication, more technical than an article in a popular science magazine, and yet not as sketchy as just throwing up a series of PowerPoint slides. And of course there is interaction with readers, as well.

There is some rather interesting work in cognitive psychology and psycholinguistics on the relationship between writing styles and reader uptake. Specifically, the construction-integration model by Walter Kintsch and others tries to tease out how information learning is modulated by the learner's pre-existing cognitive model. There is a bit of parallelism with "constructivism" in educational circles that postulates that learning is a process of building, tearing down and re-engineering cognitive frameworks over time, requiring each student to be uniquely understood as bringing pre-existing knowledge systems to the classroom.

In construction-integration theory, an odd fact has been observed: if a text has lots of linking words bridging concepts from paragraph to paragraph, those with limited understanding of a field can get up to speed with greater fluidity than if the text is high-level and not written with that expectation in mind. In turn, those who are advanced in the subject matter actually learn faster when the text is more sparse and the learner bridges the gaps with their pre-existing mental model.

We can even measure this notion of textual coherence or cohesion using some pretty mathematics. If we count all the shared terms from one paragraph to the next in a text and then try to eliminate the noisy outliers, we can get a good estimate of the relative cohesion between paragraphs. These outliers arise due to lexical ambiguity or because many terms are less semantically significant than they are syntactically valuable.
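A toy version of the counting step might look like the following sketch; the stop-word list and the commented-out input are placeholders, and the real de-noising comes in the next step.

```python
import re

STOP = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}   # tiny illustrative stop list

def terms(paragraph):
    """The content-bearing word types in a paragraph."""
    return {w for w in re.findall(r"[a-z]+", paragraph.lower()) if w not in STOP}

def adjacent_overlap(paragraphs):
    """Count of shared terms between each pair of consecutive paragraphs."""
    sets = [terms(p) for p in paragraphs]
    return [len(a & b) for a, b in zip(sets, sets[1:])]

# paragraphs = open("some_text.txt").read().split("\n\n")   # placeholder input
# print(adjacent_overlap(paragraphs))
```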

A singular value decomposition (SVD) is one way to do the de-noising. In essence, the SVD is an operation that factors a large matrix of counts into a product of three matrices, one of which contains the "singular values" along its diagonal. We can then order those values by magnitude, eliminate the small ones, and reconstitute a version of the original matrix. By doing this, we are in effect asking which of the original counts contribute little or nothing to the structure of the matrix and eliminating those less influential terms.
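A minimal sketch of that reduce-and-reconstitute step using numpy's SVD; the counts matrix here is random stand-in data rather than counts from a real text.

```python
import numpy as np

def denoise(counts, k):
    """Keep only the k largest singular values, then reconstitute the matrix."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    s = s.copy()
    s[k:] = 0.0                      # drop the small singular values
    return U @ np.diag(s) @ Vt

# counts[i, j] = how often term j occurs in paragraph i (random stand-in data here)
counts = np.random.poisson(1.0, size=(8, 20)).astype(float)
smoothed = denoise(counts, k=3)
```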

There are some other useful applications of this same principle (broadly called "Latent Semantic Analysis" or LSA). For instance, we can automatically discover terms that are related to one another even though they may not co-occur in texts. The reduction and reconstitution approach, when applied to the contexts in which the terms occur, will tend to "fold" together contexts that are similar, exposing the contextual similarity of terms. This has applications in information retrieval, automatic essay grading and even machine translation. For the latter, if we take "parallel" texts (texts that are translations of one another by human translators), we can fold them all into the same reduced subspace and get semantically similar terms usefully joined together.
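Continuing the sketch above, term vectors in the reduced space can be compared directly; the rank k and the counts matrix are the same stand-ins as before.

```python
import numpy as np

def term_vectors(counts, k):
    """k-dimensional term vectors: rows of V scaled by the singular values."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    return Vt[:k].T * s[:k]          # one k-dimensional vector per term (column of counts)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

vecs = term_vectors(counts, k=3)     # `counts` as in the sketch above
print(cosine(vecs[0], vecs[1]))      # similarity of the first two terms
```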

Terence Tao's presentation is clearly aimed at grad students, advanced undergrads and other mathematics professionals, so his language tends to be fairly non-cohering (not, I note, that I think him incoherent!), and much of the background is left out or is connected via Wikipedia links. The links are a nice addition, helpful to those of us not active in the field, and a technique that provides a little more textual cohesion without unduly bothering the expert.

Sunday, July 29, 2007

Semantics and Sonny Bono


"Bono and the tree became one"

That sentence has been an object of scrutiny for me over the past several weeks. It is short enough and the meaning seems fairly easy to digest: Sonny Bono died in a skiing accident. It might have shown up in a blog back when the event transpired, or in casual conversation around the same time.

So what is so fascinating about it? It is the range of semantic tools needed to resolve Bono to Sonny Bono and not to U2's Bono or to any of the thousands of other Bonos who likely exist. First, we need background knowledge that Sonny Bono died in a skiing accident. Next, we need either the specific knowledge that a tree was involved or the inference that skiing accidents sometimes involve trees. Finally, we need a choice preference that rates notable people as more likely to be the object of the discussion than everyday folk.

We could still be wrong, of course. The statement might be about Frank Bono, a guy from down the street who likes to commune with nature. It might be, but for a statement in isolation the notability preference serves a de facto role as a disambiguator.

How, then, can we design technology to assign the correct referent to occurrences like Bono in the text above? We have several choices, and the choices overlap to varying degrees. We could, for instance, collect together all of the contexts that contain the term Bono (with or without Sonny), label them as to their referent, and try to infer statistical models that use the term context to partition our choices. This could be as simple as using a feature vector of counts of terms that co-occur with Bono and then looking at the vector distance between a new context vector (formed from the sentence above) and the existing assignments.
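A bare-bones sketch of that vector idea follows; the labeled snippets are invented stand-ins for real annotated contexts, and a real system would use many of them per referent.

```python
from collections import Counter
import math

def bag(text):
    """Bag-of-words counts for a snippet of text."""
    return Counter(w.lower().strip(".,\"'") for w in text.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm + 1e-12)

# Invented labeled contexts standing in for real annotated data.
labeled = {
    "Sonny Bono": bag("skiing accident tree congressman Cher singer died"),
    "Bono (U2)":  bag("U2 rock band Dublin debt relief concert tour"),
}

sentence = "Bono and the tree became one"
best = max(labeled, key=lambda ref: cosine(bag(sentence), labeled[ref]))
print(best)   # the tree/skiing context should pull this toward Sonny Bono
```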

We could also try to create a model that recreates our selection preferences and the skiing <-> tree relationship and does some matching combined with some inferencing to try to identify the correct referent. That is fairly tricky to do over the vast sea of possible names, but is easy enough for a single one, like Bono.

It turns out all of these approaches have been tried, as well as interesting hybridizations of them. For instance, one can express the notability preference as a probability weighting based on web-search mentions, while adding in the distance between different concepts in a tree-based ontology, exploiting human-created semantic knowledge to assist in the process. In practice, fairly simple statistics do pretty well over large sets of names (just choose the most likely assignment all the time), but they don't really capture the kinds of semantic processing that we believe we undertake in our own "folk psychologies," as described above.
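A toy hybrid along those lines, reusing the bag and cosine helpers and the labeled contexts from the sketch above; the mention counts here are invented numbers standing in for web-search statistics.

```python
# Invented mention counts standing in for web-search statistics.
mention_counts = {"Sonny Bono": 500_000, "Bono (U2)": 5_000_000}
total = sum(mention_counts.values())

def hybrid_score(ref, sentence):
    """Notability prior times the context similarity from the sketch above."""
    prior = mention_counts[ref] / total
    return prior * cosine(bag(sentence), labeled[ref])

ranked = sorted(labeled, key=lambda ref: hybrid_score(ref, sentence), reverse=True)
print(ranked)   # the context score can still override a strong notability prior
```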

Still, I see the limited success of knowledge resources as an opportunity rather than a source of discouragement. We definitely have not exhausted the well.