Overall, I had managed to scour out all the corners of ambiguity concerning when, where, who and how, leaving only the strange question of why still fuzzy. Why was this 24-year-old still living at home, jogging at midday and preying on middle-aged women? Why was he living in this neighborhood where even a Megan’s Law offender is fairly hard to find?
Showing posts with label semantics. Show all posts
Friday, May 23, 2008
Assault and Ambiguity
Labels:
computational linguistics,
folksonomies,
semantics
Wednesday, August 15, 2007
Moore and Semantic Skepticism
Strange, I found the Paglia essay interpenetrating all of my thoughts over the past few days, dredging up language swarms from old Derrida and Feyerabend essays, and dipping over into my work on disambiguation and ontology. See, linguistics-wise, I was once an empiricist with an almost palpable antagonism to the value of knowledge resources like ontologies in solving specific problems. I would reach first for a statistical model that was trained on the contexts of word occurrences, expecting that words can only be known by the company that they keep.
Even the notion that the Semantic Web can achieve any level of crispness in assigning metadata to online content was doubtful in that it was inherently impossible for content authors to assign metadata consistently. The position is postmodern relativism, if you will, derived from the same kind of semantic and pragmatic arguments that have been used to deconstruct machine learning: do I translate this as "terrorist" or "freedom fighter"? Well, what is your frame of reference? What is your meta-narrative?
A radical position is the folksonomy view that folks themselves are the best determiners of how to tag metadata. In this view, they use whatever tags seem appropriate based on their own intuitions about the content. But does this get us around the Bono issue, below? Unlikely. It seems more appropriate to purely abstract and controversial concepts like "terrorist" or "justice".
So I think we need a gradation of semantic forms that range from relatively simple propositions about identity up through propositions about meaning and intent. The latter are purely Wittgensteinian word games, with agreement and disagreement strewn across the symbol space, but the former have lower average rates of disagreement over referential attachment.
This parallels the notion of post-postmodernism in a way, by accepting fluidity and chaotic symbol/signifier interactions but still anticipating a useful and uncontroversial basis for facts. G.E. Moore would raise his hand in salute.
Sunday, July 29, 2007
Semantics and Sonny Bono
"Bono and the tree became one"
That sentence has been an object of scrutiny for me over the past several weeks. It is short enough and the meaning seems fairly easy to digest: Sonny Bono died in a skiing accident. It might have shown up in a blog back when the event transpired, or in casual conversation around the same time.
So what is so fascinating about it? It is the range of semantic tools that are needed to resolve Bono to Sonny Bono and not to U2's Bono or any of the thousands of other Bonos that likely exist. First, we need background knowledge that Sonny Bono died in a skiing accident. Next we need either the specific knowledge that a tree was involved or the inference that skiing accidents sometimes involve trees. Finally, we need a choice preference that rates notable people as more likely to be the object of the discussion than everyday folk.
We could still be wrong, of course. The statement might be about Frank Bono, a guy from down the street who likes to commune with nature. It might be, but for a statement in isolation the notability preference serves a de facto role as a disambiguator.
How, then, can we design technology to assign the correct referent to occurrences like Bono in the text above? We have several choices and the choices overlap to varying degrees. We could, for instance, collect together all of the contexts that contain the term Bono (with or without Sonny), label them as to their referent, and try to infer statistical models that use the term context to partition our choices. This could be as simple as using a feature vector of counts of terms that co-occur with Bono and then looking at the vector distance between a new context vector (formed from the sentence above) and the vectors for the existing assignments.
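The co-occurrence idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labeled contexts, the stopword list, and the function names are all made up for the example, and cosine similarity stands in for whatever vector distance one might actually prefer.

```python
import math
from collections import Counter

# Hypothetical stopword list; without it, words like "the" swamp the signal.
STOPWORDS = {"a", "an", "and", "the", "at", "in", "on", "with",
             "he", "of", "while", "after", "before"}

# Hypothetical hand-labeled contexts (invented for illustration only).
LABELED = [
    ("Sonny Bono", "Bono died after hitting a tree while skiing at a Tahoe resort"),
    ("Sonny Bono", "Bono sang with Cher before he entered Congress"),
    ("U2's Bono", "Bono and the band U2 played a stadium concert in Dublin"),
    ("U2's Bono", "Bono sang the new U2 single on tour"),
]

def context_vector(text, target="bono"):
    """Count the non-stopword terms that co-occur with the target term."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    return Counter(t for t in tokens
                   if t and t != target and t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_centroids(labeled):
    """Sum the context vectors for each referent into one vector."""
    centroids = {}
    for referent, text in labeled:
        centroids.setdefault(referent, Counter()).update(context_vector(text))
    return centroids

def disambiguate(sentence, centroids):
    """Pick the referent whose summed context vector is closest."""
    query = context_vector(sentence)
    return max(centroids, key=lambda r: cosine(query, centroids[r]))

centroids = build_centroids(LABELED)
print(disambiguate("Bono and the tree became one", centroids))
```

With this toy data the only overlapping term is "tree", which is enough to pull the sentence toward Sonny Bono. A real system would need far more labeled contexts and some smoothing before the distances meant anything.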
We could also try to create a model that recreates our selection preferences and the skiing <-> tree relationship and does some matching combined with some inferencing to try to identify the correct referent. That is fairly tricky to do over the vast sea of possible names, but is easy enough for a single one, like Bono.
It turns out all of these approaches have been tried, as well as interesting hybridizations of them. For instance, one can express the notability preference as a probability weighting based on web search mentions, while adding in the distance between different concepts in a tree-based ontology, trying to exploit human-created semantic knowledge to assist in the process. It turns out that fairly simple statistics do pretty well over large sets of names (just choose the most likely assignment all the time), but don't really capture the kinds of semantic processing that we believe we undertake in our own "folk psychologies" as described above.
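One way such a hybrid might look is sketched below in Python. Everything in it is an assumption made for illustration: the mention counts are invented rather than measured, the tiny ontology is hand-built, and the scoring rule (log prior minus a weighted tree distance) is just one plausible way to combine the two signals, not a description of any particular published system.

```python
import math

# Hypothetical toy ontology, stored as child -> parent links.
PARENT = {
    "tree": "forest",
    "forest": "outdoors",
    "skiing": "mountain sport",
    "mountain sport": "outdoors",
    "outdoors": "thing",
    "rock music": "music",
    "music": "arts",
    "arts": "thing",
}

# Invented mention counts and one associated concept per candidate.
CANDIDATES = {
    "Sonny Bono": {"mentions": 300, "concept": "skiing"},
    "U2's Bono": {"mentions": 700, "concept": "rock music"},
}

def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def tree_distance(a, b):
    """Count the edges between two concepts via their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(up_b)}
    for i, c in enumerate(up_a):
        if c in depth_in_b:
            return i + depth_in_b[c]
    raise ValueError("concepts do not share a root")

def score(name, context_concept, weight):
    """Log notability prior, penalized by ontology distance to the context."""
    total = sum(c["mentions"] for c in CANDIDATES.values())
    log_prior = math.log(CANDIDATES[name]["mentions"] / total)
    distance = tree_distance(context_concept, CANDIDATES[name]["concept"])
    return log_prior - weight * distance

def resolve(context_concept, weight=1.0):
    """Choose the candidate with the best combined score."""
    return max(CANDIDATES, key=lambda n: score(n, context_concept, weight))

print(resolve("tree", weight=0.0))  # prior only: the most-mentioned Bono
print(resolve("tree", weight=1.0))  # prior plus ontology distance
```

Setting the weight to zero reproduces the "just choose the most likely assignment" baseline, which here picks U2's Bono. With the ontology term switched on, the shorter path from "tree" to "skiing" tips the choice to Sonny Bono.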
Still, I see the limited success of knowledge resources as an opportunity rather than a source of discouragement. We definitely have not exhausted the well.
