Friday, May 23, 2008

Assault and Ambiguity

It began with a chance encounter. I was walking through a room with a TV on and a news caption was running during a commercial break. The newscaster intoned with provocative seriousness how a woman had followed a man who had sexually assaulted her weeks earlier to his home and he had been arrested. The location was my town and it rang a bell.

Three weeks back my wife and I went to a doctor’s appointment in Lafayette and, as we returned and entered our neighborhood, we were surprised to see a half-dozen police cars in our otherwise perfectly tame patchwork of planned homes and parks. During a walk two hours later there was still a squad car on one side street. A scan of the police blotter turned up the cause: a woman walking with her toddler had been assaulted and groped by a man who ran away when struck in the groin with a sippy cup.

Weeks passed until I overheard the news of the arrest. Good for her! The next phase of datamining impressed me with the thoroughness of the picture. I was able to use the television station archives combined with Google to find the mug shots, the original sketch of the suspect done by police sketch artists, the suspect’s arrest status in the county courts system, the location of the alleged perpetrator’s house, the suspect’s father’s name and place of business, the suspect’s mother’s name and place of business, a previous citation of the suspect for a moving violation (infraction) in a neighboring city, the county records concerning the amount and type of mortgage held on the suspect’s home, and a satellite view of the home as well.

Amusingly, the reporter in the news piece also drove by our house and coincidentally filmed our various vehicles. I could likely have read the license plates from the footage if I had wanted to.

Overall, I had managed to scour out all the corners of ambiguity concerning when, where, who and how, leaving only the strange question of why in fuzziness. Why was this 24-year-old still living at home, jogging at midday and preying on middle-aged women? Why was he living in this neighborhood where even a Megan’s Law offender is fairly hard to find?

But strangely, it was the suspect’s last name that was the key to developing the search picture, because the last name was so unusual. Had he been “Jim Smith” or “Joe Sanchez” or “Mike White” it would have been virtually impossible to make as much headway in extinguishing ambiguity. George Miller is quoted as saying something like “There is only one problem in Artificial Intelligence: words have more than one meaning.” (And I can’t pin down the source of that quote because the name “George Miller” is itself too ambiguous.) This problem is amplified when searching across identities of places and people, or when special identifiers are introduced as placeholders in a single document (this happens quite often in technical literature, where an acronym is used locally as a technical shorthand but is ambiguous outside that document or domain). Moving to the level of folksonomies for, say, labeling pictures on the web, we see the problem exacerbated by the natural telegraphic shorthand that any labeling scheme suggests to the user purely by dint of the size of the entry fields.

Clever approaches that apply context to address these limitations start with statistical co-occurrence-based disambiguation and linkage analysis, and then run all the way through to using complex ontologies to try to infer the best relabeling of the ambiguous entity or concept as a canonical identifier. None of these methods can hope to be perfect, but a basket of them can enhance the process of information discovery and disambiguation.
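As a rough illustration of the simplest of these, co-occurrence-based disambiguation, here is a minimal Python sketch. It assumes each candidate identity comes with a short bag of words that tend to co-occur with it, and it picks the candidate whose words best overlap the context around the ambiguous name. The candidate identifiers, their word profiles, and the function names are all invented for the example; a real system would learn the profiles from a corpus and use better weighting than raw counts. The score only measures word overlap and says nothing about who actually said the quote.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(context, candidates):
    """Return (best_candidate_id, scores) for an ambiguous mention.

    `candidates` maps a canonical identifier to a string of words that
    typically co-occur with that identity (hypothetical profiles here).
    """
    context_vec = Counter(tokenize(context))
    scores = {
        cid: cosine(context_vec, Counter(tokenize(profile)))
        for cid, profile in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical co-occurrence profiles for two people sharing one name.
CANDIDATES = {
    "george_miller_psychologist": "psychologist cognitive language memory words",
    "george_miller_director": "film director movies screenplay producer",
}

best, scores = disambiguate(
    "a quote about words, meaning and language in artificial intelligence",
    CANDIDATES,
)
print(best, scores)
```

With a common name and a sparse context the scores will often tie at or near zero, which is the “Jim Smith” problem in miniature: without distinctive co-occurring words there is nothing for the method to grab onto.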
