Saturday, April 28, 2007

Debate and Outcomes

As I wander through Jon Meacham's American Gospel: God, the Founding Fathers, and the Making of a Nation, I find myself wondering (sometimes out loud, my wife reports) what the potential upshot is of the current public dialog involving atheist thinkers like Dawkins, Sam Harris, and (soon enough) Christopher Hitchens. One of Dawkins's objectives is to actually help people leave religion by showing them that reason and non-religion are better--that non-belief is morally superior to belief. Sam Harris has a kind of spiritual commitment to community and the world as a replacement for supernatural beliefs. Hitchens? I'll have to wait but am certain that it will be a rollicking and lyrical ride.

Now, Dan Dennett began his foray into the lion's den with his discussion with exceptional high schoolers and his promotion of the term "Bright" as a replacement for the vast sea of terms that we use like agnostic, atheist, humanist, rationalist, freethinker and so forth. I consider the term unfortunate because it does cast aspersions on the religious by antonymy: they are "dulls". Dennett has tried to talk around that comparison, but the semantics are now stuck and I doubt the term will gain positive traction toward his original goal.

Still, I do think there is merit to Dennett's core goal of outreach to young people to build awareness that there is a community of freethinkers. Far too often I suspect that young people turn their vague feelings that there is something just not quite right about the religious folks around them into equally vague religious professions simply to get along with others. Knowing that there is a public dialog and serious discussion gives them the option of thinking freely, of overcoming the stigma associated with public non-belief. That is probably the lasting effect of the public discussion, but it does need to be both sustained and managed at the local level.

It is the local level--grassroots activism--that needs effort to help overcome the tendencies for vague consumerism to be overlain with vague religiosity. Maybe the calls for studying religion in schools would be enough to demystify some of the core issues that drive this trend by showing the historical and modern facts. An even-handed approach would certainly call into question specific supernatural and moral claims by juxtaposing all of the competing ideas in the world. And then kids would start asking deeper questions.

Monday, April 23, 2007

Alpha and Kings

Brilliant quote of the week made by friend and business adviser, at Pier 39 in San Francisco, while watching the young male sea lions practice king-of-the-float while the alpha male looked skyward in disdain:

me: And isn't that what adults do is compete?

friend: Well, I don't compete anymore. I get others to compete for me.

Wednesday, April 18, 2007

Ambiguity and Ignorance

I keep thinking that we may be reaching a point where we declare the end of ambiguity.

When I see a face in a movie that looks familiar but I can't quite place, I go to imdb.com and find the actor in question, then trace back his or her career and typically read their bio as well. This happened recently when watching The Departed with my wife. The slightly implausible, too good-looking state counselor with an MD and a PhD turned out to have been...the archer in the short-lived Roar series. So I keep joking about how she is going to loose a quiver of arrows on her enemies throughout the remainder of the film. I got laughs initially and then had to shut up because I just couldn't let go.

And getting lost in Wikipedia still happens, cross-linking through historical treatises on the lives of scientists or the historical build-up to the Napoleonic Wars. Man, music theory on Wikipedia is amazingly dense; the maintainers of those pages are to be commended. Then there is LinkedIn for monitoring the status of friends and colleagues in the technology business. On the less savory side, while looking around the public records section of Santa Clara County, I recently discovered that a neighbor had a lien put on his house in 1993 for failure to pay child support. It fits. I never did like him. It seems I only really use Google for coding examples and occasional driving directions anymore.

"The end of ambiguity" has a nice ring to it as we move forward into the 21st Century. Looking back, we were amazed at how little we knew or could confirm prior to the internet. We lived behind a veil of ignorance, safe in our cocoons of uncertainty. Conflicts emerged over factual matters that could easily be resolved had the right resources been accessible in a timely manner. Lessons learned were far too often not transmitted to the next generation of business, social and technology leadership, resulting in a massive waste of societal effort and brain power. It was not surprising that the impact of information technology on efficiency was either hard to measure or showed negative returns because technology had yet to provide the kinds of information access scales that lead to reduced ambiguity. Computers were just big calculators and filing cabinets before the internet.

Vannevar Bush's notion of Memex comes to mind. Even in 1945, Bush felt researchers were faced with an explosion of information and needed a "memory extender" that would use circuits and microfilm to connect together research papers and ongoing experiments in a given field. Bush wanted information discovery and linking, two things that the internet helps with. My own effort builds on the notion of enhanced discovery to support better personalization of information access. By improving personalization, we increase the rate of discovery, decrease ambiguity and, well, better target advertising to individual needs and interests.

Integration is key, though. Why couldn't I get the bio and filmography of Vera Farmiga from the directory on my DVR? Why can't I read about the history and conflicts of Northern California water policy as I browse through my local paper's discussion of watershed levels? Why can't I request a Wikipedia backgrounder from my car as I pause Sirius satellite radio during an NPR news story on the habitats of the Seychelles Islands? And why can't I rediscover those items later from a central collection with a bird's eye view of my own history?

The end of all ambiguity is likely impossible, but we are still doing an impressive job of lifting the veil of ignorance.

Monday, April 16, 2007

Spring and Construction

Ahh, spring break is over. The little one is back to school and the new DARPA proposal calls are out. And I am at a crossroads as to whether I should bother, but am itching to attack one of the topics because I am certain I could win. One of my new business advisers understands this quite well: do you take the government money and the obligations that it entails to follow through on the research agenda (commercialization is never directly covered in these grants) or do you focus on the main line of business and the growth model that you have projected?

The DARPA topic is particularly interesting to me because my approach to solving the problem would invoke a psychological model known as "construction-integration" (CI). In CI, when someone is learning something, they are integrating prior knowledge with situational knowledge as they read or study the topic. This is closely allied with the educational model known as constructivism, but has some specific and measurable aspects when applied to textbook learning that take it out of the softer realm of educational theory.

Specifically, CI has been used to explain some odd results in text comprehension where those who are well-versed in a topic area learn better when given relatively incoherent texts about a related topic. Now I don't mean that the texts are simply gibberish but merely that they are not measurably as "coherent" as other texts. That is, there are fewer linking ideas between paragraphs, more pronominal references are used and there is more of a burden put on the reader to fill in the gaps. Not surprisingly, for those with little understanding of a topic area highly coherent texts improve their ability to learn those new ideas. Machine-based methods can even score coherence fairly well, which is part of the technology used for automatic essay grading methods.

I would apply CI and coherence to a novel domain, however, to fulfill the DARPA needs. The call looks for a realizable system in 3-5 years, which is an astounding timeframe to my mind in this age of internet acceleration and souped-up disintermediation. And if I took it on, I would have to think about it in that kind of timeframe, something that carries with it a bit of cynicism in that you don't want to move too terribly fast lest you make yourself ineligible for future funding by actually bringing a product to market.

Push, pull. Either way, it is better to have an embarrassment of riches than none at all, I suppose.

Thursday, April 12, 2007

Kurt and The Asterisk


Goodbye Kurt Vonnegut. His gentle, amiable writing style stunned me at age 12 and I continued to read everything he wrote up through Bluebeard, I think. He was the soul of humanism and devoted a whole chapter to people he had known in Palm Sunday, cherishing those linkages to the humble and great as justifying his own life. And in Bluebeard, again, the final painting of the protagonist is a visual record of everyone he had in his life, because it is our part in other lives that makes a story.

I thought back a bit having heard of his passing this morning and realized that my first great memory from Vonnegut is from Breakfast of Champions and is suitably juvenile: a crude drawing of an anus as an asterisk (accompanying image stolen). I have no idea what the context was but remember laughing about it for days, secretly snickering at the implausibility of that little drawing appearing in the middle of a "serious" work of fiction. He was laughing, too, when he drew it.

My second memory is a quote that goes something like: I will not participate in massacres, nor will I let my children participate in massacres. It's a good start to a humanist manifesto.

Goodbye, Kurt.

Wednesday, April 11, 2007

Graphs and Relationships

Mathematically speaking, graphs are collections of edges and vertices (nodes). They can be undirected (no arrows) or directed (arrows). Graphs are useful for understanding ideas like connectivity in telecommunications systems, combinatoric relationships in algorithms, and large relationship networks. At IBM Research in the mid-90s, graph connectivity was used to characterize the World Wide Web by noting that there are "hubs" and "authorities" based on linkage patterns. An authority is a page that everyone points to, while a hub is a page that points to many other pages. The Google PageRank system works from a similar idea but simplifies it a bit by giving each page a single score rather than separate hub and authority scores. While the role of PageRank in improving Google over their competition is vastly overrated and misunderstood, the approach did have an impact on their success for at least a subset of the queries that they service.
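
For the curious, the hub and authority idea can be sketched in a few lines. This is only a toy illustration under stated assumptions: the page names and the links dictionary are made up, and the simple repeated-update loop stands in for whatever the production systems actually did. A page's authority score is the sum of the hub scores of the pages pointing at it, and a page's hub score is the sum of the authority scores of the pages it points to.

```python
def hits(links, iterations=50):
    """Toy hub/authority scoring.

    links: dict mapping a page to the list of pages it links to
    (a directed graph). All names here are hypothetical.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub: sum of authority scores of the pages it links to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, ()))
                   for p in pages}

        # Normalize so the scores do not grow without bound.
        a_norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Made-up example: "portal" links out a lot, "a" is linked to a lot.
example_links = {
    "portal": ["a", "b", "c"],
    "blog": ["a", "b"],
    "b": ["a"],
}
hub_scores, auth_scores = hits(example_links)
print(max(hub_scores, key=hub_scores.get),
      max(auth_scores, key=auth_scores.get))
```

In the made-up example, the page that links out the most ends up as the top hub and the page that collects the most links ends up as the top authority.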

I've more recently been working on another kind of graph structure, the co-citation matrix, in my startup effort. The idea borrows from the analysis of academic research papers that looks at papers that cite or reference other papers. A direct citation is obvious: I put a reference to your paper in my paper. Co-citation analysis looks at papers that are linked together by both being cited in another paper. Now, sometimes those co-citations are fairly spurious or are backgrounders to fundamental or related ideas. Sometimes, though, they are directly related to the topic of the paper. The interesting question is how to find the best set of relationships out of the huge relationship matrix.
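
As a rough sketch of the bookkeeping involved, here is one way to count co-citations, using the standard bibliometric sense in which two papers are co-cited whenever a third paper references both. The paper identifiers and the citations dictionary are hypothetical, and the counts are kept in a sparse dictionary rather than a full matrix; this is not the representation used in the startup work, which is not described here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of papers is cited together.

    citations: dict mapping a citing paper to the papers it references.
    Returns a Counter keyed by (paper_a, paper_b) with paper_a < paper_b,
    i.e. the upper triangle of a symmetric co-citation matrix.
    """
    counts = Counter()
    for references in citations.values():
        # Every pair in one reference list is co-cited once by that paper.
        for a, b in combinations(sorted(set(references)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical data: three citing papers and their reference lists.
example = {
    "p1": ["x", "y", "z"],
    "p2": ["x", "y"],
    "p3": ["y", "z"],
}
for pair, n in cocitation_counts(example).most_common():
    print(pair, n)
```

Pairs with a count of one are the likely candidates for the spurious or background co-citations mentioned above; deciding where to draw that line is the hard part.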

More to my particular problem, how do you look at a graph built out of combinations of direct references and named entity relationships (people, places, organizations) and simplify it to find significant relationships and linkages? There are some very cool algorithms to do that based on looking at multipath linkages and checking whether the sum of their weights or normalized occurrence counts is greater than the more direct pathways. There are also some similar methods in matrix mathematics that try to fit a "reduced dimensionality graph" onto the existing graph. In other words, if I have 100,000 nodes and their linkages (up to 10^10!) can I create a new matrix that is most similar to the original one but is only based on 200 nodes? The "most similar" requirement constrains the choice of new matrix to somehow maximally represent the data distribution in the original matrix, exposing the most significant patterns of interaction, effectively bottlenecking or distilling the representation down to just the most essential aspects of the observed patterns. Related approaches are rampant in our nervous systems, helping to identify edges in noisy visual fields and isolate novelty in memory.
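
To make the "most similar but smaller" idea concrete, here is a minimal sketch on a tiny symmetric relationship matrix. It assumes the matrix is symmetric and keeps only its k strongest eigen-directions, found by plain power iteration with deflation; that is one textbook way to build a low-rank approximation, not necessarily the method used in practice, and at 100,000 nodes one would reach for a sparse SVD library instead. The example matrix and all function names are made up for illustration.

```python
import random


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dominant_eigenpair(A, iterations=500, seed=0):
    """Power iteration: eigenpair of symmetric A with largest |eigenvalue|."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in A]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = matvec(A, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm < 1e-12:  # nothing left to extract
            break
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient
    return lam, v


def low_rank(A, k):
    """Approximate symmetric A by the sum of its k strongest components."""
    n = len(A)
    residual = [row[:] for row in A]
    approx = [[0.0] * n for _ in range(n)]
    for _ in range(k):
        lam, v = dominant_eigenpair(residual)
        for i in range(n):
            for j in range(n):
                term = lam * v[i] * v[j]
                approx[i][j] += term    # add this component
                residual[i][j] -= term  # deflate it out of the residual
    return approx


def frobenius_error(A, B):
    return sum((a - b) ** 2
               for ra, rb in zip(A, B)
               for a, b in zip(ra, rb)) ** 0.5


# Made-up matrix: two tight groups (nodes 0-2 and 3-4) with one weak tie.
A = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0.1, 0],
    [0, 0, 0.1, 1, 1],
    [0, 0, 0, 1, 1],
]
for k in (1, 2):
    print(k, round(frobenius_error(A, low_rank(A, k)), 3))
```

With one component the approximation captures only the larger group; with two it recovers both groups and leaves little beyond the weak tie unexplained, which is the bottlenecking effect described above.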

Another aspect of graph theory that emerges (excuse the preparatory pun) in large relationship graphs has to do with adaptive theory in a way. If you have a group of nodes (say, people) and you create a graph of their business relationships to one another over time, the time evolution of those relationships has some interesting properties. Specifically, small cliques emerge and then tentacles reach out and start joining the cliques together. As the graph grows, there is a point at which a "giant connected component" typically arises. That is, a large fraction of the nodes suddenly become connected together. What is interesting is that the probability of that component emerging is not linear in the number of edges and nodes; it emerges quite suddenly when the edge/node ratio reaches a certain point.
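
The sudden appearance of the giant component is easy to see in a small simulation. The sketch below assumes the simplest possible model, edges dropped between uniformly random pairs of nodes, which real business networks certainly do not follow; in that idealized model the transition is known to sit at an edge/node ratio of one half. The function name and parameters are made up for illustration.

```python
import random


def largest_component_fraction(n, m, seed=0):
    """Add m random edges to n isolated nodes and return the share of
    nodes in the largest connected component (union-find bookkeeping)."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest = 1
    for _ in range(m):
        a, b = rng.randrange(n), rng.randrange(n)
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already connected (or a self-loop)
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra          # merge the smaller tree into the larger
        size[ra] += size[rb]
        largest = max(largest, size[ra])
    return largest / n


n = 20000
for ratio in (0.25, 0.4, 0.5, 0.6, 0.75, 1.0):
    frac = largest_component_fraction(n, int(ratio * n))
    print(f"edges/nodes = {ratio:.2f}  largest component = {frac:.3f}")
```

Below the threshold the largest cluster is a vanishing sliver of the graph; a little above it, that cluster holds a large share of all nodes.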

I like that as a broad example of a transition in self-organization that is not predictable by the parts alone, yet results in a new class of relationships. My work is a bit more prosaic, of course, and involves algorithmic efficiency considerations combined with usability considerations for the relationships I am trying to distill down into their bare essentials, but the depth of the subject still resonates in the background.

Wednesday, April 4, 2007

Organisms and Startups

Team building exercises. Remember team building exercises? Trust, subordination of individual egos to the collective, division of effort, coordination. Corporations are like organisms in a way, with the organization thriving through specialization of the parts to execute, execute, execute. Or die. David Sloan Wilson draws sharp parallels between corporations and organisms in Darwin's Cathedral, and notes that in a free market there is real competition for resources, mating and spawning of new companies as people leave to start fresh and subdivisions are created.

I'm 3 months into my seed grant for my startup and it is time to start building my team up from just our three current technologists. It has been surprisingly easy so far, which keeps me from suffering too much under the weight of ambiguity. I snagged an ex-Xerox PARC researcher, now a prof at Berkeley, who is a perfect fit. I have a Chief Scientist at a video sharing startup who has been informally advising me for several years. He's too busy to be really active, but still stays in touch. In-house counsel at an old employer connected me to an ex-CEO who I am tapping for the business side. He may not be a good fit for the consumer web space, but he knows everyone in Sili Valley from his aerie in Saratoga and will undoubtedly be a tremendous asset. I also got set up for referrals for the critical attorneys when needed. And soon I will be hitting up old contacts on Sand Hill Road. But not until the timing is right.

Optimism is almost palpable here on the Left Coast, as liquid as the fog banks that push in over the coastal range. But the optimism requires teams, specialization, growth, dynamism. Learning to sublimate the research engineer's heads-down, inward focus and reach out to build a team is one of the hardest and most rewardingly optimistic things I have ever had to do.

The organism is starting to grow.