wwHadoop: Analytics without Borders


This week at EMC World 2012, the EMC technical community is launching a community program called World Wide Hadoop.  I am really excited to be a part of a collaboration across the EMC technical community that has been looking to extend the "borders" of our Big Data portfolio by building on the success of our Greenplum Hadoop distribution in offering the open source community the ability to federate their hBase…

Big Data Universe: “Too Big to Know”


I was surfing to WGBH on Saturday when I came across a lecture by David Weinberger (surrounding his new book Too Big to Know). I was sucked in when he alluded to brick and mortar libraries as yesterday's public commons, and pointed to the discontinuous and disconnected nature of books / paper. The epitaph may read something like this: "book killed by hyperlink, the facts of the matter are whatever…

Fallacies of Enterprise Information Management (part deux)…


With some hearty comments from Tom Maguire, I've been forced to adjust some of these fallacies:

1. Data quality is perfect - data is correct, complete and coherent across all enterprise contexts
   - People will remediate bad data - if inaccuracies are found (contrary to the axiom above) users will willingly and proactively make changes, and all users will agree with those changes
2. Relationships are Known - The linkages…

Information Management Truisms


I have long been a fan of Peter Deutsch's fallacies of network/distributed computing (btw I'm not alone; Google this AM produced over 22k references). They have served as a set of guiding checkpoints for every distributed system that I have built. What I have found to be missing, however, is a similar set of fallacies/truisms for managing information while we approach “internet scale” information infrastructure... the information explosion. Truisms defined…