Cloud Building: Trusted Multi-Tenancy Requirements

There are two primary actors in a cloud deployment: the customer of the cloud (the Tenant) and the provider of the cloud (the Provider). They represent a standard client/server relationship requiring a strong contract and visible performance against that contract.

I believe that there is a set of key capabilities required in order to maximize the adoption [by Tenants] of [Provider] cloud infrastructure. They are detailed as follows.

  • Flexibility
    • The cloud is dynamic in nature; tenant boundaries shift to meet business needs, and locations change in response to overall workloads in order to meet contractual SLOs and to manage availability
    • Virtualization is required to support this operational model
  • Agility
    • The speed of contracting for services, versus the capital budget, procurement, and deployment lifecycle required to purchase the capacity, gives a significant advantage in responding to the speed of business.
  • Elasticity
    • Empowerment to expand and contract capabilities as and when business requirements change and opportunities arise.
  • Economy [of Scale]
    • Simply shifting infrastructure from Customer to Cloud Provider will not provide Economies of Scale.
    • Cloud Providers must be able to pool and share their infrastructure resources in order to achieve Economies of Scale for their Customers.
    • Multi-Tenancy is the key to saving massive amounts of capital and maintenance costs through the sharing of pooled resources via cloud operating systems.
  • Trust [and trusted]
    • Multi-Tenancy yields cost savings, but Customers will NOT adopt Cloud Services unless they can be assured that the Cloud environment provides a finer granularity of visibility and control than their existing infrastructure does
    • Trusted Multi-Tenancy is the Key to Cloud adoption.

These capabilities bring with them a set of Tenant requirements (expectations):

  • Improved Control of and Visibility into the Environment
    • Self-service using web-based controls
    • Improved visibility of both function and expense
    • Transparency in operations and compliance
  • Isolation from other tenants, which must ensure
    • Privacy
      • protect data in use & data at rest
    • Non-interference
      • ensure their SLOs are met, regardless of other tenant workloads
  • Security
    • Identity
      • Single Sign-On (SSO) federated from Enterprise to SP
    • Ability to control access to shared resources (data path, control path, and key management/escrow)
  • Improved performance to expense ratio (shared capital)
    • Reliability
    • Operational agility (contract/expand)

As we distill the functional requirements, an architectural taxonomy of affected entities naturally emerges:

[Figure: architectural taxonomy of affected entities]

Then we must evaluate our core architectural tenets of trusted multi-tenancy (please excuse the pun):

  1. Make customer-visible units of resources logical not physical
    • Known MT properties/capabilities on any layer directly exposed to customers
  2. Put those logical objects into [nested] containers with recursive delegated administration capabilities at the container layer (see the sketch after this list)
    • Separates the implementation of a resource from its contract
    • Provides a common point of mediation and aggregation
    • Hierarchical (Layered) relationships must be supported on the data path and the control path
  3. Implement out-of-band monitoring of management activity
    • Verify actual state of system remains in compliance throughout management / state changes across tenancies
    • Out-of-band monitoring must be done at the container boundary for the container to support multi-tenancy
    • Multi-Tenant correlation (actual vs. expected) becomes critical to GRC
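
To make tenets 2 and 3 a bit more concrete, here is a minimal, hypothetical Python sketch (my own illustration, not EMC code): logical resources sit inside nested containers, administration is delegated recursively at each container boundary, and every management action at that boundary emits an event that an out-of-band monitor could correlate against expected state.

```python
# Hypothetical sketch (not EMC code): logical resources live inside nested
# containers, and administration is delegated recursively at each boundary.

class LogicalResource:
    def __init__(self, name, contract):
        self.name = name          # customer-visible, logical (not a physical device)
        self.contract = contract  # e.g. {"capacity_gb": 500, "iops": 2000}

class Container:
    def __init__(self, name, admins):
        self.name = name
        self.admins = set(admins)     # delegated administrators for this boundary
        self.resources = {}           # logical resources exposed to the tenant
        self.children = []            # nested (tenant / sub-tenant) containers

    def delegate(self, parent_admin, child_name, child_admins):
        """Recursive delegation: an admin of this container may create a
        nested container and hand its administration to someone else."""
        if parent_admin not in self.admins:
            raise PermissionError(f"{parent_admin} cannot administer {self.name}")
        child = Container(child_name, child_admins)
        self.children.append(child)
        return child

    def add_resource(self, admin, resource):
        """All management passes through the container boundary, giving a
        common point of mediation (and a natural place to emit audit events)."""
        if admin not in self.admins:
            raise PermissionError(f"{admin} cannot administer {self.name}")
        self.resources[resource.name] = resource
        print(f"AUDIT: {admin} added {resource.name} to {self.name}")  # out-of-band feed

# Usage: the provider delegates a tenant container, which delegates a department.
provider = Container("provider-root", admins=["provider-ops"])
tenant = provider.delegate("provider-ops", "tenant-acme", ["acme-it"])
dept = tenant.delegate("acme-it", "acme-hr", ["hr-admin"])
dept.add_resource("hr-admin", LogicalResource("hr-vol-01", {"capacity_gb": 500}))
```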

And lastly we believe that these further distill into a set of discrete design factors and principles that help to create a matrix of critical requirements:

[Figure: matrix of design factors and principles, provided under copyright © EMC Corporation, 2012, All Rights Reserved]

I hope that this discussion of Trusted Multi-Tenancy helps to clarify the complexity of the Tenant / Provider relationship as well as the complexity of the infrastructure required to fulfill Tenant service requirements in a cloud setting.

I want to thank the EMC team, including Jeff Nick, John Field, and Tom McSweeney, the RSA team at large, as well as the 30+ members of our working group for the work that preceded this blog post. – Dan

Project Razor: EMC and Puppet Labs

Today EMC and Puppet Labs are releasing, via open source community contribution, a key module that serves to ground the last mile of the DevOps process. Project Razor is a tool that I’ve personally been looking for, for years. Razor is a service-oriented (RESTful), model-based, and policy-driven “Operating System provisioning tool” that uses the operating systems’ native installers for hardware compatibility.

During the project design meetings, we determined that we wanted a tool that could provide:

  • a programmable power controller, exploiting BMC/IPMI or an intelligent power strip
  • a capability discovery tool, using a downloadable kernel to take and maintain system inventory
  • an OS delivery processor that is state-machine driven, using native OS installers (ESX, Ubuntu, SLES, RedHat, even Windows) and delivering what we might call a targeted “bare minimum” OS (with just enough packages to join the MCollective)
  • a consistent handoff of the explicitly provisioned computer to the DevOps environment for configuration control and customization
  • and a pluggable, easily extensible state machine that could act as a framework for downstream work, since no one solution fits all of IT (see the sketch after this list).
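
As a flavor of that last point, here is a minimal, hypothetical sketch (in Python, purely illustrative; it is not Razor’s actual implementation) of a pluggable provisioning state machine in which each step’s handler can be registered or replaced:

```python
# Hypothetical sketch, not Razor's actual implementation: a pluggable
# provisioning state machine where each transition is a replaceable handler.

PIPELINE = ["discovered", "tagged", "os_installing", "os_installed", "handed_off"]

class ProvisioningStateMachine:
    def __init__(self):
        self.handlers = {}            # state -> callable(node) -> next state

    def register(self, state, handler):
        """Plug in (or replace) the handler for one step, e.g. a new OS installer."""
        self.handlers[state] = handler

    def run(self, node):
        state = PIPELINE[0]
        while state != PIPELINE[-1]:
            handler = self.handlers[state]
            print(f"{node['name']}: {state} -> ", end="")
            state = handler(node)
            print(state)
        return state

# Example handlers -- real ones would drive IPMI, iPXE, kickstart/preseed, etc.
def tag_node(node):
    node["tags"] = ["esx"] if node["cpus"] >= 16 else ["ubuntu"]
    return "os_installing"

sm = ProvisioningStateMachine()
sm.register("discovered", tag_node)
sm.register("os_installing", lambda node: "os_installed")
sm.register("os_installed", lambda node: "handed_off")   # hand off to Puppet/MCollective
sm.run({"name": "node-01", "cpus": 24})
```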

Having architected large-scale clouds like Sun Cloud and developed for Cloud Foundry-based platform environments, I found myself spending a huge amount of time provisioning bare metal boxes with an ever-changing portfolio of hardware and operating environments. Then, once I put a box in service, I had another set of (post-install) scripts that would consistently provision the application stacks. For me, the inefficiencies were these: the installation models (usually scripts) were brittle because of machine differences, software builds/updates, and configurations; I kept cobbling together all the different pieces (ipmi, tftp, httpd, dhcpd, syslog, plus a ton of scripts); and each OS distributor employed their own provisioning system, isolated to their OS, while my environment needed three Linux variants, VMware vSphere 5, Windows 8, and even Solaris.

Constantly looking for a better way, we evaluated a variety of DevOps tools to replace the “post OS” scripts, and settled on Puppet from Puppet Labs. For me, the ability to finally get back to my programming roots within a traditional “System Administration” context was just the flexibility that we needed. We could build a first-class MVC, exploit re-use and modern programming languages, and gain amazing flexibility of deployment.

In my view, this is an environment designed not for the traditional system administrator, but for the hyper-productive, administrative developer. Rather than catering to the lowest common denominator, the flexibility afforded by the Puppet Labs approach can help to tear down some of the old walls between development and deployment in a valuable and empowering way.

We built Razor to integrate with the DHCP and network boot services, to support the cycling of power, and to take inventory information. Interestingly, we used Facter here to deliver the “inventory-taking” modules dynamically, so that we could keep the inventory kernel small while still supporting the just-in-time delivery of key sensors. The inventory information is registered with Razor, and the system is put into a wait state until Razor decides what to do with the node.

Razor takes the inventory and provides a set of matching expressions that let one “tag” key features in order to add a machine to the appropriate sets. Razor then enables the administrator to define a model (what they want done) and a policy that maps the model against the tags, so that when a tag is matched a policy is triggered and a model delivered. The net result: if the box fits a known profile, the model (OS + configurations) is delivered; Razor then deals with the boot / iPXE-delivered OS images and the HTTP-delivered customized package sets for that node (including the Puppet Agent), and when all is done (and the second reboot happens), hands the node off to Puppet for subsequent customization.
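
To illustrate the tag / policy / model flow, here is a deliberately simplified, hypothetical sketch (not Razor’s real matching code; the tag rules, model names, and policy ordering are all invented):

```python
# Hypothetical sketch of the tag -> policy -> model flow described above
# (illustrative only; Razor's real matching rules live in the Razor codebase).

inventory = {"vendor": "Cisco", "cpus": 24, "memory_gb": 96, "nics": 4}

tags = {
    "big_esx_host": lambda inv: inv["cpus"] >= 16 and inv["memory_gb"] >= 64,
    "small_util_box": lambda inv: inv["cpus"] <= 4,
}

models = {
    "esxi5_cluster_member": {"os": "ESXi 5", "packages": ["puppet-agent"]},
    "ubuntu_minimal": {"os": "Ubuntu", "packages": ["puppet-agent"]},
}

policies = [
    # (required tag, model to deliver) -- evaluated in order, first match wins
    ("big_esx_host", "esxi5_cluster_member"),
    ("small_util_box", "ubuntu_minimal"),
]

def match(inventory):
    node_tags = {name for name, rule in tags.items() if rule(inventory)}
    for required_tag, model_name in policies:
        if required_tag in node_tags:
            return models[model_name]          # model is delivered, node is provisioned
    return None                                 # no policy matched: node stays waiting

print(match(inventory))   # -> {'os': 'ESXi 5', 'packages': ['puppet-agent']}
```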

What’s cool about this solution? Razor with Puppet now supports end-to-end configuration control for a brand new “bare metal” node, taking it to a working, deployed, managed configuration with repeatability, auditability, and flexibility.

Why open source? For EMC, we see that the cloud marketplace is emerging quickly, and we further recognize that the transparency and openness of frameworks will be prerequisites to broad adoption, for innovation and safety respectively. This is just the first of many contributions that the EMC technology team will make to the open source communities in support of massive-scale cloud computing, scale-out architectures, and improved automation and operational excellence. Our architectural patterns for these environments are strongly influenced by DevOps strategies and SideBand controllers as typified by the OpenStack architecture, in areas like Quantum (Software Defined Networks), Cinder (Storage/Volume Controllers), and Nova (Compute Controllers). To add to this list, I believe that Puppet + Razor become a “Configuration Controller” which can serve as mature, model-based, policy-driven glue-ware for cross-controller, customized, configuration-managed integration.

Not to sell our VMware partners short! We believe that there is an exceptionally strong role for Razor + Puppet (top-to-bottom DevOps), not only in standing up VMware clusters in the vCloud model, but also in scaling out the delivery of integrated applications on top of that environment.

We’ll be showing a really neat demonstration of Puppet and Razor at EMC World today, so please do tune in to “Chad’s World”. I know that some will now want more detail, but I want to call out the two key EMC developers who made Razor possible, namely Nick Weaver and Tom McSweeney. Both of them worked tirelessly, first on trying to make the other industry configuration controllers (Baracus from SuSE and Crowbar from Dell) fit our needs, and then on architecting the pluggable dynamic state machine for Razor detailed in Nick’s blog. And of course the Puppet Labs team, including Nan, Dan, Teyo, Nigel, Jose, Scott and Luke, integrated with our team seamlessly and made this possible.

Lastly, I need to call out Chuck Hollis, who is constantly looking for the cool factor in the industry and in EMC, and who has detailed Razor in his blog.

Big Data Exchanges: Of Shopping Malls and the Law of Gravity

Working for a Storage Systems company, we are constantly looking at both the technical and the social/marketplace challenges to our business strategy. This led EMC to coin “Cloud Meets Big Data” last year, and EMC has been looking at the trends that “should” tip the balance toward real “Cloud Information Management,” as opposed to the “data management” that really dominates today’s practice.

There are a few truisms [incomplete list]:

  1. Big Data is Hard to Move = get the optimal [geo] location right the first time
  2. Corollary = Move the Function across Federated Data (see the sketch after this list)
  3. Data Analytics are Context Sensitive = meta-data helps to align/select contexts for relevancy
  4. Many Facts are Relative to context = Declare contexts of derived insight (provenance & Scientific Method)
  5. Data is Multi-Latency & needs Deterministic support for temporality = a key declarative information architectural requirement
  6. Completeness of Information for Purpose (e.g. making a decision) = dependent on the data I have and the data I get from others, but it must cover everything that I need to decide.
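
As a toy illustration of truism 2 (the site names and numbers are invented): rather than copying each site’s data, we ship a small aggregation function to every site and combine only the tiny partial results.

```python
# Illustrative sketch of "move the function, across federated data":
# ship a small aggregation to each site and combine partial results, instead
# of hauling petabytes over the wire. Site contents here are invented.

sites = {
    "boston": [12, 7, 31, 5],     # imagine each list is a huge local dataset
    "london": [44, 2, 9],
    "tokyo":  [18, 25],
}

def run_at_site(site_data):
    """The 'function' that travels to the data: returns a tiny partial result."""
    return {"count": len(site_data), "total": sum(site_data)}

# Only the partial results (a few bytes per site) cross the network.
partials = [run_at_site(data) for data in sites.values()]
global_mean = sum(p["total"] for p in partials) / sum(p["count"] for p in partials)
print(round(global_mean, 2))   # federated mean without moving the raw data
```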

I believe that 1) and 6) above point to an emerging need for Big Data Communities to arise supporting the requirements of the others, whether we talk about these as communities of interest or as Big Data Clouds. There are some very interesting analogies that I see in the way we humans act; namely, the Shopping Mall. Common wisdom points to the mall as providing improved shopping efficiency, but also, in the case of inward-facing malls, a controlled environment (think walled garden). I think that both efficiency in the form of “one stop” and control are critical enablers in the information landscape.

This slide from one of my presentations (“Big Data Mall”) draws out the similarities between building a shopping mall and developing a big data community: things like understanding the demographics of the community (information needs, key values), planning the roads to get in and out, and of course creating critical mass, the anchor store.

The interesting thing about critical mass is that it tends to have a centricity around a key [Gravitational] Force. Remember:

Force = Mass * Acceleration (change in velocity).

This means that in order to create communities and maximize force you need Mass [size/scope/scale of information] and improving Velocity [timeliness of information]. In terms of mass, truism #1 above, along with the sheer cost and limited bandwidth, makes moving 100TB of data hard, and petabytes impracticable. Similarly, change in velocity does matter: whether you are algorithmically trading on the street (you have to be in Ft. Lee, NJ or Canary Wharf, London) or a physician treating a patient, the timeliness of access to emergent information is critical. So, correct or not, gravitational forces do act to geo-locate information.

Not to take my physics analogy too far, but Energy is also interesting; it could be looked at as “activity” in a community. For energy there are both kinetic and potential models. In the case of the internet, the relative connectedness of information required for a decision could be viewed in light of “potential”. Remember:

Ep (potential energy) = Mass x force of Gravity x Height (mgh)

In our case, Height could be looked at as the bandwidth between the N information participant sites, Mass as the total amount of information that needs to be processed, and Gravity as the decentralization of information = the outer joins required for optimal processing. If I need to do a ton of outer joins across the Internet in order to get an answer, then I need to spend a lot of energy.
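
Restated symbolically, using the mapping exactly as described above (this is just the analogy in one line, not a rigorous model):

```latex
% The analogy above, restated (terms as the paragraph defines them):
%   m = total information that must be processed
%   g = decentralization of that information (the outer joins required)
%   h = bandwidth between the N participant sites
E_p = m \cdot g \cdot h
```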

So if malls were designed for optimal [human] energy efficiency, then big data malls could do exactly the same for data.

Big Data Universe: “Too Big to Know”

I was surfing over to WGBH on Saturday when I came across a lecture by David Weinberger (about his new book Too Big to Know).

I was sucked in when he alluded to brick-and-mortar libraries as yesterday’s public commons, and pointed to the discontinuous and disconnected nature of books and paper. The epitaph may read something like this: “book killed by hyperlink; the facts of the matter are whatever you make them.”

Overall David, in his look at the science of knowledge, points at many interesting transitions that the cloud is bringing:

  • Books/Paper are discontinuous and disconnected – giving way to the Internet which is constantly connected and massively hyper/inter-linked
  • “The Facts are NOT the Facts,” which is to point out that arguments are what we make of the information presented and our analytics given a particular context. What we claim as a fact may in reality prove to be a fallacy; just look at Louis Pasteur and germ theory. History has so many moments like this.
  • Differences and Disagreements are themselves valuable knowledge. For me this is certainly true, learning typically comes through challenge of preconception.
  • There is an ecology of knowledge: a set of interconnected entities that exist within an environment. These actors represent a complex set of interrelationships, a set of positive and negative reinforcements, that act as governors on this system. These promoters/detractors act to balance fact and fallacy so as to create the system tension that supports insight (knowledge?). It’s these new insights that themselves create the next arguments (the end goal being?).

I wanted to share this book because I believe that it reinforces the need for businesses to think about their cloud and big data strategies. The question becomes less “do I move my information to the cloud?” and more “how do I benefit from the linkage that the Internet can provide to my information?” so as to derive new insights from big data.

Read the book, take the challenge!

The Greenplum “Big Data” Cloud Warehouse

The Data Warehouse space has been red hot lately. Everyone knows the top-tier players, as well as the emergents. What have become substantial issues are the complexity of the scale and growth of enterprise analytics (every department needs one) and the increasing management burden that business data warehouses are placing on IT. As in the wild west, a business technology selection is made for “local” reasons, and the more “global” concerns are left to fend for themselves. The trend toward physical appliances has only created islands of data, the ETL processes are ever more complex, and capital/opex efficiencies are ignored. Index/schema tuning has become a full-time job, distributed throughout the business. Lastly, these systems are hot because they are involved in the delivery of revenue… anyone looking at SARBOX compliance?

Today EMC announced the intent to acquire Greenplum software of San Mateo, CA. Greenplum is a leading data warehousing company with a long history of exploiting the open-source Postgres codebase, with a substantial amount of work both in taking that codebase to a horizontal scale-out architecture and in a focus on novel “polymorphic data storage,” which supports new ways to manage data persistence and provides deep structural optimizations, including row, column, and row+column layouts at sub-table granularity. In order to begin to make sense of EMC’s recent announcement around Greenplum, one must look at the trajectory of both EMC and Greenplum.
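
To illustrate the row-versus-column idea (this is a generic sketch, not Greenplum’s polymorphic storage engine, and the table is invented): the same table can be laid out row-wise, which favors point lookups and writes, or column-wise, which favors scanning a single attribute across many rows; a polymorphic store can choose per partition.

```python
# Illustrative only -- not Greenplum's storage engine. The point: the same
# table can be laid out row-wise (good for point lookups / writes) or
# column-wise (good for scanning one attribute across many rows), and a
# warehouse can choose per partition ("sub-table granularity").

rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 45.5},
]

# Row orientation: each record stored together.
row_store = rows

# Column orientation: each attribute stored together.
col_store = {
    "order_id": [r["order_id"] for r in rows],
    "region":   [r["region"] for r in rows],
    "amount":   [r["amount"] for r in rows],
}

# SELECT SUM(amount): the row store reads every field of every row,
# while the column store reads only the 'amount' vector.
print(sum(r["amount"] for r in row_store))   # touches 9 stored fields
print(sum(col_store["amount"]))              # touches 3 stored values
```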

With its VMware/Microsoft and Cisco alliances, and recent announcements around vMAX and vPlex, EMC’s virtual storage becomes a dynamically provisionable, multi-tenant, SLA/policy-driven element of the cloud triple (Compute, Network, Storage). But it’s one thing to just move virtual machines around seamlessly and provide consolidation and improved opex/capex (IT improvements). In my mind, “virtual data” is all about end-user (and maybe developer) efficiency… giving every group within the enterprise the ability to have their own data either federated to, or loaded into, a data platform, where it can be appropriately shared with other enterprise users as well as with enterprise master data. The ability to “give and take” is a key value in improving data’s “local” value, and the ease with which this can be provisioned, managed, and of course analyzed defines an efficient “Big Data” Cloud (or Enterprise Data Cloud, in GP’s terms).

The Cloud Data Warehouse has some discrete functional requirements, the ability to:

  • create both materialized and non-materialized views of shared data… in storage we say snapshots
  • subscribe to a change queue… keeping these views appropriately up to date, while appropriately consistent (a minimal sketch of this pattern follows the list)
  • support the linking of external data via load, link, link & index to accelerate associative value
  • support mixed mode operation… writes do happen and will happen more frequently
  • accelerate linearly with addition of resources in both the delivery of throughput and the reductions in analytic latency
  • exploit the analyst’s natural language… whether SQL, MapReduce, or other higher-level programming languages
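
As a minimal sketch of the change-queue requirement above (generic Python, not Greenplum-specific; the event shape and view are invented): a subscriber drains change events and keeps a materialized aggregate appropriately up to date.

```python
# Minimal sketch (not Greenplum-specific): a shared table publishes change
# events, and a subscriber keeps a materialized view (here, revenue per
# region) up to date; the view lags only by the unprocessed events.

from collections import defaultdict, deque

change_queue = deque([
    {"op": "insert", "region": "east", "amount": 120.0},
    {"op": "insert", "region": "west", "amount": 80.0},
    {"op": "delete", "region": "east", "amount": 120.0},   # a correction arrives later
])

revenue_by_region = defaultdict(float)     # the materialized view

def apply_changes(queue, view):
    """Drain the queue, applying inserts positively and deletes negatively."""
    while queue:
        event = queue.popleft()
        sign = 1.0 if event["op"] == "insert" else -1.0
        view[event["region"]] += sign * event["amount"]

apply_changes(change_queue, revenue_by_region)
print(dict(revenue_by_region))   # {'east': 0.0, 'west': 80.0}
```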

These functions drive some interesting architectural considerations:

  • Exploit Massively Parallel Processing (MPP) techniques for shared minimal designs
  • Federate external data through schema & data discovery models, building appropriate links, indices and loads for optimization & governed consistency
  • Minimize tight coupling of schemas through meta-data and derived transformations
  • Allow users to self provision, self manage, and self tune through appropriately visible controls and metrics
    • This needs to include the systemic virtual infrastructure assets.
  • Manage hybrid storage structures within a single database/table space to help ad-hoc queries & updates perform
  • Support push down optimizations between the database cache and the storage cache/persistency for throughput and latency optimization
    • From my perspective, FAST = Fully Automated Storage Tiering might get some really interesting hints from the GreenPlum polymorphic storage manager

Overall, the Virtual “Big Data” Cloud should be just as obvious an IT optimization as VDI and virtual servers are. The constraints are typically a bit different as these data systems are among the most throughput intensive (Big Data, Big Compute) and everyone understands the natural requirements around “move compute to the data” in these workloads. We believe that, through appropriate placement of function, and appropriate policy based controls, there is no reason why a VBDC cannot perform better in a virtual private cloud, and why the boundaries of physical appliances cannot be shed.

Share your data, exploit shared data, and exploit existing pooled resources to deliver analytic business intelligence; improve both your top line and your bottom line.


SOA + SOI = VOA?

So there has been much ado from commercial technologists, analysts, and the press about the emergence of SOA as a silver bullet for defining a common Enterprise / Inter-Enterprise Service model, and as an approach to refactoring existing legacy systems. Though there are almost as many approaches to SOA as there are articles defining and re-defining the terms, one of the critical lessons is that SOA will only yield as much agility as the infrastructure upon which it runs.

Coming from a history of grid/utility computing programs has afforded me the benefit of viewing enterprise resources as consistent, model-able elements that can create a virtual deployment infrastructure for this next generation of agile enterprise services. What remains critical, and a common theme in my weblog, is the need to better elaborate the systemic requirements of a system: whether represented by a single service interface or through a composite model, service level objectives remain a set of goals that is difficult to define, more difficult to measure, and virtually impossible to predictably enforce. A recent read of the Enterprise Grid Alliance’s “Reference Model” is illuminating; I think that the industry, through GGF, EGA, DMTF UCWG, SNIA and even eTOM, is getting close to a basic taxonomy in which an SOA is deployed (hopefully dynamically) against a Service Oriented Infrastructure (SOI). So what am I really saying? Model Driven Architecture is attempting to establish consistent patterns for the organization of object-oriented services, with clear separation of concerns and appropriate compositional granularity. SOAs are constructed through Directed Acyclic Graphs of these sub-systems that define their inter-relationships (though the complete vocabulary isn’t standardized). Now that we understand the components that make up an enterprise service, we need to determine how to apply these models against infrastructure, and once again MDA can serve us well.

If we take our infrastructure (pools of elemental resources), which can also be dynamically assigned, are themselves objects, and can leverage similar model-based approaches to their organization, we quickly begin to recognize the need for “extension”: supporting a consistent interface to a domain model whilst enabling the customization that is a natural byproduct of real-life variability. Models, for me, are a composition of patterns that provide layered abstraction to mitigate complexity and support replacement… These models are really purpose-built realizations of patterns based upon a set of concrete contexts; a complicated way of describing constraint-driven pattern application… If I want to remodel my bathroom, to build a plan I need to take existing patterns from some kind of “standard” catalog and apply them given my constraints… I’m tall, so I want the countertops raised; I’m disabled, so I need to extend the natural 18-inch on-center rule for toilet spacing to 36 inches for access… you get the picture. By producing models (either abstract or concrete) we can begin to elaborate a design, manifesting all of the complexity within a set of stackable “layers” that ensures both loose coupling / ease of replacement and traceability against requirements and constraints.

So finally to VOA, or “Virtualization Oriented Architectures”: architectures built with clear, layer-based, atomically separable abstractions… VOAs are models in which abstraction can be late-bound, yet which remain coherently orchestrable designs. These designs need to describe the “virtualization” constraints in a way that allows deployment contexts to be manifest at runtime rather than design time, through a more consistent set of abstraction layers/patterns that reflect the platform layer and service layer that you wish to virtualize. In lay terms… virtualization provides a component and a container that inherently provides the lifecycle management of that component. For VOA to work (as with SOA), the functional capabilities as well as the non-functional or systemic capabilities need to be declared at design time, and then contractually provided at runtime.
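
As a closing, hypothetical sketch of that last point (my own illustration, with invented names): a service descriptor declares both the functional interface and a systemic objective at design time, and the hosting container enforces that contract at runtime.

```python
# Hypothetical sketch: a component declares its functional interface and its
# systemic (non-functional) objectives at design time, and the container
# that hosts it checks the contract at runtime.

import time

class ServiceDescriptor:
    def __init__(self, name, operations, slo_latency_ms):
        self.name = name
        self.operations = operations            # functional capabilities
        self.slo_latency_ms = slo_latency_ms    # systemic capability, declared up front

class ServiceContainer:
    """The 'container' of the VOA idea: it owns lifecycle and contract checks."""
    def __init__(self, descriptor, implementation):
        self.descriptor = descriptor
        self.impl = implementation

    def invoke(self, operation, *args):
        if operation not in self.descriptor.operations:
            raise ValueError(f"{operation} is not part of the declared contract")
        start = time.monotonic()
        result = getattr(self.impl, operation)(*args)
        elapsed_ms = (time.monotonic() - start) * 1000
        if elapsed_ms > self.descriptor.slo_latency_ms:
            print(f"SLO breach: {operation} took {elapsed_ms:.1f} ms")
        return result

class QuoteService:                       # the late-bound implementation
    def price(self, symbol):
        return {"symbol": symbol, "price": 42.0}

contract = ServiceDescriptor("quotes", operations={"price"}, slo_latency_ms=50)
svc = ServiceContainer(contract, QuoteService())
print(svc.invoke("price", "EMC"))
```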