Rethinking the Network

Michael Bushong


Closing the Bio Gaps in Networking

Focus on the workflow problems of deploying and managing the servers – hence the DevOps and data center orchestration movements

It is no surprise to anyone that everything in IT these days is very cost conscious. It could be that your IT organization is run as a cost center, in which case the primary metrics all revolve around cost and schedule. Or it could be that IT is a direct part of your revenue stream, in which case you have to operate at scale, which involves getting a handle on costs. Whatever the model, you are very likely involved in more than one cost conversation today.

This dynamic is not unique to networking. On the server side, we have already digested a few transitional changes that have structurally altered the cost curve for server procurement and management. Where we used to scale up individual servers at the expense of both CapEx and manual server administration, virtualization allowed us to abstract individual servers into a larger pool of compute resources.  This changed the operational model into one of managing a resource pool – with its inherent fault tolerance and fluidity of resources.  Individual servers no longer mattered. It became a question of how to manage this pool of compute, and how to automate the processes and workflows of deploying applications and servers. VMware spearheaded this movement and effectively shifted the CapEx from high-end server hardware to enterprise virtualization software.  The value moved, but in return, they delivered a new set of capabilities through their resource abstraction.

But this isn’t just a CapEx story. While virtualization fundamentally changed the resource model from individual servers to an abstract pool of compute resources, it now has a new scaling limit: not server size, but an upper limit to how many resources can effectively be managed at once. With more virtual machines to manage and a seemingly limitless capacity to add more, virtualization would never work unless you could successfully abstract the complexity of managing these resources. Server resources mean administration, not just hardware. If virtualization vendors didn’t solve this administrative overhead issue, there would be little value in virtualizing a data center. The point isn’t to manage hundreds of hypervisors – it’s to create a flexible pool of compute resources. This explains why VMware gives away free copies of ESXi, but charges dearly for vCenter and its attendant workflow applications. The only answer here was to adopt programmatic control to hide the complexity of server management and focus on the workflow problems of deploying and managing the servers – hence the DevOps and data center orchestration movements.
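The core of that abstraction can be sketched in a few lines. This is a toy first-fit placement scheduler, purely illustrative – the class and function names are my own invention, not any vendor's API – but it shows the essential shift: the caller asks the pool for capacity and never picks (or even knows about) an individual host.

```python
# Toy sketch of the "resource pool" idea: instead of administering
# individual servers, a scheduler places workloads onto whichever
# host in the pool has room. All names here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu_free: int                 # available vCPUs on this hypervisor
    vms: list = field(default_factory=list)

def place_vm(pool, vm_name, cpu_needed):
    """First-fit placement: the caller never selects a specific host."""
    for host in pool:
        if host.cpu_free >= cpu_needed:
            host.cpu_free -= cpu_needed
            host.vms.append(vm_name)
            return host.name
    raise RuntimeError("pool exhausted; add capacity")

pool = [Host("hv-01", cpu_free=8), Host("hv-02", cpu_free=16)]
print(place_vm(pool, "web-1", 6))   # lands on hv-01
print(place_vm(pool, "db-1", 4))    # hv-01 is now full, so hv-02
```

Once the decision logic lives in code like this rather than in an administrator's head, adding the hundredth hypervisor costs roughly the same as adding the second – which is the whole economic point.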

What can we learn from this?

Whatever control model we end up with in networking ought to consider that the real value is in the abstraction of individual elements into a resource pool. Individual elements (like servers) can only become commoditized when they can be fluidly plugged into an abstract resource model. So if you want whitebox switches for lower CapEx, this will only scale if it doesn’t raise the overall burden of managing large numbers of discrete network devices. If individual devices get cheap enough that you can afford many more of them, but there is no abstraction to manage them collectively, you end up with a worse OpEx burden and a bigger scaling problem than you had before. Commoditization without an effective abstraction model is a fool’s errand.

But is this the process we are going through on the networking side?

Oddly enough, the control model is emerging first, as SDN, NFV, and network virtualization. This is happening somewhat in isolation from the forwarding architectures; the SDN controller discussion is rarely paired with any real discussion of underlying capacity handling.

And to make things more confusing, the control model that took off first was strangely related not to the biggest cost drivers but rather to how packets would be steered. OpenFlow is a useful construct for facilitating the separation of control and forwarding, but it seems an odd starting point for a technology space whose chief advantages lie in workflow automation. It might have made more sense to see OpenFlow emerge alongside other technologies, architected together to expressly address workflow automation. I think some of the OpenFlow noise has died down as it has taken a more natural position as a supporting rather than lead technology.
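To make the control/forwarding separation concrete, here is a deliberately minimal sketch of the shape of an OpenFlow-style flow table – not any real controller library or the OpenFlow wire protocol, just the idea: the controller computes prioritized match/action rules, and the switch does nothing but look them up.

```python
# Hedged illustration of the split OpenFlow enables: the "controller"
# installs match/action rules; the "switch" only performs lookups.
# This mimics the shape of a flow table, not any real API.

flow_table = []   # populated by the controller, highest priority first

def install_flow(match, action, priority=0):
    """Controller side: add a rule and keep the table priority-ordered."""
    flow_table.append({"match": match, "action": action, "priority": priority})
    flow_table.sort(key=lambda f: -f["priority"])

def forward(packet):
    """Data-plane side: first matching rule wins; no local decision logic."""
    for flow in flow_table:
        if all(packet.get(k) == v for k, v in flow["match"].items()):
            return flow["action"]
    return "send-to-controller"   # table miss: punt the decision upstream

install_flow({"dst_ip": "10.0.0.5"}, "output:port2", priority=10)
install_flow({"dst_ip": "10.0.0.5", "tcp_dst": 80}, "drop", priority=20)

print(forward({"dst_ip": "10.0.0.5", "tcp_dst": 80}))   # drop
print(forward({"dst_ip": "10.0.0.5", "tcp_dst": 22}))   # output:port2
```

Notice what this sketch is about: packet steering. Nothing in it touches provisioning, ticketing, or any of the workflow automation where the big operational costs actually live – which is exactly the oddity described above.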

If long-term cost is the enemy (and it absolutely is), then we ought to start by understanding where that cost comes from before we design a control model that can minimize it. Operating expenses fall into a couple of major categories:

  • Environmental costs - Those costs tied to the physical design of the network. These include things like rack space, power, and cooling. Architectures that favor fewer devices with a lower footprint will ultimately drive lower long-term costs. This is why we have seen 3-tier architectures collapse to 2-tier architectures.
  • Management costs - Every port under management brings with it some administrative burden. This is partly why SDN architectures advocate controller-based solutions. By reducing the administrative touch points from many to one, the overall administrative burden is lower. Any solution that provides fewer administrative touch points will help curb long-term operational expense.
  • Bio Gaps - Any place where a gap in tooling and/or automation forces human intervention. Human capital is almost always more expensive than the machine equivalent, so wherever these gaps exist, there will be added operational overhead.
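The management-cost point above lends itself to a back-of-envelope model. Every figure below (hours per touch point, loaded hourly rate, device count) is a made-up assumption purely for illustration, but the structural lesson holds regardless of the exact numbers: cost scales with the number of administrative touch points.

```python
# Back-of-envelope sketch: administrative burden scales with touch points.
# All hours and dollar figures are illustrative assumptions, not data.

def annual_mgmt_cost(touch_points, hours_per_point_per_year, loaded_hourly_rate):
    return touch_points * hours_per_point_per_year * loaded_hourly_rate

devices = 500

# Box-by-box management: every device is its own touch point.
per_device = annual_mgmt_cost(devices, hours_per_point_per_year=20,
                              loaded_hourly_rate=100)

# Controller-based: one logical touch point, even if that one point
# costs far more per year to care for.
controller = annual_mgmt_cost(1, hours_per_point_per_year=500,
                              loaded_hourly_rate=100)

print(per_device)    # 1000000
print(controller)    # 50000
```

The ratio, not the absolute dollars, is the argument: collapsing many touch points into one changes the cost curve even under generous assumptions about controller upkeep.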

Whatever control model we settle on as an industry needs to be mindful of where these Bio Gaps exist currently, and where they are likely to reside in the future. So where do Bio Gaps exist? Generally, they will exist at three major boundary types:

  • Organizational boundaries - Whenever a task crosses from one team to another team, there is some sort of handoff. Take troubleshooting as an example. A user files a ticket for some application issue. The application team looks at it, triages, and assigns it to the networking team. This introduces a handoff that brings with it a context change and some corresponding delay. That boundary creates cost.
  • Talent boundaries – Whenever a task requires multiple people with different skill sets to be involved, there is some form of transition. Imagine a task that requires provisioning end-to-end across a multivendor network. You might have the Cisco expert touch those devices while the HP expert manages the others. This creates a coordination effort that brings with it some cost.
  • Tooling boundaries – Wherever a task requires a switch in tooling, there is some manual effort required to integrate the data service between those tools. A user enters a help desk ticket to initiate some new application turn up. That requires information to be passed between the various provisioning systems. This incurs extra cost.
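That last boundary is the easiest to make concrete. Closing a tooling-boundary Bio Gap usually means writing the adapter a human would otherwise be: something that translates a record from one system's schema into another's. Both schemas below are hypothetical, sketched only to show the shape of the glue.

```python
# Sketch of closing a tooling-boundary Bio Gap: instead of a person
# re-keying a help desk ticket into the provisioning system, a small
# adapter translates between the two record formats. Both schemas
# here are invented for illustration.

def ticket_to_provision_request(ticket):
    """Map a help-desk ticket onto the provisioning system's schema."""
    return {
        "request_id": f"prov-{ticket['id']}",
        "app": ticket["summary"],
        "vlan": ticket["fields"].get("vlan", "auto"),   # default when unset
        "requested_by": ticket["reporter"],
    }

ticket = {
    "id": 4711,
    "summary": "new-billing-app",
    "reporter": "alice",
    "fields": {"vlan": "120"},
}
print(ticket_to_provision_request(ticket))
```

Trivial as it looks, this kind of translation only exists where both tools expose openly accessible interfaces – which is why the openness of a control model matters so much to the Bio Gap argument.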

Understanding that these boundaries will drive operational costs, whatever control model we decide on ought to consider these explicitly. Keep in mind that when I say control model, I mean it in the most general sense. The model is not just a protocol or API. It also includes the people, processes, and tools required to execute that model.

Do the protocols coming out now explicitly address these boundaries? In some cases, the answer is absolutely yes. Part of the lure of OpenDaylight is that a controller built as a platform will offer many opportunities to produce, consume, and ultimately unify data. In other cases, the answer is a bit more murky. Proprietary controllers with no openly accessible means of sharing data do not facilitate integration. (As a side note, this is why Plexxi cares about the Data Services Engine so much. It adds an openly accessible interface to our controller.)

And finally, how might this control model change if the underlying forwarding model changes? Do we expect fine-grained flow control? Do we expect distributed capacity? Are we expecting any optical integration with dynamic pass-throughs (as with the CALIENT products)? Do we expect centralized algorithmic control to make better use of unequal cost pathing? And if so, what does this do to the underlying physical architectures? Do we expect real-time optimizations or something more coarse? And how are these changes triggered? Instrumented? Reported?

This entire line of questioning gets even more complex when we consider that the future of IT is not the siloed version we live in today. When compute, storage, and networking come together with the applications, how are workflows orchestrated across the major resource pools? This evolution will require the convergence of multiple control models, each of which has evolved organically in isolation.

Companies that do not plan for this type of convergence will be caught horribly flat-footed. The kinds of changes we are talking about are measured in years, not weeks or months. The personnel, organizational, and cultural impacts of changes like this are striking. How much more difficult will this convergence of control models be if we continue to design in our respective vacuums? I suspect most people don't really think about this too much. Customers should augment their arsenal of questions in the coming quarters to ensure they at least survey the landscape of possibilities. Failing that, the OpEx nirvana everyone is hoping for might not be attainable.

The post Closing the Bio Gaps in Networking appeared first on Plexxi.

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."