Tuesday, May 17, 2011

Feel like there is not enough IT to go around?

The good news is that Spring seems to have sprung for IT.   There are signs of activity in many areas, new job openings, technology sales and most importantly - new projects.   There appears to be a drive from the business to allow IT to execute on much of the demand that has been on hold for a couple of years.   The challenge is that IT has been on a significant diet for the same length of time.

Are you seeing any of these symptoms?
  • Projects that were expected to start near the beginning of the year are only now being kicked off
  • Even though development resource levels may be addressed, new projects may be constrained by infrastructure capacity that has resourcing issues and its own ramp-up schedule
  • New projects appear to be affecting each other because they may have been initiated simultaneously and draw on the same pool of resources
  • You have a feeling that quality may be affected as IT pushes the envelope on risk to achieve delivery dates
If this is affecting your organization, what can you do as an IT leader?
  • Ensure that the project portfolio maps well to the business strategy – this is not the time to be working on any projects that aren’t the highest business priority.
  • Mitigate risk by recognizing the issues of resource constraints and interdependencies and focus on 100% execution through good planning – avoid the pressure for ready, fire, aim projects.
  • Leverage variable capacity opportunities offered through consulting resources and the new secret sauce, public cloud: consider a short-term ramp-up of consulting for burst capacity, consider moving more infrastructure resources to outsourced managed services, and consider a long-term shift to cloud for infrastructure where your organization can live with the data security concerns.
Taking on these issues now could help you live with the result in 12 months.  

Tuesday, April 26, 2011

Amazon EC2 Outage and Cloud Strategy

Last Friday, Amazon experienced a partial outage of its cloud infrastructure.   Here are the initial update and the closing update:


Event Issue
"The problem started with a "networking event" that led to problems with how data is mirrored: We'd like to provide additional color on what were working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS [Elastic Block Storage] volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them."

Closing update from Amazon:

As we posted last night, EBS (Elastic Block Store) is now operating normally for all APIs and recovered EBS volumes. The vast majority of affected volumes have now been recovered. We’re in the process of contacting a limited number of customers who have EBS volumes that have not yet recovered and will continue to work hard on restoring these remaining volumes…
We are digging deeply into the root causes of this event and will post a detailed post mortem.

One of the unfortunate realities of infrastructure and operations is that the goal will always be 100% uptime for all infrastructure, but it cannot be achieved.   The SLAs for infrastructure and operations are very unlikely to be 100%.   The strategic question will always be what SLAs can be afforded, what the impact of the target SLAs is on business agility, and what can be improved from a people, process and technology perspective to achieve the business goals and minimize cost.
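To make the "what SLAs can be afforded" question concrete, here is a minimal sketch that converts an availability target into the downtime it permits per year. The function name and the sample targets are illustrative assumptions, not figures from Amazon or from any particular contract.

```python
# Illustrative sketch: translate an availability SLA into allowed downtime.
# The targets below are hypothetical examples, not real contract terms.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent: float) -> float:
    """Return the hours of downtime per year permitted by an availability target."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - sla_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is usually where the cost discussion with the business starts.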

Because there are clear ties between performance, availability and security objectives and the success of outsourced cloud infrastructure and operations, I believe that public cloud will outperform internal infrastructure over time.   This does not lessen the requirement for internal roles of architecture, end-to-end management of performance, availability and security, and vendor management.   These roles will increase in importance within organizations.

The current Amazon issue re-emphasizes that a cloud strategy needs to include:

  • Clear and continuous risk management program for IT
  • Enterprise change, incident, problem, release and configuration management process re-engineering
  • End-to-end SLA and systems management
  • Server provisioning process and technology
  • Patching process
  • Server configuration baselining and auditing
  • Repurposing of servers
  • Disaster recovery planning and testing

Tuesday, March 22, 2011

Private Cloud - why and how?


There is an explosion of change occurring in infrastructure and operations.   While it took almost a decade for virtualization to become mainstream, cloud options are evolving much more rapidly.   There are two major business drivers – variable cost for consumers of IT resources and a need for increased IT agility.   All cloud options are built upon shared physical network, virtualized server and storage resources.   Cloud takes virtualization to the next level.   On top of virtualization it layers automated self-provisioning, chargeback for resource utilization, and service level agreements for cloud services that are in the service catalog.

The primary cloud discussions today center on when an enterprise will use public cloud and whether it needs to implement private cloud as a stepping stone along the way or as a step-sibling for a longer period of time.   Public cloud is growing rapidly.   IDC estimates that total expenditure on public cloud will be $29.5 billion by 2014.   There are some issues affecting the speed of public cloud adoption.   These are compliance concerns, data security, and cost.   As a competitive public utility, cloud cost will eventually go away as a concern.

In the meantime, Gartner believes that most enterprises over the next couple of years will focus their attention on implementation of private cloud.   Today, there are three options for private cloud.   Enterprises can build their own private cloud (in their data centers or colocation sites), they can contract with a public cloud provider to create a physically separate private cloud for the enterprise (in cloud provider data centers or contracted colocation sites), or they can contract with a managed services provider to manage a private cloud in the enterprise data center.

Whether an enterprise chooses to move to a public cloud or implement a private cloud, the approach to developing a strategy and implementation plan needs to follow the same methodology.   At a high level, the methodology has four steps:
  • Define an end-state that satisfies business requirements including the financial goals, service goals and resourcing/role goals.
  • Identify the transition actions including development of services, financial changes, skill/role changes, ITIL process changes, and infrastructure changes.
  • Plan and communicate the individual transition work streams.
  • Communicate the overall program frequently and execute.
While the overall transformation creates business value and opportunity, each of the transition actions will create resistance.   Call me if you want to discuss this further.

Tuesday, March 08, 2011

Planning for Cloud Implementation

We have done a good amount of consulting on moving to public cloud (especially for companies that are not happy about the cost of their existing managed hosting vendor).   In the initial discussion, one of the first questions is “what do I need to think about and how do I choose a cloud vendor?”

Moving to a public IaaS cloud vendor and, to a lesser extent, a SaaS vendor is a typical data center move or implementation with a few twists and the usual issues that are easily forgotten.   While it is amazingly simple and fast to build a new environment in the cloud, caring for it will take some planning, and may require changes to existing technology and processes.   While it may not be as formalized, even small organizations need to think through the issues.   Here is a checklist of items to think about:

Resiliency and Availability
Adding another node to your infrastructure network requires that you think through network configuration and redundancy, as well as server resiliency for servers that are in the new cloud environment.

Data considerations
Do you have constraints because of compliance, performance or support that affect where your data needs to be located?   A compliance requirement may force you to keep data in an internal data center and use it from the cloud.   A performance requirement may suggest a hybrid cloud with database servers in the cloud managed environment and web and application servers in the self-service environment.   Your database vendor or your performance requirements may not support a virtualized database server.

Compliance and Security
Does your IT implementation require that you have an intrusion detection or intrusion prevention system?   Is there a requirement that your infrastructure be located in a SAS-70 certified environment?   Are there requirements in your security policy that require multi-factor authentication?   Will you need to extend your vulnerability and penetration testing activities for the new site?

Identity management
How will you enforce the user authentication and control policies for the new environment, e.g., when an employee or consultant leaves the organization?   Will you need to create a new AD domain and build a trust?

Managing capacity

Monitoring performance and availability

Change, Configuration and Release Management
Will you need to add roles or workflow changes to the change management process?   What changes do you need to make to ensure that your configuration management database is current as you add and remove CIs from the new cloud environment?   Will you need to modify your release management process to push changes to the cloud?

IT service management
If there is a bump in the night, do you need to modify your incident management process to deal with workflow or contacts associated with the new environment?   Are there new services that you need to add to your service catalog to support users of the new environment?

Licensing
Will you need to extend software licensing with your vendors to cover the new environment, or will you acquire licenses through the cloud vendor or SaaS provider?

Testing
If you are moving applications or portions of applications to the new cloud environment, how will you approach functional and performance testing?

Disaster recovery
Will you build a DR site for the cloud implementation?   How will you approach data synchronization?   How does this affect your change, configuration and release management processes?

Friday, February 11, 2011

Cloud economics may surprise you

The economics of Cloud may incentivize changes in architecture.   Here is an example:

Many companies aggregate security log files from servers and network devices into a single repository to facilitate alerting on events and to support forensic investigation of security events.   There are many Security Information and Event Management (SIEM) systems that support this.   For some organizations, compliance requirements like PCI indirectly make this a requirement (it would be too onerous to satisfy Requirement 11 without implementing a SIEM).   Security log aggregation can create a large amount of network traffic to the centralized database.

Many cloud providers allow an unlimited amount of inbound network traffic, but charge for outbound network traffic.   This could create a situation for a company, considering all costs, where it is less expensive to place the SIEM and other monitoring infrastructure in the Cloud rather than inside the walls of the organization’s data center.   This may become even more obvious as the company increases the number of servers it puts in the Cloud.
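A rough sketch of that comparison, looking only at the data transfer charge for log traffic. Every number and name here is a hypothetical assumption for illustration: the per-GB outbound rate, the log volume per server, and the simplification that inbound and intra-cloud traffic are free. Check your own provider's price list before drawing conclusions, and remember that a full analysis has to include all other costs as well.

```python
# Illustrative sketch: monthly data-transfer charge for security log aggregation.
# All rates and volumes are hypothetical assumptions, not real provider pricing.


def monthly_log_transfer_cost(
    cloud_servers: int,
    gb_per_server_per_month: float,
    outbound_rate_per_gb: float,
    siem_in_cloud: bool,
) -> float:
    """Return the provider's monthly transfer charge for shipping logs to the SIEM.

    Assumptions: inbound traffic to the cloud is free, traffic that stays
    inside the cloud is free, and only traffic leaving the cloud is billed.
    """
    if siem_in_cloud:
        # Logs from cloud servers stay in the cloud; logs from the data
        # center arrive as inbound traffic, which is assumed to be free.
        return 0.0
    # SIEM in the data center: every cloud server ships its logs outbound.
    return cloud_servers * gb_per_server_per_month * outbound_rate_per_gb


if __name__ == "__main__":
    for servers in (10, 50, 200):
        on_prem = monthly_log_transfer_cost(servers, 20.0, 0.10, siem_in_cloud=False)
        in_cloud = monthly_log_transfer_cost(servers, 20.0, 0.10, siem_in_cloud=True)
        print(f"{servers} cloud servers: SIEM on-premises ${on_prem:.2f}/month, "
              f"SIEM in cloud ${in_cloud:.2f}/month")
```

Under these assumptions the transfer charge grows linearly with the number of cloud servers when the SIEM stays in the data center, and stays flat when the SIEM sits in the cloud.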

Let me know if you would like help analyzing the cost of cloud for your organization.

Tuesday, February 01, 2011

Cloud Computing will change IT Organizations

At its core, cloud computing is outsourced infrastructure or application services. There will continue to be increasing adoption to allow IT to improve service levels, lower costs (if it can dial down services during periods of lower demand) and respond faster to change required by the business. In some cases, especially smaller organizations, a move to outsourced services can ensure that the number of jobs within an IT organization does not need to grow. This is good for business.

In medium and large organizations, the new à la carte menu provided by the various cloud computing options will create new challenges for the IT organization. This will initiate a shift in job functions. There will be more outsourcing of jobs that are single-focus technical specialists (either to managed services or to services bundled with cloud offerings), but there will also be growth in the need for architects, designers, development integrators, security specialists, compliance officers and IT managers of outsourcers within the IT organization. This is also good for the business because the leverage and value for the funded job position grows. IT has always created and lived with change and transformation. The cloud transformation, like all change, creates opportunities and challenges, but from the perspective of jobs, I anticipate that there will be continued net growth because as a whole, IT enables business.

Saturday, January 15, 2011

Can you effectively outsource a NOC?

Over the past 10 years, there has been significant growth of two types of outsourced NOC services – I think of these as pure-play infrastructure services and targeted application services.   Do they answer the question, “Can you effectively outsource a NOC?”  I believe that these service offerings provide an answer to part of the question and at the same time can create a financial obstacle to solving the whole problem.   But there is another approach.

Pure-play Infrastructure Services
These services include support for network, server operating systems, management of databases, backups and execution of scheduled tasks.   To optimize this service offering, vendors focus on recruiting employees that are knowledgeable in specific horizontal technologies – Cisco, F5, Windows, Linux, Oracle, SQL Server, etc.   They implement good ITIL process and supporting technology.   What is lacking is an understanding of the applications that deliver the business value.   This limits their ability to predict business impact from infrastructure issues and their ability to react to alarms on breaches of transaction and process availability and performance thresholds that directly affect the business.

Targeted Application Services
Outsourced services for SAP, Lawson, Oracle Fusion, Warehouse Management Systems, Content Management Systems, and other similar applications get closer to alignment with business value.   However, many organizations have integrated multiple customized off-the-shelf systems and they have some homegrown systems.   The challenge of this reality is that the integration and interdependencies of these systems have created a single overall lovable Frankenstein of a system for the organization.   Solving the support question for any one of the component systems is not sufficient to provide overall coverage for the business.

The financial obstacle of partial outsourcing
Sometimes I think about Pure-play Infrastructure Services and Targeted Application Services as skimming the cream from milk.   If one is not trying to solve the whole support problem, it is the least expensive approach to get value for the outsourced dollar.   This is why there is a clear business case for outsourcing vendors.   They do provide economies of scale for the hired resources.   The challenge for the buyer is that the dollars that pay for these services are not available to put into the pot to solve the whole problem.  

So, can you effectively outsource a NOC?
Yes.   There is an alternative approach that leverages outsourced resources focused vertically rather than horizontally.   And, if an organization implements a monitoring system that crosses the infrastructure (data centers and cloud) and includes monitoring of the applications and their interdependencies, it can answer the whole question.   A key advantage of implementing the monitoring (and a ticketing system) within the organization is that it provides some additional independence from outsourcing vendors.

T3 Dynamics delivers professional services on monitoring and provides a SaaS offering for end-to-end monitoring.