
ITIL Interview Questions and Answers

Information Technology Infrastructure Library, or ITIL, is a framework designed to standardize the planning, delivery, maintenance, and overall lifecycle of IT services within a business. ITIL outlines the best practices for delivering IT services. Whether you are a beginner, an intermediate or an experienced ITIL professional, this guide will help you increase your confidence and knowledge of ITIL. The questions cover roles ranging from the Business Relationship Manager and Service Level Manager to the Information Security Manager and Supplier Manager. This write-up also provides step-by-step explanations for each question that will help you understand the concepts in detail. With these ITIL interview questions, you can be well prepared for your upcoming interview.

  • 4.5 Rating
  • 13 Topics
  • 116 Questions Covered
  • 11816 Reader(s)

Beginner and Advanced

ITIL is an acronym for Information Technology Infrastructure Library and is a set of detailed practices for IT Service Management (ITSM) that focuses on aligning IT services with the needs of business.

ITIL aligns with ISO/IEC 20000 (Part 11 of the standard describes its relationship to ITIL) and remains the most widely accepted approach to IT Service Management in the world, with over 2 million certified people.

It is owned and governed by AXELOS (www.axelos.com); the most recent published standard is ITIL v4 Foundation level (February 2019).

The origins of ITIL date back to the 1980s, and it has been updated many times prior to ITIL v4; Figure 1.1 represents the changes (Source: http://itservicemngmt.blogspot.com/).
ITIL® – A Brief Introduction

It is strongly recommended that you go through the recorded Webinar (YouTube: https://www.youtube.com/watch?v=iRPivknhq2Y) on IT Service Management to gain a broad perspective before jumping in.

ITIL® Life Cycle

While ITIL v4 matures in adoption and the higher versions are released during 2019 and 2020, this document has been prepared to help aspirants succeed at interviews focusing on IT Service Management roles prescribed by ITIL 2011.

For the remainder of this document, we will follow the chronology of the lifecycle depicted above as we describe the roles.

We shall assume that our Questions and Answers are relevant in a large ‘hypothetical’ organization – where the scale necessitates specialization, i.e. we have each role uniquely fulfilled by one person only. This is only with the purpose of helping in understanding the role a bit more clearly. In the real world, often one person will be acting in different roles, even in large organizations. ITIL does provide guidelines and best practices on multi-role assignments, but this is out-of-scope for this document.

In the world of IT, the ‘customer’ refers to the business. In an outsourcing scenario, the customer of the IT service provider could be the IT department of the organization that is outsourcing the services. Because of the transitive nature of the customer–provider relationship, we tend to always call the contracting party the customer.

If we take a ‘macro’ view, then the customer is the final consumer of the service. Good service providers take this macro view – this helps in innovation even within the domain of an existing service.

E.g. in the case of a citizen services portal, the customer for the IT company developing the portal is the local government (e.g. the local council or municipality), but in the broader perspective, every citizen using the portal is a customer.

To reduce the confusion, it is better to distinguish the two entities as follows: the customer is the entity that the IT service provider is contracted with. This entity provides the requirements for the service design, and the service provider is bound by IT Service Level Agreements to this entity. The end consumer is the user, and possibly there are business SLAs that bind the customer to the user. E.g. a resolution time of 24 hours to a complaint lodged by the user of an Internet broadband service.

Introduction

Service Level Managers are also called Service Managers or even Service Delivery Managers in many organizations. This is a key role in the ITIL® landscape and central to managing customer expectations and customer satisfaction. Good Service Level Managers are in high demand, and the experience levels vary from about 6 years upwards. With a higher level of experience, a Service Level Manager may become more adept at managing more complex IT landscapes or more complex outsourcing organizations, e.g. a multi-vendor setup. The questions below are likely to be useful to aspiring Service Level Managers during an interview and after they land a new job. They are packed with experience garnered from complex IT Service Management landscapes, and while the answers follow the ITIL guidance, this is not what you will typically get in ITIL books. The information presented is therefore an add-on to what is already available elsewhere.

1. What is meant by ‘Service Level’? How is it determined and who determines it?

Service Level or level of service is a quantification of the scope of services. E.g. if Incident Management is a service that is provided, the IT service provider may provide some commitments on – the number of incidents resolved on a business day, the response and fix times (in minutes or hours) per level of severity, the lead time (in days) to provide a major incident report etc. Levels of service must be defined for every service provided. 

Often, levels of service are associated with financial rewards or penalties. Target service levels will typically be defined during the Service Design phase of the ITIL Lifecycle and the actual performance will be measured during the Transition, Operations and Continual Service Improvement phases. Context of the service provisioning determines the committed service levels, but common influencing factors are – system documentation, system stability, user base, user experience design considerations etc. The technology expertise of the service provider is also important, e.g. a service provider who specialises in a technology may be able to commit to better service levels than a more generic service provider. 

Service levels are defined during the Service Design phase by the Service Level Manager, who takes inputs from the Business Relationship Manager, consults the customer and leverages the capability model of the service provider.

2. What does Service Level Management (SLM) hope to achieve?

The primary objective of service level management is to define, document, agree, monitor, share, report and review the level of IT services provided. SLM also ensures that appropriate information is available to Business Relationship Management so that the latter may have more effective communication with stakeholders. Metrics collection is an ongoing process in the later phases – Transition, Operations and Continual Improvement. SLM should have the tools and methodology to analyze and make decisions regarding re-calibration of the service levels.

Apart from defining the service levels, SLM also ensures that IT and the customers have clear and unambiguous expectations on the levels of service to be delivered. While BRM owns the customer satisfaction survey process, the SLM process ensures that the results of the survey can be mapped to the service levels that are agreed.

Last but not least, SLM is also responsible for improving the levels of service delivered by the provider organisation. Improvements are necessary not only in the context of low customer satisfaction but also for customer delight, especially where there are many competing IT service providers, which is typical in an outsourcing scenario.

3. What are the pre-requisites for a Service Level Manager to be successful?

There are mainly two inputs to the SLM process – the Service Portfolio and the Service Catalogue. The contents of these define the scope of services managed by SLM.

The Service Catalogue should be the single source of truth for the description of services agreed with a customer. Among other things, the description for each service should include current details, status, interfaces and dependencies on other services (which may well be provided by other IT service providers). These services could be current services being consumed or even the ones that are being designed or developed or transitioned into the live environment.

The Service Portfolio is a superset of the Service Catalogue. It exceeds the latter in terms of its scope i.e. it also includes information on ‘retired’ services, i.e. services that are no longer offered.

The Service Portfolio is internal to the IT service provider and includes all the services that are offered to all customers.

To understand with an example, if an organization is providing Incident and Problem Management services to customer A and Problem Management and Change Management services to customer B, then the Service Portfolio of the provider will include Incident, Problem and Change Management. However, the Service Catalogue for customer A will include only Incident and Problem Management (but not Change Management), and for customer B will include only Problem and Change Management (but not Incident Management). To put it another way, the Service Catalogue is a customer-specific subset of the Service Portfolio that is visible to the customer.

4. Have you heard of SLAM charts? What are these and why do we need them?

SLAM is an acronym for Service Level Agreement Monitoring, and SLAM charts are visual depictions of the actual level of compliance against the agreed levels. SLAM charts are built on top of data that is provided by the Service Transition, Operations and Continual improvement processes. The concept of SLAM forms the basis of many of the tools that are in use today to monitor the health of IT services and infrastructure.

SLAM charts will almost always include some visual aids to denote the health levels – usually on a Red-Amber-Green (RAG) traffic-lighting model, where Red is obviously poor health needing immediate attention, and Amber denoting services that need to be monitored so that they do not slip into the Red zone. Online tools and most of the cloud service providers will use SLAM charts for their customers built using agreed thresholds defined during the service design phase. E.g. two customers A & B, both requiring 98% Availability of a service may have different tolerance levels for service degradation. Customer A may define an Amber threshold as 96% to 97.99%, while customer B may have a more relaxed lower limit at 92% but a slightly stricter upper limit at 98.10%.
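The per-customer thresholds above can be illustrated with a minimal sketch. It assumes a simple rule that the text does not spell out: any value below the Amber range is Red and any value above it is Green. The names `RagThresholds` and `rag_status` are hypothetical and are not taken from any monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagThresholds:
    """Amber band for one customer, in percent availability (assumed model)."""
    amber_lower: float  # values below this are treated as Red
    amber_upper: float  # values above this are treated as Green


def rag_status(availability_pct: float, thresholds: RagThresholds) -> str:
    """Map a measured availability figure to a Red/Amber/Green status."""
    if availability_pct < thresholds.amber_lower:
        return "Red"
    if availability_pct <= thresholds.amber_upper:
        return "Amber"
    return "Green"


# Amber bands taken from the example in the text.
CUSTOMER_A = RagThresholds(amber_lower=96.0, amber_upper=97.99)
CUSTOMER_B = RagThresholds(amber_lower=92.0, amber_upper=98.10)
```

Under these assumptions, a measured availability of 98.0% is Green for customer A but still Amber for customer B, while 95.0% is Red for customer A and Amber for customer B.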

Dynamic SLAM charts may be used extensively by the operations teams such as the Application Management and the Technical Management Teams as well as the Service Desk. Modern tools allow drill down to the service level. SLAM charts are also used in service status reporting and in re-defining the service strategy – in the BRM process. They are a sure-shot source of current and historical information (trends over time) for the stakeholders involved in Service Improvement – for increased customer satisfaction and beating the competition.

5. What is an SLA? Does it always have to be documented?

SLA is an acronym for Service Level Agreement. SLAs must always be mutually agreed between the customer and the IT service provider and documented. SLA information must be available to the ‘people on the ground’ – namely the development, application management, technical management and Service Desk teams.

SLAs will be referenced in the contract, and be used to arrive at contract pricing, as well as in defining rewards and penalties for the organization providing the services. Contracts are legal documents, so, there is always a possibility that the contents of the SLA documentation will be referenced during any legal proceedings. E.g. for medical equipment running on software at a hospital, there may be an SLA for providing on-site technical services in case of failure during a surgery. Failure to do so may risk the life of a patient, and therefore invite legal action against the service provider.

With the above, it is quite evident that SLAs must always be documented without exception.

The SLA must only contain what can be effectively monitored and measured. This is because it is difficult to define any contractual reference to subjective items.

6. A very junior member of your team is curious about ‘OLA’. How do you explain it to her?

OLA is an acronym for Operational Level Agreement. An IT service provider will typically agree to provide certain services at specified levels. This is documented in the Service Level Agreement (the SLA). However, in the real world, the service provider organization may need to contract with other organizations, or even with other internal departments. Let us see this with an example.

Let us say the customer contracts with an IT Service Provider X for providing Incident Management, Problem Management and Change Management services for System A. System A interfaces with a backend system B, which is serviced by another IT service provider Y. Upon investigating an incident, the Application Management team infers that the real issue lies in System B, so they pass the incident to the team in organization Y. Now, ownership for adhering to the service levels for the incident still lies with organization X. What if the team in Y accepts the incident, but reduces its priority because they have other incidents to look at? How can org X ensure that Y attaches due importance to this incident? Through an OLA. Org X must formalize an OLA with Y, so that this incident is handled within the OLA. E.g. all such delegated Severity 3 incidents will be turned around in no more than 20% of the overall SLA resolution time. So, if the service level that org X has agreed for Severity 3 is 10 business-hours, the expected turnaround time from org Y is 2 hours.
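The arithmetic behind the OLA example can be written down as a small sketch. The function name `ola_turnaround_hours` and its parameters are hypothetical; the 20% share and the 10 business-hour SLA are the figures from the example.

```python
def ola_turnaround_hours(sla_resolution_hours: float, ola_share_pct: float) -> float:
    """Turnaround time allowed to the downstream provider.

    The OLA is expressed as a percentage of the overall SLA resolution time,
    as in the Severity 3 example (an assumed way of modelling the agreement).
    """
    if not 0 < ola_share_pct <= 100:
        raise ValueError("ola_share_pct must be in the range (0, 100]")
    return sla_resolution_hours * ola_share_pct / 100


# Severity 3: 10 business-hours SLA, 20% of which is allotted to org Y.
severity3_ola = ola_turnaround_hours(10, 20)  # 2.0 hours
```

Expressing the OLA as a share of the SLA, rather than as a fixed number of hours, means the downstream commitment scales automatically if the SLA is later renegotiated.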

7. Describe a typical day in the life of a Service Level Manager?

The Service Level Manager is the process owner for the Service Level Management (SLM) process. This is a critical role, and he wears many hats at once. With the customer, he needs to negotiate and agree to the service levels for each of the services in the Service Catalogue – some of these would be for current services, and some for future services. Agreements may also be needed for improvements to service levels for current services, e.g. the IT Service Provider may commit to a 98% compliance for Year 1 of the contract and promise a 0.5% improvement on a Year-on-Year (YoY) basis. So, effectively, they are committing to a 98.5% compliance in Year 2 and 99% in Year 3.
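The year-on-year commitment can be tabulated with a short sketch. It assumes, as the worked figures in the example do, that the 0.5% improvement is added as 0.5 percentage points each year, and it caps the target at 100%. The function name `yearly_targets` is hypothetical.

```python
def yearly_targets(base_pct: float, step_pct_points: float, years: int) -> list[float]:
    """Compliance target per contract year, rising by a fixed number of
    percentage points each year and never exceeding 100%."""
    return [min(100.0, base_pct + step_pct_points * year) for year in range(years)]


# Year 1 starts at 98%, improving by 0.5 percentage points per year.
targets = yearly_targets(98.0, 0.5, 3)  # [98.0, 98.5, 99.0]
```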

The Service Level Manager knows most about the SLAs, so is best suited to align the OLAs with the downstream providers or suppliers. E.g. for on-boarding niche technology resources, there may be an OLA of 4 weeks with a staffing organization.

He also ensures that the service status reports are accurate and circulated to the relevant stakeholders in a timely manner. If there are any breaches reported, these should be investigated, and the lessons learnt documented. He should represent the service provider organization in the service performance reviews with the customer and the suppliers.

At the Change Advisory Board (CAB), the Service Level Manager assesses the impact of changes on the current and planned services. For any adverse customer survey feedback, complaints about non-compliance with the agreed service levels, or requests for improvement, he will first record and then manage the complaint lifecycle to resolution by liaising with the other departments.

8. Can you give some real-world examples of multi-level SLAs?

Multi-level SLAs are typical of outsourcing scenarios, where one IT service provider provides a bouquet of services to different customers. This is done to ensure that the specific needs of the customer are met – in other words, the service provider is customising its portfolio. While this obviously means a possibility of achieving greater customer satisfaction, there are trade-offs that must be made in terms of service levels and costs.

A file-sharing service is a good example. Let us say basic users can only transmit 2 GB per day for no charge, and the receiver can access the files for 7 days without a login. So, at the most 14 GB of storage will have to be provided to the basic users. For premium users, the levels could be higher – let’s say 10 GB per day with a carry forward option and the receiver can access it for 30 days. Let us also assume that this service is available round the clock. So, we are talking of a service that offers a generic service level of 24X7 ‘availability’ for all users, but different levels of storage ‘capacity’.
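The 14 GB figure for basic users follows from multiplying the daily transfer limit by the retention period. A minimal sketch of that capacity estimate is given below; the function name `max_retained_gb` is hypothetical, and the estimate is per user.

```python
def max_retained_gb(daily_limit_gb: float, retention_days: int) -> float:
    """Upper bound on storage held for one user: every day's full quota
    is still retained until its access window expires."""
    return daily_limit_gb * retention_days


# Basic tier from the example: 2 GB per day, files kept for 7 days.
basic_ceiling = max_retained_gb(2, 7)  # 14
```

For the premium tier, the same product would ignore the carry-forward option, so it should not be read as a true upper bound there.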

Now consider another dimension to this – that of data encryption. The data protection laws of a country may require that all files transmitted from IP addresses in that country be encrypted and that the receiver must have a basic service login. To promote usage of their file-sharing service, the service provider includes the provision to automatically create a login for the receiver, so that the latter only needs to reset their password at first login. With this, the service provider also ends up providing a ‘user creation’ service in addition to ‘availability’ and ‘capacity’. This is a 3rd level of SLA, specific to the user as well as the service. So, in this example you can see 3 levels of SLA – generic level, user-specific level, and a user-specific service level.

9. You have joined a new company as the Delivery Head, and you need to do goal setting for your Service Level Manager. What are his success factors and KPIs?

As mentioned in one of the earlier answers, a Service Level Manager wears many hats. Therefore, his performance goals need to be set accordingly. In fact, often the success of an engagement will depend largely on how well the service levels have been met.

In SLM, numbers are key – this is also true for the KPIs for the Service Level Manager. First, how many of the service targets are being met, and what are the levels of compliance – Red? Amber? Green? Too few SLAs being met may imply over-commitment on the part of the service provider and too much compliance can imply that they may have chosen easy targets. If there are any breaches, these need to be counted, as well as the extent of the breach.

A Service Level Manager must keep all the SLAs up-to-date and communicate this to the people on the ground. Service performance reports must be accurate and timely, and actions identified during service reviews followed up to closure. He must actively involve himself in improving customer satisfaction by way of cost reduction, service improvement and innovation. Most SLAs are valid during the lifetime of a contract, so the Service Level Manager must actively participate in renewing the SLAs during the contract renewal. During the renewal process, he must consider the service performance in the last performing period and re-calibrate them by negotiating with the customer.

10. During your induction in the role of a Service Level Manager to the new company, they have shared with you only the signed copies of customer contracts. What should you do?

I must ask for the Service Level Agreements (SLAs) also. Contracts are most likely to reference SLAs, which are usually documented separately, as the latter is not a legal document and changes to it pass through a less rigorous change control process. This also allows the Service Level Manager to make operational changes to the SLA if the effort and budget do not exceed what is mentioned in the contract.

For a Service Level Manager to understand the context of the services in his new organization, the SLA, and the Service Portfolio and Service Catalogue are the three primary things that he needs to be familiar with within the context of ITIL. He also needs to get copies of the SLAM charts and access to any dynamic SLAM dashboards in use for the service. The service performance reports, and the minutes of the service review meetings are also useful inputs, as these will give him a good understanding of the current status of the service.

To know his stakeholders better, he must also review and understand the OLAs with the other groups and the other suppliers. After studying the SLAs and the OLAs, he must be able to stitch them together to understand the overall situation with the service levels. If the same service is provided to different customers at different service levels, he must understand the reason behind this.

Finally, the service improvement register, which should also include the actions being taken to improve customer satisfaction and redress complaints, is another source of information.

All the above are must-see, must-know and must-understand artefacts for the newly inducted Service Level Manager. And of course, a bit of luck helps as well!

Introduction

Information security is one of the most important topics in the present business scenario. With the increasing usage of Information Technology in the day-to-day affairs of life and the increasing competition in business, every business strives to keep its assets and intellectual property secure. People expect to be always connected, which requires a constant flow of information in all directions and from all directions; people expect that technology will enable them to do things that they cannot otherwise do comfortably. While this is a great thing, it also exposes businesses to threats like getting hacked or other forms of loss of confidential information via phishing etc.

Information Security Technology has grown manifold in the past few years. Information security has become a topic of national and political interest and many countries are implementing legislation around information security. Information security officers are in high demand and command very good pay packages – making it a lucrative career option.

1. Why is Information Security important? Is it something optional?

Every business wants to conduct its business securely. Conducting business securely means protecting the intellectual property, providing a secure working environment to the employees and ensuring, through signed non-disclosure agreements, that the partners do not divulge any information that they may have access to.

Now, all businesses rely on IT to varying extents, and the business security must be extended to IT as well. In ITIL®, the goal of the Information Security Management (ISM) process is to align the IT security with business security and ensure that information security is managed in service and service management activities. In modern businesses, ISM is an important part of corporate governance and therefore has strategic importance. As a result, all ISM objectives are aligned with the business objectives and vice versa, and the management of information security risks is overseen by the company leadership. Although we are exploring ISM in the context of ITIL, it is not just something that is related to IT; it is very much a key aspect of doing business. It is a must-have, and not optional.

2. What are the typical security objectives of an organisation?

The following are some of the security objectives of an organization:

  • Information is available and usable when required
  • However, the information should be only available to those who need it, and who have the privileges to access that information. In other words, confidentiality should be maintained.
  • The IT systems that provide the above information are secured – i.e. they can resist attacks
  • The IT systems should be able to prevent failures or recover from failures should there be one. The systems should be available.
  • The information should be complete, accurate and protected against modification. If the information is modified in error or maliciously, an audit log of the modification should be available. This is referred to as the integrity of data.
  • All business transactions and information exchange between the stakeholders must be trustworthy. Communication channels must be secured.
  • Business processes will define the priority of the confidentiality, integrity and availability aspects; the business objectives will drive this.

3. What is a security framework? Who defines this?

A security framework is an essential component of the Information Security Management (ISM) process and will generally consist of the following:

  • An overarching Information Security policy
  • Specific security policies that derive from the above, but are more specific to strategies, controls and regulation
  • A set of security controls to support the above policies
  • The Information Security Management System (ISMS) – this will contain the standards and guidelines, and the procedures for managing them
  • Security strategy – closely interlinked with the business objectives
  • A security organization – with roles, responsibilities and people mentioned therein
  • A repository of the security risks and how these will be managed
  • A communications plan for how the security topics shall be disseminated in the organization, including training and awareness sessions
  • Finally, a monitoring framework to keep a tab on the status of compliance and the effectiveness of the security controls and communication plan

Providing a security framework is the responsibility of the executive management. They hold the final accountability for protecting the information related to the organization. Once the policy is framed, the security organization is tasked with ensuring compliance and becomes the ‘guardian’ of the policy.

4. Have you heard of ISMS? Can you tell more about it?

ISMS is an acronym for Information Security Management System. The ISMS will contain the standards and guidelines that support the information security policies. The ISMS also provides the procedures on how these will be managed.

As a management system, the ISMS is a continuous cycle of Planning, Implementing, Evaluating and Maintaining of the standards and the guidelines.

As a part of the planning process, the Service Level Agreements (SLA), Operational Level Agreements (OLA) and Underpinning Contracts must contain references to the security policy of the organisation wherever applicable.

As a first step of implementation, security awareness must be created within the organization. Security must be implemented for the staff, networks, applications and end-user computing devices. All assets must be registered and classified as per their sensitivity, and access to these must be controlled and monitored. Any breaches, i.e. security incidents, must be reported and dealt with as per the procedures laid down in the ISMS.

The next step, evaluation, is realized through conducting of internal and external audits, self-assessments and performing the causal analysis of security incidents.

Learnings at every stage of this continuous cycle must be used to maintain the ISMS and plan for more effective upholding of the security policy. These should be reported back to the stakeholders.

5. Do you have an Information Security Officer? What are his duties?

An Information Security Officer is referred to as an Information Security Manager as per ITIL terminology. The main duties or responsibilities of this role are as follows:

  • Assist in developing and subsequently maintaining the Information Security Policy
  • Create information security awareness in the organization through appropriate means, including training of personnel
  • Classify the configuration items (CI) in terms of the levels of protection and control
  • Perform risk management activities, such as identifying potential security risks and creating mitigation and contingency plans
  • Manage all the security breaches by taking remedial action
  • Analyse the security breaches and create an improvement plan for reducing the volumes of such incidents in the future
  • Participate in the change management process by performing the security impact analysis of changes and providing estimates for any information security related changes
  • Perform self-assessment security tests, and conduct internal and external security audits
  • Uphold the security clauses in the Service Level Agreements and discuss any breaches or changes with the customer
  • Uphold the security clauses in the Operational Level Agreements and discuss any breaches or changes with the suppliers
  • Keep the executive management informed about the latest industry developments in information security

6. You have joined a new company as an Information Security Officer. You realize that assets are not classified. What would you do next?

As the Information Security Officer my first step would be to understand the configuration items (CI) and how these are maintained in the organization, i.e. the configuration management database (CMDB). With an up to date CMDB in place, the next step is to organise the information assets as per ISO 27001 standards that directly apply in the ITIL context: Confidential (only senior management have access); Restricted (most employees have access, likely on a ‘need to know’ basis); Internal (all employees have access) and Public (everyone has access).

Depending on the nature of the business, there may be other levels that may need to be created, e.g. in a medical institution, doctors may have access to patient information, but not necessarily how the finances of the hospital work; on the other hand, the top management may not have access to the patient records. These levels must be discussed and agreed with the executive management – and must be in line with the business objectives and the information security policy.

Classification of the information provides buckets into which assets are logically arranged. The next step is to design the exceptions and the approval mechanisms for the exceptions. Once this is done, access rights must be provided as per the information security policy.
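The four classification levels and the access rule attached to each can be sketched as a small data structure. This is only an illustration under stated assumptions: the audience for each level follows the descriptions given above, ‘need to know’ is modelled as a simple boolean flag, and the names `Classification` and `can_access` are hypothetical. Exceptions and approval mechanisms are deliberately left out.

```python
from enum import Enum


class Classification(Enum):
    """The four levels described in the answer above."""
    PUBLIC = "public"              # everyone has access
    INTERNAL = "internal"          # all employees have access
    RESTRICTED = "restricted"      # employees, on a need-to-know basis
    CONFIDENTIAL = "confidential"  # only senior management has access


def can_access(level: Classification, *, is_employee: bool = False,
               is_senior_management: bool = False,
               has_need_to_know: bool = False) -> bool:
    """Return True if a person with the given attributes may see the asset."""
    if level is Classification.PUBLIC:
        return True
    if level is Classification.INTERNAL:
        return is_employee
    if level is Classification.RESTRICTED:
        return is_employee and has_need_to_know
    # CONFIDENTIAL: senior management only (assumed to be the sole criterion).
    return is_senior_management
```

A real organization would normally extend this with business-specific levels, such as the patient-record example, and with the exception approvals agreed with the executive management.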

In a business scenario, new information assets are created regularly. Therefore, the next step is to educate the creators of the assets on how the newly created assets must be classified. This is achieved via training.

Once the above setup is complete, self-assessments and audits must be regularly scheduled to check for compliance with the classification policy. Usually, these will be a part of the wider security audits.

7. Have you heard about the GDPR? Can you give some details?

GDPR is an acronym for General Data Protection Regulation and is hailed as the toughest privacy and security law in the world. The European Union (EU) put it into effect on 25 May 2018. However, this law imposes obligations on organizations anywhere, so long as they target or collect data related to people in the EU; e.g. if you are an Indian IT company doing a project that involves data related to people in the EU, you become accountable for compliance!

There are 6 principles of GDPR and a final accountability principle, making it a total of 7 principles:

  1. Lawfulness, fairness and transparency: Processing must be lawful, fair, and transparent to the data subject.
  2. Purpose limitation: You must process data for the legitimate purposes specified explicitly to the data subject when you collected it.
  3. Data minimization: You should collect and process only as much data as necessary for the purposes specified.
  4. Accuracy: You must keep personal data accurate and up to date.
  5. Storage limitation: You may only store personally identifying data for as long as necessary for the specified purpose.
  6. Integrity and confidentiality: Processing must be done in such a way as to ensure appropriate security, integrity, and confidentiality (e.g. by using encryption).
  7. Accountability: The data controller is responsible for being able to demonstrate GDPR compliance with all these principles.

An organization that violates the GDPR must pay a heavy penalty. First, the data subjects are entitled to compensation for the damages suffered as a result of the breach. Secondly, the data protection regulator in each EU country can impose a fine of up to 20 million euros or 4% of the global annual revenue of the organization in the previous financial year – whichever is higher. The amount varies with the severity of the breach.
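The 'whichever is higher' rule for the upper limit of the fine is simple arithmetic and can be sketched as below. The function name and the sample revenue figures are made up for illustration; the actual fine is decided by the regulator based on severity and may be well below this ceiling.

```python
FIXED_CAP_EUR = 20_000_000     # 20 million euros
REVENUE_SHARE_PERCENT = 4      # 4% of global annual revenue


def max_gdpr_fine(global_annual_revenue_eur: float) -> float:
    """Upper limit of the fine: the higher of the fixed cap and 4% of revenue."""
    revenue_based = global_annual_revenue_eur * REVENUE_SHARE_PERCENT / 100
    return max(FIXED_CAP_EUR, revenue_based)


# 4% of 100 million is 4 million, so the 20 million cap is the higher figure.
print(max_gdpr_fine(100_000_000))
# 4% of 2 billion is 80 million, which exceeds the 20 million cap.
print(max_gdpr_fine(2_000_000_000))
```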

Please visit the GDPR website for more details - https://gdpr.eu/what-is-gdpr/

8. Let me tell you 3 keywords – ‘information security’, ‘data protection’ and ‘privacy’. Are these the same or different?

'Privacy' and 'data protection' are very close in meaning, and the usage depends on the country. E.g. in the US, the term 'privacy' will be used in the context of the controls associated with the processing of personal data. In the European Union (EU), 'data protection' will mean the same thing. The difference may be ascribed to the difficulty of translating the term into the multiple languages used in the EU.

However, the situation with 'data protection' versus 'information security' is different. When an IT service provider or a website mentions that 'information security' is being provided, it may mean that, e.g., they use encryption to transfer files so that only the sender and the recipient know the content being transmitted and no one else. But this may not protect the information related to the users themselves. E.g. to send a file, the sender needs to create a profile - how is the IT service provider dealing with the sender information? This is where data privacy (or data protection) comes in. It is a common misconception that an IT service provider promising 'information security' is also protecting the data of the users of the system. E.g. a few years back Yahoo disclosed that it had been hacked in 2014 (a breach of 'information security') and user data for about half a billion users was stolen (a 'privacy' breach). This shows the difference between the two terms.

Note that 'privacy' and 'data protection' always refer to personal data, but 'information security' is different – it is more generic and ‘impersonal’.

9. What is ISO 27001?

ISO 27001 (ISO/IEC 27001:2013) is a globally recognised information security standard that specifies how an organization can build a world-class information security management system (ISMS). It helps organisations manage their information security processes in line with best practice while controlling costs. Although it is related to information technology, it is technology agnostic and applies to all organisations - big or small. This universality has resulted in the standard being widely adopted across the globe.

ISO 27001 enables organisations to achieve accredited certification by an accredited certification body following the successful completion of an audit. It supports compliance with GDPR (General Data Protection Regulation) of the European Union.

The controls of the standard (Annex A of the 2013 edition) are grouped under the following areas:

  • Information security policies
  • Organisation of information security
  • Human resources security
  • Asset management
  • Access control
  • Cryptography
  • Physical and environmental security
  • Operations security
  • Communications security
  • System acquisition, development and maintenance
  • Supplier relationships
  • Information security incident management
  • Information security aspects of business continuity management
  • Compliance

10. Can a customer impose an SLA related to security? Can you give some examples?

Considering the nature of the business, the customer can always impose some service level agreements (SLA) related to Information Security. Examples include, but are not limited to:

  • Restriction of physical access to the project area and server rooms
  • Onboarding of new project personnel only after Information Security training is completed and a Non-Disclosure Agreement (NDA) is signed
  • Performing a background check on project personnel and submitting the report to the customer
  • Forcing periodic password resets and employing a strict password policy
  • Scanning of the service provider’s networks and devices on an agreed schedule
  • Restriction on the use of chat tools, mobile cameras
  • Usage of customer provided proxy server for accessing the internet
  • Installation of monitoring software on the project computers
  • Keeping a record of all access requests
  • Usage of a secure virtual private network and devices provided exclusively by the customer

Each of the above may be subject to periodic audit by the customer, and any breach may make the service provider liable to pay a financial penalty. The amounts of the penalty and the conditions under which it will be imposed are usually included in the contract along with other service level agreements, terms and conditions. With the introduction of strict security laws like the GDPR, customers are increasingly tightening the security requirements for the service providers.

Introduction

The supplier manager is a critical role in an IT services organisation. This is especially true because technology is advancing so fast that at any given point of time a service provider may not have enough technological capability to service a business, thereby necessitating an outsourcing scenario. A supplier manager has a mix of skills in an IT organisation - they must understand a bit of technology, possess good negotiation skills, and must have excellent interpersonal skills. Supplier managers must have high integrity and be able to do their work in the best interests of the organisation. While their job is to derive value from the amount spent on the supplier, they must also have the appropriate soft skills to ensure that suppliers engage as partners, as opposed to being pushed into a corner via hard negotiation.

1. What are the responsibilities of a Supplier Manager?

A supplier manager helps in the development and review of the contracts that the company has with suppliers. If the company has more than one supplier, the supplier manager must ensure that the processes established with the suppliers are in line with organisational strategies. At any given point of time, the supplier manager maintains a supplier and contracts database; this could be a shared folder or a SharePoint location. All the supplier information, e.g. the names of the key persons, the management representative for any escalations and the contact details of all of them, is stored in this database. Circumstances may require some of the supplier contracts to be modified – the supplier manager must ensure that all such changes are processed via the appropriate change control mechanism. A key requirement when changing supplier contracts is the involvement of appropriate stakeholders, specifically senior management within the organisation as well as in the supplier organisation.

The most important job of the supplier manager is ensuring that the company gets the desired value from a supplier in terms of the deliverables that are expected of the supplier. From time to time, the supplier manager must also review the performance of the suppliers and document any action required, which may range from issuing the supplier a warning to a change of supplier. If a supply contract must be terminated, a process must be followed. E.g. adequate notice of termination must be provided, and any responsibilities of the supplier as a part of the termination process must be documented in the supplier contract. During the exit process, the supplier manager must be vigilant because an unhappy supplier could possibly leave behind loose ends that could result in the degradation of service afterwards.

2. As a supplier manager how would you deal with a multi-vendor situation?

A multi-vendor situation is always tricky.

A supplier manager must take care that the suppliers do not end up competing against one another, eventually making it their primary motive. In an outsourcing situation, suppliers may try to grab a bigger share of the work, pushed by their organisation targets; sales representatives are usually rewarded for bringing in more business into their companies. Therefore, getting a bigger share of the work is always high on their agenda; under this pressure, they try competing with the other vendors who work for you. There can also be incidents of non-cooperation between the suppliers resulting in a lack of communication, and this makes the overall service provisioning to your customer challenging for you.

While competition is one side of the coin, the other side is ‘colluding’. This means that the suppliers unite with each other and decide on a common position, e.g. they may decide that they will not provide resources below a certain rate card, which could be higher than the industry average. In such a situation you will end up spending more than you had planned for. This could finally mean that you either don't make the necessary profits, or you sell your services at a loss to the company.

There are many other ways in which a multi-vendor situation can go wrong. The supplier manager must therefore constantly nurture the supplier relationships and strike a balance between making the suppliers collaborate with each other and keeping them apart.

3. Have you heard of the Supplier and Contract database? What is it?

The Suppliers and Contracts Database (SCD) is established as an integrated element of a more comprehensive Configuration Management system (CMS). The SCD should have all the details of the suppliers and the contracts that exist with the suppliers together with details of the type of services or products that are provided by that supplier and how these relate to the configuration items.

E.g. provisioning of laptops for the employees of a company may be with supplier X. This information must be stored in the SCD. Furthermore, details of the quantity of laptops, manufacturer, configuration of the laptops, lead time to provide laptops, whether laptop bags should be provided along with the laptops etc. must also be included. Supplier details such as single point of contact in the supplier organization, escalation path and contact details of the supplier manager, technical skills of the personnel who shall be installing software on the laptops may also be included in the SCD.

The SCD should store all the information and act as a single source of truth for all the supplier information in the company. Apart from the contract details for all suppliers, the SCD should also describe how suppliers are categorized, how the organization plans to evaluate new suppliers and new contracts, and how new suppliers can be on-boarded.

The evaluation mechanism is crucial to ensure that poor quality does not come in; the process should be adaptable to the changing business priorities of the organization, e.g. in a period of lean business, the focus in supplier selection may be on cost, but in a period of good business the focus may shift to choosing suppliers with niche capabilities. Failure to adapt might misalign the suppliers from the company objectives. Regular evaluation should be carried out for all the suppliers and this information should also be stored in the SCD.

Contract renewal and termination criteria should also be included in the SCD.

The SCD is constantly in use across the ITIL® lifecycle, through service design, service transition and service operations.
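The kind of record an SCD holds can be sketched as a small data structure. The sketch below is only an illustration under stated assumptions: the class names, field names and sample values are hypothetical, and a real SCD would live inside the organization's CMS tooling rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Contract:
    contract_id: str
    service_or_product: str       # what the supplier provides
    related_cis: List[str]        # configuration items this contract relates to
    lead_time_days: int
    renewal_criteria: str
    termination_criteria: str


@dataclass
class SupplierRecord:
    name: str
    category: str                 # hypothetical categorization label
    single_point_of_contact: str
    escalation_contact: str
    contracts: List[Contract] = field(default_factory=list)
    evaluations: List[str] = field(default_factory=list)  # regular evaluation notes


class SupplierContractDatabase:
    """In-memory stand-in for an SCD, acting as a single source of truth."""

    def __init__(self) -> None:
        self._suppliers: Dict[str, SupplierRecord] = {}

    def add_supplier(self, record: SupplierRecord) -> None:
        self._suppliers[record.name] = record

    def suppliers_for_ci(self, ci: str) -> List[str]:
        """Return the names of suppliers with a contract related to the given CI."""
        return [
            supplier.name
            for supplier in self._suppliers.values()
            if any(ci in contract.related_cis for contract in supplier.contracts)
        ]


scd = SupplierContractDatabase()
scd.add_supplier(
    SupplierRecord(
        name="Supplier X",
        category="hardware",
        single_point_of_contact="spoc@supplier-x.example",
        escalation_contact="manager@supplier-x.example",
        contracts=[
            Contract(
                contract_id="UC-001",
                service_or_product="Laptop provisioning",
                related_cis=["employee-laptop"],
                lead_time_days=10,
                renewal_criteria="Annual review passed",
                termination_criteria="90 days written notice",
            )
        ],
    )
)
print(scd.suppliers_for_ci("employee-laptop"))  # ['Supplier X']
```

Linking each contract to configuration items is what lets the SCD answer questions such as "which supplier do we call when this item fails?".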

4. In your opinion what is the objective of the Supplier Management process?

The objective of the Supplier Management process is to manage suppliers and the services these suppliers provide. This is necessary to provide seamless IT services to the business, the users and the customers. At any point of time, the focus should be on ensuring that the users of the business derive value for the money that is spent on IT. The Supplier Management process ensures that suppliers and their provided services can support the overall service levels that the business expects from the IT organization. The process promotes awareness of the business context of working with the suppliers and partners.

All ITIL service lifecycle phases must give the Supplier Management process due importance, and the supplier manager must be involved in all the stages, from strategy through design, transition and operation to continual improvement.

In the modern business scenario Supplier Management should be able to ensure that it has full control on the suppliers and can quickly re-align and motivate the supplier to the changing business needs of the organization without going too much into contractual discussions. Supplier flexibility is key to the survival of any business as it helps to mitigate the risks of the business organization when not having certain skills or capabilities.

The primary goal of the Supplier Management process remains to continuously derive value from the suppliers and the partners, often through a reward and penalty mechanism. Value is also derived through partnering and through maintaining continuous communication with suppliers, while agility is maintained by having a pool of ‘preferred’ suppliers with whom Master Service Agreements are already in place.

5. What do you understand by supplier strategy?

Strategy for an organization is devised during the Service Strategy phase of the ITIL® lifecycle.

The first step in defining supplier strategy is to determine whether the nature of the business allows certain services or products to be sourced from outside of the company. Due to reasons of confidentiality, this may be a very controlled process where every product or service that is being sourced externally must pass through stringent quality checks, e.g. procured laptops may need to undergo ‘hardening’ to prevent hacks. When services are being sourced from a supplier, confidentiality and data protection requirements may necessitate that the supplier provide services through a secured network or even from a physically isolated and secured location.

Supplier strategy must also define what kind of services and products may be sourced from the suppliers. E.g. certain businesses may not allow the procurement of software product licenses directly from the software vendor due to the higher license costs; in such cases the company must go through license resellers.

For sourcing manpower from external agencies restrictions may apply too. E.g. a business may not want to hire staff from a provider that already supplies manpower to its competitor. In some situations, such as a project that requires staff to operate in the night shift, there may be restrictions on hiring women in certain countries due to cultural reasons.

While supplier strategy can be more generic and overarching there should still be room for tweaks so that certain limitations of a prospective supplier may also be overcome, therefore ensuring that the business does not miss out on procuring products or services of better quality from a good supplier.

Cost consciousness is one of the primary factors for supplier selection. Many businesses follow a linear approach of choosing the least expensive supplier. However, the supplier strategies also allow flexibility in terms of choosing quality of deliverables over price when dealing with certain niche skills or services.

A good supplier strategy is beneficial to the business. The management must spend a good amount of time in formulating this and utilize the learnings from the past.

6. What is the most important factor of a good supplier relationship?

A good supplier relationship has a centrepiece called ‘Trust’. Trust is intangible and needs time to materialize within a relationship. A customer-supplier relationship starts with the exchange of goods or services against money. Commitments are made and must be fulfilled. These commitments are mostly contractual, and subject to scrutiny and legal jurisdiction and may result in financial reward or penalty.

However, businesses do not run linearly with only simple transactions; if they did, doing business would be quite easy. There are numerous examples where businesses make commitments to their customers based on what suppliers have committed to the business. E.g. consider a case where delivering your project requires having a niche skill on board. One of your staffing vendors has agreed to provide you with qualified resources in a stipulated amount of time. You rely on this commitment from your supplier to commit to the business about delivering the software at a certain milestone. If the supplier is unable to provide you a quality resource on time, then you are in deep trouble. This is when you lose faith in your supplier and can no longer trust them. On the contrary, let us say that your supplier comes up with a unique plan, goes ahead and invests at their end in training to provide the resource to you. This enables you to deliver your software on time and with good quality to your business.

In the above situation the unique plan that your supplier executes is not a part of your contract with them, but it has enabled you to fulfill your committed business objectives. It’s a good example of how your supplier has built trust with you. When you deal with this supplier in the future, you will always be assured that they can deliver.

Look at the same example from another dimension. A supplier who can propose a unique solution outside of the contract probably understands your business needs very well. What they have demonstrated is ‘partnership’, which is well beyond a transactional relationship. Partnering is a very desirable state in the world of Supplier Management. If you are dealing with partners as opposed to vendors, you are more likely to succeed in your goals. For the supplier, too, it is motivating to be considered a partner rather than just another vendor. It represents a move up the value chain for them.

7. Can you distinguish between a Service Level Agreement, an Operational Level Agreement and an Underpinning Contract?

A Service Level Agreement (SLA) is an agreement between you as an IT service provider and the business as a consumer, to provide specific IT services at a certain level of fulfillment. E.g. you may have an SLA with the business to maintain the infrastructure at 99.97% uptime. SLAs will be referred to in Statements of Work and contracts and are legally enforceable. Failure to fulfill SLAs may result in financial penalty for the service provider.

An IT service provider may have limited capabilities and skills, which may require it to source additional services externally from suppliers who have the relevant expertise. To source these services externally, the IT service provider may need to enter into another contractual agreement with the supplier, which is also legally enforceable between the IT service provider and the supplier. This agreement is referred to as an Underpinning Contract (UC). UCs need to be created for procuring equipment or specialized services from a third-party supplier or vendor.

An IT service provider may be a large organization with different departments. E.g. the HR department is responsible for providing staff to the IT Service Manager so that they may fulfill the commitments made in the SLA. If the HR department is unable to provide good quality staff on time, the IT service commitments cannot be fulfilled. This is where the IT Manager may want to enforce Operational Level Agreements (OLA). So, OLAs are internal, and made between different departments of the same organization. In certain mergers and acquisitions, OLAs may be in force between the merged organizations.
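To get a feel for what the 99.97% uptime figure in the SLA example above means in practice, the permitted downtime can be worked out with simple arithmetic. The function below is a minimal sketch; its name and the 30-day measurement period are assumptions, since an actual SLA defines its own measurement window and exclusions.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period for a given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# A 30-day month has 43,200 minutes; 0.03% of that is roughly 13 minutes.
print(round(allowed_downtime_minutes(99.97), 2))
# Over a 365-day year, the same target allows roughly 158 minutes.
print(round(allowed_downtime_minutes(99.97, period_days=365), 2))
```

Seen this way, a seemingly small change in the uptime target has a large effect on how much outage time the service provider, and any supplier bound by a UC, can afford.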

8. Can you think of some suitable KPIs for Supplier Management?

One of the most basic KPIs for Supplier Management is the number of underpinning contracts (UC). SLAs exist between the business and the IT service provider; however, when the IT service provider sources goods or services from a third party or supplier, UCs may be the key to the IT service provider being able to meet the SLAs. Appropriate coverage of financial liability must exist in case the supplier fails to meet its objectives. The UC makes it possible to transfer the liability partially or fully to the supplier. In other words, the UC acts as a safeguard or an insurance for the IT service provider when it fails to achieve its business objectives due to the poor performance of its supplier.

Another KPI is the number of contract reviews with the supplier. Contract reviews ensure a couple of things. First, they ensure that you keep having regular conversations with your supplier. Secondly, they ensure that the fulfillment of commitments as per the UC is discussed. This KPI measures the contact frequency and is therefore a useful indicator of a healthy supplier management setup.

Yet another useful KPI is the number of identified contract breaches with respect to the UCs. Every shortfall of the supplier in fulfilling a contract will usually result in additional costs incurred by the service provider so that the agreed service levels can be provided to the customer. These additional costs cannot be charged to the customer or the business. Contract breaches are usually identified during contract reviews or, for major breaches, during the event itself. A supplier who breaches too many times may be liable to a heavy penalty or delisting from the list of suppliers. The service provider may then need to identify another supplier as a suitable replacement.
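The three KPIs discussed above are simple counts and can be derived from contract records, as in the following sketch. The record layout, the function names and the breach limit are hypothetical; in practice these figures would come from the supplier and contracts database.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UnderpinningContract:
    supplier: str
    reviews_held: int   # contract reviews held in the reporting period
    breaches: int       # contract breaches identified in the reporting period


def supplier_kpis(contracts: List[UnderpinningContract]) -> Dict[str, int]:
    """Compute the three KPIs: number of UCs, contract reviews and breaches."""
    return {
        "underpinning_contracts": len(contracts),
        "contract_reviews": sum(c.reviews_held for c in contracts),
        "contract_breaches": sum(c.breaches for c in contracts),
    }


def suppliers_over_breach_limit(
    contracts: List[UnderpinningContract], limit: int
) -> List[str]:
    """List suppliers whose total breaches exceed an agreed (assumed) limit."""
    totals: Dict[str, int] = {}
    for contract in contracts:
        totals[contract.supplier] = totals.get(contract.supplier, 0) + contract.breaches
    return sorted(name for name, count in totals.items() if count > limit)


contracts = [
    UnderpinningContract("Supplier X", reviews_held=4, breaches=1),
    UnderpinningContract("Supplier Y", reviews_held=2, breaches=3),
    UnderpinningContract("Supplier X", reviews_held=1, breaches=0),
]
print(supplier_kpis(contracts))
print(suppliers_over_breach_limit(contracts, limit=2))  # ['Supplier Y']
```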

9. What are the factors to consider when choosing a supplier?

A good supplier is likely to take accountability for quality issues and provide a plan to work forward to address these issues quickly. A supplier without accountability is more likely to deflect the responsibility and blame something else that they will try to pass off as something outside of their control. Good suppliers are usually open to having their processes and internal working audited whenever required. If the supplier is resistant to such audit, it may be a sign of trouble.

Another factor to consider while choosing a supplier is their ability to scale, e.g. as an IT service provider you want a supplier to provide you with desktops. You may want to check whether they will be able to supply in bulk when you need it as well as supply a single piece when the demand is low. A supplier must be verified, and references regarding their skills should be sought, especially when dealing with niche content or skills. Experience of delivering technology solutions in the same domain, e.g. the banking, manufacturing or e-commerce sector, is also important.

As per the saying ‘culture eats strategy for breakfast’, the next check that you need to perform is whether the supplier’s culture is aligned to yours or whether they will be able to align to your culture. Culture is difficult to change, and this fit must be considered even before choosing the supplier. In international businesses, language barriers and the ability to maintain open and direct communication are important. It can also work the other way around, where as a service provider you may be doing business with an organization in a country that speaks a different language. In this case you may want to choose a supplier that also speaks the same language as the business you are serving.

The supplier must possess clear and comprehensive record keeping practices, e.g. they should record all the important decisions and honour the commitments that may have been made only verbally. A supplier must comply with ethical practices and also with regulatory requirements, whether it be the law of the land or the law under whose jurisdiction the underpinning contract falls.

While all the above make a good supplier, what makes a supplier great is their focus on continual improvement, e.g. does the supplier only care for industry standards like ISO? Or does it also care to continuously reduce waste and improve efficiency in its operations, and possibly even commit to passing some of these cost savings on to the customer as productivity gains?

10. What are the typical contents of an Underpinning Contract (UC)?

Underpinning contracts should contain the following information at a minimum:

  • Service name and supplier information such as supplier name, address, contact person and contact details such as mobile number and email ID
  • Contract duration, i.e. not only start and end dates but also the terms and conditions under which renewal or termination may happen
  • Description of the service outcome - this is the scope of the service, i.e. the utility that is derived from consuming the service. Any warranty information should also be included under the description of the scope of services.
  • Communication channels and interfaces, including not only the contact points and details for both the contractual parties but also a description of such interfaces, e.g. how these interactions will happen – over phone, email or Skype. Any service reporting requirements, including the content and the frequency, must be described. Service reviews at periodic intervals must be included, along with the parties that will be performing the review. Triggers to effect an escalation and the parties involved in the escalation should also be described in detail.
  • The window of service, described in terms of hours of service availability as well as the exceptions, e.g. weekends and local holidays
  • The types and levels of support, e.g. remote support, onsite support or out-of-hours support
  • Service level requirements and targets, which would primarily be driven by the Service Level Agreements that the IT service provider has committed to the business. These could include availability targets, capacity / performance targets and any service continuity commitments, including disaster recovery scenarios.
  • Any standard that has to be followed, for example ISO or CMM
  • Roles and responsibilities, usage of subcontractors, and the pricing model including rules for penalties and chargebacks. The underpinning contract may also contain a glossary of terms and commonly used expressions so that the contracting parties are on the same page. References to other documentation, for example the master service agreement, must be made.

Introduction

In the rapidly changing business environment, nothing is as constant as change itself. Any business that fails to change, whether in terms of its strategy, products and services, or fails to respond to the external environment (the market, regulations etc.), is soon left behind and perishes in no time. This is enough motivation to have a Change Management process at the centre of the business – this process is owned by the Change Manager.

Businesses and IT are linked closely. That is why ITIL® defines the Change Manager role and the Change Management process around it.

Change Manager roles may not exist in isolation. With IT rapidly moving into a DevOps model, the Change manager role has been blended in with other roles, and a lot of tools are available to make the change management process stronger. This means that a lot more roles in the organization need to be knowledgeable about change management, and not just the manager. In an Agile world, change is welcome (refer to the Agile Manifesto), however, the principles of change management are still intact.

The Q&A in the next section should be understood in the context of the change manager role and not in the context of an individual. With the increasing importance of organizations being able to adapt to change, these questions are significant across the board – for developers, scrum masters, product owners, project managers and portfolio owners, and not just change managers. The questions focus on ITIL and are therefore specific to the IT industry. Managing business changes is out of scope.

There is limited understanding of the change management process, as what we see on the ground (in most cases) is always an ‘adapted’ version, that suits the business model. This leads us to comparing processes followed ‘here’ versus those followed ‘there’. Through the Q&A in the next section, I have tried to bring up the often missed out fine print in an ITIL context.

1. What is your understanding of the Change Management process?

Organizations establish a Change Management process to be able to manage the situations that their businesses face – this could be competing products or solutions, changes in regulations e.g. safety standards and laws, changing demographics, consumer preferences, attempting to enter new markets or exit existing markets, changing technology and so on. The list can be enormous. 

Any of the situations presented above could pose significant risks to the business objectives – that could be making profits, increasing market share and revenues or even serving a market segment. No business owner would want any disruption to meeting any of the objectives above. Even if a disruption is inevitable, they would want to minimise the impact, so that it remains transparent to the consumers of the products or services and they keep getting the same quality or service levels. 

Therefore, it is essential for a business to put in place a mechanism to manage whatever is not in its control and, at the same time, still be able to fulfil its business objectives. This mechanism is Change Management.

In ITIL, Change Management is a Service Transition process because it bridges the transition between the old and the new. The old is the baseline and the new is the modified services or added services. 

2. What are the steps in the Change Management process?

[Figure: Steps in the Change Management process]

3. What are the major responsibilities of a Change Manager?

The role of the Change Manager is key in the Change Management process. All RFCs should be directed, in the first place, to the Change Manager. He will then review the RFC documentation and check that all the needed information is present. If not, he needs to get back to the requestor to seek more information. He may also consult with the CAB members or any other member in the organization who can provide more information.

Once the change is recorded, the Change Manager is accountable for prioritizing it in line with the business goals and categorizing it appropriately. The categorization helps in the selection of the CAB (or the ECAB) members, and then he chairs the CAB (or ECAB) meetings. During the CAB meeting, he will seek the advice of the members of the advisory committee to be able to assess, authorize and schedule the changes.

When the change is being worked on, the Change Manager helps in removing impediments, especially when these are related to the impact or newly identified stakeholders.

Finally, once the change is implemented, he will evaluate whether the business objectives have been met and the Return-on-Investment (ROI) has been achieved. He analyses the change trends, such as success rate, earned value and incident metrics, and regularly reports these and the progress of all the changes to the management.

In the capacity of the Change Management Process Owner, he should retrospectively monitor the effectiveness of the process and plan to introduce improvements whenever necessary.

4. How is the life of a Change Manager – easy or hard?

The Change Manager is one of the central roles in the IT landscape. He must strike a fine balance between the stability of a running system and introducing changes to it for improvements, fixes etc.

The most important factor is the knowledge of the change manager – functional, system and operational. Knowledge is acquired through experience and self-education.

A Change Manager also must balance the needs and risks to the various stakeholders who are impacted by the change. He needs to ensure the converse as well – those that should not be impacted must be left alone. Different stakeholders will have different interests in the implementation (or the non-implementation) of the change. Sometimes these interests may go beyond the business and technical realms.

The Change Manager must continuously ask the question – “What if …?” This is not an easy question for the one who needs to answer. When the stakes are high, such questions may not just be unwelcome, but may cause a change of direction if there is no good answer. Imagine a critical change not being implemented because one of the impacted stakeholders hasn’t prepared a ‘rollback plan’.

The job of a Change Manager is tough – one not for the faint-hearted.

5. What exactly is a ‘rollback plan’? Why do we need it?

All changes made to the baseline code and configuration pass through stringent quality gates. Such changes may have been made based on the impact analysis done as a part of the change management process. However, as one of the US Presidents remarked in the 1950s, ‘Plans are useless, but planning is everything’. It is quite possible that the reality after the change is implemented turns out to be very different from what was envisaged earlier. The question is – have we planned for this ‘possible different reality’? Enter the ‘rollback plan’.

The terminology is self-explanatory. Every change that needs to be implemented must have a rollback plan, i.e. the steps that need to be followed if the change cannot be implemented as planned, or if the system becomes unstable post-implementation. The change would need to be backed out through a sequence of activities, each with people responsible for carrying it out. The rigor is the same as when implementing the change – the only difference is the intention – you are taking the system back to the baseline. Because you are backing out from an implemented change, this is also called the ‘backout plan’.

Some rollbacks may need to happen as an emergency, e.g. implementation of a new firewall hardware results in cutting off the system from the supplier network, stalling the supply chain. Such a rollback may need to be handled via the ECAB.

Remember – for every implementation plan there needs to be a ‘backout plan’.
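As an illustration of the idea above, here is a minimal Python sketch in which every implementation step carries its own undo action, and a failure triggers the undo actions in reverse order. The `Step` class, the `implement_with_backout` function and the firewall step names are hypothetical and exist only for this example; a real backout plan also names the people responsible for each activity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One implementation step paired with the action that undoes it (hypothetical)."""
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


def implement_with_backout(steps: List[Step]) -> bool:
    """Apply steps in order; if one fails, undo the completed steps in reverse.

    Returns True if every step succeeded, False if the change was backed out.
    """
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
            done.append(step)
        except Exception as exc:
            print(f"Step '{step.name}' failed: {exc}. Backing out.")
            for completed in reversed(done):
                completed.undo()
            return False
    return True


# Example: the third step fails, so the first two are undone in reverse order.
log: List[str] = []


def verify_supplier_link() -> None:
    raise RuntimeError("supplier network unreachable")


steps = [
    Step("install firewall", lambda: log.append("install"), lambda: log.append("uninstall")),
    Step("load rules", lambda: log.append("load rules"), lambda: log.append("remove rules")),
    Step("verify supplier link", verify_supplier_link, lambda: None),
]
succeeded = implement_with_backout(steps)
```

Pairing each step with its undo action at planning time is one way to make sure the backout plan is written with the same rigor as the implementation plan.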

6. What is a CAB? What does it do?

The ‘CAB’ is an acronym for the ‘Change Advisory Board’. Many people think that it is ‘Change Approval Board’ – that is not right. It comprises many members, and the membership will typically be defined in the Change Management Policy. Since the CAB is an advisory board, its main purpose is to provide the right advice on matters related to changes – so members must be chosen wisely as per the priorities of the organization. The chosen members must also be knowledgeable in their respective areas, so that accurate inputs can be provided in the form of the right advice.

The Change Manager chairs the CAB meetings and is obviously a permanent member of the CAB. Also mandatory is the Configuration and Release Manager. Other members may include – Service Desk Manager, Operations Manager, Applications Manager, Information Security officer, Incident Manager. In real life, most of these roles will participate in the CAB meetings as if they were permanent members and may only excuse themselves if there is no agenda item pertaining to their role. However, such a situation is unlikely. Sometimes specific support analysts having deep knowledge on a particular topic, incident or functionality may participate upon invitation – with the sole purpose of providing the right advice.

The CAB is concerned only with changes, so most of the discussions may be around future changes – i.e. the Requests-for-Change (RFCs), what risks may be introduced as a result of these RFCs and also prioritizing these. Other topics may also include the past changes that are implemented, rolled back or cancelled. If any implemented change causes the system to be unstable and results in incidents, then these incidents also need to be discussed in the CAB.

7. Your colleague tells you that a change manager is the ‘change agent’ of the organisation. What would be your reaction? 

The Change Manager may be a ‘change agent’ of the organization, but that is not his primary job.

The Change Manager is responsible for authorizing and approving changes with advice from the Change Advisory Board (CAB). The Change Manager is the sole authority who can put his stamp of approval for a change to be implemented (or not to be implemented). Because of this responsibility, he will chair the CAB meetings.

Making the decision about a change to be implemented is not trivial; therefore, the person must be knowledgeable about the system, architecture landscape and the stakeholders. That is a broad spectrum. It also means that this person can actually be a change agent for the organization. Help is at hand, in the form of the Change Advisory Board – which advises the Change Manager in matters related to changes.

The Change Manager is also the process owner for the Change Management process. Since processes may need to change themselves, the Change Manager must continuously validate the efficiency and the effectiveness of the steps in the change process. E.g. for standard changes, he may want to create a Standard Operating Procedure so that this stands pre-approved and the cycle time for change implementation reduces as the CAB is no longer required.

While changes may lead to disruption, the objective of the Change Manager is to minimise the disruption – through the effective use of process, expertise and tools. He should be approving only changes that are beneficial to the organisation and aligned to the business goals and the objectives.

8. What is the first step of dealing with a change? Is this mandatory?

A change is usually received in the format of a ‘Request for Change’ (RFC). Whenever an RFC is requested by the business, the first step is to ‘record the change’. This is primarily the job of the ‘initiator’ – the person who is requesting the change. This recording results in a document, that could be on paper, as well as in electronic media or even in an online collaborative tool (e.g. JIRA).

Recording, or documenting the change is an important and mandatory step, as this document is then passed onto the Change Management organization. The Change Management organization is headed by the Change Manager, who may request further information on the RFC if the documentation is inadequate, incomplete or ambiguous. If this back-and-forth communication happens multiple times, it results in a loss of valuable lead time for change implementation – therefore, the documentation must be accurate. There will always be some back-and-forth communication, but the objective of properly recording the change is to minimise these transactions.

The Change Manager is accountable and responsible for reviewing the change record and ensuring that it contains enough information.

9. You are the Change Manager. You return from a 2-week vacation and realise that a few changes have been implemented 'informally'. Things are normal post-implementation. Should you care?

Unfortunately, there is no such thing as ‘informal change management’. Changes, whether small or large and whatever their impact, must pass through the change management process.

‘Standard’ changes follow time-tested and time-trusted steps and the outcome is known – however this does not mean that the steps are bypassed. It only means that efforts need not be spent for analysis and the approval, as these are pre-approved.

‘Emergency’ changes that require immediate implementation must still pass through the Emergency-CAB (ECAB) – a subset of the CAB. It does not mean that the immediate changes can be implemented ‘at will’ or ‘as per management decision’. These are common excuses.

So, as a Change Manager, once you hear about the above situation, you should be very alarmed and take action to ensure that the change management process is followed in retrospect, and the necessary documentation created for future reference. In the longer term, you should aim to strengthen the Change Management process in the organization. The situation also indicates there may be a lack of awareness regarding Change Management in the organization, so some training may also be useful to educate people regarding this topic and the perils of not following it.
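The point that no change type bypasses the process can be sketched as a simple routing table in Python. This is only an illustration: the `ChangeType` enum and the route labels are hypothetical, and ‘normal’ is used here as the label for changes that are neither standard nor emergency.

```python
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"
    NORMAL = "normal"
    EMERGENCY = "emergency"


def approval_route(change_type: ChangeType) -> str:
    """Return who authorizes a change of the given type (illustrative labels only)."""
    routes = {
        ChangeType.STANDARD: "pre-approved standard procedure",
        ChangeType.NORMAL: "CAB",
        ChangeType.EMERGENCY: "ECAB",
    }
    # There is deliberately no 'informal' entry: every change type has a route.
    return routes[change_type]


route_for_emergency = approval_route(ChangeType.EMERGENCY)
```

Because every member of `ChangeType` maps to a route, a change that is implemented without one is, by construction, outside the process and needs to be recorded retrospectively.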

10. As a Change Manager, how will you ensure that the RFC documentation has all the relevant information?

For every change raised, the Change Manager must do a thorough review together with the CAB. During this review he must verify that the RFC documentation provides the following information at a bare minimum:

  1. Who RAISED the change? – it must be clear who is raising the change. In an organization, this would be a department or a role. Note that this role may or may not be an ITIL role, it could well be the business, e.g. the Web Marketing department.
  2. What is the REASON for raising the change? – almost all changes are raised for a business reason. Obviously, the person who raises the change knows the reason best. Sometimes changes may be raised for fixing earlier failed changes.
  3. What is the RETURN required for the change? – after the change is implemented, it must be useful to the business, there must be some benefit coming out of the implementation. E.g. increased revenue, more hits on a webpage, faster checkout times at a retail supermarket or increased stability of a system.
  4. What are the RISKS involved in the change? – while implementing changes results in some benefits, there is always a chance that something else will break. Proper impact analysis for the change must be done – a rollback plan must be created, and enough regression testing must be factored in.
  5. What RESOURCES are required to deliver the change? – this is about who and what will be required to implement the change. This may include people with the right skills, hardware, software and even facilities. E.g. to deliver a large IT upgrade program, you may need software engineers, devices, servers, licenses and new office space.
  6. Who is RESPONSIBLE for the build, test and implementation? – these are the people who would be working to implement the change – typically many of the other ITIL roles will be involved together with the project organization for larger projects. Developers, support analysts, project leads, and managers are included.
  7. What is the RELATIONSHIP with other changes? – this is about the cross impact of changes that are work-in-progress. E.g. a change that is about migrating Active Directory services may have an impact on another change related to corporate email software upgrade.

The above are also called the 7 R’s of Change Management.
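The review against the 7 R’s listed above can be sketched as a completeness check in Python. This is a minimal sketch, assuming the RFC is available as a dictionary; the key names in `SEVEN_RS` and the `missing_rs` function are hypothetical and not part of any ITIL tool.

```python
from typing import Dict, List, Optional

# Hypothetical field names, one per 'R' in the list above.
SEVEN_RS = (
    "raised",
    "reason",
    "return",
    "risks",
    "resources",
    "responsible",
    "relationship",
)


def missing_rs(rfc: Dict[str, Optional[str]]) -> List[str]:
    """Return the R's that are absent or blank in an RFC record."""
    return [r for r in SEVEN_RS if not str(rfc.get(r) or "").strip()]


# Example RFC with a blank 'risks' entry and two R's not filled in at all.
example_rfc = {
    "raised": "Web Marketing department",
    "reason": "Checkout is too slow",
    "return": "Faster checkout times",
    "risks": "",
    "resources": "2 developers, 1 test server",
}
gaps = missing_rs(example_rfc)
```

A non-empty result is the Change Manager's cue to go back to the requestor before the RFC reaches the CAB.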

11. What is the role of the CAB in an emergency change? 

In the context of an emergency change, we have a special CAB – called the Emergency CAB (ECAB). The ECAB membership is a subset of the more general CAB, and depends on the nature and impact of the emergency change. Because we are in an emergency, time is of the essence, and therefore a pre-defined delegation mechanism may have to be followed if certain members are not present. This definition must be included in the Change Management policy of the organization.

To explain the need for this delegation with an example – suppose an emergency change needs a new IP to be whitelisted for changes to take effect, but the approver is unavailable, let us say that he is on a plane from Sydney to Dubai (that takes about 15 hours). A wrong approval may mean endangering the security of the infrastructure. To avoid such a situation, a delegate for the approving authority may have been defined, who can make this approval on behalf of the main person. 

Note that not all emergency changes require an ECAB. There may be emergency changes where a ‘temporary’ operating procedure may have been set till a known issue is fixed (which may be underway and will be addressed via CAB). In this case the relevant parties will execute the temporary operating procedure. E.g., the operations team kills a process daemon consuming too much memory. 

To keep the ECAB process lean, some documentation may be done retrospectively.
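The pre-defined delegation mechanism described above can be sketched in Python as a lookup that falls back from the primary approver to the delegate named in the policy. The function name, role name and people in this example are hypothetical.

```python
from typing import Dict, Optional, Set


def resolve_approver(
    role: str,
    primary: Dict[str, str],
    delegates: Dict[str, str],
    available: Set[str],
) -> Optional[str]:
    """Return the primary approver if available, else the pre-defined delegate.

    Returns None when neither can be reached, so the caller must escalate
    instead of letting someone outside the policy approve the change.
    """
    person = primary[role]
    if person in available:
        return person
    delegate = delegates.get(role)
    if delegate in available:
        return delegate
    return None


# Hypothetical policy: the security approver is on a flight, the delegate is reachable.
primary = {"security": "Asha"}
delegates = {"security": "Ben"}
approver = resolve_approver("security", primary, delegates, available={"Ben"})
```

Returning `None` rather than a default name keeps the sketch in line with the warning above: a wrong approval may endanger the security of the infrastructure.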

12. If there are too many emergency CRs coming, what may have possibly gone wrong?

Too many emergency changes may mean that in the recent past, some changes have been implemented without proper impact analysis. This, in turn, could mean there is a shortage of knowledge in the CAB. Either the right people have not been invited, or the Change Manager may not have the right skills.

Sometimes emergency CRs are a result of poorly defined processes – perhaps incidents are being pushed down as changes in the absence of a robust incident management process and without any defined problem management process. It is also possible that the configuration items (hardware, services etc.) are not assigned severities – this is the job of the configuration manager and should ideally be in the Configuration Management Database (CMDB). This lack of definition means that no one understands whether the impacted system is critical, and a change is introduced just to mitigate the unknown risk. E.g. malfunction of a service that produces a monthly report may not need to be handled via the emergency change process, simply because you have enough time to make a permanent fix. Just raise an incident and move on for now.

Emergency changes may also be the result of wrongly prioritizing less important changes and missing on implementing the critical ones.

Emergency changes set off alarm bells and draw much attention, just like a major or Severity 1 incident. Therefore, the Change Manager must educate the organization about the emergency change process and when to invoke it, and must resist its misuse.

13. How can you judge whether the Change Management process is robust and effective?

A good Change Management process will ensure that the percentage of successful changes is high. A robust process requires rigor, but that should not slow down the speed of implementing the change requests. The throughput of change requests being implemented should be high. E.g. too much discussion and delays in decision making at the CAB may result in a pile-up of RFCs waiting to be implemented – this should not happen if a good Change Management process is in place. Suitable delegates must be available for situations where time is of the essence, e.g. for ECAB.

In terms of changes implemented, there should be very few that require any backing out or even remediation. Both processes consume valuable resources and sometimes specialised skills that will be charged at a premium because of the tight situation. Additional incident management capacity will usually be planned and budgeted when critical changes are implemented or a big release happens – however, a good Change Management process will ensure that the incident volumes remain well under control. 

Most importantly, changes should deliver value, either by implementing improvements or by preventing and removing the negative effects of some existing bug. This value can be measured in dollar value – increase of revenues, increasing market size, reducing downtime etc.
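The indicators discussed above (a high share of successful changes and few backouts) can be computed from a list of change records. The Python sketch below is illustrative only: the `ChangeRecord` fields and the metric names are assumptions, and the emergency share is included because the previous question treats a high volume of emergency changes as a warning sign.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChangeRecord:
    """Outcome of one implemented change (hypothetical fields)."""
    successful: bool
    backed_out: bool = False
    emergency: bool = False


def change_metrics(changes: List[ChangeRecord]) -> Dict[str, float]:
    """Return the success, backout and emergency rates as fractions of all changes."""
    total = len(changes)
    if total == 0:
        return {"success_rate": 0.0, "backout_rate": 0.0, "emergency_rate": 0.0}
    return {
        "success_rate": sum(c.successful for c in changes) / total,
        "backout_rate": sum(c.backed_out for c in changes) / total,
        "emergency_rate": sum(c.emergency for c in changes) / total,
    }


# Example: four changes, one of which was an emergency change that had to be backed out.
history = [
    ChangeRecord(successful=True),
    ChangeRecord(successful=True),
    ChangeRecord(successful=True),
    ChangeRecord(successful=False, backed_out=True, emergency=True),
]
metrics = change_metrics(history)
```

What counts as a healthy value for each rate depends on the organization; the sketch only shows how the figures are derived.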

14. What is a Remediation Plan?

We discussed the ‘backout’ or ‘rollback’ plan in an earlier question. That was about having step-by-step guidance on how to return the system to its baseline configuration.

Now, consider the situation where an implemented change has caused some records in the database to be corrupted with unescaped special characters. This has been reported as an incident after about ten thousand transactions have already happened. You can roll back the changes, but then some essential desired functionality will be lost, and the already corrupted records still need to be taken care of.

To manage such situations, you will need to have a detailed plan on the adverse effects of each of the impacts. You will need to possibly assign extra temporary incident management capacity, so that a larger volume of incidents may be handled, and the lights remain on. Some scripts may have to be written to rectify the corrupt records directly, and for this you may need someone skilled at scripting. All this and many other things that you would possibly need to ‘douse the fire’ is what is called the remediation plan.

Remediation plans have well-defined triggers that should be agreed at the CAB. Unlike backout or rollback plans, remediation plans are not about returning to the baseline, but about ‘containing’ the damages that occur due to change implementation. There may be situations where backouts are impossible – like the automatic installation of faulty security software across a hundred-thousand-strong software company. The best solution would be to create and push a patch to remediate the faulty installation.

Introduction

Service Transition is one of the process groups of the ITIL® Lifecycle that contains processes required to take a service from the design board into regular operations. Hence this process group contains processes such as change evaluation and management, application development, validation and testing, asset management and knowledge management.

The Transition Manager is not a role that is defined by ITIL but is nevertheless a very important role that is usually found in organizations that are committed to changing for the better.

People in the IT industry who want to focus on implementing change in their organizations may find the role of the transition manager to be exciting. Also, people who are already a part of the change organization (typically – a change manager), may find the transition manager role the next step they can step up to. Vendor or Supplier managers are closely linked with most of the transitions, so that is another parallel role from which people may consider switching over to the transition manager role. The term ‘transition’ is generic, but the role of transition manager is more specific to transitions of a service across the ITIL phases, across suppliers and between the business and the IT service providers.

Often, a transition manager will span across multiple roles such as change manager, supplier manager and now, DevOps. This role is more suited for experienced professionals, typically with ten plus years of experience.

1. What is a Transition Manager supposed to do?

A transition manager is a key role in the IT services organization.

His primary job is to ensure that the implemented and validated services are handed over to the service operations teams – which could be the Application Management team and / or the Technical team. While it sounds simplistic, this handover is critical to the success of the service provisioning. The criticality depends on the importance of the system to the business, and also on the organizations that have designed and developed the services and the one that will be responsible for operations. Since the dynamics of designing a service are different from developing it and different from operating it, the transition manager plays a vital role in ensuring that there are no gaps as the system transitions from a concept stage to something that can provide a service.

The transition manager achieves this by working with his stakeholders on building up the right levels of knowledge in the team – both technical know-how and system-specific knowledge. He would then put in place a development methodology and/or a delivery model for the services to be delivered. Such a framework ensures that a boundary is set for the party transitioned to. The boundaries may be quantified using service level agreements (SLAs) and/or key performance indicators (KPIs). While the transition manager may not calibrate this himself, his main contribution is to ensure these are in place before handover.

2. Who are the primary stakeholders for the Transition Manager?

In the ITIL lifecycle, the Transition phase sits between the Service Design and the Service Operations phases. This means that the transition manager will have stakeholders that are process owners and process managers of both these phases, in addition to the stakeholders that are a part of the transition phase itself. Fig 8.1 depicts the stakeholders for the Transition Manager across the other phases of the ITIL lifecycle; shorter distances represent more involvement and longer distances represent less involvement. Across all the interactions, the transition manager endeavours to ensure that the transition process is smooth.

Fig 8.1: Primary stakeholders for the Transition Manager

3. What are the possible transitions in the industry?

In the IT services industry, various kinds of transition happen for various reasons. The most common is the ‘lift-and-shift’ model, where the IT services experience a change of provider, e.g. an organization X may no longer want to use the data storage services being provided by vendor A and decide to switch over to vendor B.

There can also be a situation where the organization develops capability over a period of time, and in-sources the services. E.g. organization X develops capability in a new technology T over a year and no longer requires the services of vendor C in the next financial year. The reverse may also happen. E.g. the organization X uses legacy technology as on date and wants to move to a new-age technology, for which it outsources the upgrade activity, followed by warranty support of a few months.

The last one also leads us to another possibility – that of a ‘fix & mix’ model. Older systems that are built on evolving technology tend to accumulate technical debt. E.g. organization X has a system S that needs to be rid of technical debt. They may float an open tender to which a service provider may respond with a proposal for ‘supporting and fixing the technical debt’.

4. Why are transitions necessary in the first place?

There are many motivations for effecting a transition.

One of the most common reasons is to reduce the total cost of ownership for IT services. After an IT system is commissioned into service (i.e. system ‘goes live’), the service consumer would expect the system to stabilise over a period and therefore the costs for managing the services the system provides should also reduce. However, this may not be the reality in many cases due to a poor design, technical debt etc. and therefore the business may look at on-boarding a different service provider. Sometimes, this could mean only a change of personnel, or also include infrastructure as well. E.g. Organization X decides to change its provider of data backup services from provider A to service provider B – this will involve a transition of the infrastructure as well as the support personnel. Or, Organization X decides to change sourcing of server administration services from vendor M to vendor N for its on-premise server farm – this is an example of change in personnel only. IT service providers compete fiercely for reducing costs in order to get business.

Often the business will also look at factors like providing customer experience – e.g. sourcing IT Service Desk services from vendors with good credentials in customer satisfaction. In a multi-vendor outsourcing setup, the business may want to consolidate all its IT services from a single provider, to enjoy economies of scale and reduce complexities of managing inter-company service level agreements. The reverse is also true, where the business may want to reduce the risk of depending on a single provider and outsource it to multiple vendors.

Some IT services may require niche skills, that may not be present in-house. This may require the transition of services to a provider who has the skills and expertise. In other cases, the business criticality of the system may be the reason for a transition.

5. What are the usual returns on investment (ROI) for a transition?

A transition is often treated as a project. It has a definite objective and must be completed in a finite amount of time. Like any other project, a transition project also represents an investment by the business, and therefore, must provide a return on this investment (ROI). The ROI will most often be measured in monetary terms – e.g. a transition that costs $100K will pay for itself in 5 months if it results in savings of $20K per month post-transition, provided the service levels do not degrade.
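The arithmetic in the example above is simply the transition cost divided by the monthly savings, a figure commonly called the payback period. A minimal Python sketch follows; the function name is hypothetical, and it assumes savings stay constant from month to month and service levels do not degrade.

```python
def payback_months(transition_cost: float, monthly_savings: float) -> float:
    """Return the number of months needed for savings to cover the transition cost."""
    if monthly_savings <= 0:
        raise ValueError("monthly savings must be positive for the investment to pay back")
    return transition_cost / monthly_savings


# The figures from the example above: a $100K transition saving $20K per month.
months = payback_months(100_000, 20_000)
```

With the figures from the text, `months` evaluates to 5.0.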

While that was straightforward, there are times when the primary objective is not saving costs. It may be to de-risk system stability – if a new system turns out to be highly unstable, the business may want to bring in a different vendor for supporting it as they want the system to stabilise sooner. Here, the increased stability (and consequently down-time reduction) would be the return that the business is looking for.

Critical systems that deal with business sensitive information, e.g. Business Intelligence systems, financial reporting etc., are less prone to be outsourced to external IT service providers. As a system matures, IT service provisioning may tend to be brought in-house – the return in this case being confidentiality.

Global businesses will also need the capability to scale on demand – this may require choosing of IT service providers that can serve the entire span – the most common example of this being the adoption of the cloud platforms such as AWS and Azure.

6. A new service associated with huge revenue and impacting thousands of users is being rolled out into operations, however, this is within the same organization. Do we need a transition manager or even have a transition process?

Yes.

Unfortunately, most of our organisations tend to operate in silos. This means that there is no common language and practices across departments within the same organization. Organizations that aspire to operate on ITIL best practices, realize that the leap from developing an IT service to when the service is being consumed by the end users is a big one. No matter how robust the design is and how skilled the developers were, there is always the potential of the services not producing the desired outcome and customer experience.

Because the development and support organisations, represented in the ITIL life-cycle by the service design and the service operations phases, do not necessarily operate as a single unit, there has to be a middle layer, an interface, that understands the process aspects of both sides and acts as the glue that provides the necessary rationalization, so that when the ownership of the service changes hands, nothing falls through the cracks.

The reputation of a service provider organisation rests on the ‘quality of service’ that the system provides, and therefore, utmost care must be taken to ensure that the service design and operations organisations work in a closed loop (much like the DevOps closed loop representation) so that no loose ends reach the service consumer.

7. What are the factors for consideration during an IT outsourcing transition?

There are many possible reasons behind a business taking a decision to outsource its IT services provisioning to an external vendor over doing it in-house. One of the primary reasons is to continue focusing on the core competence of the business. E.g. a traditional advertising agency that aspires to enter the digital advertising domain may still consider creative designing as its core competence. Therefore, it may outsource the technology aspects of digital such as Google Ad-words, site analytics etc. to an IT service provider that specialises in this.

Another reason could be to utilise the specialist services of an external IT service provider in a new domain, e.g. if a company has recently implemented an ERP solution, it may not have sufficient skills to service the ERP system internally. These skills can be grown over a period, but until then the business needs someone better skilled to do it for them.

Sometimes, due to the cost of labour in the country where the business operates, it may want to outsource the IT services provisioning to a vendor who is in another region where labour costs are lower. Businesses will also do this to derive time zone benefits, e.g. a longer window of support cover for an e-commerce platform that must be available 24x7. India has been at the forefront of the IT outsourcing industry for almost 2 decades now.

Finally, the total cost of ownership is another factor that drives outsourcing. E.g. countless companies have outsourced their infrastructure and platforms to cloud-based service providers such as Amazon, Google and Microsoft. This has reduced the IT capital investment and provides the flexibility to scale on demand.

8. How important is an assets database for a transition?

One of the key factors in achieving a great transition is a comprehensive transfer of knowledge. Knowledge is intangible and therefore, measuring the ‘levels of knowledge’ is always a difficult task. However, assets are tangible – code repositories, configuration files, technical and functional documentation, service level reports, monitoring dashboards are tangible items that are key to providing IT services. Infrastructure, code repositories, documentation and reports are all assets for the service – and a list of these is maintained in the assets database.

Every transition must include the transfer of the assets at a detailed level – not just the existence of the asset, but also the interfaces (as a part of the Configuration Management Database – CMDB), the associated documentation, known errors and workarounds implemented, changes in progress (which means that there are assets being created or modified) and the service levels associated with the assets or groups of assets. This means that the assets database is critical to the completion of an effective transition.
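The level of detail described above can be pictured as one record per asset. The Python sketch below is only an illustration: the `AssetRecord` fields mirror the items named in the paragraph, and the choice of which fields must be non-empty before handover (documentation and service level) is an assumption made for this example, since known errors and changes in progress may legitimately be empty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssetRecord:
    """One entry in the assets database (hypothetical structure)."""
    name: str
    interfaces: List[str] = field(default_factory=list)
    documentation: List[str] = field(default_factory=list)
    known_errors: List[str] = field(default_factory=list)
    changes_in_progress: List[str] = field(default_factory=list)
    service_level: str = ""


def handover_gaps(asset: AssetRecord) -> List[str]:
    """Return the fields assumed mandatory here that are still empty."""
    gaps = []
    if not asset.documentation:
        gaps.append("documentation")
    if not asset.service_level:
        gaps.append("service_level")
    return gaps


# Example: a repository handed over with documentation but no agreed service level.
repo = AssetRecord(name="billing code repository", documentation=["build guide"])
repo_gaps = handover_gaps(repo)
```

Running such a check across the whole assets database gives the transition manager a tangible measure of how complete the transfer is.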

9. What are the key activities during a transition?

Once the scope of a transition has been decided, there would typically be a period when the receiving organisation comes to understand the services at a high level. During this time, the receiver of the transition will understand the business objectives of the services and the expected business and IT service levels, assess the system stability, and estimate the manpower required to service the system, the functional and technical skills required, the interfaces to other systems and the stakeholders involved. This initial phase is called the due diligence phase, and the information gathered will be used in the next phase – planning.

In the planning phase, the knowledge transfer activities will be planned, and handover dates finalised. Topics for the knowledge transfer, the subject matter experts to deliver it, the timings – all will be decided during the planning phase.

The next two phases, often carried out in parallel, are the knowledge transfer and the shadow phases. During these phases, knowledge will be transferred systematically to the receiving organization, and the latter will progressively demonstrate that it can provide the services independently. Together, these two phases account for most of the effort spent in making the transition.

The final phase is that of handover – where the incoming organization formally accepts the accountability for providing the IT services. Identified stakeholders will be informed and new communication channels will be ironed out.
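The sequence of phases described above can be written down as an ordered list of stages, with knowledge transfer and shadow grouped into one stage because they often run in parallel. This is a minimal Python sketch; the `TRANSITION_STAGES` structure and the `next_stage` helper are hypothetical.

```python
from typing import List, Set, Tuple

# Each stage is a tuple of phases that may run in parallel.
TRANSITION_STAGES: List[Tuple[str, ...]] = [
    ("due diligence",),
    ("planning",),
    ("knowledge transfer", "shadow"),
    ("handover",),
]


def next_stage(completed: Set[str]) -> Tuple[str, ...]:
    """Return the first stage with unfinished phases, or an empty tuple when all are done."""
    for stage in TRANSITION_STAGES:
        if not set(stage) <= completed:
            return stage
    return ()


# Example: knowledge transfer is finished but the shadow phase is not,
# so the transition is still in the combined third stage.
current = next_stage({"due diligence", "planning", "knowledge transfer"})
```

Handover only becomes the next stage once both parallel phases are complete, which matches the idea that the receiving organization must demonstrate autonomy before accepting accountability.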

10. Can you start a transition without the availability of the design documentation?

One of the most common risk issues highlighted during a transition is the absence of documentation, especially in an Agile setup where one of the values lays more importance on working software over comprehensive documentation. Under documentation, the most outdated is the design document, as subsequent iterations during service development have outdated what was written at the beginning, which means that the working software, or the ‘as-built’ system varies from what it was supposed to be, technically.

While this may sound alarming, we need to accept that evolution is a key to survival in today’s world, so this variation is reasonable, and in the best interests of the business. Therefore, we may need to start a transition even though the design documentation is outdated or absent. Of course, there are ways to work around this apparent difficulty – by referring to the conversations in the user stories, having more dialogue with the development team, the testing and validation teams that have worked on building the service – and documenting them for future reference, or updating anything that exists. Establishing traceability between the business objectives (that have not changed) and the working software is essential. This is like reengineering the design document from the system that has been built, i.e. going from left-to-right in Fig.

Transition without the availability of the design documentation

Introduction

Nowadays, I often hear that the Release Manager role is non-existent due to the advent and rapid adoption of DevOps culture. This is a major misconception – the role of the Release Manager has shifted from the traditional; like many other things in the rapidly changing technology scenario.

The Release Manager role is important in the sense that it enables all the hard work put in by the development teams, analysts and architects to see the light of day, i.e. Release Managers help make the project go live. Without an effective Release Manager in place, business objectives will never be met, no matter how skilled your development team is.

In the Q&A section below, the ‘traditional’ release management concepts have been blended with the DevOps philosophies, ensuring that the reader gets ample exposure to understanding the role better.

1. What is release and deployment management all about?

The purpose of release and deployment management is to define and agree to release and deployment plans with the relevant stakeholders and customers. During the release and deployment activities, release managers need to ensure that the changes, being released and deployed, are properly managed. They also record and manage any risks and issues that are related to the new service or the changed service which has been implemented.

Before the release happens, the release and deployment management personnel must ensure that the assets that are being released have been recorded accurately in the configuration management system and they are compatible with each other as a single unit, so that they can produce the desired results when they are out in production. If there are any adverse effects of releasing a package, a backout plan must be devised and executed. The backout plan is also called the rollback plan and must be prepared before the release happens.

Release and deployment management is also responsible for ensuring that customers and users can use the newly deployed or modified system as expected. The operations and support staff that will maintain the new system in the future must also be able to do so. Both of these require knowledge transfer from the implementation team.

After every release happens, there is usually a period of warranty. The release and deployment team are available during this period to monitor and take appropriate action, if any adverse impact of the new or modified components is noticed. During this period, the release and deployment team will take help from the change management and development teams to be able to make tweaks without impacting the essential functionality.

2. What is the unit of release?

Release in IT refers to a portion of a service or an IT infrastructure that is released for consumption by the users and customers. Release unit will contain multiple components and configurations that have been modified, as a part of Change management, and will provide some additional or different functionality of the system. Release unit may contain one or many packages depending on the extent of the impact. If the change being implemented spans across multiple technologies, there may be different technology-wise packages that implement this change.

Large changes are usually implemented in smaller chunks over time. Each of these changes may also be called a unit of release. The timing and functionality delivered through each unit is determined as a part of release planning. Release unit levels must be appropriate and will depend on what is being released. There is usually a release policy in every organisation. E.g., for a website, the release level may be tied to the UI or web pages. Another way to design release levels would be to segregate by use case, e.g. the ‘search a product’ functionality.

While determining release units, one of the most important factors to consider is the amount of change required to deploy it. The amount of resources and time needed to build, test, distribute and implement the unit must also be considered. The complexity of the interface between the new unit and the rest of the unchanged system is also a factor that could determine what is defined as a unit. There could also be capacity considerations and business considerations, such as business peak seasons, that determine what could possibly be a release unit.

3. What is the release and deployment model?

In the service design phase, the most suitable release and deployment models will be selected. This will include the approach, mechanism, processes and resources required to build and deploy the release in a timely manner and within budget. It must be kept in mind that after the initial release, all the successive releases will be on a system that is already live. Adequate care has to be taken not to disrupt the services that are already being consumed by the users and customers. This requires very detailed and careful planning, and this is where, having a release and deployment model helps to standardise the way changes are implemented on a live system without causing any disruption.

Release and deployment model will define the release structure for building a release package and the target environments. It will also define the exit and entry criteria including deliverables that must be delivered and those that can be delivered at a later release. Since every release will add or modify the functionality, documentation must also be released together with the configuration items. This is especially important if services are being consumed by users who are not tech savvy, like the booking clerk at the railway reservation counter.

Releases pass through the service transition phase via multiple environments such as development, testing, integration, staging and then finally into production. The deployment model should contain details on not only the physical and logical environments, but also show the activities that can be performed at each level or environment. Each environment should add value to the product being released.
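The promotion path described above can be sketched as an ordered list of environments with a gate at each step. This is only an illustrative sketch: the environment names come from the text, while `ENVIRONMENTS`, `next_environment` and `promote` are hypothetical names, and the single `exit_criteria_met` flag is an assumption standing in for whatever exit and entry criteria the organisation's own model defines.

```python
from typing import Optional

# Environments in the order named in the text.
ENVIRONMENTS = ["development", "testing", "integration", "staging", "production"]


def next_environment(current: str) -> Optional[str]:
    """Return the environment that follows `current`, or None after production."""
    index = ENVIRONMENTS.index(current)  # raises ValueError for unknown names
    if index + 1 < len(ENVIRONMENTS):
        return ENVIRONMENTS[index + 1]
    return None


def promote(current: str, exit_criteria_met: bool) -> str:
    """Move a release forward one environment only if its exit criteria are met."""
    target = next_environment(current)
    if target is None or not exit_criteria_met:
        return current  # the release stays where it is
    return target
```

For example, `promote("testing", True)` returns `"integration"`, while `promote("testing", False)` leaves the release in `"testing"`.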

4. What are the factors that you will consider when designing release and deployment model?

Release and deployment model must consider including activities that verify the compliance of a release with the related standards, enterprise architecture and any other user-centric factors like usability. There should also be activities to ensure the integrity of hardware and software. Sufficient regression testing must be conducted, and results of this test must be verified. Whenever possible, the delivery distribution, installation build and configuration steps should be automated. Many activities for the release will be same across many releases and therefore costly manual labour should be avoided whenever possible. This may require some investment.

Managing software licences should also be included when designing the release and deployment model. Licences that have expired or need to be renewed must also follow this model. Rollback strategies for the creation of a backup plan irrespective of the size of the release unit must be included as a mandatory task in the model. Documentation is another important aspect that must be included in the model. Documentation related to the release and deployment steps, target environment, troubleshooting guides, release notes must accompany every release that is being made irrespective of the size of the release unit.

The existence of a release and deployment model is important for such businesses that frequently go through change and make releases.

5. What does a build manager do?

A build manager performs the release packaging and making of the build, establishes the final release configuration which includes the software, hardware, infrastructure and all the associated information related to this and the knowledge that must be passed on to the operations team. He/she will then build the final release delivery and conduct smoke test before declaring the release to be usable in the environment. Build activities are carried out by the build manager for every environment i.e. test, integration, staging and production.

No release is likely to be perfect. Therefore, the build manager must also document the known errors that are being introduced together with the release. E.g., when deploying a build into the staging environment, he/she may declare that there are five low priority bugs that are being fixed by the development team and that these will be rectified in the next build. If, due to time pressure, a build must be released even with blockers, the bugs will be documented in the release notes, and the build manager is also responsible for providing a workaround for living with the bugs until they get fixed.

When the build is finally released into production and handed over to the operations team, the build manager provides inputs to the formal sign-off and handover process. Throughout the process, the build manager must interface with the other process owners such as information security, testing, change management, service asset and configuration management, capacity management, availability management, incident management and quality management.

6. In the era of DevOps do you really need to have Release Managers?

DevOps is the streamlining of the activities surrounding IT solution development (dev) and IT operations (ops).

Release management entails planning, scheduling, and controlling a software build through different stages and environments.

In many cases, release managers are the gatekeepers of the change management process, ensuring that production deployments are well orchestrated, that they follow all the necessary steps for proper visibility, and that approvals are obtained throughout the process.

As companies look to adopt a DevOps philosophy, the role of release management must shift as well. In DevOps, teams take control over production deployments. The team is not just focused on creating working code, but also on the infrastructure, network, and other implementation items necessary to get the code into production. By having this accountability, teams will create code with higher quality, making production systems more reliable and maintainable.

Having change orders or some record of change is still needed in DevOps. These are not only needed in regulated environments to show traceability from code to deployment, but these also shed light on release volumes and time-to-market metrics, which are core to measuring DevOps maturity. However, there will be some change in the information captured in change orders. There will no longer be a need to track implementation or back-out plans as part of change orders. One just needs to track the application, its components, and its promotion schedule.

The key to maintaining these change orders is automation. The continuous integration pipeline should be able to communicate with the change order system so that self-documentation can occur.
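As a rough illustration of such self-documentation, the sketch below builds a change order holding only the three items mentioned above (the application, its components and its promotion schedule) from pipeline metadata. Everything here is an assumption made for illustration: `ChangeOrder`, `change_order_from_pipeline` and the shape of the `run` dictionary are hypothetical and do not correspond to any particular CI or change management product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class ChangeOrder:
    """Minimal change record: application, components, promotion schedule."""
    application: str
    components: List[str]
    promotion_schedule: Dict[str, date] = field(default_factory=dict)


def change_order_from_pipeline(run: dict) -> ChangeOrder:
    """Build a change order from (hypothetical) pipeline run metadata."""
    return ChangeOrder(
        application=run["application"],
        components=sorted(run["components"]),
        promotion_schedule={
            env: date.fromisoformat(day) for env, day in run["promotions"].items()
        },
    )


# Example metadata a pipeline might emit at the end of a build (made-up values).
run = {
    "application": "web-shop",
    "components": ["ui", "api"],
    "promotions": {"staging": "2024-03-01", "production": "2024-03-08"},
}
order = change_order_from_pipeline(run)
```

Because the record is generated by the pipeline itself, the traceability from code to deployment does not depend on anyone filling in a form by hand.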

One of the biggest opportunities with release management being enabled via a tool is that it can integrate audit and security requirements into the process. Rather than doing post-release audits, one can use a suitable tool to integrate these controls as part of the pipeline.

As we can see from the above, release management is still critical in a DevOps environment. The function must drive the shift from a service-based organization to an engineering group that enables frictionless flow to production.

7. As a Release Manager, what DevOps terminology are you familiar with?

  • DevOps refers to a dynamic relationship between software development and other IT departments like operations, security and testing. The goal of DevOps is to change and improve this relationship by emphasizing better communication and collaboration between them. It aims at establishing a culture and environment where software releases can happen frequently and more reliably. DevOps allows teams to react to business opportunities in a timely manner. For example, Amazon Web Services deploys software updates every 11.7 seconds.
  • Continuous Integration (CI) is a development practice wherein developers integrate their code into a shared repository regularly and frequently. An automated build – typically via a Continuous Integration system – verifies the code and allows errors to be detected early.
  • Continuous testing (CT) is a process that uses automated testing in order to gain immediate feedback on the business risks associated with a software release candidate. Automated testing is an integral part of CT. Automated testing refers to the detection process for software issues and defect prevention, whereas Continuous Testing addresses the wider challenge of improving the effectiveness of these detection sensors.
  • Continuous monitoring is the process and technology used to detect compliance and risk / issues associated with an organization’s operational environment. The operational environment consists of people, processes, and systems working together to support efficient and effective operations.
  • Continuous delivery describes the ability to get new features, bug fixes and configuration changes into the hands of users quickly and in a reliable, replicable way. Continuous delivery is focused on automating delivery by using tools to execute various processes.

8. Have you heard of Scrum Release planning? What is it?

A very high-level plan for multiple Sprints (e.g. three to twelve iterations) is created during release planning. It acts as a guideline that reflects expectations about which features will be implemented and when they will be completed. It also serves as a base to monitor progress within the project. Releases can be intermediate deliveries done during the project or can be the final delivery at the end.

To create a release plan, the following things must be available:

  • A prioritized and estimated Scrum Product Backlog
  • Estimated velocity of the Scrum Team
  • Conditions of satisfaction (goals for the schedule, scope, resources)

Depending on the type of project (feature-driven or schedule-driven) the release plan can be created in different ways:

  • If the project is feature-driven, the sum of all features within a release can be divided by the expected velocity. This will then result in the number of sprints needed to complete the requested functionality.
  • If the project is schedule-driven, one can simply multiply the velocity by the number of sprints, and this will give the total work that can be completed within the given timeline.
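The two calculations above can be written out as a short sketch. The function names `sprints_needed` and `work_capacity` are hypothetical, the estimates are assumed to be in the same unit as the velocity (for example story points), and rounding up to whole sprints is an assumption added here because a partial sprint still has to be scheduled.

```python
import math
from typing import List


def sprints_needed(feature_estimates: List[int], velocity: int) -> int:
    """Feature-driven: total estimate divided by velocity, rounded up to whole sprints."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(sum(feature_estimates) / velocity)


def work_capacity(velocity: int, sprints: int) -> int:
    """Schedule-driven: velocity multiplied by the number of sprints."""
    return velocity * sprints
```

With estimates totalling 65 points and a velocity of 20, `sprints_needed([8, 13, 5, 21, 13, 5], 20)` gives 4 sprints; with six sprints available at the same velocity, `work_capacity(20, 6)` gives 120 points of work.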

9. What are the sub-processes of ITIL® release management?

  • Release Management Support – provides guidelines and support for the deployment of releases.
  • Release Planning – assigns authorized changes to releases and defines the scope and content of releases. A schedule for building, testing and deploying the release will be developed in the release planning process.
  • Release Build – issues all necessary work orders and purchase requests so that release components are either bought from outside vendors or developed/customized in-house. At the end of this process, all required release components are ready to enter the testing phase.
  • Release Deployment – deploys the release components into the live environment. This process is also responsible for training end-users and operating staff and circulating information/documentation on the newly deployed release or the services it supports (e.g. release notes).
  • Early Life Support – resolves operational issues quickly during an initial period after release deployment and removes any remaining errors or deficiencies. This is also known as the hyper-care or the warranty phase.
  • Release Closure – formally closes a release after verifying whether activity logs and configuration management system contents are up to date.

In software engineering, a freeze is a period in the development process during which the rules for making changes to the source code or related resources become stricter, or changes are prohibited altogether. A freeze helps move the project forward, towards a release or the end of an iteration, by reducing the scale or frequency of changes, and may be used to help meet a product roadmap, e.g. attaining a Minimum Viable Product (MVP).

The exact rules depend on the type of freeze and the development process in use; e.g., rules may include only allowing changes which fix bugs, or allowing changes only after thorough review by other members of the development team.

Common types of freezes are:

Specification freeze, in which the parties involved decide not to add any new requirement, specification, or feature to the feature list of a software project, in order to begin coding work.

Feature freeze, in which all work on adding new features is suspended, shifting the effort towards fixing bugs and improving the user experience. A feature freeze helps improve the program's stability and frees up manpower to work on the MVP. E.g., a user interface feature freeze means no more features may be added to the user interface portion of the code; bugs can still be fixed.

Code freeze, in which no changes are permitted to a portion or the entirety of the program source code. Particularly in large software systems, any change to the source code may have unintended consequences, like introducing new bugs. Code freezes are often employed in the final stages of development, when a particular release or iteration is being tested, but may also be used to prevent changes to one portion of a program while another is undergoing development. A code freeze minimizes regression effects.
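The three freeze types can be summarised as a small rule table. This is a sketch under stated assumptions: the names `Freeze`, `Change`, `BLOCKED` and `is_allowed` are hypothetical, the freezes are treated as cumulative (each later freeze also blocks what the earlier ones block), and a code freeze is modelled as covering the whole program rather than one portion of it.

```python
from enum import Enum


class Freeze(Enum):
    SPECIFICATION = "specification"
    FEATURE = "feature"
    CODE = "code"


class Change(Enum):
    NEW_REQUIREMENT = "new requirement"  # adding an item to the feature list
    FEATURE_WORK = "feature work"        # implementing a feature
    BUG_FIX = "bug fix"


# Change types blocked under each freeze (cumulative, by assumption).
BLOCKED = {
    Freeze.SPECIFICATION: {Change.NEW_REQUIREMENT},
    Freeze.FEATURE: {Change.NEW_REQUIREMENT, Change.FEATURE_WORK},
    Freeze.CODE: {Change.NEW_REQUIREMENT, Change.FEATURE_WORK, Change.BUG_FIX},
}


def is_allowed(freeze: Freeze, change: Change) -> bool:
    """Return True if `change` may still be made while `freeze` is in force."""
    return change not in BLOCKED[freeze]
```

For instance, `is_allowed(Freeze.FEATURE, Change.BUG_FIX)` is true, whereas `is_allowed(Freeze.CODE, Change.BUG_FIX)` is false. In practice the exact rules depend on the development process in use, as noted above.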

Introduction

The Incident Manager role is one of the most critical roles in the ITIL® world. When break-fix service is provided by an IT service provider, the Incident Management process becomes central to the service provisioning. Incidents are governed by Service Level Agreements (SLAs), and failure to meet the SLAs is likely to result in a financial penalty being imposed on the service provider. Proper management of incidents is key to successfully delivering the break-fix services. Since the Incident Manager owns the Incident Management process, the day-to-day responsibility of ensuring smooth service becomes key to fulfilment of the contract.

The Incident Manager role can be a good stepping-stone for support analysts to get into higher IT Service Management roles. An Incident Manager gets good exposure to the relationship between the day-to-day operations and service level management. Incident managers need to be very thorough with the Incident Management process being followed due to the continuous time pressure arising out of the SLAs in force.

1. What is an incident? How does it relate to any other event?

Events happen all the time within an IT system. Events are defined as a detectable change of state which has significance for the IT management or for the delivery of the IT service. Events are generally notifications which are created by the IT services monitoring tools or the system itself (i.e. the Configuration Item (CI)). E.g. when disk space reaches a certain percentage (the event), an alert may be generated, but an incident may not be generated as there is still some capacity remaining. However, when disk space becomes 100% full (another event) and no more data can be written, an incident will be generated because the normal functioning of writing to the disk can no longer happen. Incidents can also be reported by users to the Service Desk or through a self-help tool. Sophisticated incident management systems also allow users to log incidents by sending an email to a predefined email-id. Incidents may also be logged by the technical staff, the Service Desk, the first line and the second lines of support.
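The disk space example can be sketched as a small classification rule. The function name `classify_disk_event` and the default alert threshold of 80% are assumptions chosen for illustration; the text only says "a certain percentage", and a real monitoring tool would have its own configuration.

```python
def classify_disk_event(used_percent: float, alert_threshold: float = 80.0) -> str:
    """Classify a disk-usage reading as 'none', 'alert' or 'incident'."""
    if not 0.0 <= used_percent <= 100.0:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent >= 100.0:
        return "incident"  # disk full: writes fail, normal service is interrupted
    if used_percent >= alert_threshold:
        return "alert"     # significant event, but some capacity remains
    return "none"
```

A reading of 85% produces an alert only, while a reading of 100% produces an incident because data can no longer be written.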

As you can see from the above scenario, incidents can be defined as unplanned interruptions to an IT service or a reduction in the quality of service. However, the failure of a CI that has not yet disrupted the service, e.g. the failure of one of the components of a continuous availability (CA) cluster, is also treated as an incident: it still needs to be fixed by logging an incident (usually generated automatically by the event).

The impact of an incident may be reduced by implementing a workaround e.g. restarting of a failed CI. All incidents are subject to service level agreements (SLA). Response and resolution SLAs involve timescales that are predefined per priority of the incident and failure to fulfil them results in escalation, customer dissatisfaction and consequently, financial penalty.
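A resolution SLA check of this kind can be sketched as follows. The targets in `RESOLUTION_TARGETS` are made-up values for illustration (real timescales come from the agreed SLA), the function names are hypothetical, and the sketch deliberately ignores business hours and any time the incident spends on hold.

```python
from datetime import datetime, timedelta

# Hypothetical resolution targets per priority; real values come from the SLA.
RESOLUTION_TARGETS = {
    1: timedelta(hours=4),
    2: timedelta(hours=8),
    3: timedelta(hours=24),
}


def resolution_deadline(logged_at: datetime, priority: int) -> datetime:
    """Latest time by which the incident must be resolved."""
    return logged_at + RESOLUTION_TARGETS[priority]


def is_breached(logged_at: datetime, priority: int, now: datetime) -> bool:
    """True once the resolution deadline for this priority has passed."""
    return now > resolution_deadline(logged_at, priority)
```

Under these assumed targets, a Priority 1 incident logged at 09:00 must be resolved by 13:00 on the same day; at 14:00 it has breached, whereas a Priority 3 incident logged at the same time has not.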

Major incidents are a special kind of incident that have shorter time scales and greater urgency to be resolved.

2. You are the IT Delivery Manager for a project where the systems are not very stable. Due to cost pressures, you are contemplating having the same person to look after Incident and Problem Management. Will this work?

To answer this question, let us understand the role of the Incident Manager first. An incident is an event that causes disruption to the normal functioning of the system. The responsibility of the Incident Manager is to fix this incident or provide a workaround that makes the system work as close to the desired state as possible. The Incident Manager must fulfill the service level agreements (SLA) that have been agreed to. It does not really matter whether a deep investigation as to why the incident happened has been carried out or not. Once the resolution or the workaround is put in place, the system is expected to start functioning as normal or near normal.

To find out what caused the incident, a deep dive into the symptoms and context in which the incident occurred is necessary. The Problem Manager's role is that of an investigator, and he must involve other groups such as application management, operations management, the product vendor etc. The problem manager must also analyze the data that can be requested from the Service Desk. All this will typically take much longer, and SLA timescales may not really have any room for it.

A good Incident Manager is expected to be SLA focused and wouldn't care about finding the root cause of the incident; a good Problem Manager, on the other hand, will try to get to the root cause of the incident no matter how much time and effort it takes. An Incident Manager’s responsibility is to make the system work again quickly after the incident occurrence; a Problem Manager’s responsibility is to prevent this incident from happening again in the future.

Considering the above discussion, it is evident that the Incident Manager and Problem Manager roles pull in nearly opposite directions, and therefore combining them is not a great idea. At the same time, note that they have one common goal – to provide a smooth service.

3. What are the responsibilities of an Incident Manager?

The Incident Manager is the owner of the Incident Management process. As process owner, he is responsible for monitoring the effectiveness of, and compliance with, the Incident Management process. He is also responsible for seeking suggestions for improvement of the process in the long run.

On a day-to-day basis, the Incident Manager manages the work of the first and second line of support staff. He may assign incidents based on individual workload and monitors the number of incidents that are flowing through the process. E.g. at any given time, he should be aware of how many incidents are being worked on by the team, how many incidents are queued up with his staff or waiting for user inputs etc. He also needs to do a quality check on the incidents – e.g. whether the incident has been put on hold for the right reasons, whether the solution has been adequately documented etc.

The Incident Manager is also responsible for managing the Incident Management tool and should possibly have in-depth operating knowledge of the tool. Since the Incident Management process will be bound by SLA, he needs to ensure that the support staff have been well educated to use the tool effectively. Improper use of the tool may reflect poorly on the service levels.

When major incidents occur, the Incident manager acts as a pivotal point around whom all the other groups – support staff, Service Desk, application management, technical management, leadership team and the impacted stakeholders revolve. He is accountable for ensuring the major incident process is followed, key people are kept informed, and a post-facto major incident review process happens.

The incident manager’s role is extremely dynamic – most of the time he will be on his toes.

4. Have you heard about first, second and third lines of support?

These are different roles in the Incident Management process.

The first line is usually the Service Desk. They act as the first point of communication with the users. They can be contacted via raising tickets on a self-help ticketing tool, over the phone, online chat or via e-mail. Being the first line, they are responsible for logging all the incident and service request details accurately, categorizing them and assigning the tickets to the right queue or groups (second line). Often, they are equipped with information relevant for doing an initial investigation and diagnosis of the issue being reported, e.g. when a user reports a browser related issue, they may guide the user step-by-step to check the proxy settings. If settings are found to be not as expected they will guide the user to enter the desired details. Only if this does not resolve the issue, an incident will be logged and assigned to the second line of support.

The second line of support comprises staff who are more technical than the Service Desk but are still not the technical or application specialists. If the team is big enough, there may be a certain number of more technical or functional staff. They are not responsible for communicating with the end-users, so they can stay more focused on resolving the incident.

The third line are the specialists, and they have much deeper technical or functional knowledge than the second line and will possibly be organized into their own functions or departments such as server administrators, network support, active directory support etc. An incident may require involving multiple third line groups for resolution.

5. Describe the incident lifecycle.

Every incident goes through the following lifecycle:

  • Identification: First the occurrence of the incident must be identified – this can happen via the event monitoring framework or through the reporting by a user via web, phone call or an e-mail to technical support.
  • Logging: All incidents must be logged with all the necessary details, including the timestamp irrespective of the channel by which it arrived. It should be the single source of information about the issue being reported.
  • Categorization and Prioritization: Categorization refers to putting the incident in the right bucket so that it may be assigned to the correct resolver group. Categorization may happen in multiple layers e.g. hardware issues may be sub-categorized into 1 – server issues or 2 – network issues; network issues may be further sub-categorized into 1 – LAN issues or 2 – ISP issues; LAN issues may even further be subcategorized into 1 – WiFi issues or 2 – Cable issues. Prioritization is based on the impact and the urgency of the incident and helps in taking a decision as to which incident will receive preference to be resolved.
  • Initial diagnosis: This is usually done by the Service Desk, where the analyst will follow some specified steps to understand more about the incident while trying to resolve it on the first call. If the incident cannot be resolved at this stage, it will be assigned to the second line of support.
  • Investigation and diagnosis: The support groups that are dealing with the incident will investigate and diagnose what has gone wrong. All the steps being carried out should be documented within the incident in the tool for ready reference. This step may happen in parts across different resolver groups.
  • Resolution and recovery: When a suitable workaround or fix is identified, it will be applied and tested for full recovery. The recovery process may occur in several steps and will count towards the resolution SLA for the incident. The group performing these activities will mark the incident as ‘resolved’.
  • Incident escalation: An incident may be escalated when the quality of resolution is poor or it has breached the timelines for response and/or resolution. This is called vertical escalation and will typically be from the Service Desk → Service Desk manager → service level manager → IT delivery manager → business relationship manager. An incident may also be escalated across the incident management roles, i.e. Service Desk → Level 2 Support groups → Level 3 support groups → external vendor / product vendor / support provider. This is called functional escalation.
  • Incident Closure: Incident closure will be done by the Service Desk, irrespective of the channel by which the incident was identified. An incident can be closed only upon user confirmation, or after a pre-defined amount of elapsed time after resolution.
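The lifecycle above can be sketched as a simple table of allowed status transitions plus the closure rule from the last step. The status names, `TRANSITIONS`, `can_move` and `can_close` are hypothetical, the three-day automatic closure period is an assumed value, and escalation is left out because it changes who handles the incident rather than its status.

```python
from datetime import datetime, timedelta

# Allowed status transitions, following the lifecycle steps listed above.
TRANSITIONS = {
    "identified": {"logged"},
    "logged": {"categorized"},
    "categorized": {"initial_diagnosis"},
    "initial_diagnosis": {"investigation", "resolved"},  # may be fixed on first call
    "investigation": {"resolved"},
    "resolved": {"closed"},
    "closed": set(),
}


def can_move(current: str, target: str) -> bool:
    """True if an incident may go directly from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())


def can_close(resolved_at: datetime, now: datetime, user_confirmed: bool,
              auto_close_after: timedelta = timedelta(days=3)) -> bool:
    """Close on user confirmation, or once the pre-defined time has elapsed."""
    return user_confirmed or (now - resolved_at) >= auto_close_after
```

With this table an incident cannot jump from `"logged"` straight to `"closed"`, and a resolved incident without user confirmation only becomes closable once the assumed waiting period has passed.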

6. What are the differences between ‘prioritization’ and ‘categorization’?

  • Categorization refers to putting the incident in the right bucket so that it may be assigned to the correct resolver group. Categorization may happen in multiple layers e.g. hardware issues may be sub-categorized into 1 – server issues or 2 – network issues; network issues may be further sub-categorized into 1 – LAN issues or 2 – ISP issues; LAN issues may even further be subcategorized into 1 – WiFi issues or 2 – Cable issues.

Categorization may be aligned to the assets hierarchy in the Configuration Management Database (CMDB). Proper categorization helps the support group to shortlist and zero in on the potential source of the incident. Having the correct categorization hierarchy helps to free up more SLA time for the investigation and diagnosis of the incident. Since categorization will mostly be done by the Service Desk, they should be provided with the relevant guidance for correctness.

  • Prioritization is based on the impact and the urgency of the incident: the higher the impact of the incident, the higher the priority assigned; likewise, if the incident needs to be resolved with great urgency, the priority is higher. Assigning a priority helps in deciding which incident will receive preference for resolution. E.g. the IT service level manager is on his way to meet the customer but is unable to access the service dashboard due to some issues with his VPN profile. An incident logged for this will be very urgent, but the impact is limited to only one individual – the service level manager. Consider another case where the user blog on a web portal is no longer working and users worldwide are unable to enter their comments and reviews about the company products for half a day. Blogs are not the most important bit, so the incident may not be urgent; but because users are spread across the world, the impact is very high. In both cases, therefore, the incident priority will be high.
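Both examples above end with a high priority, which is consistent with a rule where the stronger of impact and urgency wins. The sketch below encodes that rule. It is an assumption made to match the two examples, not a standard formula; many organisations instead define an explicit impact/urgency lookup matrix in their incident management policy. The names `Level` and `priority` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> Level:
    """Assumed rule: the priority follows the stronger of impact and urgency."""
    return max(impact, urgency)
```

The VPN example corresponds to `priority(Level.LOW, Level.HIGH)` and the blog example to `priority(Level.HIGH, Level.LOW)`; both evaluate to `Level.HIGH`.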

7. When does an incident get escalated? What do you mean by escalation?

Any incident is first logged with the Service Desk. If the Service Desk is not able to resolve it within the agreed timeframe, or realizes that resolution requires deeper expertise, it will escalate the incident to the next level. Similarly, when the second level of support realizes that it requires more in-depth expertise from the application or the operations management team, it will escalate the incident to the third line of support. This is called functional escalation. The levels of functional escalation may vary from project to project. If the project is using a commercially available software product, there may be incidents that require the intervention of the product team, which is typically an external organisation. This is also a functional escalation. In this case, however, the external support group is typically governed by an Underpinning Contract (UC), whereas Operational Level Agreements (OLA) cover internal support groups.

There is yet another kind of escalation called hierarchical escalation. If incidents have high priority, e.g. Priority 1 incidents, then the IT managers must be informed regardless of whether the time to resolve has elapsed and regardless of whether the group currently holding the incident has the capability to resolve it. Priority 1 incidents typically arise when services are unavailable, causing high customer dissatisfaction; IT managers must therefore know about them right from the time the incident is logged. IT managers may respond to such escalations by assigning additional members or calling in subject matter experts early in the incident lifecycle. Hierarchical escalation can also be triggered by the user or the customer, i.e. the person who logged the incident. One of the reasons for doing this is that the user is not satisfied with the resolution provided. Such an incident resolution is called ‘not first-time-right’.

The number of levels, time scales and quality of resolution requirements for both functional and hierarchical escalations need to be agreed, and SLA targets defined in the contract. Usually all major support tools like Remedy, ServiceNow etc. include automatic escalation management functionality that can be configured by product experts based on need.
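The escalation rules described above can be sketched as a small decision function. The field names and the 30-minute threshold are assumptions made for illustration; commercial tools configure such rules in their own way.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    priority: int             # 1 is the highest priority
    minutes_at_level: int     # time spent with the current support level
    resolvable_here: bool     # can the current level resolve it?
    user_dissatisfied: bool = False


# Hypothetical threshold: minutes a support level may hold an incident.
MAX_MINUTES_AT_LEVEL = 30


def escalations(incident: Incident) -> set[str]:
    """Return the kinds of escalation that apply to the incident."""
    needed = set()
    # Functional: agreed time exceeded, or deeper expertise required.
    if incident.minutes_at_level > MAX_MINUTES_AT_LEVEL or not incident.resolvable_here:
        needed.add("functional")
    # Hierarchical: Priority 1, or triggered by an unsatisfied user.
    if incident.priority == 1 or incident.user_dissatisfied:
        needed.add("hierarchical")
    return needed


print(escalations(Incident(priority=1, minutes_at_level=5, resolvable_here=True)))
```

Note that the two kinds are independent: a Priority 1 incident can be hierarchically escalated while it is still being worked on at the first level.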

8. Who owns an Incident?

Incident ownership always remains with the Service Desk. Regardless of where the incident is in its life cycle, its priority, whether it has been hierarchically escalated or functionally escalated, the ownership will always remain with the Service Desk. The Incident Manager owns the Incident Management process, but the Service Desk owns the incident.

The Service Desk remains responsible for tracking the progress of the incident, keeping users informed about the status of its resolution and ultimately ensuring that the incident has been closed.

When an incident is being resolved by another support group, e.g. the second or the third level of support, there is a need for an effective communication mechanism to ensure that the Service Desk is kept updated on the progress being made on the resolution of the incident. This is usually achieved through the Incident Management tool by updating fields such as ‘solution description’. Often, many support groups will update this field only when the incident has been finally resolved, leading to internal miscommunication. Because the Service Desk does not know the status, it is not able to provide the correct status of resolution back to the user or the customer who logged the incident, which may ultimately result in customer dissatisfaction.

For hierarchically escalated or high priority incidents the Service Desk acts as a communication hub that the Incident Manager must appropriately keep informed and leverage to ensure that communication is made horizontally, vertically and even externally to the customer. The Incident Manager must maintain a constructive working relationship with the Service Desk manager and the Service Desk.

9. What are the typical metrics for an Incident?

Incident Management reports are produced by the Incident Manager. Since the Service Desk owns every incident, the Incident Manager must prepare this report in close collaboration with the Service Desk and the support groups that are handling the incidents; usually quantitative data is provided by the Service Desk and qualitative data (e.g. related to functional and technical aspects) will be provided by the support groups. Many modern tools for Incident Management provide automatic dashboards to provide to senior IT Service Management a real-time view into the status of the Incident Management process. Some important metrics for incidents are as follows:

  • Total number of incidents reported in a particular time frame
  • Number of incidents resolved and number of incidents backlog (awaiting resolution)
  • Mean Time to Resolve (MTTR) – this will usually be grouped by incident severity and impact
  • Percentage of incidents resolved within agreed service level agreements
  • Efforts and/or cost per incident
  • Percentage of incidents ‘First-Time-Right’ (FTR), i.e. correct resolution was provided.
  • Percentage of incidents resolved at first call i.e. ‘First-Call-Resolution’ (FCR)
  • Count of incidents per Service Desk agent, this will be utilised for Service Desk efficiency calculation and staff planning
  • Number of incidents and their seasonality (daily, weekly, monthly, yearly)
  • Count and percentage of incidents that are incorrectly assigned
  • Count and percentage of incidents that are incorrectly categorised
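A few of the metrics above can be computed from raw incident records as sketched below. The record fields and the sample data are assumptions for illustration; a real Incident Management tool would export its own schema.

```python
from datetime import datetime, timedelta

# Hypothetical sample records; the field names are illustrative assumptions.
t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    {"opened": t0, "resolved": t0 + timedelta(hours=2), "sla_hours": 4, "first_call": True},
    {"opened": t0, "resolved": t0 + timedelta(hours=6), "sla_hours": 4, "first_call": False},
    {"opened": t0, "resolved": None, "sla_hours": 8, "first_call": False},
]


def incident_metrics(records):
    """Compute total, backlog, MTTR, SLA compliance and first-call resolution."""
    resolved = [r for r in records if r["resolved"] is not None]
    hours = [(r["resolved"] - r["opened"]).total_seconds() / 3600 for r in resolved]
    within_sla = sum(h <= r["sla_hours"] for h, r in zip(hours, resolved))
    first_call = sum(r["first_call"] for r in resolved)
    n = len(resolved)
    return {
        "total": len(records),
        "backlog": len(records) - n,  # incidents awaiting resolution
        "mttr_hours": sum(hours) / n if n else None,
        "pct_within_sla": 100 * within_sla / n if n else None,
        "pct_first_call": 100 * first_call / n if n else None,
    }


metrics = incident_metrics(incidents)
print(metrics)
```

In this sketch the percentages are taken over resolved incidents only; whether backlog items count in the denominator is a reporting choice that should be agreed up front.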

10. What is a service request? Who works on it?

A Service Request (SR) is logged by a user seeking information or advice, requesting a standard change, or requesting access to an IT Service.

The request fulfilment process is put into place so that standard services may be provided when needed, without the necessity of going through the approval cycles every time. Standard services are those for which pre-approval exists, e.g. when a new employee needs access to the company intranet. SRs may also be logged when the user needs some information or advice, e.g. when an account holder calls up the Service Desk to enquire about the change of her mailing address in the system or to check the status of an incoming payment. In this case an SR will be logged for record (e.g. subject to later audit) and to ensure that the time spent by the Service Desk in responding to the user can be justified.

Some SRs may be frequently recurring – so a predefined process flow can be devised to include the steps and information needed to fulfil the request, individuals that must be involved, timelines for fulfilment and escalation paths when not fulfilled. These are predefined SR models. This ensures consistency of resolution and assigns accountability for the model to the appropriate service groups. E.g. in the case of a new employee a laptop may be provided that has a ‘standard build’ with specific software. The cost of each new laptop and the approvals for installing the specific software have already been given by the IT and possibly the Information Security department. There is no need to take approvals every time a new employee joins.

As with incidents, the ownership of SRs also lies with the Service Desk. In an IT system service requests will generally follow predefined models.

Often, there can be confusion about what qualifies as an incident versus a service request. It may so happen that a few incidents cause SR models to be created or changed. E.g. in an IT service provider organisation, it is observed that most of the new employees are reporting incidents related to the pre-installed anti-virus software on their laptops. After analysis of these incidents, the IT Department may decide to install a different anti-virus software in the future standard build.

Introduction

If you want to get an understanding of where this role comes into the picture, then understand this – a Problem Manager is typically someone that wears the hat of an investigator and deep-dives into places that the Incident Manager leaves behind for investigation. Problem Manager roles are often merged with other roles but should never be merged with the Incident Manager.

Problem Managers have a crucial role in the IT services industry, and service providers would typically look out for a person with an ‘investigative’ and ‘fact-finding’ mindset. Product companies and service providers that specialise in niche skills would be on the lookout for problem managers. Even if there is no advertised job, they would still value the mindset.

1. What is ‘systems thinking’?

Because a system is made up of many sub-systems – e.g. servers, cloud solutions, integration components, front-end technology, back-end technology etc. – and each requires a different kind of skill and specialization, it is difficult to form a team that can cater to the entire span. Consequently, the sub-systems start working in silos and end up competing or, at worst, conflicting with each other. The prevailing mindset is about ‘throwing it over the wall’. Obviously, this reduces the efficiency of the system (as a unit), thereby reducing the business value delivered, increases the lead time for implementing changes and burdens portfolio managers with more management oversight overhead.

‘Systems thinking’ is a mindset that requires the IT organization to think of the entire system ‘as a whole’, as opposed to parts of it. Even if you can visualize this as an ‘ideal situation’, you can guess that this requires a certain level of organizational maturity and certain type of organization culture where people and departments are transparent with each other in sharing of knowledge and clear segregation of roles and responsibilities. DevOps is one step in that direction with the tooling and automation, but there is more to it. The life of a problem manager is likely to be easier in an organization that identifies with ‘systems thinking’.

2. What is a ‘known error’?

A known error has a history of past occurrence, possibly repeated more than once. However, unlike an incident, the root cause and the solution or workaround to the problem are known and documented. If the incident repeats, the support analyst may look up the documentation (we call this the ‘known error database’ or KEDB) and follow the steps to resolution.

If errors are ‘known’ – why aren’t we resolving them? The reasons are many – maybe there are other higher priority items, maybe the management does not want to invest in a service that will soon be retired, maybe there is already a change being implemented in parallel or even maybe the customer is unwilling to pay a premium for a permanent resolution.

‘Known errors’ play an important role in making a Service Desk efficient. The Service Desk should have access to this database and a proper way to correlate the newly reported incident with the known error and possibly inform the customer about the workaround. In certain cases, as in end-user computing, implementing the workaround may be a self-help step, e.g. restarting the laptop to implement a security patch.
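Correlating a new incident with the KEDB can be sketched as a simple keyword match. The entries, identifiers and the matching rule below are hypothetical; real tools usually offer richer search or classification.

```python
# Hypothetical known error database entries; ids and texts are made up.
KEDB = [
    {"id": "KE-001", "keywords": {"vpn", "timeout"},
     "workaround": "Re-import the VPN profile."},
    {"id": "KE-002", "keywords": {"printer", "jam"},
     "workaround": "Switch to single-sided printing."},
]


def find_known_errors(description: str, min_matches: int = 2):
    """Return known errors sharing at least min_matches keywords with the text."""
    words = set(description.lower().split())
    return [entry for entry in KEDB if len(entry["keywords"] & words) >= min_matches]


matches = find_known_errors("Printer shows paper jam again")
print([m["id"] for m in matches])
```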

3. How are Problem Management teams formed?

Problems need to be resolved once the root cause has been found. This requires the formation of a team (or squad) that prioritizes and focuses on this activity for a pre-defined period. You can liken this to the definition of a ‘project’. Problem management teams are generally temporary and should be composed of individuals that have subject matter expertise and systems knowledge. Once the objective is achieved, the team can disband or start working on similar problems. Problem management teams may comprise members from the incident management team (i.e. those who have first-hand experience of the incident), the technical team (if there is an infrastructure impact), the application management teams (depending on which applications are impacted), development teams, testing teams and even the service improvement teams (whenever we have ‘proactive problem management’).

At a given point of time, there may be multiple problem management teams working on different problems. Each of these teams selects a leader who follows the problem management process under the watchful eyes of the process owner, i.e. the problem manager.

4. Someone tells you that Incidents and Problems are one and the same. Do you agree?

They aren’t the same. 

Incidents are events that cause disruption to normal provisioning of services. Incidents are real-time and happen when a system is ‘on’ and catering to business needs and users. 

Due to some fault within an IT system, incidents may tend to get repetitive. E.g. a printer is sometimes unable to print documents back-to-back (double-sided). As a result, multiple users end up with a ‘paper jam’ and documents are not getting printed. Now, if this printer is in the warehouse where the shipment documents need to be printed and delivery trucks dispatched, it is a major incident – because without the paper documentation, the trucks just stay there, causing the supply chain to slow down. The technical team may default the printer to single-side printing, but this means that double the paper will be used. So, this can only be a ‘workaround’ – i.e. a temporary solution till a permanent one is found.

The Problem Management process would typically be looking at finding the permanent fix. The first step is to find out the root cause as to why this happens only at times. E.g. is it user-specific? Is it really a hardware problem (like misalignment) or a software problem (e.g. the printer drivers)? After investigation, the problem management team will provide the most feasible solution, which could be anything from having a standby printer to replacing the printer.

5. Is Problem Management always reactive?

No, we also have ‘proactive problem management’. Proactive problem management originates in service operations, but most of the activities would fall under the Continual Service Improvement (CSI). Every IT service provider should consciously work on proactive problem management as this ensures that the probability of incidents occurring is reduced. Remember that every incident is undesirable and causes disruption to the normal business and consequently customer dissatisfaction as they are not receiving the expected service. 

Proactive problem management can be done in many ways. Pain-value analysis can be done – which may include the incident count, duration, severity and other weighting factors to arrive at ‘pain levels’ for a system or component of the system (in ITIL terms – per configuration item (CI)). Do this for all the systems in scope and you have a full picture of what is most troublesome for you. Work on a proactive plan to reduce the ‘pain’. Another great way is to do a Pareto Analysis and find the group of incidents that take up the most efforts or cause the most amount of SLA breaches. Again, work on a proactive targeted plan for the top 20% of this group.
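Both techniques can be sketched in a few lines. The pain formula (count × duration × severity weight), the sample figures and the 80% threshold are assumptions for illustration; each organisation chooses its own weighting factors.

```python
# Hypothetical incident summary per configuration item (CI).
ci_stats = {
    "mail-server": {"count": 50, "hours": 20, "severity_weight": 3},
    "vpn-gateway": {"count": 10, "hours": 30, "severity_weight": 2},
    "printer-01": {"count": 5, "hours": 2, "severity_weight": 1},
}


def pain_value(stats):
    # Assumed formula: incident count x total duration x severity weight.
    return stats["count"] * stats["hours"] * stats["severity_weight"]


def pareto(values, threshold=0.8):
    """Return the top items whose cumulative share first reaches the threshold."""
    total = sum(values.values())
    if total == 0:
        return []
    selected, running = [], 0
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        running += value
        if running / total >= threshold:
            break
    return selected


pain = {ci: pain_value(stats) for ci, stats in ci_stats.items()}
print(pain)
print(pareto(pain))
```

With these sample figures a single CI accounts for more than 80% of the total pain, so the proactive plan would start there.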

A final note – proactive problem management could be one of the key differentiators for an IT service provider in a competitive environment.

6. What are the most important pre-requisites for Problem Management?

The most important element for problem management is the availability of data. Without the availability of data, the problem management process does not have a starting point. Imagine a scenario where a new employee finds on day 1 that the company laptop allotted to him is very slow. He goes to the technical team who make some changes to the anti-virus software, and it works better. After a week, the laptop has slowed down again, and the technical team now disable some background services. After a month, the employee is back to the technical team for the same issue, and this time they disable some browser plug-ins.

In the absence of a ticketing system, there would be no data for the above, and no one would remember the history of this employee’s laptop. Having a ticketing system ensures that each time an incident is logged, and adequate information is stored – it would be three incidents already in our above example. If the same employee turns up for the fourth time, maybe it is high time to do a deeper investigation; but you will not be able to do this if you do not have the previous data with you.

And it is not just Incident data; there should be data from all the other ITIL processes as well – Asset Management, Change Management and Configuration Management. Only then can the problem analysis happen.

7. Do you know any problem-solving techniques?

One of the simplest techniques to solve a problem is brainstorming. Brainstorming sessions are usually chaired by a moderator and the people in attendance will contribute their ideas about the solution in a round-robin fashion, till the end of the meeting or when people have run out of ideas. The moderator ensures the session proceeds without evaluating the feasibility of the ideas.

Cause-effect analysis is used to drill down from the problem back to the possible causes. Causes are usually categorized under – people, processes, products (e.g. technology), partners (i.e. suppliers) and ‘Mother Nature’ (something beyond control, e.g. natural calamity). This is done diagrammatically, resulting in an Ishikawa (or fish-bone) diagram.

The Kepner-Tregoe method is a simple process for structured problem solving where the solutioning is kept separate from the cause identification process. The phases are – describing the problem, identifying possible causes, evaluating the possibilities, and confirming the real cause.

Fault Tree Analysis links events with Boolean logical operators – AND, XOR, OR – and graphically indicates the chain of events that lead to the failure (or the problem).
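A fault tree can be sketched as nested tuples and evaluated recursively. The tree below and its event names are hypothetical, and the XOR gate is assumed to mean ‘exactly one input is true’.

```python
def evaluate(node, basic_events):
    """Evaluate a fault tree node given the state of the basic events."""
    if isinstance(node, str):          # a leaf is the name of a basic event
        return basic_events[node]
    gate, *children = node
    results = [evaluate(child, basic_events) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    if gate == "XOR":                  # assumed: exactly one input is true
        return sum(results) == 1
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree: the service fails if the network is down, or if both
# the primary and the standby server are down.
tree = ("OR", "network_down", ("AND", "primary_down", "standby_down"))

state = {"network_down": False, "primary_down": True, "standby_down": False}
print(evaluate(tree, state))
```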

In the Component Failure Impact Analysis technique, all the components are mapped, and we identify which of these are single points of failure and which have failovers. While this could be a proactive problem management technique, the Service Outage Analysis is a reactive technique where past outages are analysed – which was the most significant and what are the addressable parameters. These parameters can be categorized in the same manner as in “Cause-effect analysis”.
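The Component Failure Impact Analysis idea can be sketched as a scan over a service-to-component map. The services, functions and component names below are made up for illustration.

```python
# Hypothetical map: for each service, the components able to provide each
# function. A list with a single entry means there is no failover.
service_map = {
    "online-banking": {"web": ["web-1", "web-2"], "database": ["db-1"]},
    "intranet": {"web": ["web-2"], "database": ["db-1"]},
}


def single_points_of_failure(services):
    """Map each component without a failover to the services it would take down."""
    spof = {}
    for service, functions in services.items():
        for components in functions.values():
            if len(components) == 1:
                spof.setdefault(components[0], []).append(service)
    return spof


print(single_points_of_failure(service_map))
```

Here `db-1` stands out because it is a single point of failure for both services, which makes it a natural first candidate for a failover.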

Problem Reviews happen after the solution has been implemented, the objective being to find out what worked and what could have worked better. Again, the same categories as in “Cause-effect analysis” are used for this.

8. If you are looking for a Problem Manager for your organization, what are the basic skills that you will be looking for?

The biggest competence for a Problem Manager is the ability to connect the dots in a situation where there are seemingly unrelated events. Anyone may be able to acquire the necessary technical and functional skills via training and learning on the job, but the ability to connect the dots is an ability that comes with experience and possibly talent.

In an IT services scenario, Problem Managers are usually experts in a particular technology, e.g. server administration, or functional experts e.g. SAP Master Data Governance (SAP MDG).

The other quality I would look for is analytical capability. Providing solutions is only a secondary responsibility of the Problem Manager; often there are experts from the technical, operations and change management teams that will work on providing the solution. The primary role of the Problem Manager is to derive the Problem statement – in short, what exactly is the issue? He has tools and techniques at his disposal, which we have already discussed. Identifying the real problem is key to allocating and investing resources to resolve the problem.

9. What is a Problem Record? When does it start and when is it considered closed?

A 'Problem' is possibly the cause of one or more incidents; this cause is unknown at the time a Problem Record is created (based on symptoms). The Problem Record is the lifecycle artefact of the Problem, from detection to closure.

A Problem Record is uniquely identified. It contains information such as - when it was detected, the problem owner, symptoms, affected users, IT services and business functionality, a problem category and priority (that is a function of urgency and impact), configuration item (CI) correlation, status history, progress log and linkages to other problems, incidents, known errors or knowledge database items.

To close a Problem Record, the solution must have been accepted as 'implemented and solved the problem' by the impacted parties. If a workaround was put in place, it should now be withdrawn, and the relevant parties informed that it is no longer required - e.g. the Service Desk and/or the application management or the technical teams. All linked incidents may now be closed and the documentation for the impacted CIs must be updated to reflect the changes that may have been made to the CI. Last but not least, the Problem Record should be updated and only then should it be marked as 'closed'.
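The closure conditions above can be sketched as checks on a record object. The class and its field names are hypothetical and cover only a small part of what a real Problem Record holds.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemRecord:
    problem_id: str
    solution_accepted: bool = False   # impacted parties confirmed the fix
    workaround_active: bool = False   # a workaround is still in place
    open_incidents: list = field(default_factory=list)
    ci_docs_updated: bool = False
    status: str = "open"

    def blockers(self) -> list:
        """List the reasons why the record cannot be closed yet."""
        checks = [
            (not self.solution_accepted, "solution not accepted"),
            (self.workaround_active, "workaround not withdrawn"),
            (bool(self.open_incidents), "linked incidents still open"),
            (not self.ci_docs_updated, "CI documentation not updated"),
        ]
        return [reason for failed, reason in checks if failed]

    def close(self) -> None:
        reasons = self.blockers()
        if reasons:
            raise ValueError("cannot close: " + "; ".join(reasons))
        self.status = "closed"


record = ProblemRecord("PRB-1", solution_accepted=True, open_incidents=["INC-7"])
print(record.blockers())
```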

10. Have you heard of RCA? What is it?

RCA is an acronym for Root Cause Analysis. RCA is a collection of problem solving methods used to uncover the actual reason for an issue happening - in the ITSM scenario this could be repeated incidents, degradation of service quality etc. Using RCA, a problem is first defined (what to solve), understood and analysed (diagnosis) and finally solved (fixing the 'what').

The 'root cause' is a low-level (or in-depth) non-conformance - the epicentre from which it all starts. The RCA technique ensures that it solves at the epicentre rather than at any higher level. Workarounds or 'band-aid' fixes may be temporarily applied to the higher levels (which are possibly just symptoms) as an interim. It is always cost-effective to fix at the epicentre, rather than anywhere else - this is the logic behind doing an RCA.

Even in the ITSM scenario, root causes may not necessarily lie in technical components. E.g. the problem of repeated malfunctioning of a printer may not be related to the printer setup or printer drivers. RCA may reveal that it was a poorly qualified technician that serviced the printer last time, so the issue is with resource skills (people) and nothing else. Imagine the costs of using a band-aid approach in this case, where you keep replacing cartridges to achieve better results.

Introduction

A Service Desk is primarily a communications centre that provides a single point of contact between a business organization and its customers, employees and business partners. The purpose of a service desk is to ensure that users receive appropriate help in a timely manner.

Service desks often act as a face of the business to the outside world. This makes managing a Service Desk a vital activity for business success. The role of the Service Desk Manager is crucial as he needs to strike a fine balance between enabling the staff to be able to deal with human emotions in the right way and managing the technical resolution aspects.

Service Desk Managers may also need to act as analysts of the data - such as Incidents data.

1. You are the Service Desk Manager. You find 2 incidents in your queue – logged at almost the same time and with the same priority and undergoing resolution, but you are not confident of their resolution. Which one should you escalate?

One of the metrics that a Service Desk abides by is the time that it can keep an incident in its queue before making a functional escalation to the second line of support. The other side of the balance is the resistance it may face if an incident is escalated without proper diagnosis. Often, conflicts arise when the second line support feels that the Service Desk is just acting as a channel to pass tickets without doing any diagnosis regarding proper categorization, information logging and checking the knowledge base for possible documented procedures.

In the situation described above, since the timelines are similar, we must check for the impact of the incident. Greater impact means more users are affected, and therefore, the incident with the higher impact must be transferred to the second line of support as this group possesses more specialised knowledge. Breached incidents with higher impacts often cause more user calls and increase the workload of the Service Desk, causing it to slow down further.

If the incident was raised through a system alert, there could be an ‘incident storm’ – which is typically more and more alerts being raised as the system expects the alerts to be reset within the SLA – again, this will increase the workload of the Service Desk manifold.

2. What is the Service Desk?

A Service Desk is the single point of contact between the service provider and the users or customers. The Service Desk manages the incidents and service requests (SR) and handles the communication with the user who has logged the incident or the SR.

The goal of the Service Desk is the same as that of the Incident Management process: To get normal services restored to the users as quickly as possible. We must remember that the Service Desk may not have any specialised technical or functional skills. Therefore, the only way that the Service Desk can, by itself, fulfill this goal is by referring to the documentation available to it, published by the more specialised teams. Absence of proper documentation will impair the efficiency of the Service Desk.

The IT Service Desk may resolve an incident or fulfil a SR by following the documented process steps associated with the same. E.g. a user who has called up and raised an incident that he is unable to access the Bank's website, may be taken through a series of steps by the Service Desk on the call. If this does not solve the problem the Service Desk will pass the incident to the second level of support.

For SRs, the Service Desk may need to follow predefined request models and address the customer's need. Once again, these models must be well documented and intuitively understandable by the Service Desk staff.

3. As a Service Desk manager what benefits of your Service Desk can you showcase to your management?

The Service Desk is a user or customer facing organisation. Often, it will act as the face of an entire business to the user community. Here lies the biggest achievement that a Service Desk may hope to make – all the customer and user communication happening via a single point of contact whether it be answering a query, addressing a service request (SR) or notifying about the status of a service. A Service Desk should always endeavour to provide improved customer service, create a positive customer perception and contribute to customer satisfaction. It should always strive to get better in terms of quality of service and speed of customer service.

The IT Service Desk should also strive to become proactive as it matures. E.g. it may notify the user when a user request is resolved versus the user having to call up the Service Desk to get an update.

Because of its unique position in the Service Delivery organisation, the IT Service Desk also has access to large amounts of data related to incidents, SRs and service level performance. The Service Desk may utilise this data to provide meaningful analysis and reports for the IT Service Management. This will ensure faster and more accurate decision making by IT Service Management. Such data may also help in identifying potential failure scenarios even before the failure happens.

Last but not the least, a high-performing Service Desk may be able to compensate for some of the deficiencies that may be present in the rest of the organisation.

4. What are the responsibilities of a Service Desk?

The first and foremost responsibility of a Service Desk is to log all relevant incidents and Service Requests (SR) with all the relevant details and updates. It must be kept in mind that the Service Desk is the owner of the incidents and the SRs that are logged. Then the Service Desk provides the first-line investigation and diagnosis and also attempts to resolve it to the best of its knowledge.

When the Service Desk cannot resolve the incident or SR on its own it escalates the same to the second level of support. If the resolution is completed by another support group, the Service Desk must ensure that appropriate resolution comments have been added to the ticket or the request. Upon confirmation from the user that all is well, the Service Desk will close the ticket or SR.

The Service Desk is in charge of maintaining a continuous communication with the user who logged the incident or SR. This communication may include details about the progress made on the resolution, tentative timelines, any workaround that could be implemented till the matter is resolved, information and assurance about the problem management activities that are going on to prevent recurrence of such an incident. After the request or incident is closed, the Service Desk may also conduct a user satisfaction survey with the user.

Finally, the Service Desk must ensure that the knowledge database is updated with the most relevant information so that more incidents and service requests in the future may be resolved at the first call, thereby improving the overall efficiency of the IT Service Management process.

5. If you are asked to set up a Service Desk, what kind of people would you get on board?

Service Desk staff may or may not be required to have deep technical skills. However, they must have certain skills which are essential to perform as a customer or user facing organisation. First and foremost, interpersonal skills such as verbal communication on a telephone, written communication skills, ability to listen actively and showing empathy to the user who is having a problem are very important.

Service Desk staff must be trained to understand the business of the organisation. E.g. the Service Desk for a bank must be equipped with knowledge of the banking services that are provided while the Service Desk for a consumer electronics company must be equipped with knowledge about the different products that are in the purview of the Service Desk.

That brings us to the scope of service being provided – the next important thing that the Service Desk staff must know. E.g. a banking application that is available via desktop and mobile application may not have a Service Desk that caters to the mobile app issues. These may be resolved only through a ‘log issue’ button on the app itself. When a mobile user reports an incident, the Service Desk must politely explain this limitation and guide the user to the correct mechanism.

If ‘first call resolution rate’ metric is key to the business, the Service Desk may need to be equipped with deeper technical and functional knowledge so that more issues can be resolved by them rather than referring to the other support groups. Irrespective of the level of such knowledge, all Service Desk staff should have some basic diagnosis skills, which should be supported by making available the appropriate documentation.

The staff must be thoroughly conversant with the Service Desk tool because most of the time they may be at the end of a phone and have to record the issue being reported by the user. Without good tool knowledge this process will take a lot of time causing user frustration.

Finally, the Service Desk must be aware of the IT Service Management processes, especially those that would be interacting with it on a regular basis such as incident management, request fulfillment, technical and operation management functions.

6. How do you compare between two Service Desks?

Two Service Desks may be compared based on certain metrics, some of these are below.

  • The percentage of calls resolved during the first contact –where the resolution has beenprovided when the user has contacted the Service Desk for the first time. This is the most desired state of Service Management. Another related metric is the percentage of calls where the Service Desk did not require to escalate the incident to the second line of support. This shows the self-sufficiency of the Service Desk in terms of Knowledge and possibly technical skills.
  • Average time to resolve an incident when the Service Desk is able to resolve it.
  • Average time to escalate an incident – this is a measure of the Service Desk’s ability to judge its own capability and understand the significance of the incident to the business; sometimes there could be conflicts with the second line of support, who may feel that more incidents are being pushed to them and that the Service Desk is trying to play safe.
  • Average cost of handling an incident by the Service Desk – since maintaining a Service Desk contributes to the overall operational cost, it must be designed optimally to maximize the return on investment. Staff utilisation is important.
  • Percentage of incidents where customers have been informed appropriately without delay –This ensures high customer satisfaction and keeps the communication between the business and IT service organisations smooth.
  • Average time elapsed between when an incident is marked as ‘resolved’ by another resolver group and when it is set to ‘closed’ by the Service Desk.
  • Percentage of incidents where requested customer satisfaction surveys were completed. This is a measure of the survey process compliance and not the actual customer happiness.
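
A few of the metrics above can be computed directly from incident records, which makes a side-by-side comparison of two Service Desks easier to discuss in an interview. The following is only a minimal sketch: the `Incident` record and all of its field names are hypothetical and do not come from any particular Service Desk tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    # All field names are hypothetical, chosen only for this illustration.
    first_contact_resolved: bool          # resolved during the user's first contact
    escalated: bool                       # passed to the second line of support
    minutes_to_resolve: Optional[float]   # set when the Service Desk resolved it itself
    minutes_to_escalate: Optional[float]  # set when the incident was escalated


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there are no incidents."""
    return 100.0 * part / whole if whole else 0.0


def desk_metrics(incidents: list[Incident]) -> dict[str, float]:
    """Compute a handful of comparison metrics for one Service Desk."""
    total = len(incidents)
    resolve_times = [i.minutes_to_resolve for i in incidents
                     if not i.escalated and i.minutes_to_resolve is not None]
    escalate_times = [i.minutes_to_escalate for i in incidents
                      if i.escalated and i.minutes_to_escalate is not None]
    return {
        "first_contact_resolution_pct": percentage(
            sum(i.first_contact_resolved for i in incidents), total),
        "not_escalated_pct": percentage(
            sum(not i.escalated for i in incidents), total),
        "avg_minutes_to_resolve": mean(resolve_times) if resolve_times else 0.0,
        "avg_minutes_to_escalate": mean(escalate_times) if escalate_times else 0.0,
    }
```

Calling `desk_metrics` once per Service Desk over the same reporting period gives two dictionaries that can be compared key by key. The cost and customer-communication metrics are left out because they depend on data that is not assumed here.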

7. What is ‘follow the sun’?

Global organisations may run businesses and have users all around the globe. Considering that a typical workday will not exceed 8 hours, the Service Desk must be designed in such a way that users can dial in and avail services no matter in which part of the globe they reside. To ensure that the Service Desks in any region of the world do not work beyond their normal working hours, there must be multiple Service Desks set up all around the globe, in different time zones.

E.g. if an American user logs an incident during his daytime this will be handled by the Service Desk in America. If the incident cannot be resolved by them during their normal working hours, they could pass it on to the next Service Desk which may possibly be located in India. Similarly an Indian user logging an incident during his daytime may have someone based out of Europe working on it. This revolving model of Service Desk and service provisioning is called ‘follow the sun’ implying that the Service Desk operates as a single unit, but staff is active only during the normal working hours of various regions on the globe.
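
The hand-over logic can be sketched in a few lines of code. The example below is an illustration under simplifying assumptions, not a real routing engine: the three desk names, their fixed UTC offsets and the 09:00 to 17:00 working window are all hypothetical, and weekends, holidays and daylight saving time are ignored.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical desks with fixed UTC offsets, spaced 8 hours apart so that
# three 8-hour workdays together cover the full 24 hours.
DESKS = {
    "Americas": timezone(timedelta(hours=-8)),
    "Europe": timezone(timedelta(hours=0)),
    "Asia-Pacific": timezone(timedelta(hours=8)),
}
WORK_START, WORK_END = 9, 17  # local working hours; the end hour is exclusive


def active_desks(moment_utc: datetime) -> list[str]:
    """Return the desks whose local time falls within normal working hours."""
    active = []
    for name, tz in DESKS.items():
        local_hour = moment_utc.astimezone(tz).hour
        if WORK_START <= local_hour < WORK_END:
            active.append(name)
    return active
```

With these assumed offsets, an incident logged at 10:00 UTC is routed to the Europe desk, and one logged at 20:00 UTC goes to the Americas desk. Real desks rarely sit exactly 8 hours apart, so in practice there are overlaps or gaps that the Service Desk Manager has to plan for.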

Implementing the ‘follow the sun’ model is very useful for global businesses, as it ensures that customers can be serviced at any point of the day. However, it requires a huge investment in terms of managing multiple Service Desks, ensuring that knowledge levels across all the Service Desks are uniform, that language barriers are taken care of and that the quality of service does not degrade as control passes from one Service Desk to another. E.g. it should not happen that the IT Service Desk located in country X lacks the capability to resolve aging incidents, or that an incident gets logged in a language which is not understood by the next Service Desk. The SLAs are likely to be violated in such cases.

When implementing the ‘follow the sun’ model, the Service Desk Manager must remain conscious of the risk that staff may not understand the incident management processes and the escalation mechanism to be followed during incident handling.

8. What are ‘specialised Service Desk’ groups? 

Some organisations consider certain services to be more important than others. Users who avail of these services are given access to ‘specialist groups’ directly rather than going through a generalized Service Desk. It is also possible that the user would have paid a premium to avail of such ‘priority’ services. These services are provided by the specialised Service Desk groups.

The specialised Service Desk groups are imparted specialist technical or functional training and are more familiar with the product or the service than the normal Service Desk. Usually, such specialised Service Desk will be staffed with more experienced members who may have once worked as a part of the normal Service Desk.

For the service provider, there are additional costs of maintaining such specialisation. First, the staff are required to be more qualified and consequently are more expensive. Second, more product services training and familiarisation of the staff is also an investment. There needs to be a minimum payload of users willing to avail of the specialised service for it to be cost-effective to the provider as well as the individual user. If there is staff turnover in the specialized teams, it may be more difficult to find a suitable replacement, therefore running the risk of lower service levels at such times – this is likely to have an adverse impact on customer satisfaction. A fine balance must be achieved while establishing such specialised groups.

However, the benefits of having these are immense. The users receive priority service, and Service Desk staff may consider this a career progression, which improves staff retention.

9. What are the different kinds of Service Desks that you are aware of?

Service Desks may be Local, Centralised or Virtual.

Local Service Desks are physically close to the user community they serve. The most pressing reasons for maintaining a local Service Desk are language, cultural and political differences across regions – especially when the business is geographically spread across multiple time zones. Another reason could be the presence of a special group of users who are customers at a location.

When technical skills are important for a Service Desk, the service provider may find niche technical skills only at a location. Presence of several VIP or high-value customers at a location may prompt the business to have one, e.g. a bank having a local Service Desk in an area home to a large population of high net worth individuals.

Centralised Service Desk: On the other hand, it may be possible to keep only one Service Desk at a location. Users from other locations are still required to call up this Service Desk. If the business is spread across different time zones, this will mean that the centralized Service Desk must operate outside the normal working hours of the location where it is situated. This optimises the cost of maintaining the Service Desk, considering that the fixed costs of other locations are not incurred. It may also make the Service Desk more efficient because all the staff are in one place and can deal with any seasonal peaks in call volumes. However, if the Service Desk activities require physical presence, such as providing a laptop to a user, then a bare minimum local presence will also be required in addition to the centralised Service Desk.

A Virtual Service Desk leverages communication technology and corporate support tools to give the impression of a centralised Service Desk to users from all locations. In reality, this may be just a toll-free number and / or a continuously monitored mailbox. Virtual Service Desks can be easily automated, and the latest technology such as artificial intelligence may be implemented for more efficient working. Behind the toll-free number or the mailbox is the Service Desk team, who would be located at places that possibly have lower setup costs; it is unlikely that users will ever know the physical location of the person they are speaking to. Considering language and cultural barriers, some of the team could also be located with the users, thereby leveraging a global + local approach.

10. As a Service Desk manager what are the parameters you will consider for deciding the most appropriate staffing levels in your Service Desk?

There are several factors that must be considered when deciding the staffing levels of a Service Desk.

First, consider the customer’s service expectations, the business requirements and the service level agreements such as response times, resolution times, call wait times etc. Business requirements may include coverage for multiple time zones, out-of-hours support requirements, desk-side support etc.

Since the Service Desk is user facing, user parameters will impact the staffing of a Service Desk as well. Most important are the user volume, seasonality during the day, week, month or year, languages understood by the users, request types, user demographics such as gender, education levels etc.

As a service provider organisation for multiple customers doing different businesses, it is also important to factor in the business of the customer. E.g. if the customer is a banking organisation, the Service Desk is likely to be business-critical, as its users will generally have financial interests at stake. On the other hand, if the business is related to entertainment products, it may be more useful to have a Service Desk that understands the entertainment business.

The Service Desk manager must design ‘shifts’ in a way that includes a mix of functional, technical and communication expertise to reduce the impact on the IT services at any point in the service window. Yet another determinant is the availability of an updated knowledge base that the Service Desk can refer to. As a Service Desk manager, it is most important to ensure that the Service Desk staff has access to the knowledge resources.

Following is a list of the activities that a technical manager may need to do daily:

  • Manage the team of technicians.
  • Assist the technical team in resolving complex technical issues.
  • Provide inputs to the problem manager during the investigation of a problem.
  • Provide inputs to the incident manager during the resolution of an incident as required.
  • Participate in the major incident process by providing the necessary inputs and technical expertise for the infrastructure of the impacted Configuration Items.
  • Ensure that the infrastructure SLAs are being fulfilled e.g., availability, capacity, and resilience.
  • Create reports for the management.
  • Design and implement dashboards so that the appropriate stakeholders are able to monitor the status of the infrastructure on a need basis.
  • Keep up to date on the latest technologies relevant to the IT service – propose these as a part of the Continual Service Improvement initiatives.
  • Reduce the efforts spent on repetitive reporting to various stakeholders.

The Business Relationship Manager (BRM) is a ‘customer-focused’ role. They manage relationships with existing customers and engage in establishing meaningful and effective working relationships with the customers. They are also responsible for managing new opportunities of providing new IT services to existing customers and to newer customers. 

BRMs are responsible for ensuring that the outcomes of a provided IT service meet the requirements of the customer. They may need to explain the achieved outcome using the business jargon of the customer. This implies that they may need to understand the technical aspects of the service as well as the business of the customer. 

Any complaints related to the IT services are always routed through the BRM, or the BRM is at least kept informed. The BRM is accountable for ensuring that the complaint is addressed promptly by the service operations teams and will provide updates post-fix. Complaints have a wide range – a shortfall of service levels, shortage of manpower, re-opening of an incident or too many of them, improper Service Desk communication etc. Periodically, the BRM must initiate a Customer Satisfaction Survey and follow up on the satisfaction ratings received.

BRMs are also known by other names – Account Managers, Business Representatives and Sales Managers. 

Yes, absolutely. 

When we talk about the BRM role being responsible for managing the relationship, this extends to more than just ‘wining and dining’ with the customer.

Having a relationship means that there is a continuous dialogue between the customer and the service provider. While the content may vary across relationships, there are a few basic topics that must be addressed on a continuous basis and not just at the time of adding new services or during contract renewal. A BRM must make himself aware of the latest situation within the customer organization and correlate this with any service that is currently provided. Is any change necessary? Being the representative of a technology service provider, he may also educate the customer about the latest technology and its usage in similar industries. 

Without a BRM from the service provider, the customer may be clueless regarding how to make a formal complaint for the contracted services – if a BRM has kept a proper working relationship, then many complaints may get redressed prior to escalating. The BRM must also work with the customer to set up a ‘satisfaction’ survey at regular intervals. Apart from the formal aspect of filling up a form, a lot of feedback may be gathered informally in casual conversations. Such feedback helps in keeping the services well-aligned with the expectations and ensures that the customer does not consider any other competing service provider organization. 

Low customer satisfaction ratings are usually a big topic at any service-providing organization. The first step is to analyse the ratings and the comments, if provided. If enough justifying comments have not been provided, then the BRM should contact the customer and set up a meeting to understand more. Here is where many organizations get it wrong. They end up asking ‘why are the ratings low?’ instead of asking questions about the service quality.

The BRM must be the person at the centre of this conversation. I have seen operations teams getting defensive and, worse, aggressive with the customer. This should be avoided, as it only takes the relationship into a downward spiral. The defence, if any, should be based on facts – like the service level metrics that have been collected throughout the duration of the service provision. The BRM should first check whether any customer complaints have occurred and whether they have been redressed.

Sometimes, customers may have expected more service improvements as opposed to just ‘keeping the lights on,’ especially when the services are stable. They would say that they would no longer rate highly only for managing the business-as-usual but would like to see more value being delivered through problem management, root-cause analyses, automated releases, etc. 

Creating a forum for having such conversations should be done all along and not just after the satisfaction survey and is the responsibility of the BRM. A low rating may also indicate that this has not happened before and, therefore, a failure on the part of the BRM. 

Yes, absolutely. 

Let us assume that you have received a customer satisfaction rating of 4 out of 5, 5 being the best. While this may be a great score, there is still room for improvement to try and achieve a 5 out of 5. As with most things in the real world, it may be easier to climb the lower ratings than the higher ones – so moving from a 4 to a 5 is far more difficult than, say, moving from a 2 to a 3.

Most people who are engaged in service transition and operations could tell you that a 98.98% response time adherence is better than 97%, but what if you have been consistently meeting the 98.98% mark over the last 6 months? It means that your standards have gone up, and people have become more productive and committed. That is all good news, but what if you realise upon investigation that 80% of your incidents are repeated, and the people have just become too good at resolving them – they could literally resolve incidents with their eyes closed, right? Now, as a BRM, consider whether that is indeed a great way to achieve a 4 out of 5 rating. Are you really protecting the interests of the customer? Not really.
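
The share of repeated incidents is easy to check before such a conversation with the customer. The snippet below is a rough sketch that assumes each incident carries a short symptom label (a hypothetical field); an incident counts as ‘repeated’ when its symptom occurs more than once in the period under review.

```python
from collections import Counter


def repeat_incident_pct(symptoms: list[str]) -> float:
    """Percentage of incidents whose symptom label occurs more than once."""
    if not symptoms:
        return 0.0
    counts = Counter(symptoms)
    # Every occurrence of a symptom seen at least twice counts as a repeat.
    repeated = sum(n for n in counts.values() if n > 1)
    return 100.0 * repeated / len(symptoms)
```

For instance, four ‘disk full’ incidents and one ‘login error’ incident give a repeat share of 80%, which is the kind of figure that should prompt a problem management discussion rather than a celebration of the response times.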

So, you need to take some steps – e.g., asking the customer for some investment for making a change to the application code – stating that this could possibly mean a lower support cost. Show them the ROI. 

Well, this is just an example; but as a BRM, you need to continuously seek ways to better satisfy your customer by offering value-adds. Maybe then you will reach that 5 out of 5.

Like other processes in ITIL, the Business Relationship Management process can also be measured, and the BRM has KPIs too. 

The most obvious is the number of customer complaints that were accepted for redressal. The fewer, the better! We can also measure the number of appreciations that have come in – of course, in this case, more is better. There may be specific situations where skilful relationship management averts a complaint. These are harder to measure but must go to the credit of the BRM.

There can be other customer complaints that are not accepted. This is a tricky metric, as often BRMs will label a customer as ‘a difficult customer’ when the latter keeps complaining, whether justified or not. Many organizations may not measure this at all, but there may be subtle reasons behind an ‘ever-complaining’ customer that the BRM must investigate. 

Customer Satisfaction Surveys (CSS) are yet another source of relationship management information. The rating itself is a KPI, of course, and the higher it is, the better. For existing customers, consider the trend over time – is it changing for the better or for the worse? A couple of more subtle CSS metrics are the number of surveys conducted and the number of responses received. A higher number of conducted surveys indicates the proactiveness of the BRM. The latter is a little more complex, and interpretations vary. Mine is that a customer is likely to be enthusiastic about responding to a survey when he is enthusiastic about the services and the service provider. Now, the BRM has a lot to contribute to building that enthusiasm.
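
The two CSS metrics mentioned above, the response rate and the rating trend, can be illustrated with a short calculation. This is only a sketch under assumed inputs: the function names are hypothetical, ratings are taken to be on a 1 to 5 scale, and the trend is simplified to the difference between the average rating of the last and the first survey period.

```python
from statistics import mean


def response_rate_pct(surveys_conducted: int, responses_received: int) -> float:
    """Percentage of conducted surveys that the customer actually answered."""
    if surveys_conducted == 0:
        return 0.0
    return 100.0 * responses_received / surveys_conducted


def rating_trend(period_ratings: list[list[int]]) -> float:
    """Average rating of the last period minus that of the first period.

    A positive value suggests the ratings are changing for the better,
    a negative value for the worse. Periods without ratings are skipped.
    """
    averages = [mean(ratings) for ratings in period_ratings if ratings]
    if len(averages) < 2:
        return 0.0
    return averages[-1] - averages[0]
```

With 8 surveys conducted and 6 answered, `response_rate_pct(8, 6)` gives 75.0. Ratings of `[[3, 4], [4, 4], [5, 4]]` over three periods give a trend of 1.0, i.e. an improvement of one rating point.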

The Business Relationship Management process is a part of the Service Strategy phase in the ITIL Lifecycle. Interestingly, this process was only included in the 2011 release of ITIL.

The purpose of this process is to identify the specific needs of the customer and be able to distinguish such needs across customers. An effective working relationship between the customer and the provider must persist throughout the duration of the service provisioning. Understanding the needs is important, but it is equally important that the service provider engages in explaining the value of the service offerings they already make or can make in the future, should the customer agree to pay for the same. Business Relationship Management also puts in place the checks that enable both the customer and the service provider to consider that they understand the needs and offerings before the contract is signed and the services delivered. 

Where the Business Relationship Management process adds value is in aiding the service operations teams, e.g., the application or the technical management teams, to focus on day-to-day activities that require expertise and ensure alignment, and provides a big picture view when necessary. 

The Business Relationship Management process is a part of the Service Strategy phase in the ITIL Lifecycle, and this exists with the purpose of providing effective collaboration between the service provider and their customers. 

The most important objective of this process is to ensure a high level of customer satisfaction throughout the engagement. This is to ensure that the business with the customer is sustainable in the long-term. Whenever necessary, customer requests for services must be catered to, and newer services added to the portfolio may be proactively offered to customers. 

The Business Relationship Management process ensures that the service provider organization recognizes the changes in the customer environment as well as industry-wide changes in technology that could possibly impact the services that they offer to the customers. E.g. as the industry shifts to cloud-based solutions, a service provider may develop some capability in this space and present this as an offering to an existing customer whose data centre it currently manages. 

The Business Relationship Management process should also have provision to engage the appropriate stakeholders in dialogue and amicably resolve any service escalations through conflict management techniques. 

The BRM must be kept in the loop regarding the ongoing service transitions and operations. In most service provider organizations, this will be achieved by defining suitable communication channels as a part of the service design.

However, while there may be a specialised department in large service-providing organisations, the BRM for a customer is an individual with limited capacity for processing information. Often BRMs may handle more than one customer. 

The relevant information about service transitions will be related to the change management – how many changes went live, the pipeline, how many changes had to be backed out or remediated, and a high-level summary of the post-implementation review findings. Information regarding the skills and capability of the service provider resources is also useful as it provides assurance to the customer about service stability. For example, experience levels of subject matter experts and technical training provided. 

Information and metrics about the Service Operations processes may include the incident and service request trends, service level adherence reports, major incident reports, and any automation efforts for better event management. 

BRMs should also have all the information regarding customer complaints, the current status of the complaints, the estimated time to complete (ETA, ETC), and the lessons learned. They should have not just the current data but also the relevant historical data and its context at their disposal. Often BRMs will present their own dashboards using the above information when they engage with the customer.

The Technical Management function plays two major roles in an ITIL organisation:

  1. It is the custodian of technical knowledge and expertise related to managing the IT infrastructure. It ensures that the knowledge required to design, test, manage and improve IT services is identified, developed, and refined.

  2. Technical management also provides the staff to support the ITIL lifecycle. It ensures that resources are effectively trained and deployed to design, build, transition, operate and improve the technology required to deliver and support IT services.

The requirements for technical management are defined in the Service Strategy phase, expanded in Service Design, validated in Service Transition and will also be refined in Continual Service Improvement. 

The Technical Management function also strikes a balance between the skills, utilisation and the cost of the technical resources. E.g. having an expensive resource with excellent technical skills may not be suitable if there is not enough work for this role. The function can then decide whether it should set up a robust knowledge management process, establish knowledge artefacts, and then hire contractors to do the work as per the resource forecast. 

Technical management also provides guidance to its operations teams about how best to carry out the ongoing operational management of Technology. 

Technology Change Management (TCM) is a process of identifying, selecting, and evaluating new technologies (such as tools, methods, and processes) to incorporate the most effective technology in a software system.

An organization would typically establish a TCM group that shall be responsible for assessing emerging technologies and managing changes that occur in existing technologies. The technologies that tend to improve the capability of the standard software process of the organization are the top priority. 

TCM helps in maintaining awareness of new technologies in an organization. It assists organizations in selecting the most appropriate technology to improve the software quality and productivity in IT. Before incorporating new technologies in the organization, both advantages and disadvantages of implementing the technology are checked with the help of a fail-fast prototype that helps to assess the impact and returns from the new technology. The selected technology stack most suitable for the organization is proposed and, upon management approval, incorporated into the standard software process of the organization. 

In addition to the above-mentioned objectives, other common objectives of TCM are: 

  • Minimize Total Cost of Ownership (TCO).

  • Formulate policies related to the usage of existing or legacy technology.

  • Identify reasons for discontinuity and delay and replace them with better project management practices.

  • Maximize asset utilization and reduce inventories.

  • Reduce expenses by performing intake checks and evaluations prior to use.

Technology Management is the use of technology for human benefit. It covers the entire cycle of planning, design, optimization, operation and control of technological products, processes and services. It also includes the management of technical personnel. Technology management programs typically include instruction in production and operations management, project management, computer applications, quality control, safety and health issues, statistics, and general management principles.

Technology innovations follow the form of an "s" curve though originally based upon the concept of the standard distribution of adopters. In broad terms, the "s" curve suggests four phases of a technology life cycle – emerging, growth, mature, and aging. 

These four phases are coupled to increasing levels of acceptance of new technology. In recent times, an inverse curve, which corresponds to a declining cost per unit, is also depicted for many technologies. The inverse curve is found to suit innovation in the domain of information technology when initial investments in the new technology are high.

The Carnegie Mellon Capability Maturity Model proposes that a series of progressive capabilities can be quantified through a set of threshold tests. These tests determine repeatability, definition, management, and optimization. The model suggests that any organization must master one level before being able to proceed to the next. 

Gartner has popularised the Hype cycle, which suggests that marketing of new technology results in the technology being overhyped in the early stages of growth. 

The following steps are likely to increase your success at implementing technology change:  

  1. Find the Right Technology Platform – There are a few things you should consider in a new technology solution. It should facilitate integration with existing platforms, provide ease of use (user interface), and offer product features that you really need. Time must be invested to research, conduct surveys, consider the pros and cons and get informed before selecting a new technology.

  2. Lead from the Top – The leadership team must set an example by embracing the change and by providing clear and specific examples of how the technology will make life easier and simpler.

  3. Acquire Knowledge – Treat your technology partner as the most knowledgeable source about their products and about how to get the most out of the platform. Leverage their expertise, through training and consulting, as an enabler for the change you wish to adopt.

  4. Communicate – Technology change management leaders should work together to develop a solid communication plan and convey a consistent message. This plan must include answering all questions, alleviating concerns, and explaining how the new platform may impact team members.

  5. Create a Measurable Plan – Dumping a new technology on a team and expecting it to learn new business processes and adapt to them is not a reasonable expectation. Incremental, measurable goals for the implementation and the use of the new technology must first be developed and assigned to the team members, who are then held accountable for fulfilling them.

  6. Review and Adjust Progress – Implementation plans must have flexibility built in – this makes them adaptable to roadblocks as well as to positive risks that make the actuals go faster than planned.

  7. Listen – Management must make the time and effort to listen to the concerns of employees, ask questions, and address them to keep everyone on target. Listening to and addressing concerns makes people move forward faster.

Application management ensures that the IT organisation can access appropriately skilled staff whenever required. As part of application management, the knowledge that the staff needs to possess is identified, developed and refined. Staff will first be trained on technical as well as functional aspects of the application and then they will be deployed into service delivery. 

One of the important factors to keep in mind is that a balance must exist between skill levels vis-a-vis the cost of the resources. Decisions must be taken around whether to staff the service delivery with in-house staff or contractor staff or whether to keep a centralised pool of resources. Once the resources are trained and deployed to deliver services, the application manager must ensure that the resources are optimally utilised. Optimal utilisation of resources becomes even more important if expensive contract staff has been deployed. Utilisation is usually one of the key KPIs of application management functions. 

The responsibility for developing new applications lies with IT application development. However, application management will contribute to a buy or build decision. They will also provide guidance to the IT operations team in matters related to the applications and provide information on the functionality that is implemented within the application.

Application management can be performed by any department or team involved in managing and supporting operational applications. Applications that have been sunset (or retired) do not form a part of the active application management function. The application management team plays an important role in the design, testing, and improvement of applications that provide IT services to customers and users. Application management may be involved in development projects to the extent of providing information related to interfaces and lessons learnt from managing existing applications. 

Application management teams possess expertise regarding the functionality implemented by the applications. The teams may also have functional experts on-board, e.g. an IT application that is part of a supply chain may have domain experts on sales and distribution. Of course, there will also be technical experts who will know how the functionality has been implemented in executable code. Application management teams are specialized teams, and therefore sharing of staff across different applications may prove to be difficult. 

Application management teams act as the third level of support for the incident management and problem management processes. They will provide expert inputs to resolve incidents and problems. 

A change management process must control any changes to the functionality provided by an application. In an enterprise, applications will not exist in isolation; it is most likely that any changes to one application may have an impact on another. During the impact analysis process for changes (a part of change management), the change manager must analyse these interdependencies using the Configuration Management Database (CMDB) and involve the relevant application managers to assess the impact of change on their respective applications. If there is a change that affects multiple applications, the efforts and cost expended by these application management teams to implement the change must be included in the efforts and cost estimates for the change (and budgeted accordingly). 
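
The interdependency analysis described above amounts to walking a dependency graph. The sketch below assumes, purely for illustration, that the relevant CMDB relationships have been exported into a dictionary mapping each application to the applications that depend on it; the function name and the application names in the usage note are hypothetical, and a real CMDB would be queried through its own interface.

```python
from collections import deque


def impacted_applications(dependants: dict[str, list[str]], changed: str) -> set[str]:
    """Return every application that directly or indirectly depends on `changed`.

    `dependants` maps an application to the applications that consume it.
    A breadth-first walk is used so that cycles in the data do not loop forever.
    """
    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for app in dependants.get(current, []):
            if app != changed and app not in impacted:
                impacted.add(app)
                queue.append(app)
    return impacted
```

For example, if ‘invoicing’ and ‘reporting’ depend on ‘billing’, and ‘customer-portal’ depends on ‘invoicing’, then a change to ‘billing’ impacts all three. Their application managers are the ones to involve in the impact assessment and the effort estimates.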

Application managers should be invited to the Change Advisory Board (CAB) meetings by the change manager. It may so happen that a change cannot be implemented by an application management team because of another change which is work in progress. Unless these interdependencies are identified, change management cannot be successful or will not produce the desired results. 

Since application management teams possess specialised knowledge of the applications, they may also perform application development work limited to changing existing code. This is useful when changes must meet strict timelines and there is no time for knowledge transfer to new members. It also helps to utilise any spare capacity within the application management team. The quality of deliverables is likely to be better because the changes are made by staff who already know the application.

The application management function is responsible for ensuring that applications are well-designed, resilient and cost-effective. A properly designed application operates with minimum disruptions. This, in turn, reduces the number of incidents and the amount of downtime, resulting in happier customers and users. Resilient applications are fail-safe and keep failures transparent to users, so that users perceive the applications as running smoothly. A cost-effective application is one where the total cost of ownership (TCO) is optimal. TCO is influenced by the staff required to manage the application, software and hardware costs, the cost of poor quality, the degree of automation, and so on.
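As a rough illustration of comparing TCO, the sketch below adds up the cost components named above for two options. All figures and names are hypothetical, and treating the degree of automation as something already reflected in the staff cost is an assumption of this example rather than an ITIL rule.

```python
from dataclasses import dataclass


@dataclass
class ApplicationCosts:
    """Hypothetical annual cost components for one way of running an application."""
    name: str
    staff: float
    software: float
    hardware: float
    poor_quality: float  # e.g. rework and downtime costs

    def tco(self) -> float:
        """Total cost of ownership as the plain sum of the components."""
        return self.staff + self.software + self.hardware + self.poor_quality


# Assumption: higher automation shows up as lower staff cost and higher software cost.
manual = ApplicationCosts("manual operations", staff=120000, software=20000, hardware=30000, poor_quality=25000)
automated = ApplicationCosts("automated operations", staff=70000, software=45000, hardware=30000, poor_quality=10000)

cheapest = min([manual, automated], key=lambda option: option.tco())
print(cheapest.name, cheapest.tco())
```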

Every application implements functionality that satisfies certain business objectives. The application management function needs to ensure that the functionality provided by the applications remains available to the users. Managing applications requires technical and functional expertise. By maintaining a pool of resources with the necessary technical and functional expertise, the application management function ensures that any technical failures that occur can be speedily diagnosed. Technical failures reach the application management team via the Incident Management process, in which the team plays the role of the third level of support.

As business needs change, applications will need to evolve over time through the introduction of new functionality or the modification of existing functionality. The application management function provides the staff and expertise necessary to estimate changes and to assist the application development team in implementing them.

This is how the application management function meets its objectives.