ZERO OUTAGE THE ZERO OUTAGE PRINCIPLE AS A REQUIREMENT FOR DIGITAL TRANSFORMATION - T-Systems

WHITE PAPER ZERO OUTAGE

CONTENT
NO DIGITAL TRANSFORMATION WITHOUT FAIL-SAFE IT

QUALITY AS THE MOST IMPORTANT DECISION-MAKING CRITERION

ZERO OUTAGE: THE PATH TO IT DEFECTS

THE 3-P PRINCIPLE: PEOPLE, PROCESSES, PLATFORMS

PEOPLE: THE CRITICAL HUMAN FACTOR

PROCESSES: THE COMPANY’S FRAMEWORK

PLATFORMS: A COMPANY’S FOUNDATION

ZERO OUTAGE IN PRACTICE


NO DIGITAL TRANSFORMATION
WITHOUT FAIL-SAFE IT
Reliable information and communication technology (ICT) is the basis for successful digital
transformation, both in terms of internal IT operations and ICT services purchased from a service
provider. Companies’ commercial activities and their entire existence today depend on it.

Businesses which fail to build permanently fail-safe ICT run the risk of major problems. The market research company Gartner predicted as early as 2013 that a quarter of all businesses would disappear from the market if they were unable to meet the quality requirements for digital transformation – so-called “digital incompetence”.

But ensuring quality in ICT is an intricate management task. Countless components need to work seamlessly together at all times so that areas like production or sales can operate smoothly. This requires clear standards for processes, for technical platforms and for training personnel (the “3-P principle”). These standards must not only be introduced and implemented, but also consistently maintained.

In addition to these standards, constant staff vigilance and a sense of urgency are also of crucial importance, because human error remains the most common cause of disruptions and outages. This can only be addressed by a holistic approach which systematically raises staff awareness about quality and ensures every staff member feels committed to a Zero Defect culture.

The focus on quality must not be limited to the business’s four walls, because businesses of every size and industry work together across sectors. This means there are more and more gateways and touchpoints. Unless every participating organization has and maintains the same high understanding of quality, there will be a risk of defective products and outages.

The smooth co-operation required can only exist if there is a common quality standard. As such, the ICT industry needs an ecosystem committed to the Zero Outage principle, one which follows common rules for quality management – goals which can be optimally pursued with Zero Outage. T-Systems used Zero Outage as early as 2011 to introduce a complete quality-assurance program for ICT services. The aim: to minimize downtime and thus maximize its customers’ business activities in the digital age.

The following white paper provides an overview of Zero Outage.


QUALITY AS THE MOST IMPORTANT
DECISION-MAKING CRITERION
There are a number of studies which illustrate the great importance
of quality in services, particularly in this digital age. For example, two
thirds of the businesses surveyed by the consultancy firm PwC in 2015
state that, at 84 percent, quality is the most important criterion when
choosing a service provider. This puts quality well ahead of financial
considerations (58 percent). The Information Services Group (ISG)
also found that IT quality plays a role “very frequently” to “always” in
companies’ decision-making. General performance, in the sense of
stable processes and sustainable services, is an especially important
factor here.

EVERY OUTAGE COSTS MONEY
Increasing digitalization is putting more pressure on IT departments in companies across all industries. Telecommunications, rescue service systems, postal logistics, transport companies, trade, the entire finance sector and much more are today dependent on problem-free IT.

The more platforms and processes are interlinked, the more dependencies exist and thus the more likely it is for incidents to occur. These incidents – even the tiniest ones – can have serious effects, including complete outages of critical business services.

Every outage costs money: Every year, European companies with more than 50 employees alone lose more than 37 million man-hours to IT outages and data recovery. In many sectors, even a brief outage of IT systems today causes major financial losses for the affected businesses and institutions.

Apple’s App Store was unavailable for eleven hours in 2015 due to technical problems, forcing the company to absorb losses of 2.2 million dollars per hour. Impacts of this scale are not the exception; they are the norm. Meeting the high quality requirements for IT in the modern business world thus takes a strategy which minimizes the number of incidents, while also rectifying any disruptions as quickly as possible. This success strategy has a name: Zero Outage.
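As a quick check on the App Store example, the total loss is the outage duration multiplied by the loss per hour. The minimal Python sketch below uses only the two figures quoted in the text (eleven hours, 2.2 million dollars per hour); the function name `outage_cost` is illustrative and assumes a constant hourly loss.

```python
def outage_cost(hours: float, loss_per_hour: float) -> float:
    """Return the total loss of an outage, assuming a constant hourly loss."""
    return hours * loss_per_hour


# Figures quoted for the 2015 App Store outage.
total = outage_cost(hours=11, loss_per_hour=2.2e6)
print(f"Estimated total loss: {total / 1e6:.1f} million dollars")  # 24.2
```

Real losses rarely scale perfectly linearly with duration, so this is a rough order-of-magnitude estimate rather than an exact figure.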


ZERO OUTAGE: THE PATH TO IT
Zero Outage is the term for how an organization behaves in terms of systematically and efficiently handling quality-related tasks – with the aim of continuously increasing quality. Zero Outage thus affects telecommunications and IT operations, services, projects, the optimization of customer interfaces, and the involvement of further ICT suppliers. It is important to note here that Zero Outage also refers to the behavior of an organization’s entire workforce – from top management to entry-level staff.

[Figure: Zero Outage – operation of telecommunications and IT, delivery of services, integration of ICT suppliers, implementation of projects, optimization of the customer interface]

The Zero Outage program covers measures across all levels – from state-of-the-art platforms, to smooth, standardized processes with short repair times, to specially trained staff. This is because stable, reliable ICT can only be achieved through optimum interaction between humans and technology.

The most important principle of Zero Outage is always that of comprehensive, proactive risk management. It operates under the motto of “prevention, not reaction”. It is not about being the fastest to put out the fire in the worst-case scenario, but rather about foreseeing risks, developing a plan B and C in advance, and thus preventing the fire from starting in the first place. Great importance is thus placed on comprehensive quality assurance right from the planning phase for changes or projects, as well as on a generally high degree of standardization for processes and technology.

Zero Outage includes specific rules and behavioral guidelines for various incidents, such as defective system components, network, power or VoIP outages, and even incidents that arise while implementing a change. Active risk management serves as the basis for all Zero Outage initiatives: Every single risk cluster is monitored for risks, e.g. incidents, and the measures taken are constantly optimized and further developed. In this way, Zero Outage has managed to achieve 99.999 percent availability in ICT, corresponding to an outage time of just a few minutes a year.

STANDARDIZATION
Clearly defined standards for platforms, processes and personnel are prerequisites for maximum availability and reliability. Standardization reduces complexity, and is crucial in preventing or quickly rectifying incidents. At the same time, the Zero Outage strategy also focuses intensively on operational problems in order to ensure the right conclusions can be drawn. This is the only way improvement initiatives can be started. Even in project management and software engineering, the parties in charge all work in accordance with clearly defined, tried-and-tested processes and standards which describe the results prepared by the various project roles during specific phases and stages.

INCIDENT MANAGEMENT
Incident management constitutes a major part of Zero Outage. Standardized, comprehensive incident management repairs an acute error as quickly as possible by achieving maximum professionalization through repeated solution processes. Incident management includes a clearly defined communication chain and various escalation levels, as well as a general manager-on-duty service – known as the “red telephone”. Similar to a standby service, dedicated representatives from senior or top management, along with a special team, can be contacted 24/7 about critical incidents. The manager on duty is directly involved, and co-ordinates all problem-solving processes as the main contact. Around 140 managers work as managers on duty at T-Systems worldwide, taking turns to bear responsibility in times of crisis.

ECOSYSTEM
In order to ensure top quality and reliability end-to-end at all levels, T-Systems works with partners and suppliers upholding the same high-quality standards. After all, they are an integral part of the process chain, both in terms of providing solutions and services, and in emergencies. To enable incidents to be rectified as quickly as possible, error sources to be clearly established, and a final solution to be found, it is vital to directly involve the respective supplier. In 2013, T-Systems thus expanded the existing Zero Outage program to include partners and suppliers. Around 30 top global suppliers and over 60 access providers are already Zero Outage-certified. Every year, over 500 unannounced emergency simulations (“fire drills”) ensure the agreed quality is upheld end-to-end, both by T-Systems and the suppliers.
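The communication chain, escalation levels and manager-on-duty service described for incident management can be pictured as a simple rule table. The Python sketch below is a minimal illustration under stated assumptions: the level names, the minute thresholds and the `involved_parties` helper are hypothetical. The only rules taken from the text are that suppliers are involved directly and that critical incidents bring in the manager on duty as the main contact.

```python
from dataclasses import dataclass

# Hypothetical escalation levels and the number of minutes an incident may
# remain unresolved before each level is pulled in. Names and thresholds are
# illustrative assumptions, not an actual Zero Outage configuration.
ESCALATION_LEVELS = [
    ("operations team", 0),
    ("service manager", 15),
    ("responsible supplier", 30),
]


@dataclass
class Incident:
    ident: str
    critical: bool
    minutes_open: int


def involved_parties(incident: Incident) -> list[str]:
    """Return the parties that should be working on the incident right now."""
    parties = [name for name, limit in ESCALATION_LEVELS
               if incident.minutes_open >= limit]
    if incident.critical:
        # Critical incidents go straight to the manager on duty
        # ("red telephone"), who co-ordinates the solution.
        parties.append("manager on duty")
    return parties


print(involved_parties(Incident("INC-001", critical=True, minutes_open=20)))
print(involved_parties(Incident("INC-002", critical=False, minutes_open=45)))
```

Keeping the chain as data rather than as nested conditions makes it easy to review and to rehearse in fire drills, which is one reasonable reading of the standardization the text calls for.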


THE 3-P-PRINCIPLE: PEOPLE,
PROCESSES, PLATFORMS
As previously mentioned, optimum interaction between people, processes and platforms is essential for guaranteeing 99.999 percent availability.
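To make the 99.999 percent target concrete: the permitted downtime is the unavailable fraction (1 - availability) multiplied by the length of a year. The short Python sketch below assumes an average year of 365.25 days; for 99.999 percent it yields roughly 5.3 minutes per year, consistent with the "few minutes a year" stated in the text.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, including leap years


def max_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime allowed by an availability ratio between 0 and 1."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%} availability -> {max_downtime_minutes(target):.1f} minutes/year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the last step from 99.99 to 99.999 percent is so demanding.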

[Figure: The 3-P principle – People, Processes, Platforms]

PEOPLE: THE CRITICAL HUMAN FACTOR
The human factor plays a central role when it comes to incidents in critical system operations. Take aviation as an example: Human error is the main cause of 60 percent of plane crashes. In IT system operation, the human-error percentage is much higher, at over 80 percent. Critical systems can today be secured to an extent that renders outages extremely unlikely – but only as long as humans do not make any big mistakes.
Gaining control over these problems requires a holistic approach which goes hand in hand with the culture embodied by an entire company.

    If an incident occurs and is caused by human error, this is often due to the following:

    • The people involved are operating with different terms, behavioral patterns and priorities
    • The people involved are not properly trained (lack of expertise and certifications)
    • A sense of urgency is lacking in critical situations
    • Errors occur when implementing changes and solving problems (no dual-control principle)
    • Middle and senior management do not know the details of operational matters
    • Responsibility is shifted back and forth (“incident ping-pong”)

CREATING A ZERO OUTAGE CULTURE
In order to successfully and sustainably incorporate the quality mindset into a company’s culture, it must become the focus of all values. This affects all areas. The Human Resources department plays a key role in firmly establishing the quality approach. If quality is already well and truly integrated as an important criterion in the recruitment process, and also influences salary models, career planning and employee performance appraisals, the corresponding standards and values permeate all of the organization’s departments.


Creating a Zero Outage culture at a company involves a number of factors. The following measures have, however, been identified as key
tools in ensuring the success and permanent establishment of this culture:

• “Practice what you preach”: If managers themselves embody values and standards and act as role models, their staff are more likely to adopt these and associate them with the positive example. Such behavior is particularly important when an organization wants to gear itself around high quality standards.

• The company has defined Zero Outage as a top priority for now and years to come. There is a mission supported by everyone, and a strategy pursued by everyone.

• There are clear, measurable KPIs and a plan of what is to be achieved over the next twelve months, as well as a long-term strategic objective. Overarching goals, such as reducing major incidents by X percent, figure in the top management’s and senior management’s personal target agreements.

• Successfully changing an entire culture requires not just one single area to commit to Zero Outage, but everyone, including those expected to resist it.

• A weekly slot is reserved for the quality update at management board meetings: The quality manager reports on the most important current quality KPIs, the highlights and lowlights of the past week, and the status of important improvement programs. If necessary, decisions are also made directly at this time. Action is then taken and monitored to see whether improvements have been made – with a follow-up the next week.

• Managers on duty are appointed from the operational IT divisions and from among all managers. They are trained and can be contacted at night and at weekends according to a rotating schedule – to assist with incidents or, for example, a change weekend. This ensures that a contact from management is available day and night to help escalate the solution.

• In the event of a major incident, the manager on duty is the first person to join the telephone conference and push for the cause to be investigated until the error is rectified. They embody the sense of urgency and act as a role model for all staff involved.

• The heads of department and team leaders in middle and lower management are also responsible for quality in daily business. They have a checklist which helps them during daily quality checks with the team and gives them a guideline as to the quality benchmark.

• Feedback culture: It is important to involve the key players from the operational areas when further developing process standards and policies. An opportunity for direct feedback should also be provided whenever possible. This may be in the form of a Q&A session held after the staff call, an anonymous feedback survey, or on-site breakfasts, where staff can discuss the quality strategy in a relaxed, casual atmosphere.
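The rotating schedule for managers on duty can be sketched as a simple round-robin rota. The Python example below is a hypothetical illustration: the one-week shift length, the start date and the placeholder manager names are assumptions, not details of the Zero Outage program.

```python
from datetime import date


def manager_on_duty(managers: list[str], day: date, start: date) -> str:
    """Return the manager on duty for `day` in a weekly round-robin rota.

    Assumes a hypothetical rota in which each manager covers one full week,
    nights and weekends included, starting on `start`.
    """
    if not managers:
        raise ValueError("the rota needs at least one manager")
    weeks_elapsed = (day - start).days // 7
    return managers[weeks_elapsed % len(managers)]


rota = ["Manager A", "Manager B", "Manager C"]  # placeholder names
start = date(2024, 1, 1)
print(manager_on_duty(rota, date(2024, 1, 3), start))   # Manager A (week 0)
print(manager_on_duty(rota, date(2024, 1, 10), start))  # Manager B (week 1)
print(manager_on_duty(rota, date(2024, 1, 22), start))  # Manager A (week 3, wraps around)
```

With around 140 managers in such a rota, each person would carry the responsibility only rarely, which helps keep the duty sustainable alongside their regular role.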


QUALITY ACADEMY
Only staff who constantly hone their skills can lead the organization to success, which is why T-Systems established the so-called Quality Academy as a standardized training and certification platform in 2013. It serves as a think tank for company-wide knowledge transfer across all quality-related process and IT training courses. Over 20,000 T-Systems staff and almost 100 top partners and access providers are now certified, ensuring a standardized understanding of quality and solution expertise at all levels. All of the Quality Academy services are geared around specific professional careers. Staff in the operations division, for example, are trained in the dual-control principle when implementing changes, while project managers increase their know-how relating to quality gates and touchpoints in projects. Each employee can use specially configured training modules for their specific career and role and can easily navigate between individual sections within the so-called playlist. This enables faster completion of the training, as well as content tailored specifically to the respective target audiences. Staff with little involvement in a process or subject area do not need to complete extensive training, only the content relevant to their tasks. These methods also make it easy to combine topics relevant across the board into training modules, thereby helping process and tool aspects to be learned simultaneously.

Attractive new formats like simulations, mobile training courses or game-based learning provide variety as a change from the monotony of normal, web-based training courses or recorded instructions with supporting PowerPoint slides. The “flight simulator” is one example which has proven useful: This online training course allows various scenarios and problems related to daily work to be simulated on the desktop and addressed, so as to prepare staff for real operations and thus prevent human error.

CERTIFICATIONS
Quality Academy certification plays a key role in keeping the workforce’s knowledge of Zero Outage verifiable and up to date. On the one hand, certification is an important means of proving the employee’s knowledge, and on the other, it is also a good indicator for managers, giving them an overview of the team’s knowledge level. From the overall perspective of a global quality organization, certification is essential for imparting knowledge across the board and rolling out new standards.

A certificate expires after 18 months and must be renewed – thereby ensuring a continuous focus on up-to-date content. It also facilitates the onboarding process for new staff, who can undertake the training courses at any time and get certified in a similar way to a “driver’s license” – giving an important feeling of achievement and also being an ideal way of consolidating the quality mindset and relevant know-how early on.

PROCESSES: THE COMPANY’S FRAMEWORK
A modern, efficient business model is based on countless processes which ensure that the company functions correctly at its various levels. A company’s fundamental processes depend on IT and telecommunications in virtually every industry, which is why high process quality end-to-end is essential in making ICT environments as fail-safe as possible: if the ICT quality is not right, a single process error can block or even suspend a business’s entire process. An operational fault in an ERP system, for example, is difficult to compensate for through manual replacement methods and processes. The consequence: suspension of business operations within a few hours.

PROBLEM AREAS AND SOLUTIONS
The causes of process disruptions are extremely diverse, and may lie at various levels and in various departments of the organization:
• Different adaptations of the existing standards, such as ITIL, COBIT, PMI
• Highly complex, impractical, scientific process descriptions
• No documentation of responsibilities or a general lack of end-to-end responsibilities
• Processes do not fit together because departments operate separately from one another
• The alert chain for incidents starts too late or does not function consistently; precious time is lost
• No focus on sustainable problem-solving or investigating causes; the organization persists with workarounds, which constantly create new errors

However, the ICT systems in many companies do not always display the necessary quality. Coupled with this is the fact that, when a company grows, so does the number of internal processes, making it increasingly difficult to co-ordinate them all. While standards like ISO 27000 or the IT Infrastructure Library (ITIL) have significantly increased the degree of industrialization in the IT world, they have yet to satisfactorily ensure the reliability and stability of IT systems. These standards only describe what quality is, not how it is achieved. Too many different approaches are taken here, and IT outages continue to occur too frequently. Standardization must thus start right at the beginning of an IT project. Only then is it possible to achieve high process quality.


CLEAR DESCRIPTION AND DOCUMENTATION OF PROCESSES
Inadequate process descriptions are a typical fundamental problem affecting process quality: Highly complex, impractical essays often result in IT staff not knowing, for example, what to do in the case of a fault. Such instructions are counterproductive and result in errors and conflicts. Simple, easily comprehensible process descriptions are thus needed in order to ensure smooth everyday operations and prevent ICT outages from happening in the first place. This also requires a clear assignment of roles within the company.

DISTRIBUTION OF EXPERTISE AND RESPONSIBILITY
Other frequent problems include breaks in, or ambiguities regarding, responsibility when company or even just departmental boundaries are crossed. Overlaps or inconsistencies in collaboration can result in each party relying on the other, with no one ultimately feeling properly responsible.

Efficient crisis management is required if a fault occurs. Insufficient alert chains and cumbersome delegation of tasks and responsibilities unnecessarily prolong the repair process – and therefore also the risk factor for the company’s business activities. Roles and processes must therefore be clearly defined and consistently followed. This also includes a strict dual-control principle, which guarantees error prevention and quality checks in critical matters. Along with unclear competence areas, a lack of end-to-end responsibility also has a negative impact on IT quality. Every process disruption is not only a potential error source, but also makes it harder to see the whole picture. Silo mentalities and isolated sub-processes inevitably lead to an overall concept relevant to the company’s success being left by the wayside. It is thus imperative to establish consistent processes which view IT challenges holistically. Regular training courses ensure the necessary process compliance in emergencies. If the individual process stages have been practiced, a strict procedure will be successful even under intense pressure.

CONFIGURATION MANAGEMENT
Configuration management is a good basis for improving processes. An orderly Configuration Management Database (CMDB) is a useful indicator of whether processes function correctly and are applied in a disciplined manner in areas such as change management, patch and release management, and monitoring. The aim of configuration management is to document compliance with a configuration item’s physical and functional requirements, and to create full transparency in this respect, so that every party or department involved with the configuration item uses the correct and appropriate information.

SYSTEMATIC PREVENTION
There will always be another outage: Anyone who is serious about quality management and wants to improve and standardize their process quality over the long term relies on clean, tidy documentation. Only through consistent documentation can existing processes be permanently optimized and error sources minimized.

The Zero Outage strategy covers the following fundamental points in order to make the company’s process landscape more reliable:
• Simple process descriptions
• Clearly defined responsibilities and processes
• Regular emergency simulations
• Consistent documentation and analysis of incidents and faults

High process quality is essential for stable, highly available ICT. Clear rules and structures, and consistent implementation thereof, are required in order for the IT’s creative potential to freely develop in the interests of the company’s objectives – based on the motto: Only those who control their processes are not controlled by their processes.


PLATFORMS: A COMPANY'S FOUNDATION
Standardized, high-performance and, most importantly, highly available platforms are prerequisites for a Zero Outage philosophy. The platforms must, however, also always comply with the latest technological standards and have multiple backups. Experience has shown that although technical faults cause incidents in protected systems only in exceptional cases, they should definitely not be neglected.

POSSIBLE ERROR SOURCES
Despite all precautions, technical faults cannot be ruled out at platform level. The most common causes are:
• Redundant components mutually disrupting one another
• Defective firmware
• Outdated hardware
• Defective monitoring
• Excessive complexity as a result of different technologies and versions

Faults and outages mean extra work, delays and, in the worst case, downtime. That is why it is also important to learn from previous faults and prevent new ones. This requires as broad a picture as possible of all systems used by the customer: not only the systems operated by the IT service provider are taken into account, but also the customer and the systems it operates itself. A "critical landscape" should always be created, containing all existing components (systems, applications and interfaces) from the customer's end to the supplier's end.
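A critical landscape can be treated as a dependency graph. As a minimal sketch, assuming each component simply lists the components that depend on it directly (all component names below are hypothetical), a breadth-first traversal shows which systems, applications and interfaces between supplier and customer a single fault can reach:

```python
from collections import deque

# Hypothetical critical landscape: component -> components that depend on it directly.
DEPENDENTS: dict[str, list[str]] = {
    "supplier-storage-array": ["erp-database"],
    "erp-database": ["erp-application"],
    "erp-application": ["order-api"],          # interface exposed to the customer
    "order-api": ["customer-webshop"],
    "customer-webshop": [],                    # operated by the customer itself
}

def impacted_by(component: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every component directly or transitively affected if `component`
    fails, in breadth-first order (the failed component itself is excluded)."""
    seen = {component}
    affected: list[str] = []
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                affected.append(nxt)
                queue.append(nxt)
    return affected

print(impacted_by("erp-database", DEPENDENTS))
```

Here a fault in `erp-database` reaches `erp-application`, `order-api` and the customer's `customer-webshop`, which is why the landscape has to include the customer's own systems and not only those run by the service provider.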

Clearly defined standards form the basis for maximum availability and reliability. Standardization reduces complexity, which in turn is crucial for preventing faults or rectifying them quickly. Fewer spare parts and fewer experts with specialized knowledge are needed, and there are fewer unexpected reactions during changes.
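Availability percentages are easier to judge as downtime budgets. The sketch below converts an availability figure, such as the 99.999 percent that Zero Outage targets, into minutes of downtime per year, and estimates the availability of a service duplicated across two sites. It assumes a 365-day year and, importantly, that the redundant copies fail independently; real failures are often correlated (redundant components can disrupt one another, or a failover may not take effect), so the parallel figure is optimistic. The function names are illustrative.

```python
# Assumption: 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime for an availability given as a fraction (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def parallel_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` redundant instances, where the service is up as
    long as at least one instance is up. Assumes independent failures."""
    return 1.0 - (1.0 - single) ** copies

if __name__ == "__main__":
    print(f"99.999 %: {downtime_minutes_per_year(0.99999):.2f} min/year")
    print(f"99.9 %:   {downtime_minutes_per_year(0.999):.1f} min/year")
    both = parallel_availability(0.999, copies=2)
    print(f"two 99.9 % sites in parallel: {both:.6f}")
```

For 99.999 percent this gives about 5.26 minutes of downtime per year, against roughly 525.6 minutes (almost nine hours) for 99.9 percent. Two independent 99.9 percent sites in parallel would reach 99.9999 percent, but only under the independence assumption.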

TECHNICAL BASES
Zero Outage is based on duplicated data-center technologies. All data and systems must be available in two identical but physically separate data centers. If one data center has an outage, the other takes over. In the event of hardware defects, a second built-in power supply unit takes over. The same applies to storage, to guard against hard-drive defects. Further redundancy at a higher layer reduces the residual risk even more, for example a second, active, complete server in the second data center. This is the only way of being able to offer customers 99.999 percent availability.

CHANGE MANAGEMENT
Change management plays an important role, because defective changes are today the most common cause of faults. Companies have invested huge sums in tools and training courses. Yet, despite these efforts and countless books on the topic, most studies show that between 60 and 70 percent of all change projects at companies still fail. Changes are, however, becoming necessary ever more frequently and at shorter intervals, as rapid technical advancements at the customer's end constantly generate new requirements for speed and memory. Cloud, IoT, Big Data etc. all enable new business models, which in turn need new systems and platforms. The requirements for mobile usage have also increased dramatically.
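Because defective changes are the most common cause of faults, Zero Outage notes the expected result of every step in the change plan and follows each step with a dual-control check of whether that result was achieved. The following Python sketch models that check under stated assumptions: the classes `ChangeStep` and `ChangeRun`, the example steps and the rule that steps are checked strictly in plan order are hypothetical, and dual control is taken to mean that the reviewer must be a different person from the implementer.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeStep:
    description: str
    expected_result: str  # written into the change plan before the change starts

@dataclass
class ChangeRun:
    implementer: str
    steps: list[ChangeStep]
    # One entry per completed check: (step description, reviewer, result achieved?)
    checks: list[tuple[str, str, bool]] = field(default_factory=list)

    def check_step(self, index: int, reviewer: str, observed_result: str) -> bool:
        """Record the dual-control check that follows step `index`."""
        if reviewer == self.implementer:
            raise ValueError("dual control requires a second person as reviewer")
        if index != len(self.checks):
            raise ValueError("steps must be checked in the order of the change plan")
        step = self.steps[index]
        achieved = observed_result == step.expected_result
        self.checks.append((step.description, reviewer, achieved))
        return achieved

    def completed_as_planned(self) -> bool:
        """True only if every step was checked and met its expected result."""
        return len(self.checks) == len(self.steps) and all(ok for _, _, ok in self.checks)

plan = [
    ChangeStep("Update firmware on storage node A", "node A reports firmware 2.1"),
    ChangeStep("Move traffic back to node A", "node A serves traffic, no errors logged"),
]
run = ChangeRun(implementer="alice", steps=plan)
print(run.check_step(0, reviewer="bob", observed_result="node A reports firmware 2.1"))
```

A failed check only records the deviation; what happens next (halting, rolling back or escalating) would be defined in the change plan itself, which this sketch does not model.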


Changes in hardware are often extensive. The most common requirements are:
• Existing systems need to be serviced
• Old or defective hardware must be replaced
• Firmware must be updated, and security loopholes closed
• New systems must be integrated into the IT landscape

All these points must be planned and coordinated intensively, and the consequences of the changes must be risk-assessed. If a system that was functioning flawlessly yesterday stops working today, there are essentially only three possible causes: changes in use (increased access, including hacker attacks or viruses), physical defects or, by far the most common, something has been changed in the system or its configuration.

Zero Outage change management is about consistently minimizing risk when implementing changes and keeping their impact as small as possible. Every permanent change to a customer's IT landscape is assessed and checked against the same criteria. This requires a quality assurance system applied consistently across the company's entire organizational structure. Successful changes result in change models or templates, which are used to perform similar changes in future. If change models are highly standardized and can be applied globally, procedural optimizations can also quickly be made available to all teams around the world. As such, the Zero Outage approach is increasingly being incorporated into change plans. The expected result of every step in implementing a change is noted in the change plan in advance. When the change is performed, every step is followed by a dual-control check to confirm whether this result was achieved.

If an error does occur, a detailed examination is conducted. Common technical causes are hardware, software or configuration errors. This involves getting to the bottom of often complicated situations. As complex environments almost always also involve suppliers and their components, these suppliers are also brought in and asked to contribute to the analysis. In the case of technical defects, it is also important to ask why the redundancy, which is virtually always set up for potentially critical services, did not work, or why a failover scenario did not take effect.

SECURITY
Protecting against outages through redundancy and uniform standards is just one side of the coin. Securing these systems accordingly is also the order of the day. Companies of all sizes and industries face ever-increasing security requirements. Any identified security loopholes must be closed immediately, and unknown ones sought out as a preventive measure. Waiting for loopholes to become apparent is grossly negligent, because it gives hackers and copycats more opportunity to exploit them.

"Mobile working" is another main gateway for hackers and malware: an appropriate infrastructure for accessing the company's network externally should be established beforehand, enabling secure access to the company infrastructure over public networks. Similarly, there need to be clear rules as to which company applications can be accessed externally and which cannot. It is also mandatory to clarify with one's customers, in advance, which access rules exist for their systems and applications.
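Clear rules on which company applications may be reached from outside can be written down as an explicit rule set. The following is a minimal sketch, assuming a hypothetical rule table and application names; the default-deny behavior (applications without a rule are blocked) and the requirement that external requests arrive through the secure access infrastructure are design choices for illustration, not a prescribed implementation.

```python
# Hypothetical rule set: which company applications may be reached externally.
EXTERNAL_ACCESS_RULES: dict[str, bool] = {
    "email": True,
    "crm": True,
    "hr-payroll": False,
}

def external_access_allowed(app: str, via_secure_gateway: bool,
                            rules: dict[str, bool] = EXTERNAL_ACCESS_RULES) -> bool:
    """Decide whether an external request to `app` may pass."""
    # External access is only permitted through the secure access infrastructure.
    if not via_secure_gateway:
        return False
    # Default deny: applications without an explicit rule are not reachable externally.
    return rules.get(app, False)

print(external_access_allowed("crm", via_secure_gateway=True))
print(external_access_allowed("hr-payroll", via_secure_gateway=True))
```

Keeping the rules in one explicit table also makes it straightforward to agree them with customers in advance, since the same table can be reviewed for each customer's systems and applications.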


ZERO OUTAGE IN PRACTICE
Zero Outage by T-Systems is a comprehensive program for establishing maximum quality in ICT, which in turn is a prerequisite for entering the digital age and operating successfully in it.
In other words, no reliable ICT means no digital transformation.

Zero Outage covers a number of technical and procedural measures, all of which serve to ensure quality. They always go hand in hand with the human factor, because the staff at the service provider, and of course also at customers and suppliers, are ultimately the ones who embody quality and apply it on a daily basis.

Our staff are the backbone of our Zero Outage program, because they embody our strategy and are the best, most credible representatives of quality, both internally and externally. They champion quality with their expertise and precision, which enables us to keep ensuring high quality.

May 2019

Contact
T-Systems International GmbH
Global Delivery Excellence
Doris Reitter
Fasanenweg 5
70771 Leinfelden-Echterdingen, Germany

Publisher
T-Systems International GmbH
Hahnstraße 43d
60528 Frankfurt, Germany
http://www.t-systems.de