Self-Healing and Resilience in Future 5G Cognitive Autonomous Networks
J. Ali-Tolppa, S. Kocsis, B. Schultz, L. Bodrog, M. Kajo
Nokia Bell Labs
janne.ali-tolppa@nokia-bell-labs.com
26-28 November
Santa Fe, Argentina
Robustness
• “Capability of performing without failure under a wide range of conditions”
Merriam-Webster Dictionary
Resilience
• “An ability to recover from or adjust easily to misfortune or change”
Merriam-Webster Dictionary
Why is resiliency important in 5G?
• 5G is by nature dynamic and complex
→ Unforeseen circumstances are bound to happen
• Use cases requiring ultra-high reliability (URLLC)
Robustness (redundancy etc.) alone is no longer
enough!
How to design for resilience?
• Monitor and adapt (Focus)
• Decoupling, modularity (Common core principles)
Self-Healing in Radio Access Networks
Detecting Anomalies without Labelled Training Data
Which are anomalous?
[Figure: four example sets, Example 1 to Example 4, annotated “Meaningless question”, “Red” and “Green”]
Anomaly Detection
Feature selection
Relevant feature:
Color Shape Color and shape
Radio Access Network Anomaly Detection
Feature and context selection
• Input features typically include Performance Management (PM)
Key Performance Indicators (KPIs) and Fault Management (FM)
alarms, but other (additional) inputs can be used as well, e.g. log
analysis
• Is the whole input space profiled, including cross-correlations, or
only selected projections of it (single KPIs, selected KPI pairs etc.)?
• The context needs to be decided, e.g. will the profiling be done per
network function or per group of network functions, with hourly or diurnal
profiles for network-traffic-dependent KPIs, etc.
• In our work we used PM data only and created diurnal profiles for
traffic-dependent KPIs and cross-correlations for selected KPI
pairs.
Radio Access Network Anomaly Detection
Simple time-context dependent profiling of a timeseries
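A minimal sketch of time-context dependent profiling follows. It assumes an hour-of-day context and a mean/standard-deviation profile; the slides do not state which statistic was used, and the function names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_hourly_profile(samples):
    """samples: iterable of (hour_of_day, kpi_value). Returns {hour: (mean, std)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}

def anomaly_level(profile, hour, value, min_std=1e-9):
    """Distance from the profile mean of that hour, in standard deviations."""
    mu, sigma = profile[hour]
    return abs(value - mu) / max(sigma, min_std)

# Hypothetical history: three days of a traffic-dependent KPI at 03:00 and 18:00.
history = [(3, 10.0), (3, 12.0), (3, 11.0), (18, 95.0), (18, 105.0), (18, 100.0)]
profile = build_hourly_profile(history)

# The same KPI value is normal in the evening but anomalous at night.
evening_level = anomaly_level(profile, 18, 100.0)
night_level = anomaly_level(profile, 3, 100.0)
```

The point of the context is visible in the last two lines: a single global profile would rate both observations the same, while the per-hour profile separates them.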
Radio Access Network Anomaly Detection
Cross-correlation profiling with clustering
• First, a clustering algorithm is applied, which omits the most probable outliers to clean up the data
• Correlation is modelled only inside the clusters
• Can also model non-normal multivariate distributions
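The idea can be sketched as follows for one KPI pair. This is an illustration under assumptions, not the authors' implementation: the slides do not name the clustering algorithm for this step, so a small DBSCAN is used here because it marks outliers as noise, and each cluster is modelled by its mean and covariance with the Mahalanobis distance as anomaly level. All names and data values are hypothetical.

```python
from math import dist, sqrt

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, -1 marks noise (outliers)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        near = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(near) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(near)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point that was first seen as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            near_j = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(near_j) >= min_pts:  # only core points expand the cluster
                queue.extend(near_j)
    return labels

def fit_cluster_models(points, labels):
    """Mean and 2x2 covariance per cluster; noise points are left out."""
    models = {}
    for c in set(labels) - {-1}:
        members = [p for p, lab in zip(points, labels) if lab == c]
        n = len(members)
        mx = sum(p[0] for p in members) / n
        my = sum(p[1] for p in members) / n
        sxx = sum((p[0] - mx) ** 2 for p in members) / n
        syy = sum((p[1] - my) ** 2 for p in members) / n
        sxy = sum((p[0] - mx) * (p[1] - my) for p in members) / n
        models[c] = (mx, my, sxx, syy, sxy)
    return models

def anomaly_level(models, point, ridge=1e-6):
    """Mahalanobis distance to the closest cluster model."""
    best = float("inf")
    for mx, my, sxx, syy, sxy in models.values():
        sxx, syy = sxx + ridge, syy + ridge  # keeps the covariance invertible
        det = sxx * syy - sxy * sxy
        dx, dy = point[0] - mx, point[1] - my
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        best = min(best, sqrt(max(d2, 0.0)))
    return best

# Hypothetical KPI pair with two operating regimes and one outlier.
low = [(1.0, 2.1), (1.2, 2.3), (0.9, 1.8), (1.1, 2.2), (1.3, 2.7), (1.0, 1.9)]
high = [(8.0, 4.1), (8.3, 4.0), (7.9, 3.9), (8.1, 4.2), (8.2, 3.8), (8.4, 4.1)]
points = low + high + [(4.5, 9.0)]

labels = dbscan(points, eps=1.0, min_pts=3)
models = fit_cluster_models(points, labels)
inlier_level = anomaly_level(models, (1.1, 2.2))
outlier_level = anomaly_level(models, (4.5, 9.0))
```

Because each regime gets its own model, the joint distribution does not need to be a single Gaussian, which is one way to read the "non-normal multivariate distributions" bullet.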
Diagnosis
Anomaly event detection and diagnosis
[Figure: KPI1, KPI2 and KPI3 plotted over time with an anomalous timeframe marked; the anomaly pattern consists of the average value of each KPI within that timeframe]
• Anomalous timeframes are detected by applying the DBSCAN algorithm to the
anomaly levels of selected features against their profiles
• By aggregating the selected feature (KPI) values in the anomaly event
timeframe, the event is represented as an anomaly pattern
– The diagnosis feature set can be, and often is, different from the one used in
detection!
• The root causes of the detected anomaly patterns are diagnosed against a
diagnosis knowledge base
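The three steps above can be sketched as follows. Everything beyond the use of DBSCAN is an assumption made for illustration: the threshold on the anomaly level, the clustering of anomalous time indices, the averaging used as aggregation, the nearest-neighbour lookup, and all names and values (including the `cause_a` / `cause_b` knowledge base entries) are hypothetical.

```python
from math import dist
from statistics import fmean

def dbscan_1d(times, eps, min_pts):
    """Minimal DBSCAN on distinct scalar time indices; -1 marks noise."""
    labels = {}
    cluster = -1

    def region(t):
        return [u for u in times if abs(u - t) <= eps]

    for t in times:
        if t in labels:
            continue
        near = region(t)
        if len(near) < min_pts:
            labels[t] = -1
            continue
        cluster += 1
        labels[t] = cluster
        queue = list(near)
        while queue:
            u = queue.pop()
            if labels.get(u) == -1:
                labels[u] = cluster  # border point that was first seen as noise
            if u in labels:
                continue
            labels[u] = cluster
            near_u = region(u)
            if len(near_u) >= min_pts:
                queue.extend(near_u)
    return labels

def anomaly_pattern(kpis, frame):
    """Aggregate each diagnosis KPI over the timeframe (here: the average)."""
    return {name: fmean([series[t] for t in frame]) for name, series in kpis.items()}

def diagnose(pattern, knowledge_base):
    """Return the knowledge base entry closest to the pattern (Euclidean)."""
    names = sorted(pattern)
    vec = [pattern[n] for n in names]
    return min(knowledge_base,
               key=lambda cause: dist(vec, [knowledge_base[cause][n] for n in names]))

# Hypothetical per-time-step anomaly levels and diagnosis KPIs.
THRESHOLD = 3.0
anomaly_levels = [0.2, 0.5, 0.1, 4.0, 5.2, 4.8, 0.3, 0.4, 3.5, 0.2, 0.1, 0.3]
kpis = {
    "kpi1": [50.0, 52.0, 49.0, 20.0, 18.0, 22.0, 51.0, 50.0, 48.0, 50.0, 49.0, 51.0],
    "kpi2": [5.0, 5.0, 6.0, 30.0, 34.0, 32.0, 5.0, 6.0, 5.0, 5.0, 6.0, 5.0],
}

anomalous_times = [t for t, level in enumerate(anomaly_levels) if level > THRESHOLD]
labels = dbscan_1d(anomalous_times, eps=1, min_pts=3)

frames = {}
for t, c in sorted(labels.items()):
    if c != -1:
        frames.setdefault(c, []).append(t)
timeframes = list(frames.values())

knowledge_base = {
    "cause_a": {"kpi1": 20.0, "kpi2": 30.0},
    "cause_b": {"kpi1": 50.0, "kpi2": 40.0},
}
pattern = anomaly_pattern(kpis, timeframes[0])
diagnosis = diagnose(pattern, knowledge_base)
```

In this toy run the isolated spike at time step 8 is treated as noise, so only the sustained deviation forms an anomaly event.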
Diagnosis
Active learning assisted diagnosis
[Figure: loop between operator and machine, labelled “structured view of the data”, “rethink”, “restructure” and “interpretation of the data”]
a) A human operator provides the machine with their own interpretation of the data
– By attaching labels to anomaly points or clusters while considering information from step b)
b) The machine provides the operator with a structured view of the data
– By clustering the data points while taking into account information from step a)
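One round of this loop can be sketched as below. The slides do not describe the algorithm, so this sketch swaps in a simple stand-in: nearest-centroid assignment around the operator's labels for step b), and margin-based uncertainty sampling to choose what the operator is asked to label next in step a). The function names, labels and points are hypothetical.

```python
from math import dist

def centroids(points, labels):
    """Mean position of the operator-labelled points of each class."""
    groups = {}
    for idx, label in labels.items():
        groups.setdefault(label, []).append(points[idx])
    return {label: tuple(sum(c) / len(ps) for c in zip(*ps))
            for label, ps in groups.items()}

def structured_view(points, labels):
    """Step b): assign each point to the nearest labelled centroid.

    Returns (assigned_label, margin) per point, where margin is the gap
    between the best and second-best centroid distance.
    """
    cents = centroids(points, labels)
    view = []
    for p in points:
        ranked = sorted((dist(p, c), label) for label, c in cents.items())
        margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
        view.append((ranked[0][1], margin))
    return view

def next_query(view, labels):
    """Pick the unlabelled point with the smallest margin for the operator."""
    candidates = [i for i in range(len(view)) if i not in labels]
    return min(candidates, key=lambda i: view[i][1])

# Hypothetical anomaly patterns (two features) and two initial operator labels.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (2.6, 2.5), (0.1, 0.3)]
labels = {0: "cause_a", 2: "cause_b"}

view = structured_view(points, labels)      # machine -> operator
query = next_query(view, labels)            # most ambiguous point
labels[query] = "cause_a"                   # simulated operator answer (step a)
new_view = structured_view(points, labels)  # machine restructures the data
```

The ambiguous point is first grouped with `cause_b`; after the operator's answer the view is restructured around the new label.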
Holistic Self-Healing
Across domains and management areas in mobile networks
“In a complex system, improving the resilience of only one part or level of organization can sometimes
(unintentionally) introduce fragility in another. To improve the resilience, it is often necessary to work
in more than one domain and scale at a time.”
A. Zolli, A. M. Healy, “Resilience – Why Things Bounce Back”
Coordination is required between the self-healing actions of, for example:
• Network Management (NM): Management automation aggregated on a (Virtual) Network Function ((V)NF) level
• Quality of Experience (QoE) driven management: Optimizing the end-to-end customer experience at the
application and individual subscriber level
• VNF and Service Orchestration
Knowledge Cloud
Transferring diagnosis knowledge
• Collecting the diagnosis knowledge base is a significant effort
• It would be desirable to be able to diagnose previously unforeseen problems
• This could be mitigated by sharing diagnosis knowledge between self-healing function deployments
• However, translating, i.e. generalizing and re-applying, diagnosis knowledge from other
deployments is a difficult problem
– Transfer learning methods
Demonstration in SON Experimental System
Evaluation Results
• Fault injection in a testbed
– Radio attenuation
– Backhaul misconfiguration
• Background traffic increased until the optimization functions can no longer
remedy the problem
• Our solution detected and diagnosed both conditions before they led to a
service degradation
Conclusion
• In 5G, networks are becoming ever more complex and dynamic
• At the same time, new use cases are requiring increased reliability
• We need intelligent resilient networks that can react to unforeseen
problems and adapt to changes in their context
• A step in this direction is the self-healing method presented in this
paper, based on anomaly detection and diagnosis
• We need methods to share knowledge and coordinate the
self-healing actions across management domains, areas and
deployments
– Standardized interfaces not only for sharing data, but also for sharing
knowledge and machine learning models
Thank you