A Pattern Based Method for Simplifying a BPMN Process Model - MDPI

Page created by Curtis Tyler
 
CONTINUE READING
applied
           sciences
Article
A Pattern Based Method for Simplifying a BPMN
Process Model
Mateo Ramos-Merino * , Luis M. Álvarez-Sabucedo               , Juan M. Santos-Gago
and Francisco de Arriba-Pérez
 Telematic Engineering Department, Escola de Enxeñaría de Telecomunicación, Universidade de Vigo,
 Campus Lagoas-Marcosende, 36.310 Vigo, Galicia, Spain; lsabucedo@gist.uvigo.es (L.M.Á.-S.);
 jsgago@gist.uvigo.es (J.M.S.-G.); farriba@gist.uvigo.es (F.d.A.-P.)
 * Correspondence: mateo.ramos@gist.uvigo.es; Tel.: +34-986-814-073
                                                                                                    
 Received: 26 April 2019; Accepted: 4 June 2019; Published: 5 June 2019                             

 Featured Application: To work with a simplified version of a BPMN model that condenses the
 information of interest can be very interesting from a human point of view (the understanding
 of the model is facilitated). Moreover, an optimised version of a BPMN model can yield in more
 efficient results for mining process software and data analytic techniques.

 Abstract: BPMN (Business Process Model and Notation) is currently the preferred standard for
 the representation and analysis of business processes. The elaboration of these BPMN diagrams is
 usually carried out in an entirely manual manner. As a result of this human-driven process, it is
 not uncommon to find diagrams that are not in their most simplified version possible (regarding
 the number of elements). This work presents a fully automatic method to simplify a BPMN process
 model document. A two-phase iterative algorithm to achieve this simplification is described in
 detail. This algorithm follows a heuristic approach that makes intensive use of a Pattern Repository.
 This software element is concerned with the description of feasible reductions and its enactment.
 The critical concept lies in the discovery of small reducible patterns in the whole model and their
 substitution with optimised versions. This approach has been verified through a double validation
 testing in total 8102 cases taken from real world BPMN process models. Details for its implementation
 and usage by practitioners are provided in this paper along with a comparison with other existing
 techniques concerned with similar goals.

 Keywords: BPMN; simplification; process

1. Introduction
     Business Process Model and Notation (BPMN) is an Object Management Group (OMG) standard
for specifying business processes. The current version is 2.0.2 [1], and it allows the modelling of
business processes both from a graphical perspective and from a machine-readable perspective (using
an XML-based representation). It makes BPMN a suitable language for the description of processes
in a visual way and their subsequent analysis using data analytic or process mining techniques [2].
Presently, BPMN is a leading industry standard for specifying workflows. It is broadly used, for
example, in the domain of health-care protocols [3], software process tailoring [4], Data Quality
management [5] or Internet of Things proceedings [6], among many other cases.
     This paper describes an approach for the simplification of a BPMN process model aimed
to optimise the number of elements included in the workflow. During the simplification task,
the behaviour described initially must be preserved, i.e., the new simplified workflow model must be
coherent with the original one, offering the same flow logic.

Appl. Sci. 2019, 9, 2322; doi:10.3390/app9112322                             www.mdpi.com/journal/applsci
Appl. Sci. 2019, 9, 2322                                                                              2 of 28

      This simplification procedure aims to achieve two primary goals, namely:

•     Provide a cleaner and easier to understand graphic representation. Therefore, a better
      understanding with less effort of these workflows will be provided to human readers [7].
•     Reduce the complexity of the model for its processing by software agents. An optimised
      version of a BPMN model will yield more efficient results for process mining software and
      data analytic techniques.

     It must be borne in mind that an efficient BPMN simplification can be a complicated task. In real
scenarios, the number of possible paths in a workflow can grow very fast as the number of activities
increases. Furthermore, due to loops and other structures in the model, the number of different
combinations in the activities execution can cause an explosion in the number of states. This paper
shows an iterative procedure to achieve the simplification objective. In the proposed approach,
the deployment of a Pattern Repository will play a principal role in the simplification procedure.
This pool of BPMN-based patterns is gathered by the authors, and it contains the core of the proposal.
     The main goal of this contribution is to provide a mechanism to simplify, if possible, a given BPMN
Process type workflow (that describes a sequence or flow of activities with the objective of carrying
out work). This type of workflows are commonly created in a non-automatic manner, i.e., a human
operator creates them. First of all, this manual creation process is prone to introduce redundant or
repetitive paths that do not provide new information to the behaviour of the model. In this situation,
the proposed algorithm will identify the unnecessary elements and will generate a new simplified
BPMN diagram offering an equivalent description of the workflow. Secondly, the application of the
simplification algorithm is also intended for more complex issues. In particular, presently, it is common
the generation of BPMN diagrams involving hundreds of activities from different perspectives and
different levels of abstraction [8]. It provides a series of advantages [8] but also results in more complex
flowcharts and more analysis time demanding models.
     To illustrate that issue, the reader can consider Figure 1. It represents a process involving activities
from three different roles: Project Manager (PM), Operator (Op), and Administrative (Ad). Let us
suppose now that due to particular constraints, only a subset of these activities are relevant for concrete
analysis. In a specific context, it could be interesting to focus on a particular role (or a set of them).
Let’s assume that we are concerned just with management operations, i.e., those performed by the
Project Manager and Administrative roles. Therefore, it would be required to eliminate those activities
in which the other roles (i.e., the Operator) participate. In this case, taking into account the entire
model in Figure 1a is an unnecessary burden as only activities A, G and H are of interest. Therefore,
before launching the analysis (either visually by a human or by any automatic techniques), it will be
convenient to get a simplified model containing only the desired activities, like the model shown in
Figure 1b.
     In this example, not only the unneeded activities have been eliminated. The resulting model
includes fewer Exclusive Gateways and fewer paths (sequence flows). In short, a new version of
the entire workflow has been generated, but it was preserved its original behaviour (taking into
account only those activities of interest). Despite the simplicity of this example, the complexity of this
transformation is elevated for a real world BPMN model.
     As the reader may note, if an activity or set of them do not provide any actual information,
then removing it can be a real advantage. The simplification is clear in a more compact model:
the fewer elements involved, the easier it is to understand the model. Of course, the resulting model
can be tackled in a more efficient fashion in further analysis and operations. Analysis of the simplified
workflow (including only the needed behaviour) can produce even more transparent and concise
results due to the elimination of unnecessary elements. In this way, it is possible to have a general
model with hundreds of activities and generate different perspectives depending on the particular
interests. With this, we will have access to a complete family of processes.
Appl. Sci. 2019, 9, 2322                                                                             3 of 28

    This type of simplification attains greater significance in real world process models. In an
organisation with hundreds of models containing hundreds of activities, it is possible to find
superfluous or repetitive paths that do not provide new information to the behaviour of the model.

                            Task A                                              Task A
                             (PM)                                                (PM)

                   Task B
                    (Op)

                   Task C   Task D   Task E
                    (Op)     (Op)     (Op)

                                                                                          Task G
                                                                                           (PM)

                   Task F            Task G
                    (Op)              (PM)

                            Task H                                              Task H
                             (Ad)                                                (Ad)

                 (a) Example BPMN model                                    (b) Simplified model

                              Figure 1. Input and output of the simplification process.

     This work presents an automatic heuristic-based approach to perform this type of simplification
on any given BPMN model. Starting with a model (like in Figure 1a) and a list of the interesting
activities, the objective is to achieve a simpler model, as presented in Figure 1b, containing just the
relevant information for a specific goal. To get this result, several issues must be borne in mind as
discussed in following sections.
     The remainder of this paper is structured as follows. Next section highlights other approaches
available in the literature related to this work. Then, Section 3 discusses about the pattern repository,
a main piece of the proposed algorithm. Section 4 deeply describes the algorithm devised. Section 5
performs a discussion about the complexity of the algorithm. Afterwards, Section 6 presents the
algorithm validation using 8102 cases taken from real world examples. Section 7 verifies the usability
of the proposal in a real scenario. Finally, Section 8 points out the conclusions.

2. Related Work
      The work presented in this paper is related to initiatives and projects in the Business Process Model
Abstraction (BPMA) and Business Process Variability Modeling (BPMV) domains. Main similarities
and differences regarding some of the most significant proposals in these areas are reviewed.
      BPMA has been proposed in last years (see [8–11]) as a solution to a common problem appearing
in large companies worldwide. Traditionally, companies maintain process model repositories including
thousands of different workflows for each of the perspectives and goals to be tackled. As a consequence,
the models within their repositories tend to include complex interrelationships, and they may even
include overlaps or the same behaviour in different perspectives [8]. In this scenario, the primary
purpose of BPMA is to design detailed models containing the whole set of information about the
processes and to provide suitable abstraction mechanisms for generating the desired model view.
This way, different kinds of users can access different views and perspectives on processes with an
Appl. Sci. 2019, 9, 2322                                                                              4 of 28

adapted visualisation. The approach proposed in our work can help to achieve these objectives, as it is
intended to provide an algorithm to generate a simplification on a process model based on a set of
relevant activities.
      According to [8], and based on the cartographic generalisation by [12], BPMA deals with three
main generalisation components: “why”, “when” and “how”. “Why” is concerned with the reasons
and the goals of the abstraction. “When” includes the rules, thresholds and conditions to perform
the abstraction. “How” describes the transformation and techniques for achieving the abstraction.
This last one explains the method and the algorithms for performing the simplifications of the models.
The simplification algorithm proposed is framed on the “how” generalisation component of BPMA.
      The basic abstraction operations defined in BPMA are “elimination” (to delete elements in the
model to reduce the coverage level) and “aggregation” (to create groups of elements to increase the
granularity level). The former produces a model containing no information about the deleted elements
while the latter preserves some part of the information. In the present work, the simplification
algorithm is concerned with the elimination operation using a pattern matching method. In fact,
some BPMA approaches, such as [13–15], perform eliminations using a sibling concept including
Event-Driven Process Chains (EPCs) [16] fragments and transformation rules. There are also some
other works that explore the pattern reduction method, using a closer approach to this proposal.
In [17], five reduction rules that remove the undesired elements in the workflow model are presented.
The authors aim to identify structural conflicts in the process model (like deadlocks and lack of
synchronisation). They propose some correctness criteria to identify this structural conflicts by
exploring all possible instance sub-graphs. However, starting with a brute force method to explore
them may not be computationally effective in many contexts. To fight back this issue, they propose to
perform firstly a simplification of the model based on these five rules previously identified. In our
case, the objective is to delete from the model all the elements and structures that are correct but not
significant for purposes of analysis, and, then, to explore the resulting model. However, even the
simplification concept can be similar in some way to the previous works, the final goal is very different.
The major similarities are related to the “adjacent reduction” and “closed reduction” rules. These rules
can be applied in both works: [17] and the current one.
      In any case, these works tackle the issue exclusively from a theoretical point of view, not a practical
one. They do not provide a practical mechanism capable of carrying out and implementing these
operations automatically. Additionally, these approaches are not based on BPMN notation, but on other
languages and models (for example EPC). On the contrary, our proposal focuses, firstly, on providing
a practical mechanism that, using a scalable repository of patterns, implements a solution in a fully
automatic manner. Secondly, it is based on the BPMN language, the most widely used solution for the
description of workflows in productive environments.
      In [15] it is used some of the reduction rules previously identified in [17], but with a different
purpose. In this case, the objective is closer to BPMA. The authors of this work are concerned
with the “how” component and focus their efforts on the elimination operation. They identify
Single Entry Single Exit (SESE) blocks and substitute them with edges or paths. In this procedure,
the authors apply reduction rules for simplifying some elements in the process model. Besides of
these similarities, the simplification rules are not the most important part of that contribution. The core
contribution of these works is concerned mainly with the aggregation and elimination operations
on the so-called SESE blocks. Also, they are not focused on BPMN, as they are using what they
call “process schemes”, a process graph with atomic activities and control elements between them.
Moreover, our approach is not limited only to Single Entry Single Exit blocks. Multiple Entry Multiple
Exit blocks are also analysed.
      It is common in organisations to work with families of process variants. In this way,
the conventional modelling languages have not specific support to represent these families.
Over the past decade, several approaches have studied the Business Process Variability Modeling
(BPVM) field in order to tackle this issue by extending the conventional business process modelling
Appl. Sci. 2019, 9, 2322                                                                              5 of 28

languages [18]. Presents a complete survey in the BPVM domain. It analyses 66 relevant publication
from the year 2000 that cover 23 approaches to solving the problem. According to the variability
mechanism, the approaches can be sorted into four classes: node configuration, element annotation,
activity specialisation and fragment customisation. The approach proposed in the current paper
can be categorised in the fourth class: fragment customisation. This class includes the works that
propose the manipulation (insert, delete or modify) of entire fragments in order to create variation and
customisation of the base process model. This class includes two main groups of approaches: Process
variants by options (the most relevant works are [19–22]) and Template and Rules (the most relevant
works are [23,24]). In the first group, the main idea is to mark the process model with the called
adjustment points and to perform the change operations over these predefined points. This approach
supports four types of operations to change the process flow: delete, insert, modify and move.
For example, with the move operation, it is possible to relocate a fragment in the base model bounded
by two adjustment points to another part bounded by two distinct adjustment points. The second
group associates a set of business rules to a specific template model that will be used to perform the
change operations. The different process model perspectives allowed in the family can be inferred by
using the rules associated with the template model.
      Meanwhile the main concepts behind all the BPVM approaches are very similar to our work,
the philosophy and the application of the proposal are quite different. In particular, our approach
uses a combination of the insert and delete basic manipulations to implement a complex operation:
replace. In our case, a full Multiple Entry Multiple Exit fragment will be replaced by a simplified
version offering the same behaviour but using fewer elements. One of the keys of our proposal lies
in to obtain an efficient process model (i.e., with fewer elements). Moreover, our approach is generic
and can be applied without modification to any BPMN process model. The most relevant works in
the literature discussed in this section do not use this approach. They define rules and fragment for a
specific family of models used by a specific organisation.
      Other related works are those that deal with Business Process (BP) patterns [25]. The BP patterns
are examples of process modelling that show how to connect activities within the workflow to solve a
specific problem (i.e., synchronisation, parallel split, or exclusive choice, among others). To represent
these patterns, a fully graphical notation oriented to human users is usually used. The Workflow
Patterns Initiative has been working since 1999 in this field. The objective of this initiative is to provide
a conceptual basis for business process design on different modelling perspectives (control flow, data,
resource, and exception handling). Using the proposed patterns, the modelling stage of the business
process is facilitated and a concise and straightforward workflow that adequately represents the desired
behaviour is achieved. This goal has certain similarities with the one pursued by the present article.
The BP patterns intend to approach the problem from the beginning, the modelling phase, while our
proposal has another approach. A given model can change along its life cycle, so simplification in later
phases, after the initial modelling stage, can be required. The following sections show actual situations
(common in certain fields [26]) where it is required to remove a subset of the activities from a BPMN
model. Under this premise, the simplification of the BPMN in the middle of its life cycle turns out to be
of the utmost importance. In this context, the present proposal stands out, developing an alternative
point of view and automating the task of simplification.
      The gist of the above-mentioned works on BPMA and BP patterns has much in common with
the fields of refactoring [27] and anti-patterns [28] which in turn are closely related to our proposal.
Antipatterns share certain features with the simplifiable fragments of a process model and can be
found for different modelling languages (EPC, BPMN, PetriNets, YAWL, among others). The task of
refactoring resembles the action of searching and replacing these fragments with an optimised version.
For a long time, there have been many works devoted to refactoring on varied contexts (for example
centred on Unified Modeling Language (UML) diagrams [29], class diagrams [30] or use cases [31])
but it has been just in recent years when more proposals related to process models have appeared.
In general, each work has different objectives and different refactoring premises (quality in terms
Appl. Sci. 2019, 9, 2322                                                                            6 of 28

of the number of elements, ease of understanding, consistency with changes, etc.). Ref. [32] focus
on those process models that are shared by different users and analyse how to maintain a dynamic
workflow in which several people interact along their life cycle. These works focus on evaluating the
consistency of the patterns in this type of dynamic environments. There is a large number of works,
for example [33,34], that aim to improve some aspect of the quality of the models through refactoring
but do not present a fully automated mechanism. In these works, assistance to the user is particularly
relevant as they suggest sets of model refactorings. Ref. [35] presents an interesting approach in which
human interaction plays an important role in replacing fragments taking into account the semantics
of the model and the real behaviour intended to be represented. Therefore, it is possible to propose
substitutions that, even though not formally coherent, result in a final model more friendly and very
simplified. However, in order to achieve this objective, the automation of the process is also sacrificed.
      A set of works in the bibliography related to refactoring follow an approach that is known as
Model transformations by demonstration (for example [36,37]). The general idea is first to monitor
the substitution procedure performed by a human user. Thus, all atomic operations performed by the
user are captured by comparing the initial and final models. In this way, they can propose a set of
recommendations associated with the refactoring operations that are used to assist the simplification
procedure. During the last few years, many works related to the application of refactoring and
antipatterns techniques for the field of process models have emerged. However, and to the best of
our knowledge, neither of them focuses on the type of patterns identified in this work nor shares the
goal tackled by this work. This is due, in part, to the fact that the patterns considered by our work are
not very common in the early stages of the life cycle of process models. This situation changes when
analysing successive iterations in process remodelling. Notably, the patterns proposed here begin to
appear, and their substitution becomes especially important when the process model has to undergo
changes. In this scenario, it may be very useful to eliminate subsets of activities. These circumstances
occur more often than is usually thought at first sight since the remodelling and enhancement of a
process model is an essential task within the life cycle of BPMN models.
      Another point that is not very usual in the field of refactoring and that occurs in the present work
is based on using additional information about the process model itself to carry out the simplification.
This aspect can be seen in the examples proposed in Section 7, which takes into account the roles of the
people involved in the process to perform various types of simplifications. This approach is mentioned
as a trend and research opportunity in [38]. This document also considers as a relevant trend the
creation of fully automatic mechanisms for simplification. This feature is also a relevant aspect in the
present work.
      From a high level of abstraction, and contrary to other approaches already mentioned in the
literature review, the main objective in the current work is to achieve a simpler and more efficient
model to use in combination with additional analysis and process mining techniques to improve the
understanding of the model by a human user (similar goals are discussed as future challenges for
the Business Process Model domain in [39]). Furthermore, this work is performed over the preferred
standard for the representation and analysis of business processes, BPMN, making it of a broader
usage for the community.

3. Pattern Repository
     To meet the proposed objective, authors decided to build a repository containing ordered pairs of
patterns. Each of these pairs is made of the complex version of a pattern and its simplified version.
This repository will be used later by the proposed algorithm to carry out the simplification of any
given BPMN model. This section focuses on the identification of the minimal set of required patterns
and its representation.
Appl. Sci. 2019, 9, 2322                                                                              7 of 28

3.1. Repository Population
      In the frame of the present work, a pattern is understood as a sub-part of a larger BPMN process
model, i.e., a set of different elements (activities, gateways, connectors, etc.) with well-defined
relationships. A pattern is considered as reducible (or simplifiable) if another pattern that represents
the same behaviour than the first one but using fewer elements is possible. Figure 2a,b exemplify
this idea with two simple patterns. As shown, the original pattern contains more sequence flows
elements than the simplified one. However, replacing one block with the other one would not affect
the behaviour of the global model. In the rest of the figures (Figures 3–6), the reader can observe the
other pairs of patterns in the repository. For example, in Figure 5, a pattern including three different
Exclusive Gateways is presented. The reader can note that one of the Exclusive Gateways (the superior
one) is not needed. Thus, it can be removed without losing any relevant feature. Its inputs will be
the new inputs for the lower right gateway. Applying this simplification reduces the total number of
elements in the new model.
      Taking into account the pattern repository, it can be followed that the pattern pool will contain
two groups of patterns connected by a function. This function maps a set of reducible or simplifiable
patterns (such as Figure 2a) to a set of simplified patterns (such as Figure 2b). Formally speaking,
the function is surjective because there are different simplifiable patterns that can lead to the same
simplified pattern.
      To discover those pairs of patterns that constitute the repository, a systematic review of a broad
set of BPMN models was carried out. A series of steps have been performed to finally identify the
set of 5 simplifiable patterns considering the Exclusive Gateway elements of the BPMN notation.
This type of elements is the most common one to define the behaviour at a decision point in real
BPMNs. The process of discovery of the patterns has pursued two main objectives:

•     Identify a set of pattern pairs that would substantially reduce a BPMN process model.
•     Get the minimal set of them. Thus, greater efficiency is guaranteed when carrying out the search
      and replacement tasks performed by the simplification algorithm.

     The identification of patterns was conducted using a limited set of BPMN models intended to
discover simplifiable fragments visually. These BPMN models were generated from real models,
inserting on them several redundant paths to get potentially highly simplifiable models. Starting with
a BPMN model that refers to a real context, a random set of activities was eliminated (replacing
them with a connection between the previous element and the next one). This process is similar
to the procedure followed in stage 1 of the algorithm (see Section 4). By systematically repeating
this procedure on different real BPMN models and eliminating different combinations of activities,
a working set is obtained on which to identify the simplifiable patterns.

                     A                   A'                                 A                   A'

                     ...                 ...                                ...                 ...

                     N                   N'                                 N                   N'

                      (a) Original pattern                                  (b) Simplified pattern
                                               Figure 2. Patterns pair A.
Appl. Sci. 2019, 9, 2322                                                                                                                         8 of 28

                                                                                                                                  A'
              A                                                A''

                                                                                                       A                          ...
              ...                                                  ...

                                                                                                                                  N'
              N                                                N''                                     ...
                                                   A'
                                                                                                                                  A''

                                                   ...
                                                                                                       N                          ...

                                                   N'
                                                                                                                                  N''

                                  (a) Original pattern
                                                                                                             (b) Simplified pattern
                                                                    Figure 3. Patterns pair B.

                          A                              B                                             A                              B

                                  (a) Original pattern                                                       (b) Simplified pattern
                                                                   Figure 4. Patterns pair C.

                                A'                                                                               A'

                                ...                                                                              ...

                                N'                                                                               N'
                                                             A'''                                                                         A'''

                    A                                        ...                                 A                                        ...

                    ...                                      N'''                                ...                                      N'''
                                                   A''                                                                    A''

                    N                                                                            N
                                                   ...                                                                    ...

                                               N''                                                                        N''

                                  (a) Original pattern                                                       (b) Simplified pattern
                                                                   Figure 5. Patterns pair D.

                          A'                                                                                     A'

                          ...                                                                                    ...

                          N'                                                                                     N'                       A'''
                                                                   A'''

              A                                                     ...                          A                                        ...

              ...                                                  N'''                          ...                                      N'''
                                             A''                                                                          A''

              N                                                                                  N
                                             ...                                                                          ...

                                             N''                                                                          N''

                                  (a) Original pattern                                                       (b) Simplified pattern
                                                                    Figure 6. Patterns pair E.

      On a first step, the process of identification of patterns was carried out individually by each of the
authors. Each one of the BPMN models of this working set was analysed following a series of stages:
(i) identify a simplifiable fragment; (ii) carry out a repeated simplification of this fragment until all
the occurrences of it have been eliminated; (iii ) analyse the resulting model and check if there is still
redundancy; (iv) if so, identify a new type of simplifiable fragment and (v) repeat steps (ii), (iii) and
Appl. Sci. 2019, 9, 2322                                                                                      9 of 28

(iv) until a model that does not show redundancy is obtained. With this first step, each of the authors
obtained a set of simplifiable BPMN fragments.
      On a second stage, the pairs of patterns identified by each author was put together, and the
duplicated ones were discarded. Patterns that could be broken down into several smaller patterns
were also eliminated. This feature is especially significant to guarantee the minimum possible number
of patterns in the repository. Finally, a set of 5 pairs of patterns shown in Figures 2–6 were obtained.

3.2. Pattern Representation
     The patterns in the repository are software components. In particular, the repository patterns are
represented as Java classes implementing the SimplifiablePattern interface shown on the Lisitng 1.
This interface includes the methods necessary to search and replace a simplifiable pattern in the
original model with a simplified one:

•       searchPattern This method has to look for the reducible pattern in the model to simplify. If the
        pattern is found, it returns the point in the model in which the pattern has been identified.
•       reducePattern This method has to simplify a pattern in a model (found previously by the previous
        method) by replacing the reducible pattern with the simplified pattern associated (that contains
        less elements).

 1    public interface S i m p l i f i a b l e P a t t e r n {
 2      public BpmnId searchPattern ( Bpmn model ) ;
 3      public Bpmn reducePattern ( Bpmn model , BpmnId startElmnt ) ;
 4    }

                                         Listing 1: Interface SimplifiablePattern.

     Thus, the Java classes intended to contain the information about the pair of patterns will play
a central role as they will include not just the graph models but also the artifacts to support the
operations required in the simplification process (described in the next section).
     The population of the repository begins with the identification of the pair of patterns: the
simplifiable pattern and its simplified version (cf. Figures 2–6). From the analysis of both versions,
the code must be developed in order to transform the BPMN fragments information into a Java class
that contains the behaviour of the simplifiable pattern and the actions that must be performed on the
BPMN to apply the simplification. As an example, the patterns pair shown in Figure 2 is implemented
on the Listing 2.
 1   public class PA implements S i m p l i f i a b l e P a t t e r n {
 2   @Override
 3   public BpmnId searchPattern ( Bpmn model ) {
 4     ArrayList < BpmnExGateway > exGateways = model . g e t E x c l u s i v e G a t e w a y s () ;
 5     for ( BpmnExGateway gateway : exGateways )
 6     {
 7       ArrayList < BpmnId > o ut S eq u en ce s Id s = gateway . g e t O u t S e q u e n c e s I d s () ;
 8       for ( BpmnId seqId : o ut Se q ue nc e sI d s )
 9       { // check if some sequence returns to gateway
10         BpmnId nextElmnt = model . g et Se q ue n ce By I d ( seqId ) . target () ;
11         if ( nextElmnt . equals ( gateway . getId () ) )
12         { // Element found
13           return new BpmnId ( gateway . getId () ) ;
14         }
15       }
16     }
17     return null ;
18   }
19

20   @Override
21   public Bpmn reducePattern ( Bpmn model , BpmnId start ) {
Appl. Sci. 2019, 9, 2322                                                                                      10 of 28

22       BpmnId gId = startElement ;
23       BpmnExGateway gateway = model . ge tGatew ayById ( gId ) ;
24       ArrayList < BpmnId > o u tS eq u en ce s Id s = gateway . g e t O u t S e q u e n c e s I d s () ;
25       for ( BpmnId seqId : o ut Se q ue nc e sI ds )
26       { // search all sequence returning to gateway
27         BpmnId nextElmnt = model . g et Se q ue n ce By I d ( seqId ) . getTarget () ;
28         if ( nextElmnt . equals ( gateway . getId () ) )
29         { // Element found
30           model . del eteSeq uence ( seqId ) ;
31           gateway . de leteOu tgoing ( seqId ) ;
32           gateway . de leteIn coming ( seqId ) ;
33         }
34       }
35       return model ;
36   }
37   }

                               Listing 2: First pair of patterns implemented as a Java class.

      The first method of the interface, searchPattern, consists mainly in a loop along all the exclusive
gateways in the model to find out if one of them matches with Figure 2a. For each gateway, it looks for
a sequence that exits and returns to it (i.e., doing nothing). If a gateway with these conditions is found,
the method returns it. Conversely, the second method, reducePattern, receives as input parameter the
starting element of the pattern found by the previously described method (if the case). The objective
now is to convert the original pattern into its simplified version (cf. Figure 2a,b) inside the whole
BPMN process model. In this case, it looks for all the sequences that are not necessary and deletes them.
      The reader may note the use of the Java class Bpmn. This class is intended to parse an XML
file describing a BPMN compliant document into a set of data structures representing the BPMN
process model. Also, it offers a collection of high level operations to operate on the model facilitating
the implementation of the patterns as previously described (cf. Figure 7). It includes, for example,
methods to get the input and output sequences of an element or to delete and modify the connections
between the activities.

                                                                               Other BPMN elements

                                     Task        ExGateway       Sequence                 ...

                                                                Bpmn

                                      ...

                                      + parseXML(XML)
                                      + searchTaskById(BpmnId)
                                      + getTasks()
                                      + getExclusiveGatewayById(BpmnId)
                                      + getExclusiveGateways()
                                      + ...
                                      + getSequences()
                                      + getSequenceById(BpmnId)
                                      + getSequenceByTarget(BpmnId)
                                      + createSequenceBetweenElements(BpmnId, BpmnId)
                                      + deleteSequence(BpmnId)
                                      + ...
                                      + deleteTask(BpmnTask)
                                      + deleteExclusiveGateway(BpmnExGateway)
                                      + ...
                                      + size()

                                                      High-level methods over BPMN document

                                                      Figure 7. The BPMN class
Appl. Sci. 2019, 9, 2322                                                                                                  11 of 28

     To illustrate the implementation of the pattern repository it has been presented a straightforward
example involving a pair of patterns. The separation of the execution into the two methods can sound
to the reader unnecessary. However, real usage has brought into light that it is very useful for the sake
of code simplicity and scalability in the more sophisticated patterns. The complete repository of all
the patterns identified by the authors was implemented under this scheme. This repository is easily
scalable by adding new patterns as new Java classes.

4. Algorithm Description
     The model simplification algorithm is tackled in a two-stage approach. The first stage is concerned
with the generation of the new version of the model by removing the undesired/unneeded tasks.
Afterwards, using this new model as input, the optimised version of the model is obtained as the
result of an iterative procedure that uses the repository of patterns described previously. Eventually,
a redesigned and simpler version of the model, if possible, is obtained.
     In the following subsections a real world example will be presented to illustrate the phases of
the algorithm. An excerpt of a workflow that describes the process of opening a bank account is
considered. In this workflow (cf. Figure 8), activities can be classified into two groups:

•     Operations performed by the client using a computer terminal (C).
•     Operations involving interaction with the bank staff (BS).

                                 A: Get customer
                                 coordinates (C)

                   B: Retrieve                      C: Build
                   customer
                  information                      customer
                                                   profile (C)
                       (C)

                            D: Explain account types                               D: Explain account types
                               and select one (BS)                                    and select one (BS)

               E: Physical deposit            F: Deposit through      E: Physical deposit            F: Deposit through
                       (BS)                   bank transfer (BS)              (BS)                   bank transfer (BS)

                            G: Sign documents and                                  G: Sign documents and
                            open bank account (BS)                                 open bank account (BS)

                                      (a)                                                   (b)

      Figure 8. Opening a bank account. (a) All the operations; (b) Operations involving interaction with the
      bank staff.

4.1. First Stage: Deleting Undesired Activities
    On the first stage of the proposed schema, undesired activities selected by the user must be
removed. The resulting model must be still coherent. Therefore, the algorithm must delete the
appropriate sequence flows in the BPMN models too (i.e., sequence flows connecting the deleted
Appl. Sci. 2019, 9, 2322                                                                                                               12 of 28

activity). Also, new sequences connecting the surrounding elements of a deleted activity (previous
with next ones) must be created. In Figure 8 an example is shown of the input and the output of
this stage, in case of aiming at getting a simplified version of the workflow for operations requiring
interaction only with the bank staff, Figure 8a (representing the input) is considered as the starting
point. In this line, Figure 8b represents the output model where sequence flows replace the operations
without interaction with bank staff.
      As the reader can note, in this new model, not all elements in Figure 8b are required to represent
the desired behaviour. This issue will be tackled later on, on the second stage of the algorithm.
      A fragment of the Java implementation for the first stage is presented in the Listing 3. The first
stage function loops over the list of activities to be deleted and removes the activities defined by the
user. To delete one single activity it is mandatory to establish a direct connection between the previous
element and the upcoming one (another activity, a gateway, etc.). Upon the new connection is created, it
is possible to delete the old activity and the old sequences as they are no longer of any use in the model.
 1   public Bpmn firstStage ( Bpmn model , ArrayList < BpmnId > t a s k s T o D e l e t e I d s ) {
 2     for ( Id t askToD eleteI d : t a s k s T o D e l e t e I d s )
 3     { // Search each task to delete ...
 4       BpmnTask task2Del = model . search TaskBy Id ( ta skToDe leteId ) ;
 5

 6          // ... and delete it safetely
 7          model = this . deleteTask ( model , task2Del ) ;
 8       }
 9       return model ;
10   }
11

12   // Deletes a task in a safetely way
13   private Bpmn deleteTask ( Bpmn model , BpmnTask task2Del ) {
14

15       // Search sequences Ids
16       BpmnId inSeq = task2Del . ge t In Se q ue n ce Id () ;
17       BpmnId outSeq = task2Del . g e t O u t S e q u e n c e I d () ;
18

19       // Search elements before / after these sequences
20       BpmnId e l m n t B e f o r e T a s k I d = model . g et Se q ue n ce By I d ( inSeq ) . getSource () ;
21       BpmnId e l m n t A f t e r T a s k I d = model . ge t Se qu e nc e By Id ( outSeq ) . getTarget () ;
22

23       // New connection skipping the activity
24       model . c r e a t e S e q u e n c e B e t w e e n E l e m e n t s ( elmntBeforeTaskId , e l m n t A f t e r T a s k I d ) ;
25

26       // Delete activity and old sequences
27       model . deleteTask ( task2Del ) ;
28       model . del eteSeq uence ( inSequenceId ) ;
29       model . del eteSeq uence ( outSequenceId ) ;
30

31       return model ;
32   }

                                                     Listing 3: First Stage Java code.

      In this first stage, a simple method to remove the undesired activities is used: to delete one,
a direct connection between the previous element and the upcoming one (another activity, a gateway,
etc.) is established. Upon the new connection is created, it is possible to delete the old activity and the
old sequences as they are no longer of any use in the model.

4.2. Second Stage: Simplifying the Model
     In this stage, the goal is to simplify the outcome from the previous stage to obtain a simpler
version. As the reader can note from Figure 8b, the workflow produced in the first stage can include
unneeded BPMN elements and, therefore, it can be simplified to conduct more efficiently further
Appl. Sci. 2019, 9, 2322                                                                                             13 of 28

operations on it. This is because the first stage eliminates the activities creating connections between
previous and upcoming elements.
     Figure 9 shows the general architecture for the second stage. As the reader can see,
different elements must work together to conduct the simplification of the model. The pattern
repository uses a Bpmn class containing general utility functions to perform operations over the BPMN
model. The Patterns Engine, in the centre of the figure, can use the patterns included in the repository
to offer a high level API that is used to perform the simplification.
     An excerpt of the Java implementation for this engine and the API deployed is shown in the
Listing 4. The piece of code includes some essential methods that will be used for the second stage
algorithm. The engine maintains a list of the patterns available in the repository (check line 3). This list
is dynamically populated in the constructor method of the engine. It allows looping over all the
patterns simplifying each one in the model. The engine also offers methods to simplify of the next
pattern in the list (lines 20–31), to check if the list has been entirely explored (lines 33–38), and to
rewind the pattern iterator (lines 40–42).
 1   public class P attern sEngin e
 2   {
 3     ArrayList < SimplifiablePattern > patterns ;
 4     int current ;
 5

 6        // Engine constructor
 7        Patt ernsEn gine () {
 8           // Load dinamically all patterns in repository
 9           // Exception handling omited
10           Reflections r = new Reflections ( " repository . package " ) ;
11           Set < Class
Appl. Sci. 2019, 9, 2322                                                                             14 of 28

    In this stage, the algorithm, using the API offered by the PatternsEngine, looks for existing
sub-models in the input model that matches any of the existing ones in the repository. Upon each
matching, the corresponding transformation is conducted. This process keeps on going until no more
changes on the model are possible. The process of searching and reducing a pattern is performed until
no more reductions can be evaluated over the model.
    Finally, as an example, in Figure 10 the reader can see the outcome of the second stage on the
model from Figure 8.

                                      Figure 9. Architecture from second stage.

                                                       F: Deposit through
                                                       bank transfer (BS)

                           D: Explain account                                    G: Sign documents
                               types and                                              and open
                             select one (BS)                                     bank account (BS)

                                                       E: Physical deposit
                                                               (BS)

                                            Figure 10. Result after stage two.

5. Complexity
      In this section, the complexity of the algorithm is analysed from two different points of view:
(i) regarding the size of the input BPMN model and (ii) regarding the number of pattern pairs in
the repository.
      To carry out this analysis, the flow diagrams shown in Figure 11 will be used. These diagrams
contain the basic behaviour of the algorithm with the most relevant operations from the point of view
of time complexity. Firstly (see Figure 11a), the algorithm consists of a patterns loop that runs through
the set of pairs of patterns within the repository. As explained in the Section 4, this loop is executed
until no more patterns to simplify can be found. In this way, the number of executions of the loop
depends on the complexity of BPMN model under consideration.
      In each execution of the patterns loop, each of the pairs of patterns is probed (with the search
operation) for a possible replacement (see Figure 11a).
      The replacement of a pattern (Figure 11b) has a complexity of O(1) because its execution time does
not depend on the size of the BPMN model or the number of patterns in the repository. The replacement
operation involves simple modifications on the BPMN model, actually, just adjusting or deleting some
Appl. Sci. 2019, 9, 2322                                                                                                15 of 28

BPMN elements. The reader can see on the Listing 2 the Java code responsible for the replacement of
pattern A (lines 21–36).
      Regarding the search operation (Figure 11b,c), its complexity cannot be disregarded. The search
for a possible simplifiable pattern needs to carry out a loop that checks each element included in the
BPMN model. However, it is sufficient to go through the entire list of elements only once during a
search operation. In each iteration the focus is on one BPMN element and its immediately adjacent
elements (those connected directly with a sequence flow stereotype). The idea is to check if the
relationship between the considered element and its adjacent elements is identical to the one expressed
in the simplifiable pattern. These check operations have a complexity of O(1) (typically it will be
a comparison between pairs of values) and their execution time does not depend on the size of the
BPMN, nor the number of patterns in the repository. As the reader can note, the time consumption
in this model is due to the search loop (Figure 11c), which only depends on the size of the BPMN
model linearly.

                                                                                    Search on list
                                                 Patterns Pair X
                                   patterns
                 Patterns Pair A     loop
                                                                                 loop one time over
                                                                                   BPMN elements

                 Patterns Pair B                                                                              Search loop
                                                 Search on list
                                              (depends on the size                      more             no
                                              of the BPMN model)                     elements?

                 Patterns Pair C
                                                                                        yes

                                                                                   Check adjacent
                 Patterns Pair D
                                                                               elements (compare with
                                                Replace pattern                  simplificable pattern)
                                              (constant operations)

                 Patterns Pair E
                                                                          no         match ok?

                                                                                         yes
                 some pattern
       yes                         no
                  simplified?                          End                                End                    End
                                                                                 Simplifiable pattern     Simplifiable pattern
                                                                                        found                not found

                      (a)                              (b)                                       (c)

      Figure 11. Flow diagrams to analyse the complexity. (a) Algorithm flow diagram; (b) Patterns pair
      flow diagram; (c) Search simplifiable pattern.

       It is also worth noting that it is not uncommon for a BPMN model to show a tree-like design,
i.e., to include many branches on its representation. On first thought, this could pose an issue
for searching a particular fragment. However, to perform the search for the simplifiable pattern,
and due to the model representation that the BPMN notation allows, it is not necessary to dive for
the various bifurcations that a model can present. In fact, it is just required to check elements one by
one, regardless of its position on the model, to evaluate the adjacent elements in order to verify if a
simplifiable pattern can be spotted.
       Summing up, it can be stated that the search operation has a complexity O(n). The reader can
check an example of a search operation on the Listing 2, which shows the Java code responsible for the
search of pattern A (lines 3–18). The creation of the search loop (considering each element of Exclusive
Gateway type) is shown on line 5. The number of iterations of this loop depends directly on the size of
the BPMN model.
Appl. Sci. 2019, 9, 2322                                                                             16 of 28

5.1. Complexity Regarding the Size of the BPMN Model
      As the reader may anticipate, it is quite common to encounter with large, in terms of elements,
BPMN models. Therefore, it is convenient to get an idea beforehand of the scalability of the proposal
in terms of number of BPMN elements. This is the reason to estimate the complexity of the given
algorithm from the point of view of elements within the BPMN model under consideration.
      For the patterns loop (Figure 11a, the key is to interpret how the number of iterations varies.
This value is very dependent on the BPMN model under study but it is not easy to estimate beforehand
the number of possible simplification within the model, i.e., the number iterations. The worst case
would be to assume that for a given BPMN a simplification should be made for each element of the
model. Thus, the relationship between the number of simplifications and the size of the BPMN is
direct and linear. As a consequence, it can be set that the complexity can no longer be worse than O(n).
For example, the reader can imagine a BPMN model of size x containing 10 simplifiable patterns, 2 of
each of the types contained in the repository. This means that the patterns loop is executed 5 times.
If we now imagine a BPMN model of size 10x, it is reasonable to think that it could potentially present
10 times more simplifiable patterns. In this way, the patterns loop would be executed this time 50 times.
In this way, we will consider the execution of the patterns loop with a complexity of O(n) with respect
to the increase in size of the BPMN model.
      In the process of searching and, if the case, replacement of the pairs of patterns, the key lies in
the search operations. As mentioned, the replacement of a pattern has a complexity O(1). However,
the time spent searching for a pattern depends directly and linearly on the size of the BPMN model.
Therefore, this operation has a complexity O(n). Since the repository contains 5 pairs of patterns,
the search operation is performed in sequence 5 times. Therefore, according to the rule of the
concatenation of actions, the time complexity of the set of searches will be the maximum order
of the actions present in the set, that is, O(n).
      For the complete algorithm, taking into account the complexity of the loop and the complexity of
the search and applying the iteration rule, it follows that the total complexity of the algorithm is O(n2 ).
Therefore, the execution time grows quadratically with the size of the BPMN model.

5.2. Complexity Regarding the Size of the Repository
      Another factor that could affect the complexity of the algorithm is the number of simplifiable
patterns in the repository. As it could impact on the feasibility of the model, it was decided to study
the complexity of the algorithm considering this variable.
      When the number of patterns in the repository increases, the calls to the search and replacement
methods will increase for the same patterns loop execution. However, the time taken to execute the
individual search method of each of the patterns does not change (since this only depends on the size of
the BPMN model). Moreover, increasing the number of patterns does not have to increase the number
of iterations of the patterns loop, as it is just linked to the size of the BPMN model itself. Therefore,
the actual increase of the execution time by increasing the number of patterns is directly linked with the
new pairs of patterns, adding a search operation of O(n) for each pair added. The number of iterations
of the loop is not linked with the numbers of pairs of patterns in the repository. For example, the reader
can imagine an example in which the patterns loop is executed 5 times. If initially, the repository has
5 patterns, this leads in an execution of 25 search operations. Assuming now that the repository has
for example 50 patterns, and that the patterns loop is still running 5 times, this results in 250 search
operations. In this way, it follows that the complexity of the algorithm concerning the number of
patterns is O(n). Time grows linearly with the number of patterns in the repository.

6. Validation. Testing the Simplification
    The authors were prone to the experimental exploration of results to assess the validity of the
proposed solution. Therefore, it was decided to run it against a repository of BPMN models. After each
Appl. Sci. 2019, 9, 2322                                                                                    17 of 28

execution, a consistent model representing more simply (when possible) the behaviour of the process
must be generated. The behaviour described by the new models must be coherent with the original
ones, and new behaviours cannot be generated. The objective is to check the correct simplification of
each tested model.

6.1. Repository of BPMN Models and Inputs for the Algorithm
     The input for the algorithm is composed by a BPMN model and a list of interesting activities.
To generate the input sets our starting point is a repository including 63 different models taken from
real world examples and collected from different sources:

•     38 models have been translated from the Event-Driven Process Chains (EPC) model collection
      provided by Laue et al. that was used for the study [40].
•     10 models have been translated to BPMN from different Petri Nets and others 15 models have
      been discovered from trace logs. The original Petri Nets and the trace logs have been taken
      from [41] (available online in [42]).

     For each model on the original repository, a set of about 60–150 different combination of interesting
activities were created. In this way, 8102 different inputs for the second stage of the algorithm
were produced.
     Executing the algorithm against this combination of up to 8102 different inputs parameters will
produce 8102 simplified models in which the reducible patterns identified should have been simplified.
To check the correct execution of the algorithm, the simplifications were checked according to the
corresponding to the inputs. Two main aspects were analysed in each case:

•     If at least one pattern has been identified and simplified, the number of elements in the simplified
      model must be less than the number of elements in the input model (regarding gateways
      and sequences).
•     No wrong behaviour is introduced in the simplified model. The preserved activities must keep
      the causal dependency between them. If, for example, activity A is followed by activity B, but A
      never follows B, then it is assumed that there is a causal dependency between A and B (that must
      be preserved in the simplified model).

6.2. How to Check the Causal Dependency?
     To assure this feature the footprint matrix and fitness concepts from process mining techniques
were applied. The footprint matrix represents the causal dependency between the activities in a log or
in a model (more about this concept can be learned in [43,44]). This concept is used in some process
discovery algorithms, like the α-algorithm [45], helping it to scan the event log for particular patterns.
In this case, the footprint matrix will be calculated for a process model, containing the relation between
each pair of activities in the workflow. Let M be a process model and a, b two activities that belong to
the process model M. It is possible to establish the following relationships:

•     a > M b if only if there is a possible path p = ht1 , t2 , ..., tn i and i ∈ 1, ..., n − 1 such that p ∈ M and
      ti = a and ti+1 = b. This relation is also known as “direct follows” relation.
•     a → M b if only if a > M b and b M a
•     a# M b if only if a M b and b M a
•     ak M b if only if a > M b and b > M a

     The footprint matrix is commonly used in basic conformance checking techniques providing a
log-to-model comparison. However, this same approach can be used in log-to-log and model-to-model
comparisons [46]. In our case, we will use the model-to-model comparison approach to quantify their
similarity. The correct way is to compare the unsimplified model’s footprint matrix (end of stage
one) with the simplified model’s footprint matrix (end of stage two). These two footprints matrix are
Appl. Sci. 2019, 9, 2322                                                                           18 of 28

calculated and compared using the fitness measure for each one of the 8102 inputs. The objective is to
check that no wrong behaviour is introduced in the simplified model. The measurement that will be
used for the comparison is the fitness. It represents the similarity between two footprints (belonging
to two different process models M1 and M2 ) using a 0.0 to 1.0 scale. In our case, a value of 1.0 in all
comparisons will be the goal. In Equation (1) the fitness measurement for comparing two footprints
( f M1 and f M2 ) is defined [46].

                                            different cells between f M1 and f M2
                           f itness = 1 −                                                              (1)
                                                  total cells in f M1 and f M2
     For clarity, it is possible to visualise this procedure with an example. Take the process model from
the end of stage one represented in Figure 8 as model M1 . It must be compared with the process model
from the end of stage two represented in Figure 10 (M2 ). The reader can check the footprint matrix for
both models. Table 1 contains the footprint for model M1 and Table 2 for model M2 .

                                  Table 1. Footprint matrix for model M1 .

                                             d      e       f       g

                                     d      # M1   → M1   → M1     # M1
                                     e      ← M1   # M1    # M1   → M1
                                     f      ← M1   # M1    # M1   → M1
                                     g      # M1   ← M1   ← M1     # M1

                                  Table 2. Footprint matrix for model M2 .

                                             d      e       f       g

                                     d      # M2   → M2   → M2     # M2
                                     e      ← M2   # M2    # M2   → M2
                                     f      ← M2   # M2    # M2   → M2
                                     g      # M2   ← M2   ← M2     # M2

      In this example, both matrices contain the same values, and as a result, the fitness value is 1.0.

6.3. Results
     The procedure described is repeated for the 8102 cases in order to carry out the validation.
Upon the review of the result, the value of 1.0 in the fitness measurement, the maximum value possible,
has been obtained for each input set.
     By analysing the simplified models a table with the most significant measures is elaborated.
An elements contage is essential to see the effect of the simplification process. The reader can see a
fragment of the results on Table 3.
Appl. Sci. 2019, 9, 2322                                                                                                                                                   19 of 28

                                                         Table 3. Results table excerpt.

                                 Original Model                                                                     Simplified Model
         ID             Tasks        Gats   sFlows       Total   Tasks              Gats                           sFlows      Total     A     B     C    D      E    Fitness
     ...                  .            .        .          .          .                            .                  .          .       .     .     .     .     .          .
  pm15_073                6           14       28         48          6                            6                 17         29       0     3     3     0     2         1.0
  pm15_074                6           14       28         48          6                            4                 14         24       0     4     3     0     3         1.0
  pm15_075                6           14       28         48          6                            5                 16         27       0     4     3     0     2         1.0
  pm15_076                6           14       28         48          6                            5                 16         27       0     4     3     0     2         1.0
  pm15_077                6           14       28         48          6                            5                 16         27       0     4     2     0     3         1.0
     ...                  .            .        .          .          .                            .                  .          .       .     .     .     .     .          .

     Each row in the table corresponds with one of the 8102 cases tested. The first column contains
the model ID. Columns 2, 3 and 4 contain a contage element for the original model representing the
number of gateways, activities and sequence flows. Column 5 represents the total of elements in the
original model. Columns 6, 7 and 8 contain a contage element for the simplified model representing
the number of gateways, activities and sequence flows. Column 9 contain the total of elements in
the simplified model. Columns 10 to 14 represent the number of patterns of each type that has been
identified and simplified in each model. Finally, column 15, represents the fitness measure (comparing
both footprints of the original and the simplified one).
     Throughout the 8102 cases considered, models needing a deeper simplification than others
were found. Through the boxplot of Figure 12a, the reader can see the effect and the number of
simplifications that have been made on the models of the dataset. In this chart, it is shown the effect of
the second stage on the algorithm. As the reader can note, there is a clear trend towards obtaining
models with fewer elements after the patterns simplification phase. The median with respect to the
number of elements in a model has decreased from 33 (before the simplification process) to 23.

                                                                                                           14000
                                                                                                                                13,677
                                                                                                           12000
          After
                                                                               Number of simplifications

 simplification
                                                                                                           10000

                                                                                                            8000
                                                                                                                                                                     7,997
                                                                                                                                             7,546
                                                                                                            6000

        Before                                                                                              4000
 simplification
                                                                                                                     3,608
                                                                                                            2000
                                                                                                                                                         2,341
                  10     15     20     25    30     35    40     45       50                                   0
                                                                                                                       A             B         C           D           E
                                Number of BPMN elements                                                                                  Patterns pair
                       (a) BPMN elements simplified                                                                        (b) Patterns pairs simplified
                                            Figure 12. Simplification results from the 8102 cases.

     It is also interesting to analyse the times that each one of the patterns present in the repository has
been found and simplified. In total, for the 8102 cases tested, 35,169 simplifications were conducted.
On Figure 12b, the reader can verify the distribution of these simplifications according to each pattern.
The repetition of the pattern B is evident, doubling the occurrences to the next most popular pattern,
the E.
     The complete table with the raw information for the 8102 tested cases is available for the reader in
an online data repository [47] at the Open Science Framework platform. This repository contains also
the 62 original BPMN models, the 8102 BPMN models obtained after stage one and the 8012 BPMN
simplified models.
You can also read