# Meta-Model Structure Selection: Building Polynomial NARX Model for Regression and Classification


Wilson Rocha Lacerda Junior · Samir Angelo Milani Martins · Erivelton Geraldo Nepomuceno

arXiv:2109.09917v1 [cs.LG] 21 Sep 2021

**Abstract** This work presents a new meta-heuristic approach to select the structure of polynomial NARX models for regression and classification problems. The method takes into account the complexity of the model and the contribution of each term to build parsimonious models by proposing a new cost function formulation. The robustness of the new algorithm is tested on several simulated and experimental systems with different nonlinear characteristics. The obtained results show that the proposed algorithm is capable of identifying the correct model, for cases where the proper model structure is known, and of determining parsimonious models for experimental data even for those systems for which traditional and contemporary methods habitually fail. The new algorithm is validated against classical methods such as the FROLS and recent randomized approaches.

**Keywords** System Identification · Regression and Classification · NARX Model · Meta-heuristic · Model Structure Selection

Wilson Rocha Lacerda Junior
Control and Modelling Group (GCOM), Department of Electrical Engineering, Federal University of São João del-Rei, Minas Gerais, Brazil
E-mail: wilsonrljr@outlook.com

Samir Angelo Milani Martins
Control and Modelling Group (GCOM), Department of Electrical Engineering, Federal University of São João del-Rei, Minas Gerais, Brazil
E-mail: martins@ufsj.edu.br

Erivelton Geraldo Nepomuceno
Control and Modelling Group (GCOM), Department of Electrical Engineering, Federal University of São João del-Rei, Minas Gerais, Brazil
E-mail: nepomuceno@ufsj.edu.br

## 1 Introduction

System identification is a method of identifying the dynamic model of a system from measurements of the system inputs and outputs [1]. In particular, nonlinear system identification has received much attention from researchers from the 1950s onward and many relevant results were developed [2,3,4,5]. In this context, one frequently employed model representation is the NARMAX (Non-linear Autoregressive Models with Moving Average and Exogenous Input), which was introduced in 1981 aiming at representing a broad class of nonlinear systems [6,7,8].

There are many NARMAX model set representations such as polynomial, generalized additive, and neural networks. Among these types of the extended model set, the power-form polynomial is the most commonly used NARMAX representation [1]. Fitting polynomial NARMAX models is a simple task if the terms in the model are known a priori, which is not the case in real-world problems. Selecting the model terms, however, is fundamental if the goal of the identification is to obtain models that can reproduce the dynamics of the original system. Problems related to overparameterization and numerical ill-conditioning are typical because of the limitations of the identification algorithms in selecting the appropriate terms that should compose the final model [9,10].

In that respect, one of the most traditional algorithms for structure selection of polynomial NARMAX models was developed by [11] based on the Orthogonal Least Squares (OLS) and the Error Reduction Ratio (ERR), called Forward Regression Orthogonal Least Squares (FROLS). Numerous variants of the FROLS algorithm have been developed to improve the model selection performance, such as [12,13,14,15]. The drawbacks of the
FROLS have been extensively reviewed in the literature, e.g., in [16,17,18]. Most of these weak points are related to i) the Prediction Error Minimization (PEM) framework; ii) the inadequacy of the ERR index in measuring the absolute importance of regressors; iii) the use of information criteria such as the Akaike Information Criterion (AIC) [19], the Final Prediction Error (FPE) [20] and the Bayesian Information Criterion (BIC) [21] to select the model order. Regarding the information criteria, although these techniques work well for linear models, in a nonlinear context no simple relation between model size and accuracy can be established [18,22].

As a consequence of the limitations of OLS-based algorithms, some recent research endeavors have significantly strayed from the classical FROLS scheme by reformulating the Model Structure Selection (MSS) process in a probabilistic framework and using random sampling methods [18,23,24,25,26]. Nevertheless, these techniques based on meta-heuristics and probabilistic frameworks present some flaws. The meta-heuristic approaches rely on AIC, FPE, BIC and other information criteria to formulate the cost function of the optimization problem, generally resulting in over-parameterized models.

Last but not least, due to the importance of classification techniques for decision-making tasks in engineering, business, health science, and many other fields, it is surprising how only a few researchers have addressed this problem using classical regression techniques. The authors in [27] presented a novel algorithm that combines logistic regression with the NARX methodology to deal with systems with a dichotomous response variable. The results in that work, although very interesting, are based on the FROLS algorithm and, therefore, inherit most of the drawbacks of the traditional technique, opening new paths for research.

This work proposes a technique for the identification of nonlinear systems using meta-heuristics that fills the mentioned gaps concerning the structure selection of NARMAX models for regression and classification. The method uses an alternative to the cited information criteria as the index indicating the accuracy of the model as a function of the size of the model. Finally, the proposed algorithm is adapted to deal with classification problems to represent systems with binary outputs that depend on continuous time predictors.

The remainder of this work is organized as follows: Section 2 provides the basic framework and notation for nonlinear system identification of NARX models. Section 3 presents the necessary tools to formulate the cost function of the identification strategy. This section also introduces the new algorithm and reports the results obtained on several systems taken from the literature and physical systems. Section 4 adapts the technique to develop NARX models considering systems with binary responses that depend on continuous predictors. Section 5 recaps the primary considerations of this study and proposes possible future works.

## 2 Background

### 2.1 Polynomial NARX model

Polynomial Multiple-Input Multiple-Output (MIMO) NARX is a mathematical model based on difference equations that relates the current output to past inputs and outputs, mathematically described as [12,7]:

$$
\begin{aligned}
y_{i,k} = F_i^{\ell}\big[\, & y_{1,k-1}, \ldots, y_{1,k-n^i_{y_1}}, \ldots, y_{s,k-1}, \ldots, y_{s,k-n^i_{y_s}}, \\
& x_{1,k-d}, x_{1,k-d-1}, \ldots, x_{1,k-d-n^i_{x_1}}, \ldots, \\
& x_{r,k-d}, x_{r,k-d-1}, \ldots, x_{r,k-d-n^i_{x_r}} \big] + \xi_{i,k},
\end{aligned}
\tag{1}
$$

where $n_y \in \mathbb{N}^*$ and $n_x \in \mathbb{N}$ are the maximum lags for the system output and input, respectively; $x_k \in \mathbb{R}^{n_x}$ is the system input and $y_k \in \mathbb{R}^{n_y}$ is the system output at discrete time $k \in \mathbb{N}^n$; $e_k \in \mathbb{R}^{n_e}$ stands for uncertainties and possible noise at discrete time $k$. In this case, $F^{\ell}$ is some nonlinear function of the input and output regressors with nonlinearity degree $\ell \in \mathbb{N}$ and $d$ is a time delay typically set to $d = 1$.

The number of possible terms of the MIMO NARX model given the $i$th polynomial degree, $\ell_i$, is:

$$
n_{m_r} = \sum_{j=0}^{\ell_i} n_{ij}, \tag{2}
$$

where

$$
n_{ij} = \frac{n_{ij-1}\left(\sum_{k=1}^{s} n^i_{y_k} + \sum_{k=1}^{r} n^i_{x_k} + j - 1\right)}{j}, \qquad n_{i0} = 1, \quad j = 1, \ldots, \ell_i. \tag{3}
$$

Parsimony makes the polynomial NARX models a widely known model family. This characteristic means that a wide range of behaviors can be represented concisely using only a few terms of the vast search space formed by candidate regressors, and usually only a small data set is required to estimate a model.

### 2.2 Importance of Structure Selection

Identifying the correct structure is fundamental to allow the user to analyze the system dynamics
consistently. The selection of regressors, however, is not a simple task. If $\ell$, $n_x$, and $n_y$ increase, the number of candidate models becomes too large for a brute force approach. Considering the MIMO case, this problem is far worse than the Single-Input Single-Output (SISO) one if many inputs and outputs are required. The total number of all different models is given by

$$
n_m = \begin{cases} 2^{n_r} & \text{for SISO models,} \\ 2^{n_{m_r}} & \text{for MIMO models,} \end{cases} \tag{4}
$$

where $n_r$ and $n_{m_r}$ are the values computed using Eq. (2) to Eq. (3).

A classical solution to the regressor selection problem is the FROLS algorithm associated with the ERR test. The FROLS method transforms the set of regressors in the search space into a set of orthogonal vectors, and the ERR evaluates their individual contribution to the desired output variance by calculating the normalized energy coefficient $C(x, y)$ between two vectors, defined as:

$$
C(x, y) = \frac{(x^\top y)^2}{(x^\top x)(y^\top y)}. \tag{5}
$$

An approach often used is to stop the algorithm using some information criterion, e.g., AIC [19].

### 2.3 The Sigmoid Linear Unit Function

Definition 1 (Sigmoidal function) Let $F$ represent a class of bounded functions $\varphi : \mathbb{R} \mapsto \mathbb{R}$. If $\varphi(x)$ satisfies

$$
\lim_{x \to \infty} \varphi(x) = \alpha, \qquad \lim_{x \to -\infty} \varphi(x) = \beta \quad \text{with } \alpha > \beta,
$$

the function is called sigmoidal.

In this particular case, following Definition 1 with $\alpha = 1$ and $\beta = 0$, we write an "S" shaped curve as

$$
\varsigma(x) = \frac{1}{1 + e^{-a(x - c)}}. \tag{6}
$$

In that case, we can specify $a$, the rate of change. If $a$ is close to zero, the sigmoid function will be gradual. If $a$ is large, the sigmoid function will have an abrupt or sharp transition. If $a$ is negative, the sigmoid will go from 1 to zero. The parameter $c$ corresponds to the $x$ value where $y = 0.5$.

The Sigmoid Linear Unit Function (SiLU) is defined by the sigmoid function multiplied by its input

$$
\mathrm{silu}(x) = x\,\varsigma(x), \tag{7}
$$

which can be viewed as a steeper sigmoid function with overshoot.

### 2.4 Meta-heuristics

In general, nature-inspired optimization algorithms have become increasingly widespread over the last two decades due to their flexibility, simplicity, versatility, and local optima avoidance in real-life applications.

Two essential characteristics of meta-heuristic algorithms are exploitation and exploration [28]. Exploitation is related to the local information in the search process regarding the best nearby solution. On the other hand, exploration is related to exploring a vast area of the search space to find an even better solution and avoid getting stuck in local optima. [29] shows that there is no consensus about the notion of exploration and exploitation in evolutionary computing, and the definitions are not generally accepted. However, there is general agreement that they work like opposite forces and are usually hard to balance. In this sense, a combination of two metaheuristics, called a hybrid metaheuristic, can be used to provide a more robust algorithm.

#### 2.4.1 The Binary hybrid Particle Swarm Optimization and Gravitational Search Algorithm (BPSOGSA) algorithm

As can be observed in most meta-heuristic algorithms, achieving a good balance between the exploration and exploitation phases is a challenging task. In this paper, to provide a more powerful performance by assuring higher flexibility in the search process, a BPSOGSA hybridized using a low-level co-evolutionary heterogeneous technique [30], proposed by [31], is used. The main concept of the BPSOGSA is to associate the high capability of the particles in Particle Swarm Optimization (PSO) to scan the whole search space for the best global solution with the ability of the Gravitational Search Algorithm (GSA) to look over local solutions in a binary space.

#### 2.4.2 Standard PSO algorithm

In PSO [32,33], each particle represents a candidate solution and consists of two parts: the location in the search space, $\vec{x}_{np,d} \in \mathbb{R}^{np \times d}$, and the respective velocity, $\vec{v}_{np,d} \in \mathbb{R}^{np \times d}$, where $np = 1, 2, \cdots, n_a$, $n_a$ is the size of the swarm and $d$ is the dimension of the problem. In this respect, the following equation represents the initial population:

$$
\vec{x}_{np,d} = \begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n_a,1} & x_{n_a,2} & \cdots & x_{n_a,d}
\end{bmatrix} \tag{8}
$$
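As a numerical check of the counting formulas in Eqs. (2) to (4), the short sketch below evaluates the recursion of Eq. (3) for a single output. The function name `count_terms` and its arguments are illustrative choices, not part of the original work.

```python
from math import comb


def count_terms(lags_y, lags_x, degree):
    """Number of candidate regressors for one output, following Eqs. (2)-(3).

    lags_y / lags_x: lists with the maximum lag of every output / input.
    degree: nonlinearity degree (ell_i in the text).
    """
    n_lags = sum(lags_y) + sum(lags_x)
    n_j = 1          # n_{i0} = 1, the constant term
    total = n_j
    for j in range(1, degree + 1):
        # Eq. (3): the product is always divisible by j, so // is exact.
        n_j = n_j * (n_lags + j - 1) // j
        total += n_j
    return total


# SISO model with n_y = n_x = 4 and degree 3 (the setting of the simulation studies).
n_r = count_terms([4], [4], 3)
n_models = 2 ** n_r  # Eq. (4), SISO case

print(n_r)                           # number of candidate regressors
print(n_r == comb(4 + 4 + 3, 3))     # closed form: C(n_lags + degree, degree)
print(len(str(n_models)))            # number of decimal digits of 2**n_r
```

With lags of 2 and degree 1 the same function gives the five regressors (constant plus four lagged terms) of a linear model, while the degree 3 case already yields a search space far beyond exhaustive enumeration.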

At each iteration, $t$, the position and velocity of a particle are updated according to

$$
v^{t+1}_{np,d} = \zeta v^{t}_{np,d} + c_1 \kappa_1 \left(\mathrm{pbest}^{t}_{np} - x^{t}_{np,d}\right) + c_2 \kappa_2 \left(\mathrm{gbest}^{t}_{np} - x^{t}_{np,d}\right), \tag{9}
$$

where $\kappa_j \in \mathbb{R}$, for $j = 1, 2$, are real-valued, continuous random variables in the interval $[0, 1]$, $\zeta \in \mathbb{R}$ is an inertia factor that controls the influence of the previous velocity on the current one (also representing a trade-off between exploration and exploitation), $c_1$ is the cognitive factor related to pbest (best particle) and $c_2$ is the social factor related to gbest (global solution). The values of the velocity, $\vec{v}_{np,d}$, are usually bounded in the range $[v_{min}, v_{max}]$ to guarantee that the randomness of the system does not lead to particles rushing out of the search space. The positions are updated in the search space according to

$$
x^{t+1}_{np,d} = x^{t}_{np,d} + v^{t+1}_{np,d}. \tag{10}
$$

#### 2.4.3 Standard GSA algorithm

In GSA [34], the agents are measured by their masses, which are proportional to their respective values of the fitness function. These agents share information related to their gravitational force in order to attract each other to locations closer to the global optimum. The larger the mass, the better the solution, and heavier agents move more slowly than lighter ones. In GSA, each mass (agent) has four specifications: position, inertial mass, active gravitational mass, and passive gravitational mass. The position of the mass corresponds to a solution to the problem, and its gravitational and inertial masses are determined using a fitness function.

Consider a population formed by agents described in Eq. (8). At a specific time $t$, the velocity and position of each agent are updated, respectively, as follows:

$$
\begin{aligned}
v^{t+1}_{i,d} &= \kappa_i \times v^{t}_{i,d} + a^{t}_{i,d}, \\
x^{t+1}_{i,d} &= x^{t}_{i,d} + v^{t+1}_{i,d},
\end{aligned}
\tag{11}
$$

where $\kappa$ gives a stochastic characteristic to the search. The acceleration, $a^{t}_{i,d}$, is computed according to the law of motion [34]:

$$
a^{t}_{i,d} = \frac{F^{t}_{i,d}}{M^{t}_{ii}}, \tag{12}
$$

where $t$ is a specific time, $M_{ii}$ is the inertial mass of object $i$ and $F_{i,d}$ is the gravitational force acting on mass $i$ in a $d$-dimensional space. The detailed process to calculate and update both $F_{i,d}$ and $M_{ii}$ can be found in [34].

#### 2.4.4 The binary hybrid optimization algorithm

The algorithms are combined according to [31]:

$$
v^{t+1}_{i} = \zeta \times v^{t}_{i} + c'_1 \times \kappa \times a^{t}_{i} + c'_2 \times \kappa \times \left(\mathrm{gbest} - x^{t}_{i}\right), \tag{13}
$$

where $c'_j \in \mathbb{R}$ is an acceleration coefficient. Eq. (13) has the advantage of accelerating the exploitation phase by saving and using the location of the best mass found so far. However, because this method can affect the exploration phase as well, [35] proposed a solution to this issue by setting adaptive values for $c'_j$, described by [36]:

$$
c'_1 = -2 \times \frac{t^3}{\max(t)^3} + 2, \tag{14}
$$

$$
c'_2 = 2 \times \frac{t^3}{\max(t)^3} + 2. \tag{15}
$$

In each iteration, the positions of particles are updated as stated in Eq. (10) to Eq. (11).

To avoid convergence to a local optimum when mapping the continuous space to discrete solutions, the following transfer function is used [37]:

$$
S(v_{ik}) = \left| \frac{2}{\pi} \arctan\left(\frac{\pi}{2} v_{ik}\right) \right|. \tag{17}
$$

Considering a uniformly distributed random number $\kappa \in (0, 1)$, the positions of the agents in the binary space are updated according to

$$
x^{t+1}_{np,d} = \begin{cases}
\left(x^{t}_{np,d}\right)^{-1}, & \text{if } \kappa < S\left(v^{t+1}_{ik}\right), \\
x^{t}_{np,d}, & \text{if } \kappa \geq S\left(v^{t+1}_{ik}\right),
\end{cases} \tag{18}
$$

where $\left(x^{t}_{np,d}\right)^{-1}$ denotes the complement of the bit $x^{t}_{np,d}$.

## 3 Meta-Model Structure Selection (Meta-MSS): Building NARX for Regression

In this section, the use of a method based on meta-heuristics to select the NARX model structure is addressed. The BPSOGSA is implemented to search for the best model structure in a decision space formed by a predefined dictionary of regressors. The objective function of the optimization problem is based on the root mean squared error of the free run simulation output multiplied by a penalty factor that takes into account the complexity and the individual contribution of each regressor to build the final model.
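The hybrid update of Eqs. (13) to (18) can be sketched for a single agent as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the acceleration is passed in as a given list because the GSA force and mass computation of [34] is not reproduced here, independent random numbers are drawn for each term of Eq. (13), and the function names are hypothetical.

```python
import math
import random


def adaptive_coefficients(t, t_max):
    """Eqs. (14)-(15): time-varying acceleration coefficients c'_1 and c'_2."""
    ratio = t**3 / t_max**3
    return -2.0 * ratio + 2.0, 2.0 * ratio + 2.0


def transfer(v):
    """Eq. (17): V-shaped transfer function, maps a velocity to [0, 1)."""
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * v))


def binary_step(x, v, accel, gbest, t, t_max, zeta, rng):
    """One velocity and position update of a single agent.

    x, gbest: lists of 0/1 bits; v, accel: lists of floats (same length).
    accel is assumed to come from the GSA step (Eq. 12), not computed here.
    """
    c1, c2 = adaptive_coefficients(t, t_max)
    x_new, v_new = [], []
    for xd, vd, ad, gd in zip(x, v, accel, gbest):
        # Eq. (13); assumption: a fresh random number for each term.
        vel = zeta * vd + c1 * rng.random() * ad + c2 * rng.random() * (gd - xd)
        # Eq. (18): complement the bit with probability S(vel).
        flip = rng.random() < transfer(vel)
        x_new.append(1 - xd if flip else xd)
        v_new.append(vel)
    return x_new, v_new


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [0, 1, 0, 0, 1]
x_next, v_next = binary_step(x, [0.0] * 5, [0.5] * 5, [1, 1, 0, 0, 1],
                             t=1, t_max=30, zeta=0.9, rng=rng)
print(x_next, v_next)
```

Because the transfer function is zero at zero velocity, a bit whose velocity stays at zero is never flipped; large velocities of either sign make a flip likely, which is what lets the swarm keep exploring in the binary space.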

### 3.1 Encoding scheme

This section describes the use of BPSOGSA for model structure selection. First, one should define the dimension of the test function. In this regard, $n_y$, $n_x$ and $\ell$ are set to generate all possible regressors and a general matrix of regressors, $\Psi$, is built. The number of columns of $\Psi$ is assigned to the variable $noV$, and the number of agents, $N$, is defined. Then a binary $noV \times N$ matrix, referred to as $X$, is randomly generated with the position of each agent in the search space. Each column of $X$ represents a possible solution; in other words, a possible model structure to be evaluated at each iteration. Since each column of $\Psi$ corresponds to a possible regressor, a value of 1 in $X$ indicates that, in its respective position, the column of $\Psi$ is included in the reduced matrix of regressors, while the value of 0 indicates that the regressor column is ignored.

Example 1 Consider a case where all possible regressors are defined based on $\ell = 1$ and $n_y = n_u = 2$. Then $\Psi$ is defined by

$$
[\,\text{constant} \quad y(k-1) \quad y(k-2) \quad u(k-1) \quad u(k-2)\,] \tag{19}
$$

Because there are 5 possible regressors, $noV = 5$. Assume $N = 5$; then $X$ can be represented, for example, as

$$
X = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0
\end{bmatrix} \tag{20}
$$

The first column of $X$ is transposed and used to generate a candidate solution:

| constant | y(k − 1) | y(k − 2) | u(k − 1) | u(k − 2) |
|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 |

Hence, in this example, the first model to be tested is $\alpha y(k-1) + \beta u(k-2)$, where $\alpha$ and $\beta$ are parameters estimated via the Least Squares method. After that, the second column of $X$ is tested, and so on.

### 3.2 Formulation of the objective function

For each candidate model structure randomly defined, the linear-in-the-parameters system can be solved directly using the Least Squares algorithm. The variance of the estimated parameters can be calculated as:

$$
\hat{\sigma}^2 = \hat{\sigma}_e^2 V_{jj}, \tag{21}
$$

where $\hat{\sigma}_e^2$ is the estimated noise variance calculated as

$$
\hat{\sigma}_e^2 = \frac{1}{N - m} \sum_{k=1}^{N} \left(y_k - \psi_{k-1}^\top \hat{\Theta}\right)^2 \tag{22}
$$

and $V_{jj}$ is the $j$th diagonal element of $(\Psi^\top \Psi)^{-1}$.

The estimated standard error of the $j$th regression coefficient $\hat{\Theta}_j$ is the positive square root of the diagonal elements of $\hat{\sigma}^2$,

$$
\mathrm{se}(\hat{\Theta}_j) = \sqrt{\hat{\sigma}^2_{jj}}. \tag{23}
$$

A penalty test considers the standard error of the regression coefficients to determine the statistical relevance of each regressor. The t-test is used in this study to perform a hypothesis test on the coefficients to check the significance of individual regressors in the multiple linear regression model. The hypothesis statements involve testing the null hypothesis described as:

$$
H_0 : \Theta_j = 0, \qquad H_a : \Theta_j \neq 0.
$$

In practice, one can compute a t-statistic as

$$
T_0 = \frac{\hat{\Theta}_j}{\mathrm{se}(\hat{\Theta}_j)}, \tag{24}
$$

which measures the number of standard deviations that $\hat{\Theta}_j$ is away from 0. More precisely, let

$$
-t_{\alpha/2, N-m} < T < t_{\alpha/2, N-m}, \tag{25}
$$

where $t_{\alpha/2, N-m}$ is the t value obtained considering $\alpha$ as the significance level and $N - m$ the degrees of freedom. Then, if $T_0$ does not lie in the acceptance region of Eq. (25), the null hypothesis, $H_0 : \Theta_j = 0$, is rejected and it is concluded that $\Theta_j$ is significant at $\alpha$. Otherwise, $\Theta_j$ is not significantly different from zero, and the null hypothesis $\Theta_j = 0$ cannot be rejected.

#### 3.2.1 Penalty value based on the Derivative of the Sigmoid Linear Unit function

We propose a penalty value based on the derivative of Eq. (7), defined as:

$$
\dot{\varsigma}(x(\varrho)) = \varsigma(x)\left[1 + \left(a(x - c)\right)\left(1 - \varsigma(x)\right)\right]. \tag{26}
$$

In this respect, the parameters of Eq. (26) are defined as follows: $x$ has the dimension of $noV$; $c = noV/2$; and $a$ is defined by the number of regressors of the current test model divided by $c$. This approach results in a different curve for each model, considering the number of regressors of the current model. As the number of
regressors increases, the slope of the sigmoid curve becomes steeper. The penalty value, $\varrho$, corresponds to the value in $y$ of the corresponding sigmoid curve at the number of regressors in $x$. It is imperative to point out that, because the derivative of the sigmoid function returns negative values, we normalize $\varsigma$ as

$$
\varrho = \varsigma - \min(\varsigma), \tag{27}
$$

so $\varrho \in \mathbb{R}^+$.

However, two different models can have the same number of regressors and present significantly different results. This situation can be explained by the importance of each regressor in the composition of the model. In this respect, we use the Student's t-test to determine the statistical relevance of each regressor and introduce this information into the penalty function. In each case, the procedure returns the number of regressors that are not significant for the model, which we call $n_{\Theta,H_0}$. Then, the penalty value is chosen considering the model size as

$$
\text{model size} = n_{\Theta} + n_{\Theta,H_0}. \tag{28}
$$

The objective function considers the relative root squared error of the model and $\varrho$, and is defined as

$$
F = \frac{\sqrt{\sum_{k=1}^{n} (y_k - \hat{y}_k)^2}}{\sqrt{\sum_{k=1}^{n} (y_k - \bar{y})^2}} \times \varrho. \tag{29}
$$

With this approach, even if the tested models have the same number of regressors, the models which contain redundant regressors are penalized with a more substantial penalty value.

Finally, Algorithm 1 summarizes the method.

Algorithm 1: Meta-structure selection (Meta-MSS) algorithm

    Result: Model which has the best fitness value
    Input: {(u_k), (y_k), k = 1, ..., N}, M = {ψ_j, j = 1, ..., m}, n_y, n_u, ℓ,
           max iteration, noV, np
     1  P ← Build initial population of random agents in the search space, S
     2  v ← set the agent's velocity equal to zero at first iteration
     3  Ψ ← Build the general matrix of regressors based on n_y, n_u and ℓ
     4  repeat
     5      for i = 1 : d do
     6          m_i ← x_{np,i}                      ⊲ Extract the model encoding from population
     7          Ψ_r ← Ψ(m_i)                        ⊲ Delete the Ψ columns where m_i = 0 (Ex. 1)
     8          Θ̂ ← (Ψ_rᵀ Ψ_r)⁻¹ Ψ_rᵀ y
     9          ŷ ← Free-run simulation of the model
    10          V ← (Ψᵀ Ψ)⁻¹
    11          σ̂_e² ← 1/(N−m) Σ_{k=1..N} (y_k − ψ_{k−1}ᵀ Θ̂)²   ⊲ Eq. 22
    12          for h = 1 : τ do
    13              σ̂² ← σ̂_e² V_{h,h}               ⊲ Eq. 21
    14              se(Θ̂_j) ← sqrt(σ̂²_{h,h})        ⊲ Eq. 23
    15              T_0 ← Θ̂_j / se(Θ̂_j)             ⊲ Eq. 24
    16              p ← regressors where −t_{α/2,N−m} < T_0 < t_{α/2,N−m}   ⊲ Eq. 25
    17          end
    18          Remove the p regressors from Ψ_r
    19          Check for empty model
    20          if Model is empty then
    21              Generate a new population
    22              Repeat the steps from line 6 to 18
    23          end
    24          n_1 ← size(p)                       ⊲ Number of redundant terms
    25          Θ̂ ← (Ψ_rᵀ Ψ_r)⁻¹ Ψ_rᵀ y             ⊲ Re-estimation
    26          F_i ← sqrt(Σ (y_k − ŷ_k)²) / sqrt(Σ (y_k − ȳ)²) × ϱ   ⊲ Eq. 29
    27          Pⁿ_i ← Encoded Ψ_r
    28          Evaluate the fitness for each agent, F_i(t)
    29      end
    30      P ← Pⁿ                                  ⊲ Update the population
    31      foreach x_{np,d} ∈ P do
    32          Calculate the acceleration of each agent   ⊲ Eq. 12
    33          Adapt the c′_j coefficients                ⊲ Eqs. 14-15
    34          Update the velocity of the agents          ⊲ Eq. 13
    35          Update the position of the agents          ⊲ Eq. 11
    36      end
    37  until max iterations is reached

### 3.3 Case Studies: Simulation Results

In this section, six simulation examples are considered to illustrate the effectiveness of the Meta-MSS algorithm. An analysis of the algorithm performance has been carried out considering different tuning parameters. The selected systems are generally used as benchmarks for model structure selection algorithms and were taken from [38,18,24,10,14,39,40]. Finally, a comparative analysis with respect to the Randomized Model Structure Selection (RaMSS) [18], the FROLS [1], and the Reversible-jump Markov chain Monte Carlo (RJMCMC) [24] algorithms has been carried out to assess the quality of the proposed method.
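The penalised objective of Eqs. (26) to (29) can be sketched in a few lines. The snippet below is one possible reading of the text, offered as an illustration only: it assumes that the curve of Eq. (26) is evaluated on the integer grid x = 1, ..., noV, that the shift of Eq. (27) is applied to that curve, and that the model size of Eq. (28) is clamped to noV. The function names are hypothetical, and the number of non-significant regressors is taken as an input rather than computed with the t-test.

```python
import math


def sigmoid(x, a, c):
    """Eq. (6): logistic curve with rate a and midpoint c."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))


def penalty(n_terms, n_not_significant, n_candidates):
    """Penalty value from Eqs. (26)-(28) under the assumptions stated above."""
    c = n_candidates / 2.0   # c = noV / 2
    a = n_terms / c          # a = (number of regressors in the model) / c
    curve = []
    for x in range(1, n_candidates + 1):
        s = sigmoid(x, a, c)
        curve.append(s * (1.0 + a * (x - c) * (1.0 - s)))  # Eq. (26)
    lowest = min(curve)
    shifted = [value - lowest for value in curve]           # Eq. (27)
    size = min(n_terms + n_not_significant, n_candidates)   # Eq. (28), clamped
    return shifted[size - 1]


def objective(y, y_hat, n_terms, n_not_significant, n_candidates):
    """Eq. (29): relative root squared error multiplied by the penalty."""
    y_bar = sum(y) / len(y)
    num = math.sqrt(sum((yk - yh) ** 2 for yk, yh in zip(y, y_hat)))
    den = math.sqrt(sum((yk - y_bar) ** 2 for yk in y))
    return num / den * penalty(n_terms, n_not_significant, n_candidates)


# Two models with the same two regressors out of five candidates; the second
# one has a regressor flagged as not significant by the t-test.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(objective(y, y_hat, 2, 0, 5))
print(objective(y, y_hat, 2, 1, 5))
```

In this small case the second call returns the larger value: with identical simulation error, the model carrying a redundant regressor is pushed further along the curve and receives the heavier penalty, which is the behaviour described for Eq. (29).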

The simulation models are described as:

$$
S_1:\quad y_k = -1.7y_{k-1} - 0.8y_{k-2} + x_{k-1} + 0.81x_{k-2} + e_k, \tag{30}
$$

with $x_k \sim U(-2, 2)$ and $e_k \sim N(0, 0.01^2)$;

$$
S_2:\quad y_k = 0.8y_{k-1} + 0.4x_{k-1} + 0.4x^2_{k-1} + 0.4x^3_{k-1} + e_k, \tag{31}
$$

with $x_k \sim N(0, 0.3^2)$ and $e_k \sim N(0, 0.01^2)$;

$$
S_3:\quad y_k = 0.2y^3_{k-1} + 0.7y_{k-1}x_{k-1} + 0.6x^2_{k-2} - 0.7y_{k-2}x^2_{k-2} - 0.5y_{k-2} + e_k, \tag{32}
$$

with $x_k \sim U(-1, 1)$ and $e_k \sim N(0, 0.01^2)$;

$$
S_4:\quad y_k = 0.7y_{k-1}x_{k-1} - 0.5y_{k-2} + 0.6x^2_{k-2} - 0.7y_{k-2}x^2_{k-2} + e_k, \tag{33}
$$

with $x_k \sim U(-1, 1)$ and $e_k \sim N(0, 0.04^2)$;

$$
S_5:\quad y_k = 0.7y_{k-1}x_{k-1} - 0.5y_{k-2} + 0.6x^2_{k-2} - 0.7y_{k-2}x^2_{k-2} + 0.2e_{k-1} - 0.3x_{k-1}e_{k-2} + e_k, \tag{34}
$$

with $x_k \sim U(-1, 1)$ and $e_k \sim N(0, 0.02^2)$;

$$
S_6:\quad y_k = 0.75y_{k-2} + 0.25x_{k-2} - 0.2y_{k-2}x_{k-2} + e_k
$$

with $x_k \sim N(0, 0.25^2)$ and $e_k \sim N(0, 0.02^2)$;

where $U(a, b)$ are samples evenly distributed over $[a, b]$, and $N(\eta, \sigma^2)$ are samples with a Gaussian distribution with mean $\eta$ and standard deviation $\sigma$. All realizations of the systems are composed of a total of 500 input-output data samples. Also, the same random seed is used for reproducibility purposes.

All tests have been performed in the Matlab® 2018a environment, on a Dell Inspiron 5448 Core i5-5200U CPU 2.20GHz with 12GB of RAM.

Following the aforementioned studies, the maximum lags for the input and output are chosen to be, respectively, $n_u = n_y = 4$ and the nonlinear degree is $\ell = 3$. The parameters related to the BPSOGSA are detailed in Table 1.

Table 1: Parameters used in Meta-MSS

| Parameters | nu | ny | ℓ | p-value | max iter | n agents | α | G0 |
|---|---|---|---|---|---|---|---|---|
| Values | 4 | 4 | 3 | 0.05 | 30 | 10 | 23 | 100 |

300 runs of the Meta-MSS algorithm have been executed for each model, aiming to compare some statistics about the algorithm performance. The elapsed time, the time required to obtain the final model, and correctness, the percentage of exact model selections, are analyzed.

The results in Table 2 are obtained with the parameters configured according to Table 1.

Table 2 shows that all the model terms are correctly selected using the Meta-MSS. It is worth noticing that even the model S5, which has an autoregressive noise, was correctly selected using the proposed algorithm.

Table 2: Overall performance of the Meta-MSS

| | S1 | S2 | S3 | S4 | S5 | S6 |
|---|---|---|---|---|---|---|
| Correct model | 100% | 100% | 100% | 100% | 100% | 100% |
| Elapsed time (mean) | 5.16s | 3.90s | 3.40s | 2.37s | 1.40s | 3.80s |

This result stems from the evaluation of all regressors individually: the ones considered redundant are removed from the model.

Figure 1 presents the convergence of each execution of Meta-MSS. It is noticeable that the majority of executions converge to the correct model structures within 10 or fewer iterations. The reason for this lies in the maximum number of iterations and the number of search agents. The first one is related to the acceleration coefficient, which boosts the exploration phase of the algorithm, while the latter increases the number of candidate models to be evaluated. Intuitively, one can see that both parameters influence the elapsed time and, more importantly, the model structure selected to compose the final model. Consequently, an inappropriate choice of one of them may result in sub/over-parameterized models, since the algorithm can converge to a local optimum. The next subsection presents an analysis of the influence of max iter and n agents on the algorithm performance.

### 3.4 Meta-MSS vs RaMSS vs C-RaMSS

The systems S1, S2, S3, S4 and S6 have been used as benchmarks by [41], so we can directly compare our results with those reported by the author in his thesis. All techniques used $n_y = n_u = 4$ and $\ell = 3$. The RaMSS and the RaMSS with Conditional Linear Family (C-RaMSS) used the following configuration for the tuning parameters: $K = 1$, $\alpha = 0.997$, $NP = 200$ and $v = 0.1$. The Meta-Structure Selection Algorithm was tuned according to Table 1.

In terms of correctness, the Meta-MSS outperforms (or at least equals) the RaMSS and C-RaMSS for all analyzed systems, as shown in Table 3. Regarding S6, the correctness rate increased by 18 percentage points when compared with RaMSS ($NP = 200$), and the elapsed time required for C-RaMSS to obtain 100% correctness is about 12.8 times that of the Meta-MSS (48.52 s against 3.80 s, i.e., 1276.84% of the Meta-MSS time). Furthermore, the Meta-MSS is notably more computationally efficient than C-RaMSS and similar to RaMSS.

### 3.5 Meta-MSS vs FROLS

The FROLS algorithm has been tested on all the systems and the results are detailed in Table 5. It can
be seen that only the model terms selected for S2 and S6 are correct using FROLS. The FROLS fails to select two out of four regressors for S1. Regarding S3, the term $y_{k-1}$ is included in the model instead of $y^3_{k-1}$. Similarly, the term $y_{k-4}$ is wrongly added in model S4 instead of $y_{k-2}$. Finally, an incorrect model structure is returned for S5 as well, with the addition of the spurious term $y_{k-4}$.

Fig. 1: The convergence of each execution of Meta-MSS algorithm. Panels: (1) System S1, (2) System S2, (3) System S3, (4) System S4, (5) System S5, (6) System S6.

Table 3: Comparative analysis between Meta-MSS, RaMSS, and C-RaMSS

| Method | Metric | S1 | S2 | S3 | S4 | S6 |
|---|---|---|---|---|---|---|
| Meta-MSS | Correct model | 100% | 100% | 100% | 100% | 100% |
| Meta-MSS | Elapsed time (mean) | 5.16s | 3.90s | 3.40s | 2.37s | 3.80s |
| RaMSS, NP = 100 | Correct model | 90.33% | 100% | 100% | 100% | 66% |
| RaMSS, NP = 100 | Elapsed time (mean) | 3.27s | 1.24s | 2.59s | 1.67s | 6.66s |
| RaMSS, NP = 200 | Correct model | 78.33% | 100% | 100% | 100% | 82% |
| RaMSS, NP = 200 | Elapsed time (mean) | 6.25s | 2.07s | 4.42s | 2.77s | 9.16s |
| C-RaMSS | Correct model | 93.33% | 100% | 100% | 100% | 100% |
| C-RaMSS | Elapsed time (mean) | 18s | 10.50s | 16.96s | 10.56s | 48.52s |

### 3.6 Meta-MSS vs RJMCMC

The system S4 is taken from [24]. Again, the maximum lags for the input and output are $n_y = n_u = 4$ and the nonlinear degree is $\ell = 3$. In their work, the authors executed the algorithm 10 times on the same input-output data. The RJMCMC was able to select the true model structure 7 times out of the 10 runs. On the other hand, the Meta-MSS gets the true model in all runs of the algorithm. The results are summarized in Table 6. Besides, the RJMCMC method has drawbacks which are overcome by the Meta-MSS: the former is computationally expensive and required an execution with 30,000 iterations. Furthermore, it assumes different probability distributions which are chosen to ease the computations for the parameters involved in the procedure.

### 3.7 Full-scale F-16 aircraft

The F-16 Ground Vibration Test has been used as a benchmark for system identification. The case exhibits clearance and friction nonlinearities at the mounting interface of the payloads. The empirical data were acquired on a full-scale F-16 aircraft during a Siemens LMS Ground Vibration Testing Master Class, and a detailed formulation of the identification problem is available at Nonlinear System Identification Benchmarks (available at http://www.nonlinearbenchmark.org/).

Several datasets are available concerning different input signals and frequencies. This work considers the data recorded under multisine excitations with a full frequency grid from 2 to 15Hz. According to [42], at each force level, 9 periods were acquired considering a
single realization of the input signal. There are 8192 samples per period. Note that transients are present in the first period of measurement.

Table 4: Comparative analysis - Meta-MSS vs FROLS

| System | Meta-MSS regressor | Correct | FROLS regressor | Correct |
|---|---|---|---|---|
| S1 | $y_{k-1}$ | yes | $y_{k-1}$ | yes |
| S1 | $y_{k-2}$ | yes | $y_{k-4}$ | no |
| S1 | $x_{k-1}$ | yes | $x_{k-1}$ | yes |
| S1 | $x_{k-2}$ | yes | $x_{k-4}$ | no |
| S2 | $y_{k-1}$ | yes | $y_{k-1}$ | yes |
| S2 | $x_{k-1}$ | yes | $x_{k-1}$ | yes |
| S2 | $x^2_{k-1}$ | yes | $x^2_{k-1}$ | yes |
| S2 | $x^3_{k-1}$ | yes | $x^3_{k-1}$ | yes |
| S3 | $y^3_{k-1}$ | yes | $y_{k-1}$ | no |
| S3 | $y_{k-1}x_{k-1}$ | yes | $y_{k-1}x_{k-1}$ | yes |
| S3 | $x^2_{k-2}$ | yes | $x^2_{k-2}$ | yes |
| S3 | $y_{k-2}x^2_{k-2}$ | yes | $y_{k-2}x^2_{k-2}$ | yes |
| S3 | $y_{k-2}$ | yes | $y_{k-2}$ | yes |
| S4 | $y_{k-1}x_{k-1}$ | yes | $y_{k-1}x_{k-1}$ | yes |
| S4 | $y_{k-2}$ | yes | $y_{k-4}$ | no |
| S4 | $x^2_{k-2}$ | yes | $x^2_{k-2}$ | yes |
| S4 | $y_{k-2}x^2_{k-2}$ | yes | $y_{k-2}x^2_{k-2}$ | yes |
| S5 | $y_{k-1}x_{k-1}$ | yes | $y_{k-1}x_{k-1}$ | yes |
| S5 | $y_{k-2}$ | yes | $y_{k-4}$ | no |
| S5 | $x^2_{k-2}$ | yes | $x^2_{k-2}$ | yes |
| S5 | $y_{k-2}x^2_{k-2}$ | yes | $y_{k-2}x^2_{k-2}$ | yes |
| S6 | $y_{k-2}$ | yes | $y_{k-2}$ | yes |
| S6 | $x_{k-1}$ | yes | $x_{k-1}$ | yes |
| S6 | $y_{k-2}x_{k-2}$ | yes | $y_{k-2}x_{k-1}$ | yes |

This case study represents a significant challenge because it involves nonparametric analysis of the data, linearized modeling, damping ratios versus the excitation level, and nonlinear modeling around a single mode. Also, the order of the system is reasonably high. In the 2-15Hz band, the F-16 possesses about 10 resonance modes.

The Meta-MSS algorithm and the FROLS are used to select models to represent the dynamics of the F-16 aircraft described above. In the first approach, the maximum nonlinearity degree and the lag of inputs and output were set to 2 and 10, respectively. In this case, the Meta-MSS selects a model with 15 terms, but the model selected through FROLS diverged. Thus, we set the maximum lag to 2. The Meta-MSS chose 3 regressors to form the model, while the FROLS failed again to build an adequate model. Finally, the maximum lag was set to 20 and the maximum nonlinearity degree was defined to be 1. For the latter case, the Meta-MSS parameters are defined as listed in Table 7. The same input and output lags are considered in the FROLS approach. Table 8 details the results considering the second acceleration signal as output. For this case, following the recommendation in [42], the models are evaluated using the metric $e_{rmst}$, which is defined as:

$$
e_{rmst} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (y_k - \hat{y}_k)^2}. \tag{35}
$$

As highlighted in Table 8, the Meta-MSS algorithm returns a model with 9 regressors and a better performance than the model with 18 terms built using FROLS. The step-by-step procedure used by FROLS results in the selection of the first 12 output terms, while only 4 output regressors are selected using Meta-MSS. From Table 8, one can see that the Meta-MSS algorithm has an affordable computational cost, since the time to select the model is very acceptable, even when compared with FROLS, which is known to be one of the most efficient methods for structure selection.

Further, it is interesting to note that the Meta-MSS returned a linear model even when the tests were performed using the maximum nonlinearity degree $\ell = 2$. This result demonstrates the excellent performance of the method, since the classical one was not able to reach a satisfactory result. Figure 2 depicts the free run simulation of each model.

## 4 Meta-Model Structure Selection (Meta-MSS): Building NARX for Classification

Because many real-life problems associate continuous and discrete variables, classification has been one of the most widely studied techniques for decision-making tasks in engineering, health science, business and many more. Many methods and algorithms have been developed for data classification, including logistic regression [43], random forest [44], support vector machines [45], k-nearest neighbors [46] and the logistic-NARX model for binary classification [27]. The former three algorithms are widely used, but the interpretation of such models is a hard task. Regarding logistic-NARX, besides the computational efficiency and transparency, it allows the inclusion of lagged terms straightforwardly while other techniques include lagged terms explicitly.

Following the logistic-NARX approach, this section adapts the Meta-MSS algorithm to develop NARX models focusing on the prediction of systems with binary responses that depend on continuous predictors. The primary motivation comes from the fact that the logistic-NARX approach inherits not only the goodness of the FROLS but all of its drawbacks related to being stuck

10 Wilson Rocha Lacerda Junior et al. Table 6: Comparative analysis - Meta-MSS vs RJMCMC Meta-MSS RJMCMC Model Correct Model 1 (7×) Model 2 Model 3 Model 4 Correct yk−1 xk−1 yes yk−1 xk−1 yk−1 xk−1 yk−1 xk−1 yk−1 xk−1 yes yk−2 yes yk−2 yk−2 yk−2 yk−2 yes S4 x2k−2 yes x2k−2 x2k−2 x2k−2 x2k−2 yes yk−2 x2k−2 yes yk−2 x2k−2 yk−2 x2k−2 yk−2 x2k−2 yk−2 x2k−2 yes - - - yk−3 xk−3 x2k−4 xk−1 x2k−3 no Table 7: Parameters used in Meta-Structure Selection 4.1 Logist NARX modeling approach using Meta-MSSc Algorithm for the F-16 benchmark algorithm Parameters nu nu2 ny ℓ p-value max iter n agents α G0 Values 20 20 20 1 0.05 30 15 23 100 In [27], the logistic-NARX is based on the FROLS al- gorithm to select the terms to compose the following probability model 1 pk = ⊤ Θ̂ . (36) −ψk−1 1+e The biserial coefficient is used to measure the rela- tionship between a continuous variable and a dichoto- mous variable according to [47]: r (1) Meta-MSS: ℓ = 1, ny = nx1 = nx2 = 20. X 1 − X 0 n1 n0 r(x, y) = , (37) σX N2 where X0 is the mean value on the continuous variable X for all the observations that belong to class 0, X1 is the mean value of variable X for all the observations that belong to class 1, σX is the standard deviation of (2) Meta-MSS: ℓ = 2, ny = nx1 = nx2 = 2. variable X, n0 is the number of observations that be- long to class 0, n1 is the number of observations that belong to class 1, and N is the total number of data points. Even though it is based on FROLS, the logistic- NARX approach requires the user to set a maximum number of regressors to form the final model, which is not required when using Meta-MSS algorithm for bi- (3) Meta-MSS: ℓ = 2, ny = nx1 = nx2 = 10. nary classification. The objective function of the Meta-MSS is adapted to use the biserial correlation to measure the associa- tion between the variables instead of the RMSE. For the continuous regression problem, the parameters are (4) FROLS: ℓ = 1, ny = nx1 = nx2 = 20. 
estimated using the LS method, which minimizes the Fig. 2: Models obtained using the Meta-MSS and the sum of squared errors of the model output. Because we FROLS algorithm. The FROLS was only capable to re- are dealing with categorical response variables, this ap- turn a stable model when setting ℓ = 1. The Meta-MSS, proach is not capable of producing minimum variance otherwise, returned satisfactory models in all cases. unbiased estimators, so the parameters are estimated via a Stochastic Gradient Descent (SGD) [48]: Apart from those changes, the main aspects of the standard Meta-MSS algorithm are held, such the regres- sor significance evaluation and all aspects of exploration and exploitation of the search space. Because the pa- in locally optimal solutions. A direct comparison with rameters are now estimated using SGD, the method be- the methods above is performed using the identifica- comes more computationally demanding, and this can tion and evaluation of two simulated models, and an slow down the method, especially when concerning with empirical system. large models.
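As an illustration of the two association measures discussed above, the following minimal sketch computes the RMSE of Eq. (35) and the biserial coefficient of Eq. (37) with NumPy. The function names `rmse` and `biserial` and the demo arrays are hypothetical, and the use of the population standard deviation for σX is an assumption, since the text does not state which estimator is used.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean squared error between measured and predicted output, Eq. (35)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def biserial(x, labels):
    """Biserial coefficient of Eq. (37) for a continuous x and 0/1 labels."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels).astype(int)
    n1 = int(np.sum(labels == 1))  # observations in class 1
    n0 = int(np.sum(labels == 0))  # observations in class 0
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present")
    n = x.size
    sigma = x.std()  # assumption: population standard deviation (ddof=0)
    if sigma == 0.0:
        raise ValueError("x must not be constant")
    mean_gap = x[labels == 1].mean() - x[labels == 0].mean()
    return float(mean_gap / sigma * np.sqrt(n1 * n0 / n**2))


# Hypothetical toy data: larger x values belong to class 1.
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels_demo = np.array([0, 0, 0, 1, 1, 1])
r_demo = biserial(x_demo, labels_demo)
```

Under the population-standard-deviation assumption, `biserial` coincides with the Pearson correlation between `x` and the 0/1 labels, which gives a convenient sanity check with `np.corrcoef`.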

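The probability model of Eq. (36) and its estimation by SGD can be sketched as follows. This is only an illustration under stated assumptions: the regressor set in `build_regressors` is hypothetical (in the method it would come from the structure selection step), the loss is assumed to be the logistic log-loss, and the learning rate, number of epochs and synthetic data are arbitrary choices that do not come from the paper.

```python
import numpy as np


def build_regressors(x, y):
    """Hypothetical regressor vector psi_{k-1} = [1, y_{k-1}, x_{k-1}, y_{k-1} x_{k-1}]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_lag, y_lag = x[:-1], y[:-1]
    psi = np.column_stack([np.ones_like(x_lag), y_lag, x_lag, y_lag * x_lag])
    return psi, y[1:]  # regressors at k-1 and the binary target at k


def predict_proba(psi, theta):
    """Eq. (36): p_k = 1 / (1 + exp(-psi_{k-1}^T theta))."""
    return 1.0 / (1.0 + np.exp(-(psi @ theta)))


def fit_sgd(psi, target, lr=0.1, epochs=30, seed=0):
    """Plain per-sample SGD on the log-loss (assumed loss; lr and epochs are arbitrary)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(psi.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(target)):
            p_i = 1.0 / (1.0 + np.exp(-(psi[i] @ theta)))
            theta -= lr * (p_i - target[i]) * psi[i]  # gradient of the log-loss
    return theta


# Synthetic binary system (assumption): y_k = 1 when x_{k-1} > 0, else 0.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = np.zeros(300)
y[1:] = (x[:-1] > 0).astype(float)

psi, target = build_regressors(x, y)
theta = fit_sgd(psi, target)
accuracy = float(np.mean((predict_proba(psi, theta) > 0.5) == (target == 1)))
```

On this toy system the parameter attached to the lagged input dominates, so thresholding the predicted probability at 0.5 recovers the class labels for most samples.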
Table 8: Identified NARX model using Algorithm 6 and FROLS.

| Meta-MSS: model term | Parameter | Meta-MSS (ℓ = nx1 = nx2 = 2): model term | Parameter | Meta-MSS (ℓ = 2, nx1 = nx2 = 10): model term | Parameter | FROLS: model term | Parameter |
|---|---|---|---|---|---|---|---|
| $y_{k-1}$ | 0.7166 | $y_{k-2}$ | 0.6481 | $y_{k-1}$ | 1.3442 | $y_{k-1}$ | 1.7829 |
| $y_{k-5}$ | 0.2389 | $x_{1,k-1}$ | 1.5361 | $y_{k-2}$ | −0.8141 | $y_{k-2}$ | −1.8167 |
| $y_{k-8}$ | −0.0716 | $x_{2,k-2}$ | 1.3857 | $y_{k-4}$ | 0.3592 | $y_{k-3}$ | 1.3812 |
| $y_{k-13}$ | −0.0867 | | | $x_{1,k-6}$ | 14.8635 | $y_{k-6}$ | 1.5213 |
| $x_{1,k-2}$ | 1.5992 | | | $x_{1,k-7}$ | −14.7748 | $y_{k-9}$ | 0.3625 |
| $x_{1,k-13}$ | −1.1414 | | | $x_{2,k-1}$ | −3.2129 | $x_{2,k-7}$ | −2.4253 |
| $x_{2,k-4}$ | 2.2248 | | | $x_{2,k-3}$ | 7.1903 | $x_{1,k-1}$ | 1.8534 |
| $x_{2,k-8}$ | −0.8383 | | | $x_{2,k-8}$ | −4.0374 | $x_{2,k-3}$ | 1.9866 |
| $x_{2,k-13}$ | −1.1189 | | | $x_{1,k-8}$ | −1.5305 | $x_{2,k-1}$ | 0.6547 |
| | | | | | | $y_{k-7}$ | 1.2767 |
| | | | | | | $y_{k-5}$ | 1.3378 |
| | | | | | | $y_{k-10}$ | −0.3234 |
| | | | | | | $y_{k-4}$ | −1.0199 |
| | | | | | | $y_{k-8}$ | −0.7116 |
| | | | | | | $y_{k-12}$ | −0.2222 |
| | | | | | | $y_{k-11}$ | 0.3761 |
| | | | | | | $x_{1,k-20}$ | 0.0245 |
| $e_{rmst}$ | 0.0862 | | 0.1268 | | 0.0982 | | 0.0876 |
| Elapsed time | 27.92 s | | 16.78 s | | 207.13 s | | 18.01 s |

4.2 Electroencephalography Eye State Identification

This dataset was built by [49] and contains 117 seconds of EEG eye state corpus with a total of 14,980 EEG measurements from 14 different sensors taken with the Emotiv EEG neuroheadset to predict eye states [27]. The dataset is now frequently used as a benchmark and is available on the Machine Learning Repository, University of California, Irvine (UCI) [50]. The reader is referred to [51] for additional information regarding the experiment.

Following the method in [27], the data is separated into a training set composed of 80% of the data and a testing set with the remainder. The eye state is encoded as follows: 1 indicates the eye-closed and 0 the eye-open state.

Some statistical analysis was performed on the training dataset to check whether the data have missing values or any outliers to be fixed. In this respect, values corresponding to inaccurate or corrupt records were found in all the time series provided by the sensors. The detected inaccurate values are replaced with the mean value of the remaining measurements for each variable. Also, each input sequence is transformed using scale and centering transformations. The Logistic NARX based on FROLS was not able to achieve satisfactory performance when trained with the original dataset. The authors explained the poor performance as a consequence of the high variability and dependency between the measured variables. Table 9 reports that the Meta-MSSc, on the other hand, was capable of building a model with 10 terms and an accuracy of 65.04%.

Table 9: Identified NARX model using Meta-MSS. This model was built using the original EEG measurements. No comparison was made because the FROLS based technique was not able to generate a model which performed well enough.

| Model term | Meta-MSSc parameter |
|---|---|
| constant | 0.2055 |
| $x_{1,k-1}$ | −0.1077 |
| $x_{4,k-30}$ | 0.1689 |
| $x_{4,k-36}$ | 0.1061 |
| $x_{4,k-38}$ | 0.0751 |
| $x_{4,k-41}$ | 0.1393 |
| $x_{6,k-2}$ | 0.3573 |
| $x_{7,k-5}$ | −0.7471 |
| $x_{12,k-1}$ | −0.4736 |
| $x_{13,k-1}$ | 0.3875 |

This result may appear to be a poor performance. However, the Logistic NARX achieved 0.7199, and the highest score achieved by the popular techniques was 0.6473, considering the case where a principal component analysis (PCA) drastically reduced the data dimensionality. Thus, this result shows a powerful performance of the Meta-MSSc algorithm. For comparison purposes, a PCA is performed, and the first five principal components were selected as a representation of the original data. Table 10 illustrates that Meta-MSSc built the model with the best accuracy, together with the Logistic NARX approach. The models built without autoregressive inputs have the worst classification accuracy, although this is improved with the addition of autoregressive terms. However, even with autoregressive information, the popular techniques do not achieve a classification accuracy that matches the ones obtained by the Meta-MSSc and Logistic NARX methods.

Table 10: Accuracy performance between different methods for Electroencephalography Eye State Identification

| Method | Classification accuracy |
|---|---|
| Meta-MSSc | 0.7480 |
| Logistic NARX | 0.7199 |
| Regression NARX | 0.6643 |
| Random Forest (without autoregressive inputs) | 0.5475 |
| Support Vector Machine (without autoregressive inputs) | 0.6029 |
| K-Nearest Neighbors (without autoregressive inputs) | 0.5041 |
| Random Forest (with autoregressive inputs) | 0.6365 |
| Support Vector Machine (with autoregressive inputs) | 0.6473 |
| K-Nearest Neighbors (with autoregressive inputs) | 0.5662 |

5 Conclusion

This study presents the structure selection of polynomial NARX models using a hybrid and binary Particle Swarm Optimization and Gravitational Search Algorithm. The selection procedure considers the individual importance of each regressor along with the free-run-simulation performance to apply a penalty function to candidate solutions. The technique, called Meta-MSS algorithm in its standard form, is extended and analyzed in two main categories: (i) the regression approach and (ii) the identification of systems with binary responses using a logistic approach.

The technique, called Meta-MSS algorithm, outperformed or at least was comparable to classical approaches like FROLS, and modern techniques such as RaMSS, C-RaMSS, RJMCMC, and a meta-heuristic based algorithm. This statement considers the results obtained in the model selection of 6 simulated models taken from the literature, and the performance on the F-16 Ground Vibration benchmark.

The latter category proves the robust performance of the technique using an adapted algorithm, called Meta-MSSc, to build models to predict binary outcomes in classification problems. Again, the proposed algorithm outperformed or at least was comparable to popular techniques such as K-Nearest Neighbors, Random Forests and Support Vector Machine, and recent approaches based on the FROLS algorithm using NARX models. Besides the simulated example, the electroencephalography eye state identification proved that the Meta-MSSc algorithm could handle the problem better than all of the compared techniques. In this case study, the new algorithm returned a model with satisfactory performance even when the data dimensionality was not transformed using data reduction techniques, which was not possible with the algorithms used for comparison purposes.

Furthermore, despite the stochastic nature of the Meta-MSS algorithm, the individual evaluation of the regressors and the penalty function result in fast convergence. In this respect, the computational efficiency is better than or at least consistent with other stochastic procedures, such as RaMSS, C-RaMSS and RJMCMC. The computational effort depends on the number of search agents, the maximum number of iterations, and the search space dimensionality. Therefore, in some cases, the elapsed time of the Meta-MSS is comparable to that of the FROLS.

The development of a meta-heuristic based algorithm for model selection such as the Meta-MSS permits a broad exploration in the field of system identification. Although some analyses are out of scope and, therefore, are not addressed in this paper, future work is open for research regarding the inclusion of noise process terms in model structure selection, which is an important problem concerning the identification of polynomial autoregressive models. In this respect, an exciting continuation of this work would be to implement an extended version of Meta-MSS to return NARMAX models.

References

1. S.A. Billings, Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains (John Wiley & Sons, Chichester, 2013)
2. N. Wiener, Nonlinear problems in random theory. Tech. rep., Massachusetts Institute of Technology (1958)
3. W.J. Rugh, Nonlinear System Theory (Johns Hopkins University Press, 1981)
4. R. Haber, L. Keviczky, Nonlinear System Identification - Input-Output Modeling Approach, vol. 1 and 2 (Kluwer Academic Publishers, 1999)
5. R. Pintelon, J. Schoukens, System Identification: A Frequency Domain Approach (John Wiley & Sons, 2012)
6. S.A. Billings, I.J. Leontaritis, in Proceedings of the IEEE Conference on Control and its Application (1981), pp. 183–187
7. I.J. Leontaritis, S.A. Billings, International Journal of Control 41(2), 303 (1985)
8. S. Chen, S.A. Billings, International Journal of Control 49(3), 1013 (1989)
9. L.A. Aguirre, S.A. Billings, Physica D: Nonlinear Phenomena 80(1-2), 26 (1995)
10. L. Piroddi, W. Spinelli, International Journal of Control 76(17), 1767 (2003)
11. M.L. Korenberg, S.A. Billings, Y.P. Liu, P.J. McIlroy, International Journal of Control 48(1), 193 (1988)
12. S.A. Billings, S. Chen, M.J. Korenberg, International Journal of Control 49(6), 2157 (1989)
13. M. Farina, L. Piroddi, International Journal of Systems Science 43(2), 319 (2012)

14. Y. Guo, L.Z. Guo, S.A. Billings, H. Wei, International Journal of Systems Science 46(5), 776 (2015)
15. K.Z. Mao, S.A. Billings, Mechanical Systems and Signal Processing 13(2), 351 (1999)
16. S.A. Billings, L.A. Aguirre, International Journal of Bifurcation and Chaos 5(06), 1541 (1995)
17. P. Palumbo, L. Piroddi, Journal of Sound and Vibration 239(3), 405 (2001)
18. A. Falsone, L. Piroddi, M. Prandini, Automatica 60, 227 (2015)
19. H. Akaike, IEEE Transactions on Automatic Control 19(6), 716 (1974)
20. H. Akaike, Annals of the Institute of Statistical Mathematics 21(1), 243 (1969)
21. G. Schwarz, The Annals of Statistics 6(2), 461 (1978)
22. S. Chen, X. Hong, C.J. Harris, IEEE Transactions on Automatic Control 48(6), 1029 (2003)
23. R. Tempo, G. Calafiore, F. Dabbene, Randomized algorithms for analysis and control of uncertain systems: with applications (Springer Science & Business Media, 2012)
24. T. Baldacchino, S.R. Anderson, V. Kadirkamanathan, Automatica 49(9), 2641 (2013)
25. K. Rodriguez-Vazquez, C.M. Fonseca, P.J. Fleming, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 34(4), 531 (2004)
26. A.G.V. Severino, F.M.U.d. Araújo, in Simpósio Brasileiro de Automação Inteligente (2017), pp. 609–614
27. J.R.A. Solares, H.L. Wei, S.A. Billings, Neural Computing and Applications 31(1), 11 (2019)
28. C. Blum, A. Roli, ACM Computing Surveys (CSUR) 35(3), 268 (2003)
29. A.E. Eiben, C.A. Schippers, Fundamenta Informaticae 35(1-4), 35 (1998)
30. E.G. Talbi, Journal of Heuristics 8(5), 541 (2002)
31. S. Mirjalili, S.Z.M. Hashim, in 2010 International Conference on Computer and Information Application (IEEE, 2010), pp. 374–377
32. J. Kennedy, R.C. Eberhart. Particle swarm optimization, IEEE international of first conference on neural networks (1995)
33. J. Kennedy, Encyclopedia of Machine Learning pp. 760–766 (2010)
34. E. Rashedi, H. Nezamabadi-Pour, S. Saryazdi, Information Sciences 179(13), 2232 (2009)
35. S. Mirjalili, A. Lewis, Neural Computing and Applications 25(7-8), 1569 (2014)
36. S. Mirjalili, G.G. Wang, L.d.S. Coelho, Neural Computing and Applications 25(6), 1423 (2014)
37. S. Mirjalili, A. Lewis, Swarm and Evolutionary Computation 9, 1 (2013)
38. H. Wei, S.A. Billings, International Journal of Modelling, Identification and Control 3(4), 341 (2008)
39. M. Bonin, V. Seghezza, L. Piroddi, IET Control Theory & Applications 4(7), 1157 (2010)
40. L.A. Aguirre, B.H.G. Barbosa, A.P. Braga, Mechanical Systems and Signal Processing 24(8), 2855 (2010)
41. F. Bianchi, A. Falsone, M. Prandini, L. Piroddi, A randomised approach for NARX model identification based on a multivariate Bernoulli distribution. Master's thesis, Politecnico di Milano (2017)
42. J.P. Noël, M. Schoukens, in 2017 Workshop on Nonlinear System Identification Benchmarks (2017), pp. 19–23
43. D.W. Hosmer Jr, S. Lemeshow, R.X. Sturdivant, Applied logistic regression, vol. 398 (John Wiley & Sons, 2013)
44. L. Breiman, Machine Learning 45(1), 5 (2001)
45. N. Cristianini, J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods (Cambridge University Press, 2000)
46. M. Kuhn, K. Johnson, Applied predictive modeling, vol. 26 (Springer, 2013)
47. J. Pallant, SPSS survival manual (McGraw-Hill Education (UK), 2013)
48. L. Bottou, in Neural networks: Tricks of the trade (Springer, 2012), pp. 421–436
49. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, ACM SIGKDD Explorations Newsletter 11(1), 10 (2009)
50. A. Asuncion, D. Newman. UCI machine learning repository (2007)
51. T. Wang, S.U. Guan, K.L. Man, T.O. Ting, Mathematical Problems in Engineering 2014 (2014)
