# SENSOR NUMERICAL PREDICTION BASED ON LONG-TERM AND SHORT-TERM MEMORY NEURAL NETWORK - DIVA

Sensor numerical prediction based on long-term and short-term memory neural network

Yangyang Wen

Type of document: Computer Engineering BA (C), Final Project, Examensarbete
Main field of study: Computer Engineering
Credits: 15 credits
Semester/Year: Spring, VT2020
Supervisor: Dr. Forsström Stefan, stefan.forsstrom@miun.se
Examiner: Dr. Ulf Jennehag, ulf.jennehag@miun.se
Course code/registration number: DT099G
Degree programme: Exchange Student

Sensor numerical prediction based on LSTM neural network Yangyang Wen 2020-06-05

## Abstract
Sensor networks contain many scattered sensor nodes, which are used in all aspects of life because of their small size, low power consumption, and multiple functions. With the advent of the Internet of Things (IoT), more small sensor devices will appear in our lives. Research on deep learning neural networks is generally based on large and medium-sized devices such as servers and computers, and research on neural networks running on small IoT devices is rarely reported. In this study, IoT devices are divided into three types (large, medium, and small) according to device size, running speed, and computing power. More concretely, I classify the laptop as a medium-sized device, a device with more computing power than the laptop, such as a server, as a large IoT device, and a mobile IoT device that is smaller than the laptop as a small IoT device. The purpose of this paper is to explore the feasibility, usefulness, and effectiveness of value prediction with a long short-term memory (LSTM) neural network model on small IoT devices. In the control experiment on small and medium-sized IoT devices, the following results are obtained: the error curves of the training set and validation set on the small and medium-sized devices show the same downward trend, with similar accuracy and errors. In terms of time consumption, however, the small device takes about 12 times as long as the medium-sized device. It can therefore be concluded that LSTM value prediction on small IoT devices is feasible, and that the results are useful and effective. One of the main problems encountered when the LSTM model is extended to small devices is time consumption.

Keywords: deep learning, long short-term memory neural networks, sensor networks, Internet of Things, value prediction.

## Acknowledgements
While conducting this research and writing this thesis, I was participating in the Chinese-Swedish Exchange Program as an exchange student in Sweden. I am very grateful to my teacher Peihai Zhao for his guidance from home during this period. I am also grateful to my supervisor Forsström Stefan for providing the equipment needed for the experiment and for supporting me in revising the thesis. I thank Tingting Zhang for providing the industrial sensor data set.

## Table of Contents
1 Introduction
1.1 Background and problem motivation
1.2 Overall aim
1.3 Detailed problem statement
1.4 Scope
1.5 Outline
1.6 Contributions
2 Theory
2.1 Sensor networks
2.2 Neural Network
2.2.1 Simple Neural Network
2.2.2 Recurrent neural network
2.2.3 Long and short-term memory neural network model
2.2.4 The working process of long and short-term memory neural network model
2.3 Data circulation process in LSTM Model
2.4 Error estimation method
2.5 Tensor
2.6 Related research work
2.6.1 LSTM-based stock returns prediction method: Take the Chinese stock market as an example
2.6.2 Travel time prediction of LSTM neural network
2.6.3 Development and application of deep neural network soft sensor based on LSTM
2.6.4 Analysis of industrial IoT devices based on LSTM
3 Methodology
3.1.1 Pycharm
3.1.2 Anaconda
3.2 LSTM univariate value prediction process
3.2.1 Data set
3.2.2 Data preprocessing
3.2.3 Data segmentation
3.2.4 Loss Function
3.2.5 Multivariate experiment
3.3 Value prediction control experiment
3.3.1 Univariate value prediction experiment based on laptop
3.3.2 Univariate value prediction experiment based on Raspberry Pi
4 Implementation and Results
4.1 Environment and tools
4.1.1 Computer environment configuration
4.1.2 Environment configuration of Raspberry Pi
4.1.3 Version of equipment and environment
4.2 Multivariate experiment
4.2.1 Experimental results of laptop
4.2.2 Experimental results of Raspberry Pi
4.3 Controlled experiment
4.3.1 The optimal model based on minimum error
4.3.2 The optimal model based on minimum time consumption
5 Summary and Analysis of Results
5.1 Experimental summary of Multivariate experiment
5.2 Experimental summary of Control experiment
6 Conclusions
6.1 Ethical and Societal Discussion
6.2 Future Work

## Terminology

Acronyms

Adam: Adaptive moment estimation, a method for stochastic optimization
ARIMA: Autoregressive integrated moving average model
BP: Back propagation
KNN: K nearest neighbor classification algorithm
LSTM: Long short-term memory
MAE: Mean absolute error
MSE: Mean square error
RMSE: Root mean square error
RNN: Recurrent neural network

Mathematical notation

b: bias
bh: bias of the hidden layer
C: cell state
fa: activation function
Fo: forget gate in LSTM
Ht: result of the hidden layer in RNN
In: input gate in LSTM
Ot: result of the output layer in RNN
Out: output gate in LSTM
S: sigmoid activation function
tanh: tanh activation function
W: weight
Wih: weight from the input layer to the hidden layer
Z: input module

## 1 Introduction
The concept of Industry 4.0 means that people use information technology to drive an industrial revolution and enter the era of intelligence. Data collection is the most practical and most frequent demand in intelligent manufacturing, and it is also a prerequisite for Industry 4.0. Traditional data collection methods include manual entry, questionnaires, and telephone follow-up. Nowadays, sensor-based data collection is one of the methods that directly change the application scenarios of big data. It is estimated that by 2030 the number of small IoT devices will reach one trillion, and many of them will be small embedded devices with sensors and actuators. Because of the scale of the equipment and the large number of sensors and actuators, a large amount of information is generated every day. Sensor-based data collection overcomes the error-prone and inefficient nature of traditional data collection methods, but the storage and processing of massive data are a major difficulty for sensors. “Dirty data” not only takes up storage space but also has to be processed, which increases the technical difficulty. Research on sensor value prediction based on LSTM therefore has important practical and innovative significance. For countries, sensor value prediction can avoid the storage of dirty and duplicate data, save cloud storage, and optimize the layout of network space resources. For engineering construction, sensor value prediction can safeguard the quality of construction and data analysis. For enterprises and the public, accurate sensor value prediction can provide data support for the development strategies of enterprises and the major decisions of the public, and improve the feasibility and reliability of those decisions.
### 1.1 Background and problem motivation
As early as 1998, neural networks were applied to problems in the field of sensors: a new method for diagnosing sensor failure based on a neural network time series predictor was proposed [1]. Its principle is to use the difference between the value predicted by the neural network and the actual output value of the sensor to judge whether the sensor has failed. At present, many scholars around the world have applied mature models to the field of value prediction, such as the K nearest neighbor (KNN) classification algorithm, the autoregressive integrated moving average (ARIMA) model, and the BP neural network. For example, the moving average method [2] has been used to forecast the demand for car sales in 4S stores [3], and the KNN method [4] has been used to predict the current category based on K historical data points: the category assigned to the current data point is the category to which most of the data from the past K days belongs. However, the prediction results of this method depend only on a very small number of adjacent samples, and when one class has many samples and the other classes have few, the method cannot guarantee the accuracy of the prediction. The algorithm also requires a large amount of calculation: for each item to be classified, the distance to all known samples must be calculated to obtain its K nearest neighbors.

The ARIMA method [5] has been used to predict short-term expressway passenger flow. The gray system theory prediction model [6] and the time series ARIMA prediction model have each been used to predict traffic flow, and the model that combines them achieves higher prediction accuracy than either the gray prediction model or the time series analysis model alone, with the advantages of simplicity and strong interpretability. However, because monitoring data are non-linear and uncertain, a parametric model cannot describe their properties well, which results in a larger prediction error than that of a non-parametric model. Temperature compensation of a neural network humidity sensor based on the PSO-BP (Particle Swarm Optimization) algorithm [7] improves the compensation accuracy of the original BP neural network, but it still has the limitation of falling into local extreme values.
Short-term prediction of urban passenger traffic based on the GSO-BP (Glowworm Swarm Optimization) neural network [8] learns slowly and easily falls into a local minimum. It performs better when the time series is short. When long time series are processed, earlier data have less influence on the current prediction. It can be seen that choosing an appropriate optimization algorithm can improve the performance of the prediction model to a certain extent, but to break through these limitations it is necessary to develop a more appropriate model.

The LSTM neural network model can overcome the "forgetting" phenomenon and effectively solve the problem of gradient disappearance. Since its introduction, it has attracted great attention at home and abroad. The use of LSTM for continuous prediction [9] enriched the traditional recurrent neural network (RNN) and can solve many problems that traditional RNN learning algorithms cannot. The combination of LSTM and GRU neural networks for traffic flow prediction [10] shows that deep learning methods based on recurrent neural networks, such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks, perform better than the autoregressive integrated moving average (ARIMA) model. In addition, the LSTM neural network has been used to predict stock price trends [11]. Compared with other machine learning methods, the LSTM model reaches an accuracy of 55.9% when predicting whether a particular stock will rise in the near future. Because of the satisfactory performance of the long short-term memory model in time series research, this study selects the LSTM model as the machine learning method for sensor value prediction.

### 1.2 Overall aim
When sensors collect data, problems such as data duplication and data loss occur, which waste storage space and increase sensor consumption. Neural network value prediction can reduce the frequency of sensor data collection without affecting the actual results, and it can fill in missing data, which reduces labor costs and the cost of repeated machine operation.

Currently, the hardware on which deep learning systems are based requires large memory capacity, large GPUs, or strong CPU computing power. Neural network-based image recognition often requires server support. For deep learning on mobile devices, the neural network model is usually hosted in the cloud, and the data are uploaded through a mobile application to obtain the prediction results. This shows that the application of deep learning on small Internet devices is still very limited. To understand more clearly the main difficulties encountered when applying deep learning neural networks to small IoT devices, and to explore the future joint development of small IoT devices and deep learning, the aim of the project is to compare LSTM on small IoT devices with LSTM on large or medium-sized IoT devices, as an analysis of the performance of LSTM prediction experiments on different devices. The project also studies and evaluates the feasibility, usefulness, and effectiveness of the LSTM value prediction model applied to small IoT devices, and provides a reference for the future development of sensors and neural networks.

### 1.3 Detailed problem statement
The study aims to achieve the following three goals:
1. Choose suitable tools and libraries to build a deep learning environment.
2. Propose and introduce the LSTM model for the univariate value prediction process.
3. Use the trained LSTM model for sensor value prediction research, and set up control experiments on the laptop and the Raspberry Pi to evaluate and compare the loss value on the training set, the loss value on the test set, and the time consumption.

### 1.4 Scope
The study focuses on comparing the performance of LSTM value prediction experiments on large, medium, and small IoT devices.
In the experiment, the hyperparameters are chosen to be as moderate as possible, so that the effect of overfitting on the experimental results can be ignored. The prediction results on different devices are compared according to the loss value on the training set, the loss value on the validation set, and the time consumption. In the multivariate experiment, only the batch size and the data set division were selected as independent variables among the hyperparameters, and the other parameters were left at their default initial values. For details, see the multivariate experiments in Chapter 3 and Chapter 4.

### 1.5 Outline
Chapter 2 describes the data and basic theory used in the LSTM value prediction research. Chapter 3 designs the method used to achieve the value prediction goal. Chapter 4 describes the implementation of the design from Chapter 3. Chapter 5 is the results part, which summarizes the experimental results. Chapter 6 is the conclusion, which discusses future research on LSTM neural network value prediction and the ethical issues it raises.

### 1.6 Contributions
The work in this thesis was completed independently. The main work was as follows: determining the research purpose, designing the research plan, executing the research process, and analyzing the research results. All tables, flow charts, and data charts used in the thesis were drawn by me or obtained from my own experiments.

## 2 Theory
This chapter introduces the basic theory used later in the thesis. It also analyzes some related work and draws conclusions from it.

### 2.1 Sensor networks
The development of wireless communication and electronic technology has promoted the development of low-power, low-cost, and multi-functional sensors. A sensor network is composed of many sensor nodes, which communicate with each other to respond jointly to the surrounding environment or phenomenon [12]. These tiny sensors can communicate freely within a certain range, can perform simple processing and calculation on local data, and can transmit the required data or locally calculated data to the network. To monitor a certain area or phenomenon, a sensor network deploys a large number of sensors in the area or on the surface of the phenomenon. These tiny sensor nodes have the following characteristics: a large number of nodes, dense deployment, proneness to failure, frequent changes of network topology, broadcast communication, and, most importantly, limited power, computing power, and memory. The dense deployment of sensors leads to overlapping monitoring ranges, so that a large amount of redundant information is collected and storage space is wasted. In addition, problems such as fast power consumption and node failures can cause data loss, which in turn affects the sensors' monitoring of the environment or phenomenon.

The sensor network is easy to deploy, self-organizing, and fault tolerant, and it has a wide range of applications. In the military field, sensor networks can be used for reconnaissance, targeting, intelligent control, and computing. In health care, the collected data can be used to monitor various indicators of patients.
In daily life, the sensor network can calculate changes in the surrounding environment from the collected data, control the switching of household appliances, and regulate temperature and humidity. The smart home technology that is very popular today also depends on sensor networks.

### 2.2 Neural Network
The neural network model is inspired by the way the human brain processes external information. Every day our brain processes various stimuli from the external environment to guide our behavior. Three subjects are involved in this process: the external stimuli, the brain, and the guidance scheme.

#### 2.2.1 Simple Neural Network
Figure 2-1 is a schematic diagram of a simple fully connected neural network. It consists of an input layer, a hidden layer, and an output layer, which correspond to the external stimuli, the brain, and the guidance scheme when the human brain processes external information. There is only one input layer and one output layer, while there can be multiple hidden layers to deal with more complex practical problems. The human brain is composed of many neurons, which contact each other and transmit messages through neurotransmitters. Each neuron in the model, drawn as a circle, represents a computing center. It performs a calculation on the incoming data to determine whether the result of the activation function satisfies the condition for passing the message on. If so, the message is passed to the next computing center.

Figure 2-1 Neural network
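The computation performed by one such computing center can be sketched as follows. This is only an illustration under assumptions: the thesis does not say which activation function the simple network uses, so the sigmoid is assumed here, and the function names and numbers are hypothetical.

```python
import math


def sigmoid(z):
    """Assumed activation: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the incoming values plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Three input-layer values feeding one hidden neuron (illustrative numbers only).
activation = neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0)
```

The weights express how strongly each input neuron influences the hidden neuron, and the activation value decides how strongly the message is passed on.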

The topology of the neural network in Figure 2-1 shows that one neuron in the hidden layer is stimulated by the three neurons in the input layer and decides whether to stimulate the two neurons in the output layer. The influence of the input layer neurons on a hidden layer neuron is measured by weights. Whether a neuron in the hidden layer is activated and delivers a message to the next neuron depends on the result of the activation function. Figure 2-2 details the complete process by which a neuron in the hidden layer of Figure 2-1 is stimulated and delivers a message to the next neuron.

Figure 2-2 The calculation process of neurons

#### 2.2.2 Recurrent neural network
The RNN is a machine learning model for sequences. It can effectively handle nonlinear time series. Figure 2-3 below shows how the RNN model processes sequence data. A given sequence $I = (I_1, I_2, I_3, \ldots, I_n)$ is fed into the model in order, and the hidden layer uses formulas (1) and (2) to calculate each element of $I$ iteratively. The result is a hidden layer sequence $H = (H_1, H_2, H_3, \ldots, H_n)$ and an output layer sequence $O = (O_1, O_2, O_3, \ldots, O_n)$, where:

$$H_t = f_a(W_{ih} I_t + W_{hh} H_{t-1} + b_h) \tag{1}$$

$$O_t = W_{ho} H_t + b_o \tag{2}$$
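Formulas (1) and (2) can be traced with a small scalar sketch. The choice of tanh for the activation function fa, the zero initial hidden state, and all weight values are assumptions made for illustration; the formulas themselves do not fix them.

```python
import math


def rnn_forward(inputs, w_ih, w_hh, b_h, w_ho, b_o, h0=0.0):
    """Apply formulas (1) and (2) to a scalar sequence and return (H, O)."""
    hidden, outputs = [], []
    h_prev = h0  # assumed initial hidden state
    for i_t in inputs:
        h_t = math.tanh(w_ih * i_t + w_hh * h_prev + b_h)  # formula (1), fa assumed to be tanh
        o_t = w_ho * h_t + b_o                             # formula (2)
        hidden.append(h_t)
        outputs.append(o_t)
        h_prev = h_t  # the hidden result is fed back at the next time step
    return hidden, outputs


H, O = rnn_forward([1.0, 0.5, -0.5], w_ih=0.8, w_hh=0.5, b_h=0.0, w_ho=2.0, b_o=0.1)
```

Each hidden value depends on the current input and on the previous hidden value, which is what lets the model carry information along the sequence.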

Figure 2-3 Recurrent neural network model and hidden layer neuron structure: (a) RNN model, (b) hidden layer neuron structure

In formulas (1) and (2), $H_t$ represents the calculation result of the hidden layer, $O_t$ the predicted output, $f_a$ the activation function, $W$ a weight, and $b$ a bias. The order of the subscripts has a directional meaning: $W_{ih}$ is the weight from the input layer I to the hidden layer H, $b_h$ is the bias of the hidden layer, and so on. However, when the input sequence is too long, the influence of earlier elements on the weight updates decreases exponentially over time, and the gradient decays exponentially during back propagation. In other words, problems such as gradient disappearance and gradient explosion arise and affect the accuracy of the model training results. In this case, the long short-term memory (LSTM) neural network model shows its advantages. The LSTM neural network has a long-term memory function, which can overcome the problems of gradient disappearance and gradient explosion and deal effectively with time series.

#### 2.2.3 Long and short-term memory neural network model
To better understand the LSTM model, this section introduces the structure and functional steps of the LSTM neuron. The training process of the LSTM model is divided into four steps [13]: input the time series data for forward calculation; calculate the error backwards from the output sequence of predictions and the sequence of true values; use the back propagation algorithm to calculate the gradient of each weight; finally, choose a gradient-based parameter optimization algorithm to update the weights. The forward calculation formulas of the LSTM model and the structure of the LSTM neuron are as follows:

$$In_t = S(W_{xIn} X_t + W_{hIn} H_{t-1} + W_{cIn} C_{t-1} + b_{In}) \tag{3}$$

$$Fo_t = S(W_{xFo} X_t + W_{hFo} H_{t-1} + W_{cFo} C_{t-1} + b_{Fo}) \tag{4}$$

$$C_t = Fo_t C_{t-1} + In_t \tanh(W_{xC} X_t + W_{hC} H_{t-1} + b_C) \tag{5}$$

$$Out_t = S(W_{xOut} X_t + W_{hOut} H_{t-1} + W_{cOut} C_t + b_{Out}) \tag{6}$$

$$H_t = Y_t = Out_t \tanh(C_t) \tag{7}$$

Figure 2-4 LSTM model neuron structure

Similarly, for the input sequence $X = (X_1, X_2, \ldots, X_t, \ldots)$, the forward calculation formulas (3)~(7) are applied iteratively to obtain the hidden layer sequence $H = (H_1, H_2, \ldots, H_t, \ldots)$ and the output sequence $Y = (Y_1, Y_2, \ldots, Y_t, \ldots)$. Figure 2-4 helps to understand the meaning of the formulas. In the formulas and the structure diagram, In, Out, Fo, S, tanh, Z, and C represent the input gate, output gate, forget gate, sigmoid activation function, tanh activation function, input module, and cell state of the neural network, respectively. $X_t$ represents an element of the input sequence, $H_{t-1}$ represents the calculation result of the previous hidden layer, and $H_t$ represents the calculation result of this hidden layer. $C_{t-1}$ represents the cell state of the previous layer, and $C_t$ represents the cell state of this layer.

The parameter optimization algorithm is used to update the model parameters. A good optimization algorithm makes the model converge faster and completes the parameter update faster. A gradient-based parameter optimization algorithm uses the gradient direction to move the weights toward the optimum. This thesis selects the Adam optimization algorithm to update the model parameters. Adam (adaptive moment estimation) is a gradient-based parameter optimization algorithm that is widely used because it is easy to implement, computationally efficient, and has a low memory footprint.

#### 2.2.4 The working process of long and short-term memory neural network model
To better understand how the LSTM model performs prediction, and how the input layer, output layer, and hidden layer of the LSTM work together as a whole, this section introduces the LSTM working process in detail.
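Before the working process is described, one forward step of formulas (3)~(7) can be sketched for scalar values. This is a minimal illustration, not the implementation used in the experiments: the parameter names, the dictionary layout, and the value 0.5 for every weight are hypothetical, and in a real layer the quantities are vectors and matrices with element-wise gate products.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical names for the weights and biases appearing in formulas (3)~(7).
PARAM_NAMES = (
    "w_x_in", "w_h_in", "w_c_in", "b_in",
    "w_x_fo", "w_h_fo", "w_c_fo", "b_fo",
    "w_x_c", "w_h_c", "b_c",
    "w_x_out", "w_h_out", "w_c_out", "b_out",
)


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar forward step; returns the new hidden result and cell state."""
    in_t = sigmoid(p["w_x_in"] * x_t + p["w_h_in"] * h_prev + p["w_c_in"] * c_prev + p["b_in"])  # (3)
    fo_t = sigmoid(p["w_x_fo"] * x_t + p["w_h_fo"] * h_prev + p["w_c_fo"] * c_prev + p["b_fo"])  # (4)
    c_t = fo_t * c_prev + in_t * math.tanh(p["w_x_c"] * x_t + p["w_h_c"] * h_prev + p["b_c"])    # (5)
    out_t = sigmoid(p["w_x_out"] * x_t + p["w_h_out"] * h_prev + p["w_c_out"] * c_t + p["b_out"])  # (6)
    h_t = out_t * math.tanh(c_t)                                                                  # (7)
    return h_t, c_t


params = {name: 0.5 for name in PARAM_NAMES}  # illustrative values only
h, c = 0.0, 0.0                               # assumed initial states
hidden_sequence = []
for x in [0.1, 0.2, 0.3]:
    h, c = lstm_step(x, h, c, params)
    hidden_sequence.append(h)
```

The forget gate scales the previous cell state and the input gate scales the new candidate, so the cell state can keep information over many steps instead of overwriting it at every step.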

Figure 2-5 LSTM working process

The LSTM working process can be divided into two parts: network training and network prediction. Network training produces an LSTM prediction model whose weight parameters are well matched to the specific experimental purpose and which therefore predicts well. The data set is divided into a training set and a test set. The training set is used to train the network, and the test set is used for network prediction, to verify how effective the trained model is and whether underfitting or overfitting occurs. The basis for this judgment is usually the loss value calculated by the loss function. The training work can be subdivided into description and cleaning of the original data, data set division, standardization, and data segmentation (which can also be understood as data format conversion, that is, the standardized data sequence is divided into tensors that can be processed directly by the LSTM).

A. Network training

Network training takes the hidden layer as the research object. The input sequence X of the input layer needs to meet the data format requirements of the hidden layer, and the format of the output sequence P of the output layer depends on the calculation result of the hidden layer. LSTM is a special recurrent neural network (RNN). The basic principles of the two are the same, and LSTM is a further improvement.
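The data set division and standardization steps mentioned above can be sketched as follows. The sketch makes several assumptions that this section does not state: min-max scaling is used as the standardization method, the scaling range is taken from the training set only, and the helper names, the 80/20 split, and the sample readings are hypothetical.

```python
def split_series(series, train_fraction):
    """Divide a time series into a training part and a test part, keeping the order."""
    m = int(len(series) * train_fraction)
    return series[:m], series[m:]


def fit_min_max(train):
    """Take the scaling range from the training set only."""
    return min(train), max(train)


def normalize(values, lo, hi):
    """Scale values so that the training range maps to [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def denormalize(values, lo, hi):
    """Map scaled values back to the original units."""
    return [v * (hi - lo) + lo for v in values]


readings = [20.0, 22.0, 21.0, 24.0, 23.0, 25.0, 26.0, 27.0, 25.5, 28.0]  # made-up sensor values
train, test = split_series(readings, 0.8)
lo, hi = fit_min_max(train)
train_scaled = normalize(train, lo, hi)
test_scaled = normalize(test, lo, hi)
```

Keeping the inverse mapping available matters later, because the predictions are produced on the scaled values and have to be converted back before the error is reported in the original units.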

Compared with the RNN model, the LSTM model not only solves the problems of gradient disappearance and gradient explosion, but also does not require the length of the window, or equivalently the number of windows, to be set in advance. In real life, long-term data sets are large and their length cannot be estimated in advance, so the LSTM model has more practical significance. Starting from the input segmentation process used for window prediction with an RNN, the segmentation process of the LSTM model is then explored. First, in the input layer, define the sensor time series as $T = \{T_1, T_2, \ldots, T_n\}$, and divide it into the training set $T_{train} = \{T_1, T_2, \ldots, T_m\}$ and the test set $T_{test} = \{T_{m+1}, T_{m+2}, \ldots, T_n\}$, satisfying the constraint m

Sensor numerical prediction based on LSTM neural network Yangyang Wen 2020-06-05 each window is i to m-L + i-1. The input and theoretical output of the model are a subset of the training set, then the corresponding theoretical output is Y={Y1,Y2,...,YL} (13) Yi={Ti+1,Ti+2,...,Tm-L+i} (14) Input the sequence X into the hidden layer, combined with the recurrent neural network mentioned in the theoretical section, we can know that the sequence will be processed by L LSTM neurons, and iteratively calculates the forward calculation formulas (3) ~ (7) to obtain the predicted sequence P. The output of X after passing through the hidden layer can be expressed as: P={P1，P2，...，P1} (15) Pp=LSTMforward(Xp,Cp-1,Hp-1) (16) Cp-1 and Hp-1 respectively represent the state of the previous neuron and the predicted result of the output. LSTMforward represents the forward calculation formula. It can be seen that the hidden layer output sequence P, the model input sequence X and the theoretical output sequence Y windows (size, length) can be represented by a two- dimensional tensor table with the shape (m-L, L). In this experiment, the average absolute error calculation method MAE is selected as the loss function. The elements of the theoretical output are yi and the elements of the prediction result are pi. The loss function is expressed as: 1 ( m L) L Loss= (m L) L i 1 ( pi yi ) 2 (17) Set the results of the random seed reduction model and set the number of training steps for the purpose of saving time. Select Adam's gradient- based model optimization algorithm to update the network weights, and finally get a trained LSTM hidden layer network. (2) LSTM model cutting training Ttrain = {T1, T2, ..., Tm} LSTM needs to set the window size, and the number of windows can be calculated automatically. Set the size of the split window to s, then the split model input is: 14

$$X = \{X_1, X_2, \ldots, X_{n-s}\} \tag{18}$$

$$X_i = \{T_i, T_{i+1}, T_{i+2}, \ldots, T_{s+i-1}\} \tag{19}$$

$$1 \le i \le n-s,\quad i \in \mathbb{N} \tag{20}$$

With a series of length $n$ and a window size of $s$, there are $n-s$ windows and therefore $n-s$ prediction results. Window $i$ covers the elements with indices $i$ to $s+i-1$, a total of $s$ elements. The input and the theoretical output of the model are both subsets of the training set, and the corresponding theoretical output is:

$$Y = \{Y_1, Y_2, \ldots, Y_{n-s}\} \tag{21}$$

$$Y_i = \{T_{i+1}, T_{i+2}, \ldots, T_{s+i}\} \tag{22}$$

The sequence X is fed into the hidden layer. Following the recurrent neural network described in the theoretical section, the sequence is processed by L LSTM neurons, which iteratively apply the forward calculation formulas (3) to (7) to obtain the predicted sequence P. The output of X after passing through the hidden layer can be expressed as:

$$P = \{P_1, P_2, \ldots, P_{n-s}\} \tag{23}$$

$$P_p = \mathrm{LSTM}_{forward}(X_p, C_{p-1}, H_{p-1}) \tag{24}$$

Here $C_{p-1}$ and $H_{p-1}$ are the state and the output (prediction result) of the previous neuron, and $\mathrm{LSTM}_{forward}$ denotes the forward calculation. The hidden-layer output sequence P, the model input sequence X and the theoretical output sequence Y can each be represented by a two-dimensional tensor of shape $(s, n-s)$.

In this experiment, the mean absolute error (MAE) is selected as the loss function. With $y_i$ the elements of the theoretical output and $p_i$ the elements of the prediction result, the loss function, which has the same form as equation (17), is expressed as:

$$Loss = \frac{\sum_{i=1}^{s(n-s)} (p_i - y_i)^2}{s(n-s)} \tag{25}$$
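The window cutting of equations (18) to (22) can be illustrated with a short NumPy sketch. The function name `cut_windows` and the toy series are assumptions made for illustration and do not come from the thesis code. Each row of `X` is one input window and the matching row of `Y` is the same window shifted by one time step; the rows-as-windows layout is the transpose of the $(s, n-s)$ shape quoted above.

```python
import numpy as np

def cut_windows(series, s):
    """Cut a 1-D series into input windows X and one-step-shifted targets Y."""
    values = np.asarray(series, dtype=float)
    count = len(values) - s  # number of windows, n - s in the text
    if count <= 0:
        raise ValueError("window size must be smaller than the series length")
    # Row i (0-based) holds T_{i+1} .. T_{i+s} in the 1-based notation of the text
    X = np.stack([values[i:i + s] for i in range(count)])
    # The target window is the input window shifted forward by one element
    Y = np.stack([values[i + 1:i + s + 1] for i in range(count)])
    return X, Y

# Toy series T_1 .. T_10 standing in for real sensor values
T = np.arange(1, 11)
X, Y = cut_windows(T, s=3)
```

With a series of length 10 and a window size of 3, the sketch yields 7 windows of 3 elements each, matching the count $n-s$ in equation (20).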

A random seed is fixed so that the model results can be reproduced, and the number of training steps is limited to save time. The Adam gradient-based optimization algorithm is used to update the network weights, which finally yields a trained LSTM hidden-layer network.

B. Network prediction

The network prediction part consists of iterative prediction, denormalization of the prediction results, and calculation of the error between the prediction results and the corresponding theoretical values. Prediction is iterative: each prediction result becomes the last element of the next model input sequence, which is then used to predict the next result. The process of network prediction is as follows.

In the first step, the last input sequence of the training set X is $X_f = T_f = \{T_{m-s}, T_{m-s+1}, \ldots, T_{m-1}\}$, and the final theoretical output is $Y_f = \{T_{m-s+1}, T_{m-s+2}, \ldots, T_m\}$, where $s$ is the window size. This marks the end of model training and the start of model prediction.

In the second step, $Y_f$ is fed into the LSTM model as an input sequence and an output sequence $P_f$ is obtained, whose last element is the first prediction result of the model: $P_f = \mathrm{LSTM}(Y_f) = \{P_{m-s+2}, P_{m-s+3}, \ldots, P_{m+1}\}$, so the predicted value at time $m+1$ is $P_{m+1}$. The new input sequence is then $Y_{f+1} = \{T_{m-s+2}, T_{m-s+3}, \ldots, T_m, P_{m+1}\}$. It is fed into the model in turn, and this step is repeated until the length of the prediction sequence equals the length of the test set. The prediction sequence at that point is $P = \{P_{m+1}, P_{m+2}, \ldots, P_n\}$.

The accuracy of the model is measured by the error, which is the difference between the predicted value and the actual value. Once the sequence of predicted values has been obtained, it only needs to be compared with the corresponding set of theoretical values.
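The iterative scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `iterative_forecast` and `toy_model` are hypothetical names, and `toy_model` is a simple linear extrapolation standing in for the trained LSTM, returning just the last element of the output sequence described above.

```python
import numpy as np

def iterative_forecast(predict_next, last_window, horizon):
    """Feed each prediction back as the newest element of the next input window."""
    window = list(last_window)
    predictions = []
    for _ in range(horizon):
        p = float(predict_next(np.asarray(window, dtype=float)))
        predictions.append(p)
        # Drop the oldest value and append the new prediction
        window = window[1:] + [p]
    return predictions

def toy_model(window):
    # Hypothetical stand-in for a trained LSTM: continue the last observed slope
    return window[-1] + (window[-1] - window[-2])

# The horizon plays the role of the test-set length n - m
forecast = iterative_forecast(toy_model, [1.0, 2.0, 3.0], horizon=4)
```

Because every step consumes earlier predictions rather than measured values, any error made early is carried forward into later steps, which is why the horizon is limited to the length of the test set.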
Note, however, that the normalization applied in the earlier step rescaled the training data to the range 0 to 1, so every value in the predicted sequence also lies between 0 and 1. The prediction set is therefore denormalized as:

$$p_i = P_i \sqrt{\frac{\sum_{j=1}^{n} \left(t_j - \sum_{k=1}^{n} t_k / n\right)^2}{n}} + \frac{\sum_{j=1}^{n} t_j}{n}, \qquad m+1 \le i \le n,\ i \in \mathbb{N} \tag{26}$$

Error calculation. The errors $Error(P_{train}, T_{train})$ and $Error(P_{test}, T_{test})$ of the training set and the test set are calculated with the loss function of equation (25).

2.3 Data circulation process in LSTM Model

This section describes how the original data pass through the LSTM model, in order and without duplication, and explains the technical terms involved in the process. As shown in Figure 2-6, the entire process from reading the original data from the file to feeding them to the LSTM model includes:

1. reading the data into the buffer and dividing them into batches;
2. standardizing and segmenting the first batch;
3. feeding the batch into the LSTM neural network;
4. computing the error and propagating it backward to obtain the weight corrections;
5. updating the weight parameters;
6. moving on to the next batch until all data have been processed.
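The batching and standardization steps of this list can be sketched with NumPy. The sketch rests on assumptions: the generator name `iterate_batches`, the toy data and the per-batch z-score standardization are illustrative choices, and the network steps 3 to 5 are only marked by a comment because they depend on the model.

```python
import math
import numpy as np

def iterate_batches(data, batch_size):
    """Yield standardized batches in order, using each value exactly once."""
    values = np.asarray(data, dtype=float)
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        std = batch.std()
        # Guard against a constant batch, whose standard deviation is zero
        yield (batch - batch.mean()) / (std if std > 0 else 1.0)

data = np.arange(100, dtype=float)  # toy stand-in for the buffered sensor values
batch_size = 25
batches = []
for batch in iterate_batches(data, batch_size):
    # Steps 3 to 5 (forward pass, backpropagation, weight update) would go here
    batches.append(batch)

# Number of batches needed to traverse the data once
iterations_per_epoch = math.ceil(len(data) / batch_size)
```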

Figure 2-6 The whole process of data passing through the LSTM model

Several technical terms are frequently used with neural networks: buffer, batch, batch_size, iteration, steps and epoch.

The buffer is, as the name implies, an ordinary data buffer. The original sensor data must pass through the buffer before entering the LSTM network model. When the data set is very small, with only a few hundred or a few thousand values, the buffer can read all sensor data at once. When the data set is large, with tens of millions of values, the buffer cannot read all the data at once, so only part of the data is loaded first. As data are consumed and leave the buffer, the following data immediately enter the buffer to fill the gap. buffer_size denotes the size of the buffer. As shown in the data pipeline diagram in Figure 2-7, the data in the buffer are cut into batches, and each batch is in turn standardized, segmented and then sent to the LSTM model.

Figure 2-7 Data pipeline

A batch can be understood by analogy: if 10 cartons of milk are packed into one box, that box is one batch of milk. batch_size is the size of a batch, so in this example batch_size equals 10. Looking back at the working process of the LSTM model in Figure 2-5, a batch corresponds to the training set of the input layer, which can enter the LSTM model for training only after standardization and data segmentation. Batching the data serves two main purposes: (1) it improves memory utilization; (2) it increases the number of model iterations and parameter updates, which helps the model converge toward its best performance.

The iteration count is the number of batches needed to complete one epoch. One iteration consists of a batch passing forward through the LSTM network, followed by a backward pass that updates the weight parameters of the network. Steps refers to the number of training steps of the network and has the same meaning as iteration, so both are referred to as iteration in the following.

An epoch can be understood as one cycle: it is completed when all data in the training set have passed through the model once. Assuming that the size of the data set is S, within one epoch the number of iterations and the number of batches satisfy the following relationship:

$$Batch = Iteration = \frac{S}{batch\_size} \tag{27}$$

Sometimes training the model once on the complete data set is not enough, and the data set has to be used repeatedly for several epochs. In this case the total number of batches processed is greater than the iteration count of a single epoch, which itself remains unchanged.

2.4 Error estimation method

The error estimation method is what is commonly called the loss function. It is used to calculate how far the model's predictions deviate from the actual results and so to measure the model's prediction performance. Generally, the smaller the loss value, the higher the prediction accuracy and the better the model performance; conversely, the larger the loss value, the worse the model performance. Commonly used loss functions are the Root Mean Square Error (RMSE), the Mean Square Error (MSE) and the Mean Absolute Error (MAE). When an experimental phenomenon is observed, let the set of true values be $[x_1, x_2, \ldots, x_n]$, the corresponding set of predictions be $[y_1, y_2, \ldots, y_n]$, and the error be $E_i = y_i - x_i$. The three loss functions are then:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}} = \sqrt{\frac{\sum_{i=1}^{n} E_i^2}{n}} \tag{28}$$

$$MSE = \frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n} = \frac{\sum_{i=1}^{n} E_i^2}{n} \tag{29}$$

$$MAE = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n} = \frac{\sum_{i=1}^{n} |E_i|}{n} \tag{30}$$

RMSE is often used as a standard measure of error in machine learning models. MSE is the mean of the squared errors and is often used as a loss function. MAE is the mean of the absolute errors and reflects the actual size of the prediction error well. This article chooses MAE as the loss function of the model.
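Equations (28) to (30) translate directly into NumPy. The following is a minimal sketch; the function name `error_metrics` and the sample values are made up for illustration.

```python
import numpy as np

def error_metrics(x_true, y_pred):
    """Return RMSE, MSE and MAE for true values x and predictions y."""
    x = np.asarray(x_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    e = y - x                      # E_i = y_i - x_i
    mse = np.mean(e ** 2)          # equation (29)
    rmse = np.sqrt(mse)            # equation (28)
    mae = np.mean(np.abs(e))       # equation (30)
    return {"rmse": float(rmse), "mse": float(mse), "mae": float(mae)}

metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 6.0])
```

In this example the single large error of 2 dominates MSE and RMSE, while MAE weights all errors equally; this difference is the usual argument for or against each measure.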

2.5 Tensor

Tensors are containers used to store data of different dimensions in neural networks. A tensor can have several dimensions (axes), and each axis can hold several entries. The input time series of the LSTM neural network is usually arranged as a two-dimensional tensor (samples, features). If an input time series contains 100 samples, and each sample records the values of feature i and feature j at that time, then the tensor of this time series has shape (100, 2).

Table 2-1 Tensor table

| Rank | Example | Python output |
|---|---|---|
| 0 | scalar | S=234 |
| 1 | vector | V=[32.1,25.2,3.3] |
| 2 | matrix | M=[[1,2,3],[4,5,6],[7,8,9]] |
| 3 | 3rd order tensor | T=[[[1],[2],[3]],[[4],[5],[6]],[[7],[8],[9]]] |
| n | nth-order tensor | ...... |

TensorFlow represents data as tensors. A tensor has three key attributes: the number of axes, the shape and the data type. The number of axes is also called the rank: a 0-dimensional tensor has 0 axes, a 1-dimensional tensor has 1 axis, a 2-dimensional tensor has 2 axes, and so on. Each axis can hold several entries. The data type is the easiest to understand, for example float32, uint8 or float64. The third attribute is the shape. Given the underlying array and the shape, the tensor can be reconstructed, and the shape itself can be obtained with the shape() function. For a three-dimensional tensor the shape has three entries, each giving the number of elements along the corresponding axis. For example, shape = (2,3,2) describes a three-dimensional tensor with two elements along the first axis, three along the second and two along the third. The tensor [[[1,2], [3,4], [5,6]], [[7,8], [9,10], [11,12]]] is an example that satisfies this shape. The reshape() function can rearrange an array into a tensor of any compatible shape. If an array is defined as a = np.array([1,2,3,4,5,6,7,8,9,10,11,12]), then a = np.reshape(a, (2,3,-1)) gives [[[1,2], [3,4], [5,6]], [[7,8], [9,10], [11,12]]].
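The ranks in Table 2-1 and the reshape example can be checked with NumPy, as in the following sketch. The variable names are illustrative, and NumPy arrays are used here in place of TensorFlow tensors, which expose the same notions of rank and shape.

```python
import numpy as np

scalar = np.array(234)                                # rank 0
vector = np.array([32.1, 25.2, 3.3])                  # rank 1
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # rank 2

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# The size of the last axis is inferred from the 12 elements: 12 / (2 * 3) = 2
t3 = np.reshape(a, (2, 3, -1))                        # rank 3

ranks = [scalar.ndim, vector.ndim, matrix.ndim, t3.ndim]
shape = t3.shape
element = matrix[1, 2]  # row index 1, column index 2 (both counted from 0)
```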
In the reshape call, -1 stands for the size of the remaining (innermost) axis, which is inferred automatically from the total number of elements. The

method of accessing a two-dimensional tensor is also simple and resembles indexing a two-dimensional array. For example, if t = [[1,2,3], [4,5,6], [7,8,9]] is stored as a second-order tensor (for instance a NumPy array), the expression t[i, j] accesses any element in it, and t[1,2] returns 6. Similarly, t[i, j, k] accesses any element of a third-order tensor.

2.6 Related research work

In this paper, the LSTM model is used to model and predict industrial sensor data. The data set consists of values collected by multiple small sensors from 2016/2/18 12:28:34 to 2016/2/18 15:20:19, a continuous period of 2 hours, 51 minutes and 45 seconds, with one value collected every 0.1 seconds. The nominal number of sensor data points is 100,000. The sensor history is transformed into multiple input sequences, each containing 20 historical values; the 21st value (that is, the value 0.1 seconds later) becomes an element of the label array and serves as the theoretical output for the prediction. Each sequence has a single feature, namely the value of the surrounding environment recorded by the sensor. The historical data are split into training and test sets at ratios of 8:2 and 7:3 to fit the LSTM model, the better-performing ratio is selected for the Raspberry Pi and the laptop, and their prediction results are compared.

2.6.1 LSTM-based stock returns prediction method: Take the Chinese stock market as an example

The study used the LSTM model to predict stock returns in China, collecting the daily high, low, opening and closing prices and the Shanghai Composite Index for the Shanghai and Shenzhen stock markets from 1990 to 2015 [14]. The data were divided into a training set and a test set at a ratio of 4:1, and five sets of control experiments were set up. The results give preliminary evidence that LSTM value prediction is effective for the Chinese stock market. What that study shares with the LSTM sensor value prediction experiment in this article is that the data are divided at a 4:1 ratio and control experiments are set up; the difference is that the stock experiment is a multivariate value prediction and sets up more control experiments.

2.6.2 Travel time prediction of LSTM neural network

Travel time is one of the most important concerns for travelers. Travel time means the time spent from the beginning to the end of a journey; once recorded, it becomes historical data. Knowing historical travel times and predicting the travel time of the next trip gives valuable guidance for planning visits to attractions and choosing routes, and helps save time. The LSTM travel time prediction study constructed 66 LSTM neural network sequence prediction models for a data set covering 66 highway links [15], and selected the optimal settings for each model through training and testing. In addition, for the time consumed on each link, the travel time prediction model performs multi-step prediction, producing predictions for 1 to 5 time steps into the future. That study resembles the present one in its effort to select the best-performing combination of model parameters; the difference is that it predicts not only 1 time step ahead but also completes multi-step prediction experiments, and it concludes that the longer the prediction horizon, the greater the error.

2.6.3 Development and application of deep neural network soft sensor based on LSTM

With the popularity of wearable devices, the comfort and elasticity of soft sensors have become the main factors to consider when designing sensors for the human body or clothing. Soft sensors can accurately measure quality variables or important process variables [17]. These key quality variables usually have dynamic and non-linear characteristics and are sometimes difficult to measure. Applying the LSTM model to a soft sensor [16] makes it possible to measure variables with strong non-linearity and dynamics, and the approach is especially suitable for dynamic soft sensor modeling. This is very similar to the design concept of the research method in this article. Industrial sensor data are voluminous. Combining long short-term memory neural networks with sensors can not only estimate lost data but also predict complex data, which reduces costs.

2.6.4 Analysis of industrial IoT devices based on LSTM

The research starts from modeling and predicting the operating state of equipment using historical data of the Industrial Internet of Things, and proposes a method [18] that uses long short-term memory, the LSTM model, to predict the operating state of equipment. That study and this article are similar in that both apply the LSTM model to analyze the status of industrial Internet of Things devices, and both set up a control group. The difference is that the cited study demonstrates the superiority of LSTM for value prediction by examining differences, whereas the present study takes the good prediction performance of LSTM as its starting point and looks for similarity between the prediction results of LSTM models on large and small IoT devices, in order to show the promise of LSTM models applied to small IoT devices.

3 Methodology

The experiment is divided into three parts, as shown in Figure 3-1. First, the environment in which the model runs and the tools used are introduced. Then the LSTM neural network univariate prediction process is described, including the data set and data preprocessing. Finally, the experimental data set is fed into the trained LSTM model and the prediction results on the Raspberry Pi and the laptop are compared.

Figure 3-1 Research steps and objectives

3.1 Environment and tools

The first goal is to find tools and libraries suitable for configuring deep learning networks on the laptop and the Raspberry Pi. A review of the available information identified two Python development platforms for building neural networks: the PyCharm integrated development environment and the free Python distribution Anaconda. Installation and configuration steps were drawn up for both, both were tried on the experimental equipment, and the best environment configuration was chosen according to how the installation went.

3.1.1 Pycharm

Building a deep learning framework in PyCharm takes three main steps. First, install Python 3.0 or later. Second, install PyCharm and select the Python installed in the previous step as the

interpreter of the virtual environment that PyCharm creates automatically. In addition, install tensorflow and keras. As the back-end library for Keras, TensorFlow must be installed first in order to develop and train deep learning models. The statement import tensorflow as tf can be used to verify that TensorFlow has been installed successfully: if no error is reported, the installation succeeded; if an error is reported, a version mismatch should be considered. Finally, install the library packages needed for the experiment, such as numpy and matplotlib. Whether these packages install successfully affects the success of the experiment.

3.1.2 Anaconda

As a Python distribution, Anaconda provides package management and environment management. It ships with many versions of the packages and libraries needed to build neural networks, such as conda, python, numpy and pandas. Anaconda automatically matches versions between packages and libraries, which helps avoid environment build errors caused by version mismatches. The environment management function supports the creation of separate virtual environments for different projects. Jupyter Notebook, a web-based application for interactive computing, handles code editing, execution, display of results and saving through the browser, and supports dozens of programming languages such as Python, R and Julia. The setup is as follows: first install Anaconda; after successful installation, create a virtual environment and install tensorflow, keras and the other packages; once the installation is complete, open Jupyter Notebook to write code.

3.2 LSTM univariate value prediction process

The second goal of this experiment is to describe the LSTM univariate value prediction process. To this end, this section describes the entire process of LSTM neural network value prediction, which is divided into four steps: data set introduction, data preprocessing, data segmentation, and multi-variable experimental parameter adjustment.

Figure 3-2 LSTM univariate value prediction process

As shown in Figure 3-2, the univariate value prediction process of this experiment first introduces the sensor data set and then performs data preprocessing, which includes data cleaning and data standardization. Next comes data segmentation, whose output is a tensor and a label array. Finally, multivariate experiments are used to select the parameter combination that gives the best model performance, and the performance is evaluated.

3.2.1 Data set

The data used in the experiment are actual industrial sensor data. From 2016/2/18 12:28:34 to 2016/2/18 15:20:19, multiple small sensors collected values over a continuous period of 2 hours, 51 minutes and 45 seconds. The nominal number of sensor data points is 100,000, with one value collected every 0.1 seconds. All sensor

data is stored in the file industrial_sensor_data.csv in the form of numbers and text.

Figure 3-3 Industrial_sensor_data.csv

In Figure 3-3, the first row gives the name of each sensor, and each column holds the time-dependent values recorded by that sensor. During recording, sensors may encounter extreme weather, emergencies, man-made damage and other problems, which lead to lost and redundant data values. Because this experiment is a univariate time series forecast, only the data of one sensor needs to be selected for prediction.

3.2.2 Data preprocessing

The first step in data preprocessing is data cleaning. Data cleaning mainly consists of selecting a suitable sensor and its data from the large number of sensors, and handling redundant and duplicate data. The sensor best suited to the experiment is chosen and its values are checked for problems. Missing data can be filled in manually, while redundant and abnormal data can be deleted. In this experiment, the original data are processed to exclude sensors with many redundant values and sensors with many missing values. Any remaining missing value can be filled with the average of the values immediately before and after it. Finally, the sensor named "C01424REG403RW" is selected. A separate CSV file named Sensor_values1.csv is created to hold the timestamps and values of this sensor, as shown in Figure 3-4.
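The cleaning rules described above (delete duplicate rows, fill a gap with the average of its two neighbours) can be sketched with pandas. The miniature CSV text, its timestamp format and the `timestamp` column name are hypothetical stand-ins for the real file; linear interpolation is used here because, for a single missing value between two known values, it yields exactly their average.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for the single-sensor CSV file
csv_text = """timestamp,C01424REG403RW
2016/2/18 12:28:34.0,10.0
2016/2/18 12:28:34.1,
2016/2/18 12:28:34.2,14.0
2016/2/18 12:28:34.2,14.0
2016/2/18 12:28:34.3,15.0
"""

df = pd.read_csv(io.StringIO(csv_text))
missing_before = int(df["C01424REG403RW"].isnull().sum())

# Remove exact duplicate rows, then renumber the index
df = df.drop_duplicates().reset_index(drop=True)

# Fill the gap with the mean of the neighbouring values (linear interpolation)
df["C01424REG403RW"] = df["C01424REG403RW"].interpolate(method="linear")
missing_after = int(df["C01424REG403RW"].isnull().sum())
```

For longer runs of missing values, linear interpolation no longer equals a simple neighbour average, so such sensors are better excluded, as the text describes.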

Figure 3-4 Sensor-value1.csv

Pandas is a data processing and analysis library that provides several high-level data structures with powerful indexing and processing capabilities. pandas.read_csv(filepath_or_buffer) reads a file in CSV format directly and returns a DataFrame object, where filepath_or_buffer is the path to the file. The DataFrame, one of the main pandas data objects, is a two-dimensional tabular structure similar to an Excel spreadsheet. Its vertical series are called columns, and its horizontal rows are identified by the index. The position of a value is determined by its column and its index. The pandas.DataFrame.head function returns the first rows of a DataFrame, with the number of rows given as an argument.

Figure 3-5 DataFrame data object

Figure 3-6 Check for null values

First, pandas.read_csv reads the CSV file and returns the DataFrame object. Then pandas.DataFrame.head is used to display the contents of the data object. As shown in Figure 3-5, the index ranges from 0 to 99998, and the data consist of two columns with a total of 99999 rows. The first column shows the timestamp, and the second column holds the values recorded by the sensor named "C01424REG403RW". The file contains a total of 99999 data,
