IOP Conference Series: Earth and Environmental Science

PAPER • OPEN ACCESS

Optimization of Fatigue Detection Method under Altitude Changes in
Plateau Region Based on MTCNN
To cite this article: Sipeng Han et al 2021 IOP Conf. Ser.: Earth Environ. Sci. 692 042016


EMCEME 2020                                                                                                    IOP Publishing
IOP Conf. Series: Earth and Environmental Science 692 (2021) 042016                        doi:10.1088/1755-1315/692/4/042016

Optimization of Fatigue Detection Method under Altitude
Changes in Plateau Region Based on MTCNN

                     Sipeng Han1, Jingyang Tan1, Qianzhi Jiao1, Bo Tang1,*, Yi Luo2 and Xuguang Yang1
                     1 Electric Engineering College, Tibet Agriculture and Animal Husbandry University, Nyingchi, China
                     2 Electric Engineering College, Agriculture and Animal Husbandry College, Chengdu, China

                     *Corresponding author: tangbo@xza.edu.cn

                     Abstract. Fatigue driving is a leading cause of traffic accidents, and research on fatigue
                     driving detection algorithms is of great significance for improving road safety. This paper
                     proposes an image processing method based on an optimized MTCNN detection model:
                     median filter denoising is performed before P-Net training to improve the detection rate of
                     faces at night; the ASM algorithm is then used to detect facial feature points; and finally the
                     PERCLOS principle is used to analyze the driver's fatigue state. Experimental results show
                     that the method has a high detection rate, can be applied to fatigue detection at different
                     altitudes, and is highly practical.

                     Keywords: MTCNN, Plateau, Driving fatigue, Fatigue detection, Face recognition,
                     Convolutional neural network

1. Introduction
With the rapid development of China's economy, people's standard of living and consumption levels
have both risen greatly. The number of cars in China is constantly increasing, and the safe driving of
cars has become a problem for China and the world. Fatigue driving is now a serious hidden cause of
traffic accidents, so it is necessary to detect drivers' fatigued driving behavior and to issue reminders
and interventions according to the level of fatigue. In plateau areas, as altitude changes continuously,
some drivers experience altitude sickness, while others become fatigued and mistakenly attribute it to
altitude reaction. Fatigue detection is therefore very important for protecting the lives and property of
drivers.
   Driving fatigue is usually a combination of physical fatigue and mental fatigue. It mainly refers to
the phenomenon of impaired vision and slowed response caused by excessive driving time and
insufficient rest, or by mental disturbance and declining operating ability after driving for a long time.
When a driver enters a fatigued state, it is usually accompanied by declining judgment and slow
responses; if the driver continues to drive, incorrect operations or adjustments are likely to occur,
leading to traffic accidents. Under mild fatigue, the driver experiences delayed or poor operation; as
fatigue increases, the driver may make outright operational mistakes.

              Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
              of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
EMCEME 2020                                                                               IOP Publishing
IOP Conf. Series: Earth and Environmental Science 692 (2021) 042016   doi:10.1088/1755-1315/692/4/042016

After entering a severe fatigue state, the driver may operate unconsciously or fall into brief sleep; in
severe cases, control of the vehicle is lost entirely and the vehicle continues to move forward on its
own [1].
    Traditional fatigue detection methods mainly include detection based on the driver's physiological
characteristics, detection based on vehicle behavior characteristics, and detection based on the driver's
facial features. Methods based on physiological characteristics mainly use the electroencephalogram
(EEG), electrocardiogram (ECG), electromyogram (EMG), electro-oculogram (EOG), pulse signal,
etc. While the driver is driving, a dynamic EEG scanner, dynamic electrocardiograph, or
electromyograph records the corresponding signals, which are compared with biological human
fatigue parameter indicators to determine whether the driver is fatigued. Fan Wang et al. [2] proposed
collecting the driver's ECG and eye movement data to assess the degree of fatigue, using the eye
movement results to enhance an estimation method for driving fatigue based on heart rate variability
(HRV). Chen Jichi [3] collected drivers' EEG signals in real time, performed wavelet packet
decomposition and reconstruction, and extracted the various rhythm signals; then, by calculating the
phase lag index between leads, a connection matrix was constructed and the brain network
characteristics of each rhythm were extracted. However, in these methods the sensors that detect
physiological signals are in contact with the body. This is the main disadvantage of physiological
feature detection: it seriously affects the driver's comfort or interferes with normal driving operations.
Methods based on vehicle behavior characteristics collect and analyze information about the vehicle
itself during driving to determine whether the driver is fatigued, mainly including the vehicle's speed,
lane deviation, and steering wheel rotation range. The American Electronic Safety Product company
developed the steering wheel monitoring device S.A.M. [4], which detects the driver's fatigue state
mainly from the rotation of the steering wheel; the European Union developed the "AWAKE" system
[5], which monitors the driver's gaze direction, lane tracking, etc. in real time and then uses
multi-channel fusion technology to produce driver fatigue monitoring and warning signals. This type
of method is non-invasive, but the driver's driving habits affect the accuracy of detection. Methods
based on the driver's facial features detect fatigue from phenomena such as head drooping, body tilt,
increased blinking frequency, and yawning. Jianju Xing [6] applied a convolutional neural network to
face recognition and improved the pupil location algorithm, effectively overcoming the large
computational cost of the traditional algorithm; the fatigue recognition rate of this algorithm is 87.5%.
Zhong Wang [7] used the MTCNN model to detect face images and then performed image processing
on them in three steps (grayscale processing, binarization, and human eye detection), but this method
is not applicable in high-altitude areas or at night. Weihuang Liu [8] proposed a driver fatigue
detection algorithm based on a two-stream network model with multiple facial features, achieving an
accuracy of 97.06% on the National Tsing Hua University Driver Drowsiness Detection (NTHU-DDD)
dataset.
    Traditional fatigue detection methods are not applicable under different altitude changes and at
night, and they are not robust to external conditions such as partial occlusion and lighting changes.
This paper presents a fatigue driving detection algorithm based on an optimized MTCNN, selects
representative fatigue driving indicators such as the degree of eye closure, the degree of mouth
opening, and head posture, and finally analyzes the driver's fatigue state using the PERCLOS principle.

2. METHOD
First, the driver's facial image is acquired through the camera, and then the improved MTCNN model
is used to detect the driver's image. A median filter is applied for denoising before P-Net detection to
avoid excessive noise in night scenes and improve the detection accuracy. Next, the key points of the
face are detected in the collected images, and the aspect ratios of the eyes and the mouth are computed.
Finally, the PERCLOS principle is used to analyze and judge the fatigue state of the driver.
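As a sketch of the denoising step, the pre-detection median filter can be written in plain NumPy; the paper does not specify the kernel size, so the 3×3 window below is an assumption:

```python
import numpy as np

def median_denoise(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Apply a k x k median filter with edge-replication padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # The median of the neighborhood suppresses isolated noise pixels.
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# A single salt-noise pixel on a flat gray patch is removed by the filter.
noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_denoise(noisy)
print(clean[2, 2])  # 100
```

In practice an optimized routine such as OpenCV's `cv2.medianBlur` would be applied to each camera frame before it enters P-Net.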


                                      [Figure 1. The proposed method framework: camera face image acquisition → improved MTCNN face detection → face key point detection → eye and mouth fatigue characteristics → PERCLOS determination of fatigue → early warning.]

2.1. Face Detection
Face detection uses Multi-task Cascaded Convolutional Networks (MTCNN) [9], a deep learning
algorithm that jointly detects face bounding boxes and face key points; it is currently one of the best
algorithms in the field of face detection and face key point positioning. To avoid the problem of
excessive noise when detecting at night, median filtering is performed before face detection starts. The
image is then transformed at different scales and, after the image pyramid is constructed, input to three
cascaded network structures, P-Net, R-Net, and O-Net [10], as shown in Figure 2.
                             [Figure 2. Structure diagram of the three cascaded networks: (a) P-Net, input 12×12×3, outputs 1×1×2 face classification, 1×1×4 bounding box regression, and 1×1×10 facial landmark localization; (b) R-Net, input 24×24×3, with a 128-unit fully connected layer and outputs of sizes 2, 4, and 10; (c) O-Net, input 48×48×3, with a 256-unit fully connected layer and outputs of sizes 2, 4, and 10.]
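To make the pyramid step concrete, the sketch below computes the scale factors for an MTCNN-style image pyramid. The minimum face size (20 px) and the shrink factor (0.709) are common defaults in public MTCNN implementations, not values stated in this paper:

```python
def pyramid_scales(width: int, height: int, min_face: int = 20,
                   factor: float = 0.709) -> list:
    # The first scale maps a face of `min_face` pixels onto P-Net's 12x12
    # input; each subsequent scale shrinks the image by `factor` until the
    # shorter side would fall below 12 pixels.
    m = 12.0 / min_face
    min_side = min(width, height) * m
    scales = []
    while min_side >= 12:
        scales.append(m * factor ** len(scales))
        min_side *= factor
    return scales

# For a 200x200 frame this yields a decreasing list of scales starting at 0.6.
scales = pyramid_scales(200, 200)
print(len(scales))  # 7
```

Each scaled copy of the frame is passed through P-Net, so smaller scales let the fixed 12×12 window cover larger faces.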

   Face classification uses the cross-entropy cost function directly, while bounding box regression and
key point localization use Euclidean losses. Finally, the losses of these three parts are multiplied by
their respective weights and summed to form the total loss function.


   Face classification loss function:

$$L_i^{det} = -\left(y_i^{det}\log(p_i) + (1 - y_i^{det})\log(1 - p_i)\right) \tag{1}$$

   where $p_i$ is the probability that sample $x_i$ is a face and $y_i^{det} \in \{0, 1\}$ is the ground-truth label. If $x_i$
is a face, then $y_i^{det} = 1$ and $L_i^{det} = -\log(p_i)$; conversely, if $x_i$ is not a face, then $y_i^{det} = 0$ and
$L_i^{det} = -\log(1 - p_i)$.
    The loss function of the regression box: face key point detection is similar to the bounding box
regression task, and the key point coordinates of the face are obtained by minimizing the Euclidean
distance, see formula (2):

$$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2 \tag{2}$$

   where $\hat{y}_i^{landmark}$ is the predicted key point coordinate vector and $y_i^{landmark} \in \mathbb{R}^{10}$ is the ground
truth; the key points comprise the eyes, the nose, and the two corners of the mouth.
   Total loss function:

$$\min \sum_{i=1}^{n} \sum_{j \in \{det,\,box,\,landmark\}} \alpha_j\, \beta_i^j\, L_i^j \tag{3}$$

   where $n$ is the number of training samples, $\alpha_j$ is the weight of each task, $\beta_i^j \in \{0, 1\}$ is the
sample-type indicator, and $L_i^j$ is the corresponding cross-entropy or Euclidean loss. In P-Net and
R-Net, $\alpha_{det} = 1$, $\alpha_{box} = 0.5$, $\alpha_{landmark} = 0.5$; to achieve high-precision face key point localization,
O-Net uses $\alpha_{det} = 1$, $\alpha_{box} = 0.5$, $\alpha_{landmark} = 1$.
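The three losses and their weighting can be sketched for a single O-Net training sample as follows (an illustrative reimplementation of Eqs. (1)-(3), not the authors' training code; the β switches select which tasks apply to a given sample type):

```python
import math

def face_cls_loss(p: float, y: int) -> float:
    # Eq. (1): cross-entropy for the face / non-face label y in {0, 1}.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def l2_loss(pred, gt) -> float:
    # Eq. (2): squared Euclidean distance for box / landmark regression.
    return sum((a - b) ** 2 for a, b in zip(pred, gt))

# Task weights alpha_j for O-Net, as given in the text.
ALPHA = {"det": 1.0, "box": 0.5, "landmark": 1.0}

def sample_loss(p, y, box_pred, box_gt, lm_pred, lm_gt, beta):
    # Eq. (3) for one sample: beta[j] in {0, 1} turns each task on or off.
    return (ALPHA["det"] * beta["det"] * face_cls_loss(p, y)
            + ALPHA["box"] * beta["box"] * l2_loss(box_pred, box_gt)
            + ALPHA["landmark"] * beta["landmark"] * l2_loss(lm_pred, lm_gt))

# A face sample predicted with p = 0.9 and perfect box regression,
# with the landmark task switched off: the loss reduces to -log(0.9).
loss = sample_loss(0.9, 1, [0, 0, 1, 1], [0, 0, 1, 1],
                   [0.0] * 10, [0.0] * 10,
                   beta={"det": 1, "box": 1, "landmark": 0})
print(round(loss, 4))  # 0.1054
```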

2.2. Human eye and mouth feature point detection

    [Figure 3. Face detection and punctuation of key points: (a) face detection pipeline (image → grayscale image → gray histogram → brightness balance → binarization → hair removal → labelled image); (b) punctuation of key points.]

   The Active Shape Model (ASM) has good robustness [11]. It describes the shape of a target with
several feature points; for similar targets with similar shapes, the specific position of each point can be
adjusted starting from the average shape obtained through statistical analysis, yielding a vector that
describes the specific shape of the object. This process alternates between the face shape model and
the local gray-scale model, and the output shape gradually converges. Starting from the facial feature
points obtained by the MTCNN algorithm, the edge positions of the eyes and mouth can be located
accurately. The result is shown in Figure 3.
    When the eyes open and close, the relative positions of the six eye feature points change
considerably, especially the vertical coordinates of the pairs P2/P6 and P3/P5, so the current state of
the eye can be extracted clearly from the changes in the relative distances between these points. The
camera extracts the driver's facial contour shape vector in each frame, and the eye aspect ratio is
calculated from specific points of the shape vector for blink counting:

$$EAR = \frac{\|P_2 - P_6\| + \|P_3 - P_5\|}{2\,\|P_1 - P_4\|} \tag{4}$$

    where P1-P6 are the two-dimensional coordinate vectors of the eye feature points; the numerator
measures the vertical distance between the upper and lower eyelids, and the denominator measures the
horizontal distance between the corners of the eye. To eliminate interference from the driver's sitting
posture, this paper calculates the EAR of the left eye and the right eye simultaneously and uses their
average as the final EAR value.
    The aspect ratio of the mouth is calculated similarly:

$$MAR = \frac{\|M_2 - M_8\| + \|M_3 - M_7\| + \|M_4 - M_6\|}{3\,\|M_1 - M_5\|} \tag{5}$$

   where M1-M8 are the two-dimensional coordinate vectors of the inner contour points of the mouth;
the numerator measures the vertical distances between the upper and lower lips, and the denominator
measures the horizontal distance between the corners of the mouth. To eliminate errors caused by the
driver's posture, this paper likewise calculates the vertical distances of the three pairs of feature points
and averages them to obtain the MAR value.
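Equations (4) and (5) translate directly into code; the sketch below uses hypothetical landmark coordinates solely to illustrate the arithmetic:

```python
import math

def dist(a, b):
    # Euclidean distance between two 2-D landmark points.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def ear(p):
    # Eq. (4): p = [P1..P6]; vertical eyelid gaps over eye-corner width.
    return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2 * dist(p[0], p[3]))

def mar(m):
    # Eq. (5): m = [M1..M8]; three vertical lip gaps over mouth width.
    return (dist(m[1], m[7]) + dist(m[2], m[6]) + dist(m[3], m[5])) \
        / (3 * dist(m[0], m[4]))

# Hypothetical landmarks for an open eye and an open mouth.
eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]                  # P1..P6
mouth = [(0, 0), (1, 1), (3, 1), (5, 1), (6, 0), (5, -1), (3, -1), (1, -1)]  # M1..M8
print(ear(eye))              # 0.5
print(round(mar(mouth), 3))  # 0.333
```

A per-frame pipeline would average the left- and right-eye EAR, as the paper does, before comparing the result against a blink threshold.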

2.3. Verify the effectiveness of the algorithm
To verify the effectiveness of the algorithm, this paper establishes a fatigue detection data set, using a
camera to collect multiple test videos of 6 testers, both fatigued and normal, at different altitudes and
under different lighting conditions. The optimized MTCNN model is first used for face detection,
followed by image processing. The data set for the MTCNN face detection model comes from
multiple public face data sets; face images with different poses, expressions, and lighting are selected
in proportion to form a new data set used to verify the accuracy of eye and mouth feature point
recognition in this paper. The experiments implement the network in the TensorFlow deep learning
framework.

                                  Table 1. Test results of this method.
   ID        Altitude (m)           Number of faces           Identification number         Accuracy
    1           3000                      36                             33                   91.7%
    2           3000                      42                             40                   95.2%
    3           4102                      23                             21                   91.3%
    4           4102                      53                             49                   92.5%
    5           1265                      28                             26                   92.9%
    6           1265                      38                             35                   92.1%


   It can be seen from Table 1 that the recognition accuracy of the proposed method is above 91%,
with an average accuracy close to 93%. This further illustrates the robustness of the proposed
detection method and demonstrates the feasibility and validity of the system's fatigue judgment for the
driver.

3. DRIVER FATIGUE STATE JUDGMENT BASED ON PERCLOS
PERCLOS [12] (Percentage of Eyelid Closure over the Pupil over Time) is currently a recognized
standard for judging a driver's fatigue state in the research field of driver fatigue detection systems.
The PERCLOS value is the percentage of time the eyes are closed within a given period: the longer
the eyes are closed in that period, the greater the PERCLOS value and the more serious the driver's
fatigue.
    PERCLOS currently has three criteria: P70, P80, and EM. Under the P70 and P80 criteria, the eye
is judged closed when the eyelid covers 70% or 80% of the pupil area, respectively, and the proportion
of time the eye is closed within a period is then calculated. Under the EM criterion, the eye is judged
closed when the eyelid covers half of the pupil area.
$$P = \frac{t_3 - t_2}{t_4 - t_1} \times 100\% \tag{6}$$

    In the formula, P is the ratio of the number of frames in which the eyes are closed within a certain
period to the total number of frames in the collected video stream. t1 is the time taken for the eye to go
from fully open to 20% closed, t2 the time to go from fully open to 80% closed, t3 the time to go from
fully open to fully closed and then reopen by 20%, and t4 the time to go from fully open to fully closed
and then reopen to 80%.
    Each frame of image data collected by the system is classified into one of two states: eyes open or
eyes closed. The algorithm proposed in this paper can judge not only the state of each frame but also
whether the driver is currently fatigued. In this paper, if P is greater than or equal to 90%, the driver is
judged to be in a severe fatigue state; if P is between 60% and 90%, in a mild fatigue state; and if P is
less than 60%, in a normal state.
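Assuming the eye state has already been classified per frame, the frame-ratio form of PERCLOS and the paper's three thresholds can be sketched as:

```python
def perclos(closed_frames: int, total_frames: int) -> float:
    # Percentage of frames within the window where the eye is judged closed.
    return closed_frames / total_frames * 100

def fatigue_state(p: float) -> str:
    # Thresholds from Section 3: >= 90% severe, 60-90% mild, < 60% normal.
    if p >= 90:
        return "severe fatigue"
    if p >= 60:
        return "mild fatigue"
    return "normal"

# 75 closed frames out of 100 gives P = 75%, a mild fatigue state.
print(fatigue_state(perclos(75, 100)))  # mild fatigue
```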

4. FATIGUE THRESHOLD SELECTION AND TESTING
Testers suited to the experiment were selected at different altitudes for fatigue testing. This paper
selects 10 testers and collects real fatigue videos at midnight (the first 5 at an altitude of 4000 m, the
last 5 at 3000 m). The fatigue detections in each video are counted and recorded in Table 2:

                                         Table 2. Algorithm test.
  Personnel number        Fatigue detection times          Correct detection times     Detection accuracy
          1                        154                              145                     94.15%
          2                        145                              136                     93.79%
          3                        165                              155                     93.93%
          4                        126                              118                     94.75%
          5                        115                              109                     94.78%
          6                         86                               81                     94.18%
          7                        103                               97                     94.17%
          8                         65                               61                     93.85%
          9                         95                               89                     93.68%
         10                         73                               63                     93.15%


  Experimental results show that the algorithm can also be applied in different altitude environments,
maintaining an accuracy rate above 93%.

5. CONCLUSION
This paper proposes an optimized face detection algorithm based on the MTCNN model and uses
ASM to detect eye and mouth feature points. Finally, the algorithm is verified at night on multiple
videos according to the PERCLOS criterion. Experimental results show that the algorithm is effective,
with an accuracy of about 93%.

Acknowledgments
This work was supported by The National Natural Science Foundation of China (Grant No. 51667017)
and the undergraduate innovation experiment project of Tibet College of Agriculture and Animal
Husbandry.

References
[1] Sheng Yingchao. Research and Implementation of Fatigue Driving Detection System Based on
         Eye Features. Diss. 2019.
[2] Wang, Fan , et al. "Estimating Driving Fatigue at a Plateau Area with Frequent and Rapid
         Altitude Change." Sensors 19.22 (2019): 4982.
[3] Chen Jichi, Wang Hong. A Study on Drowsy Driving State Based on EEG Signals. [J].
         Automotive engineering, 2018, 40 (05): 515-520.
[4] Vankayalapati, H. D., K. R. Anne, and K. Kyamakya. Extraction of Visual and Acoustic
         Features of the Driver for Monitoring Driver Ergonomics Applied to Extended Driver
         Assistance Systems. Data and Mobility. Springer Berlin Heidelberg, 2010.
[5] Takei, Y., and Y. Furukawa. "Estimate of driver's fatigue through steering motion." Systems,
         Man and Cybernetics, 2005 IEEE International Conference on IEEE, 2006.
[6] Xing, Jianju, et al. "Application of Face Recognition Based on CNN in Fatigue Driving
         Detection." the 2019 International Conference 2019.
[7] Wang, Zhong, P. Shi, and C. Wu. "A Fatigue Driving Detection Method based on Deep
         Learning and Image Processing." Journal of Physics: Conference Series 1575.1 (2020):
         012035 (6pp).
[8] Liu, Weihuang, et al. "Convolutional Two-Stream Network Using Multi-Facial Feature Fusion
         for Driver Fatigue Detection." Future Internet 11.5 (2019): 115.
[9] Zhang K, Zhang Z, Li Z, et al. Joint Face Detection and Alignment Using Multitask Cascaded
         Convolutional Networks [J]. IEEE Signal Processing Letters, 2016, 23 (10): 1499-1503.
[10] Li Qingchen. Design of fatigue Driving detection System based on Facial Feature. Diss. 2019.
[11] Islam, Rafiul, et al. "Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide
         implicates heterogeneity." Scientific Reports 10.1 (2020).
[12] Thropp, Jennifer E., J. F. V. Scallon, and P. Buza. "PERCLOS as an Indicator of Slow-Onset
         Hypoxia in Aviation." Aerospace Medicine & Human Performance 89.8 (2018): 700.
