Forecasting And Prediction Of Air Pollution Levels To Protect Human Beings From Health Hazards

INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 01, JANUARY 2020                               ISSN 2277-8616

                                              S. Suganya, Professor Dr. T. Meyyappan

Abstract: Prevention and control of air pollution have become essential activities in many cities. Industries and heavy
vehicular traffic pollute city air to unacceptable levels, which affects human health to a great extent. Forecasting,
predicting and controlling air pollution is the need of the hour to protect human beings from health hazards. Air pollution poses
threats not only to humans but also to all flora and fauna. The prime objective of this paper is to propose a new method to
predict air pollution using data collected on a monthly basis and to provide recommendations to prevent and control air pollution.
This research work comprises two phases. The first phase preprocesses the chosen dataset using Python code. The
second phase analyzes the preprocessed data to predict air pollution levels. A Kaggle dataset containing monthly air pollution data
collected over the period 2000 to 2010 is subjected to the proposed method. Predictions for a future month are made by
computing the Air Quality Index (AQI) metric and a threshold value computed for the previous two months. The proposed method
shows acceptable accuracy.

Index Terms: Air Pollution, Air Quality Index, Analysis, Pollution Forecasting, Prediction, Prevention, Control.
                                        ——————————  ——————————

S. SUGANYA is currently pursuing Ph.D. in Department of Computer Science in Alagappa University, India, PH-9786383278. E-mail: suganyasudhakar04@gmail.com
DR. T. MEYYAPPAN is currently working as a Professor in Department of Computer Science in Alagappa University, India, E-mail: meyyappant@alagappauniversity.ac.in

1. INTRODUCTION
Air pollution occurs when harmful or excessive quantities of substances, including gases, are present in the air. They
may cause severe health problems, diseases, allergies and even death in humans. Air pollution plays a significant role in
weakening the health of the skin, eyes and other human organs, which reduces human lifespan. Flora and fauna are also
affected by polluted air. The healthy life of future generations is under threat from ever-increasing air pollution at the global
level due to industrialisation and the increasing use of petroleum products. These problems motivated this research on air
pollution data for accurate and early prediction. Storage, processing and analysis of the pollution data set using traditional
techniques is complex due to its huge volume. Hence, there is a need to move to Big Data analytic techniques. In Big Data
techniques, HDFS and HBase can store high volumes of data. Hive and Pig can process both semi-structured and unstructured
data. MapReduce can be used to analyze the dataset. The proposed work analyzes the air pollution data set to predict air
pollution for all the months of any future year. The results obtained can be used by policy makers to control and prevent air
pollution in future. In this proposed work the most dangerous air pollutants considered are Carbon Monoxide (CO), Nitrogen
Dioxide (NO2), Sulphur Dioxide (SO2) and Ozone (O3). The main sources of these pollutants are vehicles, traffic, smoke, burning
plastics, burning electronic waste, and industries. When human beings inhale the polluted air, these pollutants get into their
blood and cause dangerous diseases. Children and aged people are affected at a faster rate by the polluted air.
The Air Quality Index (AQI) is a single number used to characterize the quality of the air at a particular time in a given
location. The proposed research work predicts the air pollution for a future date using the AQI of past months.

2 RELATED WORKS
A lot of work on the learning, analysis and prediction of air pollution, as well as on forecasting future trends, is found in the
literature. The following are four active researchers in this field: Polaiah Bojja, Yi-Ting Tsai, Ranjana Waman Gore, and
Ling Wang. Polaiah Bojja[6] has studied Artificial Neural Networks (ANN), Fuzzy Logic Controller, Pollution Forecasting,
Ecosystem, and the Effect of PM10 and SO2. Accuracy of measurement is ensured by evaluating the minimum forecasting error
using MATLAB software. The level of air pollution due to the increase in the number of vehicles in India and Andhra Pradesh is
determined using these methods with MATLAB coding. The soft computing approaches, a Feed-forward Back Propagation Network
(BPN) model and a Mamdani Fuzzy Inference model, are trained and tested using five years of past (meteorological) data.
Yi-Ting Tsai[7] proposed an approach to forecast PM2.5 concentration using RNN (Recurrent Neural Network) with LSTM (Long
Short-Term Memory). The training data used in the network is retrieved from the EPA (Environmental Protection Administration)
of Taiwan for the years 2012 to 2016 and is combined into 20-dimensional data; the forecasting test data is from the year
2017. Experiments are conducted to evaluate the forecast value of PM2.5 concentration for the next four hours at 66 stations
around Taiwan. Keras, which is a high-level neural networks API written in Python, is exploited in their
research work. Ranjana Waman Gore[8] analyzed how air
pollution affects people. The classification levels based on the Air Quality
Index (AQI) are good, moderate, unhealthy for sensitive
groups, unhealthy, and very unhealthy. That paper
focused on analysis of air based on the available data of
various air pollutants such as NO2, SO2, CO and O3 with
corresponding AQI values. Naïve Bayes and Decision tree J48
algorithms are adopted for predicting the health concern. Ling
Wang[9] proposed a model named "Prediction of Air Pollution
Based on FCM-HMM Multi-model". It analyzes the
relationships between the air pollution index (API) and
meteorological factors using correlation analysis and principal
component regression. A multi-model framework is constructed with
FCM-HMM clustering and TS fuzzy inference. First, the fuzzy c-means
clustering (FCM) algorithm is adopted to obtain the
initial clusters of the observation sequences used as a tool for
the prediction of the air pollution index. Compared to nonlinear
regression, gray model and ANN, HMM offers a powerful
framework for temporal modeling of features extracted from
time series data. The proposed strategy derives a predictive
model to predict air pollution index values in urban areas. The multi-model
method based on FCM-HMM is implemented. Many
researchers have adopted clustering, the fuzzy c-means
clustering algorithm, Hidden Markov Models, and observation
sequence generation based on PCA methods. Many
researchers have analyzed and monitored the air quality and
air pollution in Delhi, Agra and the USA. In our work, we have
introduced a new formula to compute the Air Quality Index. The
new AQI formula uses weightages of 30%, 30%, 20% and 20%
for NO2, CO, SO2 and O3 respectively. The air pollution for a given
period has a relationship with the past months' air pollution levels.
Hence, AQI for the future month is computed based on the
AQI values of the past two months.

3 METHODOLOGY

3.1 Data Set Used
The proposed research work uses an air pollution data set
downloaded from the Kaggle website. This data set contains data
with four attributes that were collected during the years 2000
to 2010. The air pollutant parameters are Air Quality Index (AQI)
values of NO2, SO2, CO, and O3. The data set is preprocessed and
analysed using Big Data analytic techniques to predict air
pollution on a monthly basis. Python code is used to
implement the techniques adopted. Sample data from the
data set is shown in Fig 1.

3.2 Process Flow in the Proposed Work:

Fig. 2. The Steps in the Proposed Air Pollution Prediction Model

Steps in Analysis:
     1. In the preprocessing step, the parameters NO2, CO, SO2, O3 are separated on a monthly basis.
     2. The following new formula is computed to find the value of the Air Quality Index for every month in a year:
           AQI = 0.3 * NO2 + 0.3 * CO + 0.2 * SO2 + 0.2 * O3
     3. The average of the AQI values over the 24 hours of a day is computed for each day.
     4. The average of the AQI values of all the days in the chosen month is computed:
           Threshold $T = \frac{1}{n} \sum_{i=1}^{n} \mathrm{AQI}_i$, where n is the no. of days in the month
     5. Compute the difference between the threshold and the actual AQI values of all the days in the chosen month:
           $\mathrm{DIFF}_i = T - \mathrm{AQI}_i$ (i = 1, 2, ..., n), where n is the no. of days in the month
           If DIFF_i < 0 then
              Air pollution is present
           Else
              Air pollution is not present
     6. Stop
Classification Accuracy Rate and Error Rate for the month are
computed using the confusion matrix as follows:
Classification Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)
Classification Error Rate = (FP + FN) / (TP + TN + FP + FN)
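
The analysis steps listed above (weighted AQI, daily averaging, monthly threshold, and the sign test on the difference) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout of a reading, and the sample numbers are all hypothetical, and only the weights and the decision rule are taken from the text.

```python
from statistics import mean

# Weights from the new AQI formula in the text (NO2, CO, SO2, O3).
WEIGHTS = {"NO2": 0.3, "CO": 0.3, "SO2": 0.2, "O3": 0.2}


def weighted_aqi(reading):
    """Step 2: combine the four pollutant AQI values into one number."""
    return sum(WEIGHTS[p] * reading[p] for p in WEIGHTS)


def daily_aqi(hourly_readings):
    """Step 3: average the weighted AQI over the readings of one day."""
    return mean(weighted_aqi(r) for r in hourly_readings)


def classify_month(daily_values):
    """Steps 4-5: T is the monthly mean; DIFF_i = T - AQI_i."""
    threshold = mean(daily_values)
    # DIFF_i < 0 means the day's AQI exceeds the threshold: pollution present.
    flags = [(threshold - aqi) < 0 for aqi in daily_values]
    return threshold, flags


# Hypothetical readings, invented purely for illustration.
day = [
    {"NO2": 10, "CO": 5, "SO2": 2, "O3": 8},
    {"NO2": 20, "CO": 5, "SO2": 2, "O3": 8},
]
threshold, flags = classify_month([3.0, 5.0, 4.0])
```

With these made-up values, only the second day lies above the monthly mean, so it is the only day flagged as polluted. A day whose AQI equals the threshold gives DIFF = 0 and is therefore treated as not polluted under the rule in step 5.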

Fig. 1. Example Dataset

Aggregation of Air Quality Index:

Fig. 3. AQI Aggregation Index

The proposed work is carried out in two phases. Phase I
computes the AQI (Air Quality Index) value which is used in Phase
II to make the prediction. Phase II makes the prediction based on the
AQI values and the computed Threshold value.

Phase I
           Step 1: Pre-processing
           Step 2: AQI Value Computation

Phase II
           Step 1: Analysis of New AQI
           Step 2: Threshold Computation
           Step 3: Prediction

PHASE I
Pre-Processing and AQI Value Calculation
In this proposed work, data sets for the years 2000 to 2010 are
collected from the Kaggle website and preprocessed using big
data analytics and Python code. After preprocessing, AQI
values of NO2, CO, SO2, O3 are computed on a monthly basis.
Then the New AQI is calculated using formula (1) for every
month in the years from 2000 to 2010.

New AQI = 0.3 * NO2 + 0.3 * CO + 0.2 * SO2 + 0.2 * O3          (1)

PHASE II
Analysis of New AQI
After calculating the New AQI for every month, air pollution is
predicted for every month using the AQI values of the previous
two months. For example, to make a prediction for the month of
April, the AQI values of the previous two months, February and
March, are taken and the average of these two months' AQI
values is calculated.

Threshold Calculation
The Threshold value is computed as the average of all the
AQI values of the chosen month.

Prediction
Prediction for the chosen month is made by comparing AQI values
with the Threshold value. The Threshold value, computed from the
actual AQI values of the chosen month, is compared with each of the
average values of the previous two months' AQI. An average value
less than the Threshold value indicates the absence of air pollution.
An average value greater than or equal to the Threshold value
indicates the presence of air pollution. The process is repeated
for all the months in a year.
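
The Phase II rule described above can be sketched as a short Python function. It is a hedged illustration rather than the authors' code: the function name, argument names and sample values are hypothetical, and it assumes that the days of the two earlier months are paired by day number (`zip` silently stops at the shorter month, which matters because February, March and April differ in length).

```python
from statistics import mean


def predict_month(actual_aqi, prev_month_1, prev_month_2):
    """Return (threshold, predictions); True means pollution is predicted."""
    threshold = mean(actual_aqi)  # T: average AQI of the chosen month
    predictions = []
    for a1, a2 in zip(prev_month_1, prev_month_2):
        average = (a1 + a2) / 2  # A: mean of the two earlier months for this day
        predictions.append(average >= threshold)  # A >= T: pollution present
    return threshold, predictions


# Hypothetical three-day example, invented for illustration only.
april = [4.0, 5.0, 3.0]
february = [3.0, 5.0, 4.0]
march = [4.0, 5.0, 4.0]
threshold, predictions = predict_month(april, february, march)
```

Here the threshold is 4.0 and the per-day averages are 3.5, 5.0 and 4.0, so the first day is predicted clean and the other two are predicted polluted; the third day shows the "greater than or equal" boundary case.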

4 RESULT AND DISCUSSION

Table 1 shows the difference between calculated and actual
AQI values for a given month. The threshold value is 4.38,
which is computed from the actual AQI values. The
threshold value is compared with each average value of the
days in the previous two months. Either the presence or absence
of air pollution is determined based on the difference value
compared with the threshold value.

TABLE 1
PREDICTION OF AIR POLLUTION FOR A MONTH BASED ON AQI VALUE
THRESHOLD T = 4.380556 (Average for the month of April)

Day    AQI Actual Value (April)    February    March    Average (A)    Difference (T - A)    Prediction If (T-A) >=0 or (T-A)

It is evident from the data shown in Table 1 that, for the given
month (for example April), the Difference value is found to be
less than the Threshold value for 10 days. Hence, it is
concluded that air pollution is not present on those days
and is present for the remaining 20 days.

Confusion Matrix
The Confusion Matrix has the information on actual class and
predicted class. Performance of this proposed work is
evaluated using the data in the matrix.

Table 2. Actual Class and Predicted Class - Illustration

                         Actual TRUE    Actual FALSE
    Predicted TRUE           14              8
    Predicted FALSE           4              4

Accuracy Rate:
Accuracy Rate is the proportion of the total number of
predictions that are correct. It is determined by the following
equation.
Classification Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)
Accuracy Rate for the month of April = (14 + 4) / 30 = 18/30 = 0.6

Error Rate:
Error Rate is the proportion of the total number of predictions
that are incorrect. It is determined by the following equation.
Classification Error Rate = (FP + FN) / (TP + TN + FP + FN)
Error Rate for the same month = (8 + 4) / 30 = 12/30 = 0.4
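
The two rates can be checked with a few lines of Python. The sketch below uses the standard definitions, accuracy = (TP + TN) / total and error = (FP + FN) / total, and reads the four counts from Table 2 on the assumption that rows are the predicted class and columns the actual class; the function name is hypothetical.

```python
def accuracy_and_error(tp, tn, fp, fn):
    """Return (accuracy, error) from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total  # share of correct predictions
    error = (fp + fn) / total     # share of incorrect predictions
    return accuracy, error


# Counts from Table 2: predicted TRUE row = (14, 8), predicted FALSE row = (4, 4).
acc, err = accuracy_and_error(tp=14, tn=4, fp=8, fn=4)
print(acc, err)  # prints 0.6 0.4
```

The two rates always sum to 1, which gives a quick sanity check on any hand calculation from the matrix.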

5 CONCLUSION
Air pollution is dangerous for nature as well as for human
beings. Prediction and remedial action are the need of the
hour. In this research work, the data set chosen from the Kaggle
website is first preprocessed to separate the pollutant parameters
NO2, CO, SO2 and O3. The prediction of air pollution is performed
in two phases. The first phase computes AQI (Air Quality
Index) values for all the days in a month. The second phase
computes the threshold value of AQI as an average of previous
months' average AQI values. Air pollution for the days in the
chosen month is predicted by comparing the threshold value
with the average of the previous two months' values. Big data
analytics is used to handle huge data volumes and Python
code is used to implement the computational procedures.
Prediction accuracy and error rate are computed. The results
are found to be encouraging. Further research work is in
progress to include other environmental parameters.

ACKNOWLEDGMENT
This article has been written with the financial support of the
RUSA-Phase 2.0 grant sanctioned vide Letter NO.F,24-51/2014-U,
Policy (TN Multi-Gen), Dept of Edn. Govt of India,
Dt. 09.10.2018

REFERENCES
[1] https://en.wikipedia.org/wiki/Air_pollution.
[2] Shweta Taneja, Dr. Nidhi Sharma, "Predicting Trends in air
     pollution in Delhi using data mining", 2016 IEEE.
[3] Peijiang Zhao, Koji Zettsu, "Convolution Recurrent Neural
     Networks Based Dynamic Transboundary Air Pollution
     Prediction", 2019 the 4th IEEE International Conference
     on Big Data Analytics.
[4] HOW can affect the human being atmospheric And
     environment pollution.
[5] https://en.wikipedia.org/wiki/Air_pollution.
[6] Polaiah Bojja, Vivith Kumar Karumuri, "Development and
     Evaluation of Pollution Forecasting Model Using Soft-
     Computing Methods for PM10 and SO2 in Ambient Air",
     IEEE WiSPNET 2016 conference.
[7] Yi-Ting Tsai, Dept. of Computer Science and Information
     Engineering, National Taipei University, "Air pollution
     forecasting using RNN with LSTM", 2018 IEEE 16th Int.
     Conference.
[8] Ranjana Waman Gore, "An Approach for Classification of
     Health Risks Based on Air Quality Levels", 978-1-5090-
     4264-7/17/$31.00 ©2017 IEEE.
[9] Ling Wang, "Prediction of Air Pollution Based on FCM-HMM
     Multi-model", Proceedings of the 35th Chinese Control
     Conference, July 27-29, 2016, Chengdu, China.
