AIREX: Neural Network-based Approach for Air Quality Inference in Unmonitored Cities
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
AIREX: Neural Network-based Approach for Air Quality Inference in Unmonitored Cities Yuya Sasaki1 , Kei Harada1 , Shohei Yamasaki1 and Makoto Onizuka1 1 Osaka university {sasaki, harada.kei, yamasaki.shohei, onizuka}@ist.osaka-u.ac.jp arXiv:2108.07120v1 [cs.LG] 16 Aug 2021 Abstract The existing methods aim to infer air quality of target loca- tions only in monitored cities (i.e., cities with monitoring sta- Urban air pollution is a major environmental prob- tions). They do not capture the difference of features in cities, lem affecting human health and quality of life. which causes the low accuracy of air quality inference. Since Monitoring stations have been established to con- not all cities have monitoring stations, we cannot accurately tinuously obtain air quality information, but they do infer the air quality of target locations within the unmonitored not cover all areas. Thus, there are numerous meth- cities. ods for spatially fine-grained air quality inference. Since existing methods aim to infer air quality of Problem definition and challenges: We study a new prob- locations only in monitored cities, they do not as- lem, air quality inference in unmonitored cities, to globally sume inferring air quality in unmonitored cities. solve the urban air pollution problem. A straightforward ap- In this paper, we first study the air quality infer- proach for the problem is the use of existing models that are ence in unmonitored cities. To accurately infer trained by air quality data of cities in the vicinity of the target air quality in unmonitored cities, we propose a unmonitored city. However, even the state-of-the-art method neural network-based approach AIREX. The nov- ADAIN [Cheng et al., 2018] deteriorates the inference accu- elty of AIREX is employing a mixture-of-experts racy in unmonitored cities, even when using air quality data approach, which is a machine learning technique of numerous monitored cities as training data (see Table 1 in based on the divide-and-conquer principle, to learn experiments section). correlations of air quality between multiple cities. Therefore, we need a new neural network architecture in To further boost the performance, it employs atten- this problem. We face two challenges: (1) how to design tion mechanisms to compute impacts of air qual- a neural network architecture to capture the correlations of ity inference from the monitored cities to the loca- air quality between monitored and unmonitored cities and (2) tions in the unmonitored city. We show, through how to train models without available air quality data of the experiments on a real-world air quality dataset, that unmonitored cities. For the first challenge, since features of AIREX achieves higher accuracy than state-of-the- cities differ, architectures must capture their differences and art methods. reflect them in the inference of air quality. It is difficult to select optimal monitoring stations for model training due to the absence of air quality data in unmonitored cities. For the 1 Introduction second challenge, since we do not have air quality data of Urban air pollution poses a severe and global problem. The the unmonitored city, architectures must be trained only by fine-grained assessment of urban air quality is crucial for both using air quality data of monitored cities and external data. the governments and citizens to establish means to improve We cannot directly learn the correlations between monitored human health and quality of life. Monitoring stations have and unmonitored cities. been established in numerous cities to continuously obtain air In summery, we require a new neural network architecture quality information. However, due to high construction and that (1) can automatically capture the correlation between management costs, monitoring stations are sparsely installed monitored and unmonitred cities without selecting monitor- and concentrated only in areas of higher importance, such as ing stations and (2) can be trained in an unsupervised manner. cities with large populations. As a result, it is essential to Contributions: We propose a novel neural network-based ar- infer air quality in areas without monitoring stations. chitecture AIREX. AIREX automatically captures the correla- The development of neural network techniques has acceler- tions between monitored and unmonitored cities. The novel ated a neural network-based approach for inferring air quality design of AIREX is based on the effective combination of of target locations [Zheng et al., 2013; Cheng et al., 2018; the mixture-of-experts [Jacobs et al., 1991; Masoudnia and Chang and Hanna, 2004]. This approach leverages avail- Ebrahimpour, 2014; Guo and Barzilay, 2018] and attention able external data related to the air quality, such as point- mechanisms [Bahdanau et al., 2014]. The mixture-of-experts of-interest and meteorology, to capture features of locations. approach is a machine learning technique based on the divide-
and-conquer principle. This approach uses multiple models Offline training Online inference (called experts) and aggregates outputs of experts for deriv- AIREX Result ing the final output. Each expert in AIREX corresponds to Aggregation individual monitored cities, and thus AIREX infers air qual- Model copy ity in unmonitored cities by aggregating air quality assessed Encode Attention from individual monitored cities. The attention mechanism , Feature extraction further boosts the performance of AIREX. AIREX employs Feature extraction Target two attentions to capture the importance of monitored cities location and monitoring stations individually for computing weights PoI Road Network Meteorology Air pollutant City of influences from monitored cities to the target location. The effective combination of mixture-of-experts approach and at- tention mechanism achieves accurate air quality inference. Figure 1: Our framework for air quality inference For training AIREX, we develop a training method using a meta-training approach [Guo and Barzilay, 2018], which is 2013; Cheng et al., 2018]; Point-of-interest (PoI), road net- suitable for training of the mixture-of-experts approach in an work, and meteorological information. unsupervised manner. In our training method, we regard one of the monitored cities as an unmonitored city at the training Definition 2 (PoI data). PoI data DP consist of PoI informa- phase so that AIREX can be learned in an unsupervised man- tion p, which is a triple of an identifier, specific location lp , ner. We use multi-task learning [Caruana, 1997] for training and category υp (e.g., factory). both the whole AIREX and experts with capturing the differ- Definition 3 (Road network data). A road network DR con- ence among cities. This training method enables to learn the sists of road segments r. Each road segment includes coordi- correlations between monitored and unmonitored cities with- nates of the start and end points, and road category υr (e.g., out air quality data of unmonitored cities. highway). Our contributions presented in this study are as follows: Definition 4 (Meteorology data). Meteorology data DM con- • We address a novel problem that infers air quality infor- sist of distinct-level meteorological information. Meteorolog- mation in unmonitored cities by using the air quality data ical information includes meteorological measurements, such obtained from other cities. We show that state-of-the-art as weather and temperature. The meteorology data are time- methods are not suitable for this problem. dependent data. • We propose AIREX that can accurately infer air qual- In this study, we aim to infer spatially fine-grained air qual- ity information in unmonitored cities. This employs the ity in the target unmonitored city. mixture-of-experts approach and attention mechanism to capture the correlations of air quality between monitored Problem statement. Given target city ctgt , target location and unmonitored cities. ltgt in ctgt , a set Csrc of source cities, a set of monitoring stations in Csrc , air pollutant data DA , PoI data DP , road • Through experiments with 20 cities in China, we show network data DR , and meteorology data DM , we aim to infer that AIREX achieves higher accuracy than the-state-of- air quality of ltgt over time domain T . the-art method. We focus on regression for evaluating quantities of air pol- lutants in this paper, but our models can be used for classifica- 2 Problem Formulation tion for evaluating the air quality index [Cheng et al., 2018]. We describe the notations and definitions used in the formu- lation of the problem that we solve in this study. 3 Proposal There are two types of cities, namely, target and source cities, that denote unmonitored and monitored cities, respec- We present our neural network-based architecture AIREX tively. Each city c has its representative specific location lc and training method after describing our framework and fea- (e.g., the center of c). We assume that we have a single tar- ture extraction. get city ctgt and a set Csrc of source cities. We denote the 3.1 Framework and Design Policy set of monitoring stations by S and each monitoring station s ∈ S has its location ls , which periodically monitors a quan- Figure 1 illustrates our framework. This framework consists tity of air pollutants, such as PM2.5 , over the time domain of offline training and online inference. In the offline training, T = ht1 , t2 , . . . , t|T | i. Source city ck ∈ Csrc has a set of we build our inference model after extracting features, and monitoring stations Sk ⊆ S. We denote sk,i as monitoring in the online inference, we infer the air quality of the given station si ∈ Sk . We define air pollutant data as follows: target location by using the built model. We describe a design policy of the offline training. Air Definition 1 (Air pollutant data). Air pollutant data DA con- quality of the target location is assessed by data related to tar- sists of quantities of air pollutants monitored by stations, and get location, monitoring station, and cities. Thus, our frame- they are time-dependent. work extracts features of target location, monitoring stations, Cities have characteristics that affect air quality. To infer and cities, from data sources. We leverage these features to air quality, we use three external data that were frequently capture the correlations of air quality between the target and employed in prior studies [Xu and Zhu, 2016; Zheng et al., source cities, and the target location and monitoring stations.
We design our inference architecture AIREX for automat- compute xR υ as follows: ically capturing the correlations and being trained in an un- supervised manner. For this purpose, AIREX is based on the xR R υ (l) = |{r ∈ D |r ⊂ L(l) ∧ υr = υ}|. (2) mixture-of-experts approach [Guo and Barzilay, 2018] and Meteorological factors XlM : XlM is the sequence of mete- attention mechanism [Bahdanau et al., 2014]. The mixture- orological measurements of l, such as weather and tempera- of-experts approach compute the final output by aggregating ture, which influences the concentrations and flows of air pol- the output of multiple models (i.e., experts). In AIREX, each lutants. The meteorological measurements have two types of expert is a model for inferring air quality by using data of values; categorical values (e.g., weather and wind direction) source city. Each source city and monitoring station does and numerical values (e.g., temperature and wind speed). For not equally contribute the air quality inference in the target categorical and numerical values, we adopt one-hot encoding city, and thus we use the attention mechanism to compute and raw values, respectively. We denote the meteorological the importance of cities and monitoring stations. AIREX can factor at time step t as XlM t . accurately infer the air quality in the target city by elegant These factors have demonstrated their usefulness in previous combination of the mixture-of-experts approach and attention studies [Xu and Zhu, 2016; Cheng et al., 2018]. We normal- mechanism. Furthermore, AIREX can be trained in an unsu- ize numerical values in factors by dividing the largest values pervised manner by using the meta-training approach [Guo among each factor. and Barzilay, 2018] and multi-task learning [Caruana, 1997]. The monitoring and station location factors are associated We describe the training method later. with station s, and the city location factor is associated with AIREX consists of three main components: encoding, at- city c. tention, and aggregation. First, in the encode, it encodes raw input features to obtain latent features for capturing interac- Monitoring factor XsA : Quantities of air pollutants moni- tions between inferred values and raw input. Then, in the at- tored by station s represent the most important information tention, AIREX computes the importance of source cities and for inferring air quality. XsA is the sequence of air pollutant monitoring stations for inferring air quality of the target city. quantities in DA of station s. We denote the monitoring fac- Finally, in the aggregation, it computes output of experts for tor at time step t as XsAt . each source city by aggregating the transformed features and Station location and city location factors XcC and XsS : The importance of monitoring stations, and then compute the final distance and direction from a location to another location are output by aggregating the outputs of experts and importance likewise important factors to measure the influence of their of cities. respective air quality levels. XsS (resp. XcC ) is the relative position that depecits the distance and angle from station s 3.2 Feature extraction (resp. source city c) to the target location ltgt (resp. target We introduce our features for assessing air quality at ltgt . We city ctgt ). extract the three features, namely, the target location feature Our features combine the above factors. The target loca- Xtgt , monitoring station feature Xstns , and city feature X city . tion feature Xtgt , monitoring station feature Xstns , and city These features comprise (1) PoI factor, (2) road network fac- feature Xcity are given as follows: tor, (3) meteorological factor, (4) air pollutant factor, (5) sta- Xtgt = XlPtgt ∪ XlRtgt ∪ XlM , tion location factor, and (6) city location factor. We describe tgt our three features after explaining how to extract each factor Xstn s = XP R M A S ls ∪ Xls ∪ Xls ∪ Xs ∪ Xs , and from the data. The PoI, road network, and meteorological factors are as- Xcity = ∪c∈Csrc {XcC }. sociated with location l (e.g., locations of monitoring stations Here, since the air quality changes time by time, it is prefer- and the target location). l has its own factors that are extracted able that all factors are time-dependent. Due to limited data from the data within affecting region L(l). We set L(l) as a sources, it is necessary to support both time-independent and circle whose center and radius are l and 1 km, respectively. time-dependent data. PoI factor XlP : XlP includes the numbers of PoIs, which rep- resents the characteristics of locations, such as the numbers of 3.3 Inference architecture factories and public parks. We consider a set ΥP of PoI cat- We introduce our inference architecture AIREX. Figure 2 egories and count the number of PoIs belonging to each PoI shows components of AIREX. AIREX has three input types: category. Let XlP = {xP υ (l)}υ∈ΥP denote the PoI factor for Xtgt , Xstn s for ∀s ∈ S, and Xcity , and it contains five layers: l. We compute xP as follows: encode, station-based attention, city-based attention, experts, υ and mixture layers. We describe each layer in the following. xP P υ (l) = |{p ∈ D |lp ⊂ L(l) ∧ υp = υ}|. (1) Encode layer: The encode layer transforms Xtgt and Xstn s . Road network factor XlR : XlR includes the numbers of road Each feature includes time-independent (e.g., PoI) and time- segments, which affects local air quality, as vehicles are one dependent (e.g., meteorology) data. We transform time- of the sources of air pollutants. We consider a set ΥR of road dependent and time-independent factors by LSTM and FC, categories and count the number of road segments belonging respectively [Cheng et al., 2018]. We use different models for to each road category. Let XlR = {xR υ (l)}υ∈ΥR denote the Xtgt and Xstn s because they include different factors; how- road network features extracted for l. We define r as arbi- ever, we use the same LSTM and FC for all monitoring sta- trary points between the start and end of road segment r. We tions to increase generalization ability.
City-based Mixture lows: Encode attention (L+L0 ) 0 0 FC ,…, z⊕k =z(L+L sk,1 ) ⊕ · · · ⊕ z(L+L sk,n ) FC (L+L0 ) (L+L0 ) βc0 k =w|β ReLU (Wβ (zltgt ⊕ z⊕k ⊕ XcCk ) + bβ ) + bβ LSTM Station-based exp(βc0 k ) βck = attention … Σc∈Csrc exp(βc0 ) FC , ,…, , Station-based Attention layer: Each monitoring station has … a different impact to the target location, as distances and an- , FC , gles between each monitoring station and target location are … , , different as well as similarity of their features. In the station- LSTM … … … , based attention layer, we compute station-affect factor, which is a weight of influence of monitoring stations on the air qual- … , , Experts ity of the target location. The station-affect factor αk,i for stations si in source city ck is calculated by the following Figure 2: Neural network structure of AIREX equation: 0 (L+L0 ) 0 We first explain models for time-depending factors in αk,i = w|α ReLU (Wα (zltgt ⊕ z(L+L sk,i ) ) + bα ) + bα t Xstn M A s . Xls and Xls at time step t are transformed into hs 0 exp(αk,i ) as follows: αk,i = 0 Σsi ∈Sk exp(αk,i ) its =σ(Wix (XM t At t−1 ls ⊕ Xls ) + Wih hs + Wic ct−1 s + bi ) We then compute embedding zck of source city with station fts =σ(Wf x (XM t At t−1 ls ⊕ Xls ) + Wf h hs + Wf c ct−1 s + bf ) affect-factors as follows: cts =fts ct−1 + its tanh(Wcx (XM t At t−1 0 X ls ⊕ Xls ) + Wch hs + bc ) αk,i zs(L+L ) s zck = k,i . ots =σ(Wox (XM ls t ⊕ XAt ls ) + Woh ht−1 s + Woc cts + bo ) si ∈Sk hts =ots tanh(cts ) zck represents how much is influence air quality of source city ck to the target location. where, W is weight matrix, b is bias vector, and indicates Experts layer: The experts layer computes an inferred value Hadamard product. i, f, o, c, and h are input gate, forget gate, on each source city. Inferred value ỹck of ck is computed by output gate, memory cell, and final states of hidden layer, re- the following equation: spectively. Next, we describe models for time-independent factors. (L+L0 ) ỹck = w|k ReLU (Wk (zltgt ⊕ zck )) + bk ) + bk . (n) Xls , XlRs , and XlSs in Xstn P s are translated into embedding zs as follows: This equation represents an expert model. We use this simple ( model for all cities to eliminate the the impact of performance (i) (i) ReLU (Ws (XlPs ⊕ XlRs ⊕ XlSs ) + bs ), i = 1 of experts to the final output in this paper. z(i) s = (i) (i) ReLU (Ws zni1 s + bs ), 1 < i ≤ L Mixture layer: We obtain the inferred value by summing out- puts of experts weighted by city attention factors as follows: where L denotes the number of hidden layers. X Xtgt is transformed in the same way as Xstn ỹ = βck ỹck . s . The differ- ck ∈Csrc ence is the input factors. Finally, the transformed features generated by the LSTM and FC are concatenated to input another FC to obtain the (n0 ) 3.4 Training method features z∗ as follows: One of the major challenges of our study is the training of (i0 ) (i0 ) AIREX because we cannot directly train our model due to ( 0 0 ReLU (W∗0 (zL t ∗ ⊕ h∗ ) + b∗0 ), i = L + 1 z(i ∗ ) = (i0 ) i0 −1 (i0 ) missing air quality data of the target city. We develop a train- ReLU (W∗0 z∗ + b∗0 ), i0 ∈ [L + 2, L + L0 ] ing method in an unsupervised manner. We describe our ap- proach and loss function in the training phase. where ∗ indicates either ltgt or s and L0 denotes the number of hidden layers. Overall idea: We employ a meta-training approach [Guo and Barzilay, 2018], which supports to learn the differences be- City-based Attention layer: Not all source cites contribute tween individual features and cities in an unsupervised set- equally to inference in the target city. AIREX automatically captures the importance of different city data by employing ting. Given a set of source cities, the meat-training approach the attention mechanism. The city-based attention layer com- regards a single source city as a temporal target city, and putes city-attention factor which represents the weights of in- then trains models using the pair of temporal target and other fluences of source cities to air quality in the target city. The source cities. The temporal target and other source cities are city-attention factor βck of source city ck is computed as fol- referred to as the meta-target ct and meta-sources ci ∈ Cs ,
respectively. We obtain |Csrc | training pairs of meta-target We further use regularization of β to avoid overfitting. and meta-sources. The regularization computes the entropy of β and the sum We use a multi-task learning method with a shared encoder. of them. We design loss functions for accurately inferring air quality R= X βci log βci (3) and capturing the difference between source and target cities. ci ∈Cs Loss functions: The main objective of our training is that the final outputs are closer to the actual value. Since we The total loss function to be minimized in our training phase have multiple experts, we additionally train them. It is not is defined as follows: sufficient to evaluate the difference between outputs and true L = λ · Lf + (1 − λ) · Le + γ · La + ζ · R values because we must capture the correlations between the source and target cities. Since we do not have air quality data where λ, γ, and ζ are hyper parameters. in the target city, we must indirectly learn the correlations. For this purpose, we use a loss for minimizing the difference 4 Experiments between the transformed features of cities. We note that the true values in training phase are air quality of the meta-targets In this section, we evaluate the inference accuracy of AIREX instead of the actual target location. compared with the state-of-the-art methods. We aim to vali- The loss Lf is the main loss function for evaluating the in- date that AIREX can accurately infer air quality in unmoni- ference accuracy. Lf is computed by the mean squared error tored cities and other methods cannot1 . (MSE) between the final output ỹ and true value y as follows: 1 X 4.1 Experimental settings Lf = (ỹ(x) − y(x))2 Dataset: We use data of 20 cities in China spanning four |T | x∈T months from June 1st 2014/6/1 to September 30th 2014. We where T denotes the set of training pairs. collect air quality data, road network, PoI, and meteorology The loss Le is one for evaluating the inference accuracy of data as follows. Air quality data is provided as open data by an individual expert. Le is computed by MSE between ỹck Microsoft2 . We focus on inferring PM2.5 . We collect PoI data for source city ck of outputs of experts and y. from Foursquare3 and categorize them into ten categories ac- ! cording to the official categories provided by Foursquare. For 1 X 1 X Le = (ỹci (x) − y(x))2 . road network data, we use OpenStreetMap4 , and roads are |Cs | c ∈C |T | x∈T i s categorised into three types; highway, trunk, and other. For meteorology data, we use weather, temperature, air pressure, The loss La is for evaluating the difference of cities. humidity, wind speed, and wind direction, which is also pro- It is computed based on maximum mean discrepancy vided by Microsoft. Air quality and meteorology data are (MMD) [Gretton et al., 2012] as the adversary to minimize sampled every hour. the divergence between the marginal distribution of target and Evaluation: We select four cities as target cities; Beijing, source cities. MMD is known as effective distance metric Tianjin, Shinzhen, and Guangzhou. Beijing and Tianjin are measures for evaluating the discrepancy between two distri- cities in the northern area of China, whereas Shinzhen and butions explicitly in a non-parametric manner. Guangzhou are in the south. We randomly select five moni- toring stations from each city for training and test data. The La = M M D2 (z∪c1 ∪ · · · ∪ z∪c|Cs | , z∪ct ), ratio of training and test data is |Csrc | to one. z∪ci = 0 ∪s∈Si zs(L+L ) , As evaluation metrics of inference accuracy, we use the root mean squared error (RMSE) for PM2.5 , which is a stan- 1 X 1 X dard metric [Xu and Zhu, 2016; Cheng et al., 2018]. We run M M D(X , X 0 ) = φ(x) − φ(x0 ) , |X | x∈X |X 0 | 0 0 H three times for training by changing monitoring stations. x ∈X Compared methods and hyper parameters: We com- In MMD computation, H indicates the reproducing ker- nel Hilbert space (RKHS) and φ is the mapping function to pare AIREX with three approaches: (a) k nearest neighbors RKHS space. In our method, we compute the MMD score (KNN): This method selects the k monitoring stations closest by the kernel method [Bousmalis et al., 2016]. The kernel to the target location, and compute the average air pollutant method computes the MMD score as follows: quantities from these stations as result. We set k to be three in our experiments. (b) Feedforward neural networks (FNN): 1 X M M D(X , X 0 ) = K(x, x0 ) This method uses a simple neural network model, whose in- |X |(|X | − 1) x,x0 ∈X ,x6=x0 puts are Xtgt and Xstn s for all stations. In our experiments, 1 X the model consists of three layers with 200 units. For sequen- + K(x, x0 ) tial features, we only use their values at the same time step |X 0 |(|X 0 | − 1) x,x0 ∈X 0 ,x6=x0 2 1 X X Please see a supplementary file for detail implementation, data − K(x, x0 ) 0 |X ||X | x∈X 0 0 statistics, and additional results. x ∈X 2 www.microsoft.com/en-us/research/project/urban-computing/ where, K indicates a Gaussian karnel function: K(x, x0 ) = 3 developer.foursquare.com exp(− 2σ1 2 ||x − x0 ||2 ). 4 www.openstreetmap.org/
100 of the inferred air quality. (c) ADAIN: This method repre- KNN sents the state-of-the-art neural network model for inferring 80 FNN ADAIN5 air quality [Cheng et al., 2018]. We use two cases of source ADAIN19 RMSE 60 cities: ADAIN5 and ADAIN19, whose source cities are the AIREX 40 five cities closest to the target city and all source cities, re- spectively. 20 In parameter settings of AIREX and ADAIN, we follow the 0 setting in experiments of ADAIN [Cheng et al., 2018]. We Beijing Tianjin Shinzhen Guangzhou construct a single basic FC layer (L = 1) with 100 neurons and two LSTM layers with 300 memory cells per layer. We Figure 3: An overview of accuracy then build two layers of the high-level FC network (L0 = 2) with 200 neurons per layer. The time-dependent data is input in 24 time steps (i.e., one day). The number of epochs, the Table 1: AIREX vs ADAIN in different source cities. City names in- dicate the result obtained by ADAIN, where training data is the city. batch size, learning rate are selected from [100, 200, 300], A distance of zero kilometers indicates that the target and source [32, 64, 128, 256, 512], and [0.005, 0.01], respectively, by cities are the same. grid search. In our model, λ, γ, and ζ in AIREX are 0.5, 1.0, and 1.0, respectively. Further detail is provided in our codes. Beijing Guangzhou Method RMSE Dist. [km] RMSE Dist. [km] 4.2 Experimental results AIREX 47.88 — 11.90 — Figure 3 shows the inference accuracy for each method. 5 NN cities 51.70 — 23.05 — AIREX achieves the best accuracy in Tianjin and Guanzhou 19 cities 58.83 — 16.34 — and the second best in Beijing and Shinzhen. Since AIREX Beijing 30.49 0 78.15 1883.5 learns the difference between target and source cities, it can Langfang 48.58 47.1 61.57 1847.9 accurately infer air quality without air quality data in the Tianjin 52.05 113.8 43.01 1807.9 target city. KNN achieves the best accuracy in Beijing and Baoding 68.46 140.3 52.01 1758.1 Tangshan 60.53 154.9 54.85 1887.7 Shinzhen, as these are monitoring stations that very close to Zhangjiakou 81.79 160.9 24.0 1961.8 the target location, whereas KNN fails the accurate inference Chengde 74.67 176.0 27.18 2024.9 when there are no monitoring stations close to the target lo- ADAIN Cangzhou 59.57 181.5 47.22 1716.5 cation like Tianjin and Guangzhou. ADAIN and FNN do Hengshui 66.90 248.8 52.68 1635.9 not perform well in all target cities. In particular, although Shijiazhuang 55.80 263.8 87.56 1657.7 ADAIN is the state-of-the-art method for inferring air qual- Qinhuangdao 69.13 273.0 43.79 1956.8 ity, it does not perform well when the source and target cities Zibo 69.62 372.1 41.98 1585.2 are different. Shantou 85.63 1,835.3 25.95 350.6 We further investigate the difference between AIREX and Huizhou 84.25 1,871.4 22.30 118.0 ADAIN, as ADAIN may perform well if we use optimal Guangzhou 76.62 1,883.5 19.70 0 Dongguan 80.07 1,888.8 20.29 51.4 source cities. Table 1 shows the accuracy of AIREX and Foshan 76.56 1,897.4 21.87 18.9 ADAIN in Beijing and Guangzhou as target cities (see ap- Shenzhen 86.14 1,937.7 23.64 104.1 pendix for Tianjin and Shenzhen). In ADAIN, we use each Jiangmen 82.32 1,946.5 22.75 63.8 city as the source city in addition to ADAIN5 and ADAIN19. Hong Kong 86.99 1,953.3 26.46 118.8 In Beijing, ADAIN accurately infers the air quality when its source city is Beijing (i.e., target and source cities are the same). However, the accuracy of ADAIN significantly decreases when ADAIN uses different cities even when the et al. [Zheng et al., 2013] proposed U-air, which is a neural source cities are close to Beijing. In Guangzhou, AIREX network-based classifier model that captures both spatial and achieves better performance than ADAIN even when ADAIN temporal correlations. Hu et al. [Hu et al., 2018] proposed uses Guangzhou as the source city. This result indicates an architecture that employs deep reinforcement learning for that the use of multiple cities increases the inference accu- optimizing air quality sensing systems. Zhong et al. [Zhong racy if we can capture the correlations of air quality between et al., 2020] proposed AirRL, which consists of station selec- cities. From these results, we can confirm that our mixture-of- tor that distills monitoring stations using reinforcement learn- experts approach combined with attention mechanisms per- ing. Cheng et al. [Cheng et al., 2018] proposed ADAIN, forms well for accurately inferring the air quality in unmoni- which employs the attention mechanism to assign weights to tored cities without selecting source cities. station-oriented features. We used ADAIN to design encode and station-based attention layers. To the best of our knowl- 5 Related Work edge, we first employ a mixture-of-experts approach for air quality inference. We review neural network-based approaches for spatially fine-grained air quality inference. Numerous methods have None of them addresses the problem of air quality infer- been proposed [Shad et al., 2009; Hasenfratz et al., 2014; ence in unmonitored cities. In contrast to these studies, our Xu and Zhu, 2016], such that employ linear regression, ma- method assigns weights to each city automatically, without trix factorization and neural networks. For example, Zheng selecting monitoring stations.
6 Conclusion [Shad et al., 2009] Rouzbeh Shad, Mohammad Saadi Mes- gari, and Arefeh Shad. Predicting air pollution using fuzzy We addressed a new problem that infers air quality informa- genetic linear membership kriging in GIS. The ELSEVIER tion in unmonitored cities. For the problem, we proposed Computers, Environment and Urban Systems, 33(6):471– AIREX, which can accurately infer air quality in unmoni- 481, 2009. tored cities. Experimental studies using real data showed that AIREX outperforms the state-of-the-art methods. [Xu and Zhu, 2016] Yanan Xu and Yanmin Zhu. When re- As our future works, we address air quality inference in dif- mote sensing data meet ubiquitous urban data: Fine- ferent countries, in particular, countries across sea, and sup- grained air quality inference. In Proceedings of the IEEE port environments that each city has different data sources. Big Data, pages 1252–1261, 2016. [Zheng et al., 2013] Yu Zheng, Furui Liu, and Hsun-Ping References Hsieh. U-Air: When urban air quality inference meets big data. In Proceedings of the ACM SIGKDD, pages 1436– [Bahdanau et al., 2014] Dzmitry Bahdanau, Kyunghyun 1444, 2013. Cho, and Yoshua Bengio. Neural machine translation [Zhong et al., 2020] Huiqiang Zhong, Cunxiang Yin, Xiao- by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014. hui Wu, Jinchang Luo, and JiaWei He. Airrl: A rein- forcement learning approach to urban air quality inference. [Bousmalis et al., 2016] Konstantinos Bousmalis, George arXiv preprint arXiv:2003.12205, 2020. Trigeorgis, Nathan Silberman, Dilip Krishnan, and Du- mitru Erhan. Domain separation networks. In Proceedings of the NIPS, pages 343–351, 2016. [Caruana, 1997] Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997. [Chang and Hanna, 2004] Joseph C Chang and Steven R Hanna. Air quality model performance evaluation. Meteo- rology and Atmospheric Physics, 87(1-3):167–196, 2004. [Cheng et al., 2018] Weiyu Cheng, Yanyan Shen, Yanmin Zhu, and Linpeng Huang. A neural attention model for urban air quality inference: Learning the weights of mon- itoring stations. In Proceedings of the AAAI, 2018. [Gretton et al., 2012] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773, 2012. [Guo and Barzilay, 2018] Jiang Guo and Regina Barzilay. Multi-source domain adaptation with mixture of experts. In Proceedings of the ACL EMNLP, pages 4694–4703, 2018. [Hasenfratz et al., 2014] David Hasenfratz, Olga Saukh, Christoph Walser, Christoph Hueglin, Martin Fierz, and Lothar Thiele. Pushing the spatio-temporal resolution limit of urban air pollution maps. In Proceedings of the IEEE PerCom, pages 69–77, 2014. [Hu et al., 2018] Zhiwen Hu, Zixuan Bai, Kaigui Bian, Tao Wang, and Lingyang Song. Real-time fine-grained air quality sensing networks in smart city: Design, implemen- tation and optimization. arXiv preprint arXiv:1810.08514, 2018. [Jacobs et al., 1991] Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mix- tures of local experts. Neural computation, 3(1):79–87, 1991. [Masoudnia and Ebrahimpour, 2014] Saeed Masoudnia and Reza Ebrahimpour. Mixture of experts: a literature sur- vey. Artificial Intelligence Review, 42(2):275–293, 2014.
You can also read