AIREX: Neural Network-based Approach for Air Quality Inference in Unmonitored Cities

Page created by Salvador Douglas
 
CONTINUE READING
AIREX: Neural Network-based Approach for Air Quality Inference in
 Unmonitored Cities

 Yuya Sasaki1 , Kei Harada1 , Shohei Yamasaki1 and Makoto Onizuka1
 1
 Osaka university
 {sasaki, harada.kei, yamasaki.shohei, onizuka}@ist.osaka-u.ac.jp
arXiv:2108.07120v1 [cs.LG] 16 Aug 2021

 Abstract The existing methods aim to infer air quality of target loca-
 tions only in monitored cities (i.e., cities with monitoring sta-
 Urban air pollution is a major environmental prob-
 tions). They do not capture the difference of features in cities,
 lem affecting human health and quality of life.
 which causes the low accuracy of air quality inference. Since
 Monitoring stations have been established to con-
 not all cities have monitoring stations, we cannot accurately
 tinuously obtain air quality information, but they do
 infer the air quality of target locations within the unmonitored
 not cover all areas. Thus, there are numerous meth-
 cities.
 ods for spatially fine-grained air quality inference.
 Since existing methods aim to infer air quality of Problem definition and challenges: We study a new prob-
 locations only in monitored cities, they do not as- lem, air quality inference in unmonitored cities, to globally
 sume inferring air quality in unmonitored cities. solve the urban air pollution problem. A straightforward ap-
 In this paper, we first study the air quality infer- proach for the problem is the use of existing models that are
 ence in unmonitored cities. To accurately infer trained by air quality data of cities in the vicinity of the target
 air quality in unmonitored cities, we propose a unmonitored city. However, even the state-of-the-art method
 neural network-based approach AIREX. The nov- ADAIN [Cheng et al., 2018] deteriorates the inference accu-
 elty of AIREX is employing a mixture-of-experts racy in unmonitored cities, even when using air quality data
 approach, which is a machine learning technique of numerous monitored cities as training data (see Table 1 in
 based on the divide-and-conquer principle, to learn experiments section).
 correlations of air quality between multiple cities. Therefore, we need a new neural network architecture in
 To further boost the performance, it employs atten- this problem. We face two challenges: (1) how to design
 tion mechanisms to compute impacts of air qual- a neural network architecture to capture the correlations of
 ity inference from the monitored cities to the loca- air quality between monitored and unmonitored cities and (2)
 tions in the unmonitored city. We show, through how to train models without available air quality data of the
 experiments on a real-world air quality dataset, that unmonitored cities. For the first challenge, since features of
 AIREX achieves higher accuracy than state-of-the- cities differ, architectures must capture their differences and
 art methods. reflect them in the inference of air quality. It is difficult to
 select optimal monitoring stations for model training due to
 the absence of air quality data in unmonitored cities. For the
 1 Introduction second challenge, since we do not have air quality data of
 Urban air pollution poses a severe and global problem. The the unmonitored city, architectures must be trained only by
 fine-grained assessment of urban air quality is crucial for both using air quality data of monitored cities and external data.
 the governments and citizens to establish means to improve We cannot directly learn the correlations between monitored
 human health and quality of life. Monitoring stations have and unmonitored cities.
 been established in numerous cities to continuously obtain air In summery, we require a new neural network architecture
 quality information. However, due to high construction and that (1) can automatically capture the correlation between
 management costs, monitoring stations are sparsely installed monitored and unmonitred cities without selecting monitor-
 and concentrated only in areas of higher importance, such as ing stations and (2) can be trained in an unsupervised manner.
 cities with large populations. As a result, it is essential to Contributions: We propose a novel neural network-based ar-
 infer air quality in areas without monitoring stations. chitecture AIREX. AIREX automatically captures the correla-
 The development of neural network techniques has acceler- tions between monitored and unmonitored cities. The novel
 ated a neural network-based approach for inferring air quality design of AIREX is based on the effective combination of
 of target locations [Zheng et al., 2013; Cheng et al., 2018; the mixture-of-experts [Jacobs et al., 1991; Masoudnia and
 Chang and Hanna, 2004]. This approach leverages avail- Ebrahimpour, 2014; Guo and Barzilay, 2018] and attention
 able external data related to the air quality, such as point- mechanisms [Bahdanau et al., 2014]. The mixture-of-experts
 of-interest and meteorology, to capture features of locations. approach is a machine learning technique based on the divide-
and-conquer principle. This approach uses multiple models Offline training Online inference
(called experts) and aggregates outputs of experts for deriv- AIREX Result
ing the final output. Each expert in AIREX corresponds to Aggregation
individual monitored cities, and thus AIREX infers air qual- Model
 copy
ity in unmonitored cities by aggregating air quality assessed Encode Attention

from individual monitored cities. The attention mechanism , Feature extraction
further boosts the performance of AIREX. AIREX employs Feature extraction
 Target
two attentions to capture the importance of monitored cities location

and monitoring stations individually for computing weights PoI Road
 Network
 Meteorology Air pollutant City

of influences from monitored cities to the target location. The
effective combination of mixture-of-experts approach and at-
tention mechanism achieves accurate air quality inference. Figure 1: Our framework for air quality inference
 For training AIREX, we develop a training method using
a meta-training approach [Guo and Barzilay, 2018], which is
 2013; Cheng et al., 2018]; Point-of-interest (PoI), road net-
suitable for training of the mixture-of-experts approach in an
 work, and meteorological information.
unsupervised manner. In our training method, we regard one
of the monitored cities as an unmonitored city at the training Definition 2 (PoI data). PoI data DP consist of PoI informa-
phase so that AIREX can be learned in an unsupervised man- tion p, which is a triple of an identifier, specific location lp ,
ner. We use multi-task learning [Caruana, 1997] for training and category υp (e.g., factory).
both the whole AIREX and experts with capturing the differ- Definition 3 (Road network data). A road network DR con-
ence among cities. This training method enables to learn the sists of road segments r. Each road segment includes coordi-
correlations between monitored and unmonitored cities with- nates of the start and end points, and road category υr (e.g.,
out air quality data of unmonitored cities. highway).
 Our contributions presented in this study are as follows:
 Definition 4 (Meteorology data). Meteorology data DM con-
 • We address a novel problem that infers air quality infor- sist of distinct-level meteorological information. Meteorolog-
 mation in unmonitored cities by using the air quality data ical information includes meteorological measurements, such
 obtained from other cities. We show that state-of-the-art as weather and temperature. The meteorology data are time-
 methods are not suitable for this problem. dependent data.
 • We propose AIREX that can accurately infer air qual- In this study, we aim to infer spatially fine-grained air qual-
 ity information in unmonitored cities. This employs the ity in the target unmonitored city.
 mixture-of-experts approach and attention mechanism to
 capture the correlations of air quality between monitored Problem statement. Given target city ctgt , target location
 and unmonitored cities. ltgt in ctgt , a set Csrc of source cities, a set of monitoring
 stations in Csrc , air pollutant data DA , PoI data DP , road
 • Through experiments with 20 cities in China, we show network data DR , and meteorology data DM , we aim to infer
 that AIREX achieves higher accuracy than the-state-of- air quality of ltgt over time domain T .
 the-art method.
 We focus on regression for evaluating quantities of air pol-
 lutants in this paper, but our models can be used for classifica-
2 Problem Formulation tion for evaluating the air quality index [Cheng et al., 2018].
We describe the notations and definitions used in the formu-
lation of the problem that we solve in this study. 3 Proposal
 There are two types of cities, namely, target and source
cities, that denote unmonitored and monitored cities, respec- We present our neural network-based architecture AIREX
tively. Each city c has its representative specific location lc and training method after describing our framework and fea-
(e.g., the center of c). We assume that we have a single tar- ture extraction.
get city ctgt and a set Csrc of source cities. We denote the 3.1 Framework and Design Policy
set of monitoring stations by S and each monitoring station
s ∈ S has its location ls , which periodically monitors a quan- Figure 1 illustrates our framework. This framework consists
tity of air pollutants, such as PM2.5 , over the time domain of offline training and online inference. In the offline training,
T = ht1 , t2 , . . . , t|T | i. Source city ck ∈ Csrc has a set of we build our inference model after extracting features, and
monitoring stations Sk ⊆ S. We denote sk,i as monitoring in the online inference, we infer the air quality of the given
station si ∈ Sk . We define air pollutant data as follows: target location by using the built model.
 We describe a design policy of the offline training. Air
Definition 1 (Air pollutant data). Air pollutant data DA con- quality of the target location is assessed by data related to tar-
sists of quantities of air pollutants monitored by stations, and get location, monitoring station, and cities. Thus, our frame-
they are time-dependent. work extracts features of target location, monitoring stations,
 Cities have characteristics that affect air quality. To infer and cities, from data sources. We leverage these features to
air quality, we use three external data that were frequently capture the correlations of air quality between the target and
employed in prior studies [Xu and Zhu, 2016; Zheng et al., source cities, and the target location and monitoring stations.
We design our inference architecture AIREX for automat- compute xR
 υ as follows:
ically capturing the correlations and being trained in an un-
supervised manner. For this purpose, AIREX is based on the xR R
 υ (l) = |{r ∈ D |r ⊂ L(l) ∧ υr = υ}|. (2)
mixture-of-experts approach [Guo and Barzilay, 2018] and Meteorological factors XlM : XlM is the sequence of mete-
attention mechanism [Bahdanau et al., 2014]. The mixture- orological measurements of l, such as weather and tempera-
of-experts approach compute the final output by aggregating ture, which influences the concentrations and flows of air pol-
the output of multiple models (i.e., experts). In AIREX, each lutants. The meteorological measurements have two types of
expert is a model for inferring air quality by using data of values; categorical values (e.g., weather and wind direction)
source city. Each source city and monitoring station does and numerical values (e.g., temperature and wind speed). For
not equally contribute the air quality inference in the target categorical and numerical values, we adopt one-hot encoding
city, and thus we use the attention mechanism to compute and raw values, respectively. We denote the meteorological
the importance of cities and monitoring stations. AIREX can factor at time step t as XlM t .
accurately infer the air quality in the target city by elegant These factors have demonstrated their usefulness in previous
combination of the mixture-of-experts approach and attention studies [Xu and Zhu, 2016; Cheng et al., 2018]. We normal-
mechanism. Furthermore, AIREX can be trained in an unsu- ize numerical values in factors by dividing the largest values
pervised manner by using the meta-training approach [Guo among each factor.
and Barzilay, 2018] and multi-task learning [Caruana, 1997]. The monitoring and station location factors are associated
We describe the training method later. with station s, and the city location factor is associated with
 AIREX consists of three main components: encoding, at- city c.
tention, and aggregation. First, in the encode, it encodes raw
input features to obtain latent features for capturing interac- Monitoring factor XsA : Quantities of air pollutants moni-
tions between inferred values and raw input. Then, in the at- tored by station s represent the most important information
tention, AIREX computes the importance of source cities and for inferring air quality. XsA is the sequence of air pollutant
monitoring stations for inferring air quality of the target city. quantities in DA of station s. We denote the monitoring fac-
Finally, in the aggregation, it computes output of experts for tor at time step t as XsAt .
each source city by aggregating the transformed features and Station location and city location factors XcC and XsS : The
importance of monitoring stations, and then compute the final distance and direction from a location to another location are
output by aggregating the outputs of experts and importance likewise important factors to measure the influence of their
of cities. respective air quality levels. XsS (resp. XcC ) is the relative
 position that depecits the distance and angle from station s
3.2 Feature extraction (resp. source city c) to the target location ltgt (resp. target
We introduce our features for assessing air quality at ltgt . We city ctgt ).
extract the three features, namely, the target location feature Our features combine the above factors. The target loca-
Xtgt , monitoring station feature Xstns , and city feature X
 city
 . tion feature Xtgt , monitoring station feature Xstns , and city
These features comprise (1) PoI factor, (2) road network fac- feature Xcity are given as follows:
tor, (3) meteorological factor, (4) air pollutant factor, (5) sta-
 Xtgt = XlPtgt ∪ XlRtgt ∪ XlM ,
tion location factor, and (6) city location factor. We describe tgt

our three features after explaining how to extract each factor Xstn
 s = XP R M A S
 ls ∪ Xls ∪ Xls ∪ Xs ∪ Xs , and
from the data.
 The PoI, road network, and meteorological factors are as- Xcity = ∪c∈Csrc {XcC }.
sociated with location l (e.g., locations of monitoring stations Here, since the air quality changes time by time, it is prefer-
and the target location). l has its own factors that are extracted able that all factors are time-dependent. Due to limited data
from the data within affecting region L(l). We set L(l) as a sources, it is necessary to support both time-independent and
circle whose center and radius are l and 1 km, respectively. time-dependent data.
PoI factor XlP : XlP includes the numbers of PoIs, which rep-
resents the characteristics of locations, such as the numbers of 3.3 Inference architecture
factories and public parks. We consider a set ΥP of PoI cat- We introduce our inference architecture AIREX. Figure 2
egories and count the number of PoIs belonging to each PoI shows components of AIREX. AIREX has three input types:
category. Let XlP = {xP υ (l)}υ∈ΥP denote the PoI factor for
 Xtgt , Xstn
 s for ∀s ∈ S, and Xcity , and it contains five layers:
l. We compute xP as follows: encode, station-based attention, city-based attention, experts,
 υ
 and mixture layers. We describe each layer in the following.
 xP P
 υ (l) = |{p ∈ D |lp ⊂ L(l) ∧ υp = υ}|. (1)
 Encode layer: The encode layer transforms Xtgt and Xstn s .
Road network factor XlR : XlR
 includes the numbers of road Each feature includes time-independent (e.g., PoI) and time-
segments, which affects local air quality, as vehicles are one dependent (e.g., meteorology) data. We transform time-
of the sources of air pollutants. We consider a set ΥR of road dependent and time-independent factors by LSTM and FC,
categories and count the number of road segments belonging respectively [Cheng et al., 2018]. We use different models for
to each road category. Let XlR = {xR υ (l)}υ∈ΥR denote the Xtgt and Xstn s because they include different factors; how-
road network features extracted for l. We define r as arbi- ever, we use the same LSTM and FC for all monitoring sta-
trary points between the start and end of road segment r. We tions to increase generalization ability.
 City-based Mixture lows:
 Encode attention
 (L+L0 ) 0 0
 FC ,…, z⊕k =z(L+L
 sk,1
 )
 ⊕ · · · ⊕ z(L+L
 sk,n
 )
 
 FC (L+L0 ) (L+L0 )
 βc0 k =w|β ReLU (Wβ (zltgt ⊕ z⊕k ⊕ XcCk ) + bβ ) + bβ
 
 LSTM
 Station-based
 exp(βc0 k )
 βck =
 attention … 
 Σc∈Csrc exp(βc0 )

 FC
 , ,…, , Station-based Attention layer: Each monitoring station has
 …
 a different impact to the target location, as distances and an-
 , FC
 ,

 gles between each monitoring station and target location are
 …

 ,
 ,
 different as well as similarity of their features. In the station-
 LSTM
 …

 …
 …

 ,
 
 based attention layer, we compute station-affect factor, which
 is a weight of influence of monitoring stations on the air qual-
 …

 ,
 ,
 Experts ity of the target location. The station-affect factor αk,i for
 stations si in source city ck is calculated by the following
 Figure 2: Neural network structure of AIREX equation:

 0 (L+L0 ) 0
 We first explain models for time-depending factors in αk,i = w|α ReLU (Wα (zltgt ⊕ z(L+L
 sk,i
 )
 ) + bα ) + bα
 t
Xstn M A
 s . Xls and Xls at time step t are transformed into hs 0
 exp(αk,i )
as follows: αk,i = 0
 Σsi ∈Sk exp(αk,i )
 its =σ(Wix (XM t At t−1
 ls ⊕ Xls ) + Wih hs + Wic ct−1
 s + bi )
 We then compute embedding zck of source city with station
 fts =σ(Wf x (XM t At t−1
 ls ⊕ Xls ) + Wf h hs + Wf c ct−1
 s + bf ) affect-factors as follows:
 cts =fts ct−1 + its tanh(Wcx (XM t At t−1 0
 X
 ls ⊕ Xls ) + Wch hs + bc ) αk,i zs(L+L )
 s zck = k,i
 .
ots =σ(Wox (XM
 ls
 t
 ⊕ XAt
 ls ) + Woh ht−1
 s + Woc cts + bo ) si ∈Sk

hts =ots tanh(cts ) zck represents how much is influence air quality of source city
 ck to the target location.
 where, W is weight matrix, b is bias vector, and indicates
 Experts layer: The experts layer computes an inferred value
Hadamard product. i, f, o, c, and h are input gate, forget gate, on each source city. Inferred value ỹck of ck is computed by
output gate, memory cell, and final states of hidden layer, re- the following equation:
spectively.
 Next, we describe models for time-independent factors. (L+L0 )
 ỹck = w|k ReLU (Wk (zltgt ⊕ zck )) + bk ) + bk .
 (n)
Xls , XlRs , and XlSs in Xstn
 P
 s are translated into embedding zs
as follows: This equation represents an expert model. We use this simple
 ( model for all cities to eliminate the the impact of performance
 (i) (i)
 ReLU (Ws (XlPs ⊕ XlRs ⊕ XlSs ) + bs ), i = 1 of experts to the final output in this paper.
 z(i)
 s = (i) (i)
 ReLU (Ws zni1
 s + bs ), 1 < i ≤ L Mixture layer: We obtain the inferred value by summing out-
 puts of experts weighted by city attention factors as follows:
 where L denotes the number of hidden layers. X
 Xtgt is transformed in the same way as Xstn ỹ = βck ỹck .
 s . The differ-
 ck ∈Csrc
ence is the input factors.
 Finally, the transformed features generated by the LSTM
and FC are concatenated to input another FC to obtain the
 (n0 ) 3.4 Training method
features z∗ as follows:
 One of the major challenges of our study is the training of
 (i0 ) (i0 ) AIREX because we cannot directly train our model due to
 (
 0
 0 ReLU (W∗0 (zL t
 ∗ ⊕ h∗ ) + b∗0 ), i = L + 1
z(i
 ∗
 )
 = (i0 ) i0 −1 (i0 ) missing air quality data of the target city. We develop a train-
 ReLU (W∗0 z∗ + b∗0 ), i0 ∈ [L + 2, L + L0 ] ing method in an unsupervised manner. We describe our ap-
 proach and loss function in the training phase.
 where ∗ indicates either ltgt or s and L0 denotes the number
of hidden layers. Overall idea: We employ a meta-training approach [Guo and
 Barzilay, 2018], which supports to learn the differences be-
City-based Attention layer: Not all source cites contribute tween individual features and cities in an unsupervised set-
equally to inference in the target city. AIREX automatically
captures the importance of different city data by employing ting. Given a set of source cities, the meat-training approach
the attention mechanism. The city-based attention layer com- regards a single source city as a temporal target city, and
putes city-attention factor which represents the weights of in- then trains models using the pair of temporal target and other
fluences of source cities to air quality in the target city. The source cities. The temporal target and other source cities are
city-attention factor βck of source city ck is computed as fol- referred to as the meta-target ct and meta-sources ci ∈ Cs ,
respectively. We obtain |Csrc | training pairs of meta-target We further use regularization of β to avoid overfitting.
and meta-sources. The regularization computes the entropy of β and the sum
 We use a multi-task learning method with a shared encoder. of them.
We design loss functions for accurately inferring air quality R=
 X
 βci log βci (3)
and capturing the difference between source and target cities. ci ∈Cs
Loss functions: The main objective of our training is that
the final outputs are closer to the actual value. Since we The total loss function to be minimized in our training phase
have multiple experts, we additionally train them. It is not is defined as follows:
sufficient to evaluate the difference between outputs and true L = λ · Lf + (1 − λ) · Le + γ · La + ζ · R
values because we must capture the correlations between the
source and target cities. Since we do not have air quality data where λ, γ, and ζ are hyper parameters.
in the target city, we must indirectly learn the correlations.
For this purpose, we use a loss for minimizing the difference 4 Experiments
between the transformed features of cities. We note that the
true values in training phase are air quality of the meta-targets In this section, we evaluate the inference accuracy of AIREX
instead of the actual target location. compared with the state-of-the-art methods. We aim to vali-
 The loss Lf is the main loss function for evaluating the in- date that AIREX can accurately infer air quality in unmoni-
ference accuracy. Lf is computed by the mean squared error tored cities and other methods cannot1 .
(MSE) between the final output ỹ and true value y as follows:
 1 X
 4.1 Experimental settings
 Lf = (ỹ(x) − y(x))2 Dataset: We use data of 20 cities in China spanning four
 |T | x∈T
 months from June 1st 2014/6/1 to September 30th 2014. We
 where T denotes the set of training pairs. collect air quality data, road network, PoI, and meteorology
 The loss Le is one for evaluating the inference accuracy of data as follows. Air quality data is provided as open data by
an individual expert. Le is computed by MSE between ỹck Microsoft2 . We focus on inferring PM2.5 . We collect PoI data
for source city ck of outputs of experts and y. from Foursquare3 and categorize them into ten categories ac-
 ! cording to the official categories provided by Foursquare. For
 1 X 1 X
 Le = (ỹci (x) − y(x))2 . road network data, we use OpenStreetMap4 , and roads are
 |Cs | c ∈C |T | x∈T
 i s categorised into three types; highway, trunk, and other. For
 meteorology data, we use weather, temperature, air pressure,
 The loss La is for evaluating the difference of cities. humidity, wind speed, and wind direction, which is also pro-
It is computed based on maximum mean discrepancy vided by Microsoft. Air quality and meteorology data are
(MMD) [Gretton et al., 2012] as the adversary to minimize sampled every hour.
the divergence between the marginal distribution of target and
 Evaluation: We select four cities as target cities; Beijing,
source cities. MMD is known as effective distance metric
 Tianjin, Shinzhen, and Guangzhou. Beijing and Tianjin are
measures for evaluating the discrepancy between two distri-
 cities in the northern area of China, whereas Shinzhen and
butions explicitly in a non-parametric manner.
 Guangzhou are in the south. We randomly select five moni-
 toring stations from each city for training and test data. The
 La = M M D2 (z∪c1 ∪ · · · ∪ z∪c|Cs | , z∪ct ), ratio of training and test data is |Csrc | to one.
 z∪ci =
 0
 ∪s∈Si zs(L+L ) , As evaluation metrics of inference accuracy, we use the
 root mean squared error (RMSE) for PM2.5 , which is a stan-
 1 X 1 X dard metric [Xu and Zhu, 2016; Cheng et al., 2018]. We run
 M M D(X , X 0 ) = φ(x) − φ(x0 ) ,
 |X | x∈X |X 0 | 0 0 H three times for training by changing monitoring stations.
 x ∈X
 Compared methods and hyper parameters: We com-
 In MMD computation, H indicates the reproducing ker-
nel Hilbert space (RKHS) and φ is the mapping function to pare AIREX with three approaches: (a) k nearest neighbors
RKHS space. In our method, we compute the MMD score (KNN): This method selects the k monitoring stations closest
by the kernel method [Bousmalis et al., 2016]. The kernel to the target location, and compute the average air pollutant
method computes the MMD score as follows: quantities from these stations as result. We set k to be three
 in our experiments. (b) Feedforward neural networks (FNN):
 1 X
 M M D(X , X 0 ) = K(x, x0 ) This method uses a simple neural network model, whose in-
 |X |(|X | − 1)
 x,x0 ∈X ,x6=x0 puts are Xtgt and Xstn s for all stations. In our experiments,
 1 X the model consists of three layers with 200 units. For sequen-
 + K(x, x0 ) tial features, we only use their values at the same time step
 |X 0 |(|X 0 | − 1)
 x,x0 ∈X 0 ,x6=x0
 2 1
 X X Please see a supplementary file for detail implementation, data
 − K(x, x0 )
 0
 |X ||X | x∈X 0 0 statistics, and additional results.
 x ∈X 2
 www.microsoft.com/en-us/research/project/urban-computing/
 where, K indicates a Gaussian karnel function: K(x, x0 ) = 3
 developer.foursquare.com
exp(− 2σ1 2 ||x − x0 ||2 ). 4
 www.openstreetmap.org/
100
of the inferred air quality. (c) ADAIN: This method repre- KNN
sents the state-of-the-art neural network model for inferring 80 FNN
 ADAIN5
air quality [Cheng et al., 2018]. We use two cases of source ADAIN19

 RMSE
 60
cities: ADAIN5 and ADAIN19, whose source cities are the AIREX
 40
five cities closest to the target city and all source cities, re-
spectively. 20
 In parameter settings of AIREX and ADAIN, we follow the 0
setting in experiments of ADAIN [Cheng et al., 2018]. We Beijing Tianjin Shinzhen Guangzhou
construct a single basic FC layer (L = 1) with 100 neurons
and two LSTM layers with 300 memory cells per layer. We Figure 3: An overview of accuracy
then build two layers of the high-level FC network (L0 = 2)
with 200 neurons per layer. The time-dependent data is input
in 24 time steps (i.e., one day). The number of epochs, the Table 1: AIREX vs ADAIN in different source cities. City names in-
 dicate the result obtained by ADAIN, where training data is the city.
batch size, learning rate are selected from [100, 200, 300], A distance of zero kilometers indicates that the target and source
[32, 64, 128, 256, 512], and [0.005, 0.01], respectively, by cities are the same.
grid search. In our model, λ, γ, and ζ in AIREX are 0.5, 1.0,
and 1.0, respectively. Further detail is provided in our codes. Beijing Guangzhou
 Method
 RMSE Dist. [km] RMSE Dist. [km]
4.2 Experimental results AIREX 47.88 — 11.90 —
Figure 3 shows the inference accuracy for each method. 5 NN cities 51.70 — 23.05 —
AIREX achieves the best accuracy in Tianjin and Guanzhou 19 cities 58.83 — 16.34 —
and the second best in Beijing and Shinzhen. Since AIREX Beijing 30.49 0 78.15 1883.5
learns the difference between target and source cities, it can Langfang 48.58 47.1 61.57 1847.9
accurately infer air quality without air quality data in the Tianjin 52.05 113.8 43.01 1807.9
target city. KNN achieves the best accuracy in Beijing and Baoding 68.46 140.3 52.01 1758.1
 Tangshan 60.53 154.9 54.85 1887.7
Shinzhen, as these are monitoring stations that very close to
 Zhangjiakou 81.79 160.9 24.0 1961.8
the target location, whereas KNN fails the accurate inference Chengde 74.67 176.0 27.18 2024.9
when there are no monitoring stations close to the target lo-
 ADAIN

 Cangzhou 59.57 181.5 47.22 1716.5
cation like Tianjin and Guangzhou. ADAIN and FNN do Hengshui 66.90 248.8 52.68 1635.9
not perform well in all target cities. In particular, although Shijiazhuang 55.80 263.8 87.56 1657.7
ADAIN is the state-of-the-art method for inferring air qual- Qinhuangdao 69.13 273.0 43.79 1956.8
ity, it does not perform well when the source and target cities Zibo 69.62 372.1 41.98 1585.2
are different. Shantou 85.63 1,835.3 25.95 350.6
 We further investigate the difference between AIREX and Huizhou 84.25 1,871.4 22.30 118.0
ADAIN, as ADAIN may perform well if we use optimal Guangzhou 76.62 1,883.5 19.70 0
 Dongguan 80.07 1,888.8 20.29 51.4
source cities. Table 1 shows the accuracy of AIREX and
 Foshan 76.56 1,897.4 21.87 18.9
ADAIN in Beijing and Guangzhou as target cities (see ap- Shenzhen 86.14 1,937.7 23.64 104.1
pendix for Tianjin and Shenzhen). In ADAIN, we use each Jiangmen 82.32 1,946.5 22.75 63.8
city as the source city in addition to ADAIN5 and ADAIN19. Hong Kong 86.99 1,953.3 26.46 118.8
In Beijing, ADAIN accurately infers the air quality when
its source city is Beijing (i.e., target and source cities are
the same). However, the accuracy of ADAIN significantly
decreases when ADAIN uses different cities even when the et al. [Zheng et al., 2013] proposed U-air, which is a neural
source cities are close to Beijing. In Guangzhou, AIREX network-based classifier model that captures both spatial and
achieves better performance than ADAIN even when ADAIN temporal correlations. Hu et al. [Hu et al., 2018] proposed
uses Guangzhou as the source city. This result indicates an architecture that employs deep reinforcement learning for
that the use of multiple cities increases the inference accu- optimizing air quality sensing systems. Zhong et al. [Zhong
racy if we can capture the correlations of air quality between et al., 2020] proposed AirRL, which consists of station selec-
cities. From these results, we can confirm that our mixture-of- tor that distills monitoring stations using reinforcement learn-
experts approach combined with attention mechanisms per- ing. Cheng et al. [Cheng et al., 2018] proposed ADAIN,
forms well for accurately inferring the air quality in unmoni- which employs the attention mechanism to assign weights to
tored cities without selecting source cities. station-oriented features. We used ADAIN to design encode
 and station-based attention layers. To the best of our knowl-
5 Related Work edge, we first employ a mixture-of-experts approach for air
 quality inference.
We review neural network-based approaches for spatially
fine-grained air quality inference. Numerous methods have None of them addresses the problem of air quality infer-
been proposed [Shad et al., 2009; Hasenfratz et al., 2014; ence in unmonitored cities. In contrast to these studies, our
Xu and Zhu, 2016], such that employ linear regression, ma- method assigns weights to each city automatically, without
trix factorization and neural networks. For example, Zheng selecting monitoring stations.
6 Conclusion [Shad et al., 2009] Rouzbeh Shad, Mohammad Saadi Mes-
 gari, and Arefeh Shad. Predicting air pollution using fuzzy
We addressed a new problem that infers air quality informa-
 genetic linear membership kriging in GIS. The ELSEVIER
tion in unmonitored cities. For the problem, we proposed
 Computers, Environment and Urban Systems, 33(6):471–
AIREX, which can accurately infer air quality in unmoni-
 481, 2009.
tored cities. Experimental studies using real data showed that
AIREX outperforms the state-of-the-art methods. [Xu and Zhu, 2016] Yanan Xu and Yanmin Zhu. When re-
 As our future works, we address air quality inference in dif- mote sensing data meet ubiquitous urban data: Fine-
ferent countries, in particular, countries across sea, and sup- grained air quality inference. In Proceedings of the IEEE
port environments that each city has different data sources. Big Data, pages 1252–1261, 2016.
 [Zheng et al., 2013] Yu Zheng, Furui Liu, and Hsun-Ping
References Hsieh. U-Air: When urban air quality inference meets big
 data. In Proceedings of the ACM SIGKDD, pages 1436–
[Bahdanau et al., 2014] Dzmitry Bahdanau, Kyunghyun 1444, 2013.
 Cho, and Yoshua Bengio. Neural machine translation
 [Zhong et al., 2020] Huiqiang Zhong, Cunxiang Yin, Xiao-
 by jointly learning to align and translate. arXiv preprint
 arXiv:1409.0473, 2014. hui Wu, Jinchang Luo, and JiaWei He. Airrl: A rein-
 forcement learning approach to urban air quality inference.
[Bousmalis et al., 2016] Konstantinos Bousmalis, George arXiv preprint arXiv:2003.12205, 2020.
 Trigeorgis, Nathan Silberman, Dilip Krishnan, and Du-
 mitru Erhan. Domain separation networks. In Proceedings
 of the NIPS, pages 343–351, 2016.
[Caruana, 1997] Rich Caruana. Multitask learning. Machine
 learning, 28(1):41–75, 1997.
[Chang and Hanna, 2004] Joseph C Chang and Steven R
 Hanna. Air quality model performance evaluation. Meteo-
 rology and Atmospheric Physics, 87(1-3):167–196, 2004.
[Cheng et al., 2018] Weiyu Cheng, Yanyan Shen, Yanmin
 Zhu, and Linpeng Huang. A neural attention model for
 urban air quality inference: Learning the weights of mon-
 itoring stations. In Proceedings of the AAAI, 2018.
[Gretton et al., 2012] Arthur Gretton, Karsten M Borgwardt,
 Malte J Rasch, Bernhard Schölkopf, and Alexander
 Smola. A kernel two-sample test. Journal of Machine
 Learning Research, 13(Mar):723–773, 2012.
[Guo and Barzilay, 2018] Jiang Guo and Regina Barzilay.
 Multi-source domain adaptation with mixture of experts.
 In Proceedings of the ACL EMNLP, pages 4694–4703,
 2018.
[Hasenfratz et al., 2014] David Hasenfratz, Olga Saukh,
 Christoph Walser, Christoph Hueglin, Martin Fierz, and
 Lothar Thiele. Pushing the spatio-temporal resolution
 limit of urban air pollution maps. In Proceedings of the
 IEEE PerCom, pages 69–77, 2014.
[Hu et al., 2018] Zhiwen Hu, Zixuan Bai, Kaigui Bian, Tao
 Wang, and Lingyang Song. Real-time fine-grained air
 quality sensing networks in smart city: Design, implemen-
 tation and optimization. arXiv preprint arXiv:1810.08514,
 2018.
[Jacobs et al., 1991] Robert A Jacobs, Michael I Jordan,
 Steven J Nowlan, and Geoffrey E Hinton. Adaptive mix-
 tures of local experts. Neural computation, 3(1):79–87,
 1991.
[Masoudnia and Ebrahimpour, 2014] Saeed Masoudnia and
 Reza Ebrahimpour. Mixture of experts: a literature sur-
 vey. Artificial Intelligence Review, 42(2):275–293, 2014.
You can also read