Age-of-Information-based Scheduling in Multiuser Uplinks with Stochastic Arrivals: A POMDP Approach

Aoyu Gong∗, Tong Zhang†, He Chen†, and Yijin Zhang∗
∗School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
†Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Email: gongaoyu@gmail.com, bennyzhangtong@yahoo.com, he.chen@ie.cuhk.edu.hk, yijin.zhang@gmail.com

arXiv:2005.05443v2 [cs.IT] 29 May 2020

Abstract—In this paper, we consider a multiuser uplink status update system, where a monitor aims to collect randomly generated status updates from multiple end nodes in a timely manner through a shared wireless channel. We adopt the recently proposed metric, termed age of information (AoI), to quantify information timeliness and freshness. Due to the random generation of status updates at the end node side, the monitor has only partial knowledge of the status update arrivals. Under such a practical scenario, we aim to address a fundamental multiuser scheduling problem: how should the end nodes be scheduled to minimize the network-wide AoI? To solve this problem, we formulate it as a partially observable Markov decision process (POMDP) and develop a dynamic programming (DP) algorithm to obtain the optimal scheduling policy. Noting that the optimal policy is computationally prohibitive, we further design a low-complexity myopic policy that only minimizes the one-step expected reward. Simulation results show that the performance of the myopic policy can approach that of the optimal policy and is better than that of the baseline policy.

The work of H. Chen is supported by the CUHK direct grant under the project code 4055126.
The first two authors contributed equally to this paper. Any technical questions about this paper should be directed to H. Chen.

I. INTRODUCTION

Information freshness has become an increasingly important performance metric in the era of the Internet of Things (IoT). Various IoT services, such as remote monitoring and control, require the underlying information to be delivered as timely as possible [1], [2]. To quantify information timeliness and freshness, the age of information (AoI) metric, defined as the time elapsed since the generation time of the latest status update received at the monitor, has been investigated in [3]–[7]. Early work (e.g., [4]–[10]) on the AoI focused on single-user systems, while recent work (e.g., [11]–[17]) has shifted to multiuser systems, such as broadcast systems and multiuser uplink systems, where the AoI depends not only on single-user behaviors but also on the interactions among different end nodes.

In broadcast systems, scheduling problems of minimizing the network-wide AoI were studied in [11]–[14]. The authors in [11] considered the “generate-at-will” model for status updates, where status updates could be generated by end nodes whenever they were scheduled to transmit. Three low-complexity scheduling policies were developed and analyzed in [11], including a randomized policy, a max-weight policy, and a Whittle’s index policy. The authors in [12] extended the work in [11] by studying nonorthogonal multiple access. Considering event-triggered measurements where the status update arrivals are stochastic, the authors in [13] derived a universal lower bound for scheduling policies. In [14], both “generate-at-will” and stochastic arrival models with no buffer at end nodes were investigated, and a Whittle’s index policy was proposed that achieves performance close to that of a structural Markov decision process (MDP) algorithm.

In multiuser uplink systems, scheduling problems of minimizing the network-wide AoI are more challenging than those in broadcast systems, especially when the status update arrivals are stochastic. This is mainly because the monitor may not know whether new status updates have arrived at end nodes. Most existing work assumed that end nodes use extra feedback to report their status update arrivals, so that the monitor has complete knowledge of these arrivals [15], [16]. Such feedback incurs considerable overhead and thus makes the corresponding scheduling policies hard to implement in practice.

To overcome this weakness, we consider a multiuser uplink system with stochastic status update arrivals. We assume that end nodes use no extra feedback to report their status update arrivals. Thus, the monitor learns about the status update arrivals of an end node only when that node is scheduled to transmit and its transmission is successful. This practical assumption leads to partial knowledge of status update arrivals at the monitor. In this context, we aim to minimize the expected weighted-sum AoI (EWSAoI) of all end nodes by designing multiuser scheduling policies. Note that having only partial knowledge of status update arrivals makes it difficult to design scheduling policies for the considered system.

The main contributions of this paper are summarized as follows. We formulate the considered scheduling problem as a partially observable Markov decision process (POMDP), whose belief state characterizes the fully observable AoI and the partially observable status update arrivals of end nodes at the monitor. Building on this POMDP, we develop a dynamic programming (DP) algorithm to obtain the optimal policy. To reduce the computational complexity, we further propose a low-complexity myopic policy that only minimizes the one-step expected reward. Simulation results show that the performance of the myopic policy is very close to that of the optimal policy obtained by the DP algorithm. Both of them are
superior to the baseline policy that utilizes no knowledge of status update arrivals. To the best of our knowledge, this is the first work that designs an information-freshness-oriented multiuser scheduling policy under partial system information.

II. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we first describe the system model and then formulate the network-wide AoI minimization problem.

A. System Model

As shown in Fig. 1, we consider a multiuser uplink system where $K$ end nodes report their freshest (i.e., most recently arriving) status updates to a common monitor via a shared channel. The $K$ end nodes are identified by the index set $\mathcal{K} \triangleq \{1, 2, \ldots, K\}$. The time axis is divided into time slots of equal duration. We let $T$ denote the time horizon of the discrete-time system considered. In each time slot $t \in \{1, 2, \ldots, T\}$, a new status update arrives at end node $i \in \mathcal{K}$ according to a Bernoulli arrival process with mean $\lambda_i \in (0, 1]$. The arrival process is independent and identically distributed over time, and independent across end nodes. At the beginning of each time slot, the monitor schedules an end node to transmit its freshest status update. The transmissions of end nodes to the monitor are error-prone. Specifically, the transmission of end node $i$ succeeds with probability $p_i$ and fails with probability $1 - p_i$.

[Figure: end nodes Node 1, Node 2, ..., Node K, each with status updates arriving over time t, transmit status updates to the monitor.]
Fig. 1. The multiuser uplink system with stochastic arrivals of status updates.

B. Local Age

Each end node is assumed to store at most one status update in its buffer. When a new status update arrives at an end node whose buffer is not empty, the end node drops the status update already in its buffer. This assumption ensures that the status update stored in the buffer of an end node is always the freshest one. The local age of end node $i$, denoted by $z_i^t$, measures the freshness of the status update at the end node. The evolution of $z_i^t$ can be expressed as follows:

$$ z_i^{t+1} = \begin{cases} z_i^t + 1, & \text{no arrival in time slot } t, \\ 1, & \text{status update arrival in time slot } t. \end{cases} \tag{1} $$

As shown in (1), the local age of end node $i$ increases by 1 if there is no status update arrival and is reset to 1 otherwise. The local ages of different end nodes evolve independently according to their Bernoulli arrival processes. Without loss of generality, we assume that there is a status update arrival at the beginning of the first time slot for each end node.

Reporting the local ages of all end nodes to the monitor in each time slot would cause a large amount of extra overhead. For practical implementations, we enforce that the local age of an end node can be observed by the monitor only when the end node is scheduled and successfully transmits its freshest status update. This is possible because each status update received by the monitor carries its own time stamp.

C. AoI Minimization with Partial Knowledge of Arrivals

In this paper, we adopt the AoI metric to quantify information freshness. The AoI of end node $i$ at the monitor, denoted by $h_i^t$, is reset based on the local age of end node $i$ if the end node is scheduled and its transmission is successful; otherwise, the AoI of the end node increases by 1. The AoI evolution can be expressed as follows:

$$ h_i^{t+1} = \begin{cases} z_i^t + 1, & \text{scheduled and received}, \\ h_i^t + 1, & \text{otherwise}. \end{cases} \tag{2} $$

Note that, since the local age of an end node keeps increasing when there is no new status update arrival, scheduling an end node that has received no new status update since its last successful transmission will not reduce its AoI at the monitor.

In this context, the monitor is only aware of the local age of an end node that is scheduled and transmits successfully. This leads to partial observation of the system information at the monitor. With such partial knowledge, we aim to find a scheduling policy $\pi$ that minimizes the EWSAoI, which can be formulated as the following optimization problem:

$$ (\mathrm{P1}): \quad \min_{\pi} \frac{1}{TK} \mathbb{E}\Big[ \sum_{t=1}^{T} \sum_{i=1}^{K} \omega_i h_i^t \,\Big|\, \pi \Big], \tag{3} $$

where $\omega_i \in (0, \infty)$ is the importance weight of end node $i$. The expectation is taken over all system dynamics.

III. POMDP FORMULATION

In this section, to solve problem (P1), we reformulate it as a POMDP and use the average reward of the POMDP to evaluate the EWSAoI.

1) States: We denote the state of end node $i$ in time slot $t$ by $s_i^t \triangleq [h_i^t, z_i^t]$, where $h_i^t \in \mathcal{T} \triangleq \{1, 2, 3, \ldots\}$ is its instantaneous AoI at the monitor and $z_i^t \in \mathcal{T}$ is its local age. Then, we denote the state of the POMDP in time slot $t$ by $\mathbf{s}^t \triangleq [\mathbf{h}^t, \mathbf{z}^t]$, where $\mathbf{h}^t \triangleq [h_1^t, h_2^t, \ldots, h_K^t] \in \mathcal{H} \triangleq \mathcal{T}^K$ represents the AoI of all end nodes, and $\mathbf{z}^t \triangleq [z_1^t, z_2^t, \ldots, z_K^t] \in \mathcal{Z} \triangleq \mathcal{T}^K$ represents the local ages of all end nodes. Denote by $\mathcal{S}$ the space of all possible states.
2) Actions: We denote the action of the POMDP in time slot $t$ by $\mathbf{a}^t \triangleq [a_1^t, a_2^t, \ldots, a_K^t]$, where $a_i^t \in \mathcal{A} \triangleq \{0, 1\}$ indicates whether end node $i$ is scheduled to transmit in time slot $t$. If end node $i$ is scheduled, then $a_i^t = 1$; otherwise, $a_i^t = 0$. In the single-antenna system considered, the monitor can schedule at most one end node in each time slot. Thus, we have $\sum_{i=1}^{K} a_i^t \le 1$.
   Denote by $\mathcal{A}$ the space of all possible actions.
3) Observations: The observations of the POMDP at the monitor consist of the fully observed AoI and the partially observed local ages of all end nodes. Specifically, if end node $i$ is scheduled and its transmission is successful, its local age can be accurately observed by the monitor. Otherwise, there is no observation of its local age. We denote the observation of the POMDP in time slot $t$ by $\mathbf{o}^t \triangleq [o_1^t, o_2^t, \ldots, o_K^t]$, where $o_i^t \triangleq [h_i^t, \hat{z}_i^t]$ is the observation of end node $i$, including its fully observed AoI $h_i^t$ and partially observed local age $\hat{z}_i^t \in \mathcal{T} \cup \{X\}$. Note that $X$ means no observation of the local age of an end node, caused by its unsuccessful transmission or by its not being scheduled. Denote by $\mathcal{O}$ the space of all possible observations.
4) Belief States: We denote the belief state of the POMDP in time slot $t$ by $\mathbf{I}^t \triangleq [\mathbf{h}^t, \mathbf{b}^t]$, where $\mathbf{b}^t$ is a probability distribution over $\mathcal{Z}$. Let $b^t(\mathbf{z}^t)$ denote the probability assigned to state $\mathbf{z}^t$ by distribution $\mathbf{b}^t$, which satisfies $b^t(\mathbf{z}^t) \in [0, 1]$ for all $\mathbf{z}^t \in \mathcal{Z}$ and $\sum_{\mathbf{z}^t \in \mathcal{Z}} b^t(\mathbf{z}^t) = 1$. It is worth mentioning that although the belief state of a POMDP is in general a probability distribution over $\mathcal{S}$, in our problem $\mathbf{h}^t$ is fully observable, i.e., its update is always deterministic given $\mathbf{h}^{t-1}$, $\mathbf{a}^{t-1}$, and $\mathbf{o}^{t-1}$. Denote by $\mathcal{I}$ the space of all possible belief states.
5) State Transition Function and Observation Function: Because the update of $\mathbf{h}^t$ is deterministic, we only need to define the state transition function and observation function of $\mathbf{z}^t$. The state transition function is denoted by $\Pr(\mathbf{z}^{t+1} \mid \mathbf{z}^t) = \prod_{i=1}^{K} \Pr(z_i^{t+1} \mid z_i^t)$, giving the conditional probability of reaching state $\mathbf{z}^{t+1}$ given state $\mathbf{z}^t$. For end node $i \in \mathcal{K}$, we have
   $$ \Pr(z_i^{t+1} \mid z_i^t) = \begin{cases} \lambda_i, & \text{if } z_i^{t+1} = 1, \\ 1 - \lambda_i, & \text{if } z_i^{t+1} = z_i^t + 1, \\ 0, & \text{otherwise}. \end{cases} \tag{4} $$
   The observation function is denoted by $\Pr(\mathbf{o}^t \mid \mathbf{z}^t, \mathbf{a}^t) = \prod_{i=1}^{K} \Pr(o_i^t \mid z_i^t, a_i^t)$, giving the conditional probability of making observation $\mathbf{o}^t$ given state $\mathbf{z}^t$ and action $\mathbf{a}^t$. If end node $i$ is scheduled, then
   $$ \Pr(o_i^t \mid z_i^t, 1) = \begin{cases} p_i, & \text{if } \hat{z}_i^t = z_i^t, \\ 1 - p_i, & \text{if } \hat{z}_i^t = X, \\ 0, & \text{otherwise}. \end{cases} \tag{5} $$
   If end node $i$ is not scheduled, then $\Pr(o_i^t \mid z_i^t, 0) = 1$ if $\hat{z}_i^t = X$, and $\Pr(o_i^t \mid z_i^t, 0) = 0$ otherwise.
6) Belief State Update: In our POMDP, the monitor keeps belief states rather than knowing the actual states. In time slot $t$, the belief state $\mathbf{I}^t$, which consists of two parts, $\mathbf{h}^t$ and $\mathbf{b}^t$, is a sufficient statistic for the history $\{\mathbf{I}^1, \mathbf{a}^1, \mathbf{o}^1, \mathbf{a}^2, \mathbf{o}^2, \ldots, \mathbf{a}^{t-1}, \mathbf{o}^{t-1}\}$. Given $\mathbf{I}^t$, $\mathbf{a}^t$, and $\mathbf{o}^t$, for all $i \in \mathcal{K}$, $h_i^{t+1}$ can be updated as follows:
   $$ h_i^{t+1} = \begin{cases} \hat{z}_i^t + 1, & \text{if } \hat{z}_i^t \neq X, \\ h_i^t + 1, & \text{if } \hat{z}_i^t = X. \end{cases} \tag{6} $$
   As shown in (6), the update of $\mathbf{h}^t$ is always deterministic. Given the same inputs, for all $\mathbf{z}^{t+1} \in \mathcal{Z}$, $b^{t+1}(\mathbf{z}^{t+1})$ can be updated via Bayes' theorem:
   $$ b^{t+1}(\mathbf{z}^{t+1}) = \eta \sum_{\mathbf{z}^t \in \mathcal{Z}} \Pr(\mathbf{o}^t \mid \mathbf{z}^t, \mathbf{a}^t) \Pr(\mathbf{z}^{t+1} \mid \mathbf{z}^t)\, b^t(\mathbf{z}^t), \tag{7} $$
   where
   $$ \eta = 1 \Big/ \sum_{\mathbf{z}^{t+1} \in \mathcal{Z}} \sum_{\mathbf{z}^t \in \mathcal{Z}} \Pr(\mathbf{o}^t \mid \mathbf{z}^t, \mathbf{a}^t) \Pr(\mathbf{z}^{t+1} \mid \mathbf{z}^t)\, b^t(\mathbf{z}^t) $$
   is a normalizing factor. We denote the update of $\mathbf{h}^t$ in (6) and the update of $\mathbf{b}^t$ in (7) by the update function $\mathbf{I}^{t+1} = f(\mathbf{I}^t, \mathbf{a}^t, \mathbf{o}^t)$, whose inputs are $\mathbf{I}^t$, $\mathbf{a}^t$, and $\mathbf{o}^t$ and whose output is $\mathbf{I}^{t+1}$.
7) Reward: The expected immediate reward at belief state $\mathbf{I}^t$ is defined as the weighted sum of the instantaneous AoI of all end nodes, i.e., $R(\mathbf{I}^t) \triangleq \sum_{i=1}^{K} \omega_i h_i^t$. Then, the EWSAoI in (3) can be evaluated by
   $$ \frac{1}{TK} \mathbb{E}\Big[ \sum_{t=1}^{T} R(\mathbf{I}^t) \,\Big|\, \mathbf{I}^1, \pi \Big], \tag{8} $$
   where $\mathbf{I}^1$ is a given initial belief state.
8) Policy: In the above equation, $\pi$ is a given policy defined as $\pi = [\pi^1, \pi^2, \ldots, \pi^T]$, where $\pi^t$ is a mapping from the belief space $\mathcal{I}$ to the action space $\mathcal{A}$, i.e., it decides which action $\mathbf{a}^t$ should be taken when the POMDP is in belief state $\mathbf{I}^t$. Our aim is to find the optimal policy that minimizes the average reward in (8), i.e.,
   $$ \pi^* = \arg\min_{\pi} \frac{1}{TK} \mathbb{E}\Big[ \sum_{t=1}^{T} R(\mathbf{I}^t) \,\Big|\, \mathbf{I}^1, \pi \Big]. \tag{9} $$

To illustrate the POMDP formulation, we depict its belief states, actions, observations, and the update of belief states in Fig. 2.

[Figure: a chain of belief states $\mathbf{I}^t = [\mathbf{h}^t, \mathbf{b}^t]$ and $\mathbf{I}^{t+1} = [\mathbf{h}^{t+1}, \mathbf{b}^{t+1}]$, each followed by an action ($\mathbf{a}^t$, $\mathbf{a}^{t+1}$) and an observation ($\mathbf{o}^t$, $\mathbf{o}^{t+1}$), linked by the updates $f(\mathbf{I}^{t-1}, \mathbf{a}^{t-1}, \mathbf{o}^{t-1})$, $f(\mathbf{I}^t, \mathbf{a}^t, \mathbf{o}^t)$, and $f(\mathbf{I}^{t+1}, \mathbf{a}^{t+1}, \mathbf{o}^{t+1})$.]
Fig. 2. An illustration of belief states, actions, observations, and the update of belief states.

IV. POLICY DESIGN FOR THE FORMULATED POMDP

In this section, we first propose a DP algorithm to find the optimal policy of the formulated POMDP and then devise a myopic policy with low complexity and near-optimal performance.
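Both policies in this section act on the belief state maintained by the update function $f(\mathbf{I}^t, \mathbf{a}^t, \mathbf{o}^t)$ in (6) and (7). Because the transition function (4) and the observation function (5) both factor across end nodes, a product-form initial belief (such as the known initial local ages assumed in Section II) remains a product of per-node distributions after every update, so the sum in (7) can be evaluated one end node at a time. The following Python code is a minimal sketch of that per-node update under these assumptions, not the authors' implementation: the names `observation_likelihood`, `update_local_age_belief`, and `update_aoi` are hypothetical, a belief is assumed to be a dictionary from local age to probability, and `None` stands in for the empty observation $X$.

```python
from typing import Dict, Optional

# Hypothetical representation: a per-node belief b_i^t maps local age -> probability.
Belief = Dict[int, float]


def observation_likelihood(z_hat: Optional[int], z: int, scheduled: bool, p: float) -> float:
    """Pr(o_i^t | z_i^t, a_i^t) following (5); z_hat=None plays the role of X."""
    if scheduled:
        if z_hat is None:
            return 1.0 - p  # scheduled, but the transmission failed
        return p if z_hat == z else 0.0  # a success reveals the true local age
    return 1.0 if z_hat is None else 0.0  # unscheduled nodes are never observed


def update_local_age_belief(belief: Belief, lam: float, p: float,
                            scheduled: bool, z_hat: Optional[int]) -> Belief:
    """Sketch of the per-node form of the Bayes update (7) with transition (4)."""
    unnormalized: Belief = {}
    for z, prob in belief.items():
        weight = observation_likelihood(z_hat, z, scheduled, p) * prob
        if weight == 0.0:
            continue  # local ages ruled out by the observation
        # Transition (4): an arrival resets the age to 1 with probability lambda_i;
        # otherwise the age grows by one with probability 1 - lambda_i.
        unnormalized[1] = unnormalized.get(1, 0.0) + lam * weight
        unnormalized[z + 1] = unnormalized.get(z + 1, 0.0) + (1.0 - lam) * weight
    total = sum(unnormalized.values())  # per-node factor of 1 / eta in (7)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {z: mass / total for z, mass in unnormalized.items()}


def update_aoi(h: int, z_hat: Optional[int]) -> int:
    """Deterministic AoI update (6)."""
    return h + 1 if z_hat is None else z_hat + 1
```

For example, with $\lambda_i = 0.3$ and $p_i = 0.8$, starting from `{1: 1.0}`, an unscheduled slot yields `{1: 0.3, 2: 0.7}`. If the node is then scheduled and reveals $\hat{z}_i^t = 2$, the belief becomes `{1: 0.3, 3: 0.7}` (up to floating-point rounding) and `update_aoi` sets the AoI to 3, as in (6). A failed transmission, by contrast, scales every term by the same factor $1 - p_i$, which cancels in the normalization, so the belief evolves exactly as if the node had not been scheduled.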
A. Dynamic Programming for the Optimal Policy

We follow [18] and resort to the DP framework for finding the optimal policy of the POMDP formulated in Section III. The DP method solves complex problems by breaking them down into a sequence of simpler subproblems and then recursively combining the solutions of the subproblems. It is worth mentioning that the space of belief states of the POMDP is countable for any given initial belief state $\mathbf{I}^1$. We denote the finite set of belief states in time slot $t$ by $\mathcal{I}^t$. Recalling that the update of $\mathbf{h}^t$ is deterministic, the expected total reward of each $\mathbf{I}^t \in \mathcal{I}^t$ can be denoted by the inner product $\mathbf{b}^t \cdot \mathbf{V}(\mathbf{I}^t)$, where $\mathbf{V}(\mathbf{I}^t) \triangleq [v_{\mathbf{I}^t}(\mathbf{z}_1^t), v_{\mathbf{I}^t}(\mathbf{z}_2^t), \ldots, v_{\mathbf{I}^t}(\mathbf{z}_{|\mathcal{Z}|}^t)]$ is a $|\mathcal{Z}|$-dimensional vector. Note that $\mathbf{b}^t \cdot \mathbf{V}(\mathbf{I}^t)$ incorporates rewards from time slot $t$ onward. The DP algorithm is formally described as follows.

Algorithm 1 The DP Algorithm
1: Initialization:
   Set $\mathbf{V}^*(\mathbf{I}^T) = R(\mathbf{I}^T)\,\mathbf{1}_{|\mathcal{Z}|}$ for all $\mathbf{I}^T \in \mathcal{I}^T$, where $\mathbf{1}_{|\mathcal{Z}|}$ is a $|\mathcal{Z}|$-dimensional vector with all entries equal to 1, and set $t = T - 1$.
2: Backward Induction:
   1) For all $\mathbf{I}^t \in \mathcal{I}^t$, compute $\pi^*(\mathbf{I}^t)$:
   $$ \pi^*(\mathbf{I}^t) = \arg\min_{\mathbf{a}^t \in \mathcal{A}} \sum_{\mathbf{z}^t \in \mathcal{Z}} b^t(\mathbf{z}^t) \Big[ R(\mathbf{I}^t) + \sum_{\mathbf{o}^t \in \mathcal{O}} \vartheta \Pr(\mathbf{o}^t \mid \mathbf{z}^t, \mathbf{a}^t) \sum_{\mathbf{z}^{t+1} \in \mathcal{Z}} \Pr(\mathbf{z}^{t+1} \mid \mathbf{z}^t)\, v^*_{\mathbf{I}^{t+1}}(\mathbf{z}^{t+1}) \Big], \quad \forall \mathbf{I}^{t+1} \in \mathcal{I}^{t+1}, \tag{10} $$
   where $\vartheta = 1$ if $f(\mathbf{I}^t, \mathbf{a}^t, \mathbf{o}^t) = \mathbf{I}^{t+1}$, and $\vartheta = 0$ otherwise.
   2) For all $\mathbf{I}^t \in \mathcal{I}^t$, compute $v^*_{\mathbf{I}^t}(\mathbf{z}^t)$ for all $\mathbf{z}^t \in \mathcal{Z}$:
   $$ v^*_{\mathbf{I}^t}(\mathbf{z}^t) = R(\mathbf{I}^t) + \sum_{\mathbf{o}^t \in \mathcal{O}} \vartheta \Pr(\mathbf{o}^t \mid \mathbf{z}^t, \pi^*(\mathbf{I}^t)) \sum_{\mathbf{z}^{t+1} \in \mathcal{Z}} \Pr(\mathbf{z}^{t+1} \mid \mathbf{z}^t)\, v^*_{\mathbf{I}^{t+1}}(\mathbf{z}^{t+1}), \quad \forall \mathbf{I}^{t+1} \in \mathcal{I}^{t+1}. \tag{11} $$
3: Stopping Rule:
   If $t = 1$, stop. Otherwise, set $t = t - 1$ and go to step 2.

B. A Myopic Policy

In our problem, the local ages of different end nodes evolve independently, as described in Section II. As such, the monitor only needs to maintain a probability distribution of the local age of each end node; these distributions are sufficient statistics for the POMDP. We let $\mathbf{B}^t = [\mathbf{b}_1^t, \mathbf{b}_2^t, \ldots, \mathbf{b}_K^t]$ denote these distributions, where $\mathbf{b}_i^t$ is the probability distribution of the local age of end node $i$. Let $b_i^t(z_i^t)$ denote the probability assigned to local age $z_i^t$ by distribution $\mathbf{b}_i^t$, satisfying $b_i^t(z_i^t) \in [0, 1]$ for all $z_i^t \in \mathcal{T}$ and $\sum_{z_i^t \in \mathcal{T}} b_i^t(z_i^t) = 1$. Then, the belief state of the POMDP can be expressed as $\mathbf{I}^t \triangleq [\mathbf{h}^t, \mathbf{B}^t]$.

We then propose a myopic policy that minimizes the expected reward of the next time slot, also known as the one-step expected reward. Given $\mathbf{I}^t$, if action $\mathbf{a}^t$ is chosen in time slot $t$, the one-step expected reward of the system is given by

$$ \hat{R}(\mathbf{I}^t, \mathbf{a}^t) = \sum_{i=1}^{K} \omega_i \Big[ (1 - a_i^t)(h_i^t + 1) + a_i^t \Big( p_i \sum_{z_i^t \in \mathcal{T}} b_i^t(z_i^t)(z_i^t + 1) + (1 - p_i)(h_i^t + 1) \Big) \Big]. \tag{12} $$

For a scheduled end node, its expected AoI in the next time slot is $\sum_{z_i^t \in \mathcal{T}} b_i^t(z_i^t)(z_i^t + 1)$ with probability $p_i$ or $h_i^t + 1$ with probability $1 - p_i$. For an arbitrary unscheduled end node, its expected AoI in the next time slot is $h_i^t + 1$. The myopic policy for belief state $\mathbf{I}^t$ can be obtained by

$$ \tilde{\pi}^*(\mathbf{I}^t) = \arg\min_{\mathbf{a}^t \in \mathcal{A}} \hat{R}(\mathbf{I}^t, \mathbf{a}^t). \tag{13} $$

Then, the myopic policy $\tilde{\pi}^*$ is defined by $\tilde{\pi}^{*,t}: \mathbf{I}^t \mapsto \tilde{\pi}^*(\mathbf{I}^t)$. Compared with the optimal policy, the myopic policy is easier to implement. Not only does $\mathbf{B}^t$ reduce the dimension of $\mathbf{b}^t$ from $|\mathcal{T}|^K$ to $K|\mathcal{T}|$, which grows linearly with the number of end nodes, but the myopic policy also relies only on the one-step expected reward instead of the expected total reward. The proposed myopic policy is formally described in Algorithm 2.

V. PERFORMANCE EVALUATION

In this section, after introducing a physical-layer model to
    The recursion simplifies the evaluation and optimization of                       obtain successful transmission probability pi , we evaluate the
V∗ (I1 ) over T time slots into a sequence of T − 1 one-step                          DP algorithm and myopic policy via simulations.
computations. As shown in (11), in each step, the value of
vI∗t (zt ) equals the immediate reward plus the expected total                        A. Successful Transmission Probability
reward over the remaining time slots. The optimal policy π ∗                             In time slot t, the small-scale fading from end node i to
is defined as π ∗,t : It → π ∗ (It ) for ∀It ∈ I t . In particular,                   the monitor is denoted by git , which is assumed to follow
(b1 · V∗ (I1 ))/(T K) is the minimal EWSAoI given the initial                         an exponential random variable with a unit mean. The large-
belief state I1 .                                                                     scale fading is denoted by d−τ i , where di is the distance from
    The DP algorithm represents an effective solution to find                         end node i to the monitor and τ is the path-loss factor. The
the optimal policy. However, the recursion is computationally                         additive white Gaussian noise follows a complex Gaussian
prohibitive due to the following reasons. First, the AoI and                          distribution CN (0, σ 2 ). The achievable rate is computed by
local age tends to be large in real systems. Second, the di-                          ri = log2 d−τ    t      2
                                                                                                   i gi P /σ + 1 , where P is the transmit power,
mension of the probability distribution bt grows exponentially                        and the signal-to-noise ratio (SNR) is calculated by SNR =
with the number of end nodes. Thus, it is crucial to find a low-                      d−τ       2
                                                                                       i P/σ . If ri is below the threshold rth , the transmission
complexity and near-optimal policy.                                                   of end node i is deemed to be unsuccessful. Consequently,
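As a sanity check on this outage model, the Python sketch below samples unit-mean exponential fading gains $g_i^t$ and counts how often $r_i \ge r_{th}$, using the fact that $d_i^{-\tau} g_i^t P/\sigma^2 = \mathrm{SNR}\, g_i^t$. It compares the estimate with $\exp(-(2^{r_{th}} - 1)/\mathrm{SNR})$, which is the closed form in (14) rewritten in terms of the SNR. The function names, the sample size and the seed are illustrative choices, not part of the paper.

```python
import math
import random


def success_prob_closed_form(snr_db: float, r_th: float = 1.0) -> float:
    """exp(-(2^r_th - 1) / SNR), i.e. (14) with SNR = d_i^-tau * P / sigma^2."""
    snr = 10 ** (snr_db / 10)  # dB -> linear
    return math.exp(-(2 ** r_th - 1) / snr)


def success_prob_monte_carlo(snr_db: float, r_th: float = 1.0,
                             n: int = 100_000, seed: int = 0) -> float:
    """Fraction of slots with r_i = log2(SNR * g + 1) >= r_th, where g ~ Exp(1)."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    successes = 0
    for _ in range(n):
        g = rng.expovariate(1.0)  # unit-mean small-scale fading gain g_i^t
        if math.log2(snr * g + 1) >= r_th:  # rate meets the threshold
            successes += 1
    return successes / n


for snr_db in (0.0, 10.0, 30.0):
    print(snr_db, success_prob_closed_form(snr_db), success_prob_monte_carlo(snr_db))
```

At 0 dB both values should be close to $e^{-1} \approx 0.368$, while at 30 dB with $r_{th} = 1$ bps/Hz the closed form gives $e^{-0.001} \approx 0.999$. Note that $d_i$, $\tau$, $P$ and $\sigma^2$ enter only through the SNR.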
Fig. 3. Analytical results vs. simulation results, $K = 2$, $D = 8$.

Algorithm 2 The Myopic Policy
 1: Initialization:
    Set $t = 1$ and give the initial belief state $I^1$. For each end node, the monitor maintains its AoI $h_i^t$ and a probability distribution $b_i^t$ of its local age.
 2: Obtain the Myopic Policy:
    In time slot $t$, the monitor chooses action $\tilde{a}^t$ by
    $$\tilde{a}^t = \arg\min_{a^t \in \mathcal{A}} \hat{R}(I^t, a^t).$$
 3: Update the Belief State:
    After taking action $\tilde{a}^t$, the monitor makes observation $o^t$. For each end node, the monitor updates its $h_i^{t+1}$ by (6). If $\hat{z}_i^t \neq X$, the monitor updates its $b_i^{t+1}$ by
    $$b_i^{t+1}(z_i^{t+1}) = \begin{cases} \lambda_i, & \text{if } z_i^{t+1} = 1, \\ 1 - \lambda_i, & \text{if } z_i^{t+1} = \hat{z}_i^t + 1, \\ 0, & \text{otherwise}. \end{cases}$$
    Otherwise, the monitor updates its $b_i^{t+1}$ by
    $$b_i^{t+1}(z_i^{t+1}) = \sum_{z_i^t \in \mathcal{T}} \Pr(z_i^{t+1} \mid z_i^t)\, b_i^t(z_i^t).$$
 4: Stopping Rule:
    If $t = T$, stop. Otherwise, set $t = t + 1$ and go to step 2.
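Step 2 of Algorithm 2 needs only (12) and (13). The Python sketch below evaluates them under two assumptions that are not stated in this section: each action schedules exactly one end node (so $a^t$ is one-hot and $\mathcal{A}$ has $K$ elements), and the local age set is truncated to $\mathcal{T} = \{1, \dots, D\}$, with $b_i^t$ stored as a list whose entry $z - 1$ holds $b_i^t(z)$. The function names are hypothetical. Working with the per-node marginals is what keeps this cheap: assuming $|\mathcal{T}| = D$, with $D = 30$ and $K = 5$ (the setting of Fig. 6) a joint belief would have $30^5 = 24{,}300{,}000$ entries, whereas the $K$ marginals have $5 \times 30 = 150$.

```python
from typing import Sequence


def one_step_reward(h: Sequence[int], b: Sequence[Sequence[float]],
                    p: Sequence[float], w: Sequence[float], k: int) -> float:
    """Eq. (12) for the (assumed one-hot) action that schedules only end node k.

    h[i]      : AoI h_i^t at the monitor
    b[i][z-1] : belief b_i^t(z) over local ages z = 1..D (assumed truncation)
    p[i]      : successful transmission probability p_i
    w[i]      : weight omega_i
    """
    total = 0.0
    for i in range(len(h)):
        if i == k:
            # Scheduled node: AoI becomes z + 1 on success, h + 1 on failure.
            fresh = sum(prob * (z + 1) for z, prob in enumerate(b[i], start=1))
            total += w[i] * (p[i] * fresh + (1 - p[i]) * (h[i] + 1))
        else:
            # Unscheduled node: AoI simply grows by one.
            total += w[i] * (h[i] + 1)
    return total


def myopic_action(h: Sequence[int], b: Sequence[Sequence[float]],
                  p: Sequence[float], w: Sequence[float]) -> int:
    """Eq. (13): index of the end node whose scheduling minimises (12)."""
    return min(range(len(h)), key=lambda k: one_step_reward(h, b, p, w, k))


# Hypothetical two-node example with D = 4.
h = [5, 4]
b = [[0.0, 0.0, 0.0, 1.0],   # node 0: local age believed to be 4
     [1.0, 0.0, 0.0, 0.0]]   # node 1: local age believed to be 1
p = [0.9, 0.9]
w = [1.0, 1.0]
print([round(one_step_reward(h, b, p, w, k), 6) for k in range(2)])  # [10.1, 8.3]
print(myopic_action(h, b, p, w))  # 1
```

In this example node 0 has the larger AoI, but its pending update is already 4 slots old, while node 1 is believed to hold a fresh update. The one-step rewards are 10.1 and 8.3, so the myopic rule schedules node 1, whereas a rule that looks only at the AoI would schedule node 0. Evaluating all $K$ actions this way costs $O(K(K + D))$ operations.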

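Step 3 of Algorithm 2 can likewise be sketched per end node. This is a minimal sketch under stated assumptions: the local age set is $\mathcal{T} = \{1, \dots, D\}$ with ages capped at $D$, `None` stands in for the "no observation" case $X$, and $\Pr(z_i^{t+1} \mid z_i^t)$ is assumed to follow the same Bernoulli($\lambda_i$) rule that appears in the observed case (reset to 1 on a new arrival, otherwise grow by one). The AoI update (6) is not reproduced here, and `update_belief` is a hypothetical name.

```python
from typing import List, Optional


def update_belief(b: List[float], lam: float, z_obs: Optional[int]) -> List[float]:
    """Belief update of step 3 in Algorithm 2 for one end node.

    b[z-1] : current belief b_i^t(z) over local ages z = 1..D (assumed truncation)
    lam    : status update arrival rate lambda_i
    z_obs  : observed local age, or None when nothing is observed (the case X)
    """
    D = len(b)
    new_b = [0.0] * D
    if z_obs is not None:
        # Local age known exactly: a new update arrives (age 1) w.p. lambda_i,
        # otherwise the observed age grows by one (capped at D, an assumption).
        new_b[0] += lam
        new_b[min(z_obs + 1, D) - 1] += 1 - lam
    else:
        # Push the belief through the assumed Pr(z' | z): reset to 1 w.p. lambda_i,
        # else z + 1 (capped at D).
        for z, prob in enumerate(b, start=1):
            new_b[0] += lam * prob
            new_b[min(z + 1, D) - 1] += (1 - lam) * prob
    return new_b


b = [1.0, 0.0, 0.0, 0.0]
b = update_belief(b, 0.4, None)   # approximately [0.4, 0.6, 0.0, 0.0]
b = update_belief(b, 0.4, None)   # approximately [0.4, 0.24, 0.36, 0.0]
print(update_belief(b, 0.4, 2))   # [0.4, 0.0, 0.6, 0.0]
```

Both branches keep $b_i^{t+1}$ a valid distribution, and the cap at $D$ mirrors the state truncation used in the simulations below.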
Since a transmission succeeds if and only if $r_i \ge r_{th}$, the successful transmission probability of end node $i$ can be obtained by

$$p_i = 1 - \Pr(r_i < r_{th}) = 1 - \Pr\left(g_i^t < \frac{\sigma^2 d_i^{\tau}\left(2^{r_{th}} - 1\right)}{P}\right) \overset{(a)}{=} \exp\left(-\frac{\sigma^2 d_i^{\tau}\left(2^{r_{th}} - 1\right)}{P}\right), \tag{14}$$

where step (a) follows from the cumulative distribution function of the exponential random variable with unit mean. We set the distance $d_i = 5$ m, the path-loss factor $\tau = 2$, and the threshold $r_{th} = 1$ bps/Hz for all simulation runs in this section.

B. Comparisons with Simulation Results

In solving the POMDP-based policies, a state truncation $D$ is applied to approximate the countable state space, i.e., the AoI and local age are both upper bounded by $D$ for all $i$ and $t$.

Fig. 3 shows the analytical and simulation results of the optimal and myopic policies as a function of the SNR, in which we set $K = 2$, $T = 25$, $D = 8$, $\lambda_i = 0.4$ and $\omega_i = 1$ for all $i \in \mathcal{K}$. Each simulation result is obtained from $10^6$ independent simulation runs. For both policies, the analytical and simulation results match well, which verifies the accuracy of the POMDP formulation.

Fig. 4 compares the analytical results of the myopic policy with those of the optimal policy, in which the parameter setting is the same as that in Fig. 3 except $D = 4, 6, 8, 10$. It is shown that the performance of the myopic policy can approach that of the optimal policy, i.e., the myopic policy can achieve near-optimal performance. Meanwhile, the curves indicate that the EWSAoI grows more slowly as the state truncation $D$ increases. This is because the AoI and local age of end nodes tend to remain small when the monitor adopts the optimal or myopic policy, both of which aim to minimize the EWSAoI.

Fig. 4. Proposed myopic policy vs. optimal policy, $K = 2$.

C. Comparison with Baseline Policies

We compare the proposed myopic policy with two baseline policies, described as follows:
1) MDP Policy: We introduce a myopic policy proposed in [17], which assumes complete knowledge of status update arrivals. Denote by “MDP” this myopic policy.
2) MaxAoI Policy: We propose a myopic policy that assumes no knowledge of status update arrivals. Specifically, the monitor always schedules the end node with the maximum AoI to transmit. The policy relies only on the fully observable AoI available at the monitor. Denote by “MaxAoI” this myopic policy.

Fig. 5 shows the simulation results of the myopic, MaxAoI and MDP policies as a function of the SNR, in which we set $K = 2$, $T = 10^6$, $D = 30$, $\lambda_i = 0.4$ and $\omega_i = 1$ for all $i \in \mathcal{K}$. Each simulation result is obtained from 10 independent simulation runs. Since the MDP policy has a
complete knowledge of status update arrivals, there is a gap between the myopic and MDP policies. The MaxAoI policy utilizes no knowledge of status update arrivals; thus, it has the worst performance among the three policies. It is further shown that the gap between the myopic and MDP policies becomes larger as the number of end nodes increases. This is because the gain resulting from fully observable status update arrivals grows as the number of end nodes increases.

Fig. 5. Proposed myopic policy vs. baseline policies, $D = 30$.

Fig. 6 shows the simulation results of the myopic, MaxAoI and MDP policies as a function of the packet arrival rate, in which the parameter setting is the same as that in Fig. 5 except SNR = 30 dB. It is shown that as the status update arrival rate increases, the performance of the three policies tends to converge. This is because, for large status update arrival rates, the importance of status update arrival knowledge is marginal when minimizing the EWSAoI. In the extreme case with $\lambda_i = 1$ for all $i \in \mathcal{K}$, the considered system becomes equivalent to the “generate-at-will” model, where there is no uncertainty about the status update arrivals.

Fig. 6. Proposed myopic policy vs. baseline policies, $D = 30$, $K = 5$.

VI. CONCLUSIONS

In this paper, we have investigated the information-freshness-oriented scheduling problem in the multiuser uplink system, where the monitor has partial knowledge of status update arrivals at the end node side. To tackle this problem, a POMDP has been formulated to characterize the dynamic behavior of such a system. A DP algorithm has been developed to achieve the optimal policy, and a myopic policy with low complexity and near-optimal performance has been further proposed. Simulation results have shown that the performance of the myopic policy approaches that of the optimal policy, and is better than that of the baseline policy that utilizes no knowledge of status update arrivals. Moreover, simulation results have indicated that the role of status update arrival knowledge in minimizing the network-wide AoI becomes insignificant when the status update arrival rate becomes large.

REFERENCES

[1] A. Kosta, N. Pappas, and V. Angelakis, “Age of information: A new concept, metric, and tool,” Foundations and Trends in Netw., vol. 12, no. 3, pp. 162–259, 2017.
[2] Y. Sun, I. Kadota, R. Talak, and E. Modiano, “Age of information: A new metric for information freshness,” Synthesis Lectures on Commun. Networks, vol. 12, no. 2, pp. 1–224, 2019.
[3] S. Kaul, M. Gruteser, V. Rai, and J. Kenney, “Minimizing age of information in vehicular networks,” in Proc. Annu. IEEE Commun. Soc. Conf. Sensor, Mesh Ad-Hoc Commun. Netw. (SECON), 2011, pp. 350–358.
[4] S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should one update?” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), 2012, pp. 2731–2735.
[5] M. Costa, M. Codreanu, and A. Ephremides, “On the age of information in status update systems with packet management,” IEEE Trans. Inf. Theory, vol. 62, no. 4, pp. 1897–1910, 2016.
[6] C. Kam, S. Kompella, and A. Ephremides, “Age of information under random updates,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2013, pp. 66–70.
[7] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,” IEEE Trans. Inf. Theory, vol. 63, no. 11, pp. 7492–7508, 2017.
[8] Y. Gu, H. Chen, Y. Zhou, Y. Li, and B. Vucetic, “Timely status update in Internet of Things monitoring systems: An age-energy tradeoff,” IEEE Internet Things J., 2019.
[9] Y. Gu, H. Chen, C. Zhai, Y. Li, and B. Vucetic, “Minimizing age of information in cognitive radio-based IoT systems: Underlay or overlay?” IEEE Internet Things J., vol. 6, no. 6, pp. 10273–10288, 2019.
[10] Q. Wang, H. Chen, Y. Gu, Y. Li, and B. Vucetic, “Minimizing the age of information of cognitive radio-based IoT systems under a collision constraint,” arXiv preprint arXiv:2001.02482, 2020.
[11] I. Kadota, A. Sinha, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Scheduling policies for minimizing age of information in broadcast wireless networks,” IEEE/ACM Trans. Netw., vol. 26, no. 6, pp. 2637–2650, 2018.
[12] Q. Wang, H. Chen, Y. Li, and B. Vucetic, “Minimizing age of information via hybrid NOMA/OMA,” arXiv preprint arXiv:2001.04042, 2020.
[13] I. Kadota and E. Modiano, “Minimizing the age of information in wireless networks with stochastic arrivals,” IEEE Trans. Mobile Comput., pp. 1–1, 2019.
[14] Y. Hsu, E. Modiano, and L. Duan, “Scheduling algorithms for minimizing age of information in wireless broadcast networks with random arrivals,” IEEE Trans. Mobile Comput., pp. 1–1, 2019.
[15] Z. Jiang, B. Krishnamachari, X. Zheng, S. Zhou, and Z. Niu, “Timely status update in wireless uplinks: Analytical solutions with asymptotic optimality,” IEEE Internet Things J., vol. 6, no. 2, pp. 3885–3898, 2019.
[16] J. Sun, Z. Jiang, B. Krishnamachari, S. Zhou, and Z. Niu, “Closed-form Whittle’s index-enabled random access for timely status update,” IEEE Trans. Commun., vol. 68, no. 3, pp. 1538–1551, 2020.
[17] H. Chen, Q. Wang, Z. Dong, and N. Zhang, “Multiuser scheduling for minimizing age of information in uplink MIMO systems,” arXiv preprint arXiv:2002.00403, 2020.
[18] S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, “Optimality of myopic sensing in multichannel opportunistic access,” IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4040–4050, 2009.