POD: A Smartphone That Flies

Guojun Chen, Noah Weiner, and Lin Zhong
Yale University
guojun.chen@yale.edu, noah.weiner@yale.edu, lin.zhong@yale.edu

arXiv:2105.12610v1 [cs.HC] 26 May 2021

ABSTRACT

We present POD, a smartphone that flies, as a new way to achieve hands-free, eyes-up mobile computing. Unlike existing drone-carried user interfaces, POD features a smartphone-sized display and the computing and sensing power of a modern smartphone. We share our experience in prototyping POD, discuss the technical challenges facing it, and describe early results toward addressing them.
 Figure 1: POD (top left corner) interacting with our test mannequin
 mounted on a Wi-Fi-controlled robotic chassis.
1. INTRODUCTION

Our driving vision is hands-free, eyes-up mobile computing. That is, mobile users are able to interact with a computer without holding a device in hand and looking down at it. The key to this vision is a user interface (both input and output) that does not require a human hand. Wearable devices, such as Google Glass, are perhaps the most studied implementation of this type of mobile, hands-free user interface.

This paper explores an alternative idea for these new user interfaces: one carried by a small drone. Compared to wearable user interfaces, drone-carried UIs do not impose gadgets on the user's head or eyes. More importantly, they can go beyond the physical reach of the human body, empowering interesting use cases.

While there is a growing literature about human-drone interaction [16, 31, 13], this work envisions a drone-carried UI as rich as that of modern smartphones, called POD. Unlike previously studied drone-carried UIs, POD features a small (potentially foldable) display and allows bidirectional interactions with mobile users. We imagine that POD would rest on the user's shoulder most of the time and fly out occasionally to interact with the user only on demand.

Compared to wearable glasses and other drone-carried UIs, POD faces its own unique challenges that have not been addressed by the human-drone interaction community. First, like all flying objects, drones are not steady. This challenges the usability of an onboard display, especially if the drone needs to maintain a safe distance of one to two feet from the user's eyes. Second, drones are noisy. Even the quietest drone operating at one or two feet away from the user produces noise at 70 dBA, which is equivalent to being on a busy street or near a vacuum cleaner [8]. Finally, because we intend POD to serve the user on the go, it must decide between following the user or hovering in place when the user moves around. This challenge represents a very specialized case of human-drone interaction and requires new investigation.

To address these challenges and experiment with POD, we have implemented a prototype. As shown in Figure 1, it is based on a mechanical, electrical, and computational partnership between a micro-controller-based drone and a stripped-down smartphone (Google Pixel 4). It has a flight time of four minutes, uses computer vision to track and follow its user, and harnesses the phone's sensors to improve the usability of its display.

This paper describes the POD prototype and shares our experience in building it (§4). It presents our early results (§5) in addressing the challenges facing POD (§3) and discusses interesting, novel use cases for it (§2).

2. THE CASE FOR POD

We first describe several compelling applications of POD, a smartphone that flies. Because POD carries cameras and microphones, it can fill roles intended for existing cinematographic drones. We ignore these use cases and instead focus on those uniquely enabled by POD's display. We envision that POD will share much of the hardware and software of smartphones. Our prototype uses a Google Pixel 4, allowing it to tap into the mature ecosystem of Android, which facilitates app development and enables "retrofitting" existing smartphone apps for POD. This flexibility contrasts with other drone platforms that develop their own ecosystems, e.g., DJI and Skydio.

Hands-free "Smartphone": POD's hands-free experience opens up a host of new multitasking opportunities for mobile users in situations when holding a device in hand is either inconvenient or impossible. Users could make video calls or consume media while cooking, walking around the house, or jogging.
Figure 2: Y-axis (vertical) movement, in pixels over time, measured by a camera located 20 cm from our POD prototype while the latter was hovering. The plot compares the y displacement of static content with that of stabilized content.

Figure 3: Noise spectrum of our POD prototype. It has multiple strong tonal components (peaks) and a relatively wide band.

Personable Interaction: There is a growing literature about human-drone interaction with a focus on how users command a drone [13], e.g., using gestures [12, 23, 25] and touch controls [9]. Few works, however, examine how a drone could communicate with users directly, without using an intermediary device like a handheld terminal. For example, Szafir, Mutlu and Fong [30] studied how a drone could use an LED ring to communicate its flight intention.

POD, with its display, provides a much richer platform for this type of direct communication. For example, POD can use smiley faces to comfort the user or indicate its intention, which is important because many users are uncomfortable with a drone flying around at such close proximity. POD can also use arrows or other symbols to direct the user's attention.

First Response and Public Safety: We also envision POD being used in the context of first response and public safety. Its display can present instructions to first responders as they navigate difficult scenarios, like fires and disasters. It could present interactive maps or other informative graphics to those trapped in hard-to-reach areas. At the scenes of car accidents, police could send POD to facilitate a remote video call with the driver and inspect their license, or to review damages and interview witnesses on the scene.

3. OPEN CHALLENGES

While drones for cinematography have been extensively studied and widely used, drones as a personal device have not. Our experience with the POD prototype has revealed technical challenges unique to personal drones, especially those related to interacting with the user at such close proximity.

3.1 Display Usability

Drones are dynamic systems. Keeping them stationary in the air is a long-studied and difficult problem. In order to capture high-quality videos and photos, commercial cinematographic drones have gone a long way toward solving this problem. Unlike POD, they benefit from two factors: they usually hover far away from the target scene, and the impact of instability on the footage they take can be removed by post-processing.

POD, on the other hand, must present a stationary, readable display to its user, and at a much closer distance, with the screen content often being generated in real time. Figure 2 (red, dotted line) presents how the screen of our prototype POD moves along the vertical axis during one of its flights, despite its best effort to hover, as captured by a camera. Such motion poses significant usability challenges for the display, especially for reading small text. Using a drone-carried iPad, the authors of [29] found, not surprisingly, that a moving display carried by the drone requires a larger text font for readability, although the study did not isolate the impact of the drone's inability to stay stationary in the air, as is important to POD.

3.2 Noise Suppression

Drones are noisy, and drone noise suppression has been studied extensively [21]. The noise worsens disproportionately when the user is close to the drone, and POD intends to serve its user at the intimate smartphone distance, much more closely than cinematographic drones, which exacerbates the noise problem. Our POD prototype, though quite small, can produce noise of up to 75 dBA when measured from two feet away. Commercial cinematographic drones are about as noisy, or noisier, due to their larger size [3]. Noise this loud is known to be harmful to humans during long periods of exposure [8]. Cinematographic drones usually operate far from users, and noise captured by their microphones can largely be removed through post-processing. In contrast, POD operates at the smartphone distance from the user, and its noise must be suppressed in real time.

Many have explored passive methods to reduce drone noise: covering the propellers with sound-absorbing materials and metal surfaces [22]; using ducted (or shielded) propellers [19]; using specially shaped propellers, e.g., DJI low-noise propellers [2]; and using propellers with more blades. These methods have had only limited success. For example, ducted propellers [19] only reduce the noise by up to 7 dBA.

The synchrophaser system [17] is an active drone noise cancellation method. It requires that all propellers have exactly the same rotational speed and a fixed phase difference. It thus only works for fixed-wing aircraft with symmetric sets of engines. However, small rotorcraft like drones do not meet the synchrophaser requirements—their rotors constantly change speed to keep the craft stable. Noise cancellation techniques widely used in headphones can reduce noise at the ear. These techniques work well with low-frequency (< 2 kHz), tonal noise. Unfortunately, as shown in Figure 3, drone noise consists of tonal noise of both high and low frequencies, as well as random wide-band noise above 6 kHz.

While active noise cancellation (ANC) has been implemented to reduce extra noise captured by drone microphones [33, 14], no existing drones employ ANC to reduce the noise heard by the user, as is the goal of POD.

3.3 User Following

POD needs to present its display to a potentially mobile user. We envision that it mimics a human assistant holding the smartphone for its owner. This natural following model goes beyond the following capability enjoyed by commercial drones or reported in the literature. First, POD has to not only follow the user but also orient the display properly. Second, while it is tempting (and feasible) to position POD at a constant distance with a constant orientation relative to the user's head or shoulders while the user moves around, this can be annoying. Imagine a human assistant holding the smartphone for the assisted. The assistant might decide to remain stationary while the assisted turns away from the display temporarily. One could imagine there exist "eager" vs. "lazy" assistants, with a lazy assistant preferring to stay stationary and following the assisted only "reluctantly". POD must allow its user to adjust how eagerly it should behave.

3.4 Programming Interface

POD shares much of the hardware and software of smartphones and can benefit from the mature ecosystems of smartphones in app development. However, the drone's movement poses fresh programming challenges to even skilled developers. We consider two different programming support features. First, a POD application that can retrofit existing mobile apps. This retrofitting application would provide a natural interface for user input. It would not only convert speech or gestures into touch commands but also control POD's movement. For example, with this app, a POD user can use Android's built-in Camera app to take a photo after directing POD to the right position.

Another, more interesting, support feature would be a library that developers can use to command POD and create novel apps catered to POD's use cases, like those outlined in §2. Both the capabilities of the drone and the latency of the phone-drone communication constrain this library interface. For example, a programming interface that directly sets the drone's motor thrust is unsafe because the drone firmware itself stabilizes POD and must have complete control over the motors. And the (non-deterministic) delay of USB communication between the phone and the drone makes it challenging for the phone itself to implement the stabilizer. Similarly, a programming interface that sets the 3D position of POD is infeasible because the drone itself is unable to accurately localize itself.

4. DESIGN AND IMPLEMENTATION

We next present our early prototype of POD, which is based on a custom-made nano-drone and a Google Pixel 4 smartphone. A video demo of our prototype interacting with a remote-controlled, life-size mannequin is available at [6].

4.1 Mechanical & Electrical

Drone: POD's drone consists of a controller board, motor components, and a 3D-printed frame. We base the controller board on the Bitcraze Crazyflie, an open-source nano-drone platform [1]. We upgraded Bitcraze's board to have more I/O ports. Unlike the Crazyflie, which uses brushed motors, POD uses 1104-4500KV brushless motors and an iFlight 12A electronic speed controller (ESC). We modify the Crazyflie firmware's motor control code to accommodate the ESC. We add a height sensor and an optical flow sensor to the drone in order to achieve autonomous hover stabilization (see §4.2). The drone is powered by its own dedicated battery and has a flight time of four minutes when carrying the phone.

Smartphone: Our prototype uses a Google Pixel 4. We remove all unnecessary parts to pare down the mass as much as possible. Only the motherboard, USB-C connector, display, and battery are kept for our experiment, totalling a mass of 63 grams. The Pixel, which can act as a USB host device, sends control packets to and receives data from the drone via one USB-C to Micro-B cable.

4.2 Autonomous Hover

Smaller drones, including the Crazyflie, do not usually come with stable autonomous hovering out of the box; a human needs to direct the hover manually via remote control. But several methods are available to enable autonomous hover. Most cinematographic drones localize themselves with GPS, which only has meter-level precision and only works outdoors. Some drones use a motion capture system consisting of multiple cameras to precisely calculate the drone's location within the coverage area, e.g., [4, 18]. Such motion capture systems are not only expensive but also have a limited coverage area, making them unsuitable for the use cases of POD.

Our POD prototype uses a much simpler, more mobile solution: an optical flow sensor and the phone camera. Optical flow is the pattern of apparent motion of objects due to the relative motion between the observer and a scene. By tracking the optical flow of the ground, a drone can know its relative horizontal displacement and make timely adjustments. We use the phone camera as a secondary observer to make the drone aware of its global environment. Unlike the optical flow sensor, the camera can help the drone stay within the user's sight.

4.3 Vision

POD uses computer vision to locate and track its user, leveraging the phone's front-facing camera and computational power, namely through PoseNet.

PoseNet [7] is based on PersonLab [24], a fully convolutional neural network implemented in TensorFlow. Because PoseNet is computationally heavy, we reduce its use with the Lucas-Kanade (LK) optical flow algorithm. The algorithm matches the human feature points of a frame with the same corner feature points from the previous frame and estimates the geometric displacement between these two sets of points. Because the LK algorithm is much more efficient than PoseNet, we invoke PoseNet to retrieve accurate feature points only every fourth frame and use the LK algorithm in between to track these points.
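A minimal sketch of this hybrid tracking loop follows. It is ours, not the app's code, and it substitutes a generic corner detector for the PoseNet call so that the example is self-contained.

# Illustrative sketch (not POD's app code) of the hybrid tracking loop:
# run the expensive pose detector only every fourth frame and track its
# keypoints with Lucas-Kanade (LK) optical flow in between.
import cv2
import numpy as np

POSENET_PERIOD = 4  # frames between pose-detector invocations

def detect_keypoints(gray):
    """Stand-in for a PoseNet inference call; returns (N, 1, 2) float32 points.
    (Corner detection is used here only so the sketch runs end to end.)"""
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=20, qualityLevel=0.01, minDistance=10)
    return pts if pts is not None else np.empty((0, 1, 2), np.float32)

def track(frames_gray):
    prev, points = None, None
    for i, gray in enumerate(frames_gray):
        if i % POSENET_PERIOD == 0 or points is None or len(points) == 0:
            points = detect_keypoints(gray)                  # accurate, slow path
        else:
            points, status, _ = cv2.calcOpticalFlowPyrLK(
                prev, gray, points, None, winSize=(21, 21), maxLevel=3)
            points = points[status.ravel() == 1].reshape(-1, 1, 2)  # keep tracked points
        prev = gray
        yield points
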
Figure 4: Spatial relationships between POD and its user: side view (top) and top view (bottom). In the side view, the red plane is part of the camera's X plane, and the green plane is the plane that passes through the shoulders, i.e., S0 and S1, and is perpendicular to the ground. τ represents the angle between the camera X plane (red, parallel to the plane defined by the Y and Z axes) and the user's shoulder plane (green). The projection plane is parallel to the X-Y plane and lies at a distance of f from the camera, where f is the camera's focal length.

Human Position and Orientation Estimation: POD must estimate the user's distance and orientation in its own coordinate space (or camera coordinate space) without using a depth camera. Figure 4 depicts the spatial relationship between POD and its user. The distance and orientation to be estimated are marked as D and τ, respectively. We assume the following parameters are known, either from calibration or from being directly supplied by the user: (i) the focal length f of the camera; (ii) the distance, Le, between the user's eyes, E0 and E1 in Figure 4; (iii) the distance, Ls, between the user's shoulders, S0 and S1 in Figure 4; and (iv) the distance between the two parallel planes perpendicular to the ground defined by the shoulders and the eyes, respectively, marked R in Figure 4.

POD first measures the lengths of S'0E'0 and S'1E'1, denoted by p and q, respectively, on the projection plane, by simply counting pixels in the camera view. When the user changes their orientation, p and q change accordingly. POD derives τ from p/q using the following relationship:

p/q = ((Ls − Le)/2 · sin τ + R · cos τ) / ((Ls − Le)/2 · sin τ − R · cos τ)

POD computes D using the pinhole camera model as

D = f · Le · sin(τ) / L'e,

where L'e is the distance between the user's eyes on the projection plane, i.e., E'0E'1.
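For illustration, the relationship above can be solved for τ in closed form: with r = p/q and a = (Ls − Le)/2, it gives tan τ = R(r + 1) / (a(r − 1)). The sketch below implements this step and the pinhole-model distance; the calibration numbers in the example are assumptions, not measured values.

# Illustrative sketch (our own code, not POD's implementation) of the
# geometry in Figure 4: recover the user's orientation tau from the
# measured ratio p/q, then the distance D from the pinhole model.
import math

def estimate_orientation(p, q, Ls, Le, R):
    """tau from p/q = (a*sin(tau) + R*cos(tau)) / (a*sin(tau) - R*cos(tau)),
    where a = (Ls - Le) / 2.  p, q are pixel lengths; Ls, Le, R are meters."""
    a = (Ls - Le) / 2.0
    r = p / q
    tau = math.atan2(R * (r + 1.0), a * (r - 1.0))  # tan(tau) = R(r+1)/(a(r-1))
    return tau if tau >= 0 else tau + math.pi       # keep tau in (0, pi)

def estimate_distance(f_px, Le, tau, Le_proj_px):
    """D = f * Le * sin(tau) / L'e (pinhole camera model)."""
    return f_px * Le * math.sin(tau) / Le_proj_px

# Example with assumed calibration values: Le = 0.12 m, Ls = 0.37 m, R = 0.25 m.
tau = estimate_orientation(p=95.0, q=80.0, Ls=0.37, Le=0.12, R=0.25)
D = estimate_distance(f_px=1500.0, Le=0.12, tau=tau, Le_proj_px=180.0)
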
Feedback control: To accommodate many useful tasks, POD must remain at a predetermined distance and orientation (yaw) relative to the user. To maintain these requirements in real time, we employ a set of three PID controllers that are synchronized with PoseNet, running at around 50 Hz. The controllers correspond to POD's yaw, distance from the human, and position along the x-axis.
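A minimal sketch of such a controller set follows; the gains are illustrative placeholders, not the prototype's tuned values.

# Illustrative sketch of the three-axis PID loop described above (gains and
# structure are assumptions, not the prototype's tuned values).
class PID:
    def __init__(self, kp, ki, kd, dt=1 / 50):   # ~50 Hz, synchronized with PoseNet
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def update(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per degree of freedom POD regulates relative to the user.
yaw_pid      = PID(kp=0.8, ki=0.05, kd=0.1)
distance_pid = PID(kp=1.2, ki=0.10, kd=0.2)
x_pid        = PID(kp=1.0, ki=0.05, kd=0.1)

def control_step(yaw_err, dist_err, x_err):
    """Return (yaw_rate, forward_velocity, lateral_velocity) setpoints."""
    return (yaw_pid.update(yaw_err),
            distance_pid.update(dist_err),
            x_pid.update(x_err))
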
5. EARLY RESULTS

Based on our prototype described above, we next present some early results in addressing the challenges discussed in §3, with mixed success.

Evaluation setup: In order to test the prototype, we mount a human bust mannequin on a robot tank chassis driven by two DC motors, as shown in Figure 1. The mannequin has realistic features and is the actual size of a human torso—approximately 37 cm wide and 42 cm high. We use an Arduino and an ESP-01 Wi-Fi module to create a Wi-Fi remote control for the mannequin. The mannequin is recognized normally by PoseNet and contains all points of interest needed by our implementation (shoulders and eyes). Controlling the mannequin allows us to safely test our human following implementation, including much of our state machine.

5.1 Display Usability

In order to improve the usability of POD's display, we adjust the display content in real time to counter the movement of the drone. A naive implementation would use a sensor (camera or IMU) to measure the device's movement and inform the graphical rendering system to adjust content accordingly. However, the latency between sensing the movement and showing the adjusted content on the display would render this implementation inefficient and ineffective. Therefore, we experiment with several predictive stabilization methods originally intended for handheld devices, like NoShake [26]. NoShake was intended to stabilize content when a user reads a handheld display while walking. Walking causes hand and, as a result, display movement relative to the eyes. NoShake combines a spring-mass model of the display content and readings from the phone's accelerometer to move the display content opposite to the display movement so that it remains stable relative to the user's eyes.
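The following sketch captures the spring-mass idea, with arbitrary spring and friction constants standing in for the manually tuned values discussed below; it is an illustration, not NoShake's implementation.

# Illustrative sketch (not NoShake's actual code): a spring-mass-damper model
# that offsets display content opposite to measured device acceleration.
# Spring and friction constants are arbitrary placeholders, not tuned values.
class ContentStabilizer:
    def __init__(self, spring_k=30.0, friction_c=8.0, mass=1.0, dt=1 / 90):
        self.k, self.c, self.m, self.dt = spring_k, friction_c, mass, dt
        self.offset = [0.0, 0.0]      # content offset in pixels (x, y)
        self.velocity = [0.0, 0.0]

    def update(self, accel_xy_px):
        """accel_xy_px: device acceleration mapped to pixels/s^2."""
        for i in range(2):
            # Content is pushed opposite to device motion, pulled back toward
            # the screen center by a spring, and damped by friction.
            force = -accel_xy_px[i] - self.k * self.offset[i] - self.c * self.velocity[i]
            self.velocity[i] += (force / self.m) * self.dt
            self.offset[i] += self.velocity[i] * self.dt
        return tuple(self.offset)     # shift rendered content by this offset
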

Our experimental measurements, presented in Figure 2, show that our current implementation of NoShake noticeably reduces the displacement of screen content when compared side-by-side with identical unstabilized content. Because the movement of a hovering drone is different from that of a mobile user's hand, i.e., less periodic and more frequent, we had to manually tune the spring factor and dampener friction constant for it to be effective, as shown in Figure 2. We expect a data-driven, machine learning approach would be more effective. We also notice that when NoShake moves the screen content, it occasionally introduces the ghosting effect [5], partly due to the very high refresh rate of the Pixel 4's screen (90 Hz).

Figure 5: Our active noise reduction system consists of a speaker, a feedback microphone located near a user's ear, and a processor on the drone.

5.2 Noise Suppression

Because prior work [22, 19] has highlighted the limits of passive noise reduction, we focus on active methods. Active noise reduction leverages destructive interference: it generates a sound wave with the same amplitude as the target noise but with inverted phase, such that the two cancel each other out at the receiver. In POD's case, the user's ears represent the receiver.

We experiment with the feedback-based active noise cancellation setup shown in Figure 5, which requires a feedback microphone to be placed next to the user's ear (two feet away from POD). We focus on cancelling the prominent tonal components of the drone noise. First, a micro-controller (STM32F4, similar to the one used in the drone) that is connected to the feedback microphone analyzes the noise spectrum to determine the tonal noise with the highest intensity. Next, the speaker carried by the drone generates a sound wave with an adjustable phase and the same frequency as the high-intensity noise. The microphone provides feedback to the speaker, which can then adjust the wave's phase to perfectly cancel the noise.

A decibel meter next to the feedback microphone shows that the above active cancellation technique reduces the noise from 73 dBA to 70 dBA. While small, this reduction is promising since we only used a single speaker and only cancelled the strongest tonal component. We plan to investigate the use of multiple speakers. Most noise cancellation solutions [37] use multiple speakers and cancel more frequency components.
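The sketch below illustrates this single-tone feedback loop in simplified form; the sample rate, the phase search, and the residual-level callback are our assumptions, not the STM32 firmware.

# Illustrative sketch (our simplification, not the prototype's firmware) of
# single-tone feedback ANC: find the strongest tonal component in the
# microphone signal, then search for the speaker phase that minimizes the
# residual level measured at the feedback microphone.
import numpy as np

FS = 16000  # sample rate in Hz (an assumed value)

def dominant_tone(mic_samples):
    """Return the frequency (Hz) of the strongest spectral peak."""
    samples = np.asarray(mic_samples, dtype=float)
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / FS)
    return freqs[np.argmax(spectrum[1:]) + 1]        # skip the DC bin

def anti_noise(freq, phase, n_samples, amplitude=1.0):
    """Counter signal: same frequency, adjustable phase (ideally ~180 deg off)."""
    t = np.arange(n_samples) / FS
    return amplitude * np.sin(2 * np.pi * freq * t + phase)

def best_phase(residual_level_fn, freq, steps=36):
    """residual_level_fn(freq, phase) plays the counter signal and returns the
    level at the feedback microphone; pick the phase with the lowest residual."""
    candidates = np.linspace(0, 2 * np.pi, steps, endpoint=False)
    return min(candidates, key=lambda ph: residual_level_fn(freq, ph))
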
Figure 6: The behavioral model of POD. State transition is based on the perceived state of the user. In Home, POD actively follows the user and presents its display at a pre-determined distance and angle relative to their shoulders. In Idle, POD holds still. In Await, POD yaws to keep the user in the frame.

User State   | Recognition Method
Summoning    | Raise right wrist above eye line
Relieving    | Raise left wrist above eye line
Major_motion | At least one of X, Z, or orientation exceeds acceptable ranges
Minor_motion | No change in position or orientation that exceeds acceptable ranges; head begins to move while shoulders stay still, or vice versa
Lost         | No user in view

Table 1: Methods for recognizing the current human state.

5.3 User Following

We experiment with configuring POD to mimic a human assistant holding a smartphone for the assisted. Figure 6 represents POD's behavioral model as a state machine. POD changes its state based on its understanding of the user's state. Table 1 summarizes what user states POD recognizes and how. The behavioral model incorporates a tunable parameter, T, which can range from a fraction of a second to tens of seconds, depending on the application. A larger T leads to a "lazier" POD that follows its user more reluctantly.

In Home, POD actively maintains a predetermined distance and orientation with respect to the user's shoulders, using the feedback controller described in §4.3 to follow the user when they make major movements, as outlined in Table 1. POD ignores any minor movements. In Idle, POD does not move at all. POD enters Idle after the user rotates past the predetermined τ threshold. It returns to Home after T elapses. The user can use the relieving and summoning gestures to indicate that they would like POD to immediately enter Await or re-enter Home, respectively. In Await, POD only yaws to keep the user in its camera view. That is, it only keeps the feedback controller for yaw on.

A short video showing our prototype eagerly following our test mannequin is available here [6].
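A condensed sketch of this behavioral model follows; state and gesture names mirror Figure 6 and Table 1, but the transition logic is simplified and the thresholds are not the prototype's values.

# Condensed sketch of the behavioral model in Figure 6 / Table 1 (simplified;
# thresholds and timer handling are illustrative, not the prototype's values).
from enum import Enum, auto

class DroneState(Enum):
    HOME = auto()   # follow the user, maintain distance and orientation
    IDLE = auto()   # hold still
    AWAIT = auto()  # only yaw to keep the user in the camera frame

def next_state(state, user_event, idle_timer_expired):
    """user_event is one of: 'summoning', 'relieving', 'major_motion',
    'minor_motion', 'lost', or 'rotated_past_tau'."""
    if user_event == "relieving":
        return DroneState.AWAIT
    if user_event == "summoning":
        return DroneState.HOME
    if state is DroneState.HOME and user_event == "rotated_past_tau":
        return DroneState.IDLE                 # user turned away from the display
    if state is DroneState.IDLE and idle_timer_expired:
        return DroneState.HOME                 # return to following after T elapses
    return state                               # minor motion, lost, etc.: no change
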

5.4 Programming Interface

Our prototype's phone uses the Android USB library to control the drone via a connecting cable. The drone used in our prototype can only determine its height (z-axis) accurately. It can estimate changes in the x-y plane with the optical flow sensor, with less accuracy.

We currently support a programming interface allowing an Android app to directly command POD's absolute and relative movement along the z-axis (height) and its relative movement in the x-y plane. These functions are complete in that a POD app developer can use them to command POD to any position relative to its user, especially with feedback from POD's camera-based human position and orientation estimation (§4.3). Implementing this USB programming interface requires a new periodic task in the drone firmware. The task runs at 100 Hz, which is one-tenth of the main controller loop frequency. We also plan to expand our interface by converting our human position and orientation estimation technique (§4.3) into an Android library.

The programming interface is blocking, or synchronous. That is, once the phone app invokes an API, it will block until the drone completes the requested movement and returns a message. We choose this synchronous design because we believe the app should move the drone one command at a time, sequentially, for safety. Asynchronous motion APIs could lead to unpredictable and dangerous behavior.
 [13] C AUCHARD , J. R., K HAMIS , M., G ARCIA , J., K LJUN , M., AND B ROCK ,
6. RELATED WORKS A. M. Toward a roadmap for human-drone interaction. ACM Interactions
 (2021).
 Drone as User Interface: Many have explored harness- [14] C HUN , C., J EON , K. M., K IM , T., AND C HOI , W. Drone noise reduction
 using deep convolutional autoencoder for uav acoustic sensor networks. In Proc.
ing drones to design new mobile user interfaces like flying IEEE MASSW (2019).
3D displays [27], haptic interfaces [35], and projectors [15, [15] DARBAR , R., ROO , J. S., L AINÉ , T., AND H ACHET, M. Dronesar: Extending
 physical spaces in spatial augmented reality using projection on a drone. ACM
32, 38, 28, 11]. DisplayDrone [27] carries a flexible touch International Conference Proceeding Series (2019).
display with remote tele-presence functionality. ISphere [34] [16] F UNK , M. Human-drone interaction: let’s get ready for flying user interfaces!
 ACM Interactions 25, 3 (2018), 78–81.
is a flying, omni-directional, spherical LED display. The [17] J ONES , J. D., AND F ULLERT, C. R. Noise control characteristics of
authors of [29] studied text readability on a tablet computer synchrophasing part 2: Experimental investigation. AIAA Journal (1986).
 [18] K USHLEYEV, A., M ELLINGER , D., P OWERS , C., AND K UMAR , V. Towards a
carried by a drone. These works were partially geared to- swarm of agile micro quadrotors. Autonomous Robots 35, 4 (2013), 287–300.
wards presenting information and guidance to mobile users. [19] M ALGOEZAR , A., V IEIRA , A., S NELLEN , M., S IMONS , D., AND V ELDHUIS ,
 L. Experimental characterization of noise radiation from a ducted propeller of
Their focus, however, was not on system design and imple- an unmanned aerial vehicle. International Journal of Aeroacoustics (2019).
mentation. Their prototypes offer no communication between [20] M AO , W., Z HANG , Z., Q IU , L., H E , J., C UI , Y., AND Y UN , S. Indoor follow
 me drone. In Proc. ACM MobiSys (2017), pp. 345–358.
the display content and the drone. In contrast, POD tightly [21] M ILJKOVI Ć , D. Methods for attenuation of unmanned aerial vehicle noise. In
integrates the drone and the phone and depends heavily on Proc. MIPRO (2018).
 [22] M OHAMUD , A., AND A SHOK , A. Drone noise reduction through audio
collaboration between them. waveguiding. In Proceedings of the 4th ACM Workshop on Micro Aerial Vehicle
 Human Following by Robots: There is a growing literature Networks, Systems, and Applications (New York, NY, USA, 2018), DroNet’18,
 Association for Computing Machinery, p. 92–94.
that studies how mobile robots can track and follow human [23] O BAID , M., K ISTLER , F., K ASPARAVI ČI ŪT Ė , G., YANTAÇ , A. E., AND
users. While POD builds on many ideas presented by past F JELD , M. How would you gesture navigate a drone? a user-centered approach
 to control a drone. In Proc. Int. Academic Mindtrek Conference (2016).
works, it uniquely features a rich user interface with both [24] PAPANDREOU , G., Z HU , T., C HEN , L.-C., G IDARIS , S., T OMPSON , J., AND
input and output flow. It thus faces a new set of challenges. M URPHY, K. Personlab: Person pose estimation and instance segmentation
 with a bottom-up, part-based, geometric embedding model. Computer Vision –
Existing robotic followers, including drones [20, 10], usually ECCV 2018 Lecture Notes in Computer Science (2018).
aim at staying within a certain distance of their human users. [25] P ESHKOVA , E., H ITZ , M., AND K AUFMANN , B. Natural interaction
 techniques for an unmanned aerial vehicle system. IEEE Pervasive Computing
POD must additionally orient the display properly and adjust 16, 1 (2017), 34–42.
screen content in real time as described in §5.1. [26] R AHMATI , A., S HEPARD , C., AND Z HONG , L. NoShake: Content stabilization
 for shaking screens. In Proc. IEEE PerCom (2009).
 The Georgia Tech Miniature Autonomous Blimp (GT- [27] RUBENS , C., B RALEY, S., G OMES , A., G OC , D., Z HANG , X., C ARRASCAL ,
MAB) [36] can follow and track a human user using a camera J., AND V ERTEGAAL , R. Bitdrones: Towards levitating programmable matter
 using interactive 3d quadcopter displays. In Proc. ACM CHI (2016).
and multiple thrusters. Unlike drones, blimps do not rely on [28] S CHEIBLE , J., H OTH , A., S AAL , J., AND S U , H. Displaydrone: A flying robot
motorized propellers for lift, so they are quieter, more stable based interactive display. In Proc. ACM PerDis (2013).
 [29] S CHNEEGASS , S., A LT, F., S CHEIBLE , J., AND S CHMIDT, A. Midair displays:
and much easier to maneuver. While GT-MAB and POD Concept and first experiences with free-floating pervasive displays. In Proc.
accept similar user input and both use vision-based sensing, ACM PerDis (2014).
 [30] S ZAFIR , D., M UTLU , B., AND F ONG , T. Communicating directionality in
GT-MAB offers no output or display. The primary challenge flying robots. In Proc. ACM/IEEE HRI (2015).
facing a blimp-based POD would be its size: to carry the [31] T EZZA , D., AND A NDUJAR , M. The state-of-the-art of human–drone
 interaction: A survey. IEEE Access 7 (2019), 167438–167454.
same reduced smartphone system (63 gram) as used in our [32] T OYOHARA , S., M IYAFUJI , S., AND KOIKE , H. [poster] arial texture:
prototype, GT-MAB must gain a volume of 0.6 m3 . Dynamic projection mapping on drone propellers. In Proc. IEEE
 ISMAR-Adjunct (2017).
 [33] WANG , L., AND C AVALLARO , A. A blind source separation framework for
7. REFERENCES

[1] Bitcraze - Crazyflie. https://github.com/bitcraze/crazyflie-firmware.
[2] DJI low-noise propellers. https://store.dji.com/product/mavic-2-low-noise-propellers.
[3] Drone noise levels. https://www.airbornedrones.co/drone-noise-levels/.
[4] Flying Machine Arena. https://www.flyingmachinearena.ethz.ch/.
[5] Ghosting (television). https://en.wikipedia.org/wiki/Ghosting_(television).
[6] Our prototype's human following test. https://youtu.be/_ZVm9seYu1o.
[7] PoseNet. https://github.com/tensorflow/tfjs-models/tree/master/posenet.
[8] What noises cause hearing loss? https://www.cdc.gov/nceh/hearing_loss/what_noises_cause_hearing_loss.html.
[9] Abtahi, P., Zhao, D. Y., E, J. L., and Landay, J. A. Drone near me: Exploring touch-based human-drone interaction. In Proc. ACM UbiComp (2017).
[10] Boschi, A., Salvetti, F., Mazzia, V., and Chiaberge, M. A cost-effective person-following system for assistive unmanned vehicles with deep learning at the edge. Machines (2020).
[11] Brock, A. M., Chatain, J., Park, M., Fang, T., Hachet, M., Landay, J. A., and Cauchard, J. R. FlyMap: Interacting with maps projected from a drone. In Proc. ACM PerDis (2018).
[12] Cauchard, J. R., E, J. L., Zhai, K. Y., and Landay, J. A. Drone & me: An exploration into natural human-drone interaction. In Proc. UbiComp (2015), pp. 361-365.
[13] Cauchard, J. R., Khamis, M., Garcia, J., Kljun, M., and Brock, A. M. Toward a roadmap for human-drone interaction. ACM Interactions (2021).
[14] Chun, C., Jeon, K. M., Kim, T., and Choi, W. Drone noise reduction using deep convolutional autoencoder for UAV acoustic sensor networks. In Proc. IEEE MASSW (2019).
[15] Darbar, R., Roo, J. S., Lainé, T., and Hachet, M. DroneSAR: Extending physical spaces in spatial augmented reality using projection on a drone. In ACM International Conference Proceeding Series (2019).
[16] Funk, M. Human-drone interaction: Let's get ready for flying user interfaces! ACM Interactions 25, 3 (2018), 78-81.
[17] Jones, J. D., and Fuller, C. R. Noise control characteristics of synchrophasing, part 2: Experimental investigation. AIAA Journal (1986).
[18] Kushleyev, A., Mellinger, D., Powers, C., and Kumar, V. Towards a swarm of agile micro quadrotors. Autonomous Robots 35, 4 (2013), 287-300.
[19] Malgoezar, A., Vieira, A., Snellen, M., Simons, D., and Veldhuis, L. Experimental characterization of noise radiation from a ducted propeller of an unmanned aerial vehicle. International Journal of Aeroacoustics (2019).
[20] Mao, W., Zhang, Z., Qiu, L., He, J., Cui, Y., and Yun, S. Indoor follow me drone. In Proc. ACM MobiSys (2017), pp. 345-358.
[21] Miljković, D. Methods for attenuation of unmanned aerial vehicle noise. In Proc. MIPRO (2018).
[22] Mohamud, A., and Ashok, A. Drone noise reduction through audio waveguiding. In Proc. ACM DroNet (2018), pp. 92-94.
[23] Obaid, M., Kistler, F., Kasparavičiūtė, G., Yantaç, A. E., and Fjeld, M. How would you gesture navigate a drone? A user-centered approach to control a drone. In Proc. Int. Academic Mindtrek Conference (2016).
[24] Papandreou, G., Zhu, T., Chen, L.-C., Gidaris, S., Tompson, J., and Murphy, K. PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In Computer Vision - ECCV 2018, Lecture Notes in Computer Science (2018).
[25] Peshkova, E., Hitz, M., and Kaufmann, B. Natural interaction techniques for an unmanned aerial vehicle system. IEEE Pervasive Computing 16, 1 (2017), 34-42.
[26] Rahmati, A., Shepard, C., and Zhong, L. NoShake: Content stabilization for shaking screens. In Proc. IEEE PerCom (2009).
[27] Rubens, C., Braley, S., Gomes, A., Goc, D., Zhang, X., Carrascal, J., and Vertegaal, R. BitDrones: Towards levitating programmable matter using interactive 3D quadcopter displays. In Proc. ACM CHI (2016).
[28] Scheible, J., Hoth, A., Saal, J., and Su, H. Displaydrone: A flying robot based interactive display. In Proc. ACM PerDis (2013).
[29] Schneegass, S., Alt, F., Scheible, J., and Schmidt, A. Midair displays: Concept and first experiences with free-floating pervasive displays. In Proc. ACM PerDis (2014).
[30] Szafir, D., Mutlu, B., and Fong, T. Communicating directionality in flying robots. In Proc. ACM/IEEE HRI (2015).
[31] Tezza, D., and Andujar, M. The state-of-the-art of human-drone interaction: A survey. IEEE Access 7 (2019), 167438-167454.
[32] Toyohara, S., Miyafuji, S., and Koike, H. [Poster] ARial texture: Dynamic projection mapping on drone propellers. In Proc. IEEE ISMAR-Adjunct (2017).
[33] Wang, L., and Cavallaro, A. A blind source separation framework for ego-noise reduction on multi-rotor drones. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020), 2523-2537.
[34] Yamada, W., Yamada, K., Manabe, H., and Ikeda, D. iSphere: Self-luminous spherical drone display. In Proc. ACM UIST (2017).
[35] Yamaguchi, K., Kato, G., Kuroda, Y., Kiyokawa, K., and Takemura, H. A non-grounded and encountered-type haptic display using a drone. In Proc. ACM Symp. on Spatial User Interaction (SUI) (2016), pp. 43-46.
[36] Yao, N., Anaya, E., Tao, Q., Cho, S., Zheng, H., and Zhang, F. Monocular vision-based human following on miniature robotic blimp. In Proc. IEEE ICRA (2017).
[37] Zhang, W., Hofmann, C., Buerger, M., Abhayapala, T. D., and Kellermann, W. Spatial noise-field control with online secondary path modeling: A wave-domain approach. IEEE/ACM Trans. Audio, Speech, and Language Processing 26, 12 (2018), 2355-2370.
[38] Zhang, X., Braley, S., Rubens, C., Merritt, T., and Vertegaal, R. LightBee: A self-levitating light field display for hologrammatic telepresence. In Proc. ACM CHI (2019).
