Lecture Notes in Computer Science                          12572

Founding Editors
Gerhard Goos
   Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
   Cornell University, Ithaca, NY, USA

Editorial Board Members
Elisa Bertino
   Purdue University, West Lafayette, IN, USA
Wen Gao
   Peking University, Beijing, China
Bernhard Steffen
   TU Dortmund University, Dortmund, Germany
Gerhard Woeginger
   RWTH Aachen, Aachen, Germany
Moti Yung
   Columbia University, New York, NY, USA
More information about this subseries at http://www.springer.com/series/7409
Jakub Lokoč • Tomáš Skopal •
Klaus Schoeffmann • Vasileios Mezaris •
Xirong Li • Stefanos Vrochidis •
Ioannis Patras (Eds.)

MultiMedia Modeling
27th International Conference, MMM 2021
Prague, Czech Republic, June 22–24, 2021
Proceedings, Part I

Editors
Jakub Lokoč                                             Tomáš Skopal
Charles University                                      Charles University
Prague, Czech Republic                                  Prague, Czech Republic
Klaus Schoeffmann                                       Vasileios Mezaris
Klagenfurt University                                   CERTH-ITI
Klagenfurt, Austria                                     Thessaloniki, Greece
Xirong Li                                               Stefanos Vrochidis
Renmin University of China                              CERTH-ITI
Beijing, China                                          Thessaloniki, Greece
Ioannis Patras
Queen Mary University of London
London, UK

ISSN 0302-9743                      ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-67831-9              ISBN 978-3-030-67832-6 (eBook)
https://doi.org/10.1007/978-3-030-67832-6
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer Nature Switzerland AG 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

These two-volume proceedings contain the papers accepted at MMM 2021, the 27th
International Conference on MultiMedia Modeling.
   Organized for more than 25 years, MMM has become a respected and
well-established international conference bringing together excellent researchers from
academia and industry. During the conference, novel research works from
MMM-related areas (especially multimedia content analysis; multimedia signal
processing and communications; and multimedia applications and services) are shared
along with practical experiences, results, and exciting demonstrations. The 27th
instance of the conference was organized in Prague, Czech Republic on June 22–24,
2021. Due to the COVID-19 pandemic, the conference date was shifted by five months;
however, the proceedings were published in January in accordance with the original
plan. Despite the pandemic, MMM 2021 received a large number of submissions
organized in different tracks.
   Specifically, 211 papers were submitted to seven MMM 2021 tracks. Each paper
was reviewed by at least two reviewers (but mostly three) from the Program
Committee, while the TPC chairs and special event organizers acted as meta-reviewers.
Out of 166 regular papers, 73 were accepted for the proceedings. In particular, 40
papers were accepted for oral presentation and 33 papers for poster presentation.
Regarding the remaining tracks, 16 special session papers were accepted as well as 2
papers for a demo presentation and 17 papers for participation at the Video Browser
Showdown 2021. Overall, the MMM 2021 program comprised 108 papers from the
seven tracks with the following acceptance rates:

Track                                #Papers            Acceptance rate
Full papers (oral)                     40                    24%
Full papers (oral + poster)            73                    44%
Demos                                   2                    67%
SS1: MAPTA                              4                    50%
SS2: MDRE                               5                    71%
SS3: MMARSat                            3                    100%
SS4: MULTIMED                           4                    67%
Video Browser Showdown                 17                    94%
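
As a quick consistency check of the figures above, the following minimal Python sketch recomputes the two full-paper acceptance rates and the overall paper count from the numbers stated in this preface. The variable and function names are illustrative assumptions. Submission counts for the other tracks are not given here, so their rates are not recomputed.

```python
# Counts taken from the preface; all names are illustrative.
regular_submitted = 166
accepted_oral = 40
accepted_poster = 33
special_sessions = {"MAPTA": 4, "MDRE": 5, "MMARSat": 3, "MULTIMED": 4}
demos = 2
video_browser_showdown = 17


def acceptance_rate(accepted: int, submitted: int) -> int:
    """Return the acceptance rate as a whole-number percentage."""
    return round(100 * accepted / submitted)


accepted_full = accepted_oral + accepted_poster
oral_rate = acceptance_rate(accepted_oral, regular_submitted)
full_rate = acceptance_rate(accepted_full, regular_submitted)

# Overall program size: full papers plus special sessions, demos, and VBS.
total_papers = (
    accepted_full
    + sum(special_sessions.values())
    + demos
    + video_browser_showdown
)

print(f"Full papers (oral): {oral_rate}%")
print(f"Full papers (oral + poster): {full_rate}%")
print(f"Total papers in program: {total_papers}")
```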

   The special sessions are traditionally organized to extend the program with novel
challenging problems and directions. The MMM 2021 program included four special
sessions:
–   SS1: Multimedia Analytics: Perspectives, Tools, and Applications (MAPTA)
–   SS2: Multimedia Datasets for Repeatable Experimentation (MDRE)
–   SS3: Multimodal Analysis and Retrieval of Satellite Images (MMARSat)
–   SS4: Multimedia and Multimodal Analytics in the Medical Domain and Pervasive
    Environments (MULTIMED)

    Besides the four special sessions, the anniversary 10th Video Browser Showdown
represented an important highlight of MMM 2021 with a record number of 17
participating systems in this exciting (and challenging!) competition. In addition, two
highly respected speakers were invited to MMM 2021 to present their impressive talks
and results on multimedia-related topics. Specifically, we would like to thank Cees
Snoek from the University of Amsterdam and Pavel Zezula from Masaryk University.
    Last but not least, we would like to thank all members of the MMM community who
contributed to the MMM 2021 event. We also thank all authors of submitted papers, all
reviewers, and all members of the MMM 2021 organization team for their great work
and support. They all helped MMM 2021 to be an exciting and inspiring international
event for all participants!

January 2021                                                           Jakub Lokoč
                                                                      Tomáš Skopal
                                                                 Klaus Schoeffmann
                                                                  Vasileios Mezaris
                                                                          Xirong Li
                                                                 Stefanos Vrochidis
                                                                      Ioannis Patras
Organization

Organizing Committee
General Chairs
Jakub Lokoč                 Charles University, Prague
Tomáš Skopal                Charles University, Prague

Program Chairs
Klaus Schoeffmann           Klagenfurt University
Vasileios Mezaris           CERTH-ITI, Thessaloniki
Xirong Li                   Renmin University of China

Special Session and Tutorial Chairs
Werner Bailer               Joanneum Research
Marta Mrak                  BBC Research & Development

Panel Chairs
Giuseppe Amato              ISTI-CNR, Pisa
Fabrizio Falchi             ISTI-CNR, Pisa

Demo Chairs
Cathal Gurrin               Dublin City University
Jan Zahálka                 Czech Technical University in Prague

Video Browser Showdown Chairs
Klaus Schoeffmann           Klagenfurt University
Werner Bailer               Joanneum Research
Jakub Lokoč                 Charles University, Prague
Cathal Gurrin               Dublin City University

Publicity Chairs
Phoebe Chen                 La Trobe University
Chong-Wah Ngo               City University of Hong Kong
Bing-Kun Bao                Nanjing University of Posts and Telecommunications

Publication Chairs
Stefanos Vrochidis          CERTH-ITI, Thessaloniki
Ioannis Patras              Queen Mary University of London

Steering Committee
Phoebe Chen                 La Trobe University
Tat-Seng Chua               National University of Singapore
Kiyoharu Aizawa             University of Tokyo
Cathal Gurrin               Dublin City University
Benoit Huet                 Eurecom
Klaus Schoeffmann           Klagenfurt University
Richang Hong                Hefei University of Technology
Björn Þór Jónsson           IT University of Copenhagen
Guo-Jun Qi                  University of Central Florida
Wen-Huang Cheng             National Chiao Tung University
Peng Cui                    Tsinghua University

Web Chair
František Mejzlík           Charles University, Prague

Organizing Agency
Conforg, s.r.o.

Special Session Organizers
Multimedia Datasets for Repeatable Experimentation (MDRE)
Cathal Gurrin               Dublin City University, Ireland
Duc-Tien Dang-Nguyen        University of Bergen, Norway
Björn Þór Jónsson           IT University of Copenhagen, Denmark
Klaus Schoeffmann           Klagenfurt University, Austria

Multimedia Analytics: Perspectives, Tools and Applications (MAPTA)
Björn Þór Jónsson           IT University of Copenhagen, Denmark
Stevan Rudinac              University of Amsterdam, The Netherlands
Xirong Li                   Renmin University of China, China
Cathal Gurrin               Dublin City University, Ireland
Laurent Amsaleg             CNRS-IRISA, France

Multimodal Analysis and Retrieval of Satellite Images (MMARSat)
Ilias Gialampoukidis        Centre for Research and Technology Hellas,
                              Information Technologies Institute, Greece
Stefanos Vrochidis          Centre for Research and Technology Hellas,
                              Information Technologies Institute, Greece
Ioannis Papoutsis           National Observatory of Athens, Greece
Guido Vingione           Serco Italy, Italy
Ioannis Kompatsiaris     Centre for Research and Technology Hellas,
                            Information Technologies Institute, Greece

MULTIMED: Multimedia and Multimodal Analytics in the Medical Domain
and Pervasive Environments
Georgios Meditskos       Centre for Research and Technology Hellas,
                           Information Technologies Institute, Greece
Klaus Schoeffmann        Klagenfurt University, Austria
Leo Wanner               ICREA – Universitat Pompeu Fabra, Spain
Stefanos Vrochidis       Centre for Research and Technology Hellas,
                           Information Technologies Institute, Greece
Athanasios Tzioufas      Medical School of the National and Kapodistrian
                           University of Athens, Greece

MMM 2021 Program Committees and Reviewers
Regular and Special Sessions Program Committee
Olfa Ben Ahmed           EURECOM
Laurent Amsaleg          CNRS-IRISA
Evlampios Apostolidis    CERTH ITI
Ognjen Arandjelović      University of St Andrews
Devanshu Arya            University of Amsterdam
Nathalie Aussenac        IRIT CNRS
Esra Açar                Middle East Technical University
Werner Bailer            JOANNEUM RESEARCH
Bing-Kun Bao             Nanjing University of Posts and Telecommunications
Ilaria Bartolini         University of Bologna
Christian Beecks         University of Münster
Jenny Benois-Pineau      LaBRI, UMR CNRS 5800 CNRS,
                            University of Bordeaux
Roberto Di Bernardo      Engineering Ingegneria Informatica S.p.A.
Antonis Bikakis          University College London
Josep Blat               Universitat Pompeu Fabra
Richard Burns            West Chester University
Benjamin Bustos          University of Chile
K. Selçuk Candan         Arizona State University
Ying Cao                 City University of Hong Kong
Annalina Caputo          University College Dublin
Savvas Chatzichristofis   Neapolis University Pafos
Angelos Chatzimichail    Centre for Research and Technology Hellas
Edgar Chavez             CICESE
Mulin Chen               Northwestern Polytechnical University
Zhineng Chen             Institute of Automation, Chinese Academy of Sciences
Zhiyong Cheng            Qilu University of Technology
Wei-Ta Chu               National Cheng Kung University
Andrea Ciapetti             Innovation Engineering
Kathy Clawson               University of Sunderland
Claudiu Cobarzan            Klagenfurt University
Rossana Damiano             Università di Torino
Mariana Damova              Mozaika
Minh-Son Dao                National Institute of Information and Communications
                               Technology
Petros Daras                Information Technologies Institute
Mihai Datcu                 DLR
Mathieu Delalandre          Université de Tours
Begum Demir                 Technische Universität Berlin
Francois Destelle           Dublin City University
Cem Direkoğlu               Middle East Technical University – Northern Cyprus
                               Campus
Jianfeng Dong               Zhejiang Gongshang University
Shaoyi Du                   Xi’an Jiaotong University
Athanasios Efthymiou        University of Amsterdam
Lianli Gao                  University of Science and Technology of China
Dimos Georgiou              Catalink EU
Negin Ghamsarian            Klagenfurt University
Ilias Gialampoukidis        CERTH ITI
Nikolaos Gkalelis           CERTH ITI
Nuno Grosso
Ziyu Guan                   Northwest University of China
Gylfi Gudmundsson            Reykjavik University
Silvio Guimaraes            Pontifícia Universidade Católica de Minas Gerais
Cathal Gurrin               Dublin City University
Pål Halvorsen               SimulaMet
Graham Healy                Dublin City University
Shintami Chusnul Hidayati   Institute of Technology Sepuluh Nopember
Dennis Hoppe                High Performance Computing Center Stuttgart
Jun-Wei Hsieh               National Taiwan Ocean University
Min-Chun Hu                 National Tsing Hua University
Zhenzhen Hu                 Nanyang Technological University
Jen-Wei Huang               National Cheng Kung University
Lei Huang                   Ocean University of China
Ichiro Ide                  Nagoya University
Konstantinos Ioannidis      CERTH ITI
Bogdan Ionescu              University Politehnica of Bucharest
Adam Jatowt                 Kyoto University
Peiguang Jing               Tianjin University
Hyun Woo Jo                 Korea University
Björn Þór Jónsson           IT-University of Copenhagen
Yong Ju Jung                Gachon University
Anastasios Karakostas       Aristotle University of Thessaloniki
Ari Karppinen               Finnish Meteorological Institute
Jiro Katto                  Waseda University
Junmo Kim                   Korea Advanced Institute of Science and Technology
Sabrina Kletz               Klagenfurt University
Ioannis Kompatsiaris        CERTH ITI
Haris Kontoes               National Observatory of Athens
Efstratios Kontopoulos      Elsevier Technology
Markus Koskela              CSC – IT Center for Science Ltd.
Yu-Kun Lai                  Cardiff University
Woo Kyun Lee                Korea University
Jochen Laubrock             University of Potsdam
Khiem Tu Le                 Dublin City University
Andreas Leibetseder         Klagenfurt University
Teng Li                     Anhui University
Xirong Li                   Renmin University of China
Yingbo Li                   Eurecom
Wu Liu                      JD AI Research of JD.com
Xueting Liu                 The Chinese University of Hong Kong
Jakub Lokoč                 Charles University
José Lorenzo                Atos
Mathias Lux                 Klagenfurt University
Ioannis Manakos             CERTH ITI
José M. Martinez            Universidad Autónoma de Madrid
Stephane Marchand-Maillet   Viper Group – University of Geneva
Ernesto La Mattina          Engineering Ingegneria Informatica S.p.A.
Thanassis Mavropoulos       CERTH ITI
Kevin McGuinness            Dublin City University
Georgios Meditskos          CERTH ITI
Robert Mertens              HSW University of Applied Sciences
Vasileios Mezaris           CERTH ITI
Weiqing Min                 ICT
Wolfgang Minker             University of Ulm
Marta Mrak                  BBC
Phivos Mylonas              National Technical University of Athens
Henning Muller              HES-SO
Duc Tien Dang Nguyen        University of Bergen
Liqiang Nie                 Shandong University
Tu Van Ninh                 Dublin City University
Naoko Nitta                 Osaka University
Noel E. O’Connor            Dublin City University
Neil O’Hare                 Yahoo Research
Jean-Marc Ogier             University of La Rochelle
Vincent Oria                NJIT
Tse-Yu Pan                  National Cheng Kung University
Ioannis Papoutsis           National Observatory of Athens
Cecilia Pasquini            Universität Innsbruck
Ladislav Peška              Charles University
Yannick Prie                 LINA – University of Nantes
Manfred Jürgen Primus        Klagenfurt University
Athanasios Psaltis           Centre for Research and Technology Hellas,
                                Thessaloniki
Georges Quénot               Laboratoire d’Informatique de Grenoble, CNRS
Miloš Radovanović            University of Novi Sad
Amon Rapp                    University of Torino
Stevan Rudinac               University of Amsterdam
Borja Sanz                   University of Deusto
Shin’ichi Satoh              National Institute of Informatics
Gabriella Scarpino           Serco Italia S.p.A.
Simon Scerri                 Fraunhofer IAIS, University of Bonn
Klaus Schoeffmann            Klagenfurt University
Matthias Schramm             TU Wien
John See                     Multimedia University
Jie Shao                     University of Science and Technology of China
Wen-Ze Shao                  Nanjing University of Posts and Telecommunications
Xi Shao                      Nanjing University of Posts and Telecommunications
Ujjwal Sharma                University of Amsterdam
Dongyu She                   Nankai University
Xiangjun Shen                Jiangsu University
Koichi Shinoda               Tokyo Institute of Technology
Hong-Han Shuai               National Chiao Tung University
Mei-Ling Shyu                University of Miami
Vasileios Sitokonstantinou   National Observatory of Athens
Tomáš Skopal                 Charles University
Alan Smeaton                 Dublin City University
Natalia Sokolova             Klagenfurt University
Gjorgji Strezoski            University of Amsterdam
Li Su                        UCAS
Lifeng Sun                   Tsinghua University
Machi Symeonidou             DRAXIS Environmental SA
Daniel Stanley Tan           De La Salle University
Mario Taschwer               Klagenfurt University
Georg Thallinger             JOANNEUM RESEARCH
Christian Timmerer           Klagenfurt University
Athina Tsanousa              CERTH ITI
Athanasios Tzioufas          NKUA
Shingo Uchihashi             Fuji Xerox Co., Ltd.
Tiberio Uricchio             University of Florence
Guido Vingione               Serco
Stefanos Vrochidis           CERTH ITI
Qiao Wang                    Southeast University
Qifei Wang                   Google
Xiang Wang                   National University of Singapore
Xu Wang                      Shenzhen University
Zheng Wang               National Institute of Informatics
Leo Wanner               ICREA/UPF
Wolfgang Weiss           JOANNEUM RESEARCH
Lai-Kuan Wong            Multimedia University
Tien-Tsin Wong           The Chinese University of Hong Kong
Marcel Worring           University of Amsterdam
Xiao Wu                  Southwest Jiaotong University
Sen Xiang                Wuhan University of Science and Technology
Ying-Qing Xu             Tsinghua University
Toshihiko Yamasaki       The University of Tokyo
Keiji Yanai              The University of Electro-Communications
Gang Yang                Renmin University of China
Yang Yang                University of Science and Technology of China
You Yang                 Huazhong University of Science and Technology
Zhaoquan Yuan            Southwest Jiaotong University
Jan Zahálka              Czech Technical University in Prague
Hanwang Zhang            Nanyang Technological University
Sicheng Zhao             University of California, Berkeley
Lei Zhu                  Huazhong University of Science and Technology

Additional Reviewers

Hadi Amirpour                        Hanyuan Liu
Eric Arazo                           Katrinna Macfarlane
Gibran Benitez-Garcia                Danila Mamontov
Adam Blažek                          Thanassis Mavropoulos
Manliang Cao                         Anastasia Moumtzidou
Ekrem Çetinkaya                      Vangelis Oikonomou
Long Chen                            Jesus Perez-Martin
Přemysl Čech                         Zhaobo Qi
Julia Dietlmeier                     Tomas Soucek
Denis Dresvyanskiy                   Vajira Thambawita
Negin Ghamsarian                     Athina Tsanousa
Panagiotis Giannakeris               Chenglei Wu
Socratis Gkelios                     Menghan Xia
Tomáš Grošup                         Minshan Xie
Steven Hicks                         Cai Xu
Milan Hladik                         Gang Yang
Wenbo Hu                             Yaming Yang
Debesh Jha                           Jiang Zhou
Omar Shahbaz Khan                    Haichao Zhu
Chengze Li                           Zirui Zhu
Contents – Part I

Crossed-Time Delay Neural Network for Speaker Recognition
  Liang Chen, Yanchun Liang, Xiaoshu Shi, You Zhou, and Chunguo Wu

An Asymmetric Two-Sided Penalty Term for CT-GAN
  Huan Zhao, Yu Wang, Tingting Li, and Yuqing Zhao

Fast Discrete Matrix Factorization Hashing for Large-Scale Cross-Modal
Retrieval
  Huan Zhao, Xiaolin She, Song Wang, and Kaili Ma

Fast Optimal Transport Artistic Style Transfer
  Ting Qiu, Bingbing Ni, Ziang Liu, and Xuanhong Chen

Stacked Sparse Autoencoder for Audio Object Coding
  Yulin Wu, Ruimin Hu, Xiaochen Wang, Chenhao Hu, and Gang Li

A Collaborative Multi-modal Fusion Method Based on Random Variational
Information Bottleneck for Gesture Recognition
  Yang Gu, Yajie Li, Yiqiang Chen, Jiwei Wang, and Jianfei Shen

Frame Aggregation and Multi-modal Fusion Framework for Video-Based
Person Recognition
  Fangtao Li, Wenzhe Wang, Zihe Liu, Haoran Wang, Chenghao Yan,
  and Bin Wu

An Adaptive Face-Iris Multimodal Identification System Based on Quality
Assessment Network
  Zhengding Luo, Qinghua Gu, Guoxiong Su, Yuesheng Zhu,
  and Zhiqiang Bai

Thermal Face Recognition Based on Multi-scale Image Synthesis
  Wei-Ta Chu and Ping-Shen Huang

Contrastive Learning in Frequency Domain for Non-I.I.D.
Image Classification
  Huan Shao, Zhaoquan Yuan, Xiao Peng, and Xiao Wu

Group Activity Recognition by Exploiting Position Distribution
and Appearance Relation
  Duoxuan Pei, Annan Li, and Yunhong Wang

Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual
Categorization
  Fan Zhang, Meng Li, Guisheng Zhai, and Yizhao Liu

Dense Attention-Guided Network for Boundary-Aware Salient
Object Detection
  Zhe Zhang, Junhui Ma, Panpan Xu, and Wencheng Wang

Generative Image Inpainting by Hybrid Contextual Attention Network
  Zhijiao Xiao and Donglun Li

Atypical Lyrics Completion Considering Musical Audio Signals
  Kento Watanabe and Masataka Goto

Improving Supervised Cross-modal Retrieval with Semantic Graph
Embedding
  Changting Feng, Dagang Li, and Jingwei Zheng

Confidence-Based Global Attention Guided Network for Image Inpainting
  Zhilin Huang, Chujun Qin, Lei Li, Ruixin Liu, and Yuesheng Zhu

Multi-task Deep Learning for No-Reference Screen Content Image
Quality Assessment
  Rui Gao, Ziqing Huang, and Shiguang Liu

Language Person Search with Pair-Based Weighting Loss
  Peng Zhang, Deqiang Ouyang, Chunlin Jiang, and Jie Shao

DeepFusion: Deep Ensembles for Domain Independent System Fusion
  Mihai Gabriel Constantin, Liviu-Daniel Ştefan, and Bogdan Ionescu

Illuminate Low-Light Image via Coarse-to-fine Multi-level Network
  Yansheng Qiu, Jun Chen, Xiao Wang, and Kui Jang

MM-Net: Learning Adaptive Meta-metric for Few-Shot Biometric
Recognition
  Qinghua Gu, Zhengding Luo, Wanyu Zhao, and Yuesheng Zhu

A Sentiment Similarity-Oriented Attention Model with Multi-task Learning
for Text-Based Emotion Recognition
  Yahui Fu, Lili Guo, Longbiao Wang, Zhilei Liu, Jiaxing Liu,
  and Jianwu Dang

Locating Visual Explanations for Video Question Answering
  Xuanwei Chen, Rui Liu, Xiaomeng Song, and Yahong Han

Global Cognition and Local Perception Network for Blind
Image Deblurring
  Chuanfa Zhang, Wei Zhang, Feiyu Chen, Yiting Cheng, Shuyong Gao,
  and Wenqiang Zhang

Multi-grained Fusion for Conditional Image Retrieval
  Yating Liu and Yan Lu

A Hybrid Music Recommendation Algorithm Based on Attention
Mechanism
  Weite Feng, Tong Li, Haiyang Yu, and Zhen Yang

Few-Shot Learning with Unlabeled Outlier Exposure
  Haojie Wang, Jieya Lian, and Shengwu Xiong

Fine-Grained Video Deblurring with Event Camera
  Limeng Zhang, Hongguang Zhang, Chenyang Zhu, Shasha Guo,
  Jihua Chen, and Lei Wang

Discriminative and Selective Pseudo-Labeling for Domain Adaptation
  Fei Wang, Youdong Ding, Huan Liang, and Jing Wen

Multi-level Gate Feature Aggregation with Spatially Adaptive
Batch-Instance Normalization for Semantic Image Synthesis
  Jia Long and Hongtao Lu

Robust Multispectral Pedestrian Detection via Uncertainty-Aware
Cross-Modal Learning
  Sungjune Park, Jung Uk Kim, Yeon Gyun Kim, Sang-Keun Moon,
  and Yong Man Ro

Time-Dependent Body Gesture Representation for Video Emotion
Recognition
  Jie Wei, Xinyu Yang, and Yizhuo Dong

MusiCoder: A Universal Music-Acoustic Encoder Based on Transformer
  Yilun Zhao and Jia Guo

DANet: Deformable Alignment Network for Video Inpainting
  Xutong Lu and Jianfu Zhang

Deep Centralized Cross-modal Retrieval
  Zhenyu Wen and Aimin Feng

Shot Boundary Detection Through Multi-stage Deep Convolution
Neural Network
  Tingting Wang, Na Feng, Junqing Yu, Yunfeng He, Yangliu Hu,
  and Yi-Ping Phoebe Chen

Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
  Hadi Amirpour, Ekrem Çetinkaya, Christian Timmerer,
  and Mohammad Ghanbari

Fast Mode Decision Algorithm for Intra Encoding of the 3rd Generation
Audio Video Coding Standard
  Shengyuan Wu, Zhenyu Wang, Yangang Cai, and Ronggang Wang

Graph Structure Reasoning Network for Face Alignment and
Reconstruction
  Xing Wang, Xinyu Li, and Suping Wu

Game Input with Delay – A Model of the Time Distribution for Selecting
a Moving Target with a Mouse
  Shengmei Liu and Mark Claypool

Unsupervised Temporal Attention Summarization Model for User Created
Videos
  Min Hu, Ruimin Hu, Xiaocheng Wang, and Rui Sheng

Learning from the Negativity: Deep Negative Correlation Meta-Learning
for Adversarial Image Classification
  Wenbo Zheng, Lan Yan, Fei-Yue Wang, and Chao Gou

Learning 3D-Craft Generation with Predictive Action Neural Network
  Ze-yu Liu, Jian-wei Liu, Xin Zuo, and Weimin Li

Unsupervised Multi-shot Person Re-identification via Dynamic
Bi-directional Normalized Sparse Representation
  Xiaobao Li, Wen Wang, Qingyong Li, and Lijun Guo

Classifier Belief Optimization for Visual Categorization
  Gang Yang and Xirong Li

Fine-Grained Generation for Zero-Shot Learning
  Weimin Sun, Jieping Xu, and Gang Yang

Fine-Grained Image-Text Retrieval via Complementary Feature Learning
  Min Zheng, Yantao Jia, and Huajie Jiang

Considering Human Perception and Memory in Interactive Multimedia
Retrieval Evaluations
  Luca Rossetto, Werner Bailer, and Abraham Bernstein

Learning Multi-level Interaction Relations and Feature Representations
for Group Activity Recognition
  Lihua Lu, Yao Lu, and Shunzhou Wang

A Structured Feature Learning Model for Clothing Keypoints Localization. . .                            629
  Ruhan He, Yuyi Su, Tao Peng, Jia Chen, Zili Zhang, and Xinrong Hu

Automatic Pose Quality Assessment for Adaptive Human
Pose Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   641
  Gang Chu, Chi Xie, and Shuang Liang

Deep Attributed Network Embedding with Community Information . . . . . . .                              653
  Li Xue, Wenbin Yao, Yamei Xia, and Xiaoyong Li

An Acceleration Framework for Super-Resolution Network via Region
Difficulty Self-adaption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    666
  Zhenfang Guo, Yuyao Ye, Yang Zhao, and Ronggang Wang

Spatial Gradient Guided Learning and Semantic Relation Transfer
for Facial Landmark Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         678
   Jian Wang, Yaoyi Li, and Hongtao Lu

DVRCNN: Dark Video Post-processing Method for VVC . . . . . . . . . . . . . .                           691
  Donghui Feng, Yiwei Zhang, Chen Zhu, Han Zhang, and Li Song

An Efficient Image Transmission Pipeline for Multimedia Services . . . . . . . .                        704
  Zeyu Wang

Gaussian Mixture Model Based Semi-supervised Sparse Representation
for Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    716
   Xinxin Shan and Ying Wen

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    729
Contents – Part II

MSCANet: Adaptive Multi-scale Context Aggregation Network
for Congested Crowd Counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           1
   Yani Zhang, Huailin Zhao, Fangbo Zhou, Qing Zhang, Yanjiao Shi,
   and Lanjun Liang

Tropical Cyclones Tracking Based on Satellite Cloud Images: Database
and Comprehensive Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        13
  Cheng Huang, Sixian Chan, Cong Bai, Weilong Ding, and Jinglin Zhang

Image Registration Improved by Generative Adversarial Networks . . . . . . . .                         26
  Shiyan Jiang, Ci Wang, and Chang Huang

Deep 3D Modeling of Human Bodies from Freehand Sketching . . . . . . . . . .                           36
  Kaizhi Yang, Jintao Lu, Siyu Hu, and Xuejin Chen

Two-Stage Real-Time Multi-object Tracking with Candidate Selection. . . . . .                          49
  Fan Wang, Lei Luo, and En Zhu

Tell as You Imagine: Sentence Imageability-Aware Image Captioning . . . . . .                          62
  Kazuki Umemura, Marc A. Kastner, Ichiro Ide, Yasutomo Kawanishi,
  Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi,
  and Hiroshi Murase

Deep Face Swapping via Cross-Identity Adversarial Training . . . . . . . . . . . .                     74
  Shuhui Yang, Han Xue, Jun Ling, Li Song, and Rong Xie

Res2-Unet: An Enhanced Network for Generalized Nuclear Segmentation
in Pathological Images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    87
   Shuai Zhao, Xuanya Li, Zhineng Chen, Chang Liu, and Changgen Peng

Automatic Diagnosis of Glaucoma on Color Fundus Images Using
Adaptive Mask Deep Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           99
  Gang Yang, Fan Li, Dayong Ding, Jun Wu, and Jie Xu

Initialize with Mask: For More Efficient Federated Learning . . . . . . . . . . . .                   111
   Zirui Zhu and Lifeng Sun

Unsupervised Gaze: Exploration of Geometric Constraints
for 3D Gaze Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    121
   Yawen Lu, Yuxing Wang, Yuan Xin, Di Wu, and Guoyu Lu

Median-Pooling Grad-CAM: An Efficient Inference Level Visual
Explanation for CNN Networks in Remote Sensing Image Classification . . . .                            134
  Wei Song, Shuyuan Dai, Dongmei Huang, Jinling Song,
  and Liotta Antonio

Multi-granularity Recurrent Attention Graph Neural Network
for Few-Shot Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     147
   Xu Zhang, Youjia Zhang, and Zuyu Zhang

EEG Emotion Recognition Based on Channel Attention
for E-Healthcare Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      159
   Xu Zhang, Tianzhi Du, and Zuyu Zhang

The MovieWall: A New Interface for Browsing Large Video Collections. . . .                             170
  Marij Nefkens and Wolfgang Hürst

Keystroke Dynamics as Part of Lifelogging . . . . . . . . . . . . . . . . . . . . . . . .              183
  Alan F. Smeaton, Naveen Garaga Krishnamurthy,
  and Amruth Hebbasuru Suryanarayana

HTAD: A Home-Tasks Activities Dataset with Wrist-Accelerometer
and Audio Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   196
  Enrique Garcia-Ceja, Vajira Thambawita, Steven A. Hicks, Debesh Jha,
  Petter Jakobsen, Hugo L. Hammer, Pål Halvorsen,
  and Michael A. Riegler

MNR-Air: An Economic and Dynamic Crowdsourcing Mechanism
to Collect Personal Lifelog and Surrounding Environment Dataset.
A Case Study in Ho Chi Minh City, Vietnam. . . . . . . . . . . . . . . . . . . . . . .                 206
   Dang-Hieu Nguyen, Tan-Loc Nguyen-Tai, Minh-Tam Nguyen,
   Thanh-Binh Nguyen, and Minh-Son Dao

Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset
in Gastrointestinal Endoscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      218
   Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven A. Hicks,
   Vajira Thambawita, Enrique Garcia-Ceja, Michael A. Riegler,
   Thomas de Lange, Peter T. Schmidt, Håvard D. Johansen,
   Dag Johansen, and Pål Halvorsen

CatMeows: A Publicly-Available Dataset of Cat Vocalizations . . . . . . . . . . .                      230
  Luca A. Ludovico, Stavros Ntalampiras, Giorgio Presti, Simona Cannas,
  Monica Battini, and Silvana Mattiello

Search and Explore Strategies for Interactive Analysis of Real-Life Image
Collections with Unknown and Unique Categories . . . . . . . . . . . . . . . . . . .                   244
  Floris Gisolf, Zeno Geradts, and Marcel Worring

Graph-Based Indexing and Retrieval of Lifelog Data . . . . . . . . . . . . . . . . . .                  256
  Manh-Duy Nguyen, Binh T. Nguyen, and Cathal Gurrin

On Fusion of Learned and Designed Features for Video Data Analytics. . . . .                            268
  Marek Dobranský and Tomáš Skopal

XQM: Interactive Learning on Mobile Phones . . . . . . . . . . . . . . . . . . . . . .                  281
  Alexandra M. Bagi, Kim I. Schild, Omar Shahbaz Khan, Jan Zahálka,
  and Björn Þór Jónsson

A Multimodal Tensor-Based Late Fusion Approach for Satellite Image
Search in Sentinel 2 Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       294
  Ilias Gialampoukidis, Anastasia Moumtzidou, Marios Bakratsas,
  Stefanos Vrochidis, and Ioannis Kompatsiaris

Canopy Height Estimation from Spaceborne Imagery Using
Convolutional Encoder-Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           307
  Leonidas Alagialoglou, Ioannis Manakos, Marco Heurich,
  Jaroslav Červenka, and Anastasios Delopoulos

Implementation of a Random Forest Classifier to Examine Wildfire
Predictive Modelling in Greece Using Diachronically Collected Fire
Occurrence and Fire Mapping Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .            318
   Alexis Apostolakis, Stella Girtsou, Charalampos Kontoes,
   Ioannis Papoutsis, and Michalis Tsoutsos

Mobile eHealth Platform for Home Monitoring of Bipolar Disorder . . . . . . .                           330
  Joan Codina-Filbà, Sergio Escalera, Joan Escudero, Coen Antens,
  Pau Buch-Cardona, and Mireia Farrús

Multimodal Sensor Data Analysis for Detection of Risk Situations
of Fragile People in @home Environments. . . . . . . . . . . . . . . . . . . . . . . . .                342
   Thinhinane Yebda, Jenny Benois-Pineau, Marion Pech, Hélène Amieva,
   Laura Middleton, and Max Bergelt

Towards the Development of a Trustworthy Chatbot for Mental
Health Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   354
  Matthias Kraus, Philip Seldschopf, and Wolfgang Minker

Fusion of Multimodal Sensor Data for Effective Human Action
Recognition in the Service of Medical Platforms . . . . . . . . . . . . . . . . . . . . .               367
  Panagiotis Giannakeris, Athina Tsanousa, Thanasis Mavropoulos,
  Georgios Meditskos, Konstantinos Ioannidis, Stefanos Vrochidis,
  and Ioannis Kompatsiaris

SpotifyGraph: Visualisation of User’s Preferences in Music . . . . . . . . . . . . .                    379
  Pavel Gajdusek and Ladislav Peska

A System for Interactive Multimedia Retrieval Evaluations . . . . . . . . . . . . .                      385
  Luca Rossetto, Ralph Gasser, Loris Sauter, Abraham Bernstein,
  and Heiko Schuldt

SQL-Like Interpretable Interactive Video Search . . . . . . . . . . . . . . . . . . . . .                391
  Jiaxin Wu, Phuong Anh Nguyen, Zhixin Ma, and Chong-Wah Ngo

VERGE in VBS 2021 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          398
  Stelios Andreadis, Anastasia Moumtzidou, Konstantinos Gkountakos,
  Nick Pantelidis, Konstantinos Apostolidis, Damianos Galanopoulos,
  Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris,
  and Ioannis Kompatsiaris

NoShot Video Browser at VBS2021 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                405
  Christof Karisch, Andreas Leibetseder, and Klaus Schoeffmann

Exquisitor at the Video Browser Showdown 2021: Relationships Between
Semantic Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     410
  Omar Shahbaz Khan, Björn Þór Jónsson, Mathias Larsen,
  Liam Poulsen, Dennis C. Koelma, Stevan Rudinac, Marcel Worring,
  and Jan Zahálka

VideoGraph – Towards Using Knowledge Graphs for Interactive
Video Retrieval. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   417
  Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch,
  Romana Pernisch, Lucien Heitz, and Abraham Bernstein

IVIST: Interactive Video Search Tool in VBS 2021 . . . . . . . . . . . . . . . . . .                     423
  Yoonho Lee, Heeju Choi, Sungjune Park, and Yong Man Ro

Video Search with Collage Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .            429
  Jakub Lokoč, Jana Bátoryová, Dominik Smrž, and Marek Dobranský

Towards Explainable Interactive Multi-modal Video Retrieval with Vitrivr. . .                            435
  Silvan Heller, Ralph Gasser, Cristina Illi, Maurizio Pasquinelli,
  Loris Sauter, Florian Spiess, and Heiko Schuldt

Competitive Interactive Video Retrieval in Virtual Reality with vitrivr-VR . . .                         441
  Florian Spiess, Ralph Gasser, Silvan Heller, Luca Rossetto,
  Loris Sauter, and Heiko Schuldt

An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset. . .                               448
  Abdullah Alfarrarjeh, Jungwon Yoon, Seon Ho Kim, Amani Abu Jabal,
  Akarsh Nagaraj, and Chinmayee Siddaramaiah

Less is More - diveXplore 5.0 at VBS 2021 . . . . . . . . . . . . . . . . . . . . . . . .                455
  Andreas Leibetseder and Klaus Schoeffmann

SOMHunter V2 at Video Browser Showdown 2021 . . . . . . . . . . . . . . . . . .                         461
  Patrik Veselý, František Mejzlík, and Jakub Lokoč

W2VV++ BERT Model at VBS 2021 . . . . . . . . . . . . . . . . . . . . . . . . . . . .                   467
  Ladislav Peška, Gregor Kovalčík, Tomáš Souček, Vít Škrhák,
  and Jakub Lokoč

VISIONE at Video Browser Showdown 2021. . . . . . . . . . . . . . . . . . . . . . .                     473
  Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro,
  Nicola Messina, Lucia Vadicamo, and Claudio Vairo

IVOS - The ITEC Interactive Video Object Search System at VBS2021 . . . .                               479
  Anja Ressmann and Klaus Schoeffmann

Video Search with Sub-Image Keyword Transfer Using Existing
Image Archives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   484
  Nico Hezel, Konstantin Schall, Klaus Jung, and Kai Uwe Barthel

A VR Interface for Browsing Visual Spaces at VBS2021. . . . . . . . . . . . . . .                       490
  Ly-Duyen Tran, Manh-Duy Nguyen, Thao-Nhu Nguyen, Graham Healy,
  Annalina Caputo, Binh T. Nguyen, and Cathal Gurrin

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    497