Lec 01: Introduction to Computer Vision - ECE/CS 5582/479 Computer Vision

Page created by Peter Sharp
 
Fall 2021: Zoom@533 999 8759, pwd: mcc2020, Thu 5:30pm-8:15pm

                                 ECE/CS 5582/479 Computer Vision
                 Lec 01: Introduction to Computer Vision

                                                  Zhu Li
                                          Dept of CSEE, UMKC
                         Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.
                                        http://l.web.umkc.edu/lizhu

                                                       slides created with WPS Office Linux and EqualX LaTeX equation editor

Z. Li: ECE 5582 Computer Vision, 2021                                                                                          p.1
Outline
           Background
           Objective of the class
           Prerequisite
           Lecture Plan
           Course Project
           Q&A

An image is worth a thousand words….
     What we observe are pixels….
     The story:
                The train wreck at La Gare
                 Montparnasse, 1895
     What computers can do these days:
                Figure out the building
                The train
                People walking around
     Still a long way to go to figure out the semantics
                Train crashes
                It is an abnormal event (context)

                                                     La Gare Montparnasse, 1895

Advances in Image Sensors: pixels and voxels

 Hyperspectral Image Sensor
            I(x,y) in R^D, e.g. D = 48

 3D/Depth Sensor: LiDAR,
  Stereo Capture
            I(x,y,z) in R

 Panoramic Video Cameras
            I(θ, φ), with angles θ, φ in [0, 2π]

 Lightfield Capture
            Lenslet images
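
As a rough illustration of the data layouts listed above, the sketch below builds toy arrays for a hyperspectral image and a 3D scalar field. Everything except D = 48 (taken from the slide) is an assumption chosen for illustration: the image size, the voxel grid, and all variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperspectral image: every pixel (x, y) holds a D-dimensional spectrum.
height, width, num_bands = 4, 5, 48           # D = 48 as on the slide; H and W are toy values
hyperspectral = rng.random((height, width, num_bands))
pixel_spectrum = hyperspectral[2, 3]          # the vector I(x, y) in R^D at one pixel

# 3D/depth capture viewed as a scalar field I(x, y, z) on a toy voxel grid.
voxels = np.zeros((8, 8, 8))
voxels[1, 2, 3] = 1.0                         # mark a single occupied voxel

print(pixel_spectrum.shape)                   # (48,)
print(voxels[1, 2, 3])
```

The point of the sketch is only that the sensor type changes the shape of the array a vision algorithm has to consume.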

More than 25 years of Image Retrieval Research…
 IEEE Computer 1995 Special Issue on Content Based Image Retrieval (CBIR)

                                                Dr. Vijay Raghavan, Distinguished Professor,
                                                Center for Adv. Computer Studies,
                                                Univ of Louisiana at Lafayette

NSF Digital Libraries Initiative
     Relevance Feedback in CBIR

MPEG-7 Visual Features (Circa 2003)
     Color, Shape, Texture Features for Image Search

     Color:
          1. Histogram
                • Scalable Color
                • Color Structure
                • GOF/GOP
          2. Dominant Color
          3. Color Layout
     Texture:
          • Texture Browsing
          • Homogeneous Texture
          • Edge Histogram
     Shape:
          • Contour Shape
          • Region Shape
     Motion:
          • Camera Motion
          • Motion Trajectory
          • Parametric Motion
          • Motion Activity
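
To make the idea of a color histogram feature concrete, here is a minimal sketch of a quantized RGB histogram compared with histogram intersection. It is a generic illustration, not the MPEG-7 Scalable Color or Color Structure descriptor; the function names and the choice of 4 bins per channel are assumptions.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """L1-normalized joint RGB histogram of an (H, W, 3) uint8 image."""
    pixels = image.reshape(-1, 3).astype(np.int64)
    step = 256 // bins_per_channel
    # Quantize each channel; clip so the top intensities stay in the last bin.
    idx = np.minimum(pixels // step, bins_per_channel - 1)
    flat = (idx[:, 0] * bins_per_channel + idx[:, 1]) * bins_per_channel + idx[:, 2]
    hist = np.bincount(flat, minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for L1-normalized histograms (1 means identical)."""
    return float(np.minimum(h1, h2).sum())

# Toy images (assumed data): one pure red, one pure blue.
red = np.zeros((8, 8, 3), dtype=np.uint8)
red[..., 0] = 255
blue = np.zeros((8, 8, 3), dtype=np.uint8)
blue[..., 2] = 255

same = histogram_intersection(color_histogram(red), color_histogram(red))
different = histogram_intersection(color_histogram(red), color_histogram(blue))
print(same, different)
```

In a retrieval setting, the histogram of a query image would be compared against the stored histograms of every database image and the results ranked by similarity.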

ImageNet - Deep Learning Classification (2013)
     Tasks: Image Classification, Object Detection & Localization
                 2012: Fisher Vector (ECCV test of time award, 2020)
                 2013: Deep Learning ~ Conv Neural Networks (CNN), e.g. AlexNet
                 2016: (Very) Deep Learning ~ Residual Neural Networks (ResNet),
                  K. He, MSRA/FAIR.

MPEG CDVS (2015) - Identification
     Compact Descriptor for Visual Search (CDVS)
                 Object Re-Identification
                 Applications: Navigation, Query by Capture, AR/VR

     Technology:
                 Key Point (SIFT) detection
                 Fisher Vector Aggregation and Hashing (for shortlisting)
                 SIFT compression
     Performance
                 Verification: 90+% precision on 1% recall
                 Retrieval : mAP in 80~90%.
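
The retrieval figure above is a mean average precision (mAP). As a hedged sketch of how that metric is commonly computed, the code below scores ranked result lists in which 1 marks a relevant item and 0 an irrelevant one. The function names and the toy rankings are assumptions, and the sketch normalizes by the number of relevant items that appear in the list unless the true count is passed explicitly.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0

def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Two toy queries (assumed data).
query_a = [1, 0, 1, 0]   # relevant results at ranks 1 and 3
query_b = [0, 1]         # relevant result at rank 2
map_score = mean_average_precision([query_a, query_b])
print(round(map_score, 4))
```

For `query_a` the average precision is (1/1 + 2/3) / 2, and for `query_b` it is 1/2, so the mAP over both queries is their mean.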

Point Cloud Detection and Segmentation
   Key problems for self-driving cars
   • Depth from Stereo Images
   • Optical Flow
   • Scene Flow
   • 2D/3D data fusion and registration
   • Image/3D features for SLAM
   • Higher level syntactic object/event recognition

Image Recognition Pipeline - Handcrafted
     Handcrafted Feature Based

     Pipeline: Image Formation -> Feature Computing -> Feature Aggregation -> Classification
                Image Formation: homography, color space
                Feature Computing: color histogram, filtering, edge detection, HoG, Harris detector, SIFT
                Feature Aggregation: BoW, VLAD, Fisher Vector, Supervector
                Classification: backed by a Knowledge/Data Base
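
The aggregation stage of this pipeline can be sketched with the simplest of the listed methods, bag of words (BoW): each local descriptor is assigned to its nearest codeword and the assignments are counted. This is a minimal illustration under assumptions: the 2-D toy descriptors and the two-word codebook are made up, and in practice the codebook would typically be learned (for example with k-means) from SIFT-like descriptors.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Aggregate (N, d) local descriptors into a K-bin bag-of-words histogram."""
    # Squared Euclidean distance from every descriptor to every codeword: (N, K).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    words = dist2.argmin(axis=1)                       # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / max(hist.sum(), 1.0)                 # L1-normalize; safe for N = 0

# Toy data (assumed): a 2-word codebook and four 2-D descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [0.0, 0.0]])
image_signature = bow_histogram(descriptors, codebook)
print(image_signature)
```

The resulting fixed-length vector is what the classification stage compares against entries in the database, regardless of how many key points each image produced.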

Image Recognition Pipeline - Holistic/Deep Learning
     Holistic Image Analysis
                Direct Pixel Projection Subspace Models
                       An image X of height h and width w, X in R^(h x w), is projected as Y = AX

                Convolutional Neural Networks
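
The subspace projection Y = AX above can be sketched with PCA, the basis behind Eigenface-style methods. This is an illustration under assumptions rather than the course's implementation: the sketch flattens each h x w image into a vector of length h*w before projecting, uses random toy images, and the function names and k = 5 are made up.

```python
import numpy as np

def fit_pca_basis(images, k):
    """Return the mean vector and a (k, h*w) projection matrix A with orthonormal rows."""
    X = images.reshape(len(images), -1).astype(np.float64)   # one flattened image per row
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(image, mean, A):
    """Compute y = A (x - mean) for one image."""
    return A @ (image.reshape(-1).astype(np.float64) - mean)

rng = np.random.default_rng(0)
images = rng.random((20, 8, 6))          # 20 toy images with h = 8, w = 6 (assumed data)
mean, A = fit_pca_basis(images, k=5)
y = project(images[0], mean, A)
print(A.shape, y.shape)
```

Each image is reduced from 48 pixel values to 5 coefficients, and recognition can then compare images in that low-dimensional space.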

Prerequisite & Textbook
   Prerequisite
              For senior and graduate students in EE/CS
              Good Matlab/C programming skills. Some Python is
               also desirable.
              Have taken Signals & Systems or Digital Signal Processing, or have
               consent of the instructor
              Different expectations and evaluation schemes will apply to
               MS/PhD and undergraduate students
   Textbook:
               None required (saving $$); relevant chapters, papers,
               and notes will be distributed.
   Key References:
              R. Szeliski, Computer Vision: Algorithms and
               Applications, Springer, 2014. URL:
               http://szeliski.org/Book/
              J. E. Solem, Programming Computer Vision with
               Python, O’Reilly, 2015. URL:
               http://programmingcomputervision.com/downloads/ProgrammingComputerVision_CCdraft.pdf

Tentative Lecture Plan
     Image Processing Basics (HW 1: Image Filtering and Features)
                 Camera model and image formation
                 Image filtering
     Image Features for Retrieval (HW 2: Image Retrieval System)
                 Color Features
                 Texture and Shape Features
                 Basic Image Retrieval System and Metrics
     Object Identification in Image (HW 3: Keypoint Feature Aggregation)
                 Key Point Detection
                 Key Point Feature Description
                 Fisher Vector Aggregation
                 MPEG Mobile Visual Search Technology and Standard
     Holistic Approach in Image Understanding (HW 4: Subspace methods for face recognition; HW 5: Deep learning methods in Aerial Image Classification)
                 Subspace methods for face recognition: Eigenface, Fisherface, Laplacianface
                 Deep Learning in Image Classification: SoftMax and Triplet Loss networks

Potential Course/MS thesis Project
    Resources from last year:
               https://sce.umkc.edu/faculty-sites/lizhu/teaching/2019.spring.vision/main-cv.html
    Potential projects with 25% bonus points
               Google Landmark Grand Challenge - Identification/Recognition (CDVS baseline, U of Surrey)
               Aerial Image Classification with blur & noise (AFRL project)
               VisDrone - UAV vision and object recognition (Pengfei)
               FlatCam Lensless Camera Face Verification Challenge (Salman)
               Real-world smartphone image super-resolution (NTIRE 2020)
               Fast Face Detection in compressed video (OpenCV)

Working with NSF Center for Big Learning

   Short Bio:

   Research Interests:
    Immersive visual communication: light field, point
       cloud and 360 video coding and low latency
       streaming
    Low-light, low-resolution, and low-quality image understanding
    What DL can do for compression (intra, ibc, sr, inter,
       end2end)
    What compression can do for DL (compression,
       acceleration)

   Multimedia Computing & Communication Lab, Univ of Missouri, Kansas City
   signal processing and learning; image understanding; visual communication; mobile edge computing & communication
Dark Image Enhancement
 Design a network to denoise low-light images in the Bayer
  domain
 Use wavelet decomposition to divide and conquer the
  problem, learning sensor-field sub-images with separate
  networks

    Figure 4: [a] Extreme low-light image from Sony a7S II exposed for 1/25 second . [b] 250x intensity scaling of image in [a]. [c] Ground truth image captured with 10 second
    exposure time. [d] Output from SID[]. SID introduced some artifacts around the edge of the chair as shown by green arrow. [e] Output from ResLearning[]. The white region as
    indicated by arrow in image is not properly reconstructed as white compared to that in ground truth image. [f] Our result.

Decomposition based residual learning from sensor field

   Decomposition of the target image via Wavelet
   Adaptive loss functions for different subbands to exploit strong texture prior

    Figure 12: Overview of our wavelet decomposition based network. The first stage learns the decomposed image and uses the inverse wavelet transform to reconstruct the denoised 4-channel
    image. The second stage uses an off-the-shelf ISP to enhance the image and convert it into a 3-channel sRGB image.
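
To show what a wavelet decomposition and its inverse look like in code, here is a minimal one-level 2D Haar transform. The slides do not say which wavelet the network uses, so Haar is an assumption chosen for brevity; the function names and the toy input are also assumptions, and subband naming conventions vary between libraries.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar transform of an array with even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]    # top-left and top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]    # bottom-left and bottom-right
    ll = (a + b + c + d) / 2.0             # low-pass approximation
    lh = (a - b + c - d) / 2.0             # difference between left and right columns
    hl = (a + b - c - d) / 2.0             # difference between top and bottom rows
    hh = (a - b - c + d) / 2.0             # diagonal difference
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array from four subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

rng = np.random.default_rng(0)
channel = rng.random((4, 6))               # toy stand-in for one sensor channel
subbands = haar_dwt2(channel)
restored = haar_idwt2(*subbands)
print(np.allclose(restored, channel))
```

Because the transform is exactly invertible, a network can predict each subband separately, with a loss suited to that subband, and the inverse transform then recombines the predictions into a full-resolution image.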

Experimental Results

Remote Sensing & Vision Highlights (AFOSR)
   "Hyperspectral Image Classification with Attention Aided CNNs", IEEE Trans. on Geoscience &
    Remote Sensing (T-GRS), 2020.

                                                            Attention CNN for Hyperspectral
                                                            Image Classification
                                                             • Introducing a dual stream
                                                             network architecture with separate
                                                             attention model for spatial and
                                                             spectral feature maps
                                                             • Achieving the SOTA
                                                             performance.
   “PRINET: A Prior Driven Spectral Super-Resolution Network”, IEEE International Conf on
    Multimedia & Expo (ICME), London, 2020.

                                          PRINET: Spectral Super Resolution
                                            • Super-resolve hyper-spectral info from RGB
                                            inputs
                                            • A dual-loss network that learns correlation-decomposed
                                            HSI images
                                            • Achieving the new SOTA performance.

Deep Guided Filtering Deblocking

     The residual frame can be used as the guidance for the
      in-loop filter of the reconstructed frame
                 Larger residuals indicate larger reconstruction errors

Coding-prior-based in-loop filter

       The residual frame is used as the additional input
       Specific networks for reconstruction and residual
                   Residual Network: residual blocks
                   Reconstruction Network: down-sampling and up-sampling

Experimental results
     Comparison with VRCNN

                    Intra: 2.1% improvement             Inter: 0.7% improvement

Radar Signal Learning for Privacy Preserving Fall Detection

     Use case: Seniors assisted living - Fall Detection
     Approach:
                      77 GHz portable radar array sensor setup: horizontal and vertical
                       scanning, 4x2 Tx/Rx
                      Radar Signal Low Dimension Embedding (RLDE) + LSTM action
                       recognition

     [Figure panels, arranged over time: RGB images from RealSense; range-angle reflection heatmaps; non-fall and fall examples]

                                        Figure 1. mmWave Radar based Fall Detector

Neural network processing
     Human activities are continuous dynamic patterns with both spatial and
      temporal dependencies. We use successive radar reflection
      heatmaps to represent human activities.
                 PCA is adopted as the RLDE algorithm: it projects the reflection heatmaps {H, V} onto a
                  low-dimensional subspace P, eliminating spatial redundancies.
                 The proposed RNN with LSTM units exploits the changes of motion in the temporal
                  domain. The softmax layer operates as a classifier. The cross-entropy function is
                  adopted as the objective function.

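     [Diagram: at each time step t, the heatmaps H_t and V_t pass through RLDE to give S_t, which feeds an LSTM cell (gates f_t, i_t, o_t; cell state C_{t-1} -> C_t; hidden state h_{t-1} -> h_t) followed by a Softmax layer]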
                                         Figure 3. Architecture of RNN with LSTM units
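
The recurrence in Figure 3 can be sketched as a single LSTM step followed by a softmax classifier and a cross-entropy loss. This is a minimal NumPy illustration with random, untrained weights, not the authors' network: the embedding size (16), hidden size (8), sequence length (5), and all names are assumptions, and `s_t` stands in for the RLDE embedding S_t of the heatmaps at time t. Only the 7 output classes come from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(s_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*n, n_in + n); b has shape (4*n,)."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([s_t, h_prev]) + b
    f = sigmoid(z[0:n])               # forget gate f_t
    i = sigmoid(z[n:2 * n])           # input gate i_t
    g = np.tanh(z[2 * n:3 * n])       # candidate cell update
    o = sigmoid(z[3 * n:4 * n])       # output gate o_t
    c = f * c_prev + i * g            # new cell state C_t
    h = o * np.tanh(c)                # new hidden state h_t
    return h, c

def softmax(z):
    e = np.exp(z - z.max())           # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes, steps = 16, 8, 7, 5     # 7 activity classes; other sizes assumed
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)
W_out = rng.normal(scale=0.1, size=(n_classes, n_hidden))
b_out = np.zeros(n_classes)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for s_t in rng.normal(size=(steps, n_in)):          # toy sequence of embeddings
    h, c = lstm_step(s_t, h, c, W, b)

probs = softmax(W_out @ h + b_out)                  # class probabilities
true_label = 1                                      # hypothetical ground-truth index
loss = -np.log(probs[true_label])                   # cross-entropy for one sequence
print(probs.shape, float(loss))
```

Training would adjust `W`, `b`, `W_out`, and `b_out` to reduce this loss over labeled sequences; the sketch only runs the forward pass.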

Extensive experiment

    Multiple human activity detection: 7 categories of human
     activities are labeled: Boxing, Falling, Jogging, Jump, Pick up,
     Stand up & Walking.

     Confusion Matrix of Multiple Human Activities (rows: true class; columns: predicted class; "-" marks a cell left blank in the figure)

     True class   boxing   falling   jogging   jump     pickup   standup   walking
     boxing       97.7%    -         2.3%      -        -        -         -
     falling      1.2%     69.4%     1.2%      1.2%     3.5%     15.3%     8.2%
     jogging      -        -         100.0%    -        -        -         -
     jump         -        1.8%      -         96.4%    -        -         1.8%
     pickup       -        5.9%      -         -        91.2%    2.9%      -
     standup      -        32.1%     -         -        5.7%     49.1%     13.2%
     walking      -        -         -         0.7%     -        -         99.3%

     Average Inference Time Complexity:
          RLDE + LSTM: 0.06042 sec
          3DCNN: 7.336 sec
                         Figure 4. Accuracy of Multiple Human Activities Detecting
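
A short sketch of how the figure's numbers can be summarized: the code below takes the confusion-matrix percentages and the two inference times reported above and derives the mean per-class accuracy, the weakest class, the largest confusion, and the speed ratio. Blank cells are treated as 0, which is an assumption, as are the variable names.

```python
import numpy as np

classes = ["boxing", "falling", "jogging", "jump", "pickup", "standup", "walking"]

# Rows are true classes, columns are predicted classes, values are percentages.
cm = np.array([
    [97.7,  0.0,   2.3,  0.0,  0.0,  0.0,  0.0],
    [ 1.2, 69.4,   1.2,  1.2,  3.5, 15.3,  8.2],
    [ 0.0,  0.0, 100.0,  0.0,  0.0,  0.0,  0.0],
    [ 0.0,  1.8,   0.0, 96.4,  0.0,  0.0,  1.8],
    [ 0.0,  5.9,   0.0,  0.0, 91.2,  2.9,  0.0],
    [ 0.0, 32.1,   0.0,  0.0,  5.7, 49.1, 13.2],
    [ 0.0,  0.0,   0.0,  0.7,  0.0,  0.0, 99.3],
])

per_class = np.diag(cm)                              # per-class accuracy (recall)
mean_accuracy = per_class.mean()
weakest = classes[int(per_class.argmin())]

off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0.0)
true_idx, pred_idx = np.unravel_index(off_diagonal.argmax(), off_diagonal.shape)

speedup = 7.336 / 0.06042                            # 3DCNN time / (RLDE + LSTM) time

print(f"mean per-class accuracy: {mean_accuracy:.1f}%")
print(f"weakest class: {weakest}")
print(f"largest confusion: {classes[true_idx]} predicted as {classes[pred_idx]}")
print(f"speedup over 3DCNN: {speedup:.0f}x")
```

Read this way, the matrix shows that most errors come from standing up and falling being confused with each other, while the RLDE + LSTM pipeline runs roughly two orders of magnitude faster than the 3DCNN baseline.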

Internship Opportunities
     Industry Partners

     US Citizens - Send me your contact if interested
                 AFRL,
                 JAIC

Course Outcome
     Upon completion of the course you will be able to:
                 Understand the basic operations in image formation and filtering
                 Understand basic image features for retrieval: color, shape, texture
                 Understand key point features and aggregation in object
                  identification
                 Understand the holistic appearance modeling approach in image
                  understanding
                 Understand the latest image analysis and understanding techniques,
                  such as deep learning
                 Apply the knowledge and algorithms to solve real-world image
                  understanding and retrieval problems
                 Be well prepared for conducting advanced research and pursuing a
                  career/PhD in this topic area (PhD qualifying required course)

Grading (total 100pts + bonus)
     5 Homeworks (50pts)
                Image Filtering and Basic Features
                Image Retrieval System and Performance Metrics
                Key Point Feature and Fisher Vector Aggregation in Object
                 Identification
                Subspace Models in Image Understanding
                Deep Learning Aggregation in Classification
     2 Quizzes (20pts): relax, the quiz is also on me, to see where you
      guys stand
                Quiz-1: Part I and II
                Quiz-2: Part III and IV
     Project (30pts)
                Original work leading to publication: discuss with me by
                 mid-October. (up to 15 bonus pts)
                Regular project: read assigned papers, implement certain aspects, and
                 give a presentation.
Logistics
     Office Hour:
                 Thu: 2:30-4:30pm on zoom
                 Or by appointment
     TA:
                 Rijun Liao
                 Lab Sessions are planned to cover certain software tools aspects.
                 Office Hour: TBA
     Course Resources:
                 Box folder with slides, lecture videos, references, data sets, and
                  software: (Password: ECE5582CV)
                  https://umkc.box.com/s/zwj3nxrjbh1qzjctp7qhoru044grv5zf
                 Main communication: class emails, homework submission via
                  Canvas, Zoom meetings/office hours
                 Additional references, software, and data sets will be announced.

Q&A
