Intelligent Vision Tech Express 2020 - Huawei
CONTENTS
Preface
Embrace the Intelligent Vision, Build an Intelligent World
01 5G
Discussion on the Impact of 5G on Intelligent Vision
5G-enabled Image Encoding and Transmission Technologies
Products and Solutions Catalog
02 AI
Image, Algorithm, and Storage Trends Led by AI
Discussion on Frontend Intelligence Trends
Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
Chip Evolution and Development
Algorithm Repository Technology
SuperColor Technology
Video Codec Technology
Storage EC Technology
Multi-Lens Synergy Technology
Products and Solutions Catalog
03 Cloud Service
Discussion on Video Cloud Service Trends
P2P Technology
Products and Solutions Catalog
04 Ecosystem
Discussion on Intelligent Vision Ecosystem Trends
Products and Solutions Catalog
05 Appendix
Abbreviations
Legal Statement
Product Portfolio
Embrace the Intelligent Vision, Build an Intelligent World
President of Huawei Intelligent Vision Domain
In the past 120 years, three industrial revolutions have made breakthroughs in fields
such as electricity and information technologies, dramatically improving productivity
and our daily life. Today, the fourth industrial revolution, driven by AI and ICT
technologies, ushers in an intelligent era where all things are sensing, interconnected,
and intelligent. Vision, the core of biological evolution, will serve as a significant
enabler in this era. The combination of AI and vision systems will enable machines to
perceive information and respond intelligently, which revolutionizes people's work and
everyday life, and improves productivity and security.
Today, we are delighted to see that new ICT technologies, such as 5G, AI, and
machine vision are being put into commercial use, and playing a significant role in
the video surveillance industry. 2020 marks the first year of 5G commercialization as
well as a turning point of AI development. Additionally, machine vision now surpasses
human vision to obtain more information in specific scenarios. The three technologies
are interwoven with each other, fueling the development of intelligent vision.
Huawei remains steady in its commitment to embed 5G technologies into intelligent
vision, which opens up opportunities by providing high bandwidth, low latency, and
broad connection capabilities.
Huawei is developing intelligent cameras the way it develops smartphones, by revolutionizing
the technical architecture, ecosystem, and industry chain. Huawei embeds an innovative
operating system (OS) into software-defined cameras (SDCs) to enable remote loading of
intelligent algorithms anytime, anywhere. The HoloSens Store allows users to download and
install algorithms on cameras depending on their needs.
Huawei adheres to the "platform + ecosystem" strategy to build a future-proof intelligent
vision ecosystem and empower more industries. Huawei is committed to providing platforms
and opening algorithms and applications to benefit vendors and customers across industries.
Huawei develops cloud-edge-device synergy to maximize data value. Huawei will give full
play to the technical advantages of the device-edge-cloud industry chain, develop devices
based on cloud technologies, and empower the cloud through interconnection with various
devices, thereby advancing the digital transformation of all industries.
Intelligent vision serves as the eyes of the intelligent world, the core of worldwide
sensory connections, and a key enabler for digital transformation of industries.
Huawei Intelligent Vision looks forward to working with partners across industries
to drive industry development and the intelligent transformation of cities,
production, and people's lives with the power of technology, and to build an intelligent
world where all things can sense.
01 5G
Discussion on the Impact of 5G on Intelligent Vision
5G-enabled Image Encoding and Transmission Technologies
Products and Solutions Catalog
Discussion on the Impact of 5G on Intelligent Vision
Niu Liyang, Liu Zhen
1. 5G Development
New 5G infrastructure is driving the expansion of the global digital economy, and each country's information capability is
represented by the state of its 5G networks. 5G is even revolutionizing the whole industry chain, from electronic devices
to base station devices to mobile phones. Therefore, major economies around the world are accelerating their application
of 5G and actively exploring upstream and downstream industries to seize the strategic high ground. According to
TeleGeography, a prominent telecommunications market research company, the number of global 5G networks in
commercial use had reached 82 by June 2020, and will be doubled by the end of 2020.
2. Features of 5G Networks
With their high bandwidth, low latency, and massive connectivity, 5G networks contribute to the building of a fully
connected world. They have three major applications: Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency
Communications (URLLC), and Massive Machine Type Communications (mMTC). Users can select the 5G devices they
require according to different scenarios, and developers can select development scenarios based on the types of
applications they want to create.
Figure: 5G application scenarios. Source: International Telecommunication Union (ITU), partly updated.
Scenarios shown across eMBB, mMTC, and URLLC: fast transmission at Gbit/s, 3D video and UHD video, cloud-based office/gaming, augmented reality (AR), smart home, voice intercom, intelligent video surveillance, smart city, industrial automation, self-driving cars, and high-reliability applications such as mobile healthcare.
Figure: Comparison between 5G and 4G
Metric                  4G          5G
Latency                 10 ms       1 ms
Uplink service rate     1 Mbit/s    200 Mbit/s
Downlink service rate   10 Mbit/s   2 Gbit/s
3. Impact of 5G on Intelligent Vision
Extending the breadth of intelligent vision
In the 4G era, video services were limited to the consumer field. This was due to the low bandwidth and high latency of
4G networks. However, compared with 4G, 5G improves the service rate by about 100-fold, and reduces latency by about
10-fold, enriching video application scenarios, from remote areas with complex terrains, to mines, factories, harbors with
cabling difficulties, and places requiring security for major events.
Figure: 5G camera installed atop Mount Qomolangma, and a video image from a 5G camera (labels: 5G camera, Rongbuk Monastery)
5G increases the peak transmission rate limit, laying a solid foundation for the internet of everything. It will play an
important role in communications among machines and drive innovation across a range of emerging industries. Because
of its high mobility and low power consumption, 5G is capable of supporting a wide array of frontend devices, such as
vehicle-mounted devices, drones, wearables, and industrial robots, which will serve as significant carriers for video
awareness. It is estimated that by 2023, the number of connected short-distance Internet of Things (IoT) terminals will
reach 15.7 billion. In addition, the 5G network can be sliced into multiple subnets to meet the differing requirements of
terminals in terms of latency, bandwidth, number of connections, and security. This will further enrich the application
scenarios of 5G.
Figure: Diverse 5G terminals become enablers of intelligent vision (vehicle-mounted device, drone, wearable, industrial robot)
Figure: Network slicing enriches 5G application scenarios. One 5G network is sliced into a harbor private network, a bus private network, and an emergency assurance private network.
Typical application case
Optical fibers deployed at harbors are prone to corrosion, and those on gantry cranes can easily become entangled
during operations. To solve this problem, HD cameras are connected to 5G networks to monitor gantry cranes, so that
operators can remotely check lifting and hoisting operations in real time and promptly identify anomalies. In addition,
powered by 5G and artificial intelligence (AI), most container hoisting operations can be completed by machines, greatly
improving efficiency. When 5G is applied in a harbor, the transfer efficiency of the harbor is doubled, and the deployment
and maintenance costs of optical fibers are reduced by about CNY100,000 each year. Additionally, operators no longer
need to work at heights, greatly improving their work efficiency and ensuring safety.
Figure: Gantry crane monitoring at a harbor, on-site operation vs. remote operation
On-site operation: Cameras on existing gantry cranes are connected through optical fibers. The optical fibers easily become entangled, and cabling is subject to sea tide impact.
Remote operation in the central control room: 5G networks enable HD cameras to obtain full coverage. The harbor has 50 gantry cranes, each fitted with 10 to 18 cameras; 18 HD cameras are required for precise control. Using remote detection and a remote control joystick, operators in the central control room can remotely operate two or three gantry cranes at the same time.
5G ushers in the AI era
5G is revolutionizing the way we think about AI. AI is now deeply rooted in the video surveillance industry, which in turn
poses increasingly high requirements on video and image quality. 4K video encoded in the H.265 format requires an
average transmission bandwidth of 10 Mbit/s to 20 Mbit/s. However, when intelligent services are enabled, the instantaneous
peak transmission rate can soar to over 100 Mbit/s, far higher than that provided by 4G networks. Once they are connected
to 5G networks, cameras can utilize the high bandwidth to quickly deliver detailed, high-quality video images, thereby
improving intelligent analysis performance.
Figure: 4G network vs. 5G network
4G network: 720p camera, 720p video, bandwidth 1 Mbit/s. Low-definition video, which cannot be used for intelligent services.
5G network: 4K camera, 4K video, bandwidth 200 Mbit/s. High-quality video, meeting the requirements of intelligent services.
With its low latency, 5G serves as the supporting system for AI. During the Industrial Revolutions, people increased their
productivity by mastering mechanical energy. At present, we are experiencing an AI revolution, in which people are
improving the intelligent capabilities of machines by harnessing computing power. As the cost of computing power drops,
the cloud, edges, and devices are coming to possess ample computing power, which they can use to perform video-based
analysis using intelligent algorithms, and generate massive amounts of valuable data. This data can only be fully utilized
when it is quickly transferred among the cloud, edges, and devices.
Intelligent capabilities are like electric power. Electric power possesses great potential, but cannot be directly applied
in industries unless a power transmission network is built. 5G, in essence, serves as the transmission network for
computing power and intelligent data. It enables the full implementation of intelligent capabilities, and by doing so, is
promoting the intelligent transformation of industries and people's everyday life.
Figure: Intelligent data transmission on the devices, edges, and cloud. An AI cloud connects to AI-enabled edge nodes, which connect to devices over 5G.
Typical application case
Major economies around the globe are seeking to digitally transform their manufacturing sectors. Aircraft manufacturing
is the most valuable sector of the manufacturing industry. Aircraft manufacturers adopt 5G and AI technologies for
quality assurance, reducing the time required for carbon fiber stitching gap checks from 40 minutes to 2 minutes. In
addition, 5G cameras provide a wide range of intelligent applications in factories, including safety helmet detection,
workwear detection, and perimeter intrusion detection.
Aircraft manufacturing plant
4. Application Bottlenecks of 5G in Intelligent Vision
The high bandwidth and low latency of 5G enable wireless video transmission, extending the boundary of intelligent
vision applications. When powered by 5G, cameras can connect to massive sensors to implement multi-dimensional
awareness. Additionally, as 5G develops, it is enabling the creation of various innovative kinds of devices, fueling the
digital transformation of all industries.
Every technology encounters various difficulties when it is being applied. 5G is no exception when it is applied to
intelligent vision. The 5G uplink and downlink bandwidths are unbalanced, and the total 5G uplink bandwidth of a single
base station is limited to around 300 Mbit/s. Most of the time, cameras upload P-frames, which contain only the changes
from the previous frame, and they periodically upload I-frames, which contain the full image. As a result,
bandwidth usage can fluctuate dramatically. The instantaneous transmission rate of a single 4K camera can reach 60
Mbit/s. If five 4K cameras are connected to a single 5G base station, the uplink bandwidth of the base station will be
insufficient for video transmission during peak hours. Therefore, video encoding needs to be optimized so cameras can
adapt to the limited uplink bandwidth of 5G networks. In addition, packet loss and bit errors during wireless transmission
may cause image quality issues such as artifacts and video stuttering, which require more reliable transmission modes.
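The uplink arithmetic in this paragraph can be checked with a short sketch. It uses only the two figures quoted in the text (about 300 Mbit/s of uplink per base station and a 60 Mbit/s instantaneous peak per 4K camera); the function names are illustrative, and the worst case it models, in which every camera peaks at the same instant, is an assumption.

```python
import math

UPLINK_MBPS = 300    # approximate total uplink of one base station (from the text)
PEAK_4K_MBPS = 60    # instantaneous peak of one 4K camera (from the text)


def max_simultaneous_peaks(uplink_mbps: float, peak_mbps: float) -> int:
    """Number of cameras whose peak bursts fit in the uplink at the same instant."""
    return math.floor(uplink_mbps / peak_mbps)


def uplink_headroom(uplink_mbps: float, peak_mbps: float, cameras: int) -> float:
    """Uplink left over (Mbit/s) if all cameras peak together; negative means overload."""
    return uplink_mbps - cameras * peak_mbps


print(max_simultaneous_peaks(UPLINK_MBPS, PEAK_4K_MBPS))   # 5
print(uplink_headroom(UPLINK_MBPS, PEAK_4K_MBPS, 5))       # 0
print(uplink_headroom(UPLINK_MBPS, PEAK_4K_MBPS, 6))       # -60
```

Under these assumptions, five 4K cameras peaking together consume the entire uplink, leaving no headroom for retransmissions or any other traffic.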
Figure: Limited uplink bandwidth, together with the packet loss and bit errors that frequently occur during wireless transmission, may cause artifacts and video stuttering.
A 5G network uses short wavelengths for transmission, which results in fast signal attenuation. The network bandwidth
decreases rapidly as the distance increases. Therefore, the number of cameras that can be connected to a single 5G base
station is limited. In addition, carriers tend to build 5G base stations based on their actual requirements in terms of
construction costs and benefits, and 5G coverage is limited in the short term. Therefore, it is important to properly and
efficiently use 5G base station resources and improve the coverage and access capability of a single base station.
Figure: Bandwidth attenuation of a 5G base station
Distance from base station   Bandwidth
100 m                        210 Mbit/s
200 m                        140 Mbit/s
300 m                        90 Mbit/s
400 m                        60 Mbit/s
To solve these problems, 5G cameras should not simply be combinations of cameras and 5G modules. Instead, they
should provide efficient video/image encoding capabilities to reduce the bandwidth required for transmission.
Additionally, reliable transmission technologies are needed to prevent the packet loss and bit errors which occur during
wireless transmission. In this way, 5G base station resources can be utilized properly.
Figure: A 5G camera combines a built-in 5G module with more efficient encoding and more reliable transmission.
5G-enabled Image Encoding and Transmission Technologies
Chen Yun, Liu Zhen
5G expands the scope of intelligent vision, and embeds artificial intelligence (AI) into a wide range of industries. However,
due to the limitations of 5G New Radio (NR), wireless 5G networks feature limited uplink bandwidth, and have high
requirements for network stability. Technical innovations have sought to overcome these challenges for utilizing 5G in
intelligent vision applications.
1. Challenges to Video and Image Transmission on 5G Networks
Video and image transmission requires high uplink bandwidth and stable wireless networks
5G networks adopt a time-division transmission mode, and spend 80% of the time transmitting downlink data and
20% of the time transmitting uplink data, under typical configurations. Generally, the uplink bandwidth of a single 5G
base station accounts for only 20% of the total bandwidth, and can reach 300 Mbit/s. However, in the intelligent vision
industry, video and image transmission requires far higher uplink bandwidth than that provided by 5G networks.
Figure: Wired transmission vs. typical wireless time-division transmission
Wired transmission (full-duplex mode; data packets can be received and sent anytime), by pin:
1  RX+ (positive end for receiving data)
2  RX- (negative end for receiving data)
3  TX+ (positive end for transmitting data)
4  Not used
5  Not used
6  TX- (negative end for transmitting data)
7  Not used
8  Not used
Typical wireless time-division transmission (uplink transmission occupies only 20% of the total time, and uplink data packets can be sent only during specific time segments):
4:1 subframe configuration: D D D S U D D D S U
8:2 subframe configuration: D D D D D D D S U U
Time segments labeled D are used for data downlink, those labeled U are used for data uplink, and those labeled S can be configured.
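The 20% uplink share can be read directly off the two subframe patterns. The following is a minimal sketch, assuming that the configurable S segments carry no uplink data; the function name `uplink_share` is illustrative.

```python
PATTERN_4_1 = "D D D S U D D D S U"   # 4:1 subframe configuration
PATTERN_8_2 = "D D D D D D D S U U"   # 8:2 subframe configuration


def uplink_share(pattern: str) -> float:
    """Fraction of time segments labeled U (uplink) in a subframe pattern."""
    slots = pattern.split()
    return slots.count("U") / len(slots)


for name, pattern in (("4:1", PATTERN_4_1), ("8:2", PATTERN_8_2)):
    print(f"{name}: uplink share = {uplink_share(pattern):.0%}")
```

Both patterns contain two U segments out of ten, which matches the 20% figure given in the text.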
In addition, during video and image transmission, an I-frame containing the full image information is sent first, after
which P-frames containing changes in the image from previous frames are sent, followed by an I-frame being sent again.
The size of I-frames is larger than that of P-frames. As a result, image data occupies uneven network bandwidth during
the 10 ms time window. Sending P-frames does not require a lot of bandwidth, but sending I-frames requires a high
amount. For example, the average bit rate of 4K video streams is 12 Mbit/s to 20 Mbit/s, and the peak bit rate during
I-frame transmission can reach 60 Mbit/s. This is known as I-frame burst, as it places great strain on the data
transmission time window on 5G networks.
Figure: Bandwidth usage in a 10 ms time window, with each column indicating the size of a file (axes: file size over time; I-frame columns are much taller than P-frame columns)
In actual applications, a 5G base station always connects to multiple cameras at the same time. In this case, I-frame
bursts may occur simultaneously for multiple cameras, resulting in I-frame collision, further intensifying the pressure on
5G NR bandwidth. According to tests, the probability of I-frame collision is close to 100% when over 7 cameras using
traditional encoding algorithms are connected to a single 5G base station.
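One way to build intuition for why collisions become likely as cameras are added is a birthday-problem model. This is a simplified sketch and not the test method behind the figures quoted above: it assumes every camera uses the same GOP length, that each camera's I-frame falls independently and uniformly on one of the frame positions in the GOP, and that a collision means two cameras share a position. The function name is illustrative, and the model is not expected to reproduce the tested results exactly.

```python
def prob_no_collision(cameras: int, slots: int) -> float:
    """Probability that no two cameras place their I-frame in the same slot.

    slots is the number of frame positions in one GOP (for example, 25 for GOP-25).
    """
    if cameras > slots:
        return 0.0  # more cameras than slots: a collision is unavoidable
    p = 1.0
    for k in range(cameras):
        p *= (slots - k) / slots
    return p


for gop in (25, 30, 60):
    row = ", ".join(f"{prob_no_collision(n, gop):.2f}" for n in (1, 4, 7, 10, 13))
    print(f"GOP-{gop}: {row}")
```

In this model the probability of avoiding a collision falls quickly with each added camera, and a longer GOP slows the decline because there are more positions to spread the I-frames across.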
Figure: Data packets of three cameras (Camera 1, Camera 2, Camera 3) are scattered within 5 seconds, preventing I-frame collision
Figure: Probability that I-frames of all cameras do not collide with each other (y-axis: probability, 0% to 100%; x-axis: number of cameras, 1 to 13; series: GOP-25, GOP-30, and GOP-60, each at 25 frames per second)
Furthermore, 5G networks are challenged by unstable transmission. Compared with wired network transmission, 5G
wireless network transmission is subject to packet loss and bit errors, especially during network congestion. This
results in video quality issues, such as image delays, artifacts, and video stuttering, which in turn affect backend
intelligent applications.
Efficiently utilizing 5G base station resources to promote the large-scale commercial use of 5G in intelligent vision
In addition to limited uplink bandwidth and network transmission reliability, 5G networks feature a fast attenuation
speed, which restricts the coverage of a single base station. This also affects the commercial use of 5G in intelligent
vision. 5G transmission is mainly conducted on the millimeter wave and sub-6 GHz (centimeter-level wavelength) bands.
These two bands feature short wavelengths, resulting in limited transmission range, poor penetration and diffraction
performance, and faster 5G network attenuation. Therefore, the coverage of a single 5G base station is far smaller than
that of a 4G base station. In addition, unlike 4G base stations which cover almost all areas, carriers build 5G base stations
based on actual project requirements with construction costs and benefits taken into consideration. Therefore, efficiently
utilizing 5G base station resources is essential to improving the coverage and access capabilities of a single base station,
and to achieving the large-scale commercial use of 5G in intelligent vision.
Figure: Total uplink bandwidth of 5G networks decreases as the coverage radius increases (outdoor macrocell)
Coverage radius   Rate
100 m             210 Mbit/s
200 m             140 Mbit/s
300 m             90 Mbit/s
400 m             60 Mbit/s
Supports 6–8 access channels for 40% of areas and 2–3 access channels for 60% of areas.
2. Key Technologies
The biggest challenge for large-scale commercial use of 5G in intelligent vision is efficiently utilizing 5G uplink bandwidth,
and preventing packet loss and bit errors. As a remedy, the industry at large has sought to optimize image encoding and
transmission.
Image encoding optimization
Image encoding optimization is designed to eliminate I-frame bursts and reduce bandwidth required for video and image
transmission. The region of interest (ROI)-based encoding technology is used to compress image backgrounds, which
reduces the overall bandwidth required. In addition, stream smoothing technology is adopted to optimize I-frames,
thereby reducing the peak bandwidth required and preventing network congestion.
ROI-based encoding technology, reducing the average bandwidth required for video transmission
In the intelligent vision industry, bandwidth required for video transmission has soared, as image resolution has
continually increased. On top of that, high-quality person and vehicle images are captured and transmitted for intelligent
analysis, which requires even higher bandwidth than that for video transmission. However, in real world applications,
people tend to only focus on key information in video and images, such as pedestrians and vehicles, and have little need
for high definition image backgrounds. ROI-based encoding technology was developed with this understanding in mind.
It automatically distinguishes the image foreground from the background, ensuring high resolution in ROI within images,
while compressing the background, which reduces the overall bandwidth required for transmission. This technology
reduces the size of video streams and snapshots, lowering the average bit rate by 30% in complex
scenarios and by 60% in simple scenarios.
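The idea of keeping regions of interest sharp while compressing the background harder can be sketched as a per-block quantization map. This is an illustration under stated assumptions, not the encoder implementation described in the text: the function name, the 16-pixel block size, and the two quantization parameter (QP) values are hypothetical, and the detection step that produces the ROI rectangles is not shown.

```python
def build_qp_map(width, height, rois, block=16, qp_roi=26, qp_background=38):
    """Return a grid of QP values, one per block of the frame.

    rois is a list of (x, y, w, h) rectangles in pixels with x, y >= 0 and w, h > 0.
    A lower QP means finer quantization (higher quality, more bits).
    The QP values and block size are illustrative assumptions.
    """
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    qp_map = [[qp_background] * cols for _ in range(rows)]
    for x, y, w, h in rois:
        first_col = x // block
        last_col = min(cols - 1, (x + w - 1) // block)
        first_row = y // block
        last_row = min(rows - 1, (y + h - 1) // block)
        for r in range(first_row, last_row + 1):
            for c in range(first_col, last_col + 1):
                qp_map[r][c] = qp_roi  # foreground block: keep quality high
    return qp_map


# A 64x32 frame with one 16x16 region of interest at (16, 0).
for row in build_qp_map(64, 32, [(16, 0, 16, 16)]):
    print(row)
```

Blocks that overlap a region of interest receive the lower QP, and every other block receives the higher background QP, which is where the bit rate saving comes from.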
Figure: ROI-based encoding pipeline. Original video/image streams are processed by AI algorithms and passed to the encoder, which outputs the encoding stream. The background undergoes compressed encoding, reducing the bit rate; the foreground undergoes normal encoding, ensuring high image quality.
Figure: ROI-based video encoding vs. traditional encoding method. Average bit rate of 1080p video (Mbit/s, axis from 0 to 4.5), standard H.265 encoding vs. ROI-based encoding: reduced by 30% in complex scenarios, by 50% in common scenarios, and by 60% in simple scenarios.
I-frame optimization, reducing peak bandwidth required for transmission
The peak bit rate during I-frame bursts is extremely high, which can lead to network congestion. To address this, the
industry has adopted a stream smoothing technology to adjust encoder parameters and control the size and
frequency of I-frames, reducing the peak bandwidth required for video transmission during I-frame bursts.
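The effect of smoothing on peak bandwidth can be illustrated with a simple model. The sketch below does not implement the encoder-side tuning described above; it swaps in a simpler sender-side pacing model, in which any bytes above a per-interval cap are carried over to the following intervals. The function name, frame sizes, and cap are hypothetical.

```python
def pace_frames(frame_sizes, cap_per_interval):
    """Spread frame data over intervals so no interval sends more than the cap.

    frame_sizes: bytes produced in each frame interval (a large value is an I-frame).
    Returns the bytes actually sent per interval; excess bytes wait in a backlog.
    """
    if cap_per_interval <= 0:
        raise ValueError("cap_per_interval must be positive")
    sent, backlog = [], 0
    for size in frame_sizes:
        backlog += size
        out = min(backlog, cap_per_interval)
        sent.append(out)
        backlog -= out
    while backlog > 0:  # flush whatever is still queued after the last frame
        out = min(backlog, cap_per_interval)
        sent.append(out)
        backlog -= out
    return sent


# One large I-frame followed by three P-frames (sizes in kilobytes, illustrative).
print(pace_frames([300, 70, 70, 70], cap_per_interval=180))  # [180, 180, 80, 70]
```

The total amount of data is unchanged, but the peak per interval drops from 300 to 180. The trade-off in this model is added delay for the carried-over bytes, which is one reason reducing the I-frame size at the encoder is attractive.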
Figure: File size over time before and after I-frame optimization. The peak bit rate of I-frames is reduced by 40% after stream smoothing, reducing the network congestion caused by I-frame bursts.
Transmission optimization
Transmission optimization technology mainly focuses on intelligent flow controls and network transmission reliability.
Intelligent flow controls can detect network transmission status in real time and adjust data packet sending parameters
accordingly, to improve overall network bandwidth usage. Network transmission reliability can be enhanced via automatic
repeat request (ARQ) and forward error correction (FEC) technologies, which help mitigate packet loss and bit errors.
Intelligent flow controls
In wireless transmission, if data is continuously sent while the network is congested, transmission capabilities will
deteriorate sharply. Intelligent flow control technology makes use of flow control units to detect the length of data queues
in real time, and adjust the data packet sending parameters accordingly. This allows for more data to be sent during
off-peak hours, and prevents data stacking during peak hours, for optimized network bandwidth usage.
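A queue-driven control loop of this kind can be sketched in a few lines. The policy below (halve the target bit rate when the send queue is long, raise it by a fixed step when the queue is short) is an assumed additive-increase, multiplicative-decrease rule chosen for illustration; the text does not specify the actual policy, and all names, thresholds, and rates are hypothetical.

```python
def adjust_target_bitrate(rate_mbps, queue_len, low_mark=10, high_mark=50,
                          step_mbps=1.0, min_rate=1.0, max_rate=20.0):
    """Return the next target bit rate based on the current send-queue length.

    queue_len: packets waiting to be sent. A long queue suggests congestion.
    """
    if queue_len > high_mark:
        rate_mbps = rate_mbps / 2          # back off quickly when data is stacking up
    elif queue_len < low_mark:
        rate_mbps = rate_mbps + step_mbps  # probe for spare bandwidth when the queue is short
    return max(min_rate, min(max_rate, rate_mbps))


rate = 8.0
for queue_len in (5, 5, 60, 30, 5):
    rate = adjust_target_bitrate(rate, queue_len)
    print(queue_len, rate)
```

Called once per monitoring interval, the function raises the rate while the queue stays short, cuts it sharply once the queue passes the high mark, and holds it steady in between.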
Figure: No flow control vs. intelligent flow control
No flow control (encoder → data → channel → receiver): Data is sent directly to the channel. Packets sent without flow control are prone to packet loss and network congestion, causing video delay and stuttering at the receiver.
Intelligent flow control (encoder → data → intelligent flow control → channel → receiver): The flow control unit monitors network status in real time and adjusts the encoder and data packet sending parameters based on the length of data queues. This prevents data stacking and network congestion, improves network usage, and delivers smooth, clear video images.
Enhanced transmission reliability to prevent packet loss and bit errors
Video transmission through the Transmission Control Protocol (TCP) features low efficiency, particularly when packet loss
occurs on wireless networks. On 5G networks, video and images are therefore transmitted through the User Datagram Protocol
(UDP), with reliability added by two mechanisms: ARQ, which is based on acknowledgment and retransmission, and FEC,
which is based on redundant error correction data. ARQ adds a verification and retransmission mechanism on top of
conventional UDP-based transmission. If the receiver detects that a transmitted data packet is incorrect, it requests that the
transmitter retransmit the packet. FEC reserves verification and error correction bits during data transmission. When the
receiver detects an error in the data, it performs an exclusive or (XOR) operation with the error correction bits to restore the
data. These transmission optimization technologies can ensure smooth video transmission even when the packet loss rate
approaches 10%. However, the transmission reliability improvement mechanisms need to be deployed on both the peripheral
units (PUs) and backend platforms.
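The XOR-based recovery mentioned above can be demonstrated with the simplest form of FEC, a single parity packet. This is a minimal sketch, assuming equal-length packets and at most one lost packet per group; production schemes generally use stronger codes and more redundancy. The function names and packet contents are illustrative.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Parity packet: XOR of all data packets in the group."""
    return reduce(xor_bytes, packets)


def recover_lost(received, parity):
    """Restore the single missing packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("single-parity FEC can restore exactly one lost packet")
    present = [p for p in received if p is not None]
    restored = list(received)
    restored[missing[0]] = reduce(xor_bytes, present, parity)
    return restored


packets = [b"D1..", b"D2..", b"D3..", b"D4.."]
parity = make_parity(packets)
# D2 is lost in transit; the receiver rebuilds it from D1, D3, D4, and the parity packet.
print(recover_lost([packets[0], None, packets[2], packets[3]], parity))
```

Because XOR is its own inverse, combining the parity packet with the packets that did arrive yields the missing one without a retransmission round trip, which is the latency advantage of FEC over ARQ.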
Figure: ARQ and FEC
ARQ: The sender transmits data to the receiver. When the receiver's check fails (NOT OK), it requests retransmission.
FEC: The redundant coding matrix A (a 4 × 4 identity matrix extended with a redundancy row R11 R12 R13 R14) is applied to the original data B (D1, D2, D3, D4) to produce the sent data (D1, D2, D3, D4, C1). If D2 is lost during data transmission, it can be restored using the received data and the redundant coding matrix (A^B=C). Data lost in matrix B can also be restored (C^A).
3. Camera Bit Rate and Base Station Coverage After Optimization
These innovations have helped facilitate the commercial use of 5G in intelligent vision. More specifically, ROI-based
encoding and I-frame optimization help reduce the average bit rate at the encoding end and the peak bit rate, so that
5G uplink bandwidth can be utilized in a more efficient manner. Intelligent flow controls and transmission reliability
improvement technologies enable cameras to actively monitor data sending queues. This helps prevent network
congestion and improve 5G bandwidth usage. In addition, advancements in encoding and transmission technologies
allow a single 5G base station to connect to more cameras and increase its coverage range.
Figure: Peak bandwidth required for video transmission, before and after optimization (unit: Mbit/s; panels: peak bandwidth of 1080p video, peak bandwidth of 4K video)
Figure: Number of cameras that can be connected to a single 5G base station within 400 m, before and after optimization (uplink bandwidth: 300 Mbit/s; panels: number of 1080p cameras supported by a single base station, number of 4K cameras supported by a single base station)
Values shown in the charts: 60, 20, 15, 8, 6, 1, 4, 3
Products and Solutions Catalog
Tan Shenquan, Liu Zhen
Huawei 5G Cameras
Huawei has leveraged its accumulated expertise in 5G and network communications to release a series of patented
innovations that address longstanding 5G transmission challenges, such as the limited coverage of individual 5G base
stations, low uplink bandwidth, and packet loss. Huawei has also launched a series of related products, such as 5G
cameras, that can be applied across a wide range of industries, including intelligent harbors and manufacturing.
Intelligent encoding and I-frame optimization, improving resource utilization of 5G base stations
5G networks feature limited uplink bandwidth, resulting in network congestion when I-frame bursts occur during video
transmission. To resolve this problem, Huawei has proposed a region of interest (ROI)-based encoding technology to
increase the compression ratio of image backgrounds. This helps reduce the average bit rate of video streams.
Furthermore, the I-frame optimization technology helps reduce the bandwidth required for video transmission during
peak hours, to prevent network congestion. After the optimization, the maximum number of cameras that can be
connected to a single 5G base station has increased by two to three times, and 5G base station coverage has increased
by two to three times as well, significantly improving the resource utilization of 5G base stations.
User Datagram Protocol (UDP)-based reliable transmission, ensuring smooth, efficient video transmission
To mitigate packet loss and bit errors during wireless transmission, Huawei has adopted UDP together with a dynamic
optimization policy to ensure smooth video transmission even when packet loss occurs.
Figure: Image encoding and transmission optimization technologies ensure clear, smooth video transmission even when the packet loss rate reaches 10%
Huawei 5G Camera Models
Models: M2281-10-QLI-W5, M6781-10-GZ40-W5, X7341-10-HMI-W5
Flexible deployment: Supports n78, n79, and n41 frequency bands and standalone (SA)/non-standalone (NSA) hybrid networking
Large-scale access: Built-in integrated antenna, intelligent encoding and transmission optimization for 5G New Radio (NR), ensuring large-scale access of 5G cameras
AI-powered innovation: Professional-grade artificial intelligence (AI) chips and dedicated software-defined camera (SDC) operating system (OS), supporting a wide range of intelligent functions such as person analysis, crowd flow analysis, and vehicle analysis; support for long-tail algorithms
02 AI
Image, Algorithm, and Storage Trends Led by AI
Discussion on Frontend Intelligence Trends
Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
Chip Evolution and Development
Algorithm Repository Technology
SuperColor Technology
Video Codec Technology
Storage EC Technology
Multi-Lens Synergy Technology
Products and Solutions Catalog
Image, Algorithm, and Storage Trends Led by AI
Ge Xinyu, Zhang Yingjun
1. AI+Video Future Prospects
The rapid development of AI is driving considerable growth within the global video analysis industry
In recent years, the fast development of deep learning technology has driven the rapid growth of the overall video analysis
industry. According to statistics, from 2018 to 2023, the compound annual growth rate (CAGR) of the video analysis product
market is predicted to reach 37.1%. Additionally, the proportion of intelligent cameras powered by deep learning is expected to
increase from 5% to 66%.
[Figure: Video analysis applications. 2018 global revenue: $0.38bn; 2018-2023 CAGR: 37.1%. YOY revenue growth: 66.4% (2018), 63.6% (2019), 42.9% (2020), 34.4% (2021), 26.1% (2022), 22.3% (2023).]
[Figure: Proportion of intelligent cameras shipped with deep learning analytics and rules-based analytics, 2018 to 2023.]
Data source: IHS Markit 2019
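The quoted CAGR and the year-over-year (YOY) figures in the chart can be cross-checked by compounding. The following is a minimal sketch, not part of the cited data: it assumes that the 66.4% figure for 2018 describes growth over 2017, so that only the five rates from 2019 to 2023 carry revenue from 2018 to 2023. The function name is hypothetical.

```python
import math

# YOY revenue growth rates read from the chart (2019 to 2023).
# Assumption: the 2018 rate (66.4%) measures growth over 2017 and is
# therefore outside the 2018-2023 window.
YOY_GROWTH = {2019: 0.636, 2020: 0.429, 2021: 0.344, 2022: 0.261, 2023: 0.223}


def cagr_from_yoy(rates):
    """Return the compound annual growth rate implied by a list of YOY rates."""
    total_growth = math.prod(1.0 + r for r in rates)
    return total_growth ** (1.0 / len(rates)) - 1.0


implied_cagr = cagr_from_yoy(list(YOY_GROWTH.values()))
print(f"Implied 2018-2023 CAGR: {implied_cagr:.1%}")
```

Under these assumptions the compounded rate comes out at roughly 37%, in line with the CAGR quoted in the text.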
AI has become a core enabler of digital transformation across industries
As artificial intelligence (AI) technology matures and an intelligent society develops, AI is being used in a wide range of
industries. Currently, the transportation industry is using AI+video to make traffic management more effective. In the
future, AI+video will gradually be embedded in more sectors, such as government, finance, energy, and education.
Transportation: Transport networks can use AI to recognize key people and vehicles, thereby improving traffic safety governance in urban areas; realize refined management of urban traffic; and promote smooth traffic optimization based on precise data.
Government: Governments can use AI to improve their administrative efficiency by informatizing infrastructure; improve the intelligence of various application systems; and enhance information awareness, analysis, and processing capabilities by analyzing massive video data.
Finance: Banks can use AI to turn their focus from improving service efficiency to enhancing marketing, improving the intelligence of unstaffed bank branches, and accelerating the reconstruction of smart branches.
Energy: Energy companies can use AI to realize visualized exploration and development, and construct intelligent pipelines and gas stations.
Education: Educational institutions can use AI to establish uniform systems across countries/regions; promote intelligent education; establish intelligent education demonstration areas; and drive education networking.
2. To Achieve AI Development, an Image Quality Assessment Standard is
Needed for Intelligent Cameras
Why is it necessary to have an image quality assessment standard?
The rapid development of AI in recent years has revolutionized the public safety industry. In the past, video needed to be
watched by people, but now, machines also play an important role in viewing and analyzing video. However, the current
technical standards do not reflect the true capabilities of today’s video surveillance technologies.
Machines are capable of conducting a wide range of recognition tasks, including recognizing objects such as
pedestrians, cyclists, and vehicles. To improve the recognition accuracy of AI algorithms, high-quality video is needed.
[Figure: Recognition targets such as pedestrians, cyclists, and vehicles]
All-scenario and all-weather coverage: New intelligent applications pose higher requirements on full-color imaging in
low light conditions, and this is now a trend within the industry. For example, person re-identification (ReID) requires
cameras to accurately capture the color of the surroundings and the gait details of people. Against this backdrop,
infrared multi-spectral light compensation technology has been proposed, which enables cameras to perform better
in low light conditions, and to do so in an environmentally friendly way.
[Figure: ReID technology; full-color imaging in low light conditions]
AI and image enhancement technologies have developed rapidly. Technologies such as AI noise reduction use global and
local optimization methods to improve image quality. They focus on optimizing image quality for targets such as license
plates, which greatly enhances the accuracy of image recognition. However, the industry still lacks a complete and
objective image assessment standard.
The status quo of image quality assessment standards
The current Chinese national standard GA/T 1127–2013 General technical requirements for cameras used in security video
surveillance mainly lists requirements for camera network access and manual video viewing. According to the traditional
assessment method, experienced workers grade images subjectively, but this method cannot be used in machine
assessment. Now that AI is enabling image assessment to become increasingly objective, an objective image assessment
standard needs to be formulated.
Timeline of image and video quality assessment standards and projects, 1997 to 2019 (newest first):
No Reference Metric (NORM) (2017 to now)
Audiovisual HD Quality (AVHD) (2012 to now)
GA/T 1356-2018, Specifications for compliance tests with national standard GB/T 25724-2017
GA/T 1127-2013, General technical requirements for cameras used in security video surveillance
Recommendation ITU-R BT.500-13 (2012), Methodology for the subjective assessment of the quality of television pictures
GB 50198-2011, Technical code for project of civil closed circuit monitoring television system
Recommendation ITU-T J.341 (2011), Objective perceptual multimedia video quality measurement of HDTV for digital cable television in the presence of a full reference
Recommendation ITU-T J.342 (2011), Objective multimedia video quality measurement of HDTV for digital cable television in the presence of a reduced reference signal
HDTV Phase I (2010), Full Reference (FR) and Reduced Reference (RR) objective video quality models that predict the quality of high definition television
QART (Quality Assessment for Recognition Tasks) (2010)
RRNR-TV (2009), Reduced Reference (RR) and No Reference (NR) objective video quality models that predict the quality of standard definition television
Recommendation ITU-R BT.500-12 (2009), Methodology for the subjective assessment of the quality of television pictures
Recommendation ITU-R BT.1788 (2007), Methodology for the subjective assessment of video quality in multimedia applications
FRTV Phase II (2003), Full Reference (FR) objective video quality models that predict the quality of standard definition television
Recommendation ITU-R BT.500-11 (2002), Methodology for the subjective assessment of the quality of television pictures
FRTV Phase I (2000), Full Reference (FR) objective video quality models that predict the quality of standard definition television
GY/T 134-1998, The method for the subjective assessment of the quality of digital television picture
Recommendation ITU-R BT.500-7 (1997), Methodology for the subjective assessment of the quality of television pictures
Key issues relating to the formulation of a new standard
There are five key issues to consider when developing an image quality assessment system for intelligent cameras.
Objectivity of camera imaging quality assessment: When humans judge imaging quality using their eyes, their assessment is subjective. An objective quality assessment model would be based on existing full-reference, reduced-reference, or no-reference models within the industry.
Consistency of assessment result and subjective perception: The assessment result arrived at by intelligent vision must be consistent with subjective perception. This is a key factor that any standard system must promote and recognize.
Identity of assessment scenario and real environment: Currently, the image quality indicators of cameras are mainly evaluated using test cards and software or by manual judgment. This is different from the actual scenarios where these cameras would be used, which involve moving objects like people and vehicles. In addition, infrared multi-spectral light compensation technology is widely used in actual scenarios. Therefore, the spectral characteristics of the target must be consistent.
Concordance of assessment indicators and actual effect: Currently, the image quality indicators of cameras are tested separately, and the relationship and weight of indicators for different intelligent tasks are not considered.
Repeatability of assessment methods: Different assessors should get the same result regardless of time or place.
Thoughts and suggestions on the design of a standard system
The assessment indicators should be associated with user scenarios and reflect the practicability of the service.
The assessment dimensions should include the user task type, the user scenario type, and the basic image
indicator factors.
Score weighting should be decided based on each user task and scenario to calculate the overall score.
Indicator system for the image quality assessment for intelligent cameras
Overall score: calculation function f(x) that aggregates the recognition task scores by user task weight.
Recognition task scores (recognition task 1, recognition task 2, recognition task 3, ...): calculation function f(x) that aggregates the scenario scores by user scenario weight.
User scenarios (daytime and nighttime), each with its own score: even illumination in the daytime, light raking in the daytime, backlight in the daytime, low light at night, low light with glare, rain and fog, rain and snow, ...
Scenario scores: calculation function f(x) of the basic image indicator factors.
Basic image indicator factors:
Objective quality factors of a single frame in the spatial domain: definition, texture detail, noise, contrast, geometric distortion, color reproduction, color sensitivity, color saturation, exposure quality
Objective quality factors in the temporal domain: stability, frame rate
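The layered aggregation in the indicator system above can be sketched in a few lines. This is an illustration only: the source does not define the calculation function f(x), so the sketch assumes a plain weighted average at every level, and all task names, scenario names, weights, and scores are made-up values rather than figures from any standard.

```python
def weighted_score(scores, weights):
    """Weighted average of scores; one possible (assumed) choice for f(x)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def overall_score(factor_scores, factor_weights, scenario_weights, task_weights):
    """Aggregate factor -> scenario -> task -> overall, level by level.

    factor_scores[task][scenario] maps a factor name to a score (0-100).
    """
    task_scores = {}
    for task, scenarios in factor_scores.items():
        scenario_scores = {
            scenario: weighted_score(factors, factor_weights)
            for scenario, factors in scenarios.items()
        }
        task_scores[task] = weighted_score(scenario_scores, scenario_weights[task])
    return weighted_score(task_scores, task_weights)


# Illustrative numbers only; none of them come from a standard.
factor_weights = {"definition": 0.5, "noise": 0.3, "frame_rate": 0.2}
factor_scores = {
    "license_plate": {
        "even_daylight": {"definition": 90, "noise": 80, "frame_rate": 100},
        "low_light_night": {"definition": 60, "noise": 50, "frame_rate": 100},
    },
    "person": {
        "even_daylight": {"definition": 80, "noise": 80, "frame_rate": 100},
        "low_light_night": {"definition": 70, "noise": 40, "frame_rate": 100},
    },
}
# Scenario weights differ per task, as the text suggests.
scenario_weights = {
    "license_plate": {"even_daylight": 0.5, "low_light_night": 0.5},
    "person": {"even_daylight": 0.25, "low_light_night": 0.75},
}
task_weights = {"license_plate": 0.5, "person": 0.5}

score = overall_score(factor_scores, factor_weights, scenario_weights, task_weights)
print(f"Overall score: {score:.1f}")
```

Keeping the weights separate per task lets the same camera receive different overall scores for different deployments, which is the point of tying score weighting to user tasks and scenarios.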
3. Service Development Requirements for AI Algorithms and Future Evolution
Evolution from traditional single-object analysis to multi-object associative recognition
The traditional single-object recognition method cannot accurately recognize or analyze occluded objects. Instead,
multiple algorithms must be integrated to improve recognition efficiency, which has become a key service
requirement and future direction for algorithm evolution.
[Figure: Multi-algorithm integration of person recognition, behavior recognition, gait recognition, and license plate recognition]
Evolution from traditional service closed-loop in a single area to comprehensive security protection
Social and transportation development facilitates population mobility across provinces and nationwide. As a result, the
traditional service model, which closes the loop within a single area, can no longer meet the requirements of
comprehensive security protection, which is gradually developing towards cross-region intelligent management.
[Figure: Airports; railway/subway stations; bus stations/bus stops; pedestrian zones/areas]
Comprehensive intelligence across all scenarios: Implement closed-loop video surveillance for key areas such as city
entrances, railway stations, subway stations, bus stations, airports, pedestrian zones, urban-rural intersections, street
communities, and agricultural trade markets.
Full awareness of people and vehicles within a residential community: Collect and update data on people and
vehicles entering and leaving residential communities every day in real time, and recognize objects quickly and accurately.
Multi-dimensional data collision and analysis: Align vast quantities of video and image data with multi-dimensional
social data such as travel data, to better analyze people.
4. Storage Requirements of AI Development
The status quo of video and image storage
To improve recognition accuracy, AI algorithms pose higher requirements on the image quality of cameras (including
definition and resolution). In smart cities and intelligent transportation systems, HD cameras are widely deployed,
and this requires considerable storage space for video and images. As a result, storage duration and coverage areas
increase, which can lead to a range of problems such as a limited equipment room footprint, high power
consumption, and maintenance difficulties.
In a medium-sized city:
Video resolution: from 1080p to 4K
Storage duration: from 30 days to 90 days
Coverage area: from key areas to all areas
Limited equipment room footprint: 40+ cabinets; line reconstruction
High power consumption: 440+ kW
Maintenance difficulties: component/node/site faults
Customers' primary concern is how to improve storage space utilization and reduce equipment room footprint, storage
deployment costs, power consumption, and total cost of ownership (TCO).
Future trends
High-density storage: more storage media per unit
Video compression: Deep video compression enables better utilization of storage space. For example, region of
interest (ROI) compression technology separates and extracts ROIs from the background to reduce video bit rate
and storage space without decreasing the ROI detection rate.
[Figure: Pixel-level image segmentation of a motor vehicle. Bit rate before compression: 2642 kbit/s; bit rate after compression: 551 kbit/s]
In smart cities and intelligent transportation systems, video streams are mainly used to conduct AI analysis of people and
vehicles. A balance needs to be struck between lowering storage costs and ensuring the accuracy of this analysis.
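To give a sense of scale, the two bit rates quoted in the ROI compression example can be turned into per-camera storage figures. The sketch below is a rough estimate under stated assumptions: continuous recording at a constant bit rate, decimal units (1 GB = 10^9 bytes), and no container or redundancy overhead. The function name and the choice of 30-day and 90-day retention are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def storage_gb(bit_rate_kbit_s, days, cameras=1):
    """Storage in decimal gigabytes for continuous, constant-bit-rate recording."""
    total_kbit = bit_rate_kbit_s * SECONDS_PER_DAY * days * cameras
    return total_kbit / 8 / 1_000_000  # kbit -> kB -> GB


BEFORE_KBIT_S = 2642  # bit rate before ROI compression (from the figure)
AFTER_KBIT_S = 551    # bit rate after ROI compression (from the figure)

for days in (30, 90):
    before = storage_gb(BEFORE_KBIT_S, days)
    after = storage_gb(AFTER_KBIT_S, days)
    print(f"{days} days: {before:.0f} GB -> {after:.0f} GB per camera")

saving = 1 - AFTER_KBIT_S / BEFORE_KBIT_S
print(f"Bit rate reduction: {saving:.1%}")
```

With these inputs the bit rate, and therefore the storage needed for a given retention period, falls by roughly 79%. Whether recognition accuracy holds at that rate is the trade-off described above and is not modeled here.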
5. Trends
The core objective of AI is to turn the physical world into metadata for analysis. However, in actual applications, a single
piece of metadata is generally useless. This requires frontend devices to go from uni-dimensional data collection to
multi-dimensional data awareness, and backend platforms to evolve from relying on image intelligence to data
intelligence. In this way, data can be fully associated and utilized for analysis and prediction.
Frontend devices: from uni-dimensional data collection to multi-dimensional data awareness
[Figure: From siloed systems where data is isolated (Department A, Department B, Department C) to multi-dimensional data awareness (+time/space/multi-modal) where data has converged in an aggregated data lake (person, vehicle, phone, relationship, accommodation, travel), with diversified awareness dimensions and integrated device form]
Backend platform: from image intelligence to data intelligence
[Figure: Image intelligence (unforeseeable) versus data intelligence (foreseeable), which also draws on Internet of things (IoT) data, Internet data, and other sources]
Discussion on Frontend Intelligence Trends
Xu Tongjing
The aim of artificial intelligence (AI) is to train computers to see, hear, and read like human beings. Current AI
technologies are mainly used to recognize images, speech, and text. Renowned experimental psychologist
D. G. Treichler proposed that 83% of the information we obtain from the world around us comes through our
vision. Accordingly, over 50% of AI applications nowadays are related to intelligent vision, and around 65% of
industry digitalization information comes from intelligent vision. In addition, to bridge the physical and digital
worlds, all things must be sensing. The type, quantity, and quality of data collected by frontend sensing devices
determine the intelligence level.
[Figure: Share of information obtained through each sense: 83% (vision), 11%, 3.5%, 1.5%, 1%]
1. Five Advantages of Frontend Intelligence
Superior imaging quality with ultimate computing power
Intelligent cameras, as sensing devices in the intelligent vision sector, were introduced around five years ago. Different from traditional
IP cameras (IPCs), intelligent cameras can adapt to challenging environments and collect video data of a higher quality. However, due to
immature algorithms and chips, intelligent cameras cannot provide sharp, HD-quality images in harsh weather conditions such as during
rain, sandstorms, and on overcast days. In addition, factors such as poor installation angle, occlusion, low light, and low resolution may
also lead to inaccurate object recognition. If the imaging quality cannot be guaranteed, intelligence will remain an unachievable mirage.
Intelligent image quality adjustment
With AI algorithms, intelligent cameras can automatically adjust image signal processing (ISP) parameters such as shutter speed, aperture,
and exposure according to the ambient lighting and object speed, deliver optimal images for further detection and recognition, and
associate face images with personal data.
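How lighting and object speed might feed into such an adjustment can be shown with a toy model. This is a sketch under simplifying assumptions, not a description of any camera's ISP: it assumes that brightness scales with scene luminance, shutter time, and gain, that motion blur in pixels equals object speed times shutter time, and it leaves the aperture fixed. Every name, default, and unit is hypothetical.

```python
def choose_exposure(object_speed_px_s, scene_luminance, max_blur_px=2.0,
                    target_brightness=1.0, max_shutter_s=1 / 25, max_gain=64.0):
    """Pick a shutter time and gain for a moving object.

    Assumed model: brightness ~ scene_luminance * shutter * gain, and
    motion blur (pixels) ~ object speed (pixels/s) * shutter (s).
    """
    if scene_luminance <= 0:
        raise ValueError("scene_luminance must be positive")
    # Longest shutter that keeps motion blur within the limit.
    if object_speed_px_s > 0:
        shutter_s = min(max_shutter_s, max_blur_px / object_speed_px_s)
    else:
        shutter_s = max_shutter_s
    # Gain needed to reach the target brightness, clamped to the sensor range.
    gain = target_brightness / (scene_luminance * shutter_s)
    gain = min(max(gain, 1.0), max_gain)
    return shutter_s, gain


# A slow object in bright light and a fast vehicle at dusk (arbitrary units).
slow_shutter, slow_gain = choose_exposure(object_speed_px_s=50, scene_luminance=100)
fast_shutter, fast_gain = choose_exposure(object_speed_px_s=2000, scene_luminance=20)
print(f"slow: shutter={slow_shutter:.4f}s gain={slow_gain:.1f}")
print(f"fast: shutter={fast_shutter:.4f}s gain={fast_gain:.1f}")
```

The faster object forces a shorter shutter, and the lower light then has to be made up with gain, which is the kind of trade-off an AI-driven adjustment would balance against noise.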
Applicable to varied scenarios
Intelligent vision systems are increasingly expected to satisfy the needs of various industries for various intelligent applications at various
times and in various scenarios. For example, cameras must be able to detect vehicle queue length and accidents in the daytime and detect
parking violations at night or load different algorithms at different preset positions.
Thanks to frontend intelligence, customers can load their desired algorithms on intelligent cameras to satisfy their personalized or
scenario-specific requirements. This also helps reduce risk exposure in the delivery of diversified algorithms. In addition, lightweight
container technology is used to construct an integrated multi-algorithm framework. This enables each algorithm to operate
independently, ensuring service continuity during algorithm upgrade and switchover. Customers can also flexibly choose their desired
intelligent capabilities to adapt to specific application scenarios.
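The idea of loading different algorithms at different times and preset positions can be sketched as a simple lookup. This is a hypothetical illustration: the schedule format, the preset names, the algorithm names, and the time windows are all assumptions and do not reflect any product interface.

```python
from datetime import time

# Hypothetical schedule: (start, end, preset position, algorithms to load).
SCHEDULE = [
    (time(6, 0), time(20, 0), "intersection", ["queue_length", "accident_detection"]),
    (time(20, 0), time(6, 0), "intersection", ["parking_violation"]),
    (time(0, 0), time(0, 0), "roadside", ["parking_violation"]),  # all day
]


def in_window(now, start, end):
    """True if now falls in [start, end); windows may wrap past midnight."""
    if start == end:
        return True  # treated as a 24-hour window
    if start < end:
        return start <= now < end
    return now >= start or now < end


def algorithms_for(now, preset):
    """Return the algorithms that should be active at this time and preset."""
    active = []
    for start, end, rule_preset, algorithms in SCHEDULE:
        if rule_preset == preset and in_window(now, start, end):
            active.extend(algorithms)
    return active


print(algorithms_for(time(12, 0), "intersection"))
print(algorithms_for(time(23, 30), "intersection"))
```

A camera could re-evaluate such a table whenever the clock crosses a window boundary or the PTZ moves to another preset, and load or unload algorithms accordingly.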
[Figure: Intelligent cameras and radar mounted on gantries for vehicle capture and vehicle feature extraction]
Optimal computing efficiency
Video plays an essential role in some key industries such as social governance and transportation. However, the
traditional video surveillance market is approaching saturation and cannot satisfy digital transformation across
industries. Thanks to ultimate computing power, a lot of intelligent applications are now possible. Compared with
backend intelligence, frontend intelligence improves computing efficiency by 30% to 60%.
[Figure: Computing efficiency of backend intelligence versus frontend intelligence]
With frontend intelligence, each camera processes only one video channel at the frontend, which poses lower
requirements on computing power, and directly obtains raw data for analysis, further reducing computational
requirements and enhancing processing efficiency. Frontend intelligence also enables cameras to deliver
high-quality images to the backend, so the backend platform can concentrate on intelligent analysis rather than on
secondary image decoding. With the same computing power, image analysis is roughly 10 times more efficient than
video analysis. Moving intelligence to the frontend can maximize the value of intelligent applications for customers
with limited resources.
System linkage within milliseconds
In many industries, such as transportation and emergency response, fast response and closed-loop management are
the basic and also the most critical requirements of services. Frontend intelligence enables cameras to analyze video
in real time and to immediately link related service systems upon detecting objects that trigger behavior analysis
rules, in locations such as airports and high-speed rail stations.
In road traffic scenarios, cameras need to link external devices such as illuminators, radar detectors, and traffic
signal detectors within milliseconds. For example, cameras need to work with illuminators to provide enhanced
lighting for specific areas at the right moment or periodically synchronize with traffic signal detectors to accurately
detect traffic incidents. In other linkage scenarios, for example, linkage between radar detectors and PTZ dome
cameras or between barrier gates/swing gates and cameras, frontend intelligence can dramatically improve the
system response efficiency and ensure quick service closure.
[Figure: Intelligent camera linked with millimeter-wave radar: collision warning upon lane change; motor vehicles, non-motorized vehicles, and pedestrians appear simultaneously]
Improved engineering efficiency
To apply intelligent applications on a large scale, engineering issues must be considered. A top concern for engineering vendors is
upgrading and reconstructing the live network using existing investments and at the lowest cost. The prevalence of intelligent cameras
(including common cameras with inclusive AI computing power), where intelligent algorithms can be dynamically loaded, can
dramatically improve the frontend data collection quality, enhance the intelligent analysis efficiency by 10-fold and intelligent
application availability by several-fold, and lower the total cost of ownership (TCO) by over 50%.
[Figure: Backend intelligence versus frontend intelligence: intelligent analysis efficiency improved by 10-fold; intelligent application availability improved by several-fold; TCO reduced by over 50%]
In addition, frontend intelligence enables a camera to run multiple algorithms concurrently. For example, an intelligent camera can
simultaneously load multiple algorithms such as traffic violation detection, vehicle capture and recognition, and traffic flow statistics,
while multiple devices were required to support these functions in the past. This sharply lowers the engineering implementation
difficulty and improves the engineering efficiency.
2. Key Factors for Implementing Frontend Intelligence
In terms of product technologies, intelligent cameras must be equipped with AI main control chips and
intelligent operating systems to implement frontend intelligence.
The most basic functionality of a camera is to shoot HD video around the clock, and HD and sharp images are the most basic
requirements for computer vision. Computing power is required to optimize images to improve the intelligent recognition rate. In
scenarios where intelligent services require high real-time performance, ultimate computing power is required to meet real-time data
awareness, computing, and response requirements.
Computing power is the foundation of intelligent capabilities, while professional AI chips give a huge boost to computing power.
Accelerated by dedicated hardware, these AI chips support tera-scale computing and visual processing based on deep learning on a
neural network. To support frontend intelligence, cameras must be equipped with professional AI chips.
Customers require cameras with different hardware forms and software with different
capabilities depending on the usage scenario. Currently, most cameras are designed for
specific scenarios, but their software and hardware are closely coupled. If software can be
decoupled from hardware, users can install desired algorithms on cameras just like installing
apps on smartphones. This maximizes the value of hardware, saves overall costs, and improves
user experience. To decouple software from hardware, an open and intelligent operating
system is required. With the intelligent operating system, differences between bottom-layer
hardware are no longer obstacles. After the computing and orchestration capabilities of
bottom-layer hardware devices are invoked, they are uniformly encapsulated by the operating
system. This significantly simplifies development and allows developers to focus solely on the
software's functional capabilities. In addition, the lightweight container is used to construct an
integrated multi-algorithm framework, where each algorithm runs independently in a virtual
space, allowing independent loading and online upgrading. In summary, an intelligent camera
operating system is the basis of frontend intelligence.
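The property described above, where each algorithm runs on its own and can be loaded or upgraded without disturbing the others, can be sketched as follows. This is a simplified stand-in and an assumption throughout: plain Python objects take the place of lightweight containers, and the class names, method names, and algorithm names are hypothetical rather than part of any camera operating system.

```python
class AlgorithmSlot:
    """One isolated slot; stands in for a lightweight container."""

    def __init__(self, name, version, handler):
        self.name = name
        self.version = version
        self.handler = handler

    def run(self, frame):
        # A failure inside one algorithm must not affect the others.
        try:
            return self.handler(frame)
        except Exception as exc:
            return {"error": f"{self.name} failed: {exc}"}


class AlgorithmFramework:
    """Hypothetical multi-algorithm framework with independent slots."""

    def __init__(self):
        self._slots = {}

    def load(self, name, version, handler):
        # Loading under an existing name replaces only that slot (an upgrade).
        self._slots[name] = AlgorithmSlot(name, version, handler)

    def unload(self, name):
        self._slots.pop(name, None)

    def versions(self):
        return {name: slot.version for name, slot in self._slots.items()}

    def process(self, frame):
        return {name: slot.run(frame) for name, slot in self._slots.items()}


framework = AlgorithmFramework()
framework.load("vehicle_capture", "1.0", lambda f: {"vehicles": f["vehicles"]})
framework.load("traffic_flow", "1.0", lambda f: {"count": len(f["vehicles"])})
framework.load("broken", "0.1", lambda f: f["missing_key"])
# Upgrade one algorithm; the other slots are left untouched.
framework.load("vehicle_capture", "1.1", lambda f: {"vehicles": sorted(f["vehicles"])})

results = framework.process({"vehicles": ["B2", "A1"]})
print(framework.versions())
print(results)
```

In this sketch the failing algorithm only produces an error entry of its own, while the upgraded and unchanged algorithms keep returning results, which mirrors the service continuity the text attributes to the container-based framework.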
From the perspective of application ecosystems, frontend intelligence requires a future-proof algorithm and
hardware ecosystem to boost industry digital transformation.
In the mobile Internet sector, the app market provides an overwhelming number of apps. Users can download and install desired apps
on their smartphones. In the intelligent video sector, the burning question is: How can we aggregate excellent ecosystem partners to
provide superior algorithms and applications to meet customers' fragmented and long-tail requirements? To address this issue, the
intelligent algorithm platform was developed, which aggregates ecosystem partners in the intelligent vision sector to provide intelligent
video/image applications for a range of industries. The platform protects developers' rights and interests through license files and
verification mechanisms and also allows users to easily choose from a range of reliable intelligent algorithms. In addition, intelligent
cameras can connect to a range of hardware sensors in wired or wireless mode to help build a multi-dimensional awareness ecosystem.
With a rich ecosystem, a large number of long-tail algorithms dedicated to specific industries can be quickly released to meet the
requirements of various scenarios.
The industry has reached a consensus on frontend intelligence and related standards. Mainstream vendors and users in the industry are
actively embracing frontend intelligence. Vendors in the industry have launched products such as software-defined cameras and
scenario-specific intelligent cameras. The industry ecosystem is thriving.
Intelligent awareness can help collect multi-dimensional data, dramatically improve the data collection quality, and unleash the value of
mass video data while reducing computing power required for backend data processing and the overall TCO. In addition, distributed
processing significantly improves system reliability.