Photonic Matrix Computing: From Fundamentals to Applications

Page created by Andre Ellis
 
CONTINUE READING
Photonic Matrix Computing: From Fundamentals to Applications
nanomaterials

Review
Photonic Matrix Computing: From Fundamentals
to Applications
Junwei Cheng, Hailong Zhou and Jianji Dong *

 Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology,
 Wuhan 430074, China; jwcheng@hust.edu.cn (J.C.); hailongzhou@hust.edu.cn (H.Z.)
 * Correspondence: jjdong@hust.edu.cn

 Abstract: In emerging artificial intelligence applications, massive matrix operations require high
 computing speed and energy efficiency. Optical computing can realize high-speed parallel infor-
 mation processing with ultra-low energy consumption on photonic integrated platforms or in free
 space, which can well meet these domain-specific demands. In this review, we firstly introduce
 the principles of photonic matrix computing implemented by three mainstream schemes, and then
 review the research progress of optical neural networks (ONNs) based on photonic matrix computing.
 In addition, we discuss the advantages of optical computing architectures over electronic processors
 as well as current challenges of optical computing and highlight some promising prospects for the
 future development.

 Keywords: photonic matrix computing; photonic accelerators; artificial intelligence; optical neural
 networks; photonic integrated platform; diffractive planes

 
 

Citation: Cheng, J.; Zhou, H.; Dong, 1. Introduction
J. Photonic Matrix Computing: From With the proliferation of artificial intelligence and the next-generation communica-
Fundamentals to Applications. tion technology, the growing demand for high-performance computing has driven the
Nanomaterials 2021, 11, 1683. https://
 development of custom hardware to accelerate this specific category of computing. How-
doi.org/10.3390/nano11071683
 ever, processors based on electronic hardware have hit the bottleneck of unsustainable
 performance growth as the exponential scaling of electronic transistors reaches the physical
Academic Editors: Carlos Alberto
 limit revealed by Moore’s law [1]. Photonic processors compute with photons instead of
Alonso-Ramos and Linjie Zhou
 electrons, and therefore optical computing can dramatically accelerate computing speed by
 overcoming the inherent limitations of electronics. Unlike electrical circuit technologies,
Received: 29 May 2021
Accepted: 24 June 2021
 photonic circuits have some extraordinary properties such as ultra-wide bandwidth, high
Published: 26 June 2021
 frequency, and low energy consumption. Furthermore, light has several dimensions such
 as wavelength, polarization, and spatial mode to enable parallel data processing, result-
Publisher’s Note: MDPI stays neutral
 ing in remarkable acceleration against a conventional von Neumann computer, which
with regard to jurisdictional claims in
 makes the optical computing approach a viable and competitive candidate for artificial
published maps and institutional affil- intelligence accelerators.
iations. The matrix–vector multiplication (MVM) operation is one of the fundamental math-
 ematical operations widely used in large-scale neuromorphic optoelectronic computing.
 The weighted interconnections between adjacent photonic neurons in ONNs can be math-
 ematically represented by a matrix whose entries are the weight values, and each entry
Copyright: © 2021 by the authors.
 vector is multiplied by the input signal of a particular synapse [2], which matches the math-
Licensee MDPI, Basel, Switzerland.
 ematical nature of MVM. To some extent, neuromorphic engineering is an attempt to move
This article is an open access article
 computational processes of artificial intelligence algorithms to specific hardware, enabling
distributed under the terms and functions that are difficult to realize with conventional computing hardware. For example,
conditions of the Creative Commons linear optical elements can calculate convolutions, Fourier transforms, random projections,
Attribution (CC BY) license (https:// and many other operations through light–matter interaction or light propagation [3–6].
creativecommons.org/licenses/by/ There has been rapid progress in research on photonic matrix computing, and different
4.0/). photonic devices have been successfully used to implement matrix operations. Optical

Nanomaterials 2021, 11, 1683. https://doi.org/10.3390/nano11071683 https://www.mdpi.com/journal/nanomaterials
Photonic Matrix Computing: From Fundamentals to Applications
Nanomaterials 2021, 11, 1683 2 of 15
Nanomaterials 2021, 11, x FOR PEER REVIEW 2 of 15

 modulator
 There hasarrays, such progress
 been rapid as electro-optic modulated
 in research on photonicdirect-driven LED arraysand
 matrix computing, anddiffer-
 acousto-
 entoptic Bragg
 photonic devices,
 devices havecanbeen
 perform matrix used
 successfully calculations
 to implementat a much faster
 matrix rate than
 operations. existing
 Optical
 electronic devices [7–9]. The optical implementation of convolutional
 modulator arrays, such as electro-optic modulated direct-driven LED arrays and acousto- neural networks
 with
 optic fast operation
 Bragg devices, can speed and high
 perform energy
 matrix efficiencyat
 calculations is aappealing owing
 much faster ratetothan
 its outstanding
 existing
 capability of feature extraction [10]. In particular, convolutional
 electronic devices [7–9]. The optical implementation of convolutional neural networks processing based on MVM,
 which is a computationally intensive operation in electronics, occupies
 with fast operation speed and high energy efficiency is appealing owing to its outstanding over 80% of the total
 processing time in convolutional neural networks [11], therefore computational acceleration
 capability of feature extraction [10]. In particular, convolutional processing based on
 for convolutional neural networks can be achieved by matching hardware and MVM
 MVM, which is a computationally intensive operation in electronics, occupies over 80%
 operations. The MVM operation can be mathematically described as
 of the total processing time in convolutional neural networks [11], therefore computa-
 tional acceleration for convolutional 
 wneural networks . . . can be achieved
 
 x1 by matching hard-
 
 11 w12 w1N
 ware and MVM operations. TheMVM operation can be mathematically described as
 w21 w22 … . . . w 2N  x2 
  
 Y = WX =   ... (1)
 . . . … . . . . . .  . . . 
 = = …w …w … .…. . w…NN xN (1)
 N1 N2
 … 
 where X isXthe input vector, W is Wtheismatrix, and and = Y, =, …[y, , y , .is the output
 T vector.
 where is the input vector, the matrix, 1 2 . . , y N ] is the output
 Optical MVMs can be implemented by three mainstream
 vector. Optical MVMs can be implemented by three mainstream optical methods, optical methods, as shown in as
 Figure 1, including the multiple plane light conversion (MPLC)
 shown in Figure 1, including the multiple plane light conversion (MPLC) method, the method, the wavelength
 division multiplexing
 wavelength (WDM) method,
 division multiplexing (WDM) andmethod,
 the Mach–Zehnder interferometer
 and the Mach–Zehnder (MZI)
 interferometer
 method.
 (MZI) method.

 Figure
 Figure 1. Three
 1. Three categories
 categories of optical
 of optical MVM,
 MVM, including
 including (a)(a) MPLC-MVM
 MPLC-MVM method,
 method, (b)(b) WDM-MVM
 WDM-MVM
 method, and (c) MZI-MVM
 method, and (c) MZI-MVM method.method.

 In this
 In this review,
 review, we firstly
 we firstly survey
 survey recent
 recent researches
 researches and progress
 and progress in optical
 in optical MVMsMVMs and
 and photonic
 photonic artificial
 artificial intelligence
 intelligence hardware.
 hardware. Afterwe
 After that, that, we describe
 describe the basic
 the basic principles
 principles of
 of optical
 optical MPLC-MVM,
 MPLC-MVM, WDM-MVM,
 WDM-MVM, and MZI-MVM
 and MZI-MVM methods.
 methods. Finally, Finally, we discuss
 we discuss the ad-the
 advantages
 vantages of optical
 of optical computing
 computing architectures
 architectures overover electronic
 electronic processors
 processors as well
 as well as as current
 current
 challenges of optical computing, and provide perspectives for further improvement of of
 challenges of optical computing, and provide perspectives for further improvement
 optical
 optical computing
 computing architectures.
 architectures.

 2. MPLC
 2. MPLC Matrix
 Matrix Core
 Core
 Unlike
 Unlike integrated
 integrated schemes
 schemes suchsuch
 as as microring
 microring andand MZI
 MZI matrix
 matrix cores,
 cores, thethe MPLC
 MPLC matrix
 matrix
 core builds computing capabilities directly above an optical field propagating
 core builds computing capabilities directly above an optical field propagating in free in free space.
 Among these three methods, MPLC was the first to be implemented in optical comput-
 space. Among these three methods, MPLC was the first to be implemented in optical com-
 ing [12], and the initially programmable MVM was finished with spatial optical elements [9].
 puting [12], and the initially programmable MVM was finished with spatial optical ele-
 The MPLC matrix core is the only one that can currently support super-large-scale matrix
 ments [9]. The MPLC matrix core is the only one that can currently support super-large-
 operation, which makes it valuable in pulse shaping [13], mode processing [14–16], and
 scale matrix operation, which makes it valuable in pulse shaping [13], mode processing
 machine learning [5,17–19].
 [14–16], and machine learning [5,17–19].
 The principle of the MPLC method is shown in Figure 2, and the architecture of the
 MPLC matrix core consists of a series of planes encoded with amplitude and phase infor-
Photonic Matrix Computing: From Fundamentals to Applications
Nanomaterials 2021, 11, x FOR PEER REVIEW 3 of 15

Nanomaterials 2021, 11, 1683 3 of 15

 The principle of the MPLC method is shown in Figure 2, and the architecture of the
 MPLC matrix core consists of a series of planes encoded with amplitude and phase infor-
 mation,and
 mation, andreflective
 reflectivemirrors
 mirrorsare
 are only
 only used
 used to to change
 change thethe direction
 direction of light
 of light propagation
 propagation to
 to reduce the spatial volume of the system. By configuring the parameters
 reduce the spatial volume of the system. By configuring the parameters of these planes, of these planes,
 thelight
 the lightbeam
 beamirradiated
 irradiatedon onthe
 theplane
 planesurface
 surfacecan
 canbe
 bemodulated
 modulatedtotochange
 changeamplitude
 amplitudeandand
 phase.The
 phase. Theincident
 incidentlight
 lightdiffracts
 diffractsin
 infree
 freespace
 spaceand
 andthen
 thenincident
 incidenttotothe
 thefirst
 firstplane,
 plane,after
 after
 that,ititdiffracts
 that, diffractsininfree
 freespace
 spaceand
 andthen
 thenincident
 incidentinto
 intothe
 thesecond
 secondplane,
 plane,and
 andsosoon.
 on.After
 After
 passing through all planes, the final optical field will be output and then be
 passing through all planes, the final optical field will be output and then be detected by thedetected by
 the photodetector
 photodetector array. array.

 Figure2.2.The
 Figure Theprinciple
 principleofofthe
 theMPLC
 MPLCmatrix
 matrixcore.
 core.The
 TheMPLC
 MPLCmatrix
 matrixcore
 coreisiscomposed
 composedofofaaseries
 seriesofof
 diffractive
 diffractiveplanes
 planesencoded
 encodedwith amplitude
 with amplitudeand phase
 and information.
 phase By By
 information. configuring the the
 configuring parameters
 parametersof
 of these
 these diffractive
 diffractive planes,
 planes, the beam
 the light light beam irradiated
 irradiated on theon thesurface
 plane plane surface can be modulated
 can be modulated to changeto
 change amplitude
 amplitude and phase.
 and phase.

 Thetransmission
 The transmissionmatrices
 matricesMM i (i = 1, 2, …, N, N
 i (i = 1, 2, . . . ,
 N ++ 1) are fixed
 fixed owing
 owingtotothe
 thepropagation
 propagation
 distance in free space being fixed. The transmission matrix of each plane is aadiagonal
 distance in free space being fixed. The transmission matrix of each plane is diagonal
 matrix, denoted as A (i = 1,
 matrix, denoted as Ai (i = 1, 2, . . . ,
 i 2, …, N). Therefore, the transmission matrix T of
 Therefore, the transmission matrix T of this MPLC this MPLC
 systemcan
 system canbebeexpressed
 expressedas as
 T = MN+1AN...M2A1M1. (2)
 The arrangement of pixelsTin= each MN+1plane
 AN . . relates
 . M2 A1to M1the
 . matrix dimensions. For exam- (2)
 ple, if we assume that the pixels of each plane are p × q, then the dimensions of the matrices
 M i, A The arrangement of pixels in each plane relates to the matrix dimensions. For example,
 i, and T are all pq × pq. From Equation (2), the multiplane transformation can be math-
 ifematically
 we assume that the pixels
 expressed as the of eachproduct
 cross plane are of pa × q, then
 series the dimensions
 of fixed matrices andof the matrices
 configurable
 M i , Ai , and
 diagonal matrices. pq × pq. From
 T are allTheoretically, Equation
 arbitrary (2), the
 matrix multiplane
 operation can betransformation
 implemented can be
 as long
 mathematically expressed as the cross product of a series of fixed matrices and configurable
 as there are enough planes, and the number of planes required is approximately equal to
 diagonal matrices. Theoretically, arbitrary matrix operation can be implemented as long as
 2-fold the number of input modes, but in fact, only a few planes are needed to approxi-
 there are enough planes, and the number of planes required is approximately equal to 2-
 mately achieve the function of target transmission matrix [20].
 fold the number of input modes, but in fact, only a few planes are needed to approximately
 In practice, the transmission matrix T in Equation (2) is difficult to analyze or measure
 achieve the function of target transmission matrix [20].
 experimentally. In general, parameters of each phase plane can be obtained by iterating
 In practice, the transmission matrix T in Equation (2) is difficult to analyze or measure
 according to the input and target output. One commonly used method is the wavefront
 experimentally. In general, parameters of each phase plane can be obtained by iterating
 matching method [21], as shown in Figure 3, where the input optical field is ( , ), and
 according to the input and target output. One commonly used method is the wavefront
 after forward propagation, the distribution of the optical field in front of the mth phase
 matching method [21], as shown in Figure 3, where the input optical field is φ0 ( x, y), and
 plane is ( , ). On the other hand, ( , ), the distribution of the optical field after
 after forward propagation, the distribution of the optical field in front of the mth phase
 the mth phase plane, can be obtained by the backward propagation of the output optical
 plane is φm ( x, y). On the other hand, Ψm ( x, y), the distribution of the optical field after the
 fieldphase
 mth plane,
 ( , ). can
 Hence the theoretical
 be obtained by the phase
 backward distribution
 propagation of the
 of mth phase plane
 the output opticalshould
 field
 be the phase difference between the two fields, i.e., ( , ) and ( , ):
 Ψ N +1 ( x, y). Hence the theoretical phase distribution of∗ the mth phase plane should be the
 ( , ) = ( , ) ( , ) (3)
 phase difference between the two fields, i.e., φm ( x, y) and Ψm ( x, y) :
 where means the function to get the phase angle. If there are multiple mode pairs of
 ∗
 input and output, then the phase Φm ( x,value
 y) = Λ of[Ψthe
 m (plane
 x, y)φmshould
 ( x, y)]be its weighted phase value: (3)

 where Λ means the function
 ( ,
 to )
 get=the
 phase ( , ) If∗ ,there
 , angle. ( , )are multiple mode pairs(4)
 of
 input and output, then the phase value of the plane should be its weighted phase value:
 " #
 Φm ( x, y) = Λ ∑ Ψm,j (x, y)φm,j
 ∗
 ( x, y) (4)
 j
Photonic Matrix Computing: From Fundamentals to Applications
Nanomaterials 2021,11,
 Nanomaterials2021,
 2021, 11,x1683
 11, xFOR
 FORPEER
 PEERREVIEW
 REVIEW 44ofof1515

 wherej jjrepresents
 where
 where represents
 representsthethe j th set
 the setof
 set ofofinput–output
 input–outputmode
 input–output modepairs.
 mode pairs.By
 pairs. Bythe
 By themeans
 the meansof
 means ofiterating
 of iteratingback
 iterating back
 back
 and
 and forth
 forth to
 to update
 update the
 the parameters
 parameters ofof phase
 phase planes
 planes repeatedly,
 repeatedly, the
 the algorithm
 algorithm
 and forth to update the parameters of phase planes repeatedly, the algorithm will eventu- will
 will even-
 even-
 tually
 tually converge.
 converge.
 ally converge. Thekey
 The
 The keyadvantage
 key advantageof
 advantage ofofthe
 thewavefront
 the wavefrontmatching
 wavefront matchingmethod
 matching methodisisisthat
 method thatthe
 that theiterative
 the iterative
 iterative
 speed
 speed is very
 speedisisvery fast,
 veryfast, and
 fast,and
 andthethe whole
 thewhole plane
 wholeplane is updated
 planeisisupdated each
 updatedeach iterative
 eachiterative process,
 iterativeprocess,
 process,thusthus the
 thusthe conver-
 theconver-
 conver-
 gence
 genceisis
 gence exponential.Generally,
 isexponential.
 exponential. Generally,
 Generally,itit only
 itonly needs
 onlyneeds fewerthan
 needsfewer
 fewer than20
 than 20iterations
 20 iterationsback
 iterations backand
 back andforth
 and forthtoto
 forth to
 achieve
 achieve convergence.
 convergence.
 achieve convergence.

 Figure
 Figure3.3.
 Figure Schematic
 3.Schematic diagram
 diagramofof
 Schematicdiagram the
 ofthe wavefront
 thewavefront matchingmethod.
 wavefrontmatching
 matching method.
 method.

 Here,ininorder
 Here, ordertotovisually
 visuallydemonstrate
 demonstratethe theworking
 workingmechanism
 mechanismofofthe theMPLC
 MPLCmatrix
 matrix
 coreand
 core andverify
 verifyits
 itscapability
 capabilitytotoimplement
 implementlarge-scale
 large-scaleoptical
 opticalcomputing,
 computing,we wetheoretically
 theoretically
 demonstrate
 demonstrate aaaspecific
 demonstrate specificexample
 specific example
 example ofnumeric
 ofof numeric
 numeric holographic
 holographic
 holographic coding
 coding
 coding based
 basedbased on
 on theon theMPLC
 the
 MPLC MPLC
 scheme
 scheme
 scheme (Figure
 (Figure(Figure 4). We designed
 4). We designed
 4). We designed a two-digit
 a two-digit
 a two-digit seven-segment
 seven-segment
 seven-segment display
 displaydisplay
 with 14with with 14 segments
 14 segments
 segments in total,inin
 re-
 total,
 total, respectively
 spectively powered
 respectively powered by14
 by 14 different
 powered by 14different
 different Laguerre-Gaussian
 Laguerre-Gaussian (LG)The
 (LG) modes.
 Laguerre-Gaussian (LG) modes. Thetwo-digit
 two-digit
 modes. The two-digit
 numbers
 numbers
 range from
 numbers range
 range tofrom
 00 from 99 0000toto
 can be9999 canbe
 bedisplayed
 displayed
 can displayed andswitched
 and switched
 and switched
 by setting
 bybythe
 setting thecombination
 combination
 combination
 setting the of input
 ofof inputmodes,
 modes,
 input asmodes,
 shown asasinshown
 Figure
 shown inin4b.
 Figure4b.
 Figure 4b.

 Figure4.4.
 Figure
 Figure 4.The
 The numeric
 Thenumeric holographic
 numericholographic coding
 holographiccoding basedon
 codingbased
 based onthe
 on the MPLC
 theMPLC matrix
 MPLCmatrix core.
 matrixcore. (a)
 core.(a) This
 (a)This figure
 Thisfigure summarizes
 figuresummarizes
 summarizesthethe input-to-output
 theinput-to-output
 input-to-output
 mapping
 mapping relationships between the input LG modes and the output segments of numeric display.
 mapping relationships between the input LG modes and the output segments of numeric display. (b) of
 relationships between the input LG modes and the output segments of numeric display. (b)
 (b) Results
 Results ofthe
 theoptical-
 Results optical-
 of the
 powerednumeric
 powered numericdisplay
 displayfrom
 from0000toto99.
 99.
 optical-powered numeric display from 00 to 99.

 How
 How
 Howto toefficiently
 to efficientlydeal
 efficiently dealwith
 deal with
 with the
 the
 the increasing
 increasing
 increasing scale
 scale
 scale ofneural
 of of neural
 neural network
 network
 network computing
 computing
 computing re-
 re-
 remains
 mains
 mains aa significant
 significant problem
 problem toto be
 be solved.
 solved. Benefiting
 Benefiting from
 from the
 the parallelism
 parallelism
 a significant problem to be solved. Benefiting from the parallelism and minimal latency of and
 and minimal
 minimal
 latency
 latency
 opticalofofoptical
 optical
 systems systems
 insystems ininfree
 free space, free space,the
 thespace,
 MPLC theMPLC
 schemeMPLC scheme
 scheme
 has hasthe
 has
 the ability theimplement
 to abilitytotoimplement
 ability implement
 large-scale
 large-scale
 large-scale matrix
 matrix operation,
 operation, which
 which makes
 makes itit the
 the potential
 potential candidate
 candidate for
 for
 matrix operation, which makes it the potential candidate for large-scale neural networks. large-scale
 large-scale neural
 neural
 networks.
 networks. In
 In 2018,
 2018, Lin
 Lin etet al.
 al. introduced
 introduced the
 the diffraction
 diffraction ONNs
 ONNs framework
 framework
 In 2018, Lin et al. introduced the diffraction ONNs framework used for all-optical ma- used
 used for
 for all-op-
 all-op-
 tical
 tical
 chinemachine
 machine
 learning, learning,
 learning,
 i.e., D2 NN,i.e.,D
 i.e., D2NN,
 and
 2NN, and experimentally demonstrated the image classifica-
 and experimentally
 experimentally demonstrated
 demonstrated the imagethe classification
 image classifica-with
 tion
 tion with
 with Modified
 Modified National
 National Institute
 Institute ofof Standards
 Standards and
 and Technology
 Technology
 Modified National Institute of Standards and Technology (MNIST) handwritten (MNIST)
 (MNIST) handwritten
 handwritten
 digits
Photonic Matrix Computing: From Fundamentals to Applications
Nanomaterials2021,
Nanomaterials 2021,11,
 11,1683
 x FOR PEER REVIEW 5 5ofof1515

 digits
 and and Fashion-MNIST
 Fashion-MNIST datasetsdatasets
 [5]. The[5]. The optical
 optical D2 NN D
 2NN architecture shows great poten-
 architecture shows great potential for
 tial for machine
 machine learninglearning applications,
 applications, and it would and itbe would
 more be more complete
 complete if it included
 if it included an optical an
 optical nonlinear activation function [22]. The image information
 nonlinear activation function [22]. The image information is encoded in the amplitude is encoded in the ampli-
 tude
 or phaseor phase
 channels channels
 of theofinput
 the input
 optical optical
 fields, fields, and wave
 and wave propagation
 propagation in free
 in free space space
 can can
 be
 be mathematically
 mathematically described
 described by Kirchhoff’s
 by Kirchhoff’s diffraction
 diffraction integral,
 integral, which which
 amountsamounts to a con-
 to a convolu-
 volution
 tion operation
 operation of theofoptical
 the optical
 field field
 with with a trained
 a trained kernel. kernel.
 In this In work,
 this work, the training
 the training pro-
 process
 cess of ONNs is still completed by an electronic computer
 of ONNs is still completed by an electronic computer to update parameters, and each to update parameters, and each
 diffractivelayer
 diffractive layerisisfabricated
 fabricated by by 3D3D printing
 printing technology.
 technology. In 2021, Rahman et et al.
 al. applied
 applieda
 apruning
 pruningalgorithm
 algorithmtotofurther furtherimprove
 improvethe theimage
 image classification
 classification accuracy
 accuracy of of
 DD 2 2
 NNs NNs[23]. On
 [23].
 thethe
 On image
 image classification
 classification of the
 of theCIFAR-10
 CIFAR-10 dataset released
 dataset released by Canadian
 by Canadian Institute For For
 Institute Ad-
 vanced Research,
 Advanced Research, whose
 whosetest testimages
 imagesare aremore
 more complicated
 complicated than MNIST MNIST and and Fashion-
 Fashion-
 MNIST,the
 MNIST, theDD 2 NN
 2NN architecture combined with the pruning algorithm provides an infer-
 architecture combined with the pruning algorithm provides an inference
 ence improvement
 improvement of more of more
 than 16%thancompared
 16% compared to thetoaverage
 the average performance.
 performance. In neuromor-
 In neuromorphic
 phic optoelectronic
 optoelectronic computing,computing, the functionality
 the functionality of input ofnodes
 input andnodes and neurons
 output output neurons
 is imple-is
 mented with programmable
 implemented with programmable diffractive opticaloptical
 diffractive devices, such assuch
 devices, the digital micromirror
 as the digital micro-
 device
 mirror(DMD)
 device and (DMD) spatial
 andlight
 spatialmodulator (SLM). The
 light modulator (SLM).DMD Theprovides a high optical
 DMD provides a high con-opti-
 trast for encoding
 cal contrast information.
 for encoding It allowsIt encoding
 information. allows encodingthe binarized data into
 the binarized theinto
 data amplitude
 the am-
 of coherent
 plitude optical fields,
 of coherent opticalwhere
 fields, awherephasea SLMphasesubsequently
 SLM subsequently modulates their phase
 modulates dis-
 their phase
 tribution to realize the diffractive modulation. Zhou et al.
 distribution to realize the diffractive modulation. Zhou et al. proposed an optoelectronic proposed an optoelectronic
 fused
 fusedcomputing
 computingarchitecture
 architecturewith withaareconfigurable
 reconfigurablediffractive
 diffractiveprocessing
 processingunit, unit,which
 whichcan can
 support
 supportdifferent
 different neural networks,
 networks,and andachieved
 achievedexcellent
 excellentexperimental
 experimental accuracies
 accuracies forfor
 im-
 image
 age and andvideo
 videorecognition
 recognitionover overbenchmark
 benchmarkdata data sets
 sets [19].
 [19]. ItIt can
 can bebe seen
 seen that
 thatthe theMPLC
 MPLC
 scheme
 schemeisisofofgreat greatpotential
 potentialtotonarrownarrowthe thegapgapandandsurpass
 surpassininclassification
 classificationperformance
 performance
 between optical computing architecture and state-of-the-art
 between optical computing architecture and state-of-the-art electronic computers. electronic computers.
 Similar
 Similartotothe theMPLC
 MPLCmethodmethoddemonstrated
 demonstratedininfree freespace,
 space,the theMPLC
 MPLCmatrixmatrixcore corecan
 can
 also
 also be realized on a chip, whose structure is shown in Figure 5. The integratedMPLC
 be realized on a chip, whose structure is shown in Figure 5. The integrated MPLC
 matrix
 matrixcorecoreisiscomposed
 composedofofmultiple multiplelayers,
 layers,including
 includingalternately
 alternatelyarranged
 arrangedtunable
 tunablelayers
 layers
 and unitary diffractive layers, to implement multichannel
 and unitary diffractive layers, to implement multichannel transformation described by transformation described bya
 aunitary
 unitarymatrix.
 matrix.The The tunable layer can be achieved by independent
 tunable layer can be achieved by independent phase shifters or time phase shifters or
 time delay units, corresponding a unitary transfer matrix with
 delay units, corresponding a unitary transfer matrix with diagonal form. The unitary dif- diagonal form. The unitary
 diffractive layers
 fractive layers describe
 describe thethe interaction
 interaction between
 between channels,
 channels, which
 which cancanbebe implemented
 implemented by
 by multimode interference (MMI) couplers or a region of
 multimode interference (MMI) couplers or a region of coupled waveguides; therefore, coupled waveguides; therefore,
 the
 the transfer matrices of unitary diffractive layers are static because the structure of MMI
 transfer matrices of unitary diffractive layers are static because the structure of MMI cou-
 couplers and coupled waveguides cannot be changed once fabricated.
 plers and coupled waveguides cannot be changed once fabricated.

 Figure 5. The structure of the integrated MPLC matrix core.
 Figure 5. The structure of the integrated MPLC matrix core.
Photonic Matrix Computing: From Fundamentals to Applications
Nanomaterials 2021, 11, 1683 6 of 15

 In 2017, Tang et al. proposed a novel integrated architecture consisting of cascaded
 MMI couplers and phase shifter arrays [24] to realize an n × n unitary transfer matrix. The
 n × n unitary transfer matrix can be decomposed as M stages of unitary matrices with only
 diagonal elements (i.e., the transfer matrices of phase shifter arrays) and another M − 1
 stage of unitary matrices (i.e., the transfer matrices of MMI) when the path-dependent
 coupling loss is not considered. After that, Saygin et al. made an in-depth analysis of this
 integrated MPLC architecture, in which the flexibility and robustness of this architecture
 have been thoroughly discussed. The transfer matrices of the static MMI blocks can be
 randomly chosen from a continuous class of unitary matrices without sacrificing the quality
 of approximation for the target unitary transformation, making this scheme insensitive
 to errors [25]. The integrated MPLC matrix core provides an alternative viable solution
 to decompose large unitary matrices into small ones with high flexibility and robustness,
 and thus optical diffractive neural networks are possible to build on a chip based on
 this method.

 3. Microring Matrix Core
 The MPLC scheme of photonic matrix computing was thoroughly discussed in
 Section 2, from which we find that the predominant advantage of MPLC schemes is the
 capability to implement a large-scale matrix operation at present. However, one major
 drawback cannot be ignored—the MPLC scheme is usually limited by bulky optical instru-
 ments, and hence they are difficult to highly integrate on a chip.
 The microring has a very compact structure and its radius can be as small as a few
 microns [26], which means the footprint of photonic devices can be greatly reduced, and
 thus the integration density can be competitive. The microring has been widely used in on-
 chip WDM systems [27–29], filtering systems [30–32], etc. In addition, a microring array can
 also be used in the operations of incoherent matrix computation since each microring can
 independently configure the transmission coefficient of a wavelength channel. Therefore,
 the microring matrix core is well suited to implement WDM-MVM operation.
 The implementations of matrix computation enabled by a microring array are shown
 in Figure 6. The N × N-sized microring array in the right region of Figure 6 corresponds to
 an N × N-sized matrix M, alternatively called a microring matrix core, and the input signal
 X can be represented by a vector with a length of N. The input signal X can be generated
 off-chip or on-chip. If it is generated off-chip, the microring array in the left region of
 Figure 6, i.e., the microring front module, only serves as a wavelength division multiplexer
 for combining multiple input wavelength channels. If it is generated on-chip, the microring
 front module is used to modulate the input vector X to different wavelengths. After that,
 the input signal is divided into multiple beams with equal power after passing through
 the beam splitter, and each beam is sent to a different row of the microring matrix core
 in the right region of Figure 6. Each row of the microring matrix core can independently
 configure the transmission coefficient of each wavelength channel, and the total power is
 finally detected from each output port by photodetectors.
 The add-drop microring structure [33] is widely applied in on-chip optical computing
 owing to the capability of difference processing. Since the power value is non-negative,
 early work only utilized the through port, then the transmission matrix and the output
 vector are non-negative, thus the matrix operation is limited in the non-negative number
 domain. However, fundamental mathematical operations such as matrix–vector multipli-
 cation and matrix–matrix multiplication are usually performed in the real number domain
 in practice. In order to extend the matrix operation to the full real number domain, the
 final results need to be obtained via the differential processing between the power values
 of the drop port and the through port; in this way, the transmission matrix and final output
 vector are both able to contain negative domain.
 The drop transmission coefficient of the microring situated at the ith row and the jth
 column of the microring matrix core is represented by mij , and the through transmission
 coefficient is therefore represented by 1 − mij without considering the loss of the microring.
Photonic Matrix Computing: From Fundamentals to Applications
n × n unitary transfer matrix can be decomposed as M stages of unitary matrices with only
 diagonal elements (i.e., the transfer matrices of phase shifter arrays) and another M − 1
 stage of unitary matrices (i.e., the transfer matrices of MMI) when the path-dependent
 coupling loss is not considered. After that, Saygin et al. made an in-depth analysis of this
 integrated MPLC architecture, in which the flexibility and robustness of this architecture
Nanomaterials 2021, 11, 1683 7 of 15
 have been thoroughly discussed. The transfer matrices of the static MMI blocks can be
 randomly chosen from a continuous class of unitary matrices without sacrificing the qual-
 ity of approximation for the target unitary transformation, making this scheme insensitive
 to errors
 Each [25]. The
 microring integrated
 situated at theMPLC matrix core
 same column merely provides an alternative
 modulates the opticalviable solution
 signal with ato
 decompose
 specific large unitary
 wavelength, hence the matrices
 outputinto of eachsmall row ones with high
 contains flexibility
 the power of N and robustness,
 different wave-
 and thus
 lengths, and optical diffractive neural
 N is numerically equal tonetworks
 the number are of possible
 microrings to build on a chip
 positioned based
 in the same onrow.
 this
 method.
 Based on the microring matrix core model consisting of add-drop microring structure, we
 assume that the input vector is X = [ x1 , x2 , . . . , x N ] T , thus the output power of drop ports
 3. Microring
 and Matrix
 through ports Corerow can be calculated by Equations (5) and (6), respectively.
 of each
 The MPLC scheme of photonic matrix computing was thoroughly discussed in Sec-
 N
 tion 2, from which we find that the predominant yi = ∑ mij x j advantage of MPLC schemes is the(5) ca-
 pability to implement a large-scale matrix j=1 operation at present. However, one major
 drawback cannot be ignored—the MPLC scheme is usually limited by bulky optical in-
 struments, and hence they are difficult to N highly integrate on a chip.

 The microring has a very compact yi = ∑ (1 − mij )and
 structure x j its radius can be as small as a (6) few
 j =1
 microns [26], which means the footprint of photonic devices can be greatly reduced, and
 thus
 the the integration
 output vector of dropdensity
 portscan Y1 be= [competitive.
 y1 , y2 , . . . , y NThe ] T can microring has been widely
 be mathematically usedasin
 described
 on-chip WDM systems [27–29],  filtering systems [30–32],  etc. In addition,
  a microring ar-
 ray can also be used in the operations m11 m12 of incoherent . . . m1N matrix computation x1 since each mi-
 croring can independently m21 m22
  configure . . . m2N coefficient
 the transmission  x2 ofa wavelength channel.
 Y1 =    (7)
 Therefore, the microringmatrix . . . core. .is. well .suited
 .. . .to. implement
  . . . WDM-MVM
  operation.
 The implementations ofmmatrix N1 mcomputation
 N2 . . . menabledNN by ax N
 microring array are shown
 in Figure 6. The N × N-sized microring array in the right region of Figure 6 corresponds
 Similarly, the output vector of through ports Y2 can be written as
 to an N × N-sized matrix M, alternatively called a microring matrix core, and the input
 signal X can be represented
 
 1 − m11by a1 − vector
 m12 with. .a. length 1 − mof N. The
  inputsignal X can be
 x1
 1N
 generated off-chipor on-chip.1 − m If1it−ismgenerated . . . off-chip,
 1 − m the microring
  x2 array in the left
 21 22 2N
 region of FigureY2 =6, 
 i.e., the. . microring (8)
 . . front module, only
 . . . serves
 as a wavelength
  division
 . . ...  . . . 
 multiplexer for combining 1−m multiple input wavelength . . . 1 − channels. If itxis generated on-chip,
 N1 1 − m N2 m NN N
 the microring front module is used to modulate the input vector X to different wave-
 lengths.
 then Afteroutput
 the final that, the inputY signal
 matrix can beiscalculated
 divided into by the multiple beamsprocessing
 differential with equalaspower after
 passing through the beam splitter, and each beam is sent to a different  row  of the mi-
 1 − 2m − 2m 1 − 2m
 
 croring matrix core in the right 11 1 of
 region Figure
 12 . . . row
 6. Each of 1N
 the microring x1 matrix core
 can independently  1 −the 21 1 − 2m22 coefficient
 2mtransmission . . . 1of−each2m2N  x2 channel,
 Y = Y2 − Y1 configure wavelength and
 
 =   (9)
 the total power is finally detected
  . . . from each . . . output. .port . by .photodetectors.
 ..   ... 
 1 − 2m N1 1 − 2m N2 . . . 1 − 2m NN xN

 Figure 6. The scheme of WDM-MVM. The microring front module can be positioned off-chip or
 on-chip, which is designed to serve as a wavelength division multiplexer (off-chip) or to modulate
 the input vector X to different wavelengths (on-chip). The microring matrix core is an N × N-sized
 microring array corresponding to an N × N-sized matrix.

 The microring matrix core was used for on-chip photonic matrix operation early in
 2013, when Yang et al. proposed a matrix–vector multiplier based on a microring array
 and experimentally demonstrated matrix multiplication and weighted interconnection [34].
 A “broadcast-and-weight” scheme has been proposed [35,36] and demonstrated [37] to
 implement large-scale reconfigurable optical interconnections and the integrated optical
Photonic Matrix Computing: From Fundamentals to Applications
Nanomaterials 2021, 11, 1683 8 of 15

 neural
 Nanomaterials 2021, 11, x FOR PEER REVIEW network enabled by microring resonators on a silicon photonic chip utilizing 8 of 15 WDM
 technique. In this scheme, each microring plays a role as a tunable filter and only oper-
 ates at a specific wavelength. Optical signals are modulated in parallel by an array of
 microrings
 to implement[38]. The WDM-MVM
 large-scale reconfigurable scheme
 optical provides a viable
 interconnections solution
 and to achieve
 the integrated opti-orders of
 cal neural network
 magnitude enabled byinmicroring
 improvements resonators onspeed
 both computational a silicon
 andphotonic
 energy chip utilizing against
 consumption
 WDM technique.
 existing In this based
 architectures scheme,on each microring
 electronic plays aIn
 devices. role as a tunable
 addition, the filter and only
 microring matrix core
 operates
 can also be used in the linear computation part of the neural networksarray
 at a specific wavelength. Optical signals are modulated in parallel by an of
 to achieve the
 microrings [38]. The WDM-MVM scheme provides a viable solution to achieve
 optical acceleration of the ONNs. In 2019, Feldmann et al. presented an all-optical spiking orders of
 magnitude improvements in both computational speed and energy consumption against
 neurosynaptic network successfully realizing pattern recognition directly in the optical
 existing architectures based on electronic devices. In addition, the microring matrix core
 domain, in which a microring integrated with phase-change material (PCM) cell is able
 can also be used in the linear computation part of the neural networks to achieve the op-
 to control
 tical whether
 acceleration of thetoONNs.
 generate an output
 In 2019, Feldmann spike
 et al.pulse by switching
 presented an all-optical thespiking
 PCM states to
 change the optical resonance condition of the microring and its
 neurosynaptic network successfully realizing pattern recognition directly in the opticalpropagation loss [39]. A
 scalableinoptical
 domain, which aneural network
 microring architecture
 integrated is implemented
 with phase-change material using
 (PCM) thecell WDM
 is able to technique.
 Feldmann
 control et al.
 whether improved
 to generate their architecture
 an output spike pulse byand demonstrated
 switching the PCM an integrated
 states to changephotonic
 hardware
 the accelerator
 optical resonance utilizingofoptical
 condition frequency
 the microring and combs and the loss
 its propagation WDM technique
 [39]. A scalableto achieve
 optical
 parallelneural network architecture
 convolutional processingis[40].
 implemented
 Recently,using
 Xu ettheal.WDM technique.
 proposed Feldmann
 an optical convolutional
 et al. improved
 neural network their architecturebased
 architecture and demonstrated
 on WDM toanaccelerate
 integratedcomputing
 photonic hardware
 speed by ac- utilizing
 celerator utilizing
 broad optical optical frequency
 bandwidth [41], whichcombscanand
 bethe
 usedWDM technique
 for various to achieve parallel
 convolutional operations to
 convolutional processing
 realize handwritten [40].recognition.
 digits Recently, Xu et al. proposed an optical convolutional neural
 network architecture based on WDM to accelerate computing speed by utilizing broad
 optical
 4. MZIbandwidth [41], which can be used for various convolutional operations to realize
 Matrix Core
 handwritten digits recognition.
 As one of the basic photonic devices, MZI has been widely used in optical modula-
 4. MZI Matrixoptical
 tors [42,43], Core communication [44,45], and optical computing [46]. MZI is a natural
 minimum matrix operation
 As one of the basic photonicunit, and MZI
 devices, can has
 be fabricated
 been widelyon a silicon
 used platform
 in optical to implement
 modulators
 the minimum matrix multiplication. The photonic matrix network
 [42,43], optical communication [44,45], and optical computing [46]. MZI is a natural min- built by MZIs can be
 imum matrix operation unit, and can be fabricated on a silicon platform to implement theReck et al.
 extended to arbitrary matrix multiplication without fundamental loss. In 1994,
 firstly proposed
 minimum a general algorithm,
 matrix multiplication. the triangular
 The photonic decomposition
 matrix network built by algorithm,
 MZIs can befor ex-the design
 of an experimental
 tended realization
 to arbitrary matrix multiplication × n unitary
 of any nwithout matrix [47].
 fundamental In 1994,
 loss. In this case,
 Reckunitary
 et al. matrix
 firstly proposed
 transforms canabegeneral algorithm,
 achieved by thethe triangular decomposition
 architecture consisting of beamalgorithm, for the
 splitters, de- shifters,
 phase
 sign
 and of an experimental
 mirrors arranged realization
 according of toany n × n rules.
 specific unitary matrix [47]. In this case, unitary
 matrixThetransforms
 structurecanofbeMZI
 achieved by the in
 is shown architecture
 Figure 7a. consisting
 MZI is of beam splitters,
 composed phase
 of two multimode
 shifters, and mirrors arranged according to specific rules.
 interference couplers and two interference arms [48]. The phase shifts on the internal and
 The structure of MZI is shown in Figure 7a. MZI is composed of two multimode in-
 external phase shifters of MZI can be expressed as θ n , αn , and βn , respectively, moreover,
 terference couplers and two interference arms [48]. The phase shifts on the internal and
 these phase shifters can be easily configured and controlled by thermally tuning the heaters.
 external phase shifters of MZI can be expressed as θn, αn, and βn, respectively, moreover,
 these × 2 unitary
 The 2phase transformation
 shifters can matrixand
 be easily configured of controlled
 a single MZI can be expressed
 by thermally as a standard
 tuning the heat-
 SU(2) rotation matrix:
 ers. The 2 × 2 unitary transformation matrix of a single MZI can be expressed as a standard
 SU(2) rotation matrix:  iα iθ  iαn eiθn + 1
  
 1 − n e n −1
 e 1) ie
 1 
 ==R(n) = ( ( 1)
 U MZIn= ( ) 2 ieiβ1) n eiθn + 1 iβ n iθn (10) (10)
  
 2 ( (1 − e ) 1 − e

 Figure 7. The principle of the MZI matrix core. (a) The structure of one MZI used in an MZI matrix
 core. (b) The whole MZI mesh consists of MZI matrix cores, and MZIs can be equivalent to a
 reconfigurable black box. (c) A typical example of a 4 × 4 network structure enabled by an MZI
 matrix core.
Photonic Matrix Computing: From Fundamentals to Applications
Nanomaterials 2021, 11,
Nanomaterials 2021, 11, 1683
 x FOR PEER REVIEW 99 of
 of 15
 15

 Figure 7. The principle
 Arbitrary of the MZI
 n × n unitary matrix core. (a)
 transformation The structure
 matrix SU(N) can of one MZI used in an
 be theoretically MZI matrix
 decomposed
 core. (b) The whole MZI mesh consists of MZI matrix cores, and MZIs can be
 into the product of a series of SU(2) rotation submatrices, and therefore the wholeequivalent to a recon-
 MZI
 figurable black box. (c) A typical example of a 4 × 4 network structure enabled
 mesh can be equivalent to a reconfigurable black box as shown in Figure 7b to perform by an MZI matrix
 core.
 any unitary transformation we desired. A typical example of a 4 × 4 network structure
 enabled by an MZI matrix core is given in Figure 7c, which is composed of six MZIs, and
 Arbitrary n × n unitary transformation matrix SU(N) can be theoretically decomposed
 the matrix transformation relations of each port can be obtained by cutting the plane along
 into the product of a series of SU(2) rotation submatrices, and therefore the whole MZI
 the dashed lines. The detailed process can be mathematically described as
 mesh can be equivalent to a reconfigurable black box as shown in Figure 7b to perform
 any unitary transformation
 
 1 we desired.
  A typical example of a 4 × 4 network structure
 enabled by an MZI
 U2 = R1,1 U1 = matrix
  1core is given
  U1 Figure 7c, which is composed of six MZIs, and
 in
 the matrix transformation relations
 R (1) of each port can be obtained by cutting the plane along
 the dashed lines. The detailed process can be
 1 mathematically described as
   
 1
 U3 = R2,1 R2,21U2 =  1  R (2) U2 (11)
 = , = 1 R (3) 1
 (1)
 1
 
 1
 
 R (4)
 
 1 1
 U = R R3,2 R3,3 U3 =  1 R (5) 1 U3
 =4 , 3,1, = 1 (2) 
  
 (11)
 R (6) 1 1
 (3) 1
 1 1 (4)
 Consequently, the 4 × 4 unitary transformation matrix SU(4) can be expressed as
 = , , , = 1 (5) 1 
 (6) 1 1
 SU (4) = R R R R R R
 3,1 3,2 3,3 2,1 2,2 1,1 (12)
 Consequently, the 4 × 4 unitary transformation matrix SU(4) can be expressed as
 A similar principle can also be applied to any SU(N) matrix; in this way, an arbi-
 (4) = , , , , , , (12)
 trary n × n unitary matrix can always be decomposed into the product of n (n − 1)/2
 A similar principle
 rotation submatrices can also be applied to any SU(N) matrix; in this way, an arbitrary
 n × n unitary matrix can always be decomposed into the product of n (n − 1)/2 rotation
 submatrices SU ( N ) = R N −1,1 R N −1,2 . . . R N −1,N −1 . . . R3,1 R3,2 R3,3 R2,1 R2,2 R1,1 (13)
 ( ) = , , … , … , , , , , , (13)
 According to Equation (13), we can configure the MZI mesh to mimic mimic the
 the correspond-
 correspond-
 ing unitary matrix. Furthermore, when it comes comes to general nn ×
 to aa general × n matrix, which is not not
 limited in
 in unitary
 unitarymatrix,
 matrix,we weknowknowthat a general
 that a generalcomplex-valued
 complex-valued matrix M can
 matrix be decom-
 M can be de-
 posed as Mas= UΣV † utilizing singular valuevalue
 decomposition (SVD)(SVD)
 [49], where U is anUm is×anm
 composed M = UΣV † utilizing singular decomposition [49], where
 unitary
 m × m unitary Σ is an m
 matrix, matrix, Σ× is nanrectangular diagonal diagonal
 m × n rectangular matrix, and V † is and
 matrix, the complex
 V† is theconjugate
 complex
 of the n × of
 conjugate n unitary
 the n × nmatrix
 unitaryV [50].
 matrixThus, an arbitrary
 V [50]. Thus, an complex-valued matrix network
 arbitrary complex-valued matrix
 can be decomposed into a unitary MZI mesh, an array of tunable optical
 network can be decomposed into a unitary MZI mesh, an array of tunable optical attenu- attenuators, and
 another unitary MZI mesh (Figure 8), which can implement U, Σ,
 ators, and another unitary MZI mesh (Figure 8), which can implement U, Σ, and V , re-and V † , respectively
 †

 by tuning phase
 spectively shifters
 by tuning phase and attenuators
 shifters to changetothe
 and attenuators transmission
 change coefficient
 the transmission of each
 coefficient
 signal
 of eachchannel.
 signal channel.

 Figure8.8. MZI
 Figure MZI mesh
 mesh can
 canbe
 bedesigned
 designedto
 tomimic
 mimicarbitrary
 arbitrarynn×
 × n complex-valued matrix utilizing SVD.

 The above designs are all based on the triangular decomposition algorithm algorithm shown
 shown inin
 Figure 9a.
 Figure 9a.However,
 However,the the triangular
 triangular structure
 structure has ahas a large
 large footprint
 footprint andcompact
 and is not is not compact
 enough
 for highly
 enough forintegrated applications.
 highly integrated In 2016,In
 applications. Clements et al. optimized
 2016, Clements the design
 et al. optimized on the
 the design
 basis
 on theofbasis
 the triangular decomposition
 of the triangular algorithmalgorithm
 decomposition and proposedand aproposed
 rectangular decomposition
 a rectangular de-
 scheme (Figure
 composition 9b) [51].
 scheme The principles
 (Figure of these
 9b) [51]. The two schemes
 principles of these are
 twosimilar
 schemes forare
 they are both
 similar for
 based onboth
 they are rotation
 basedsubmatrices
 on rotationdecomposition, but the rectangular
 submatrices decomposition, scheme
 but the is morescheme
 rectangular compact is
 and
 moreneater
 compactthanand
 triangular scheme.
 neater than triangular scheme.
Nanomaterials 2021, 11, x FOR PEER REVIEW 10 of 15
 Nanomaterials 2021, 11, 1683 10 of 15

 The MZI matrix core has already shown its great potential for accelerating the linear
 The MZI matrix core has already shown its great potential for accelerating the linear
 computation part of ONNs and offered the promise to overcome the bottlenecks of state-
 computation part of ONNs and offered the promise to overcome the bottlenecks of state-
 of-the-art electronics. In 2017, Shen et al. proposed and experimentally demonstrated a
 of-the-art electronics. In 2017, Shen et al. proposed and experimentally demonstrated a
 coherent optical computing architecture enabled by a cascaded programmable MZI mesh
 coherent optical computing architecture enabled by a cascaded programmable MZI mesh
 utilizing SVD [50]. This design is capable of remarkably accelerating computing speed
 utilizing SVD [50]. This design is capable of remarkably accelerating computing speed and
 and improving power efficiency using coherent light, which makes the MZI matrix core
 improving power efficiency using coherent light, which makes the MZI matrix core one
 one of the most significant building blocks of ONNs and optical computing acceleration.
 of the most significant building blocks of ONNs and optical computing acceleration. The
 The following year, Hughes et al. thoroughly discussed the training of ONNs by back-
 following year, Hughes et al. thoroughly discussed the training of ONNs by backpropa-
 propagation and gradient measurement [52], and this work provided a path toward effec-
 gation and gradient measurement [52], and this work provided a path toward effectively
 tively implementing
 implementing on-chip
 on-chip training
 training and optimizing
 and optimizing reconfigurable
 reconfigurable integrated
 integrated optical
 optical platforms.
 platforms. By applying on-chip training on the integrated optical platform,
 By applying on-chip training on the integrated optical platform, Zhou et al. proposed Zhou et al.and
 proposed and experimentally demonstrated an all-in-one silicon photonic
 experimentally demonstrated an all-in-one silicon photonic polarization processor [53,54], polarization
 processor [53,54],
 a universal a universal
 matrix computing matrix
 chipcomputing
 [55], and a chip [55], and a self-configuring
 self-configuring program-
 programmable signal proces-
 mable
 sor [56], etc. Beyond the applications in machine learning, these works may alsoworks
 signal processor [56], etc. Beyond the applications in machine learning, these broaden
 may thealso broaden
 access the access
 to intelligent to intelligent
 optical optical
 information information
 processing. processing.
 In parallel, some Inrecent
 parallel, some in
 progress
 recent progress in network structures has been reported, including hexagonal
 network structures has been reported, including hexagonal MZI mesh for various filter op- MZI mesh
 for tical
 various filter
 switch optical
 signal switch signal
 processing [57–61]processing [57–61] and microwave
 and a programmable a programmable microwave
 photonic chip based
 photonic chip based on
 on a quadrilateral MZIa quadrilateral
 mesh [62], etc. MZI mesh
 These [62], further
 works etc. These works
 enrich thefurther
 matrix enrich the
 computation
 matrix computation
 functionalities andfunctionalities
 versatility of theandMZI
 versatility of theNevertheless,
 matrix core. MZI matrix core. Nevertheless,
 key issues such as the
 keylarge
 issues such asofthe
 footprint thelarge
 MZI footprint
 structure of the MZI
 (usually overstructure
 10,000 µm(usually over 10,000 μm
 2 per interferometer 2 per
 unit) and
 interferometer unit) and extra energy consumption of thermo-optic
 extra energy consumption of thermo-optic modulation (approximately 10 mW per heatermodulation (approx-
 imately
 of MZI)10 mWlimit per
 the heater of MZI)
 application of alimit the application
 large-scale programmable of a large-scale programmable
 optical neural network to a
 optical neural
 certain network to a certain extent.
 extent.

 Figure 9. Two decomposition schemes of the MZI matrix core. (a) Triangular [47] and (b) rectangular
 Figure 9. Two decomposition schemes of the MZI matrix core. (a) Triangular [47] and (b) rectangular
 decomposition
 decomposition schemes
 schemes [51].[51]. (c)typical
 (c) A A typical
 MZIMZI matrix
 matrix corecore based
 based on triangular
 on triangular decomposition.
 decomposition.

 Nano-opto-electro-mechanical
 Nano-opto-electro-mechanical systems
 systems (NOEMS)
 (NOEMS) are
 are structuresdesigned
 structures designedtotomaxim-
 maximize
 both opto-mechanical and electro-mechanical interaction at the nanoscale [63].
 ize both opto-mechanical and electro-mechanical interaction at the nanoscale [63]. Com- Compared
 to devices
 pared based
 to devices on thermo-optical
 based phase
 on thermo-optical shifters,
 phase NOEMS-based
 shifters, NOEMS-based devices can work
 devices without
 can work
 static power dissipation because mechanical displacements require extra energy only for
 without static power dissipation because mechanical displacements require extra energy
 switching to a different state. Experimental demonstrations of NOEMS-based devices on
 only for switching to a different state. Experimental demonstrations of NOEMS-based de-
 silicon have been reported using in-plane motion of directional couplers [64], microring [65],
 vices on silicon have been reported using in-plane motion of directional couplers [64],
 and MZI [66], demonstrating the great potential of NOEMS for static and microsecond-scale
 microring [65], and MZI [66], demonstrating the great potential of NOEMS for static and
 reconfiguration of integrated photonic circuits and quantum photonic networks. However,
 microsecond-scale reconfiguration of integrated photonic circuits and quantum photonic
 NOEMS require higher-precision lithography than conventional devices to ensure the
 networks. However, NOEMS require higher-precision lithography than conventional de-
 resolution and alignment accuracy of nanophotonic structures, and mechanical systems
 vices to ensure the resolution and alignment accuracy of nanophotonic structures, and
 generally need to be packaged to avoid the impact of the environment.
 mechanical systems generally need to be packaged to avoid the impact of the environ-
 ment.
 5. Discussion and Outlook
 The MVM operations enabled by photonics have a remarkably higher speed and
 5. Discussion and Outlook
 lower energy consumption compared to those of their electronic counter parts, which
 provides a feasible acceleration solution for the applications of artificial intelligence. On
 the one hand, on-chip integrated photonic circuits are an ideal platform for artificial
Nanomaterials 2021, 11, 1683 11 of 15

 neural networks owing to their high compactness and great potential for competitive
 integration density. In addition, fast electro-optical modulators and efficient nonlinear
 optical components built on the LiNbO3 -on-insulator (LNOI) platform are compatible with
 silicon photonic circuits [67–69] and provide a promising alternative approach to realize
 all-optical neural networks on one chip. On the other hand, the MPLC method based on
 holography can achieve an ultra-large size of MVM operations due to the capability of high
 parallel processing in free space, and a high model complexity with millions of neurons
 has already been achieved with the architecture enabled by the MPLC matrix core. The
 optical AI accelerator provides a hardware platform, which is completely different from
 the conventional electronic architecture, used to support several universal neural network
 algorithms to match specific artificial intelligence applications including image recognition,
 human action recognition, and Google PageRank, etc.
 Table 1 summarizes the comparison of different recently demonstrated photonic AI
 accelerators with well-known analog and digital electronic hardware. The performance
 parameters of photonic architectures were obtained by theoretically extrapolating from the
 experimental performance index of a handful of photonic processing units. In complemen-
 tary metal–oxide–semiconductor (CMOS), MVM operations are typically implemented
 by systolic arrays [70] or single instruction multiple data (SIMD) units [71]. Due to the
 properties of electronic components, performing simple operations requires a large number
 of transistors to work together and an extra scheduler program to coordinate the data
 movement involved in weights, while MVM operations can be easily implemented by fun-
 damental photonic components such as microring, MZI, and diffractive plane. Therefore,
 the rate of photonic computing is several orders of magnitude faster than electrons and
 consumes much less power.

 Table 1. Comparison of different photonic AI accelerators with well-known analog and digital electronic hardware.

 Computing Density Precision
 Technology Energy/MAC Latency
 (TMACs/s/mm2 ) (bits)
 MPLC with a reconfigurable diffractive processing unit [19] - 0.82 fJ/MAC - 8
 Broadcast-and-weight based on WDM [72] 50 2.1 fJ/MAC
You can also read