Locality Relationship Constrained Multi-view Clustering Framework
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XXX, NO. XX, 2021
arXiv:2107.05073v1 [cs.CV] 11 Jul 2021

Xiangzhu Meng, Wei Wei*, and Wenzhe Liu

X. Meng is with the Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing, China (xiangzhu.meng@cripac.ia.ac.cn). W. Wei is with the Center for Energy, Environment & Economy Research, Tourism Management School of Zhengzhou University, Zhengzhou, China (weiwei123@zzu.edu.cn). W. Liu is with the School of Computer Science and Technology, Dalian University of Technology, Dalian, China (liuwenzhe@mail.dlut.edu.cn).

Abstract—In most practical applications, it is common to use multiple features from different views to represent one object. Among the resulting methods, multi-view subspace-based clustering, which aims to provide clustering solutions for multi-view data, has gained extensive attention. However, most existing methods fail to make full use of the locality geometric structure and the similarity relationships among samples in the multi-view scenario. To address these issues, we propose a novel multi-view learning method with a locality relationship constraint for multi-view clustering, called the Locality Relationship Constrained Multi-view Clustering Framework (LRC-MCF). LRC-MCF explores the diversity, geometric, consensus, and complementary information among different views by capturing the locality relationship information within each view and the common similarity relationships across views. Moreover, LRC-MCF gives sufficient consideration to the weights of the different views when finding the common-view locality structure, and it produces the final clusters directly. To effectively reduce the redundancy of the learned representations, a low-rank constraint on the common similarity matrix is additionally imposed. To solve the minimization problem of LRC-MCF, an Alternating Direction Minimization (ADM) method is provided that iteratively computes all variables of LRC-MCF. Extensive experimental results on seven benchmark multi-view datasets validate the effectiveness of the LRC-MCF method.

Index Terms—Multi-view clustering, locality relationship constraint, alternating direction minimization.

I. INTRODUCTION

With the rapid development of the information era, one common object can usually be represented from different viewpoints [1, 2]. For example, a document (or paper) can be written in different languages [3, 4]. Generally, multiple views contain more meaningful and comprehensive information than a single view, providing knowledge that is not available in any single view alone. Hence, properly using multi-view information is beneficial for improving practical performance in most situations. However, most traditional methods focus on a single view and cannot be applied directly to multi-view data. Therefore, multi-view learning methods [5-7] have been proposed for different tasks, such as classification [8-10] and clustering [11, 12]. Notably, multi-view clustering [13-15] has become an important research field, which aims at producing satisfactory performance by integrating the diverse information among different views. An easy yet effective way is to concatenate all views into one common view and then apply traditional single-view methods to the clustering task. However, this strategy not only neglects the specific statistical properties of each view, but also fails to make full use of the consistent and complementary information from multiple views.

To solve the above issues, plenty of multi-view clustering methods have been proposed in the past ten years. One class of representative multi-view methods [16-20] constructs a common latent view shared by all views, which integrates the diverse and complementary information from the different views into the common view and then partitions it into the desired groups. For example, the Auto-Weighted Multiple Graph Learning (AMGL) [17] method automatically assigns an ideal weight to each view when integrating multi-view information, according to the importance of the view. Nevertheless, such methods cannot guarantee the complementarity of the information among different views, so the rich multi-view information is not fully exploited. Therefore, another class of multi-view clustering methods has been proposed to further discover the complementary information among different views. Works based on Canonical Correlation Analysis (CCA) [21] and the Hilbert-Schmidt Independence Criterion (HSIC) [22] are two representative classes of such multi-view clustering methods.
The former [23-25] employs CCA to project pairs of views into a common subspace by maximizing the correlation between the two views, while the latter [26-28] explores complementary information through an HSIC term, mapping variables into a reproducing kernel Hilbert space. Apart from these works, the co-training strategy [29, 30] is also considered an effective tool for enforcing different views to learn from each other. Unfortunately, these methods may produce unsatisfactory results when the gap between different views is very large. Although such methods have achieved excellent performance in some situations, most of them focus on either the intrinsic structure information within each view or the geometric structures of the other views. To discover the geometric structure information in multi-view data, several works [31-35] have been developed to learn the intrinsic structure of each view. However, the geometric structure information in these works usually depends on manually defined graphs, which cannot adaptively learn the geometric relations among samples in multi-view data. To sum up, multi-view clustering is still an open and challenging problem.

To solve these issues, we propose a novel multi-view learning method, called the Locality Relationship Constrained Multi-view Clustering Framework (LRC-MCF), to explore the problem of multi-view clustering.
LRC-MCF simultaneously explores the diversity, geometric, consensus, and complementary information among different views: it captures the locality relationship information within each view and then fuses the locality structure information of all views into one common similarity relationship among the multiple views. In this way, we not only make use of the diversity and geometric information among multiple views, but also enforce the different views to learn from each other through the common similarity relationship. Since each view yields different clustering performance, LRC-MCF adaptively assigns suitable weights to the different views when finding the common-view locality structure and straightforwardly produces the final clusters. To effectively reduce the redundancy of the learned representations, a low-rank constraint on the common similarity matrix is additionally considered. To solve the minimization problem of LRC-MCF, an optimization algorithm based on the alternating direction minimization strategy is provided that iteratively computes all variables of LRC-MCF. Finally, extensive experimental results on seven benchmark multi-view datasets show the effectiveness and superiority of the LRC-MCF method.

A. Contributions

The major contributions of this paper can be summarized as follows:
• LRC-MCF simultaneously explores the diversity, geometric, consensus, and complementary information among different views by capturing the locality relationship information and the common similarity relationships among multiple views.
• LRC-MCF gives sufficient consideration to the weights of the different views when finding the common-view locality structure, and adds a low-rank constraint on the common similarity matrix to effectively reduce the redundancy of the learned representations.
• To solve the minimization problem of LRC-MCF, an Alternating Direction Minimization (ADM) method is provided that iteratively computes all variables of LRC-MCF and converges within a limited number of iterations.
• Experimental results on seven benchmark datasets demonstrate the effectiveness and superiority of LRC-MCF, which outperforms its counterparts or achieves comparable performance.

B. Organization

The rest of the paper is organized as follows. Section II briefly reviews related methods that have attracted extensive attention. Section III describes the construction of LRC-MCF and its optimization algorithm. Section IV presents extensive experiments on text and image datasets that demonstrate the effectiveness of the proposed approach. Section V concludes the paper.

II. RELATED WORKS

In this section, we review two classical multi-view learning methods: Auto-Weighted Multiple Graph Learning and Co-regularized Multi-view Spectral Clustering.

A. Auto-Weighted Multiple Graph Learning

Auto-Weighted Multiple Graph Learning (AMGL) [17] is a multi-view framework based on standard spectral learning, which can be used for multi-view clustering and semi-supervised classification tasks. Let X^v = {x_1^v, x_2^v, ..., x_N^v} denote the feature set of the vth view. AMGL aims to find a common embedding U ∈ R^{N×k} as follows:

$$\max_{U\in\mathcal{C}}\ \sum_{v=1}^{M}\sqrt{\operatorname{tr}\!\left(U^{\top}L^{v}U\right)} \tag{1}$$

where L^v denotes the normalized graph Laplacian matrix of the vth view, tr(·) denotes the trace of a matrix, and k is the number of clusters. C denotes the constraint set of Eq. (1): if C is U^⊤U = I, the framework solves clustering tasks; if C contains label information, the framework can also be used for semi-supervised classification. Applying the Lagrange multiplier method to Eq. (1) yields an adaptive weight mechanism that allocates suitable weights to the different views, and the multi-view framework can be transformed into the following problem:

$$\max_{U\in\mathcal{C}}\ \sum_{v=1}^{M}\alpha^{v}\operatorname{tr}\!\left(U^{\top}L^{v}U\right) \tag{2}$$

where

$$\alpha^{v}=\frac{1}{2\sqrt{\operatorname{tr}\!\left(U^{\top}L^{v}U\right)}} \tag{3}$$

According to Eq. (3), if the vth view is good, tr(U^⊤L^vU) should be small, and the learned weight α^v for that view is large; correspondingly, a bad view is assigned a small weight. Therefore, AMGL weights the views meaningfully and can obtain better results than the classical combination approach that assigns equal weights to all views.
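As a concrete illustration, the adaptive weights of Eqs. (2)-(3) can be computed directly from the current embedding and the per-view Laplacians. A minimal NumPy sketch (the function name and the eps guard against a vanishing trace are ours, not part of AMGL's reference code):

```python
import numpy as np

def amgl_weights(U, laplacians, eps=1e-12):
    """alpha^v = 1 / (2 * sqrt(tr(U^T L^v U))) for each view (Eq. (3))."""
    weights = []
    for L in laplacians:
        t = np.trace(U.T @ L @ U)
        weights.append(1.0 / (2.0 * np.sqrt(max(t, eps))))
    return np.array(weights)
```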
B. Co-regularized Multi-view Spectral Clustering

Co-regularized Multi-view Spectral Clustering (Co-reg) [36] is a spectral clustering algorithm for multiple views that co-regularizes the clustering hypotheses across views. Co-reg provides a spectral clustering framework for the multi-view setting under the cross-view assumption that the true underlying clustering assigns corresponding points in each view to the same cluster. Taking the two-view case for ease of exposition, the cost function measuring the disagreement between the clusterings encoded by the learned embeddings U^v and U^w of the vth and wth views can be defined as follows:

$$D\!\left(U^{v},U^{w}\right)=\left\|\frac{K_{U^{v}}}{\|K_{U^{v}}\|_{F}^{2}}-\frac{K_{U^{w}}}{\|K_{U^{w}}\|_{F}^{2}}\right\|_{F}^{2} \tag{4}$$

where K_{U^v} and K_{U^w} denote the similarity matrices of the vth and wth views, respectively. For convenience of solution, a linear kernel is chosen as the similarity measure.
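A minimal sketch of the disagreement measure in Eq. (4) with the linear kernel K_{U^v} = U^v U^{v⊤}; the function name is ours:

```python
import numpy as np

def coreg_disagreement(Uv, Uw):
    """D(U^v, U^w) of Eq. (4): compare Frobenius-normalized linear kernels."""
    Kv = Uv @ Uv.T
    Kw = Uw @ Uw.T
    Kv = Kv / np.linalg.norm(Kv) ** 2   # np.linalg.norm is Frobenius here
    Kw = Kw / np.linalg.norm(Kw) ** 2
    return np.linalg.norm(Kv - Kw) ** 2
```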
Co-reg builds on standard spectral clustering through the co-regularized framework, which makes the clustering relationships of the different views agree with each other. Combining Eq. (4) with the spectral clustering objectives of all views yields the following joint optimization problem for M views:

$$\min_{U^{1},\ldots,U^{M}}\ \sum_{v=1}^{M}\operatorname{tr}\!\left(U^{v\top}L^{v}U^{v}\right)+\lambda\!\!\sum_{1\le v\neq w\le M}\!\!D\!\left(U^{v},U^{w}\right)\quad \text{s.t. } U^{v\top}U^{v}=I,\ 1\le v\le M \tag{5}$$

where the first term collects the spectral clustering objectives and λ is a non-negative hyperparameter that trades off the spectral clustering objectives against the disagreement between the spectral embeddings of different views. In this way, Co-reg implements a spectral clustering framework for the multi-view setting.

III. THE PROPOSED APPROACH

A. Locality relationship preserving

Given two samples x_i^v and x_j^v in the vth view, the distance between them is denoted by d(x_i^v, x_j^v) and is usually computed with the Mahalanobis distance, the Euclidean distance, the L1-norm, etc. Intuitively, the distance between samples reflects their relationship, which is of vital importance for constructing the local geometric structure of the sample set. For the vth view, the conventional similarity between two samples is usually defined as S_{i,j}^v = exp(−d(x_i^v, x_j^v)/σ), where σ is a hyper-parameter. A major disadvantage of this definition is that the hyper-parameter is hard to set in practice due to noise and outliers in the data. Therefore, learning a suitable similarity matrix is of vital importance. Meanwhile, the learned similarity matrix should satisfy the condition that a smaller distance between two data points corresponds to a larger similarity value and a larger distance corresponds to a smaller similarity value. Toward this end, we model the problem as follows:

$$\min_{S^{v}}\ \sum_{i=1}^{N}\sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\left\|S^{v}\right\|_{F}^{2}\quad \text{s.t. } S_{i,j}^{v}\ge 0 \tag{6}$$

where the second term is a regularization term on S^v and λ is a hyperparameter balancing the two terms. To further constrain S^v and rule out the trivial all-zero solution, we add a normalization on S^v to Eq. (6), i.e., 1^⊤S_i^v = 1. The problem then becomes:

$$\min_{S^{v}}\ \sum_{i=1}^{N}\sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\left\|S^{v}\right\|_{F}^{2}\quad \text{s.t. } S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1 \tag{7}$$

Inspired by sparse representation methods, we additionally impose a sparsity constraint on S^v, which encourages the similarity matrix S^v to have more zero elements. Usually, the L0 or L1 norm could be used to implement the sparsity constraint, but this might neglect the local structure of the representation space. Based on the fact that a sample point can be well approximated by its nearest neighbors, we use the local structure information to construct the sparsity constraint on S^v and obtain the following locality-relationship-based formulation:

$$\min_{S^{v}}\ \sum_{i=1}^{N}\sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\left\|S^{v}\right\|_{F}^{2}\quad \text{s.t. } S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1,\ S_{i,j}^{v}=0\ \text{if}\ x_{j}^{v}\notin N(x_{i}^{v}) \tag{8}$$

where N(x_i^v) denotes the set of the K nearest neighbors of x_i^v.
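The paper does not spell out how Eq. (8) is solved at initialization (cf. Algorithm 1 below). One standard choice, used in CAN-style adaptive-neighbor graph learning, picks λ per row so that exactly K entries are nonzero, which gives the closed form sketched here; this is our assumption, not the paper's derivation:

```python
import numpy as np

def init_similarity(D, K):
    """D: (N, N) pairwise distances in one view; returns a row-stochastic S
    with exactly K nonzeros per row (closed form of the per-row problem)."""
    N = D.shape[0]
    S = np.zeros((N, N))
    for i in range(N):
        idx = np.argsort(D[i])            # nearest first; idx[0] is i itself
        nn = idx[1:K + 1]                 # the K nearest neighbors of i
        d_k1 = D[i, idx[K + 1]]           # distance to the (K+1)-th neighbor
        denom = K * d_k1 - D[i, nn].sum()
        S[i, nn] = (d_k1 - D[i, nn]) / max(denom, 1e-12)
    return S
```

Each row sums to one by construction, and the entries decrease linearly with the distance to the neighbor, matching the monotonicity requirement stated above.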
B. Multi-view information integration

For the multi-view setting, a naive way is to incorporate all views directly according to the definition in Eq. (8):

$$\min_{\{S^{v}\}}\ \sum_{v=1}^{M}\sum_{i=1}^{N}\sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\sum_{v=1}^{M}\left\|S^{v}\right\|_{F}^{2}\quad \text{s.t. } \forall v,\ S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1,\ S_{i,j}^{v}=0\ \text{if}\ x_{j}^{v}\notin N(x_{i}^{v}) \tag{9}$$

However, this amounts to solving the problem for each view separately, which fails to integrate the multi-view features and to make favorable use of the complementary information from multiple views. To solve this issue, we first hypothesize that the similarities among the instances in each view and in the centroid view should be consistent under the new representations. This hypothesis means that all similarity matrices from the multiple views should be consistent with the similarity of the centroid view, which is implemented by aligning the similarity matrix of the centroid view with that of the vth view. Therefore, we use the following cost function to measure the disagreement between the centroid view and the vth view:

$$D\!\left(S^{*},S^{v}\right)=\left\|S^{*}-S^{v}\right\|_{F}^{2} \tag{10}$$

where S^* denotes the similarity matrix of the centroid view. To integrate the rich information from the different features, we sum the cost function of Eq. (10) over all views and obtain the following optimization problem:

$$\min_{S^{*}}\ \sum_{v=1}^{M}\left\|S^{*}-S^{v}\right\|_{F}^{2}\quad \text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1 \tag{11}$$
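Note that Eq. (11) has a closed-form solution: since every S^v is nonnegative and row-stochastic, the plain average of the S^v (and, once view weights summing to one are introduced below, their weighted average) already satisfies the simplex constraints on S^* and minimizes the objective. A one-line sketch (the function name is ours):

```python
import numpy as np

def fuse_similarities(S_list, w):
    """Weighted average of row-stochastic similarity matrices.
    S_list: list of (N, N) arrays; w: nonnegative weights summing to one."""
    return sum(wv * Sv for wv, Sv in zip(w, S_list))
```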
In most practical applications, however, the different views should not play the same role when fusing the similarity matrices. To reflect the importance of the different views in integrating the multi-view information, we allocate ideal weights w over the views and obtain the following optimization problem:

$$\min_{w,S^{*}}\ \sum_{v=1}^{M}w^{v}\left\|S^{*}-S^{v}\right\|_{F}^{2}\quad \text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1,\ w^{v}\ge 0,\ 1^{\top}w=1 \tag{12}$$

where w^v reflects the importance of the vth view in obtaining the common similarity matrix S^*.

C. Multi-view clustering based on locality relationship preserving

To simultaneously exploit the diversity and consensus information among different views, we obtain the multi-view clustering framework by combining Eq. (8) and Eq. (12):

$$\min_{w,S^{*},\{S^{v}\}}\ \sum_{v=1}^{M}\sum_{i=1}^{N}\sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\sum_{v=1}^{M}\left\|S^{v}\right\|_{F}^{2}+\sum_{v=1}^{M}w^{v}\left\|S^{*}-S^{v}\right\|_{F}^{2}$$
$$\text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1,\ w^{v}\ge 0,\ 1^{\top}w=1,\ \forall v,\ S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1,\ S_{i,j}^{v}=0\ \text{if}\ x_{j}^{v}\notin N(x_{i}^{v}) \tag{13}$$

Our paper aims to solve the final clustering problem directly, i.e., to produce the clustering result on the unified graph matrix S^* without an additional clustering algorithm or step. As formulated so far, the unified graph matrix S^* cannot achieve this. We now give an efficient yet simple remedy by imposing a rank constraint on the graph Laplacian matrix L^* of the unified matrix S^*. Existing results imply that if rank(L^*) = N − c, the corresponding S^* is an ideal graph whose data points are partitioned into exactly c clusters, so there is no need to run an additional clustering algorithm on S^* to produce the final clusters. Motivated by this theorem, we add a rank constraint to the above problem, and our final multi-view clustering model is formulated as follows:

$$\min_{w,S^{*},\{S^{v}\}}\ \sum_{v=1}^{M}\sum_{i=1}^{N}\sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\sum_{v=1}^{M}\left\|S^{v}\right\|_{F}^{2}+\sum_{v=1}^{M}w^{v}\left\|S^{*}-S^{v}\right\|_{F}^{2}$$
$$\text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1,\ w^{v}\ge 0,\ 1^{\top}w=1,\ \forall v,\ S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1,\ S_{i,j}^{v}=0\ \text{if}\ x_{j}^{v}\notin N(x_{i}^{v}),\ \operatorname{rank}(L^{*})=N-c \tag{14}$$
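The rank constraint is what turns S^* into a clustering: for a graph Laplacian, the multiplicity of the zero eigenvalue equals the number of connected components, so rank(L^*) = N − c forces the graph of S^* to split into exactly c components, which are the clusters. A small SciPy sketch of reading the clusters off a learned S^* (the function name and tolerance are ours):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def clusters_from_graph(S, tol=1e-8):
    """Return (number of components, cluster labels) of the graph of S."""
    A = (np.abs(S) + np.abs(S.T)) / 2          # symmetrize the learned graph
    n_comp, labels = connected_components(csr_matrix(A > tol))
    return n_comp, labels
```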
D. Optimization process

All the variables in Eq. (14) are coupled, so optimizing them all at once is challenging; in addition, the constraints are not smooth. To overcome these issues, we propose an algorithm based on the Alternating Direction Minimization strategy: each variable is updated while the other variables are fixed, with S^* computed via an Augmented Lagrange Multiplier scheme. The specific update rules are as follows.

Fixing w and S^*, update S^v: When w and S^* are given, each graph matrix S^v in Eq. (14) is independent of the others, so the S^v can be updated one by one:

$$\min_{S^{v}}\ \sum_{i=1}^{N}\sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\left\|S^{v}\right\|_{F}^{2}+w^{v}\left\|S^{*}-S^{v}\right\|_{F}^{2}\quad \text{s.t. } S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1,\ S_{i,j}^{v}=0\ \text{if}\ x_{j}^{v}\notin N(x_{i}^{v}) \tag{15}$$

The above problem is independent across the rows i, so we can solve the following problem separately for each S_i^v:

$$\min_{S_{i}^{v}}\ \sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\left\|S_{i}^{v}\right\|_{2}^{2}+w^{v}\left\|S_{i}^{*}-S_{i}^{v}\right\|_{2}^{2}\quad \text{s.t. } S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1,\ S_{i,j}^{v}=0\ \text{if}\ x_{j}^{v}\notin N(x_{i}^{v}) \tag{16}$$

Relaxing the locality constraint, the problem becomes:

$$\min_{S_{i}^{v}}\ \sum_{j=1}^{N}d\!\left(x_{i}^{v},x_{j}^{v}\right)S_{i,j}^{v}+\lambda\left\|S_{i}^{v}\right\|_{2}^{2}+w^{v}\left\|S_{i}^{*}-S_{i}^{v}\right\|_{2}^{2}\quad \text{s.t. } S_{i,j}^{v}\ge 0,\ 1^{\top}S_{i}^{v}=1 \tag{17}$$

Solving it with the Lagrange multiplier method and the KKT conditions gives the following solution for S_{i,j}^v:

$$S_{i,j}^{v}=\frac{-d\!\left(x_{i}^{v},x_{j}^{v}\right)+2w^{v}S_{i,j}^{*}+2\lambda\eta}{2\lambda+2w^{v}} \tag{18}$$

where η is the Lagrange multiplier of the constraint 1^⊤S_i^v = 1. Considering the locality constraint that S_{i,j}^v must equal 0 if x_j^v ∉ N(x_i^v), we can further solve for η and λ and obtain the following update rule for S^v:

$$S_{i,j}^{v}=\begin{cases}\dfrac{d\!\left(x_{i}^{v},x_{\hat{i}}^{v}\right)-d\!\left(x_{i}^{v},x_{j}^{v}\right)+2w^{v}\!\left(S_{i,j}^{*}-S_{i,\hat{i}}^{*}\right)}{K\!\left(d\!\left(x_{i}^{v},x_{\hat{i}}^{v}\right)-2w^{v}S_{i,\hat{i}}^{*}\right)+\sum_{k=1}^{K}\left(2w^{v}S_{i,k}^{*}-d\!\left(x_{i}^{v},x_{k}^{v}\right)\right)}, & \text{if } x_{j}^{v}\in N(x_{i}^{v})\\[2mm] 0, & \text{otherwise}\end{cases} \tag{19}$$

where î is the index of the (K+1)th nearest neighbor of x_i^v and k runs over the K nearest neighbors of x_i^v. In this way, we can sequentially update the similarity matrix of each view with the above rule.
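The closed form in Eq. (19) can also be cross-checked numerically: completing the square in Eq. (17) and restricting attention to the K neighbor coordinates shows that the row update is the Euclidean projection of the point (2 w^v S_i^* − d_i) / (2(λ + w^v)) onto the probability simplex. This reduction and the routine below are our own sketch, not code from the paper:

```python
import numpy as np

def project_simplex(p):
    """Euclidean projection of p onto {s : s >= 0, sum(s) = 1}."""
    u = np.sort(p)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(p)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(p + theta, 0.0)

def update_row(d_nn, s_star_nn, lam, wv):
    """Solve Eq. (17) on the K neighbor coordinates of one row of S^v.
    d_nn: distances to the K neighbors; s_star_nn: matching S^* entries."""
    p = (2.0 * wv * s_star_nn - d_nn) / (2.0 * (lam + wv))
    return project_simplex(p)
```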
Fixing w and {S^v}, update S^*: When w and {S^v} are given, the objective function of Eq. (14) with respect to S^* reduces to the following problem (with the weight trick w^v ← (w^v)^r introduced in the update of w below):

$$\min_{S^{*}}\ \sum_{v=1}^{M}\left(w^{v}\right)^{r}\left\|S^{*}-S^{v}\right\|_{F}^{2}\quad \text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1,\ \operatorname{rank}(L^{*})=N-c \tag{20}$$

This problem is difficult to solve because the constraint rank(L^*) = N − c is nonlinear. Since L^* is positive semi-definite, the low-rank constraint can be achieved by enforcing Σ_{i=1}^{c} θ_i = 0, where θ_i is the ith smallest eigenvalue of L^*. According to Ky Fan's theorem [37], Σ_{i=1}^{c} θ_i = min_{FF^⊤=I} tr(F L^* F^⊤) with F ∈ R^{c×N}, so combining this with Eq. (20) gives:

$$\min_{S^{*},F}\ \sum_{v=1}^{M}\left(w^{v}\right)^{r}\left\|S^{*}-S^{v}\right\|_{F}^{2}+\beta\operatorname{tr}\!\left(FL^{*}F^{\top}\right)\quad \text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1,\ FF^{\top}=I \tag{21}$$

To solve Eq. (21), we alternate between S^* and F. When F is fixed, the problem in S^* becomes:

$$\min_{S^{*}}\ \sum_{v=1}^{M}\left(w^{v}\right)^{r}\left\|S^{*}-S^{v}\right\|_{F}^{2}+\beta\operatorname{tr}\!\left(FL^{*}F^{\top}\right)\quad \text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1 \tag{22}$$

which can be rewritten as:

$$\min_{S^{*}}\ \sum_{v=1}^{M}\left(w^{v}\right)^{r}\sum_{i,j=1}^{N}\left(S_{i,j}^{*}-S_{i,j}^{v}\right)^{2}+\beta\sum_{i,j=1}^{N}S_{i,j}^{*}\left\|f_{i}^{*}-f_{j}^{*}\right\|_{2}^{2}\quad \text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1 \tag{23}$$

Let e denote the Euclidean distance matrix of F, i.e., e_{i,j} = ||f_i^* − f_j^*||_2^2. It is then easy to show that the above problem is equivalent to:

$$\min_{S_{i}^{*}}\ \sum_{v=1}^{M}\sum_{i=1}^{N}\left\|S_{i}^{*}-S_{i}^{v}+\frac{\beta}{(w^{v})^{r}}e_{i}\right\|_{2}^{2}\quad \text{s.t. } S_{i,j}^{*}\ge 0,\ 1^{\top}S_{i}^{*}=1 \tag{24}$$

Eq. (24) is a standard quadratic programming problem with simplex constraints, which can be solved effectively by standard optimization tools.
When S^* is fixed, the problem in F simplifies to:

$$\min_{F}\ \operatorname{tr}\!\left(FL^{*}F^{\top}\right)\quad \text{s.t. } FF^{\top}=I \tag{25}$$

Since L^* is a symmetric positive semi-definite matrix, by the Ky Fan theorem F in Eq. (25) has a global optimal solution, given by the eigenvectors associated with the c smallest eigenvalues of L^*.

Fixing {S^v} and S^*, update w: When {S^v} and S^* are given, the objective function with respect to w reduces to:

$$\min_{w}\ \sum_{v=1}^{M}w^{v}\left\|S^{*}-S^{v}\right\|_{F}^{2}\quad \text{s.t. } w^{v}\ge 0,\ 1^{\top}w=1 \tag{26}$$

The solution of Eq. (26) sets w^v = 1 for the view attaining min_v D(S^*, S^v) and w^v = 0 otherwise; that is, only one view is selected, and the performance of the method is then equivalent to that of the best single view. This does not meet our objective of exploiting the complementary properties of multiple views to obtain a better result than any single view. To avoid this phenomenon, we adopt a simple yet effective trick: we replace w^v by (w^v)^r with r > 1. The objective function then becomes:

$$\min_{w}\ \sum_{v=1}^{M}\left(w^{v}\right)^{r}\left\|S^{*}-S^{v}\right\|_{F}^{2}\quad \text{s.t. } w^{v}\ge 0,\ 1^{\top}w=1 \tag{27}$$

Introducing a Lagrange multiplier γ for the constraint 1^⊤w = 1, we obtain the Lagrange function:

$$\mathcal{L}(w,\gamma)=\sum_{v=1}^{M}\left(w^{v}\right)^{r}\left\|S^{*}-S^{v}\right\|_{F}^{2}-\gamma\!\left(\sum_{v=1}^{M}w^{v}-1\right) \tag{28}$$

Setting the partial derivatives of L(w, γ) with respect to w and γ to zero gives:

$$w^{v}=\frac{\left(\left\|S^{*}-S^{v}\right\|_{F}^{2}\right)^{1/(1-r)}}{\sum_{i=1}^{M}\left(\left\|S^{*}-S^{i}\right\|_{F}^{2}\right)^{1/(1-r)}} \tag{29}$$

Since ||S^* − S^v||_F^2 is a non-negative scalar, w^v ≥ 0 holds naturally. Eq. (29) also shows how r controls w: as r → ∞, the weights w^v approach one another; as r → 1, w^v = 1 only for the view with the minimum value of D(S^*, S^v) and w^v = 0 otherwise. Therefore, the selection of r should be based on the complementary properties of the views.
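Both remaining updates have simple closed forms: Eq. (29) for the weights, and for Eq. (25) the c eigenvectors of L^* with the smallest eigenvalues. A minimal NumPy sketch (the function names and the eps guard for a vanishing residual are ours):

```python
import numpy as np

def update_weights(S_star, S_list, r=2.0, eps=1e-12):
    """Closed-form view weights of Eq. (29); requires r > 1."""
    e = np.array([np.linalg.norm(S_star - Sv) ** 2 for Sv in S_list])
    w = np.maximum(e, eps) ** (1.0 / (1.0 - r))
    return w / w.sum()

def update_F(S_star, c):
    """Eq. (25): rows of F are the c bottom eigenvectors of L^*."""
    A = (S_star + S_star.T) / 2
    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian of S^*
    vals, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    return vecs[:, :c].T                     # F is (c, N) with F F^T = I
```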
To sum up, we can iteratively solve for all the variables in Eq. (14) as described above, which yields a local optimal solution of LRC-MCF. For the reader's convenience, the whole optimization process is summarized in Algorithm 1. Since the problem is not jointly convex in all variables, obtaining a globally optimal solution remains an open problem; we solve it with the alternating procedure of Algorithm 1. Each sub-problem is convex and is solved optimally, so the objective function decreases as the iterations proceed and the algorithm converges; in particular, each optimal update of one variable with the others fixed can only reduce the objective value. We also empirically validate the convergence of the algorithm in the experiments below.

Algorithm 1: The optimization procedure for LRC-MCF

Input: Multi-view features {X^v, 1 ≤ v ≤ M}, the hyper-parameters λ and β, the distance metric function d(·,·), and the number of clusters c.
1  for v = 1 : M do
2      Compute d(x_i^v, x_j^v) for all sample pairs with the metric d(·,·).
3      Initialize S^v by solving Eq. (8).
4      Initialize w^v = 1/M.
5  end
6  Fixing w and {S^v}, initialize S^* by solving Eq. (24).
7  while not converged do
8      for v = 1 : M do
9          Fixing w and S^*, update S^v by Eq. (19).
10         Fixing {S^v} and S^*, update w^v by Eq. (29).
11     end
12     Fixing w and {S^v}, update S^* by solving Eq. (24).
13     Fixing S^*, update F by solving Eq. (25).
14 end
Output: The common similarity relationship matrix S^*.

E. Computational Complexity Analysis

To clearly show the efficiency of LRC-MCF, we analyze its computational complexity. The cost of solving all the variables in Eq. (14) depends mainly on five parts: initializing all variables costs O(MN²d + MNK), where d is the cost of one evaluation of the distance metric d(·,·); updating S^v by Eq. (19) costs O(MNK); updating S^* by solving Eq. (24) costs O(cN); updating F costs O(cN²); and updating w by Eq. (29) costs O(MN²). Accordingly, the overall computational complexity of Algorithm 1 is about O(MN²d + MNK + T(MNK + cN² + cN + MN²)), where T is the number of iterations of the alternating optimization procedure in Algorithm 1.
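Putting the pieces together, the following compact sketch mirrors Algorithm 1 using the illustrative helpers defined in the previous sketches (init_similarity, fuse_similarities, update_row, project_simplex, update_F, update_weights). It is our reading of the procedure, not reference code; in particular, the S^* step uses our own completion-of-squares reduction of Eq. (23) to row-wise simplex projections:

```python
import numpy as np

def lrc_mcf(dist_list, c, K=10, lam=1.0, beta=1.0, r=2.0, iters=30):
    """dist_list: one (N, N) pairwise-distance matrix per view."""
    M, N = len(dist_list), dist_list[0].shape[0]
    nbrs = [np.argsort(D, axis=1)[:, 1:K + 1] for D in dist_list]
    S_list = [init_similarity(D, K) for D in dist_list]      # Eq. (8)
    w = np.full(M, 1.0 / M)
    S_star = fuse_similarities(S_list, w)
    for _ in range(iters):
        F = update_F(S_star, c)                              # Eq. (25)
        E = ((F[:, :, None] - F[:, None, :]) ** 2).sum(axis=0)  # e_ij
        for v in range(M):                                   # Eq. (17)/(19)
            for i in range(N):
                nn = nbrs[v][i]
                S_list[v][i, :] = 0.0
                S_list[v][i, nn] = update_row(
                    dist_list[v][i, nn], S_star[i, nn], lam, w[v])
        wr = w ** r                                          # Eq. (24)
        P = sum(wv * Sv for wv, Sv in zip(wr, S_list)) - beta * E / 2.0
        S_star = np.apply_along_axis(project_simplex, 1, P / wr.sum())
        w = update_weights(S_star, S_list, r)                # Eq. (29)
    return S_star
```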
IV. EXPERIMENTS

To evaluate the effectiveness and superiority of the proposed LRC-MCF, we conduct comprehensive clustering experiments on seven benchmark multi-view datasets. First, we introduce the details of the experiments, including the benchmark datasets, the compared methods, the experimental settings, and the evaluation metrics. Then, we run all methods on the different multi-view datasets to evaluate the performance of LRC-MCF and analyze the experimental results. Finally, we empirically examine the parameter sensitivity and the convergence of LRC-MCF.

A. Datasets

To comprehensively demonstrate the effectiveness of the proposed LRC-MCF, we run all multi-view clustering methods on seven benchmark multi-view datasets: BBC, BBCSport, 3Source, Cora, Handwritten, HW2sources, and NGs. BBC consists of 685 documents from the BBC news corpora, each corresponding to news in one of five topical areas, and is represented with four views (cf. Table I). BBCSport consists of 544 documents from the BBC Sport website, each corresponding to sports news in one of five topical areas, and is represented with two views. 3Source contains 169 news stories collected from three well-known news organizations (BBC, Reuters, and The Guardian), each manually annotated with one of six labels. Cora contains 2708 scientific publications from seven categories; each publication can be described by its content and its citations, so Cora can be considered a two-view benchmark. Handwritten is a hand-written digit dataset of 2000 samples, each represented by six different feature sets. HW2sources is a benchmark dataset from the UCI repository with 2000 samples collected from two sources, the MNIST and USPS hand-written digits. NGs consists of 500 documents, each pre-processed in three different ways and manually annotated with one of five labels. For convenience, the statistics of these datasets are summarized in Table I. The datasets are available online: BBC and BBCSport at http://mlg.ucd.ie/datasets/segment.html, 3Source at http://mlg.ucd.ie/datasets/3sources.html, Cora and NGs at http://lig-membres.imag.fr/grimal/data.html, Handwritten at https://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+data+set, and HW2sources at https://cs.nyu.edu/roweis/data.html.

TABLE I: Summary of the benchmark datasets

    Dataset        Samples   Classes   Views
    BBC                685         5       4
    BBCSport           544         5       2
    3Source            169         6       3
    Cora              2708         7       2
    Handwritten       2000        10       6
    HW2sources        2000        10       2
    NGs                500         5       3

B. Compared Methods

We compare the proposed LRC-MCF with the following baselines on the seven benchmark datasets: two single-view learning methods, one feature concatenation method, and nine multi-view learning methods. The details of the compared methods are summarized as follows:
• Single-view learning methods: Spectral Clustering (SC) [38] first computes the affinity between each pair of points to construct the similarity matrix and then uses the spectrum (eigenvalues) of the similarity matrix to perform dimensionality reduction before clustering [45] in fewer dimensions. Graph-regularized Non-negative Matrix Factorization (GNMF) [39] yields a natural parts-based representation of the data and jointly discovers the intrinsic geometric and discriminative structure of the data space.
• Feature concatenation method: We first concatenate the features of all views and then apply a traditional clustering method to obtain the clustering results.
• Multi-view learning methods: Canonical Correlation Analysis (CCA) [21] handles multi-view problems by maximizing the cross-correlation between each pair of views. Multi-view Spectral Embedding (MSE) [16] is a multi-view spectral method based on global coordinate alignment that obtains common representations. Multi-view Non-negative Matrix Factorization (MultiNMF) [40] jointly learns non-negative latent representations from the multi-view information and obtains the clustering results from the latent common representations. Co-regularized Multi-view Spectral Clustering (Co-reg) [36] enforces the different views to learn from each other by regularizing them to be close to one another. Auto-weighted Multiple Graph Learning (AMGL) [17] automatically assigns an ideal weight to each view when integrating multi-view information. Multi-view Dimensionality co-Reduction (MDcR) [41] adopts a kernel matching constraint based on the Hilbert-Schmidt Independence Criterion to enhance the correlations across views and penalizes the disagreement between different views. AASC [42] aggregates affinity matrices, attempting to make spectral clustering more robust by relieving the influence of inefficient affinities and unrelated features. WMSC [43] exploits the spectral perturbation characteristics of spectral clustering, uses the maximum canonical angle to estimate the difference between the impacts of distinct spectral clusterings, and transforms the weight-solving problem into a standard quadratic programming problem. AWP [44] extends spectral rotation to multi-view data with Procrustes Averaging, taking the differences between the clustering capacities of different views into consideration.
C. Experiment Setting and Evaluation Metrics

To cluster the above datasets, we run all compared methods and the proposed LRC-MCF on the seven multi-view datasets. Specifically, for the single-view learning methods, we report the best result among the clustering results of the individual views, denoted SCbest and GNMFbest. For the feature concatenation method, we apply the standard Spectral Clustering method [38] to the concatenated features. For the multi-view learning methods, we run each method on the given datasets to obtain the corresponding representations and then use K-means clustering [45] to cluster all samples. For the proposed LRC-MCF, we perform clustering directly on the common similarity relationship matrix S^* produced by Algorithm 1. Since clustering is known to be sensitive to initialization, we run all methods 10 times and report the mean value of each indicator. It is worth noting that all experiments are performed in the same environment.

Given the ground-truth labels of all samples, three validation metrics are adopted to evaluate the clustering performance: accuracy (ACC), normalized mutual information (NMI), and purity (PUR). Specifically, ACC is defined as:

$$\mathrm{ACC}=\frac{\sum_{i=1}^{N}\delta\!\left(y_{i},\operatorname{map}(\hat{y}_{i})\right)}{N} \tag{30}$$

where y_i and ŷ_i are the ground-truth label and the cluster label, respectively; map(·) is the permutation function that maps cluster labels to ground-truth labels; and δ(·,·) equals one if its two inputs are equal and zero otherwise. NMI is formulated as:

$$\mathrm{NMI}=\frac{\sum_{i=1}^{K}\sum_{j=1}^{K}N_{i,j}\log\frac{N\,N_{i,j}}{N_{i}\hat{N}_{j}}}{\sqrt{\left(\sum_{i=1}^{K}N_{i}\log\frac{N_{i}}{N}\right)\left(\sum_{j=1}^{K}\hat{N}_{j}\log\frac{\hat{N}_{j}}{N}\right)}} \tag{31}$$

where N_i is the number of samples in the ith cluster, N̂_j is the number of samples in the jth class, and N_{i,j} is the number of samples in the intersection of the ith cluster and the jth class. Finally, PUR is expressed as:

$$\mathrm{PUR}=\frac{1}{N}\sum_{i=1}^{K}\max_{1\le j\le K}\left|A_{i}\cap\hat{A}_{j}\right| \tag{32}$$

where A_i and Â_j are the sets corresponding to the ith class and the jth cluster, and |·| denotes set cardinality. Note that higher values indicate better clustering performance for all evaluation metrics. Different metrics reflect different properties of the multi-view clustering task, so we report all of them to obtain a more comprehensive evaluation.
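For reproducibility, the three metrics can be computed as follows; the label mapping map(·) in Eq. (30) is realized with the Hungarian algorithm, and NMI is taken from scikit-learn rather than re-implemented from Eq. (31). The function names are ours:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_acc(y_true, y_pred):
    """ACC of Eq. (30): best one-to-one mapping of cluster to class labels."""
    D = int(max(y_pred.max(), y_true.max())) + 1
    cost = np.zeros((D, D), dtype=np.int64)
    for yp, yt in zip(y_pred, y_true):
        cost[yp, yt] += 1
    row, col = linear_sum_assignment(-cost)   # maximize matched samples
    return cost[row, col].sum() / len(y_true)

def purity(y_true, y_pred):
    """PUR of Eq. (32): fraction of samples in the majority class per cluster."""
    total = 0
    for k in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == k], return_counts=True)
        total += counts.max()
    return total / len(y_true)

nmi = normalized_mutual_info_score            # Eq. (31)
```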
D. Performance Evaluation

In this section, to show the effectiveness of LRC-MCF, extensive clustering experiments were conducted on the seven multi-view datasets (BBC, BBCSport, 3Source, Cora, Handwritten, HW2sources, and NGs); the experimental results are summarized in Tables II, III, and IV. To show the overall performance of the clustering methods, each table additionally includes a MEAN column, which averages the evaluation index over all benchmark datasets for each method; the best value on each dataset marks the superior performance.

TABLE II: ACC (%) results on seven benchmark datasets for all methods

    Method      BBC      BBCSport  3Source  Cora    Handwritten  HW2sources  NGs      MEAN
    SCbest      43.94    50.74     39.64    35.82   92.82        58.50       28.96    50.06
    GNMFbest    32.12    32.12     45.56    31.91   73.80        51.10       23.01    41.37
    SCconcat    47.26    67.22     36.45    32.14   60.03        76.24       35.98    50.76
    CCA         48.12    38.60     41.01    44.15   73.63        74.51       27.62    49.66
    MSE         39.26    54.01     44.85    31.69   79.84        73.27       27.55    50.07
    MultiNMF    43.36    50.00     52.66    35.60   88.00        76.75       33.61    54.28
    Co-reg      40.73    61.58     45.56    39.11   91.00        82.15       27.45    55.37
    AMGL        39.21    55.22     42.31    32.05   83.34        90.38       30.83    53.33
    MDcR        81.05    80.24     70.53    54.51   61.26        86.56       59.06    70.45
    AASC        38.39    62.32     36.69    33.09   84.65        83.30       71.00    58.49
    WMSC        42.92    61.21     42.60    42.25   84.50        83.40       43.00    57.12
    AWP         37.37    44.85     51.48    32.39   96.70        93.85       23.80    54.34
    LRC-MCF     87.45    80.70     78.70    51.18   88.25        99.45       98.20    83.42

TABLE III: NMI (%) results on seven benchmark datasets for all methods

    Method      BBC      BBCSport  3Source  Cora    Handwritten  HW2sources  NGs      MEAN
    SCbest      23.04    21.20     33.53    16.35   87.78        59.41        7.39    35.53
    GNMFbest    21.43    19.99     29.39    19.14   74.37        57.49        8.85    30.09
    SCconcat    35.45    52.10     35.72    19.22   61.14        80.57       22.75    43.85
    CCA         30.93    25.30     35.64    26.74   68.15        60.83       13.35    37.28
    MSE         26.93    34.71     46.61    18.58   80.13        72.17       13.58    41.82
    MultiNMF    27.57    44.69     43.98    26.28   80.25        68.69       20.37    44.55
    Co-reg      23.36    41.22     44.77    22.50   88.33        85.61        7.66    46.21
    AMGL        26.90    39.92     45.67    18.77   81.08        88.27       19.05    45.67
    MDcR        61.47    63.94     64.43    31.30   66.65        77.71       50.14    59.34
    AASC        20.14    39.11     30.21    16.63   87.48        86.14       48.02    46.81
    WMSC        22.10    47.12     42.00    27.96   87.44        86.50       24.84    48.28
    AWP         20.06    23.15     41.53    12.26   92.08        91.00        3.88    40.57
    LRC-MCF     73.62    76.00     70.50    37.85   92.03        98.63       93.92    77.50

TABLE IV: PUR (%) results on seven benchmark datasets for all methods

    Method      BBC      BBCSport  3Source  Cora    Handwritten  HW2sources  NGs      MEAN
    SCbest      49.64    55.70     56.80    40.73   93.29        62.75       29.44    55.48
    GNMFbest    33.43    32.99     56.80    38.40   75.10        59.10       24.88    45.81
    SCconcat    51.42    68.66     55.38    38.43   61.62        79.33       38.86    56.24
    CCA         52.63    46.47     54.97    50.37   73.69        74.51       29.33    54.57
    MSE         45.40    56.58     65.56    37.98   80.71        76.96       28.70    55.98
    MultiNMF    43.65    63.42     60.36    49.74   88.00        76.75       35.99    59.70
    Co-reg      49.20    66.91     62.72    45.27   91.00        86.70       28.68    61.50
    AMGL        45.40    58.93     64.97    38.30   83.34        91.22       32.96    59.30
    MDcR        81.05    80.24     76.75    56.19   67.22        86.56       69.30    73.90
    AASC        46.42    67.10     55.03    40.99   87.10        85.85       71.00    64.78
    WMSC        48.61    68.01     60.36    47.16   87.10        86.10       46.00    63.33
    AWP         45.69    51.10     63.31    35.60   96.70        93.85       24.40    58.66
    LRC-MCF     87.45    84.38     82.25    52.22   88.05        99.45       98.20    84.57
For the BBC and BBCSport datasets, the experimental results show that LRC-MCF performs better than the compared single-view and multi-view clustering methods in most situations, especially in terms of ACC and PUR. This validates the effectiveness of LRC-MCF in the domain of news clustering and shows that LRC-MCF can capture the locality relationship information and the common similarity relationships among multiple views. Besides, the multi-view methods do not always obtain better performance than the single-view methods, which implies that the diversity and complementary information is not always readily exploited.

For the 3Source and Cora datasets, the experimental results show that LRC-MCF obtains the best or highly competitive performance in terms of ACC, NMI, and PUR. Besides, most multi-view methods outperform the single-view methods, which suggests that the diversity information among multiple views benefits the clustering performance. However, these methods still fall short of LRC-MCF, possibly because the geometric and complementary information is not fully exploited.

For the Handwritten and HW2sources datasets, the experimental results also show that LRC-MCF performs better than the compared single-view and multi-view clustering methods in most situations, which validates the superiority of LRC-MCF in the domain of image clustering. This implies that the locality relationship information is an important factor in image clustering tasks; thus, exploiting the common locality information among multiple views is of great significance.

For the NGs dataset, the experimental results show that LRC-MCF obtains the best performance in terms of ACC, NMI, and PUR. This further validates the effectiveness of LRC-MCF and shows that LRC-MCF fully explores the diversity, geometric, consensus, and complementary information among different views while giving sufficient consideration to the weights of the different views when finding the common-view locality structure.

Finally, the MEAN columns of Tables II-IV confirm the comprehensive performance of LRC-MCF: it performs better than the compared clustering methods in terms of ACC, NMI, and PUR. This implies that LRC-MCF not only takes full advantage of the locality geometric structure and the similarity relationships among samples in the multi-view scenario, but also explores the diversity, consensus, and complementary information among different views. Thus, LRC-MCF obtains the best performance on the seven benchmark datasets in most situations.

E. Parameter Analysis

The proposed LRC-MCF is based on the locality relationship constraint, so in this section we mainly analyze the sensitivity to the number K of nearest neighbors used in LRC-MCF. Specifically, we conduct clustering experiments for LRC-MCF on the BBC, 3Source, Handwritten, and NGs datasets and use the average of all evaluation indexes as the final criterion. Fig. 1 plots the clustering results in terms of ACC, NMI, and PUR on these four datasets for K in {10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130}. The variation trends of ACC, NMI, and PUR over the four datasets show that our method is relatively stable with respect to K over a large range of values, which indicates that the performance is not very sensitive to this hyper-parameter. More importantly, for each hyper-parameter there exists a wide range within which relatively stable and good results are readily obtained.

F. Convergence Analysis

Since an iterative updating scheme is adopted by our framework to obtain the solution, we examine the convergence of LRC-MCF in detail. We conduct experiments on four datasets: BBCSport, Cora, Handwritten, and NGs. Fig. 2 plots the objective value of LRC-MCF against the number of iterations on these datasets. From the variation trends in Figs. 2(a)-2(d), the objective value decreases rapidly during the iterations on all four benchmark datasets, and convergence is reached within about 10 iterations; that is, LRC-MCF converges within a limited number of iterations. This also suggests that LRC-MCF can effectively reduce the redundancy of the learned representations. However, for complex and large-scale datasets, it may be necessary to combine our method with deep learning techniques [46, 47] to construct the distance function; in the future, we will consider extending superior deep models into our work and optimizing them in an end-to-end manner.

G. Discussion

As the experimental results in Tables II-IV on the multi-view clustering tasks show, LRC-MCF clearly outperforms the other compared methods in most situations.
From the above evaluations, it is readily seen that the embeddings obtained by our method are effective and well suited to multi-view features. Besides, the multi-view methods outperform the single-view methods in most situations, which shows that multi-view learning is indeed a valuable research field. Compared with single-view methods, taking the complementary information among different views into consideration achieves better performance, and the key is how to integrate the complementary information among different views while preserving the intrinsic characteristics of each view. According to Fig. 1, the performance of our method is relatively smooth with respect to the number K of nearest neighbors over a large range of values. Note that the experimental results of the proposed LRC-MCF on the seven datasets were obtained without fine-tuning, and fine-tuning might further improve its performance. As for convergence, Fig. 2 empirically verifies that the proposed LRC-MCF converges once enough iterations are completed; that is, LRC-MCF converges within a limited number of iterations.
[Fig. 1: K parameter analysis of LRC-MCF on four datasets. Panels (a)-(c) show ACC, NMI, and PUR on BBC; (d)-(f) on 3Source; (g)-(i) on Handwritten; (j)-(l) on NGs.]
[Fig. 2: Objective values of LRC-MCF over the iterations on four datasets: (a) BBCSport, (b) Cora, (c) Handwritten, (d) NGs.]

V. CONCLUSION

In this paper, we propose a novel multi-view learning method for the problem of multi-view clustering, called the Locality Relationship Constrained Multi-view Clustering Framework (LRC-MCF). LRC-MCF captures the relationship information among samples under a locality structure constraint and integrates the relationships from the different views into one common similarity relationship matrix. Moreover, LRC-MCF provides an adaptive weight allocation mechanism that gives sufficient consideration to the importance of the different views when finding the common-view similarity relationships. To effectively reduce the redundancy of the learned representations, a low-rank constraint on the common similarity matrix is additionally considered. Correspondingly, this paper provides an algorithm based on the alternating direction minimization strategy to iteratively compute all variables of LRC-MCF. Finally, we conduct extensive experiments on seven benchmark multi-view datasets to validate the performance of LRC-MCF; the experimental results show that the proposed LRC-MCF outperforms its counterparts or achieves comparable performance. In the future, we will incorporate the proposed LRC-MCF with graph-based deep models [48] to further explore multi-view information in complex situations.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions, which significantly improved the quality of this paper. Support from the National Natural Science Foundation of P.R. China (72001191), the Henan Natural Science Foundation (202300410442), and the Henan Philosophy and Social Science Program (2020CZH009) is gratefully acknowledged.

REFERENCES

[1] C. Zhang, J. Cheng, and Q. Tian, "Multi-view image classification with visual, semantic and view consistency," IEEE Transactions on Image Processing, vol. 29, pp. 617-627, 2019.
[2] H. Wang, Y. Yang, and B. Liu, "GMC: Graph-based multi-view clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 6, pp. 1116-1129, 2019.
[3] M. R. Amini, N. Usunier, and C. Goutte, "Learning from multiple partially observed views - an application to multilingual text categorization," in Advances in Neural Information Processing Systems, 2009, pp. 28-36.
[4] G. Bisson and C. Grimal, "Co-clustering of multi-view datasets: a parallelizable approach," in International Conference on Data Mining. IEEE, 2012, pp. 828-833.
[5] Y. Li, M. Yang, and Z. M. Zhang, "A survey of multi-view representation learning," IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 10, pp. 1863-1883, 2018.
[6] H. Zhao and Z. Ding, "Multi-view clustering via deep matrix factorization," in AAAI, 2017, pp. 2921-2927.
[7] C. Xu, D. Tao, and C. Xu, "Multi-view intact space learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 12, pp. 2531-2544, 2015.
[8] M. Kan, S. Shan, and X. Chen, "Multi-view deep network for cross-view classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4847-4855.
[9] H. Wang, Y. Yang, B. Liu, and H. Fujita, "A study of graph-based system for multi-view clustering," Knowledge-Based Systems, vol. 163, pp. 1009-1019, 2019.
[10] H. Wang, Y. Wang, Z. Zhang, X. Fu, L. Zhuo, M. Xu, and M. Wang, "Kernelized multiview subspace analysis by self-weighted learning," IEEE Transactions on Multimedia, 2020.
[11] Z. Zheng, L. Li, F. Shen, S. H. Tao, and S. Ling, "Binary multi-view clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1774-1782, 2018.
[12] Y. Yang and H. Wang, "Multi-view clustering: A survey," Big Data Mining and Analytics, vol. 1, no. 2, pp. 83-107, 2018.
[13] L. Fu, P. Lin, A. V. Vasilakos, and S. Wang, "An overview of recent multi-view clustering," Neurocomputing, vol. 402, pp. 148-161, 2020.
[14] Y. Yang and H. Wang, "Multi-view clustering: A survey," Big Data Mining and Analytics, vol. 1, no. 2, pp. 83-107, 2018.
[15] G. Chao, S. Sun, and J. Bi, "A survey on multi-view clustering," arXiv preprint arXiv:1712.06246, 2017.
[16] T. Xia, D. Tao, T. Mei, and Y. Zhang, "Multiview spectral embedding," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 40, no. 6, pp. 1438-1446, 2010.
[17] F. Nie, J. Li, and X. Li, "Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification," in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press, 2016, pp. 181-187.
[18] F. Nie, G. Cai, J. Li, and X. Li, "Auto-weighted multi-view learning for image clustering and semi-supervised classification," IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1501-1511, 2017.
[19] S. Huang, Z. Kang, and Z. Xu, "Self-weighted multi-view clustering with soft capped norm," Knowledge-Based Systems, vol. 158, pp. 1-8, 2018.
[20] L. Tian, F. Nie, and X. Li, "A unified weight learning paradigm for multi-view learning," in Proceedings of Machine Learning Research, vol. 89. PMLR, 2019, pp. 2790-2800.
[21] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical correlation analysis: An overview with application to learning methods," Neural Computation, vol. 16, no. 12, pp. 2639-2664, 2004.
[22] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, "Measuring statistical dependence with Hilbert-Schmidt norms," in International Conference on Algorithmic Learning Theory. Springer, 2005, pp. 63-77.
[23] J. Rupnik and J. Shawe-Taylor, "Multi-view canonical correlation analysis," in Conference on Data Mining and Data Warehouses, 2010, pp. 1-4.
[24] A. Sharma, A. Kumar, H. Daume, and D. W. Jacobs, "Generalized multiview analysis: A discriminative latent space," in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2160-2167.
[25] M. Kan, S. Shan, H. Zhang, S. Lao, and X. Chen, "Multi-view discriminant analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 188-194, 2016.
[26] D. Niu, J. Dy, and M. I. Jordan, "Iterative discovery of multiple alternative clustering views," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, pp. 1340-1353, 2014.
[27] X. Cao, C. Zhang, H. Fu, S. Liu, and H. Zhang, "Diversity-induced multi-view subspace clustering," in Computer Vision and Pattern Recognition, 2015, pp. 586-594.
[28] C. Zhang, H. Fu, Q. Hu, P. Zhu, and X. Cao, "Flexible multi-view dimensionality co-reduction," IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 648-659, 2016.
[29] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in International Conference on Machine Learning, 2010.
[30] A. Kumar and H. Daumé, "A co-training approach for multi-view spectral clustering," in Proceedings of the 28th International Conference on Machine Learning, 2011, pp. 393-400.
[31] H. Wang, J. Peng, and X. Fu, "Co-regularized multi-view sparse reconstruction embedding for dimension reduction," Neurocomputing, vol. 347, pp. 191-199, 2019.
[32] H. Wang, J. Peng, Y. Zhao, and X. Fu, "Multi-path deep CNNs for fine-grained car recognition," IEEE Transactions on Vehicular Technology, vol. 69, no. 10, pp. 10484-10493, 2020.
[33] H. Wang, J. Peng, D. Chen, G. Jiang, T. Zhao, and X. Fu, "Attribute-guided feature learning network for vehicle reidentification," IEEE MultiMedia, vol. 27, no. 4, pp. 112-121, 2020.
[34] H. Wang, J. Peng, G. Jiang, F. Xu, and X. Fu, "Discriminative feature and dictionary learning with part-aware model for vehicle re-identification," Neurocomputing, 2020.
[35] Y. Zhu, Y. Xu, F. Yu, S. Wu, and L. Wang, "CAGNN: Cluster-aware graph neural networks for unsupervised graph representation learning," arXiv preprint arXiv:2009.01674, 2020.
[36] A. Kumar, P. Rai, and H. Daume, "Co-regularized multi-view spectral clustering," in Advances in Neural Information Processing Systems, 2011, pp. 1413-1421.
[37] R. Bhatia, Matrix Analysis. Springer Science & Business Media, 2013, vol. 169.
[38] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395-416, 2007.
[39] D. Cai, X. He, J. Han, and T. S. Huang, "Graph regularized non-negative matrix factorization for data representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1548-1560, 2011.
[40] J. Liu, C. Wang, J. Gao, and J. Han, "Multi-view clustering via joint nonnegative matrix factorization," in Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM, 2013, pp. 252-260.
[41] C. Zhang, H. Fu, Q. Hu, P. Zhu, and X. Cao, "Flexible multi-view dimensionality co-reduction," IEEE Transactions on Image Processing, pp. 648-659, 2016.
[42] H.-C. Huang, Y.-Y. Chuang, and C.-S. Chen, "Affinity aggregation for spectral clustering," in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 773-780.
[43] L. Zong, X. Zhang, X. Liu, and H. Yu, "Weighted multi-view spectral clustering based on spectral perturbation," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[44] F. Nie, L. Tian, and X. Li, "Multiview clustering via adaptively weighted Procrustes," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2022-2030.
[45] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Applied Statistics, vol. 28, no. 1, 1979.
[46] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, "Deep graph contrastive representation learning," in ICML Workshop on Graph Representation Learning and Beyond, 2020.
[47] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, "Graph contrastive learning with adaptive augmentation," in Proceedings of the Web Conference 2021, 2021, pp. 2069-2080.
[48] Y. Zhu, W. Xu, J. Zhang, Q. Liu, S. Wu, and L. Wang, "Deep graph structure learning for robust representations: A survey," arXiv preprint arXiv:2103.03036, 2021.

Xiangzhu Meng received his B.S. degree from Anhui University in 2015 and his Ph.D. degree in Computer Science and Technology from Dalian University of Technology in 2021. He is now a postdoctoral researcher with the Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, China. He regularly publishes papers in prestigious journals, including Knowledge-Based Systems, Engineering Applications of Artificial Intelligence, and Neurocomputing, and serves as a reviewer for ACM Transactions on Multimedia Computing, Communications and Applications. His research interests include multi-modal learning, deep learning, and vision-language modeling.

Wei Wei received his B.S. degree from the School of Mathematics and Information Science, Henan University, in 2012 and his Ph.D. degree from the Institute of Systems Engineering, Dalian University of Technology, in 2018. He is an associate professor with the Center for Energy, Environment & Economy Research, College of Tourism Management, Zhengzhou University. His research interests include energy policy analysis, text mining, machine learning, and artificial intelligence.

Wenzhe Liu received her B.S. degree from Qingdao University of Science and Technology in 2013 and
her M.S. degree from Liaoning Normal University. She is now working toward the Ph.D. degree in the School of Computer Science and Technology, Dalian University of Technology, China. Her research interests include multi-view learning, deep learning, and data mining.