Similarity Search for Web Services
Xin Dong Alon Halevy Jayant Madhavan Ema Nemes Jun Zhang
{lunadong, alon, jayant, enemes, junzhang}@cs.washington.edu
University of Washington, Seattle
Abstract

Web services are loosely coupled software components, published, located, and invoked across the web. The growing number of web services available within an organization and on the Web raises a new and challenging search problem: locating desired web services. Traditional keyword search is insufficient in this context: the specific types of queries users require are not captured, the very small text fragments in web services are unsuitable for keyword search, and the underlying structure and semantics of the web services are not exploited. We describe the algorithms underlying the Woogle search engine for web services. Woogle supports similarity search for web services, such as finding similar web-service operations and finding operations that compose with a given one. We describe novel techniques to support these types of searches, and an experimental study on a collection of over 1500 web-service operations that shows the high recall and precision of our algorithms.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

1 Introduction

Web services are loosely coupled software components, published, located, and invoked across the web. A web service comprises several operations (see examples in Figure 1). Each operation takes a SOAP package containing a list of input parameters, fulfills a certain task, and returns the result in an output SOAP package. Large enterprises are increasingly relying on web services as a methodology for large-scale software development and sharing of services within an organization. If current trends continue, then in the future many applications will be built by piecing together web services published by third-party producers.

The growing number of web services available within an organization and on the Web raises a new and challenging search problem: locating desired web services. In fact, to address this problem, several simple search engines have recently sprung up [1, 2, 3, 4]. Currently, these engines provide only simple keyword search on web-service descriptions.

As one considers search for web services in more detail, it becomes apparent that the keyword-search paradigm is insufficient for two reasons. First, keywords do not capture the underlying semantics of web services. Current web-service search engines return a particular service if its functionality description contains the keywords in the query; such a search may miss results. For example, when searching for zipcode, web services whose descriptions contain the term zip or postal code but not zipcode will not be returned.

Second, keywords do not suffice for accurately specifying users' information needs. Since a web-service operation is going to be used as part of an application, users would like to specify their search criteria more precisely than by keywords. Current web-service search engines often enable a user to explore the details of a particular web-service operation, and in some cases to try it out by entering an input value. Nevertheless, investigating a single web-service operation often requires several browsing steps. Once users drill down all the way and find the operation inappropriate for some reason, they want to be able to find operations similar to the ones just considered, as opposed to laboriously following parallel browsing patterns. Similarly, users may want to find operations that take similar inputs (respectively, outputs), or that can compose with the current operation being browsed.

To address the challenges involved in searching for web services, we built Woogle (see http://www.cs.washington.edu/woogle), a web-service search engine. In addition to simple keyword searches, Woogle supports similarity search for web services. A user can ask for web-service operations similar to a given one, those that take similar inputs (or outputs), and those that compose with a given one.
W1: Web Service: GlobalWeather
    Operation: GetTemperature
    Input: Zip
    Output: Return
W2: Web Service: WeatherFetcher
    Operation: GetWeather
    Input: PostCode
    Output: TemperatureF, WindChill, Humidity
W3: Web Service: GetLocalTime
    Operation: LocalTimeByZipCode
    Input: Zipcode
    Output: LocalTimeByZipCodeResult
W4: Web Service: PlaceLookup
    Operation 1: CityStateToZipCode
    Input: City, State
    Output: ZipCode
    Operation 2: ZipCodeToCityState
    Input: ZipCode
    Output: City, State

Figure 1: Several example web services (not including their textual descriptions). Note that each web service includes a set of operations, each with input and output parameters. For example, web services W1 and W2 provide weather information.

This paper describes the novel techniques we have developed to support these types of searches, and experimental evidence that shows the high accuracy of our algorithms. In particular, our contributions are the following:

1. We propose a basic set of search functionalities that an effective web-service search engine should support.

2. We describe algorithms for supporting similarity search. Our algorithms combine multiple sources of evidence in order to determine similarity between a pair of web-service operations. The key ingredient of our algorithm is a novel clustering algorithm that groups names of parameters of web-service operations into semantically meaningful concepts. These concepts are then leveraged to determine the similarity of inputs (or outputs) of web-service operations.

3. We describe a detailed experimental evaluation on a set of over 1500 web-service operations. The evaluation shows that we can provide both high precision and recall for similarity search, and that our techniques substantially improve on naive keyword search.

The paper is organized as follows. Section 2 begins by placing our search problem in the context of the related work. Section 3 formally defines the similarity search problem for web services. Section 4 describes the algorithm for clustering parameter names, and Section 5 describes the similarity search algorithm. Section 6 describes our experimental evaluation. Section 7 discusses other types of search that Woogle supports, and Section 8 concludes.

2 Related Work

Finding similar web-service operations is closely related to three other matching problems: text document matching, schema matching, and software component matching.

Text document matching: Document matching and classification is a long-standing problem in information retrieval (IR). Most solutions to this problem (e.g., [10, 20, 27, 19]) are based on term-frequency analysis. However, these approaches are insufficient in the web-service context because the text documentation for web-service operations is highly compact, and they ignore the structure information that aids in capturing the underlying semantics of the operations.

Schema matching: The database community has considered the problem of automatically matching schemas [24, 12, 13, 22]. The work in this area has developed several methods that try to capture clues about the semantics of the schemas, and suggest matches based on them. Such methods include linguistic analysis, structural analysis, and the use of domain knowledge and previous matching experience. However, the search for similar web-service operations differs from schema matching in two significant ways. First, the granularity of the search is different: operation matching can be compared to finding a similar schema, while schema matching looks for similar components in two given schemas that are assumed to be related. Second, the operations in a web service are typically much more loosely related to each other than are the tables in a schema, and each web service in isolation has much less information than a schema. Hence, we are unable to adapt techniques for schema matching to this context.

Software component matching: Software component matching is considered important for software reuse. [28] formally defines the problem by examining signature (data type) matching and specification (program behavior) matching. The techniques employed there require analysis of data types and post-conditions, which are not available for web services.

Some recent work (e.g., [9, 23]) has proposed annotating web services manually with additional semantic information, and then using these annotations to compose services [8, 26]. In our context, annotating the collection of web services is infeasible, and we rely on only the information provided in the WSDL file and the UDDI entry.

In [15] the authors studied the supervised classification and unsupervised clustering of web services.
Our work differs in that we are doing unsupervised matching at the operation level, rather than supervised classification at the entire web-service level. Hence, we face the challenge of understanding the operations in a web service from a very limited amount of information.

3 Web Service Similarity Search

We begin by briefly describing the structure of web services, and then we motivate and define the search problem we address.

3.1 The Structure of Web Services

Each web service has an associated WSDL file describing its functionality and interface. A web service is typically (though not necessarily) published by registering its WSDL file and a brief description in UDDI business registries. Each web service consists of a set of operations. For each web service, we have access to the following information:

- Name and text description: A web service is described by a name, a text description in the WSDL file, and a description that is put in the UDDI registry.

- Operation descriptions: Each operation is described by a name and a text description in the WSDL file.

- Input/output descriptions: Each input and output of an operation contains a set of parameters. For each parameter, the WSDL file describes the name, data type and arity (if the parameter is of array type). Parameters may be organized in a hierarchy by using complex types.

3.2 Searching for Web Services

To motivate similarity search for web services, consider the following typical scenario. Users begin a search for web services by entering keywords relevant to the search goal. They then start inspecting some of the returned web services. Since the result of the search is rather complex, the users need to drill down in several steps. They first decide which web service to explore in detail, and then consider which specific operations in that service to look at. Given a particular operation, they will look at each of its inputs and outputs, and if the engine provides a try it feature, they will try entering some values for the inputs.

At this point, the users may find that the web service is inappropriate for some reason, but not want to have to repeat the same process for each of the other potentially relevant services. Hence, our goal is to provide a more direct method for searching, given that the users have already explored a web service in detail. Suppose they explored the operation GetTemperature in W1. We identify the following important similarity search queries they may want to pose:

Similar operations: Find operations with similar functionalities. For example, the web-service operation GetWeather in W2 is similar to the operation GetTemperature in W1. Note that we are searching for specific operations that are similar, rather than for similar web services. The latter type of search is typically too coarse for our needs. There is no formal definition of operation similarity because, just as in other types of search, similarity depends on the specific goal in the user's mind. Intuitively, we consider operations to be similar if they take similar inputs, produce similar outputs, and the relationships between the inputs and outputs are similar.

Similar inputs/outputs: Find operations with similar inputs. As a motivating example for such a search, suppose our goal is to collect a variety of information about locations. While W1 provides the weather, the operations LocalTimeByZipCode in W3 and ZipCodeToCityState in W4 provide other information about locations, and thereby may be of interest to the user.

Alternatively, we may want to search for operations with similar outputs, but different inputs. For example, we may be looking for temperature, but the operation we are considering takes a zipcode as input, while we need one that takes a city and state as input.

Composable operations: Find operations that can be composed with the current one. One of the key promises of building applications with web services is that one should be able to compose a set of given services to create ones that are specific to the application's needs. In our example, there are two opportunities for composition. In the first case, the output of an operation is similar to the input of the given operation, such as CityStateToZipCode in W4. Composing CityStateToZipCode with GetWeather in W1 offers another option for getting the weather when the zipcode is not known. In the second case, the output of the given operation may be similar to the input of another operation; e.g., one that transforms between Centigrade and Fahrenheit and thereby produces results in the desired scale.

In this paper we focus on the following two problems, from which we can easily build up the above search capabilities.

Operation matching: Given a web-service operation, return a list of similar operations.

Input/output matching: Given the input (respectively, output) of a web-service operation, return a list of web-service operations with similar inputs (respectively, outputs).
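As an illustration of the information listed in Section 3.1, the following sketch extracts operation names and input/output parameter names from a WSDL description. The WSDL fragment is hypothetical and heavily simplified (real WSDL files carry namespace prefixes on message references, type definitions, bindings, and documentation elements that are omitted here):

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical WSDL 1.1 fragment for the GetTemperature
# operation of Figure 1; real descriptions are considerably richer.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="GetTemperatureIn"><part name="Zip"/></message>
  <message name="GetTemperatureOut"><part name="Return"/></message>
  <portType name="GlobalWeather">
    <operation name="GetTemperature">
      <input message="GetTemperatureIn"/>
      <output message="GetTemperatureOut"/>
    </operation>
  </portType>
</definitions>"""

NS = "{http://schemas.xmlsoap.org/wsdl/}"

def extract_operations(wsdl_text):
    """Return {operation name: (input parameter names, output parameter names)}."""
    root = ET.fromstring(wsdl_text)
    # Map each message name to the names of its parts (parameters).
    messages = {m.get("name"): [p.get("name") for p in m.findall(NS + "part")]
                for m in root.findall(NS + "message")}
    ops = {}
    for pt in root.findall(NS + "portType"):
        for op in pt.findall(NS + "operation"):
            inp = op.find(NS + "input").get("message")
            out = op.find(NS + "output").get("message")
            ops[op.get("name")] = (messages[inp], messages[out])
    return ops

print(extract_operations(WSDL))
# {'GetTemperature': (['Zip'], ['Return'])}
```

The helper names (`extract_operations`, `NS`) are ours, not part of any web-service toolkit; the point is only that the name/parameter information Woogle relies on is mechanically available.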
We note that these two problems are also at the core of two other types of search that Woogle supports (see Section 7): template search and composition search. Template search goes beyond keyword search by specifying the functionality, input and output of a desired operation. Composition search returns not only single operations, but also compositions of operations that fulfill the user's need.

3.3 Overview of Our Approach

Similarity search for web services is challenging because neither the textual descriptions of web services and their operations nor the names of the input and output parameters completely convey the underlying semantics of the operation. Nevertheless, knowledge of the semantics is important for determining the similarity between operations.

Broadly speaking, our algorithm combines multiple sources of evidence to determine similarity. In particular, it considers similarity between the textual descriptions of the operations and of the entire web services, and similarity between the parameter names of the operations. The key ingredient of the algorithm is a technique that clusters parameter names in the collection of web services into semantically meaningful concepts. By comparing the concepts that input or output parameters belong to, we are able to achieve good similarity measures. Section 4 describes the clustering algorithm, and Section 5 describes how we combine the multiple sources of evidence.

4 Clustering Parameter Names

To effectively match the inputs/outputs of web-service operations, it is crucial to get at their underlying semantics. However, this is hard for two reasons. First, parameter naming is dependent on the developers' whim. Parameter names tend to be highly varied given the use of synonyms, hypernyms, and different naming rules. They might not even be composed of proper English words: there may be misspellings, abbreviations, etc. Therefore, lexical references, such as WordNet [5], are hard to apply. Second, inputs/outputs typically have few parameters, and the associated WSDL files rarely provide rich descriptions for parameters. Traditional IR techniques, such as TF/IDF [25] and LSI [11], rely on word frequencies to capture the underlying semantics and thus do not apply well.

A parameter name is typically a sequence of concatenated words (not necessarily proper English words), with the first letter of every word capitalized (e.g., LocalTimeByZipCodeResult). Such words are referred to as terms. We exploit the co-occurrence of terms in web-service inputs and outputs to cluster terms into meaningful concepts. As we shall see later, using these concepts, in addition to the original terms, greatly improves our ability to identify similar inputs/outputs and hence find similar web-service operations.

Applying an off-the-shelf text clustering algorithm directly to our context does not perform well because web-service inputs/outputs are sparse. For example, whereas synonyms tend to occur in the same document in an IR application, they seldom occur in the same operation input/output; therefore, they will not get clustered. Our clustering algorithm is a refinement of agglomerative clustering. We begin by describing a particular kind of association rule that captures our notion of term co-occurrence, and then describe the clustering algorithm.

4.1 Clustering Parameters by Association

We base our clustering on the following heuristic: parameters tend to express the same concept if they often occur together. This heuristic is validated by our experimental results. We use it to cluster parameters by exploiting their conditional probabilities of occurrence in the inputs and outputs of web-service operations. Specifically, we are interested in association rules of the form:

    t1 → t2 (s, c)

In this rule, t1 and t2 are two terms. The support, s, is the probability that t1 occurs in an input/output; i.e., s = P(t1) = ||IO_t1|| / ||IO||, where ||IO|| is the total number of inputs and outputs of operations, and ||IO_t1|| is the number of inputs and outputs that contain t1. The confidence, c, is the probability that t2 occurs in an input or output, given that t1 is known to occur in it; i.e., c = P(t2 | t1) = ||IO_t1,t2|| / ||IO_t1||, where ||IO_t1,t2|| is the number of inputs and outputs that contain both t1 and t2. Note that the rule t1 → t2 (s12, c12) and the rule t2 → t1 (s21, c21) may have different support and confidence values. These rules can be efficiently computed using the A-Priori algorithm [7].
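The term extraction and the association rules just defined can be computed directly. The sketch below is illustrative only (the function names and the toy data are ours, not the paper's): it tokenizes concatenated parameter names into terms, then derives support and confidence exactly as defined above.

```python
import re
from itertools import permutations

def terms(parameter_name):
    """Split a concatenated parameter name such as 'LocalTimeByZipCodeResult'
    into lower-cased terms: ['local', 'time', 'by', 'zip', 'code', 'result']."""
    return [w.lower()
            for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", parameter_name)]

def association_rules(ios):
    """ios: a list of inputs/outputs, each a list of parameter names.
    Returns {(t1, t2): (support, confidence)} for every ordered pair of
    co-occurring terms, following s = ||IO_t1||/||IO|| and
    c = ||IO_t1,t2||/||IO_t1||."""
    term_sets = [set(t for p in io for t in terms(p)) for io in ios]
    n = len(term_sets)
    occurs = {}                      # ||IO_t|| for every term t
    both = {}                        # ||IO_t1,t2|| for co-occurring pairs
    for ts in term_sets:
        for t in ts:
            occurs[t] = occurs.get(t, 0) + 1
        for t1, t2 in permutations(ts, 2):
            both[(t1, t2)] = both.get((t1, t2), 0) + 1
    return {(t1, t2): (occurs[t1] / n, k / occurs[t1])
            for (t1, t2), k in both.items()}

# Toy data: zip and code always co-occur via ZipCode, so code -> zip
# has confidence 1.0, while zip -> code does not.
rules = association_rules([["ZipCode"], ["ZipCode"], ["Zip", "City", "State"]])
print(rules[("code", "zip")])  # (0.6666666666666666, 1.0)
```

A production version would discard rules below the support threshold ts and use an A-Priori-style pass structure for scale, as the paper does; the brute-force enumeration above is only meant to make the definitions concrete.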
4.2 Criteria for Ideal Clustering

Ideally, parameter clustering results should have the following two features:

1. Frequent and rare parameters should be left unclustered; strongly connected parameters in between are clustered into concepts. First, not clustering frequent parameters is consistent with the IR community's observation that such a technique leads to the best performance in automatic query expansion [16]. Second, leaving rare parameters unclustered avoids over-fitting.

2. The cohesion of a concept (the connections between parameters inside the concept) should be strong; the correlation between concepts (the connections between parameters in different concepts) should be weak.

Traditionally, cohesion is defined as the sum of squares of the Euclidean distances from each point to the center of the cluster it belongs to; correlation is defined as the sum of squares of the distances between cluster centers [14]. This definition does not apply well in our context because of "the curse of dimensionality": our feature sets are so large that a Euclidean distance measure is no longer meaningful. We hence quantify the cohesion and correlation of clusters based on our association rules.

We say that t1 is closely associated with t2 if the rule t1 → t2 has a confidence greater than a threshold tc. The threshold tc is chosen manually to be the value that best separates correlated and uncorrelated pairs of terms.

Given a cluster I, we define the cohesion of I as the percentage of closely associated term pairs over all term pairs. Formally,

    coh_I = ||{(i, j) | i, j ∈ I, i ≠ j, i → j(c > tc)}|| / (||I|| · (||I|| − 1))

where i → j(c > tc) is the association rule for the terms i and j. As a special case, the cohesion of a single-term cluster is 1.

Given clusters I and J, we define the correlation between I and J as the percentage of closely associated cross-cluster term pairs. Formally,

    cor_IJ = (C(I, J) + C(J, I)) / (2 · ||I|| · ||J||)

where C(I, J) = ||{(i, j) | i ∈ I, j ∈ J, i → j(c > tc)}||.

To measure the overall quality of a clustering C, we define the cohesion/correlation score as

    score_C = ( Σ_{I∈C} coh_I / ||C|| ) / ( Σ_{I,J∈C, I≠J} cor_IJ / (||C||(||C||−1)/2) )
            = ( (||C||−1) · Σ_{I∈C} coh_I ) / ( 2 · Σ_{I,J∈C, I≠J} cor_IJ )

The cohesion/correlation score captures the trade-off between having a high cohesion score and a low correlation score. Our goal is to obtain a high score_C, which indicates tight connections inside clusters and loose connections between clusters.

4.3 Clustering Algorithm

We can now describe our clustering algorithm as a series of refinements to classical agglomerative clustering [18].

4.3.1 The basic agglomeration algorithm

Agglomerative clustering is a bottom-up version of hierarchical clustering. Each object is initialized to be a cluster of its own. In general, at each iteration the two most similar clusters are merged, until no more clusters can be merged.

In our context, each term is initialized to be a cluster of its own; i.e., there are as many clusters as terms. The algorithm proceeds in a greedy fashion. It sorts the association rules in descending order, first by confidence and then by support. Infrequent rules with less than a minimum support ts are discarded. At every step, the algorithm chooses the highest-ranked rule that has not been considered previously. If the two terms in the rule belong to different clusters, the algorithm merges the clusters. Formally, the condition that triggers merging clusters I and J is

    ∃i ∈ I, j ∈ J . i → j(s > ts, c > tc)

where i and j are terms. The threshold ts is chosen to control the clustering of terms that do not occur frequently. We note that in our experiments the results of operation and input/output matching are not sensitive to the values of ts and tc.

4.3.2 Increasing cluster cohesion

The basic agglomerative algorithm merges two clusters when any two terms in the two clusters are closely associated. This merge condition is very loose and can easily result in low cohesion of clusters. To illustrate, suppose there is a concept for weather, containing temperature as a term, and a concept for address, containing zip as a term. If, when operations report the temperature, they often report the area zipcode as well, then the confidence of the rule temperature → zip is high. As a result, the basic algorithm will inappropriately combine the weather concept and the address concept.

The cohesion of a cluster is determined by the association of each pair of terms in the cluster. To ensure that we obtain clusters with high cohesion, we merge two clusters only if they satisfy a stricter condition, called the cohesion condition.

Given a cluster C, a term is called a kernel term if it is closely associated with at least half of the remaining terms in C. (We tried different values for this fraction and found that 1/2 yielded the best results.) Our cohesion condition requires that all the terms in the merged cluster be kernel terms. Formally, we merge two clusters I and J only if they satisfy the cohesion condition:

    ∀i ∈ I ∪ J . ||{j | j ∈ I ∪ J, i ≠ j, i → j(c > tc)}|| ≥ (1/2)(||I|| + ||J|| − 1)
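The quantities of Section 4.2 and the cohesion condition above can be made concrete with a small sketch. We assume the "closely associated" relation is already given as a set of ordered term pairs (rules whose confidence exceeds tc); the function names and test data are ours, purely illustrative.

```python
def cohesion(cluster, assoc):
    """coh_I: the fraction of ordered pairs (i, j), i != j, of cluster terms
    that are closely associated; defined as 1 for a single-term cluster."""
    if len(cluster) == 1:
        return 1.0
    pairs = [(i, j) for i in cluster for j in cluster if i != j]
    return sum(p in assoc for p in pairs) / len(pairs)

def correlation(I, J, assoc):
    """cor_IJ = (C(I, J) + C(J, I)) / (2 |I| |J|), where C(I, J) counts
    closely associated pairs directed from I to J."""
    c_ij = sum((i, j) in assoc for i in I for j in J)
    c_ji = sum((j, i) in assoc for i in I for j in J)
    return (c_ij + c_ji) / (2 * len(I) * len(J))

def score(clustering, assoc):
    """score_C = (|C| - 1) * (sum of cohesions) / (2 * sum of pairwise
    correlations); infinite when all correlations are zero."""
    k = len(clustering)
    coh = sum(cohesion(I, assoc) for I in clustering)
    cor = sum(correlation(clustering[a], clustering[b], assoc)
              for a in range(k) for b in range(a + 1, k))
    return float("inf") if cor == 0 else (k - 1) * coh / (2 * cor)

def satisfies_cohesion_condition(I, J, assoc):
    """True iff every term of the merged cluster would be a kernel term,
    i.e. is closely associated with at least half of the remaining terms."""
    union = list(I) + list(J)
    n = len(union)
    return all(2 * sum((i, j) in assoc for j in union if j != i) >= n - 1
               for i in union)
```

For instance, with assoc = {("a", "b"), ("b", "a"), ("a", "c")}, the clustering [["a", "b"], ["c"]] has cohesions 1.0 and 1.0, cross-cluster correlation 0.25, and hence a score of 4.0.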
Figure 2: Splitting and merging clusters. (a) I′ = I and J′ = J: merge I and J directly. (b) I′ ≠ I and J′ = J: split I into I′ and I − I′, then merge I′ with J. (c) I′ ≠ I and J′ ≠ J: split both I and J, then merge I′ with J′.

4.3.3 Splitting and Merging

A greedy algorithm pursues a locally optimal solution at each step, but usually cannot obtain the globally optimal solution. In parameter clustering, an inappropriate clustering decision at an early stage may prevent subsequent appropriate clustering. Consider the case where there is a cluster for zipcode, {zip, code}, formed because of the frequent occurrences of the parameter ZipCode. Later we need to decide whether to merge this cluster with another cluster for address, {state, city, street}. The term zip is closely associated with state, city and street, but code is not, because it also occurs often in other parameters, such as TeamCode and ProxyCode, which typically do not co-occur with state, city or street. Consequently, the two clusters cannot merge; the clustering result contrasts with the ideal one: {state, city, street, zip} and {code}.

The solution to this problem is to split already-formed clusters so as to obtain a better set of clusters with a higher cohesion/correlation score. Formally, given clusters I and J, we denote

    I′ = {i | i ∈ I, ||{j | j ∈ I ∪ J, i → j(c > tc)}|| ≥ (1/2)(||I|| + ||J|| − 1)}
    J′ = {j | j ∈ J, ||{i | i ∈ I ∪ J, j → i(c > tc)}|| ≥ (1/2)(||I|| + ||J|| − 1)}    (1)

Intuitively, I′ (respectively, J′) denotes the set of terms in I (respectively, J) that are closely associated with the terms in the union of I and J. Our algorithm makes its splitting decision depending on which of the following cases occurs:

- If I′ = I and J′ = J, then I and J can be merged directly (see Figure 2(a)).

- If I′ ≠ I and J′ = J, then merging I and J directly disobeys the cohesion condition. There are two options: one is to split I into I′ and I − I′, and then merge I′ with J (see Figure 2(b)); the other is not to split or merge. We decide in two steps: the first step checks whether the merged result in the first option satisfies the cohesion condition; if so, the second step computes the cohesion/correlation score for each option, and chooses the option with the higher score. The decision is similar for the case where J′ ≠ J and I′ = I.

- If I′ ≠ I and J′ ≠ J, then, again, merging I and J directly disobeys the cohesion condition. There are two options: one is to split I into I′ and I − I′, split J into J′ and J − J′, and then merge I′ with J′ (see Figure 2(c)); the other is not to split or merge. We choose an option in two steps: the first step checks whether, in the first option, the merged result satisfies the cohesion condition; if so, the second step computes the cohesion/correlation score for each option, and chooses the option with the higher score.

After the above processing, the merged cluster necessarily satisfies the cohesion condition. However, the clusters that are split off from the original clusters may not. To ensure cohesion, we further split such clusters: each time, we split the cluster into two, one containing all the kernel terms, and the other containing the rest. We repeat splitting until eventually all resulting clusters satisfy the cohesion condition. Note that applying such a splitting strategy to an arbitrary cluster may generate clusters of small size. Therefore, we do not merge two clusters directly (without applying the above judgment) and then split the merged cluster.

Remark 4.1. Our splitting-and-clustering technique is different from the dynamic modeling in the Chameleon algorithm [17], which also first splits and then merges. We do splitting and clustering at each step of the greedy algorithm. The Chameleon algorithm first considers the whole set of parameters as one big cluster and splits it into relatively small sub-clusters, and then repeatedly merges these sub-clusters.

4.3.4 Removing noise

Even with splitting, the results may still contain terms that do not express the same concept as the other terms in their cluster. We call such terms noise terms. To illustrate how noise terms can be formed, we continue with the zipcode example. Suppose there is a cluster for address, {city, state, street, zip, code}, where code is a noise term. The cluster is formed because the rules zip → city, zip → state, and zip → street all have very high confidence, e.g., 90%; even if the rule code → zip has a lower confidence, e.g., 50%, the rules code → city, code → state, and code → street can still have high confidence.

We use the following heuristic to detect noise terms. A term is considered to be noise if in half of its occurrences there are no other terms from the same concept. After one pass of the greedy algorithm (considering all association rules above a given threshold), we scan the resulting concepts to remove noise terms. Formally, for a term t, denote ||IO_t|| as the number of inputs/outputs that contain t, and ||SIO_t|| as the number of inputs/outputs that contain t but no other terms in the same concept as t. We remove t from the concept if ||SIO_t|| ≥ (1/2)||IO_t||.
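The noise-detection heuristic can be sketched as follows; the function and the toy data are illustrative (inputs/outputs are represented simply as sets of terms):

```python
def remove_noise(concept, ios):
    """Scan a concept and drop its noise terms.  A term t is noise when
    ||SIO_t|| >= (1/2)||IO_t||, i.e. at least half of the inputs/outputs
    containing t contain no other term of the concept.  ios is a list of
    term sets, one per input/output."""
    concept = set(concept)
    noise = set()
    for t in concept:
        io_t = [io for io in ios if t in io]                       # IO_t
        sio_t = [io for io in io_t if not (io & (concept - {t}))]  # SIO_t
        if io_t and 2 * len(sio_t) >= len(io_t):
            noise.add(t)
    return concept - noise

# The paper's example: code co-occurs with zip, but also appears without
# any address term (via parameters like TeamCode and ProxyCode), so it
# is detected as noise and removed from the address concept.
ios = [{"zip", "city", "state"}, {"zip", "street"}, {"code", "zip"},
       {"code", "team"}, {"code", "proxy"}]
print(sorted(remove_noise({"city", "state", "street", "zip", "code"}, ios)))
# ['city', 'state', 'street', 'zip']
```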
377procedure MergeParameters(T , R) return (C) 4.4 Clustering Results
// T is the term set, R is the association rule set
// C is the result concept set We now briefly outline the results of our clustering
for (i = 1, n) Ci = {ti }; //initiate clusters algorithm. Our dataset, which we will describe in
sort R first by the descending order of confidence, detail in Section 6, contains 431 web services and
then by the descending order of support value; 3148 inputs/outputs. There are a total of 1599
for each (r : t1 → t2 (s > ts , c > tc ) in R) terms. The clustering algorithm converges after the
if t1 and t2 are in different clusters I and J
seventh run. It clusters 943 terms into 182 concepts.
Compute I 0 and J 0 according to formula (1);
if (I 0 = I ∧ J 0 = J) merge I and J; The rest 656 terms, including 387 infrequent terms
else if (splitting and merging satisfies the (each occurs in at most 3 inputs/outputs) and 54
cohesion condition and has a higher scoreC ) frequent terms (each occurs in at least 30 of the
split and merge; inputs/outputs) are left unclustered. There are 59
    if (I'' = I − I' and/or J'' = J − J'
        does not observe the cohesion condition)
        split I'' and/or J'' iteratively;
    scan inputs/outputs and remove noise terms;
    return result clusters;

Figure 3: Algorithm for parameter clustering

||SIO_t|| as the number of inputs/outputs that contain t but no other terms in the same concept as t. We remove t from the concept if ||SIO_t|| ≥ ½ ||IO_t||.

4.3.5 Putting it all together

Figure 3 puts all the pieces together, and shows the details of a single pass of the clustering algorithm.

The above algorithm still has two problems. First, the cohesion condition is too strict for large clusters, so it may prevent closely associated large clusters from merging. Second, early inappropriate merging may prevent later appropriate merging: although we do splitting, the terms taken off the original clusters may have already missed the chance to merge with other closely associated terms. We solve both problems by running the clustering algorithm iteratively. After each pass, we replace each term with its corresponding concept, re-collect association rules, and then re-run the clustering algorithm. This process continues until no more clusters can be merged.

We illustrate with an example that iterating the clustering does not sharply loosen the clustering condition. Consider the case where {zip} is not clustered with {temperature, windchill, humidity}, because zip is closely associated only with temperature, but not with the other two. Another iteration of clustering will replace each occurrence of temperature, windchill and humidity with a single concept, say weather. The term zip will be closely associated with weather; however, the term weather is not necessarily closely associated with zip, because that requires zip to occur often when any of temperature, windchill, or humidity occurs. Thus, the iteration will (correctly) keep the two clusters.

… dense clusters, each with at least 5 terms. Some of them correspond roughly to the concepts of address, contact, geology, maps, weather, finance, commerce, statistics, and baseball. The overall cohesion is 0.96, the correlation is 0.003, and the average cohesion for the dense clusters is 0.76. This result satisfies the two features of an ideal clustering.

5 Finding Similar Operations

In this section we describe how to predict the similarity of input/output sets and of web-service operations. We determine similarity by combining multiple sources of evidence. The intuition behind our matching algorithm is that the similarity of a pair of inputs (or outputs) is related to the similarity of the parameter names, that of the concepts represented by the parameter names, and that of the operations they belong to. Note that parameter-name similarity compares inputs/outputs at a fine-grained level, and concept similarity compares inputs/outputs at a coarse-grained level. The similarity between two web-service operations is related to the similarity of their descriptions, that of their inputs and outputs, and that of their host web services.

Input/output similarity: We identify the input i of a web-service operation op with a vector i = (p_i, c_i, op), where p_i is the set of input parameter names, and c_i is the set of concepts associated with the parameter names (as determined by the clustering algorithm described in Section 4). When comparing a pair of inputs, we determine the similarity of each of the three components separately, and then combine them. We treat op's output o as a vector o = (p_o, c_o, op), and process it analogously.

Web-service operation similarity: We identify a web-service operation op with a vector op = (w, f, i, o), where w is the text description of the web service to which op belongs, f is the textual description of op, and i and o denote the input and output parameters. Here too, we determine similarity by combining the similarities of the individual components of the vector.
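The vector representations above can be made concrete with a small sketch. The data structures, the Jaccard stand-in for the TF/IDF bag similarity, and the weights are all illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class IOSet:
    params: set    # parameter-name terms, e.g. {"zip", "code"}
    concepts: set  # concepts assigned by the clustering step, e.g. {"address"}

@dataclass
class Operation:
    ws_desc: set   # w: bag of words of the host web service description
    desc: set      # f: bag of words of the operation description
    input: IOSet   # i = (p_i, c_i, op)
    output: IOSet  # o = (p_o, c_o, op)

def bag_sim(a, b):
    """Stand-in for TF/IDF bag-of-words similarity (here: Jaccard)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def io_sim(x, y):
    """Combine fine-grained (parameter-name) and coarse-grained (concept)
    evidence; the 0.5/0.5 split is an arbitrary illustrative choice."""
    return 0.5 * bag_sim(x.params, y.params) + 0.5 * bag_sim(x.concepts, y.concepts)

def operation_sim(x, y, weights=(0.2, 0.3, 0.25, 0.25)):
    """Linear combination of the four component similarities of op = (w, f, i, o)."""
    w_ws, w_f, w_i, w_o = weights
    return (w_ws * bag_sim(x.ws_desc, y.ws_desc)
            + w_f * bag_sim(x.desc, y.desc)
            + w_i * io_sim(x.input, y.input)
            + w_o * io_sim(x.output, y.output))
```

With weights summing to one, an operation's similarity to itself is 1, and every component contributes in proportion to its weight.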
Observe that there is a recursive relationship between the similarity of inputs/outputs and the similarity of web-service operations. Intuitively, this relationship holds because each one depends on the other, and any decision on how to break this recursive relationship would be arbitrary. In Section 5.2 we show that with sufficient care in the choice of the combination weights, we can guarantee that the recursive computation converges.

5.1 Computing Individual Similarities

We now describe how we compute similarities for each one of the components of the vectors.

Input/output parameter name similarity: We consider the terms in an input/output as a bag of words and use the TF/IDF (Term Frequency/Inverse Document Frequency) measure [25] to compute the similarity of two such bags.

To improve our accuracy, we pre-process the terms as follows.

1. Perform word stemming and remove stopwords. Stemming improves recall by removing term suffixes and reducing all forms of a term to a single stemmed form. Stopword removal improves precision by eliminating words with little substantive meaning.

2. Group terms with close edit distance [21] and replace the terms in a group with a normalized form. This step helps normalize misspelled and abbreviated terms.

3. Remove from the output bag the terms that refer to the inputs. For example, in the output parameter LocalTimeByZipCodeResult, the term By indicates that the following terms describe inputs; thus, the terms Zip and Code can be removed.

4. Extract additional information from the names of web-service operations. Most operations are named after the output (e.g., GetWeather), and some include input information (e.g., ZipCodeToCityState). We put such terms into the corresponding input/output bag.

Input/output concept similarity: To compute the similarity of the concepts represented by the inputs/outputs, we replace each term in the bag of words described above with its corresponding concept, and then use the TF/IDF measure. Note that the clustering algorithm is applied to the input/output terms after preprocessing.

Operation description similarity: To compute the similarity of operation descriptions, we consider the tokenized operation name and WSDL documentation as a bag of words, and use the TF/IDF measure. Furthermore, we supplement this information by adding the terms in their inputs and outputs to the bag of words.

Web service description similarity: To compute the similarity of web service descriptions, we create a bag of words from the following: the tokenized web service name, the WSDL documentation and UDDI description, the tokenized names of the operations in the web service, and their input and output terms. We again apply TF/IDF to this bag of words.

5.2 Combining Individual Similarities

We use a linear combination to combine the similarities of the components of an operation. Each type of similarity is assigned a weight that depends on its relevance to the overall similarity. Currently we set the weights manually based on our analysis of the results from different trials. Learning these weights from direct or indirect user feedback is a subject of future work.

As noted earlier, there is a recursive dependency between the similarity of operations and that of inputs/outputs. We prove that computing the recursive similarities ultimately converges.

Proposition 1. Computing operation similarity and input/output similarity converges. □

Proof (Sketch): Let S_op, S_i and S_o be the similarities of operations, of inputs, and of outputs. Let w_i and w_o be the weights of the input and output similarities in computing the operation similarity, and let w_op be the weight of the operation similarity in computing the input/output similarity.

We start by assigning zero to the operation similarity, and based upon it compute the input/output similarity and the operation similarity iteratively. We can prove that if z = w_op(w_i + w_o) < 1, the computation converges, and the results are:

    S_op^(∞) = S_op^(0) · 1/(1 − z)
    S_i^(∞)  = S_i^(0) + S_op^(0) · w_op/(1 − z)
    S_o^(∞)  = S_o^(0) + S_op^(0) · w_op/(1 − z)

where S_op^(0), S_i^(0) and S_o^(0) are the results of the first round, and S_op^(∞), S_i^(∞) and S_o^(∞) are the converged results. □

6 Experimental Evaluation

We now describe a set of experiments that validate the performance of our matching algorithms. Our goal is to show that we produce high precision and recall on similarity queries, and to investigate the contribution of the different components of our method.
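The convergence claim of Proposition 1 can be checked numerically with a toy fixed-point iteration; the weights and first-round values below are invented for illustration:

```python
def iterate_similarities(s_op0, s_i0, s_o0, w_i, w_o, w_op, rounds=200):
    """Iterate the mutually recursive similarity updates of Section 5.2.

    s_op0, s_i0, s_o0 are the first-round similarities (computed with the
    operation similarity initialized to zero); w_i and w_o weight the input
    and output similarities inside the operation similarity, and w_op weights
    the operation similarity inside the input/output similarities."""
    z = w_op * (w_i + w_o)
    assert z < 1.0, "convergence requires w_op * (w_i + w_o) < 1"
    s_op = s_op0
    for _ in range(rounds):
        # Substituting S_i = S_i^(0) + w_op*S_op and S_o = S_o^(0) + w_op*S_op
        # into the operation update collapses to S_op <- S_op^(0) + z * S_op.
        s_op = s_op0 + z * s_op
    s_i = s_i0 + w_op * s_op
    s_o = s_o0 + w_op * s_op
    return s_op, s_i, s_o
```

Because z < 1, the iterate is a contraction and approaches the closed forms of the proof sketch: S_op → S_op^(0)/(1 − z) and S_i → S_i^(0) + S_op^(0)·w_op/(1 − z).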
6.1 Experimental Setup

We implemented a web-service search engine, called Woogle, that has access to 790 web services from the main authoritative UDDI repositories. The coverage of Woogle is comparable to that of the other web-service search engines [1, 2, 3, 4]. We ran our experiments on the subset of web services whose associated WSDL files are accessible from the web, so we can extract information about their functionality descriptions, inputs and outputs. This set contains 431 web services, with 1574 operations in total.

Woogle performs parameter clustering, operation matching and input/output matching offline, and stores the results in a database. TF/IDF was implemented using the publicly available Rainbow [6] classification tool.

Our experiments compared our method, which we refer to as Woogle, with two naive algorithms, Func and Comb. The Func method matches operations by comparing only the words in the operation names and text documentation. The Comb method considers the words mentioned in the web service names, descriptions and parameter names as well; in contrast to Woogle, these words are all put into a single bag of words.

[Figure 4: Top-k precision for Woogle similarity search. (a) Top-2, top-5 and top-10 precision of Func, Comb, and Woogle on operation matching; (b) top-2 and top-5 precision for the similar-input, similar-output, compose-with-input, and compose-with-output lists.]

Performance Measure: We measured overall performance using recall (r), precision (p), R-precision (p_r) and top-k precision (p_k). Consider these measures for operation matching. Let Rel be the set of relevant operations, Ret the set of returned operations, Retrel the set of returned relevant operations, and Retrel_k the set of relevant operations among the top k returned operations. We define

    p = |Retrel| / |Ret|,   r = |Retrel| / |Rel|,
    p_k = |Retrel_k| / k,   p_r = p_|Rel| = |Retrel_|Rel|| / |Rel|.

Among the above measures, p_r is considered to capture the precision and ranking quality of a system most precisely. We also plotted the recall/precision curve (R-P curve). In an R-P curve figure, the X-axis represents recall and the Y-axis represents precision. An ideal search engine has a horizontal curve with a high precision value; a bad search engine has a horizontal curve with a low precision value. The R-P curve is considered by the IR community to be the most informative graph of the effectiveness of a search engine.

6.2 Measuring Precision

Given a web service, Woogle generates five lists: similar operations, operations with similar inputs, operations with similar outputs, operations that compose with the output of the given operation, and operations that compose with the input of the given operation. We evaluated the precision of these returned lists, and report the average top-2, top-5 and top-10 precision.

We selected a benchmark of 25 web-service operations for which we tried to obtain similar operations from our entire collection. When selecting these, we ensured that they come from a variety of domains and that they have different input/output sizes and description sizes. To ensure that the top-10 precision is meaningful, we selected only operations for which Woogle and Comb both returned more than 10 relevant operations. (Func may return fewer than 10 relevant operations because it typically obtains result sets of very small size.)

Figure 4(a) shows the results for top-k precision on operation matching. The top-2, top-5, and top-10 precisions of Woogle are 98%, 83%, and 68% respectively, higher than those of the two naive methods by 10 to 30 percentage points. This demonstrates that considering different sources of evidence, and considering them separately, increases precision. We also observe that Comb has higher top-2 and top-5 precision than Func, but its top-10 precision is lower. This demonstrates that considering more evidence by simple combination does not greatly enhance performance.
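The four measures defined under "Performance Measure" above can be sketched as follows (the query data in the usage is invented for illustration):

```python
def search_metrics(returned, relevant, k=5):
    """Precision, recall, top-k precision and R-precision for one query.

    `returned` is the ranked result list; `relevant` is the set of items
    judged relevant (Rel)."""
    relevant = set(relevant)
    retrel = [x for x in returned if x in relevant]    # Retrel
    p = len(retrel) / len(returned)                    # precision
    r = len(retrel) / len(relevant)                    # recall
    p_k = sum(1 for x in returned[:k] if x in relevant) / k
    R = len(relevant)                                  # R-precision = p_|Rel|
    p_r = sum(1 for x in returned[:R] if x in relevant) / R
    return p, r, p_k, p_r
```

For example, with the ranked list ["a", "b", "c", "d"] and relevant set {"a", "c", "e"}, precision is 1/2, recall is 2/3, top-2 precision is 1/2, and R-precision (top 3) is 2/3.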
[Figure 5: Performance for different operation matchers. (a) Precision, recall and R-precision; (b) R-P curves. Matchers compared: Func, Comb, FuncWS, FuncIO, ParOnly, ConOnly, Woogle.]

[Figure 6: Performance of different input/output matchers. (a) Precision, recall and R-precision; (b) R-P curves. Matchers compared: ParIO, ConIO, ParConIO, Woogle.]
Figure 4(b) shows the precision for the four other returned lists. Note that we report only the top-2 and top-5 precision, as these lists are much smaller in size. From the 25-operation test set, we selected 20 where both the input and output parameters are non-empty, and the sizes of the returned lists are not too short. Figure 4(b) shows that for the majority of the four lists, the top-2 and top-5 precisions are between 80% and 90%.

6.3 Measuring Recall

In order to measure the recall of similarity search, we need to know the set of all operations in the collection that are relevant to a given operation. For this purpose, we created a benchmark of 8 operations from six different domains: weather (2), address (2), stock (1), sports (1), finance (1), and time (1) (weather and address are two major domains in the web service corpus). We chose operations of different popularity: four of them have more than 30 similar operations each, and the other four each have about 10 similar operations. Among the 8 operations, one has an empty input, so we have 15 inputs/outputs in total. When choosing the operations, we ensured that their inputs/outputs convey different numbers of concepts, and that the concepts involved vary in popularity.

For each of the 8 operations, we hand-labeled the other operations in our collection as relevant or irrelevant. We began by inspecting a set of operations that had similar web service descriptions, similar operation descriptions, or similar inputs or outputs. From that list we chose the set of similar operations and labeled them as relevant; the rest were labeled irrelevant. In a similar fashion, we labeled relevant inputs and outputs.

In this experiment we also wanted to test the contributions of the different components of Woogle. To do that, we considered the following stripped-down variations of Woogle:

• FuncWS: consider only operation descriptions and web service descriptions;
• FuncIO: consider only operation descriptions, inputs and outputs;
• ParOnly: consider all four components, but compare inputs/outputs based only on parameter names;
• ConOnly: consider all four components, but compare inputs/outputs based only on the concepts they express.

Figure 5(a) plots the average precision, recall and R-precision on the eight operations in the benchmark for each of the above matchers, and also for Func, Comb, and Woogle. Figure 5(b) plots the average R-P curves. We observe the following.

First, Woogle generally beats all the other matchers. Its recall and R-precision are 88% and 78% respectively, much higher than those of the two naive methods. Second, considering evidence from different sources by simply putting it into a big
bag of words (Comb) does not help much. This strategy only beats Func, which considers evidence from a single source. Even FuncWS, which discards all input and output information, performs better than Comb. Third, FuncIO performs better than FuncWS. This shows that in operation matching, the semantics of the input and output provides stronger evidence than the web service description, which agrees with the intuition that operation similarity depends more on input and output similarity. Fourth, Woogle performs better than ParOnly, and also slightly better than ConOnly. ParOnly has higher precision but lower recall; ConOnly has higher recall but lower precision. By considering parameter matching (fine-grained matching) and concept matching (coarse-grained matching) together, Woogle obtains a recall as high as ConOnly's and a precision as high as ParOnly's.

An interesting observation is that Woogle beats FuncIO in precision up to the point where recall reaches 80%. Also, the recall of Woogle is 8 percentage points lower than that of FuncIO. This is not surprising, because verbose textual descriptions of web services have a two-fold effect: on the one hand, they provide additional evidence, which helps significantly among the top returned operations, where the input and output already provide strong evidence; on the other hand, they contain noise that dilutes the high-quality evidence, especially at the end of the returned list, where the real evidence is not very strong.

In our experiments, we also observe that, compared with the benefits of our clustering technique and of the structure-aware matching, tuning the parameters within a reasonable range and pre-processing the input/output terms improve the performance only slightly.

6.3.1 Input/output matching

We performed an additional experiment focusing on the performance of input/output matching. This experiment considered the following matchers:

• Woogle: matches inputs/outputs by considering parameter names, their corresponding concepts, and the operations they belong to.
• ParConIO: considers both parameter names and concepts, but not the operations.
• ConIO: considers only concepts.
• ParIO: considers only parameter names.

Figure 6(a) shows the average recall, precision and R-precision on the fifteen inputs/outputs in the benchmark for each of the above matchers. We also plotted the average R-P curves in Figure 6(b). We observe the following. Matching inputs/outputs by comparing the expressed concepts significantly improves performance: the three concept-aware matchers obtain a recall 25 percentage points higher than that of ParIO. On top of concept comparison, the performance of input/output matching can be further improved by considering parameter name similarity and host operation similarity.

7 Searching with Woogle

Similarity search supplements keyword search for web services. Moreover, its core techniques power other search methods in the Woogle search engine, namely template search and composition search. These two methods go beyond keyword search by directly exploiting the semantics of web-service operations. For lack of space, we describe them only briefly.

Template search: The user can specify the functionality, input and output of the desired web-service operation, and Woogle returns a list of operations that fulfill the requirements. It is distinguished from keyword search in that (1) it explores the underlying structure of the operations; and (2) the parameters of the returned operations are relevant to the user's requirement, but do not necessarily contain the specific words that the user uses. For example, the user can ask for operations that take the zipcode of an area and return its nine-day forecast by specifying the input as zipcode, the output as forecast, and the description as the weather in the next nine days. The inputs of the returned operations can be named zip, zipcode, or postcode; the outputs can be forecast, weather, or even temperature, humidity at the end of the list of returned operations.

Template search is implemented by treating a user-specified template as an operation and applying the similarity search algorithm. A key challenge is to perform the operation matching efficiently on-the-fly.

Composition search: Much of the promise of web services is the ability to build complex services by composition. Composition search in Woogle returns not only single operations, but also operation compositions that achieve the desired functionality. The composition can be of any length. For example, when an operation satisfying the above search requirement is not available, it is valuable to return a composition of an operation with zipcode as input and city and state as output, and an operation with city and state as input and nine-day forecast as output.
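A minimal sketch of such composition discovery, assuming plain Jaccard similarity over parameter-term sets in place of Woogle's full input/output matcher (the operations and names below are invented):

```python
from collections import namedtuple

Op = namedtuple("Op", "name input output")  # input/output: sets of terms

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def find_compositions(operations, query_input, query_output,
                      sim=jaccard, threshold=0.5, max_len=3):
    """Enumerate chains op1 -> ... -> opk where the query input feeds op1,
    each output is similar to the next input, and the final output matches
    the query output."""
    results = []

    def extend(chain, current_output):
        if sim(current_output, query_output) >= threshold:
            results.append([op.name for op in chain])
        if len(chain) == max_len:
            return
        for op in operations:
            if op in chain:  # avoid loops in the composition
                continue
            if sim(current_output, op.input) >= threshold:
                extend(chain + [op], op.output)

    for op in operations:
        if sim(query_input, op.input) >= threshold:
            extend([op], op.output)
    return results

ops = [
    Op("ZipToCityState", frozenset({"zipcode"}), frozenset({"city", "state"})),
    Op("CityStateToForecast", frozenset({"city", "state"}), frozenset({"forecast"})),
]
print(find_compositions(ops, frozenset({"zipcode"}), frozenset({"forecast"})))
# -> [['ZipToCityState', 'CityStateToForecast']]
```

The `op in chain` check is one simple way to avoid loops; bounding the chain length keeps the on-the-fly enumeration tractable, echoing the efficiency challenge noted above.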
Based on the machinery that we have already built for matching operation inputs and outputs, we can discover compositions automatically. The challenge lies in avoiding redundancy and loops in the composition. Another challenge is to discover the compositions efficiently on-the-fly.

8 Conclusions and Future Work

As the use of web services grows, the problem of searching for relevant services and operations will become more acute. We proposed a set of similarity search primitives for web-service operations, and described algorithms for effectively implementing these searches. Our algorithms exploit the structure of the web services and employ a novel clustering mechanism that groups parameter names into meaningful concepts. We implemented our algorithms in Woogle, a web-service search engine, and experimented on a set of over 1500 operations. The experimental results show that our techniques significantly improve precision and recall compared with two naive methods, and perform well overall.

In future work, we plan to expand Woogle to include automatic web-service invocation; i.e., after finding the potential operations, Woogle should be able to fill in the input parameters and invoke the operations automatically for the user. This direction is particularly promising because it will, in the end, be able to answer questions such as "what is the weather in the area with zipcode 98195?"

While this paper focuses exclusively on searches for web services, the search strategy we have developed applies to other important domains. As a prime example, if we model web forms as web-service operations, a deep-web search can be performed by first finding web forms with the desired functionality, and then automatically filling in the inputs and displaying the results. As another example, applying template search and composition search to class libraries (considering each class as a web service, and each of its methods as a web-service operation) would be a valuable tool for software component reuse.

Acknowledgments

We would like to thank Pedro Domingos, Oren Etzioni and Zack Ives for many helpful discussions, and the reviewers of this paper for their insightful comments. This work was supported by NSF ITR grant IIS-0205635 and NSF CAREER grant IIS-9985114.

References

[1] Binding Point. http://www.bindingpoint.com/.
[2] Grand Central. http://www.grandcentral.com/directory/.
[3] Salcentral. http://www.salcentral.com/.
[4] Web Service List. http://www.webservicelist.com/.
[5] WordNet. http://www.cogsci.princeton.edu/~wn/.
[6] Rainbow. http://www.cs.cmu.edu/~mccallum/bow, 2003.
[7] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo. Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, 1996.
[8] J. Cardoso. Quality of Service and Semantic Composition of Workflows. PhD thesis, University of Georgia, 2002.
[9] DAML-S Coalition. DAML-S: Web service description for the semantic web. In ISWC, 2002.
[10] S. Cost and S. Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10:57–78, 1993.
[11] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391–407, 1990.
[12] H.-H. Do and E. Rahm. COMA – a system for flexible combination of schema matching approaches. In Proc. of VLDB, 2002.
[13] A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: a machine learning approach. In Proc. of SIGMOD, 2001.
[14] D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. The MIT Press, 2001.
[15] A. Hess and N. Kushmerick. Learning to attach semantic metadata to web services. In ISWC, 2003.
[16] K. S. Jones. Automatic Keyword Classification for Information Retrieval. Archon Books, 1971.
[17] G. Karypis, E.-H. Han, and V. Kumar. Chameleon: A hierarchical clustering algorithm using dynamic modeling. COMPUTER, 32, 1999.
[18] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York, 1990.
[19] L. S. Larkey. Automatic essay grading using text classification techniques. In Proc. of ACM SIGIR, 1998.
[20] L. S. Larkey and W. Croft. Combining classifiers in text categorization. In Proc. of ACM SIGIR, 1996.
[21] V. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10:707–710, 1966.
[22] S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: A versatile graph matching algorithm. In Proc. of ICDE, 2002.
[23] M. Paolucci, T. Kawamura, T. Payne, and K. Sycara. Semantic matching of web services capabilities. In Proc. of the International Semantic Web Conference (ISWC), 2002.
[24] E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal, 10(4), 2001.
[25] G. Salton, editor. The SMART Retrieval System—Experiments in Automatic Document Retrieval. Prentice Hall, Englewood Cliffs, NJ, 1971.
[26] E. Sirin, J. Hendler, and B. Parsia. Semi-automatic composition of web services using semantic descriptions. In WSMAI-2003, 2003.
[27] Y. Yang and J. Pedersen. A comparative study on feature selection in text categorization. In International Conference on Machine Learning, 1997.
[28] A. M. Zaremski and J. M. Wing. Specification matching of software components. TOSEM, 6:333–369, 1997.