Local Probabilistic Models for Link Prediction

Chao Wang, Venu Satuluri, Srinivasan Parthasarathy
Department of Computer Science and Engineering, The Ohio State University

Abstract

One of the core tasks in social network analysis is to predict the formation of links (i.e. various types of relationships) over time. Previous research has generally represented the social network in the form of a graph and has leveraged topological and semantic measures of similarity between two nodes to evaluate the probability of link formation. Here we introduce a novel local probabilistic graphical model method that can scale to large graphs to estimate the joint co-occurrence probability of two nodes. Such a probability measure captures information that is not captured by either topological measures or measures of semantic similarity, which are the dominant measures used for link prediction. We demonstrate the effectiveness of the co-occurrence probability feature by using it both in isolation and in combination with other topological and semantic features for predicting co-authorship collaborations on three real datasets.

1 Introduction

In recent times, there has been a lot of interest in understanding and characterizing the properties of large scale networks or graphs. Part of this interest is because of the generality of the graph model: many domains, such as social networks, gene regulatory networks and the World Wide Web, can be naturally thought of as graphs. An important problem in this context is that of link prediction. Informally, link prediction is concerned with predicting the (future) existence of links among nodes in a graph. Link prediction is useful in many application domains, ranging from recommender systems to the detection of unseen links in terrorism networks, from protein interaction networks to the prediction of collaborations among scientists, and from the prediction of friendship formations to the prediction of web hyperlinks.
In this article we focus on the problem of link prediction, particularly in the context of evolving co-authorship networks.* This has been a hotbed of recent research activity, where much of the focus has been on encapsulating the topological and/or semantic information embedded in such networks to address the link prediction problem. In contrast, in this article we explore the realm of probabilistic models derived from frequency statistics and use the resulting predictions from the probabilistic models as additional features to further enhance predictions made by topological-based and semantic-based link prediction algorithms.

Specifically, our probabilistic model is driven by two aspects. First, given the candidate link (say between nodes X and Y) whose probability is to be estimated, we identify the central neighborhood set (say W, X, Y, Z), which contains the nodes deemed germane to the estimation procedure. The identification of the central neighborhood set is governed by the local topology of the social network as viewed from the perspective of the two nodes whose link probability is to be estimated. Second, once the central neighborhood set (W, X, Y, Z) is identified, we learn a maximum entropy Markov random field model that estimates the joint probability of the nodes comprising the central neighborhood set, i.e., p(W, X, Y, Z). In this context one can leverage the fact that most co-authorship networks are computed from an event log (an event corresponding to a publication). Multi-way statistics (e.g. non-derivable frequent itemsets [4] whose elements are drawn from (W, X, Y, Z)) on these event logs can be used to constrain and learn the model parameters efficiently [19].

* This work is supported in part by the following research grants: DOE Award No. DE-FG02-04ER25611; NSF CAREER Grant IIS-0347662. Contact email: srini@cse.ohio-state.edu
The resulting model can then be used to estimate the link probability between X and Y, which we henceforth denote as the co-occurrence probability. In our empirical results we demonstrate that the co-occurrence probabilities inferred from the resulting model can be computed in a scalable manner and are highly discriminatory for link prediction when compared with state-of-the-art topological and semantic features on several real-world datasets. Moreover, we demonstrate that the resulting co-occurrence probability can also be effectively combined with these other features, after which one can employ any classification algorithm to predict if a link will be formed between two nodes. Specifically, we employ a simple yet novel variant of the Katz score as a topological feature, one that scales reasonably well at some cost to accuracy.

Seventh IEEE International Conference on Data Mining, 1550-4786/07 $25.00 © 2007 IEEE, DOI 10.1109/ICDM.2007.108
Additionally, we describe and use straightforward state-of-the-art methods to measure the semantic overlap among nodes based on the topics they work on, to further enhance the feature vector and improve overall link prediction performance.

2 Related Work

The seminal work of Liben-Nowell and Kleinberg [12] was the first comprehensive study on the utility of topological features derived from graphs for predicting links in social networks. They examine various topological features, including graph shortest distance, common neighbors, preferential attachment, Adamic-Adar, Jaccard, SimRank, hitting time, rooted PageRank, and Katz. They find that topological information is quite useful when compared to a random predictor. In particular, Adamic-Adar and the Katz measure appear to be more effective than the other topological features. Recently, Huang [7] proposed another topological feature, the generalized clustering coefficient, for the link prediction problem. An important limitation of these works is that they use only a single (topological) feature for the link prediction task.

Intuitively, it seems that one can achieve better performance by utilizing other sources of information, such as the content or semantic attributes of the nodes. A natural way to do this is to use the multiple sources of information as features fed into a classifier that is trained to discriminate between positive instances (i.e. links that form) and negative instances (links that do not form) by making use of all the features. This is the approach adopted by Hasan et al. [6] and O'Madadhain et al. [14]. Hasan et al. [6] have used topological features (such as the shortest distance between the two nodes), aggregated features (such as the sum of neighbors) and semantic features (such as the number of matching keywords).
They report keyword match count to be their most useful feature on one dataset, which indicates that much is to be gained by taking into consideration the semantic similarity in the publications of the two authors. O'Madadhain et al. [14] have also investigated the use of content-based attributes such as the KL-divergence of the topic distributions of the two nodes, their geographic proximity, and the similarity of their journal publication patterns.

The work of Popescul et al. [16] is another interesting approach to integrating different kinds of information. They represent the data in a relational format, generate candidate features through database join queries, select features using statistical model selection criteria, and use logistic regression over the selected features for classification. A potential problem with this approach is that the features so generated are simple aggregation functions of the column values in the result set of the join queries (such as count and average). A more complex feature, such as the cosine similarity between bag-of-words representations, cannot be easily expressed using simple SQL aggregation functions.

Researchers have also examined the use of probabilistic models for solving the link prediction problem. Taskar et al. [18] use discriminatively trained relational Markov networks to define a joint probabilistic model over the entire graph (i.e. over the links as well as the content attributes of the nodes). The trained model is used to collectively classify the test data. Kashima and Abe [10] propose a parameterized probabilistic model of network evolution and then use it for link prediction. They assume the network structure is in a stationary state and propose an EM algorithm to estimate model parameters. They report encouraging results on two small biological datasets.
However, both collective classification and training global probabilistic models can be expensive to compute, and typically do not scale well to medium and large scale networks. In related work, Rattigan and Jensen [17] argue that the link prediction problem is too hard to solve because of the extreme class skew problem: social networks are usually very sparse, and positive links account for only a very small fraction of all possible pairs of nodes. As an alternative, they propose a simpler problem, anomalous link discovery. Specifically, they restrict their focus to the links that have already formed and infer their anomaly (surprisingness) scores from the previous data.

3 Methods

We consider three sources of information in the network data for link prediction. We have a large number of local events that accumulate over time, where by local event we mean an interaction among a set of objects in the network. For example, the publication of a paper would represent a local event involving all the authors of the paper. We refer to the collection of such local events as the event log. This is the raw format of the network data and provides the first source of information for link prediction. Such an event log is typically converted to a graph representation, in which nodes represent objects in the network and two nodes are connected if they co-occur in at least one local event. This graph provides the second source of information for link prediction. Finally, we have access to other attributes of an object in the network, such as the research areas of the authors in an author collaboration network, usually referred to as content or semantic information. This semantic information provides the third source of information for link prediction. It is difficult to capture all information from these different sources with one single feature.
For this reason, we examine three types of features: co-occurrence probability features, topological features and semantic features, coming from the first, second and third source, respectively. In the text below, we discuss how we derive these features in turn.

3.1 Deriving Co-occurrence Probability Features

For a pair of nodes that have never co-occurred in the event log, our aim is to estimate the chances of their co-occurring in the future, i.e. of a link forming in the future between those two nodes. In order to estimate the co-occurrence probability of the given two nodes in a principled manner, we use probabilistic graphical models. Specifically, we employ undirected graphical models, also called Markov Random Fields (MRFs), to model the local neighborhood containing the two nodes. We stress that we build a local model, rather than a global model, as building global models can become prohibitively expensive for large scale networks.

There are two main stages in our approach to using graphical models in this context: (a) determining the nodes that will be included in the local model, and (b) using frequent non-derivable itemsets to determine the structure of the graphical model as well as learn its parameters. Once the model is learned, we use exact inference techniques to determine the co-occurrence probability of the pair of nodes under consideration. We need not resort to approximate inference, as we build a local model, leading to a low treewidth (the maximum clique size in the graph formed by triangulating the model, minus 1).
3.1.1 Determining the Central Neighborhood Set of Two Nodes

For a given pair of nodes, we retrieve a small set of nodes that we believe to be most relevant to estimating the co-occurrence probability of the pair. We refer to this set of nodes as the central neighborhood set of the two nodes. At one extreme, we can include any node that lies on any path between the two nodes in the central neighborhood set. However, this would lead to a large probabilistic model, over which learning can be expensive. For this reason, we set a size parameter that specifies the number of nodes to be present in the central neighborhood set.

We need a mechanism for selecting the nodes to include in the central neighborhood set. Intuitively, the nodes that lie along shorter paths are more relevant. Hence, we propose a method of enumerating simple paths lengthwise, i.e. we first collect all nodes that lie on length-2 simple paths, then those on length-3 simple paths, and so on. (A simple path is a path without cycles.) The algorithm for enumerating all simple paths of a given length is presented in Figure 1. However, this order of enumerating nodes may not be enough, as there may be many nodes that lie on paths of a given length. Hence, we need a way of ordering paths of the same length. For this purpose, we define the frequency score of a path as the sum of the occurrence counts of all nodes along the path. Among paths of the same length, we then enumerate paths with higher frequency scores before paths with lower frequency scores. The pseudo-code of the full algorithm for selecting the central neighborhood set of a pair of nodes is presented in Figure 2.
Enumerate_Simple_Paths(G, s, t, K)
Input: G, a graph; s, starting node; t, ending node; K, path length
Output: P, a set of simple paths of length K
  {* Find distance-(K-1) neighbors of s without visiting t, bookkeeping all path information *}
  N = Breadth_first_search(G, s, K-1, t);
  foreach (e ∈ N)
    {* If e and t are connected in G *}
    if (e.Connect(t, G))
      add path(s → e → t) to P;
  return P

Figure 1. Enumerating all simple paths of length K between two nodes

Select_Central_Node_Set(G, s, t, maxSize)
Input: G, a graph; s, starting node; t, ending node; maxSize, central neighborhood set size threshold
Output: C, central neighborhood set between s and t
  C ← ∅; i ← 2;
  while i < PATH_LENGTH_THRESHOLD
    Pi ← Enumerate_Simple_Paths(G, s, t, i);
    sort all paths in Pi by frequency score;
    foreach path p ∈ Pi
      if |C| < maxSize
        add all nodes along p to C;
    i ← i + 1;
  return C

Figure 2. Selecting the central neighborhood set for two nodes

We also use a path length threshold in the algorithm because, in practice, we cannot afford to enumerate all simple paths between two nodes in a large graph. In our study, we consider paths up to length 4, because we found that this threshold works well in capturing the contextual information between two nodes. In situations where no such paths exist between two nodes, we define the central neighborhood set to be the two nodes themselves. Interestingly, we note that in this case the local probabilistic model reduces to a simple independence model.
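As a concrete rendering of Figures 1 and 2, the sketch below works over dictionary adjacency lists and a node-occurrence-count dictionary. It is our own illustration, not the authors' code: it enumerates simple paths with a DFS stack rather than the BFS bookkeeping of Figure 1, and the names are hypothetical.

```python
def enumerate_simple_paths(adj, s, t, k):
    """All simple paths with exactly k edges from s to t (DFS over partial paths)."""
    paths = []
    stack = [(s, [s])]
    while stack:
        node, path = stack.pop()
        if len(path) - 1 == k:
            if node == t:
                paths.append(path)
            continue
        for nbr in adj.get(node, ()):
            if nbr not in path:          # keep the path simple (no repeated nodes)
                stack.append((nbr, path + [nbr]))
    return paths

def central_neighborhood_set(adj, counts, s, t, max_size=6, max_len=4):
    """Select up to max_size nodes, preferring shorter paths, and among paths of
    the same length, paths with a higher frequency score (sum of node counts)."""
    chosen = []
    for k in range(2, max_len + 1):
        paths = enumerate_simple_paths(adj, s, t, k)
        paths.sort(key=lambda p: -sum(counts.get(v, 0) for v in p))
        for p in paths:
            if len(chosen) >= max_size:
                return chosen
            for v in p:
                if v not in chosen and len(chosen) < max_size:
                    chosen.append(v)
    # no connecting path: fall back to the pair itself (independence model)
    return chosen if chosen else [s, t]
```

On the two-path example of Figure 4 (a-c-b and a-d-e-b), this returns {a, c, b} first and then adds {d, e}, matching the length-then-frequency ordering described above.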
3.1.2 Learning Local Markov Random Fields

For a given pair of nodes and their corresponding central neighborhood set, how should one learn a local probabilistic model? We adopt an approach of using non-derivable frequent itemsets from the underlying network event log to learn local probabilistic models. The event log is essentially a transactional dataset, and we apply widely-used frequent itemset mining techniques [2, 20, 5] to it to collect occurrence statistics of network objects. These statistics can then be leveraged to learn local probabilistic models on the central neighborhood set. The basic idea of using a set of itemsets to learn a model is as follows: each itemset and its occurrence statistic can be viewed as a constraint on the underlying unknown distribution. A model that satisfies all of the occurrence constraints and at the same time has maximum entropy ("as uniform as possible") is used as the estimate of the underlying unknown distribution. One can verify that this maximum entropy distribution specifies a Markov Random Field. More information can be found in [15, 19].

In our study we pre-compute all frequent itemsets from the underlying event log. Social networks are usually very sparse: the proportion of formed links is very low relative to the number of all possible pairs of nodes. For this reason we use a support threshold of one to collect frequent itemset patterns, so that all positive occurrence evidence is captured. We note, however, that at this support threshold there are too many frequent itemsets to use directly. Fortunately, we only need to mine and use non-derivable itemsets for model learning. Simply speaking, non-derivable itemsets are those itemsets whose occurrence statistics cannot be inferred from other itemset patterns. As such, non-derivable itemsets provide non-redundant constraints, and we can employ them to learn probabilistic models without any information loss. Calders et al.
[4] propose an efficient depth-first search method to mine non-derivable itemsets. We use their implementation to mine non-derivable itemsets.

To predict if two nodes will be linked, we first identify from the network the central neighborhood set of the two involved nodes. Then we select all itemsets that lie entirely within this set and use them as evidence to learn an MRF. The learned MRF is local in that it specifies a joint distribution over only the nodes in this set. Then we estimate the joint co-occurrence probability feature of the link through inference over the local model. The formal algorithm is presented in Figure 3.

Co-occurrence_Probability_Feature_Induction(C, s, t, NDI)
Input: C, central neighborhood set between s and t; s, starting node; t, ending node; NDI, collection of non-derivable itemsets
Output: f, co-occurrence probability feature of s and t
  {* Retrieve NDI patterns relevant to C *}
  R ← ∅;
  foreach (ndi ∈ NDI)
    if (ndi ⊆ C) add ndi to R;
  {* Learn an MRF model on C using R *}
  M = Learn_MRF(C, R);
  {* Infer co-occurrence prob. of s and t from M *}
  f = Inference(M, s, t);
  return f

Learn_MRF(C, R)
Input: C, common neighborhood; R, collection of itemsets
Output: MRF M
  {* Obtain all variables in C and initialize M *}
  Initialize_Parameters(M);
  while (not all constraints are satisfied)
    foreach constraint e ∈ R
      update M to force it to satisfy e;
  return M

Figure 3. Inducing the co-occurrence probability feature for a pair of nodes

We illustrate this whole process with a simple example, presented in Figure 4. Assume we want to predict the link between nodes a and b, and there are two paths connecting them in the graph: p1 = a → c → b and p2 = a → d → e → b. Also assume that we use both p1 and p2 to identify the central neighborhood set between a and b. As a result, the central neighborhood set C is given by C = {a, b, c, d, e}. Next we retrieve all non-derivable itemsets that lie entirely within this set.
Let us assume that the itemsets retrieved are {a, b, c, d, e, ac, ad, bc, be, de}; their occurrence statistics are collected from the event log and are presented in Figure 4. We employ all of these patterns and their occurrence statistics to learn a local probabilistic model M over C: M = P(a, b, c, d, e). M specifies a joint distribution on all variables in C, and its clique potential functions are listed below (the μ's are model parameters to be learned and I() is the indicator function). The shaded area in Figure 4 shows one clique of M, {a, c}.

[Figure 4. An example of local model-based co-occurrence probability feature induction: the local graph over nodes {a, b, c, d, e}, with NDI = { a: 0.04, b: 0.03, c: 0.05, d: 0.1, e: 0.02, ac: 0.02, ad: 0.03, bc: 0.01, be: 0.006, de: 0.01 }]

ψ_{a,c} = μ1^{I(a=1)} · μ2^{I(c=1)} · μ3^{I(a=c=1)}
ψ_{a,d} = μ4^{I(d=1)} · μ5^{I(a=d=1)}
ψ_{b,c} = μ6^{I(b=1)} · μ7^{I(b=c=1)}
ψ_{b,e} = μ8^{I(e=1)} · μ9^{I(b=e=1)}
ψ_{d,e} = μ10^{I(d=e=1)}

Then we derive the co-occurrence probability of a and b by computing the marginal probability p(a = 1, b = 1) under M. We use an iterative scaling algorithm [9] in Learn_MRF() to learn a local MRF for the central neighborhood set. The idea is to iterate over all itemset constraints and repeatedly update the model to force it to satisfy the current itemset constraint, until the model converges. After the model is constructed, we perform inference over it to estimate the joint co-occurrence probability of s and t.
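To make the learning step concrete, here is a minimal sketch that fits the maximum entropy model by iterative proportional fitting over an explicit joint table, which is feasible here only because the central neighborhood set is small. This is our own simplified illustration (brute-force table, fixed iteration count), not the paper's implementation.

```python
from itertools import product

def fit_maxent(variables, constraints, iters=200):
    """Start from the uniform joint table and repeatedly rescale it so that
    each itemset's probability of being all-1 matches its observed support."""
    idx = {v: i for i, v in enumerate(variables)}
    states = list(product([0, 1], repeat=len(variables)))
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        for itemset, target in constraints:
            cur = sum(pr for s, pr in p.items()
                      if all(s[idx[v]] == 1 for v in itemset))
            if 0 < cur < 1:
                for s in p:
                    inside = all(s[idx[v]] == 1 for v in itemset)
                    p[s] *= (target / cur) if inside else ((1 - target) / (1 - cur))
    return idx, p

def cooccurrence_prob(idx, p, u, v):
    """Exact marginal P(u=1, v=1) by summing the local joint table."""
    return sum(pr for s, pr in p.items()
               if s[idx[u]] == 1 and s[idx[v]] == 1)

# Figure 4 statistics as (itemset, support) constraints
constraints = [(('a',), 0.04), (('b',), 0.03), (('c',), 0.05), (('d',), 0.1),
               (('e',), 0.02), (('a', 'c'), 0.02), (('a', 'd'), 0.03),
               (('b', 'c'), 0.01), (('b', 'e'), 0.006), (('d', 'e'), 0.01)]
idx, p = fit_maxent(['a', 'b', 'c', 'd', 'e'], constraints)
f = cooccurrence_prob(idx, p, 'a', 'b')   # the co-occurrence feature p(a=1, b=1)
```

For larger neighborhoods the explicit 2^n table is replaced by the clique-potential parameterization above, with inference done over the (low-treewidth) model instead of by brute force.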
For the Inference() procedure in the algorithm, we can plug in exact inference algorithms, since our model is local. In our study, we use the junction tree inference algorithm [11].

3.2 Deriving Topological Features

The Katz measure is a weighted sum of the number of paths in the graph that connect two nodes, with shorter paths being given more weight. This leads to the following measure:

Katz(s, t) = Σ_{i=1}^{∞} β^i · p_i

Here p_i is the number of paths of length i connecting nodes s and t, while β is a damping factor. It has been shown that Katz is among the most effective topological measures for the link prediction task [12]; it outperforms shortest distance, hitting time and many others. It can be verified that the matrix of Katz scores can be computed as (I − βM)^{−1} − I, where M is the adjacency matrix of the graph [12].

However, this method does not scale well to the network data under consideration, since computing the matrix inverse for large graphs is very expensive. As such, we come up with a way to compute the Katz score approximately: we only consider paths of length up to a certain threshold. The new measure is as follows:

aKatz(s, t) = Σ_{i=1}^{k} β^i · p_i

where p_i and β have the same meaning as above, while k is a new input parameter specifying the maximum path length we consider. Since the terms damp exponentially with increasing length, this new measure captures the most significant portion of the exact Katz score. We find that k = 4 gives a good approximation of the Katz scores in practice. We design graph algorithms to evaluate this new measure. To this end, we follow a process similar to the identification of the central neighborhood set shown above: we enumerate all simple paths of length up to k from s to t and use the above formula to compute an approximate Katz score. We execute breadth-first search from s up to k levels without visiting t, while keeping track of all paths formed so far.
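As a quick illustration of the truncated sum, the score can also be computed from powers of the adjacency matrix. Note that this counts walks rather than the simple paths enumerated by the BFS procedure above, so it is a common stand-in for, not a reproduction of, the measure used in the paper.

```python
import numpy as np

def akatz(M, s, t, beta=0.005, k=4):
    """Truncated Katz: sum_{i=1..k} beta^i * p_i, where p_i = (M^i)[s, t]
    counts length-i walks between s and t in adjacency matrix M."""
    score, Mi = 0.0, np.eye(M.shape[0])
    for i in range(1, k + 1):
        Mi = Mi @ M                      # (Mi)[s, t] = number of length-i walks
        score += beta ** i * Mi[s, t]
    return score
```

Because β^i shrinks geometrically, the terms beyond the cutoff k contribute little, which is why the truncation at k = 4 approximates the full Katz score well.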
We will denote this approximate Katz measure as aKatz throughout the rest of the paper.

3.3 Deriving Semantic Features

The degree of semantic similarity among entities can be useful for predicting links that might not be captured by either topological or frequency-based features. For example, in the context of co-authorship networks, we use the following method to compute the semantic similarity of two authors:

1. Collect the words in the titles of each author's papers (removing stop words), so that an author is represented as a set of words (akin to a text document).

2. Derive a bag-of-words representation for each author, weighting each word by its TFIDF (Term Frequency - Inverse Document Frequency) measure.

3. Compute the cosine between the TFIDF feature vectors of the two authors whose semantic similarity we need to determine.

Previously, Hasan et al. [6] used the keyword match count between two authors as a feature and reported it to be their most useful feature. Our method for computing semantic similarity makes use of well-known techniques: the TFIDF feature vector representation, which weights words automatically, and the cosine measure, a widely-used and effective measure for computing similarity between text documents represented in the vector space model.
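The three steps above can be sketched as follows. This is a toy illustration with a plain TFIDF weighting; a real implementation would also strip stop words (step 1) and may use a smoothed IDF.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: author -> list of title words. Returns author -> {word: tf*idf}."""
    n = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    vecs = {}
    for author, words in docs.items():
        tf = Counter(words)
        vecs[author] = {w: tf[w] * math.log(n / df[w]) for w in tf}
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Two authors sharing rare title words score high, while a word appearing in every author's titles gets IDF 0 and contributes nothing, which is exactly the automatic down-weighting of common words mentioned above.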
3.4 Combining Different Features Using a Supervised Learning Framework

Since we have three types of features (the co-occurrence probability feature, the topological similarity feature and the semantic similarity feature), we need an effective way to combine them. For this we resort to supervised learning, which requires a way to partition the original dataset into training and testing datasets. A supervised learning approach to the link prediction problem has been taken previously by Hasan et al. [6] and O'Madadhain et al. [14], and the two works take different approaches to partitioning the dataset into training and testing sets. We find the approach taken by O'Madadhain et al. [14] to be cleaner and follow it, as described below. An illustration of our approach can be found in Figure 5.

We form a labeled training dataset as follows: we take all the links that are formed in the 9th year (T9 in Figure 5) and label them as positive training instances. Of the links that are not formed in the first 9 years, we randomly sample a subset and label them as negative training instances. We sample 10 times as many negative instances as positive instances. The features for each of these instances are constructed from the first 8 years of data. A classifier is then trained on the labeled training set; any off-the-shelf classifier can be used, and we chose Logistic Regression¹, since it is computationally efficient and produces well-calibrated class probabilities that can be used to rank predictions. The testing dataset is formed in a similar fashion: the links that are formed in the 10th year (T10 in Figure 5) are treated as testing instances that need to be predicted as positive, and we include a sample of the links that are not formed in the whole of the dataset as testing instances whose ground truth labeling is negative.
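The paper uses WEKA's Logistic Regression implementation; purely to illustrate the combination step, here is a minimal stdlib-only logistic regression (batch gradient descent) over three feature columns standing in for co-occurrence probability, aKatz and semantic similarity. The feature values below are hypothetical.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (last entry is the bias) by batch gradient descent
    on the logistic loss over feature rows X and 0/1 labels y."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi + [1.0]))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi + [1.0]):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_prob(w, xi):
    """Class probability for one instance; usable directly to rank candidate links."""
    z = sum(wj * xj for wj, xj in zip(w, xi + [1.0]))
    return 1.0 / (1.0 + math.exp(-z))
```

The ranked probabilities from predict_prob() are what the precision-at-K and AUC metrics in Section 4 are computed over.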
The features used by the previously trained classifier are formed from the first 9 years of data.

¹ We used the Logistic Regression implementation in the popular WEKA suite of data mining algorithms.

4 Evaluation

In this section, we report the experimental results of our proposed approach.

Figure 5. Split of the datasets into training and testing data. T1, T2, ..., T10 represent the 10 time intervals the dataset spans.

Dataset        No. of authors   No. of papers   No. of edges
DBLP           23136            18613           56829
Genetics       41846            12074           164690
Biochemistry   50119            16072           191250

Table 1. Summary of the three datasets that were constructed

4.1 Datasets

We evaluated performance on three real datasets in all, described below. The details of the datasets are summarized in Table 1.

• The DBLP dataset was generated using the DBLP collection of Computer Science articles². It contains the publication details of the proceedings of 28 conferences related to Data Mining, Databases and Machine Learning from the years 1997 to 2006.

• The Genetics dataset contains articles published from 1996 to 2005 in 14 journals related to genetics and molecular biology.

• The Biochemistry dataset contains articles published from 1996 to 2005 in 5 journals related to biochemistry.

The Genetics and Biochemistry datasets were generated from the PubMed database³.

² DBLP is located online at http://dblp.uni-trier.de/
³ The PubMed database can be accessed online at http://www.ncbi.nlm.nih.gov/entrez/

4.2 Class Conditional Distributions of the Different Features

First we examine the distribution of the features among the positive and negative examples. Figures 6a-c plot the distributions of three features on the Genetics dataset. The
One can see that both the co-occurrence probability and the aKatz measure can discriminate among the negative and positive instances. The main difficulty in the link prediction task occurs because the number of links that do not form far outweighs the number of links that do form. The ratio of negative instances to positive instances in our workload is 10 : 1. 4.3 The additional information captured by Co-occurrence Probability Feature We believe that the co-occurrence probability feature captures information a large chunk of which is not captured by either topological metrics such as aKatz or content-based metrics such as semantic similarity. To test this conjecture, we examined the number of correct predictions that were made by the co-occurrence probability feature that were not made by either aKatz or semantic similarity. The results are shown in Table 2. As can be observed, in all three of the datasets there exists a significant percentage - up to 75% on the Genetics dataset - of correct predictions in the top 500 that are captured only by the co-occurrence probabil- ity feature and not by the other features. This confirms our hypothesis that the co-occurrence probability feature uses information about the domain that is not captured by other types of features. 4.4 Results on Link Prediction as Classi- fication We report the classification resultswhen we vary the fea- tures used for classification. First, we report the results when the three features – co-occurrence probability, aKatz and semantic similarity – are used in isolation. Then we examine the results when we use all three features. Unless otherwise specified, we use the length threshold of 4 for the aKatz feature and 6 for central neighborhood set size of the local probabilistic model-based co-occurrence probabil- ity feature. 
4.4.1 Baseline Approaches

For the sake of comparison, results for two baseline approaches – the Adamic-Adar measure [1] and the Preferential Attachment measure [13] – are also presented.

The Adamic-Adar measure was originally meant for computing the similarity of two homepages, but was adapted for computing the similarity between two nodes in a graph by [12]. Let Γ(x) be the set of all neighbors of node x. Then the similarity between two nodes x, y is given by

    score(x, y) = ∑_{z ∈ Γ(x) ∩ Γ(y)} 1 / log |Γ(z)|

The intuition behind the score is that instead of simply counting the number of neighbors shared by two nodes, we should weight hub nodes less and rarer nodes more.

Preferential Attachment is a measure based on a well-received generative model for graphs [3]. Based on that model, which specifies that new nodes are more likely to form edges with nodes that already have a large number of neighbors, Newman [13] proposed the score for two nodes x, y as

    score(x, y) = |Γ(x)| · |Γ(y)|

4.4.2 Evaluation Metrics

Previous literature has mainly used the precision of the top-K predictions (K is a user-specified parameter, usually the number of true links formed in the testing period) as the evaluation metric. While this metric has its merits, it also has some problems. Some link prediction methods have relatively high precision for their top-K predictions when K is small, because they are good at predicting the "easy" links, but the precision drops off dramatically as K increases. It seems desirable to have an additional evaluation metric that measures the quality of the classifier without reference to an arbitrary cut-off point. For this reason, we also use AUC (Area Under the ROC Curve) to compare the different classifiers; it does not require the specification of arbitrary cut-off points and is widely used to evaluate rankings output by a classifier.
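Both metrics operate on a ranked list of candidate links. A sketch of the two computations, with AUC obtained via the rank-based (Mann-Whitney) formulation; the function and variable names are ours:

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked predictions that are true links.
    ranked_labels: 0/1 labels sorted by decreasing classifier score."""
    return sum(ranked_labels[:k]) / k

def auc(scores, labels):
    """AUC equals the probability that a randomly chosen positive is
    scored above a randomly chosen negative; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy ranking: classifier scores for 3 positives and 3 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
ranked = [y for _, y in sorted(zip(scores, labels), reverse=True)]
# precision_at_k(ranked, 3) → 2/3; auc(scores, labels) → 8/9
```

Under this formulation a perfect ranking gives an AUC of 1.0 and a random one gives 0.5, matching the interpretation used below.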
Huang [8] has previously used this as an evaluation metric in the context of link prediction. An AUC score of 1.0 represents a perfect classifier, and a score of 0.5 represents a random classifier. Visually, the closer the ROC curve is to the top left corner of the plot, the better the classifier.

4.4.3 Discussion

Table 3 presents the AUC scores and the precision of the top-K predictions for the different features on the three datasets. Following [12], the K for the precision metric has been chosen to be the number of true links in the testing period. We also plot the ROC curves for the different features and for the ensemble method combining all three features on the three datasets in Figure 7.

Figure 6. Class conditional distributions for different features on the Genetics dataset: (a) Co-occurrence Probability (b) aKatz (c) Semantic Similarity

               # correct predictions in c500   # correct predictions in c500 − k500 − s500 (percentage)
DBLP           323                             125 (38.7)
Genetics       425                             321 (75.5)
Biochemistry   474                             246 (51.9)

Table 2. The number of predictions made by the co-occurrence probability feature alone that were not made by either aKatz or semantic similarity. c500, k500 and s500 refer respectively to the set of top 500 predicted links using co-occurrence probability, the set of top 500 predicted links using aKatz, and the set of top 500 predicted links using semantic similarity.

Feature                        DBLP: AUC / Precision (top-K, K=1500)   Genetics: AUC / Precision (top-K, K=2912)   Biochemistry: AUC / Precision (top-K, K=3991)
co-occur. prob. (c)            0.8229 / 45.80                          0.7904 / 46.77                              0.8331 / 52.59
aKatz (k)                      0.7501 / 54.67                          0.5888 / 32.21                              0.7644 / 54.95
semantic (s)                   0.7148 / 35.93                          0.5738 / 16.79                              0.6732 / 20.90
k+c                            0.8665 / 56.06                          0.7904 / 46.77                              0.8526 / 56.38
c+k+s                          0.8722 / 57.66                          0.7886 / 47.08                              0.8528 / 56.32
adamic-adar                    0.6148 / 31.26                          0.4864 / 15.79                              0.5384 / 18.41
preferential attachment (p)    0.7482 / 36.67                          0.7194 / 35.03                              0.8200 / 51.12
p+k+s                          0.8387 / 52.53                          0.7332 / 37.36                              0.8359 / 54.92

Table 3. Link prediction classification results. The K for calculating precision for each dataset is the number of positive instances (i.e. true links) in the test dataset.

Neighborhood Size   Distance 2: AUC / Precision (top 173)   Distance 3: AUC / Precision (top 247)   Distance 4: AUC / Precision (top 382)
2                   0.8181 / 68.78                          0.9361 / 81.78                          0.8877 / 60.20
4                   0.9932 / 98.26                          0.9855 / 93.11                          0.8854 / 59.94
6                   0.9373 / 89.01                          0.9943 / 90.28                          0.9806 / 84.03
8                   0.9621 / 92.48                          0.9942 / 89.06                          0.9819 / 84.29

Table 4. Classification results when varying the central neighborhood set size on the Genetics dataset.

The main point to note is that the co-occurrence probability feature consistently outperforms all other features on AUC scores and performs comparably or better on precision. We discuss the results for each dataset in detail below.

One can see that on the DBLP dataset, the co-occurrence probability feature yields the best AUC score (0.8229) when the three features are used in isolation. The aKatz feature yields the second best AUC score (0.7501) and it has the best precision for K=1500 (however, the precision for this feature drops unexpectedly at some point after K=1500, leading to the lower AUC score).
The semantic feature is inferior to the previous two features, yielding a 0.7148 AUC score and 35.93% precision. We note that all three features significantly outperform the Adamic-Adar measure in terms of both AUC score and precision. Preferential Attachment outperforms the semantic feature but is worse than the other two features. Furthermore, when we combine the aKatz and co-occurrence probability features, we improve the AUC score to 0.8665 and the precision to 56.06%. We get the best results when we use all three features together: a 0.8722 AUC score and 57.66% precision.

The results are even better on the Genetics dataset, where again the co-occurrence probability feature performs significantly better than the other features. This feature alone gives a 0.7904 AUC score and 46.77% precision. The other two features do not perform well on this dataset when used in isolation, with AUCs dropping to around 0.58 and precisions of 32.21% and 16.79%. When we combine all three features, there is little improvement in AUC over the co-occurrence probability feature alone, but there is a slight improvement in precision, to 47.08%. This is because the co-occurrence probability feature predicts a majority of the links that are correctly predicted by the other two features, as well as additional links, so combining the three features adds little. Among the baseline methods, Preferential Attachment performs better than aKatz, semantic similarity and Adamic-Adar, giving an AUC of 0.71 and a precision of 35.03%.

Coming to the Biochemistry dataset, we again observe the same trend of co-occurrence probability being the most useful feature, with an AUC of 0.83, followed by Preferential Attachment with an AUC of 0.82.
Precision-wise, aKatz has a slight edge over co-occurrence probability and Preferential Attachment: aKatz is slightly better at 54.9%, whereas the latter two score 52.6% and 51.12%. Combining aKatz and co-occurrence probability improves the AUC and precision to 0.8526 and 56.4%, and additionally combining semantic similarity gives essentially no improvement. The reason Preferential Attachment performs better on this dataset than on Genetics is that Genetics is a sparser dataset (it was prepared from 14 journals), so Preferential Attachment gave high scores to pairs of prolific authors even when they happened to be in different sub-fields; this happens less on the Biochemistry dataset.

4.5 Results on Varying Central Neighborhood Size

In this section we report the results of varying the central neighborhood size for the local probabilistic models. We use the Genetics dataset as an example for this set of experiments; the results on the other two datasets are consistent. To better validate the use of contextual information for co-occurrence probability estimation, we divide the positive examples (true links) into classes by their shortest distance. We first examine the case where true links are formed between nodes at shortest distance 2, followed by shortest distance 3, and so on. For each class of links, we generate corresponding negative examples using the same 10:1 negative-to-positive ratio, and we train a separate classifier for each class. We examine true links up to distance 4.

Specifically, on this dataset we have 173 true links within the testing period that are at distance 2, 247 true links at distance 3, and 382 true links at distance 4. Table 4 presents the classification results when we use the co-occurrence probability feature alone and vary the size threshold of the central neighborhood set. The larger the threshold, the more contextual information we tend to use when estimating the joint probability for a link.
Note that threshold 2 is essentially the independence model. From the results, one can see that overall the local probabilistic model-based approach outperforms the independence model by taking the contextual information of pairs of nodes into consideration. For links at distance 2, the AUC score improves from 0.8181 to 0.9621 and we can identify up to 51 more true links. For links at distance 3, the AUC score improves from 0.9361 to 0.9943 and we can identify up to 28 more true links. Finally, for links at distance 4, the AUC score improves from 0.8877 to 0.9819 and we can identify up to 92 more true links. We see that the contextual information does indeed improve predictions.

4.6 Results on Timing Performance

Now we report the timing performance of co-occurrence probability feature induction as we vary the size of the central neighborhood set. We use the DBLP dataset as an example for this set of experiments; the results on the other two datasets are similar. Table 5 presents the average time taken to induce the co-occurrence probability feature for one link. As one can see, as we increase the size of the central neighborhood set, it takes more time to compute the feature. This is expected, since the cost of learning a local model grows with the size of the central neighborhood set. Overall, one can see that inducing the co-occurrence probability feature is computationally efficient.
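Per-link averages like those in Table 5 can be gathered with a simple timing harness. A sketch (the stand-in feature function below is ours and is not the paper's local-model computation):

```python
import time

def average_induction_time_ms(feature_fn, links, repeats=1):
    """Average wall-clock time, in milliseconds, of computing a feature
    once for each link in a workload, averaged over several repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        for link in links:
            feature_fn(link)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(links))

# Stand-in feature: a trivial computation over a node pair.
def dummy_feature(link):
    x, y = link
    return (x * y) % 97

links = [(i, i + 1) for i in range(100)]
avg_ms = average_induction_time_ms(dummy_feature, links, repeats=10)
```

Averaging over repeats smooths out timer resolution and transient system load, which matters when per-link times are only a few milliseconds.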
Figure 7. ROC curve comparison on the (a) DBLP (b) Biochemistry (c) Genetics datasets

Size Threshold   Time (ms)
2                4.2
4                6.3
6                8.6
8                10.6

Table 5. Timing results for co-occurrence probability feature induction

5 Conclusions

In this paper, we have presented a simple yet effective approach that leverages local probabilistic models for link prediction. Specifically, we use the topological structure of the network to identify the central neighborhood set of two nodes, and then learn a local MRF model constrained on non-derivable frequent itemsets from this local neighborhood. We then infer the co-occurrence (link) probability from the resulting model and feed it as a feature into a supervised learning algorithm. We have shown that this co-occurrence feature is quite effective for link prediction on real data. When it is used in combination with two other types of features – topological and semantic features – the resulting classification performance improves further. As future work, we would like to examine and test on additional datasets from other domains. We would also like to investigate the use of the temporal evolution of these features in link prediction.

References

[1] L. A. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, 2003.

[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases.
In Proceedings of the 20th International Conference on Very Large Data Bases, pages 487–499, 1994.

[3] A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.

[4] T. Calders and B. Goethals. Depth-first non-derivable itemset mining. In Proceedings of the SIAM 2005 International Conference on Data Mining, 2005.

[5] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 1–12, 2000.

[6] M. A. Hasan, V. Chaoji, S. Salem, and M. Zaki. Link prediction using supervised learning. In Workshop on Link Analysis, Counter-terrorism and Security (at SIAM Data Mining Conference), 2006.

[7] Z. Huang. Link prediction based on graph topology: The predictive value of generalized clustering coefficient. In Workshop on Link Analysis: Dynamics and Statics of Large Networks (LinkKDD 2006), 2006.

[8] Z. Huang, X. Li, and H. Chen. Link prediction approach to collaborative filtering. In JCDL '05: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 141–142, New York, NY, USA, 2005. ACM Press.

[9] F. Jelinek. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA, 1998.

[10] H. Kashima and N. Abe. A parameterized probabilistic model of network evolution for supervised link prediction. In ICDM, pages 340–349, 2006.

[11] S. Lauritzen and D. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B (Methodological), 50(2):157–224, 1988.

[12] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, pages 556–559, New York, NY, USA, 2003. ACM Press.

[13] M. E. J. Newman. Clustering and preferential attachment in growing networks.
Physical Review E, 2001.

[14] J. O'Madadhain, J. Hutchins, and P. Smyth. Prediction and ranking algorithms for event-based network data. SIGKDD Explorations Newsletter, 7(2):23–30, 2005.

[15] D. Pavlov, H. Mannila, and P. Smyth. Beyond independence: probabilistic models for query approximation on binary transaction data. IEEE Transactions on Knowledge and Data Engineering, 15(6):1409–1421, November 2003.

[16] A. Popescul and L. H. Ungar. Statistical relational learning for link prediction. In IJCAI 2003 Workshop on Learning Statistical Models from Relational Data, 2003.

[17] M. J. Rattigan and D. Jensen. The case for anomalous link discovery. SIGKDD Explorations Newsletter, 7(2):41–47, 2005.

[18] B. Taskar, M. F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In Advances in Neural Information Processing Systems 16, 2003.

[19] C. Wang and S. Parthasarathy. Summarizing itemset patterns using probabilistic models. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 730–735, New York, NY, USA, 2006. ACM Press.

[20] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 283–286, 1997.