IEEE Transactions on Knowledge and Data Engineering

Keywords

Anomaly Detection, Feature Extraction, Semantics, Databases, Training Data, Training, Transfer Learning, Graph Anomaly Detection, Graph Neural Networks, Graph Transfer Learning, Performance Degradation, Target Domain, Related Domains, Domain Classifier, Label Information, Source Domain, Set Of Graphs, One Class Classification, Classification Problem, Multi Label, Trainable Parameters, Representation Learning, Latent Space, Domain Adaptation, Adversarial Training, Node Attributes, Normal Class, Transfer Task, Unsupervised Domain Adaptation Methods, Graph Database, System Logs, Domain Adaptation Methods, Traditional Domain, Source Class, Domain Discrepancy, Graph Structured Data

Abstract

Existing graph level anomaly detection methods are predominantly unsupervised due to high costs for obtaining labels, yielding sub-optimal detection accuracy when compared to supervised methods. Moreover, they heavily rely on the assumption that the training data exclusively consists of normal graphs. Hence, even the presence of a few anomalous graphs can lead to substantial performance degradation. To alleviate these problems, we propose a cross-domain graph level anomaly detection method, aiming to identify anomalous graphs from a set of unlabeled graphs (target domain) by using easily accessible normal graphs from a different but related domain (source domain). Our method consists of four components: a feature extractor that preserves semantic and topological information of individual graphs while incorporating the distance between different graphs; an adversarial domain classifier to make graph level representations domain-invariant; a one-class classifier to exploit label information in the source domain; and a class aligner to align classes from both domains based on pseudolabels. Experiments on seven benchmark datasets show that the proposed method largely outperforms state-of-the-art methods.

I.   Introduction

Graph-structured data is ubiquitous, as it can represent relations between objects using edges and semantic characteristics of objects using node attributes. As a result, graph level anomaly detection has a wide range of potential applications, such as criminal detection in financial networks [1], error detection in system logs [2], identifying specific molecules in drug discovery [3], and detecting unhealthy brain structures [4].

Graph neural networks (GNNs) are capable of learning discriminative feature representations for graphs and have significantly advanced benchmark results for graph level anomaly detection [5]. Similar to other types of neural networks, the impressive performance of GNNs is often attained by using a substantial amount of labeled data. However, the process of manually annotating graph data is laborious and thus often impractical. To circumvent this challenge, recent studies have turned to unsupervised (or semi-supervised) learning instead. These methods, however, strongly rely on the assumption that the training data exclusively consists of normal graphs. Our experiments demonstrate that even a minor presence of anomalous graphs in the training data can lead to substantial performance degradation for these methods (cf. Table III).

TABLE I Datasets. #Nodes, #Edges, Degree, and #Attr Denote the Average Number of Nodes and Edges, the Average Degree, and the Dimensionality of Node Attributes, Respectively

TABLE II Description of Hyperparameters and Their Recommended Values

TABLE III Anomaly Detection Accuracy (Average AUC ROC and Corresponding Standard Deviations Across 10 Runs) of Unsupervised Methods (OCGIN, GLAM, and GLocalKD) Under Two Scenarios: 1) When the Training Dataset is Clean, Namely Containing Exclusively Normal Graphs (Shown on the Left Side of ‘’), and 2) When the Training Dataset is Contaminated, Namely Containing Both Normal and Abnormal Instances (Shown on the Right Side of ‘’)

In practice, labeled data may be accessible or relatively cheap to obtain in some domains. Hence, in situations where a certain ‘target’ domain of interest suffers from a dearth of labeled data (or its purity cannot be guaranteed), there is a strong motivation to construct learners that can exploit abundant labeled data from a different but related domain.

Recent work on graph level anomaly detection [1],[5],[6],[7] is mostly unsupervised (or semi-supervised), and has been limited to detecting anomalies within a single domain. That is, the potential benefits of incorporating labeled information from a related domain have not yet been researched. In this paper we investigate how to transfer ‘anomaly knowledge’ from a source to a target graph database.

Unsupervised domain adaptation (UDA) is an attractive approach to achieve this: it adapts models learned from a source domain with plentiful labeled data to a target domain without labels, and has demonstrated remarkable performance in computer vision and natural language processing [8],[9]. Although a few studies have explored UDA for cross-domain node classification, there has been no prior research on cross-domain graph level anomaly detection. Two challenges need to be overcome. First, most existing UDA methods are developed for vector-based data, such as image and text data, for which a distance in a Euclidean space can be defined, while for graph-structured data distance is typically defined in a non-Euclidean space due to graph isomorphism. This makes directly applying off-the-shelf UDA methods to graphs impractical. Second, graph level anomaly detection is inherently more challenging than node level anomaly detection, as anomalies at the graph level may involve global patterns and interactions that cannot be easily discerned by examining individual nodes.

To fill this gap, being motivated and supported by domain adaptation theory [10], we propose an unsupervised domain adaptation based graph level anomaly detection method called ARMET. It addresses the following cross-domain graph level anomaly detection problem: given a target graph database with fully unlabeled graphs and a different but related source graph database that contains only normal graphs, learn a one-class classifier that identifies anomalous graphs from the target graph database.

To achieve this, ARMET leverages an adversarial learning approach consisting of four main components. First, to learn graph level representations, it utilizes a two-part feature extractor: a semantic feature extractor to jointly preserve the semantic and topological information of each graph, and a structure feature extractor to extract the structure of each graph domain. Second, a domain classifier is learned to make graph level representations domain-invariant, thereby reducing the domain discrepancy. Third, a one-class classifier is trained using normal source graphs, aiming to make the learned graph level representations label-discriminative. Finally, a class aligner is trained to align normal graphs in both domains while separating anomalous graphs and normal graphs in the target domain. As a result, in an end-to-end manner, ARMET can learn both domain-invariant and label-discriminative graph level representations, and thus effectively identify anomalous graphs from the target domain.

Our contributions can be summarized as follows:

  • We introduce the cross-domain graph level anomaly detection problem, and develop ARMET, an effective approach to address this problem;
  • ARMET is the first attempt to combine graph neural networks and adversarial domain adaptation techniques for performing graph level anomaly detection tasks;
  • Experiments demonstrate its improved performance for graph level anomaly detection when compared to state-of-the-art unsupervised graph level anomaly detectors.

The rest of this work is organized as follows. Section II reviews related literature. Section III introduces relevant notations and formulates the research problem. Section IV presents the framework of our method ARMET. Sections V and VI describe the design details of ARMET. Section VII provides experimental setup, and Section VIII presents results and corresponding analysis. Section X concludes this work.

II.   Related Work

We review work that is most closely related; readers are referred to the following surveys for more on transfer learning [11],[12], graph representation learning [13], and graph anomaly detection [14],[15].

Graph Level Anomaly Detection: Unsupervised/semi-supervised methods include OCGIN [6], GLAM [5], GLocalKD [7], OCGTL [1] and CODEtect [16]. Unlike ARMET, CODEtect can only handle unattributed graphs. Meanwhile, OCGIN, GLAM, GLocalKD, and OCGTL can handle attributed graphs, but they heavily depend on the assumption that the training data exclusively contains normal graphs. Satisfying this assumption is impractical or expensive in real-world applications. As we will show later, the presence of anomalies in the training data may substantially decrease the performance of these methods (cf. Table III). Moreover, the supervised method iGAD [48] requires fully labelled training data, which is expensive or sometimes impossible to obtain. In contrast, ARMET only requires that the source domain data exclusively contains normal graphs, while imposing no additional assumptions on the target domain data. Further, it is the first graph level anomaly detection method that leverages labeled data from a different but related domain.

Traditional Unsupervised Domain Adaptation: Traditional UDA techniques, such as DeepCoral [17], DANN [18], ADDA [19], and CDAN [20], are primarily designed for addressing multi-class classification problems in computer vision and natural language processing. Consequently, these methods fail to account for the unique characteristics of graph-structured data, as well as the highly imbalanced nature of anomaly detection problems. Moreover, these methods assume that the source domain contains all classes and is fully labeled. As a result, their effectiveness diminishes when the source domain contains only some of the classes (e.g., only the normal class in anomaly detection), as shown later in Table VI.

TABLE IV Anomaly Detection Accuracy (Average AUC ROC and Corresponding Standard Deviations Across 10 Runs)

TABLE V Ablation Studies With the Following Stripped-Down Variations, Where ‘✓’ and ‘✗’ Mean the Corresponding Component is Included or Excluded, Respectively

TABLE VI Anomaly Detection Accuracy (Average AUC ROC and Corresponding Standard Deviations Across 10 Runs) of Traditional UDA Methods (Including ADDA, CDAN, and DeepCoral) When the Source Domain Contains Both Labeled Anomalous and Normal Instances. (‘’ Indicates the Performance is Boosted Compared to the Case Where the Source Domain Contains Only Normal Instances, While ‘’ Means It is Degraded.)

Domain Adaptation on Graphs: Most existing domain adaptation methods on graphs, such as SDA-DAGL [21], DANE [22], UDA-GCN [23], DASGA [24], ACDNE [25], CDNE [26], DANE [27], COMMANDER [28], AdaGCN [29], DGL [30], AdaGIn [31] and CD-GAD [32], only consider domain adaptation from an individual graph to another graph.

Only a few works, including DGDA [33], CDA [34], and [35], consider domain adaptation from a set of graphs to another set of graphs. However, they focus on the multi-class graph classification problem, while ARMET is the first to consider cross-domain graph level anomaly detection, which is more challenging due to the extreme class imbalance. Moreover, unlike ARMET, these methods require the source domain data to contain all classes and be fully labeled.

III.   Problem Statement

Following the notation commonly used in transfer learning [11], a domain $\mathcal{D}$ consists of two components, namely a feature space $\mathcal{X}$ and its marginal probability distribution $P(X)$. Meanwhile, a task contains two components, that is, a label space $\mathcal{Y}$ and a predictive function $h(\cdot)$ that can be expressed as $P(Y|X)$. Moreover, we consider an attributed and undirected graph $G=(V,E)$, where the node set $V$ is associated with a node feature matrix $X\in\mathbb{R}^{|V|\times C}$ and the edge set $E$ is associated with the adjacency matrix $A\in\mathbb{R}^{|V|\times|V|}$. In the following sections, we use the superscripts $(\cdot)^s$ and $(\cdot)^t$ to denote concepts in the source and target domains (also tasks), respectively.

Traditional Unsupervised Domain Adaptation on Graphs: Traditional unsupervised domain adaptation (UDA) assumes that there is a set of labeled source graphs $\mathcal{G}^s=\{G_n^s,Y_n^s\}_{n=1}^{N_s}$ that are drawn i.i.d. from $P(X^s,Y^s)$, and another set of unlabeled target graphs $\mathcal{G}^t=\{G_n^t\}_{n=1}^{N_t}$ that are drawn i.i.d. from $P(X^t)$. Moreover, UDA usually imposes the covariate shift assumption, i.e., $P^s(X^s)\neq P^t(X^t)$ but $P^s(Y^s|X^s)=P^t(Y^t|X^t)$. In other words, it assumes that the source and the target are different (in the sense that $P^s(X^s)\neq P^t(X^t)$) but related (in the sense that $\mathcal{X}^s=\mathcal{X}^t$ and $P^s(Y^s|X^s)=P^t(Y^t|X^t)$). On this basis, UDA aims to learn a classifier $h:\mathcal{X}^t\to\mathcal{Y}^t$ by using fully labelled source graphs and unlabelled target graphs.

For simplicity, we assume that the node feature matrices of the source graphs (i.e., $X^s$) and target graphs (i.e., $X^t$) have the same dimensionality and their columns share the same semantic meanings. Otherwise, we can construct a unified node feature set $X=X^s\cup X^t$ by following the practice in [29],[31]. This assumption is important for utilising parameters-shared graph embedding models and fulfilling the covariate shift assumption in UDA.

Cross-Domain Graph Level Anomaly Detection: The anomaly detection problem can be considered as a binary classification problem. Hence, we have $\mathcal{Y}^s=\mathcal{Y}^t=\{0,1\}$, where 0 and 1 represent normal and abnormal graphs, respectively. To further relax the dependency on fully labeled source data, we assume that $\mathcal{G}^s$ only contains normal graphs. Specifically, given $\mathcal{G}^s=\{G_n^s,Y_n^s\}_{n=1}^{N_s}\subset\mathcal{X}^s\times\mathcal{Y}^s$ with $Y_n^s=0$ for $n\in\{1,\dots,N_s\}$, and $\mathcal{G}^t=\{G_n^t\}_{n=1}^{N_t}\subset\mathcal{X}^t$, Cross-Domain Graph Level Anomaly Detection (CD-GLAD) aims to learn a binary classifier $g:\mathcal{X}^t\to\mathcal{Y}^t$ that accurately predicts anomaly labels for graphs in the target domain, with the assistance of both normal graphs from the source domain and unlabeled graphs from the target domain. Hence, the CD-GLAD problem is more practical but entails greater challenges than traditional UDA.

IV.   Proposed Method: ARMET

Motivation: According to domain adaptation theory [10], given the source domain $\mathcal{D}^s=\{\mathcal{X}^s,P(X^s)\}$ and the target domain $\mathcal{D}^t=\{\mathcal{X}^t,P(X^t)\}$, for any binary classifier $h$ drawn from a hypothesis class $\mathcal{H}$ and any $\delta\in(0,1)$, with probability at least $1-\delta$ we have

$$\epsilon_t(h)\le\epsilon_s(h)+\frac{1}{2}d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}^s,\mathcal{D}^t)+[\epsilon_s(h^*)+\epsilon_t(h^*)]+\omega, \tag{1}$$

where $\epsilon_t(h)$ and $\epsilon_s(h)$ indicate the expected error of classifier $h$ on the target domain and source domain, respectively. Moreover, $d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}^s,\mathcal{D}^t)$ represents the domain discrepancy. Importantly, $[\epsilon_s(h^*)+\epsilon_t(h^*)]$ is the combined error of the ideal joint classifier $h^*$ on both domains, where $h^*:=\arg\min_{h\in\mathcal{H}}[\epsilon_s(h)+\epsilon_t(h)]$. Besides, $\omega$ is the term associated with the model complexity and the sample sizes.

In this work, we aim to learn a classifier $h$ that has the minimal expected error on the target domain. Therefore, the sum of terms on the right side of (1) should be minimized. This provides the key insight behind the design of our algorithm, which considers the following overall objective:

$$\mathcal{L}(X^s,Y^s,X^t)=\lambda_1\mathcal{L}_{SC}(X^s,Y^s)-\lambda_2\mathcal{L}_{DA}(X^s,X^t)+\lambda_3\mathcal{L}_{CA}(X^s,Y^s,X^t), \tag{2}$$

where $\mathcal{L}(X^s,Y^s,X^t)$ corresponds to the upper bound of $\epsilon_t(h)$, $\mathcal{L}_{SC}(X^s,Y^s)$ is the source classifier loss corresponding to $\epsilon_s(h)$, $\mathcal{L}_{DA}(X^s,X^t)$ is the domain classifier loss that approximates $d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}^s,\mathcal{D}^t)$ (larger loss indicates smaller discrepancy), and $\mathcal{L}_{CA}(X^s,Y^s,X^t)$ is the class alignment loss that corresponds to $[\epsilon_s(h^*)+\epsilon_t(h^*)]$. Further, $\lambda_1>0$, $\lambda_2>0$ and $\lambda_3>0$ are balance parameters.

Approach: Fig. 1 depicts the architecture of our proposed method, dubbed ARMET (adversarial cross domain graph level anomaly detection). Specifically, ARMET adopts an adversarial learning framework to perform cross-domain graph level anomaly detection. It involves the following four main components:

  • Parameters-Shared Feature Extractor $h_{FE}$: takes a source graph database $\mathcal{G}^s$ consisting of exclusively normal graphs and an unlabeled target graph database $\mathcal{G}^t$ as inputs, and aims to learn a representation vector $h_{FE}(G_i)$ for each graph $G_i\in\mathcal{G}^s\cup\mathcal{G}^t$ such that similar graphs (in terms of semantic and structure properties) from both domains have similar embeddings;

    Fig. 1. An overview of ARMET. Graphs and their learned representations are framed in oval rectangles, where the target domain is highlighted in green and the source domain in orange. Meanwhile, the semantic feature extractor, structure feature extractor, and other learners including one-class classifier, domain classifier, and class aligner are depicted in blue rectangles.

  • One-Class Classifier $h_s$: takes the representation $h_{FE}(G_i)$ of each graph from the source domain as input, and learns a hypersphere to include embeddings of normal graphs while excluding those of anomalous graphs. This classifier can be directly used to predict labels for graphs in the target domain and corresponds to $\mathcal{L}_{SC}$;
  • Domain Classifier $h_d$: takes the representation $h_{FE}(G_i)$ of each graph as input, and attempts to discriminate whether it is drawn from the source domain or the target domain, such that the distributions of embeddings of both domains are aligned. This component corresponds to $\mathcal{L}_{DA}$;
  • Class Aligner: takes $\{G_n^s,Y_n^s\}_{n=1}^{N_s}$ and $\{G_n^t,\hat{Y}_n^t\}_{n=1}^{N_t}$ as inputs, further making normal graphs from both domains have close embeddings, while normal graphs and anomalous graphs in the target domain have distant embeddings. Particularly, we apply the One-Class Classifier $h_s$ to obtain pseudo-labels $\hat{Y}_n^t$ for graphs in the target domain while using the true labels $Y_n^s$ for graphs in the source domain. This component corresponds to $\mathcal{L}_{CA}$.

For better organization, we divide these components into two modules, as shown in Fig. 1: the graph feature extraction module that contains the feature extractor (steps ①, ②, ③), and the cross-domain anomaly detection module that includes the domain classifier (step ④), the one-class classifier (step ⑤), and the class aligner (step ⑥). Importantly, the two modules are trained jointly in an end-to-end manner, although they are introduced separately in the following.

V.   Module 1: Graph Feature Extraction

The graph feature extraction module consists of a feature extractor dubbed $h_{FE}(\cdot)$ with trainable parameters $\Theta_{FE}$, aiming to extract graph level representations for graphs from the source domain and the target domain. This component is further decomposed into three sub-components: a semantic feature extractor (step ①) and a structure feature extractor (step ②), followed by a feature concatenation operator (step ③).

Semantic Feature Extractor: We denote the semantic feature extractor as $h_{SE}(\cdot)$ with trainable parameters $\Theta_{SE}$, which learns a graph level representation for each input graph. Specifically, we adopt a $K$-layer GIN model [36] followed by a READOUT function to obtain graph level representations, due to the superior performance of GIN compared to other competing methods [37]. Concretely, GIN updates node representations as

$$f_v^{(k+1)}=\mathrm{MLP}\Big((1+\alpha^{(k)})f_v^{(k)}+\sum_{u\in N(v)}f_u^{(k)}\Big), \tag{3}$$

where $f_v^{(k)}$ denotes the representation of node $v$ learned at the $k$-th layer, $N(v)$ indicates the neighbour set for node $v$, $\alpha^{(k)}$ is a trainable parameter, and MLP represents a multilayer perceptron. Next, we obtain the graph level representation by leveraging $\mathrm{READOUT}(f_v^{(k)}\mid k=1,\dots,K)$, which can be a simple permutation-invariant function such as the maximum, sum or mean. Using this, we can preserve the semantic and topological information of each graph.
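As a minimal illustration of update (3), the following pure-Python sketch applies one GIN-style aggregation step and a sum READOUT to a toy graph. It is not the authors' implementation: the trainable MLP is replaced by a fixed ReLU stand-in (`toy_mlp`), and the names `gin_layer` and `sum_readout` are hypothetical helpers introduced only for this example.

```python
from typing import Callable, Dict, List

Vector = List[float]


def toy_mlp(x: Vector) -> Vector:
    """Stand-in for the trainable MLP in (3): a fixed ReLU (assumption)."""
    return [max(0.0, value) for value in x]


def gin_layer(
    features: Dict[int, Vector],
    neighbours: Dict[int, List[int]],
    alpha: float,
    mlp: Callable[[Vector], Vector] = toy_mlp,
) -> Dict[int, Vector]:
    """One step of (3): f_v <- MLP((1 + alpha) * f_v + sum of neighbour features)."""
    updated = {}
    for v, f_v in features.items():
        combined = [(1.0 + alpha) * value for value in f_v]
        for u in neighbours[v]:
            combined = [c + f for c, f in zip(combined, features[u])]
        updated[v] = mlp(combined)
    return updated


def sum_readout(features: Dict[int, Vector]) -> Vector:
    """Permutation-invariant READOUT: element-wise sum over all nodes."""
    dim = len(next(iter(features.values())))
    return [sum(f[d] for f in features.values()) for d in range(dim)]


# Toy path graph 0 - 1 - 2 with two-dimensional node attributes.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbours = {0: [1], 1: [0, 2], 2: [1]}
hidden = gin_layer(features, neighbours, alpha=0.0)
graph_embedding = sum_readout(hidden)
```

Because the READOUT sums over nodes, relabelling the nodes of the toy graph leaves `graph_embedding` unchanged, which is the permutation invariance the text relies on.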

Structure Feature Extractor: We denote the structure extractor as $h_{ST}(\cdot)$ with parameters $\Theta_{ST}$. We assume that $\mathcal{G}^s=\{G_1^s,\dots,G_j^s,\dots,G_N^s\}$ and $\mathcal{G}^t=\{G_1^t,\dots,G_j^t,\dots,G_M^t\}$ exhibit some inherent data structures such as clusters in $\mathcal{X}^s$ and $\mathcal{X}^t$, respectively. Inspired by [38], for each graph database, we construct a k-nearest neighbours graph (KNN graph) in the latent space to model its data structure, aiming to capture the neighbourhood information between different graphs. Without loss of generality, we use $\mathcal{G}^s$ to demonstrate the construction of a KNN graph.

The KNN graph of $\mathcal{G}^s$ contains $N$ nodes, with each node representing a source graph and its node attribute indicating the corresponding graph level representation vector learned by $h_{ST}(\cdot)$. Next, to generate edges, we can construct the adjacency matrix as

$$A_{ij}=\begin{cases}1, & \text{if } G_i\in N_k(G_j) \text{ or } G_j\in N_k(G_i)\\ 0, & \text{otherwise}\end{cases} \tag{4}$$

where $N_k(G_j)$ represents the k-nearest neighbours set of graph $G_j$ based on the Euclidean distance between graph level embeddings. After constructing the KNN graph, we can leverage another GIN model (without READOUT function) to learn the node representations, wherein a node represents a source graph instance.
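The construction in (4) can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: `knn_adjacency` is a hypothetical helper name, each graph is excluded from its own neighbour set, and the brute-force distance computation is chosen for readability rather than efficiency.

```python
import math
from typing import List


def knn_adjacency(embeddings: List[List[float]], k: int) -> List[List[int]]:
    """Build the symmetric adjacency matrix of (4) from graph level embeddings."""
    n = len(embeddings)
    # k nearest neighbours of each graph under the Euclidean distance
    # (the graph itself is excluded, which is an assumption of this sketch).
    knn_sets = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(embeddings[i], embeddings[j]),
        )
        knn_sets.append(set(others[:k]))
    # A_ij = 1 if G_i is in N_k(G_j) or G_j is in N_k(G_i), and 0 otherwise.
    return [
        [1 if (i in knn_sets[j] or j in knn_sets[i]) else 0 for j in range(n)]
        for i in range(n)
    ]


# Four graphs embedded on a line; with k = 1 each links to its closest neighbour.
embeddings = [[0.0], [1.0], [3.0], [3.5]]
adjacency = knn_adjacency(embeddings, k=1)
```

The "or" in (4) makes the matrix symmetric even though the k-nearest-neighbour relation itself is not, so the resulting KNN graph is undirected.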

Feature Concatenation Operator: We utilize a concatenation operator that directly appends the structure feature for each graph to its semantic feature. Hence, this operator has no parameters. Overall, given a graph $G_i$, its final extracted feature can be expressed as $h_{FE}(G_i)=h_{SE}(G_i)\,\Vert\,h_{ST}(G_i)$, where $\Vert$ denotes concatenation. As a result, the corresponding trainable parameters are $\Theta_{FE}=(\Theta_{SE},\Theta_{ST})$. Importantly, these parameters are shared when learning representations for graphs from the source domain and the target domain.

VI.   Module 2: Cross-Domain Graph Level Anomaly Detection

This module contains three components: the one-class classifier, the domain classifier, and the class aligner. We first elucidate the rationale and design of each component, followed by introducing the theory of adversarial training.

One-Class Classifier: We use $h_s(\cdot)$ to represent the source classifier, which has no additional parameters beyond the feature extraction parameters $\Theta_{FE}$. We train a one-class classifier on the source domain by minimizing the following One-Class Deep SVDD objective [39]:

$$\mathcal{L}_{SC}(X^s,Y^s;\Theta_{FE}):=\frac{1}{N_s}\sum_{m=1}^{N_s}\lVert h_{FE}(G_m)-o\rVert_2^2, \tag{5}$$

where $h_{FE}(G_m)$ is the final extracted feature of graph $G_m$ (from the source domain), and $o$ is the learned center that represents the normality defined in the source domain. Minimizing this SVDD loss ensures the model learns the label information from the source domain.
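Objective (5) is the mean squared Euclidean distance between source embeddings and the center. A minimal sketch, assuming the embeddings and the center are already available as plain lists, is given below; `svdd_loss` and `anomaly_score` are hypothetical names, and scoring a graph by its squared distance to the center is the usual Deep SVDD convention rather than something stated in this section.

```python
from typing import List


def squared_distance(z: List[float], center: List[float]) -> float:
    """Squared Euclidean distance between an embedding and the center o."""
    return sum((z_d - o_d) ** 2 for z_d, o_d in zip(z, center))


def svdd_loss(embeddings: List[List[float]], center: List[float]) -> float:
    """Objective (5): average squared distance of source embeddings to o."""
    return sum(squared_distance(z, center) for z in embeddings) / len(embeddings)


def anomaly_score(embedding: List[float], center: List[float]) -> float:
    """Assumed scoring rule: the farther from o, the more anomalous."""
    return squared_distance(embedding, center)


# Three normal source embeddings around the center (1, 1).
source_embeddings = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
center = [1.0, 1.0]
loss = svdd_loss(source_embeddings, center)
score_far = anomaly_score([4.0, 1.0], center)
score_near = anomaly_score([1.0, 1.5], center)
```

Under this convention, a target graph whose embedding lies far from the center (`score_far`) receives a higher anomaly score than one close to it (`score_near`).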

Domain Classifier: We denote the domain classifier as $h_d(\cdot)$, with trainable parameters $\Theta_d$ in addition to parameters $\Theta_{FE}$. We maximize the binary cross-entropy loss to learn domain-invariant features:

$$\mathcal{L}_{DA}(X^s,X^t;\Theta_d,\Theta_{FE}):=\frac{1}{N_s+N_t}\sum_{i=1}^{N_s+N_t}\Big[d_i\log\frac{1}{\hat{d}_i}+(1-d_i)\log\frac{1}{1-\hat{d}_i}\Big], \tag{6}$$

where $d_i$ denotes the binary ground-truth domain label for graph $G_i$. Specifically, $d_i$ is 0 for graphs from the source domain and 1 for graphs from the target domain. Additionally, $\hat{d}_i$ represents the predicted probability that $G_i$ belongs to the target domain, as determined by $h_d(\cdot)$. More precisely, the prediction is given by $\hat{d}_i=h_d(h_{FE}(G_i))$, where $h_{FE}(\cdot)$ is the final feature extractor and $h_d(\cdot)$ is the domain classifier. In loss term (6), we consider the negative log-probability, expressed as $-\log(\hat{d}_i)$. This can be rewritten as $\log\frac{1}{h_d(h_{FE}(G_i))}$. The goal is to maximize this loss function, which encourages the feature extractor $h_{FE}(\cdot)$ to create similar representations for graphs from both the source and target domains. This helps align the domains, making the feature distributions of the source and target domains indistinguishable and reducing discrepancies between them.
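To make loss (6) concrete, the sketch below evaluates it for given domain labels and predicted probabilities. The function name `domain_loss` and the clipping constant `eps` are assumptions added for numerical safety; the predictions would normally come from the domain classifier rather than being typed in by hand.

```python
import math
from typing import List


def domain_loss(domain_labels: List[int], predicted: List[float]) -> float:
    """Loss (6): mean of d * log(1 / d_hat) + (1 - d) * log(1 / (1 - d_hat))."""
    eps = 1e-12  # guard against log(1 / 0); an implementation assumption
    total = 0.0
    for d, d_hat in zip(domain_labels, predicted):
        d_hat = min(max(d_hat, eps), 1.0 - eps)
        total += d * math.log(1.0 / d_hat) + (1 - d) * math.log(1.0 / (1.0 - d_hat))
    return total / len(domain_labels)


# 0 = source graph, 1 = target graph.
labels = [0, 0, 1, 1]
confident = domain_loss(labels, [0.1, 0.2, 0.9, 0.8])  # domains easy to tell apart
confused = domain_loss(labels, [0.5, 0.5, 0.5, 0.5])   # domains indistinguishable
```

The loss is larger when the classifier cannot separate the two domains (`confused` exceeds `confident`), which matches the remark that a larger domain loss indicates a smaller discrepancy.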

Class Aligner: Structure consistency between domains (via parameters-shared feature extractor), discriminability in source domain (via source classifier), and domain-invariant features (via domain classifier) do not necessarily lead to discriminability in the target domain. That is, although we manage to align features across domains, the learned features may be distorted in the sense that they are not representative of the underlying patterns in the target domain. As a result, features of the normal and abnormal classes in the target domain may exhibit close proximity or even overlap, leading to a high value of $\epsilon_t(h)$ in the target domain. This is known as excessive alignment in [40] and collapse of target neighborhood structure in [41].

To alleviate this problem, we should consider the third term in (1), namely $[\epsilon_s(h^*)+\epsilon_t(h^*)]$, which considers labels from both domains. Although the true label information in the target domain cannot be obtained, we can generate pseudolabels and minimize the class centroid alignment loss:

$$\mathcal{L}_{CA}(X^s,Y^s,X^t,\hat{Y}^t;\Theta_{FE}):=[\psi(C_n^s,C_n^t)-\psi(C_a^t,C_n^t)], \tag{7}$$

where $\hat{Y}^t$ are the pseudolabels obtained by directly applying $h_s(\cdot)$ to the target graphs, and $h_s(\cdot)$ is obtained by optimizing loss term (5) in the previous iteration. Moreover, $C_n^s$, $C_n^t$, and $C_a^t$ represent the centroids of the normal source class, the normal target class, and the anomalous target class, respectively. The function $\psi(\cdot,\cdot)$ is a distance metric such as the Euclidean distance. Minimizing this loss reduces the inter-domain distance between centroids of normal classes, while simultaneously maximizing the intra-domain distance between centroids of normal and abnormal classes within the target domain. Particularly, as training proceeds, we expect that the discrepancy between source domain and target domain is reduced, the data structures are better aligned, $h_s(\cdot)$ is further improved, and thus the pseudolabels are gradually updated to approach the ground-truth labels, progressively improving the class centroid alignment.
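The centroid alignment loss (7) can be sketched as follows, taking the Euclidean distance for the metric and treating the pseudolabels as given. The helper names `centroid` and `class_alignment_loss` are hypothetical, and the requirement that both pseudo-classes are non-empty is an assumption of this sketch; how a real implementation handles an empty pseudo-class is not specified in the text.

```python
import math
from typing import List


def centroid(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of embeddings."""
    if not vectors:
        raise ValueError("centroid of an empty class is undefined")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def class_alignment_loss(
    source_normal: List[List[float]],
    target_embeddings: List[List[float]],
    pseudo_labels: List[int],
) -> float:
    """Loss (7): psi(C_n^s, C_n^t) - psi(C_a^t, C_n^t) with Euclidean psi."""
    target_normal = [z for z, y in zip(target_embeddings, pseudo_labels) if y == 0]
    target_anomalous = [z for z, y in zip(target_embeddings, pseudo_labels) if y == 1]
    c_ns = centroid(source_normal)
    c_nt = centroid(target_normal)
    c_at = centroid(target_anomalous)
    return math.dist(c_ns, c_nt) - math.dist(c_at, c_nt)


source_normal = [[0.0, 0.0], [2.0, 0.0]]               # centroid (1, 0)
target_embeddings = [[1.0, 1.0], [1.0, 3.0], [5.0, 2.0]]
pseudo_labels = [0, 0, 1]                               # 0 = normal, 1 = anomalous
loss = class_alignment_loss(source_normal, target_embeddings, pseudo_labels)
```

The loss decreases both when the two normal centroids move closer and when the anomalous target centroid moves away from the normal target centroid, which are the two effects described above.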

Adversarial Training: It can be seen that the optimization of (5) and (7) involves a minimization w.r.t. parameters $\Theta_{FE}$, while the optimization of (6) concerns a maximization w.r.t. parameters $\Theta_d$ and $\Theta_{FE}$. In other words, objective (6) competes against objectives (5) and (7) during training over the overall objective (2). To obtain a good trade-off, adversarial training techniques have been explored, with impressive results [18].

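For simplicity, we rewrite $\mathcal{L}(X^s,Y^s,X^t;\Theta_{FE},\Theta_d)$ as $\mathcal{L}(\Theta_{FE},\Theta_d)$, which contains two sets of parameters. Adversarial training attempts to find a saddle point $(\hat{\Theta}_{FE},\hat{\Theta}_d)$ such that $\hat{\Theta}_{FE}=\arg\min_{\Theta_{FE}}\mathcal{L}(\Theta_{FE},\hat{\Theta}_d)$ and $\hat{\Theta}_d=\arg\max_{\Theta_d}\mathcal{L}(\hat{\Theta}_{FE},\Theta_d)$. In other words, we perform a minmax optimization over (2):

$$\min_{\Theta_{FE}}\max_{\Theta_d}\big[\lambda_1\mathcal{L}_{SC}(\Theta_{FE})-\lambda_2\mathcal{L}_{DA}(\Theta_{FE},\Theta_d)+\lambda_3\mathcal{L}_{CA}(\Theta_{FE})\big]. \tag{8}$$

Hence, the trainable parameters can be optimized alternately as follows:

$$\Theta_d\leftarrow\Theta_d-\mu\frac{\partial\mathcal{L}_{DA}}{\partial\Theta_d},\qquad \Theta_{FE}\leftarrow\Theta_{FE}-\mu\Big(\lambda_1\frac{\partial\mathcal{L}_{SC}}{\partial\Theta_{FE}}-\lambda_2\frac{\partial\mathcal{L}_{DA}}{\partial\Theta_{FE}}+\lambda_3\frac{\partial\mathcal{L}_{CA}}{\partial\Theta_{FE}}\Big), \tag{9}$$

where $\mu$ is the learning rate. We perform mini-batch training with gradient descent and the corresponding pseudo-code is provided in Algorithm 1.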
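The alternating updates in (9) can be illustrated on a toy saddle-point problem with scalar parameters. Everything in the sketch below is an assumption made for illustration: the quadratic stand-ins for the one-class and domain losses, the omission of the class alignment term, and the specific values of the learning rate and balance parameters. It shows only the sign structure of the two updates, not the behaviour of ARMET itself.

```python
from typing import Tuple


def alternating_updates(
    steps: int = 200,
    mu: float = 0.1,
    lambda_1: float = 1.0,
    lambda_2: float = 0.5,
) -> Tuple[float, float]:
    """Toy version of (9) with scalar parameters.

    L_SC(theta_fe)          = theta_fe ** 2               (stand-in, assumption)
    L_DA(theta_fe, theta_d) = (theta_d - theta_fe) ** 2   (stand-in, assumption)
    The class alignment term is omitted to keep the example short.
    """
    theta_fe, theta_d = 1.0, -1.0
    for _ in range(steps):
        # Domain classifier step: gradient descent on L_DA w.r.t. theta_d.
        grad_da_d = 2.0 * (theta_d - theta_fe)
        theta_d -= mu * grad_da_d
        # Feature extractor step: descend on lambda_1 * L_SC - lambda_2 * L_DA,
        # i.e. minimize the source loss while increasing the domain loss.
        grad_sc_fe = 2.0 * theta_fe
        grad_da_fe = -2.0 * (theta_d - theta_fe)
        theta_fe -= mu * (lambda_1 * grad_sc_fe - lambda_2 * grad_da_fe)
    return theta_fe, theta_d


theta_fe, theta_d = alternating_updates()
```

For this toy objective the saddle point is at the origin, and with the chosen step size the iterates spiral towards it; in general, alternating gradient updates are not guaranteed to converge for arbitrary objectives.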

Algorithm 1:   Mini-Batch Algorithm of ARMET.

Input: Labelled source graphs $\mathcal{G}^s=\{G_n^s,Y_n^s\}_{n=1}^{N_s}$; unlabelled target graphs $\mathcal{G}^t=\{G_n^t\}_{n=1}^{N_t}$; balance parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$; batch size $N_b$; maximal training epochs $N_e$; maximal iterations per epoch $N_i$
Output: Predicted labels $\hat{Y}^t$ of target graphs
Initialise parameters $\Theta_{FE},\Theta_d$
for epoch $<N_e$ and not converged do
 for iteration $<N_i$ do
  Sample a batch $B^s$ and $B^t$
  Learn representations using $h_{FE}$ for $B^s$ and $B^t$
  Compute source classifier loss $\mathcal{L}_{SC}$ using (5)
  Compute domain classifier loss $\mathcal{L}_{DA}$ using (6)
  Backpropagate $\mathcal{L}_{DA}$ and update $\Theta_d$ using (9)
  Compute class aligner loss $\mathcal{L}_{CA}$ using (7)
  Compute total loss $\mathcal{L}$ using (2)
  Backpropagate $\mathcal{L}$ and update $\Theta_{FE}$ using (9)
 end for
end for
$\hat{h}_{FE}(\mathcal{G}^t)\leftarrow$ Use feature extractor with optimised parameters $\hat{\Theta}_{FE}$ to extract features for $\mathcal{G}^t$
$\hat{Y}^t\leftarrow$ Use optimised source classifier $\hat{h}_s(\cdot)$ to predict labels for $\hat{h}_{FE}(\mathcal{G}^t)$

VII.   Experiment Setup

We aim to answer the following research questions (RQ) via experiments:

  1. How does ARMET perform when compared to state-of-the-art graph level anomaly detection methods?
  2. How does each component of ARMET affect the performance? (Ablation study)
  3. How does the performance of ARMET change with different hyperparameter values? (Sensitivity analysis)

In addition, to get a better understanding of ARMET, we utilize t-SNE [42] to visualize the learned graph representations from both domains.

A. Benchmark Datasets

As summarized in Table I, we study the following datasets collected from various application fields of graph mining:

System Logs: We use four benchmark datasets for log anomaly detection, as converting logs into graphs and then leveraging GNNs to detect anomalies can achieve superior performance [2],[43]. Hence, following [2], we construct four graph datasets: HDFS (HD) [44], BGL (BG), SPIRIT (SP), and THUNDERBIRD (TH) [45], where each dataset contains 5000 graphs with 5% anomalous graphs. Particularly, HD consists of Hadoop Distributed File System logs, while BG, SP, and TH contain system logs collected from three different supercomputing systems. For this group of datasets, we create 12 transfer tasks.

Letter Drawings: We use three benchmark graph datasets with letter drawings of varying levels of distortion: low (LL), medium (LM), and high (LH) [46]. Following the practice of downsampling classification datasets for anomaly detection [47], for each dataset the letters N, M, and W are selected as the normal class, comprising a total of 450 instances (150 instances per letter), while the letter F is chosen as the anomalous class (downsampled to 50 instances). For each graph, a node denotes the end point of a line, an edge represents a line, and a node attribute represents its two-dimensional coordinate. For these datasets, we create 6 transfer tasks.
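The task counts above are consistent with treating every ordered (source, target) pair of distinct datasets within a group as one transfer task; this reading is an assumption, since the pairing rule is not spelled out here. Under that assumption the tasks can be enumerated as follows, where `transfer_tasks` is a hypothetical helper name.

```python
from itertools import permutations
from typing import List, Tuple


def transfer_tasks(datasets: List[str]) -> List[Tuple[str, str]]:
    """All ordered (source, target) pairs of distinct datasets in one group."""
    return list(permutations(datasets, 2))


log_tasks = transfer_tasks(["HD", "BG", "SP", "TH"])   # system log group
letter_tasks = transfer_tasks(["LL", "LM", "LH"])      # letter drawing group
```

With four log datasets this yields 4 × 3 = 12 tasks, and with three letter datasets 3 × 2 = 6 tasks, matching the numbers stated in the text.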

Discussion on Dataset Selection: Some commonly used benchmark graph datasets, including BZR, DHFR and COX2 (small molecules), as well as IMDB-Binary and REDDIT (social networks), have been excluded from our analysis for the following reasons: 1) the UDA assumptions do not hold for them, as these datasets have different definitions of classification problem, namely they have different label semantics; 2) transferability cannot be guaranteed due to huge domain discrepancy between datasets. For instance, node attributes of graphs in these datasets have different dimensionalities and/or meanings, and these datasets have very different graph statistics; and 3) even supervised classification methods can only achieve very limited in-domain classification accuracy (i.e., with an accuracy lower than 0.7) on these datasets.

B. Baselines

We compare ARMET to the following baselines:

  • Unsupervised graph level anomaly detection methods: We directly apply OCGIN [6], GLAM [5], and GLocalKD [7] on unlabeled target graphs.
  • Traditional Domain Adaptation Methods: ADDA [19], CDAN [20], and DeepCoral [17] are state-of-the-art UDA methods for image classification. As they are not designed for graph-structured data, we modify these models for cross-domain graph level anomaly detection by following [32].

To explore the ceiling performance of GLAD methods, we additionally present the outcomes of iGAD [48], a supervised method that is not considered a direct competitor due to its reliance on labeled target data.

C. Evaluation

We employ the widely used Area Under the Curve of the Receiver Operating Characteristics curve (AUC ROC), Area Under the Curve of Precision Recall (AUC PR), and F1-Score to evaluate and compare the different methods, where a higher value (closer to 1) represents better anomaly detection accuracy. Particularly, we report the average values of AUC ROC, AUC PR, F1-Score and the corresponding standard deviations across 10 independent runs.
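As an illustration of the main metric, AUC ROC can be computed as the fraction of (anomalous, normal) pairs in which the anomalous graph receives the higher anomaly score, with ties counted as one half. The sketch below uses this pairwise definition for clarity rather than speed; `auc_roc` and `summarise_runs` are hypothetical names, and the use of the sample standard deviation across runs is an assumption, since the text does not say which estimator is reported.

```python
import statistics
from typing import List, Tuple


def auc_roc(labels: List[int], scores: List[float]) -> float:
    """Pairwise AUC ROC; label 1 marks anomalous graphs, higher score = more anomalous."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC ROC needs at least one graph of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def summarise_runs(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over independent runs (assumed estimator)."""
    return statistics.mean(values), statistics.stdev(values)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(labels, scores)
mean_auc, std_auc = summarise_runs([0.7, 0.8, 0.9])
```

In the toy example three of the four (anomalous, normal) pairs are ranked correctly, so the AUC ROC is 0.75.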

D. Implementation and Model Configuration

We use the publicly available implementations of OCGIN, GLAM, GLocalKD and iGAD with their recommended configurations. Besides, ADDA, CDAN and DeepCoral are adapted from their publicly available implementations. As they are not designed for graph-structured data, we modify these models for cross-domain graph level anomaly detection by following the practice in [32]:

  • CDAN: We replace the AlexNet encoder with a GIN plus a mean readout function;
  • ADDA: Similarly, we replace the CNN encoder with a GIN plus a mean readout function;
  • DeepCoral: We replace the CNN encoder (CaffeNet) with a GIN plus a mean readout function and apply CORAL loss to the last classification layer.
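To make the phrase "a GIN plus a mean readout function" concrete, the following dependency-free sketch shows what such an encoder computes on a single graph. It is only an illustration: the single linear layer standing in for the GIN multilayer perceptron, the identity weights, the value of eps and the toy graph are all assumptions, and an actual implementation would typically rely on a library such as PyTorch Geometric.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weight, bias):
    """Dense layer: weight holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weight, bias)]


def gin_layer(features, neighbours, weight, bias, eps=0.0):
    """One GIN update:
    h_v <- ReLU(W((1 + eps) * h_v + sum of h_u over neighbours u) + b)."""
    updated = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbours[v]:
            agg = [a + x for a, x in zip(agg, features[u])]
        updated.append(relu(linear(agg, weight, bias)))
    return updated


def mean_readout(features):
    """Graph-level embedding: average of the node embeddings."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


# Toy path graph 0 - 1 - 2 with 2-dimensional node attributes.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # illustrative weights only
hidden = gin_layer(features, neighbours, identity, [0.0, 0.0])
graph_embedding = mean_readout(hidden)
```

The resulting `graph_embedding` plays the role that the image feature vector plays in the original CDAN, ADDA and DeepCoral pipelines.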

For ARMET, following the practice in [17], the values of the hyperparameters λ1, λ2 and λ3 can be set such that, after the training process (e.g., 100 epochs), the losses associated with one-class classification (LSC), domain discrimination (LDA) and class alignment (LCA) are approximately of the same magnitude (after being multiplied by their corresponding weights). The rationale behind this is that we aim to learn feature representations that are both label-discriminative and domain-invariant [17]. In particular, we found that there is usually not a single set of optimal weights for each transfer task, as pointed out for multi-task learning by [49]. Although this hyperparameter tuning method works well, it is time-consuming and labor-intensive. For simplicity, we can adopt a de facto strategy for hyperparameter tuning in UDA, namely splitting the target dataset according to a ratio of 20%:80%, where the 20% of the data with labels is used as the validation set to select these three hyperparameters and the remaining 80% without labels is used as the test data.
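Both tuning strategies can be sketched in a few lines. The helper names below are hypothetical, as is the pairing of λ1, λ2 and λ3 with LSC, LDA and LCA (assumed from the order in which they are listed), and fixing λ1 to 1 is an illustrative normalisation rather than the procedure used in the paper.

```python
import random


def balance_weights(loss_sc, loss_da, loss_ca):
    """Choose weights so that the weighted losses share the magnitude of
    the one-class loss (illustrative heuristic, lambda_1 fixed to 1)."""
    return 1.0, loss_sc / loss_da, loss_sc / loss_ca


def split_target(graphs, val_ratio=0.2, seed=0):
    """Shuffle the target graphs and split them into a labelled validation
    part (for selecting the weights) and an unlabelled test part."""
    indices = list(range(len(graphs)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(val_ratio * len(graphs)))
    val = [graphs[i] for i in indices[:n_val]]
    test = [graphs[i] for i in indices[n_val:]]
    return val, test


# Hypothetical loss values observed at the end of a training run.
lam1, lam2, lam3 = balance_weights(loss_sc=0.5, loss_da=2.0, loss_ca=0.05)
val, test = split_target(list(range(100)))  # stand-ins for 100 target graphs
```

With the 20%:80% split, candidate weight triples would be compared on `val` only, and the reported scores computed on `test`.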

For fair comparison, all GIN models used in the different domain adaptation methods are configured with the same backbone architecture, namely two layers (64 hidden units), each followed by a ReLU activation. In addition, all neural network based methods are trained using mini-batch gradient descent, with a batch size of 512 on the log anomaly detection datasets and 128 on the other datasets, an initial learning rate of 0.01, a weight decay rate of 0.0001, and a maximum of 200 training epochs. The settings of these hyperparameters for ARMET are summarised in Table II. Other algorithm-specific hyperparameters are set as recommended by the original authors in their respective references.

E. Training Hardware and Reproducibility

We implemented and ran all algorithms in Python 3.8, using PyTorch [50] and PyTorch Geometric [51] when applicable, on a workstation equipped with an Intel i7-11700KF CPU and Nvidia RTX3070 GPU. For reproducibility, all code and datasets are made available on GitHub.

VIII.   Experiment Results and Analysis

We answer the three research questions as follows.

A. Detection Accuracy (RQ1)

We perform the following analysis based on the experiment results in Tables III, IV and VI. In particular, the results in terms of AUC PR and F1-Score are consistent with those in terms of AUC ROC. Therefore, these results are either deferred to Table VIII or omitted.

TABLE VII Ablation Study (Average AUC ROC With 10 Runs)

TABLE VIII Anomaly Detection Accuracy: Average AUC PR and Corresponding Standard Deviations Across 10 Runs (Top), and Average F1-Score (Binary) and Corresponding Standard Deviations Across 10 Runs (Bottom)

1) Accuracy of Single-Domain GLAD

Table III demonstrates that the unsupervised/semi-supervised methods OCGIN, GLAM (see footnote 7), and GLocalKD generally suffer from large performance degradation when the training data is contaminated with anomalies. Moreover, these methods deliver very unstable results across different datasets even when the training data is clean. For example, GLocalKD yields high performance on BGL (ROC = 0.91), but very poor performance on HDFS (ROC = 0.45). In contrast, Table IV shows that the supervised method iGAD achieves perfect results, with an ROC of 1.0, in all cases. This indicates that these anomaly detection problems are solvable when sufficient labeled data is available in the target domain. However, this is unrealistic in many real-world scenarios, as fully labeled data is usually expensive and often even impossible to obtain in practice.

2) Accuracy of Traditional UDA Methods

Table IV shows that when there is only one class in the source domain, ADDA, CDAN, and DeepCoral behave like random guessing (e.g., with an ROC of 0.50), performing much worse than ARMET on most transfer tasks. The potential factors contributing to their subpar performance are as follows. First, CDAN considers the conditional distribution that captures the cross-covariance between feature representations and classifier predictions, but the conditional distribution degenerates to the marginal distribution when there is only one class in the source domain. Second, DeepCoral aligns the second-order statistics of layer activations in the source encoder and the target encoder, leading to random-guessing results since the source training data contains only normal graphs while the target training data includes both normal and abnormal graphs (and thus the statistics of layer activations should be different by nature). Third, ADDA performs poorly as it utilises different encoders for the source and target domains, implying the importance of parameters-shared models when encoding graphs from two domains. The efficacy of the source encoder degrades to random guessing when the source domain contains only a single class, as the source classifier fails to acquire any informative knowledge. Furthermore, these traditional UDA methods are primarily designed for balanced multi-class classification problems, making them ill-suited for anomaly detection, which involves an extremely imbalanced binary classification task. As a reference, we report the performance of ADDA, CDAN, and DeepCoral when the source domain contains both labeled anomalous and normal instances in Table VI. One can see that the performance of ADDA, CDAN, and DeepCoral is largely boosted with auxiliary label information in most transfer tasks. However, even with auxiliary labels, they are still outperformed by ARMET in most cases.
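The second point can be illustrated with the CORAL objective itself, which penalises the squared Frobenius distance between the source and target feature covariances, scaled by 1/(4d²) for d-dimensional activations. The sketch below is a plain-Python rendering of that loss on hypothetical activations; the batches are invented for illustration and are not taken from the experiments.

```python
def covariance(rows):
    """Sample covariance matrix (d x d) of a batch of d-dimensional features."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    c_s, c_t = covariance(source), covariance(target)
    sq = sum((c_s[i][j] - c_t[i][j]) ** 2
             for i in range(d) for j in range(d))
    return sq / (4 * d * d)


# Hypothetical activations: the source batch holds only normal graphs, while
# the target batch mixes the same normal pattern with one distant anomaly.
source = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [9.0, -9.0]]
loss_same = coral_loss(source, source)
loss_mixed = coral_loss(source, target)
```

Even though the normal graphs are identical in both batches, `loss_mixed` is strictly positive, so minimising it pushes the encoder to hide the very difference that separates anomalies from normal graphs.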

3) Accuracy of ARMET

Table IV shows that ARMET achieves the best performance on 13 out of 18 transfer tasks, and the second-best performance on the remaining tasks. Further, it largely outperforms unsupervised graph level anomaly detection methods on certain target datasets, demonstrating the benefit of leveraging label information from a different but related domain. For example, transferring knowledge from HD to SP leads to 54%, 28%, and 38% performance gains compared to OCGIN, GLAM, and GLocalKD, respectively. Finally, ARMET achieves impressive results under the setting that the source domain contains only normal graphs, while other traditional unsupervised domain adaptation methods provide “random-guessing” results, making ARMET more applicable to real-world scenarios.

B. Ablation Study (RQ2)

As summarised in Table V, we here compare ARMET to five of its stripped-down variations: SD is trained on the source domain and then directly applied on the target domain; /SF is ARMET trained without the structure feature extractor; /SC is ARMET trained without the source one-class classifier; /DC is ARMET trained without the domain classifier; and /CA is ARMET trained without the class aligner. We have the following important observations from Table VII:

  • SD versus ARMET. ARMET is always superior to the case where we train the model on the source domain and then directly apply it to the target domain. This exemplifies the benefits and necessity of performing transfer learning.
  • /SF versus ARMET. ARMET consistently outperforms its counterpart without the structural feature extractor. This underlines the importance of considering the neighbourhood information in a set of graphs for CD-GLAD.
  • /SC versus ARMET. It shows that explicitly incorporating the source label information via a source classifier is typically beneficial.
  • /DC versus ARMET. In most cases, explicitly aligning the embeddings of both domains via a domain classifier is favorable. In certain cases, the presence of a domain classifier may have a small adverse impact on performance. One possible reason is that the domain classifier overly distorts the embedding space by aligning the embedding space in a brute-force manner. Another possible reason is that the embeddings are also implicitly aligned via the parameters-shared feature extractor and the class aligner.
  • /CA versus ARMET. The removal of the class aligner always leads to a performance degradation. This corroborates the importance of considering labels (or pseudo-labels) in the target domain when performing CD-GLAD.

C. Sensitivity Analysis (RQ3)

We examine the effects of the following hyperparameters on the detection performance, and Fig. 2 depicts the selected results on four representative transfer tasks:

  • The number of embedding dimensions d in the parameters-shared feature extractor: most transfer tasks can achieve the best performance with an embedding dimension of 64. Small fluctuations can be observed with a varying number of dimensions. Moreover, an overly small value of d may lead to suboptimal performance, while a large value introduces a considerable computational burden.

    Fig. 2. Sensitivity analysis of hyperparameters on four representative transfer tasks (HD → SP, TH → BG, HD → TH, LM → LL). Average results over five runs.

  • The number of hidden layers L in the GIN model in the parameters-shared feature extractor: optimal performances are obtained when L=1 or L=2 on most transfer tasks. A further increase of its value usually results in performance degradation, which is widely known as over-smoothing [52].
  • The number of neighbours k in the KNN graph: most transfer tasks obtain the best performance over a wide range of values of k (namely 1 ≤ k ≤ 10). However, further increasing the value of k may cause large performance fluctuations.
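For readers unfamiliar with the role of k, the following sketch builds a KNN graph over a set of graph-level embeddings, linking each graph to its k closest graphs. The use of Euclidean distance, the function name and the toy embeddings are assumptions made for illustration; the distance actually used between graphs is defined in the method section.

```python
import math


def knn_graph(embeddings, k):
    """Return, for each graph, the indices of its k nearest other graphs
    under Euclidean distance between graph-level embeddings."""
    neighbours = []
    for i, e_i in enumerate(embeddings):
        dists = [(math.dist(e_i, e_j), j)
                 for j, e_j in enumerate(embeddings) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


# Four hypothetical graph embeddings forming two tight pairs.
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
edges = knn_graph(embeddings, k=1)
```

With k = 1 each graph is linked only to its closest partner, whereas a large k would also connect the two distant pairs, which is one plausible source of the fluctuations noted above.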

D. Visualisation With T-SNE

Fig. 3 visualizes the domain alignment and anomaly separation process on the transfer task LM → LL. The target normal graphs (green dots) are progressively adapted to the source normal graphs (blue dots), while the target anomalous graphs (red crosses) become increasingly separable in the embedding space.

Fig. 3. T-SNE visualization of the graph representation space during domain adaptation on task LM → LL. From left to right: the panels correspond to zero, two, five and one hundred training epochs, respectively.

IX.   Discussion on Transferability

As shown in Table VII, when comparing ARMET with its counterpart without explicit transfer learning (namely the column ‘SD’, which represents the scenario where we train the model on the source domain and then directly apply it to the target domain), the performance gains from transfer learning are consistently non-negative. Moreover, when comparing ARMET with the best performing single-domain graph level anomaly detectors that are trained directly on the target domain, the performance gains are often positive (in 13 out of 18 cases, see Table IV). This further demonstrates the benefits of transfer learning.

The underlying reason why transfer learning is beneficial in this context is that the extent of “relatedness” among System Logs datasets (i.e., BG, HD, SP and TH) is large enough, and so is the extent of “relatedness” among Letter Drawings datasets (i.e., LL, LM, and LH). In cross domain graph-level anomaly detection, it is crucial to ensure that the semantic meanings of normal patterns in the source and target domains are approximately the same. For instance, in System Logs data, these are normal patterns of system operations, while in Letter Drawings data, they represent the same set of letters.

As in other typical unsupervised domain adaptation settings, we assume that the source and target domains are different yet related. However, the extent of “relatedness” in this study is ensured based on our domain knowledge rather than a quantifiable transferability metric. To our knowledge, how to quantify the extent of this “relatedness” (namely transferability across datasets) remains a long-standing problem [53]. Notably, when this “relatedness” is low, it may introduce negative transfer [54]. Moreover, it is critical to note that transferability is usually asymmetrical, e.g., the performance gain for transfer task SP → TH is 0.44 (namely 1.0 minus 0.56 in terms of AUC ROC), which is different from the performance gain for TH → SP (namely 0.23 in terms of AUC ROC). Therefore, we should exercise caution (especially in safety-critical fields such as healthcare) when the transferability from the source domain to the target domain cannot be guaranteed, whether from a quantifiable metric or a domain knowledge perspective.
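The asymmetry can be checked with the figures quoted above. In this small sketch the function name is hypothetical, and the gain is taken to be a plain difference in AUC ROC between the model with transfer and its baseline, as the phrase "1.0 minus 0.56" suggests.

```python
def transfer_gain(auc_with_transfer, auc_baseline):
    """Performance gain of transfer learning as a difference in AUC ROC,
    rounded to two decimals to match the precision of the reported values."""
    return round(auc_with_transfer - auc_baseline, 2)


gain_sp_to_th = transfer_gain(1.0, 0.56)  # values quoted in the text
gain_th_to_sp = 0.23                      # reported directly in the text
is_asymmetric = gain_sp_to_th != gain_th_to_sp
```

Swapping the roles of source and target therefore roughly halves the gain in this example, which is why transferability has to be assessed per direction.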

X.   Conclusion

This paper studies the problem of cross-domain graph level anomaly detection, wherein a set of unlabelled graphs from the target domain and a set of normal graphs from a different but related domain are given. We propose ARMET, a theoretically motivated, novel method to solve this widely encountered but largely understudied problem. Specifically, ARMET consists of four components: a feature extractor, an adversarial domain classifier, a one-class classifier, and a class aligner. It is the first attempt to combine graph neural networks and adversarial domain adaptation techniques for performing graph level anomaly detection tasks. Extensive experiments demonstrate the efficacy of ARMET and its superiority to single-domain graph level anomaly detection methods and traditional unsupervised domain adaptation methods. Moreover, extensive ablation studies validate the benefits of incorporating each component into ARMET. Understanding and quantifying the transferability across different graph domains should be addressed in future research.

Footnotes

  • 7 With a few exceptions for GLAM, where the performance is slightly increased or stable.

References


  • [1] C. Qiu, M. Kloft, S. Mandt, and M. Rudolph, “Raising the bar in graph-level anomaly detection,” 2022, arXiv:2205.13845.
  • [2] Z. Li, J. Shi, and M. Van Leeuwen, “Graph neural networks based log anomaly detection and explanation,” in Proc. IEEE/ACM 46th Int. Conf. Softw. Eng.: Companion Proc., 2024, pp. 306–307.
  • [3] H. G. Vogel, W. H. Vogel, H. G. Vogel, G. Müller, J. Sandow, and B. A. Schölkens, Drug Discovery and Evaluation: Pharmacological Assays, vol. 2. Berlin, Germany: Springer, 1997.
  • [4] T. Lanciano, F. Bonchi, and A. Gionis, “Explainable classification of brain networks via contrast subgraphs,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2020, pp. 3308–3318.
  • [5] L. Zhao, S. Sawlani, A. Srinivasan, and L. Akoglu, “Graph anomaly detection with unsupervised GNNs,” 2022, arXiv:2210.09535.
  • [6] L. Zhao and L. Akoglu, “On using classification datasets to evaluate graph outlier detection: Peculiar observations and new insights,” Big Data, vol. 11, no. 3, pp. 151–180, 2021.
  • [7] R. Ma, G. Pang, L. Chen, and A. van den Hengel, “Deep graph-level anomaly detection by glocal knowledge distillation,” in Proc. 15th ACM Int. Conf. Web Search Data Mining, 2022, pp. 704–714.
  • [8] G. Wilson and D. J. Cook, “A survey of unsupervised deep domain adaptation,” ACM Trans. Intell. Syst. Technol., vol. 11, no. 5, pp. 1–46, 2020.
  • [9] E. Nie, S. Liang, H. Schmid, and H. Schütze, “Cross-lingual retrieval augmented prompt for low-resource languages,” in Proc. Findings Assoc. Computat. Linguistics, 2023, pp. 8320–8340.
  • [10] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” Mach. Learn., vol. 79, pp. 151–175, 2010.
  • [11] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
  • [12] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in Proc. 27th Int. Conf. Artif. Neural Netw., Rhodes, Greece, Springer, 2018, pp. 270–279.
  • [13] F. Chen, Y.-C. Wang, B. Wang, and C.-C. J. Kuo, “Graph representation learning: A survey,” APSIPA Trans. Signal Inf. Process., vol. 9, 2020, Art. no. e15.
  • [14] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: A survey,” Data Mining Knowl. Discov., vol. 29, pp. 626–688, 2015.
  • [15] X. Ma et al., “A comprehensive survey on graph anomaly detection with deep learning,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 12, pp. 12012–12038, Dec. 2023.
  • [16] H. T. Nguyen, P. J. Liang, and L. Akoglu, “Anomaly detection in large labeled multi-graph databases,” 2020, arXiv:2010.03600.
  • [17] B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep domain adaptation,” in Proc. Eur. Conf. Comput. Vis., Amsterdam, The Netherlands, Springer, 2016, pp. 443–450.
  • [18] Y. Ganin et al., “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, 2016.
  • [19] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 7167–7176.
  • [20] M. Long, Z. Cao, J. Wang, and M. I. Jordan, “Conditional adversarial domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 1647–1657.
  • [21] E. Vural, “Domain adaptation on graphs by learning graph topologies: Theoretical analysis and an algorithm,” Turkish J. Elect. Eng. Comput. Sci., vol. 27, no. 3, pp. 1619–1635, 2019.
  • [22] Y. Zhang, G. Song, L. Du, S. Yang, and Y. Jin, “DANE: Domain adaptive network embedding,” 2019, arXiv:1906.00684.
  • [23] M. Wu, S. Pan, C. Zhou, X. Chang, and X. Zhu, “Unsupervised domain adaptive graph convolutional networks,” in Proc. Web Conf., 2020, pp. 1457–1467.
  • [24] M. Pilanci and E. Vural, “Domain adaptation on graphs by learning aligned graph bases,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 2, pp. 587–600, Feb. 2022.
  • [25] X. Shen, Q. Dai, F.-L. Chung, W. Lu, and K.-S. Choi, “Adversarial deep network embedding for cross-network node classification,” in Proc. AAAI Conf. Artif. Intell., 2020, pp. 2991–2999.
  • [26] X. Shen, Q. Dai, S. Mao, F.-L. Chung, and K.-S. Choi, “Network together: Node classification via cross-network deep network embedding,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 5, pp. 1935–1948, May 2021.
  • [27] G. Song, Y. Zhang, L. Xu, and H. Lu, “Domain adaptive network embedding,” IEEE Trans. Big Data, vol. 8, no. 5, pp. 1220–1232, Oct. 2020.
  • [28] K. Ding, K. Shu, X. Shan, J. Li, and H. Liu, “Cross-domain graph anomaly detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 6, pp. 2406–2415, Jun. 2022.
  • [29] Q. Dai, X.-M. Wu, J. Xiao, X. Shen, and D. Wang, “Graph transfer learning via adversarial domain adaptation with graph convolution,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 5, pp. 4908–4922, May 2023.
  • [30] J. Li, W. Liu, Y. Zhou, J. Yu, D. Tao, and C. Xu, “Domain-invariant graph for adaptive semi-supervised domain adaptation,” ACM Trans. Multimedia Comput., Commun., Appl., vol. 18, no. 3, pp. 1–18, 2022.
  • [31] J. Xiao, Q. Dai, X. Xie, Q. Dou, K.-W. Kwok, and J. Lam, “Domain adaptive graph infomax via conditional adversarial networks,” IEEE Trans. Netw. Sci. Eng., vol. 10, no. 1, pp. 35–52, Jan./Feb. 2022.
  • [32] Q. Wang, G. Pang, M. Salehi, W. Buntine, and C. Leckie, “Cross-domain graph anomaly detection via anomaly-aware contrastive alignment,” 2022, arXiv:2212.01096.
  • [33] R. Cai, F. Wu, Z. Li, P. Wei, L. Yi, and K. Zhang, “Graph domain adaptation: A generative view,” 2021, arXiv:2106.07482.
  • [34] M. Wu and M. Rostami, “Unsupervised domain adaptation for graph-structured data using class-conditional distribution alignment,” 2023, arXiv:2301.12361.
  • [35] Y. You, T. Chen, Z. Wang, and Y. Shen, “Graph domain adaptation via theory-grounded spectral regularization,” in Proc. 11th Int. Conf. Learn. Representations, 2023.
  • [36] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?,” 2018, arXiv:1810.00826.
  • [37] F. Errica, M. Podda, D. Bacciu, and A. Micheli, “A fair comparison of graph neural networks for graph classification,” 2019, arXiv:1912.09893.
  • [38] X. Ma, T. Zhang, and C. Xu, “GCAN: Graph convolutional adversarial network for unsupervised domain adaptation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 8266–8276.
  • [39] L. Ruff et al., “Deep one-class classification,” in Proc. Int. Conf. Mach. Learn., PMLR, 2018, pp. 4393–4402.
  • [40] N. Xiao and L. Zhang, “Dynamic weighted learning for unsupervised domain adaptation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 15242–15251.
  • [41] K. Saito, D. Kim, P. Teterwak, S. Sclaroff, T. Darrell, and K. Saenko, “Tune it the right way: Unsupervised validation of domain adaptation via soft neighborhood density,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9184–9193.
  • [42] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, no. 11, pp. 2579–2605, 2008.
  • [43] Y. Xie, H. Zhang, and M. A. Babar, “LogGD: Detecting anomalies from system logs by graph neural networks,” 2022, arXiv:2209.07869.
  • [44] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale system problems by mining console logs,” in Proc. ACM SIGOPS 22nd Symp. Operating Syst. Princ., 2009, pp. 117–132.
  • [45] A. Oliner and J. Stearley, “What supercomputers say: A study of five system logs,” in Proc. 37th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw., 2007, pp. 575–584.
  • [46] K. Riesen et al., “IAM graph database repository for graph based pattern recognition and machine learning,” in Proc. Int. Workshop Structural, Syntactic, Statist. Pattern Recognit., 2008, pp. 287–297.
  • [47] G. O. Campos et al., “On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study,” Data Mining Knowl. Discov., vol. 30, pp. 891–927, 2016.
  • [48] G. Zhang et al., “Dual-discriminative graph neural network for imbalanced graph-level anomaly detection,” in Proc. Adv. Neural Inf. Process. Syst., 2022, pp. 24144–24157.
  • [49] A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7482–7491.
  • [50] A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 8026–8037.
  • [51] M. Fey and J. E. Lenssen, “Fast graph representation learning with pytorch geometric,” 2019, arXiv:1903.02428.
  • [52] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun, “Measuring and relieving the over-smoothing problem for graph neural networks from the topological view,” in Proc. AAAI Conf. Artif. Intell., 2020, pp. 3438–3445.
  • [53] S. Ibrahim, N. Ponomareva, and R. Mazumder, “Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discov. Databases, Springer, 2022, pp. 693–709.
  • [54] W. Zhang, L. Deng, L. Zhang, and D. Wu, “A survey on negative transfer,” IEEE/CAA J. Automatica Sinica, vol. 10, no. 2, pp. 305–329, Feb. 2023.

Zhong Li received the MSc degree in mathematics from Tongji University, China, and the Diplôme d’Ingénieur degree in data science from ENSAI, France. He is currently working toward the PhD degree in computer science with Leiden University, The Netherlands. His research focuses on trustworthy anomaly detection, particularly in complex data such as event sequences and graph-structured data. He has published papers in leading data mining journals such as Data Mining and Knowledge Discovery, ACM Transactions on Knowledge Discovery from Data, and SIGKDD Explorations. He also serves as a reviewer for leading conferences/journals such as KDD, Data Mining and Knowledge Discovery, and IEEE Transactions on Knowledge and Data Engineering.
Sheng Liang received the MSc degree in computer science from the University of Sheffield, U.K. He is currently working toward the PhD degree in computer science with the Ludwig Maximilian University of Munich, Germany. His current research focuses on multilingual modelling and retrieval-augmented language models.
Jiayang Shi received the MSc degree in electrical and computer engineering from the Karlsruhe Institute of Technology, Germany. He is currently working toward the PhD degree in computer science with Leiden University, The Netherlands. He focuses on integrating machine learning with denoising and artifact reduction techniques for computed tomography (CT) imaging data. This encompasses tomographic reconstruction and addressing inverse problems in general.
Matthijs van Leeuwen is an associate professor and group leader with the Explanatory Data Analysis Group, LIACS, Leiden University, The Netherlands. His primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and ultimately novel knowledge? He was awarded several grants and best paper awards, co-organised international conferences and workshops, is action editor for Data Mining and Knowledge Discovery and on the guest editorial board of the ECML PKDD Journal Track. He was guest editor of an ACM Transactions on Knowledge Discovery from Data special issue on Interactive Data Exploration and Analytics.
