Abstract
I. Introduction
Graph-structured data is ubiquitous, as it can represent relations between objects using edges and semantic characteristics of objects using node attributes. As a result, graph level anomaly detection has a wide range of potential applications, such as detecting criminal activity in financial networks [1], detecting errors in system logs [2], identifying specific molecules in drug discovery [3], and detecting unhealthy brain structures [4].
Graph neural networks (GNNs) are capable of learning discriminative feature representations for graphs and have significantly advanced benchmark results for graph level anomaly detection [5]. As with other types of neural networks, however, the impressive performance of GNNs usually relies on a substantial amount of labeled data, and manually annotating graph data is laborious and often impractical. To circumvent this challenge, recent studies have turned to unsupervised (or semi-supervised) learning instead. These methods, however, strongly rely on the assumption that the training data consists exclusively of normal graphs. Our experiments demonstrate that even a small fraction of anomalous graphs in the training data can lead to substantial performance degradation for these methods (cf. Table III).



In practice, labeled data may be accessible or relatively cheap to obtain in some domains. Hence, in situations where a certain ‘target’ domain of interest suffers from a dearth of labeled data (or its purity cannot be guaranteed), there is a strong motivation to construct learners that can exploit abundant labeled data from a different but related domain.
Recent work on graph level anomaly detection [1],[5],[6],[7] is mostly unsupervised (or semi-supervised) and has been limited to detecting anomalies within a single domain. That is, the potential benefits of incorporating labeled information from a related domain have not yet been investigated. In this paper, we investigate how to transfer ‘anomaly knowledge’ from a source graph database to a target graph database.
Unsupervised domain adaptation (UDA) is an attractive approach to achieve this: it adapts models learned from a source domain with plenty of labeled data to a target domain without labels, and has demonstrated remarkable performance in computer vision and natural language processing [8],[9]. Although a few studies have explored UDA for cross-domain node classification, there has been no prior research on cross-domain graph level anomaly detection. Two challenges need to be overcome. First, most existing UDA methods are developed for vector-based data, such as images and text, for which distances can be defined in a Euclidean space, whereas for graph-structured data distances are typically defined in a non-Euclidean space, because comparing two graphs must account for graph isomorphism. This makes directly applying off-the-shelf UDA methods to graphs impractical. Second, graph level anomaly detection is inherently more challenging than node level anomaly detection, as anomalies at the graph level may involve global patterns and interactions that cannot be easily discerned by examining individual nodes.
To fill this gap, and motivated by domain adaptation theory [10], we propose an unsupervised domain adaptation based graph level anomaly detection method called ARMET. It addresses the following cross-domain graph level anomaly detection problem: given a target graph database containing only unlabeled graphs and a different but related source graph database that contains only normal graphs, learn a one-class classifier that identifies anomalous graphs in the target graph database.
To achieve this, ARMET leverages an adversarial learning approach consisting of four main components. First, to learn graph level representations, it utilizes a two-part feature extractor: a semantic feature extractor to jointly preserve the semantic and topological information of each graph, and a structure feature extractor to extract the structure of each graph domain. Second, a domain classifier is learned to make graph level representations domain-invariant, thereby reducing the domain discrepancy. Third, a one-class classifier is trained using normal source graphs, aiming to make the learned graph level representations label-discriminative. Finally, a class aligner is trained to align normal graphs in both domains while separating anomalous graphs and normal graphs in the target domain. As a result, in an end-to-end manner, ARMET can learn both domain-invariant and label-discriminative graph level representations, and thus effectively identify anomalous graphs from the target domain.
Our contributions can be summarized as follows:
- We introduce the cross-domain graph level anomaly detection problem, and develop ARMET, an effective approach to address this problem;
- ARMET is the first attempt to combine graph neural networks and adversarial domain adaptation techniques for performing graph level anomaly detection tasks;
- Experiments demonstrate its improved performance for graph level anomaly detection when compared to state-of-the-art unsupervised graph level anomaly detectors.
The rest of this work is organized as follows. Section II reviews related literature. Section III introduces relevant notations and formulates the research problem. Section IV presents the framework of our method ARMET. Sections V and VI describe the design details of ARMET. Section VII provides experimental setup, and Section VIII presents results and corresponding analysis. Section X concludes this work.
II. Related Work
We review work that is most closely related; readers are referred to the following surveys for more on transfer learning [11],[12], graph representation learning [13], and graph anomaly detection [14],[15].
Graph Level Anomaly Detection: Unsupervised/semi-supervised methods include OCGIN [6], GLAM [5], GLocalKD [7], OCGTL [1] and CODEtect [16]. Unlike ARMET, CODEtect can only handle unattributed graphs. Meanwhile, OCGIN, GLAM, GLocalKD, and OCGTL can handle attributed graphs, but they heavily depend on the assumption that the training data contains exclusively normal graphs, which is often impractical or expensive to guarantee in real-world applications. As we will show later, the presence of anomalies in the training data can substantially degrade the performance of these methods (cf. Table III). Moreover, the supervised method iGAD [48] requires fully labeled training data, which is expensive or sometimes impossible to obtain. In contrast, ARMET only requires that the source domain data contain exclusively normal graphs, while imposing no additional assumptions on the target domain data. Further, it is the first graph level anomaly detection method that leverages labeled data from a different but related domain.
Traditional Unsupervised Domain Adaptation: Traditional UDA techniques, such as DeepCoral [17], DANN [18], ADDA [19], and CDAN [20], are primarily designed for multi-class classification problems in computer vision and natural language processing. Consequently, these methods fail to account for the unique characteristics of graph-structured data, as well as the highly imbalanced nature of anomaly detection problems. Moreover, these methods assume that the source domain contains all classes and is fully labeled. As a result, their effectiveness diminishes when the source domain contains only some of the classes (e.g., only the normal class in anomaly detection), as we show later in Table VI.



Domain Adaptation on Graphs: Most existing domain adaptation methods on graphs, such as SDA-DAGL [21], DANE [22], UDA-GCN [23], DASGA [24], ACDNE [25], CDNE [26], DANE [27], COMMANDER [28], AdaGCN [29], DGL [30], AdaGIn [31] and CD-GAD [32], only consider domain adaptation from one individual graph to another graph.
Only a few works, including DGDA [33], CDA [34], and [35], consider domain adaptation from a set of graphs to another set of graphs. However, they focus on the multi-class graph classification problem, while ARMET is the first to consider cross-domain graph level anomaly detection, which is more challenging due to the extreme class imbalance. Moreover, unlike ARMET, these methods require the source domain data to contain all classes and be fully labeled.
III. Problem Statement
Following the notation commonly used in transfer learning [11], a domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ consists of two components, namely a feature space $\mathcal{X}$ and its marginal probability distribution $P(X)$. Meanwhile, a task $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$ contains two components, that is, a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$ that can be expressed as $f(x) = P(y \mid x)$. Moreover, we consider an attributed and undirected graph $G = (V, E)$, where the node set $V$ is associated with a node feature matrix $X \in \mathbb{R}^{|V| \times d}$ and the edge set $E$ is associated with the adjacency matrix $A \in \{0,1\}^{|V| \times |V|}$. In the following sections, we use the superscripts $s$ and $t$ to denote concepts in the source and target domains (and tasks), respectively.
Traditional Unsupervised Domain Adaptation on Graphs: Traditional unsupervised domain adaptation (UDA) assumes that there is a set of labeled source graphs $\{(G^s_i, y^s_i)\}_{i=1}^{n_s}$ drawn i.i.d. from $P^s(G, Y)$, and another set of unlabeled target graphs $\{G^t_j\}_{j=1}^{n_t}$ drawn i.i.d. from $P^t(G)$. Moreover, UDA usually imposes the covariate shift assumption, i.e., $P^s(Y \mid G) = P^t(Y \mid G)$ but $P^s(G) \neq P^t(G)$. In other words, it assumes that the source and the target are different (in the sense that $P^s(G) \neq P^t(G)$) but related (in the sense that $\mathcal{X}^s = \mathcal{X}^t$ and $P^s(Y \mid G) = P^t(Y \mid G)$). On this basis, UDA aims to learn a classifier $f^t(\cdot)$ for the target domain by using fully labeled source graphs and unlabeled target graphs.
For simplicity, we assume that the node feature matrices of the source graphs (i.e., $X^s$) and the target graphs (i.e., $X^t$) have the same dimensionality $d$ and that their columns share the same semantic meanings. Otherwise, we can construct a unified node feature set following the practice in [29],[31]. This assumption is important for utilising parameters-shared graph embedding models and for fulfilling the covariate shift assumption in UDA.
Cross-Domain Graph Level Anomaly Detection: The anomaly detection problem can be considered as a binary classification problem. Hence, we have $\mathcal{Y}^s = \mathcal{Y}^t = \{0, 1\}$, where 0 and 1 represent normal and abnormal graphs, respectively. To further relax the dependency on fully labeled source data, we assume that the source domain contains only normal graphs. Specifically, given source graphs $\mathcal{G}^s = \{G^s_i\}_{i=1}^{n_s}$ with $y^s_i = 0$ for $i = 1, \dots, n_s$, and unlabeled target graphs $\mathcal{G}^t = \{G^t_j\}_{j=1}^{n_t}$, Cross-Domain Graph Level Anomaly Detection (CD-GLAD) aims to learn a binary classifier $f^t: \mathcal{G}^t \to \{0, 1\}$ that accurately predicts anomaly labels for graphs in the target domain, with the assistance of both the normal graphs from the source domain and the unlabeled graphs from the target domain. Hence, the CD-GLAD problem is more practical but also more challenging than traditional UDA.
IV. Proposed Method: ARMET
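Motivation: According to domain adaptation theory [10], given the source domain $\mathcal{D}^s$, the target domain $\mathcal{D}^t$, and any binary classifier $h$ drawn from a hypothesis class $\mathcal{H}$, for any $\delta \in (0, 1)$, with probability at least $1 - \delta$ we have

$$\epsilon_t(h) \le \epsilon_s(h) + \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}^s, \mathcal{D}^t) + \lambda + C, \qquad (1)$$

where $\epsilon_t(h)$ and $\epsilon_s(h)$ indicate the expected error of classifier $h$ on the target domain and the source domain, respectively. Moreover, $d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}^s, \mathcal{D}^t)$ represents the domain discrepancy. Importantly, $\lambda = \epsilon_s(h^*) + \epsilon_t(h^*)$ denotes the combined error of the ideal joint classifier $h^*$ on both domains, where $h^* = \arg\min_{h \in \mathcal{H}} \epsilon_s(h) + \epsilon_t(h)$. Finally, $C$ is a term associated with the model complexity and the sample sizes.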
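In this work, we aim to learn a classifier with minimal expected error on the target domain; therefore, the sum of the terms on the right-hand side of (1) should be minimized. This provides key insights into the design of our algorithm, which considers the following overall objective:

$$\mathcal{L} = \alpha \mathcal{L}_{SC} - \beta \mathcal{L}_{DC} + \gamma \mathcal{L}_{CA}, \qquad (2)$$

where $\mathcal{L}$ corresponds to the upper bound of $\epsilon_t(h)$, $\mathcal{L}_{SC}$ is the source classifier loss corresponding to $\epsilon_s(h)$, $\mathcal{L}_{DC}$ is the domain classifier loss that approximates $d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}^s, \mathcal{D}^t)$ (a larger loss indicates a smaller discrepancy, hence its negative sign), and $\mathcal{L}_{CA}$ is the class alignment loss that corresponds to $\lambda$. Further, the balance parameters $\alpha, \beta, \gamma > 0$.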
Approach: Fig. 1 depicts the architecture of our proposed method, dubbed ARMET (adversarial cross-domain graph level anomaly detection). Specifically, ARMET adopts an adversarial learning framework to perform cross-domain graph level anomaly detection. It involves the following four main components:
- Parameters-Shared Feature Extractor $F$: takes a source graph database $\mathcal{G}^s$ consisting exclusively of normal graphs and an unlabeled target graph database $\mathcal{G}^t$ as inputs, and aims to learn a representation vector for each graph such that similar graphs (in terms of semantic and structural properties) from both domains have similar embeddings;
- One-Class Classifier $C_s$: takes the representation of each graph from the source domain as input, and learns a hypersphere that encloses the embeddings of normal graphs while excluding those of anomalous graphs. This classifier can be directly used to predict labels for graphs in the target domain and corresponds to $\epsilon_s(h)$;
- Domain Classifier $C_d$: takes the representation of each graph as input and attempts to discriminate whether it is drawn from the source domain or the target domain, such that the embedding distributions of both domains become aligned. It corresponds to $d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}^s, \mathcal{D}^t)$;
- Class Aligner: takes the representations of source and target graphs as inputs, and further pulls the embeddings of normal graphs from both domains together while pushing the embeddings of normal and anomalous graphs in the target domain apart. Particularly, we apply the One-Class Classifier to obtain pseudo-labels for graphs in the target domain, while using the true labels for graphs in the source domain. This component corresponds to $\lambda$.

Fig. 1. An overview of ARMET. Graphs and their learned representations are framed in oval rectangles, where the target domain is highlighted in green and the source domain in orange. Meanwhile, the semantic feature extractor, structure feature extractor, and other learners including one-class classifier, domain classifier, and class aligner are depicted in blue rectangles.
For better organization, we divide these components into two modules, as shown in Fig. 1: the graph feature extraction module, which contains the feature extractor (steps ①, ② and ③), and the cross-domain anomaly detection module, which includes the domain classifier (step ④), the one-class classifier (step ⑤), and the class aligner (step ⑥). Importantly, the two modules are trained jointly in an end-to-end manner, although we introduce them separately below.
V. Module 1: Graph Feature Extraction
The graph feature extraction module consists of a feature extractor $F$ with trainable parameters $\theta_f$, which extracts graph level representations for graphs from the source and target domains. It is further decomposed into three sub-components: a semantic feature extractor (step ①) and a structure feature extractor (step ②), followed by a feature concatenation operator (step ③).
Semantic Feature Extractor: We denote the semantic feature extractor as $F_{se}$ with trainable parameters $\theta_{se}$, which learns a graph level representation for each input graph. Specifically, we adopt a $K$-layer GIN model [36] followed by a READOUT function to obtain graph level representations, owing to the superior performance of GIN compared to other competing methods [37]. Concretely, GIN updates node representations as

$$h_v^{(k)} = \mathrm{MLP}^{(k)}\Big( \big(1 + \epsilon^{(k)}\big)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \Big), \qquad (3)$$

where $h_v^{(k)}$ denotes the representation of node $v$ learned at the $k$-th layer (with $h_v^{(0)}$ the attribute vector of $v$, i.e., the corresponding row of $X$), $\mathcal{N}(v)$ indicates the neighbour set of node $v$, $\epsilon^{(k)}$ is a trainable parameter, and MLP represents a multilayer perceptron. Next, we obtain the graph level representation as $F_{se}(G) = \mathrm{READOUT}\big(\{h_v^{(K)} \mid v \in V\}\big)$, where READOUT can be a simple permutation-invariant function such as the maximum, sum or mean. In this way, we preserve both the semantic and topological information of each graph.
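Update (3) followed by a READOUT can be sketched in NumPy as below. This is a minimal illustration, not the ARMET implementation (which uses PyTorch Geometric and trained parameters): it assumes dense 0/1 adjacency matrices and a hypothetical two-layer ReLU MLP with random weights in each GIN layer. The usage lines check the property the text relies on, namely that relabeling the nodes of a graph leaves its graph level embedding unchanged.

```python
import numpy as np

def mlp(h, weights):
    """Hypothetical 2-layer MLP used inside each GIN layer: Linear -> ReLU -> Linear -> ReLU."""
    w1, w2 = weights
    return np.maximum(np.maximum(h @ w1, 0.0) @ w2, 0.0)

def gin_layer(h, adj, eps, weights):
    """Update (3): MLP((1 + eps) * h_v + sum of the neighbours' representations)."""
    aggregated = (1.0 + eps) * h + adj @ h  # adj @ h sums neighbour rows for every node
    return mlp(aggregated, weights)

def graph_embedding(x, adj, layers, readout="mean"):
    """Apply K GIN layers to node features x, then a permutation-invariant READOUT."""
    h = x
    for eps, weights in layers:
        h = gin_layer(h, adj, eps, weights)
    if readout == "sum":
        return h.sum(axis=0)
    if readout == "max":
        return h.max(axis=0)
    return h.mean(axis=0)

# Usage: a 4-node path graph with 3-dimensional node attributes and K = 2 layers.
rng = np.random.default_rng(0)
d, hidden = 3, 8
layers = [
    (0.0, (rng.normal(size=(d, hidden)), rng.normal(size=(hidden, hidden)))),
    (0.0, (rng.normal(size=(hidden, hidden)), rng.normal(size=(hidden, hidden)))),
]
x = rng.normal(size=(4, d))
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
z = graph_embedding(x, adj, layers)

# Relabel the nodes: the graph level embedding should not change.
perm = np.array([2, 0, 3, 1])
z_perm = graph_embedding(x[perm], adj[np.ix_(perm, perm)], layers)
```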
Structure Feature Extractor: We denote the structure feature extractor as $F_{st}$ with parameters $\theta_{st}$. We assume that the graph level representations of $\mathcal{G}^s$ and of $\mathcal{G}^t$ exhibit inherent data structures, such as clusters. Inspired by [38], for each graph database we construct a k-nearest neighbours graph (KNN graph) in the latent space to model its data structure, aiming to capture the neighbourhood information between different graphs. Without loss of generality, we use $\mathcal{G}^s$ to demonstrate the construction of the KNN graph $G^s_{\mathrm{KNN}}$.
The KNN graph $G^s_{\mathrm{KNN}}$ of $\mathcal{G}^s$ contains $n_s$ nodes, where each node represents a source graph $G^s_i$ and its node attribute is the corresponding graph level representation vector learned by $F_{se}$. To generate edges, we construct the adjacency matrix $A^{\mathrm{KNN}} \in \{0,1\}^{n_s \times n_s}$ as

$$A^{\mathrm{KNN}}_{ij} = \begin{cases} 1, & \text{if } G^s_j \in \mathcal{N}_k(G^s_i), \\ 0, & \text{otherwise}, \end{cases} \qquad (4)$$

where $\mathcal{N}_k(G^s_i)$ represents the set of $k$ nearest neighbours of graph $G^s_i$ based on the Euclidean distance between graph level embeddings. After constructing $G^s_{\mathrm{KNN}}$, we leverage another GIN model (without READOUT function) to learn its node representations, where each node represents a source graph instance.
Feature Concatenation Operator: We utilize a concatenation operator that directly appends the structure feature of each graph to its semantic feature; hence, this operator has no parameters. Overall, given a graph $G$, its final extracted feature can be expressed as $F(G) = [F_{se}(G) \,\|\, F_{st}(G)]$, where $\|$ denotes concatenation. As a result, the trainable parameters of the feature extractor are $\theta_f = \{\theta_{se}, \theta_{st}\}$. Importantly, these parameters are shared between the two domains, i.e., the same extractor produces representations for graphs from both the source domain and the target domain.
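The KNN construction in (4) and the concatenation step can be sketched in NumPy as follows. The sketch makes simplifying assumptions: neighbours are found by brute-force Euclidean distances, and the second GIN applied to the KNN graph is replaced by a hypothetical parameter-free stand-in (each graph's embedding plus the sum of its neighbours' embeddings, i.e. a GIN step with $\epsilon = 0$ and an identity MLP).

```python
import numpy as np

def knn_adjacency(z, k):
    """A[i, j] = 1 if graph j is among the k nearest neighbours of graph i
    (Euclidean distance between graph level embeddings, self excluded)."""
    sq = (z ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * z @ z.T  # squared pairwise distances
    np.fill_diagonal(dist, np.inf)                    # a graph is not its own neighbour
    nn_idx = np.argsort(dist, axis=1)[:, :k]           # indices of the k closest graphs
    adj = np.zeros_like(dist)
    adj[np.arange(z.shape[0])[:, None], nn_idx] = 1.0
    return adj

def structure_features(z_sem, k):
    """Hypothetical stand-in for the GIN on the KNN graph: z_i + sum of neighbour embeddings."""
    return z_sem + knn_adjacency(z_sem, k) @ z_sem

def extract_features(z_sem, k):
    """Final feature: semantic feature concatenated with structure feature."""
    return np.concatenate([z_sem, structure_features(z_sem, k)], axis=1)

# Usage: semantic embeddings of six graphs forming two well-separated clusters.
z_sem = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
adj = knn_adjacency(z_sem, k=2)
features = extract_features(z_sem, k=2)
```

With $k = 2$, each graph's neighbours come from its own cluster, so the structure feature summarizes the local neighbourhood in the set of graphs rather than the internals of a single graph.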
VI. Module 2: Cross-Domain Graph Level Anomaly Detection
This module contains three components: the one-class classifier, the domain classifier, and the class aligner. We first elucidate the rationale and design of each component, and then introduce the adversarial training procedure.
One-Class Classifier: We use $C_s$ to represent the source classifier, which has no additional parameters beyond the feature extraction parameters $\theta_f$. We train a one-class classifier on the source domain by minimizing the following One-Class Deep SVDD objective [39]:

$$\mathcal{L}_{SC} = \frac{1}{n_s} \sum_{i=1}^{n_s} \big\| F(G^s_i) - c \big\|_2^2, \qquad (5)$$

where $F(G^s_i)$ is the final extracted feature of the source graph $G^s_i$, and $c$ is the learned center that represents normality as defined in the source domain. Minimizing this SVDD loss enables the model to learn the label information (i.e., normality) from the source domain.
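For illustration, loss (5) and the distance-to-center score used to rank graphs can be written in a few lines of NumPy. Taking the center as the mean of the source embeddings is an assumption borrowed from common Deep SVDD practice; the text describes $c$ as learned, and ARMET may obtain it differently.

```python
import numpy as np

def svdd_loss(z_source, c):
    """One-class Deep SVDD objective (5): mean squared distance of source embeddings to c."""
    return np.mean(np.sum((z_source - c) ** 2, axis=1))

def anomaly_score(z, c):
    """Squared distance to the center; graphs far from the normal region score higher."""
    return np.sum((z - c) ** 2, axis=1)

# Usage with toy 2-D embeddings of three normal source graphs.
z_source = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]])
c = z_source.mean(axis=0)  # assumed initialisation of the center
loss = svdd_loss(z_source, c)
# One target graph close to the normal region and one far away from it.
scores = anomaly_score(np.array([[1.0, 1.1], [4.0, 4.0]]), c)
```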
Domain Classifier: We denote the domain classifier as $C_d$, with trainable parameters $\theta_d$ in addition to the parameters $\theta_f$. We maximize the following binary cross-entropy loss with respect to the feature extractor to learn domain-invariant features:

$$\mathcal{L}_{DC} = -\frac{1}{n_s + n_t} \sum_{i=1}^{n_s + n_t} \big[ d_i \log \hat{d}_i + (1 - d_i) \log (1 - \hat{d}_i) \big], \qquad (6)$$

where $d_i$ denotes the binary ground-truth domain label of graph $G_i$: $d_i = 0$ for graphs from the source domain and $d_i = 1$ for graphs from the target domain. Further, $\hat{d}_i = C_d(F(G_i))$ is the predicted probability that $G_i$ belongs to the target domain, where $F$ is the final feature extractor and $C_d$ is the domain classifier. Each term of (6) is the negative log-probability of the correct domain label, $-\log P(d_i \mid G_i)$, which can be rewritten as $-[d_i \log \hat{d}_i + (1 - d_i) \log (1 - \hat{d}_i)]$. Maximizing this loss with respect to $\theta_f$ encourages the feature extractor to produce similar representations for graphs from the source and target domains, making the two feature distributions indistinguishable and thereby reducing the discrepancy between them.
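A small NumPy sketch of loss (6), assuming the domain classifier's outputs are already probabilities, shows why maximizing it promotes domain invariance: on a batch with equally many source and target graphs, the best a classifier can do when the domains are indistinguishable is to output 0.5 everywhere, which incurs the loss $\log 2$, larger than the loss of a classifier that separates the domains.

```python
import numpy as np

def domain_bce(p_target, d, tiny=1e-12):
    """Loss (6): binary cross-entropy of the domain classifier.

    p_target: predicted probability that each graph comes from the target domain.
    d: ground-truth domain labels (0 = source, 1 = target).
    """
    p = np.clip(p_target, tiny, 1.0 - tiny)  # avoid log(0)
    return -np.mean(d * np.log(p) + (1 - d) * np.log(1 - p))

d = np.array([0, 0, 1, 1])
loss_separable = domain_bce(np.array([0.05, 0.05, 0.95, 0.95]), d)  # domains easy to tell apart
loss_confused = domain_bce(np.full(4, 0.5), d)  # domain-invariant features
```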
Class Aligner: Structure consistency between domains (via the parameters-shared feature extractor), discriminability in the source domain (via the source classifier), and domain-invariant features (via the domain classifier) do not necessarily lead to discriminability in the target domain. That is, although we manage to align features across domains, the learned features may be distorted in the sense that they are not representative of the underlying patterns in the target domain. As a result, features of the normal and abnormal classes in the target domain may lie close to each other or even overlap, leading to a high error in the target domain and hence a large $\lambda$. This is known as excessive alignment in [40] and as collapse of the target neighborhood structure in [41].
To alleviate this problem, we consider the third term in (1), namely $\lambda$, which involves labels from both domains. Although the true labels of the target domain cannot be obtained, we can generate pseudolabels and minimize the following class centroid alignment loss:

$$\mathcal{L}_{CA} = \mathrm{dist}(\mu^s_0, \mu^t_0) - \mathrm{dist}(\mu^t_0, \mu^t_1), \qquad (7)$$

where $\hat{y}^t_j = C_s(G^t_j)$ are the pseudolabels obtained by directly applying $C_s$ to the target graphs, and $C_s$ is obtained by optimizing loss term (5) in the previous iteration. Moreover, $\mu^s_0$, $\mu^t_0$, and $\mu^t_1$ represent the centroids of the normal source class, the normal target class (graphs with $\hat{y}^t_j = 0$), and the anomalous target class (graphs with $\hat{y}^t_j = 1$), respectively; for example, $\mu^t_0$ is the mean of $F(G^t_j)$ over all target graphs with $\hat{y}^t_j = 0$. The function $\mathrm{dist}(\cdot, \cdot)$ is a distance metric such as the Euclidean distance. Minimizing this loss reduces the inter-domain distance between the centroids of the normal classes, while simultaneously increasing the intra-domain distance between the centroids of the normal and abnormal classes within the target domain. As training progresses, we expect the discrepancy between the source and target domains to shrink, the data structures to become better aligned, and the source classifier $C_s$ to improve, so that the pseudolabels are gradually updated towards the ground-truth labels, progressively improving the class centroid alignment.
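The pseudolabel step and the centroid alignment can be sketched in NumPy as follows, with Euclidean distance as $\mathrm{dist}$. The text does not specify how the one-class classifier turns distances into binary pseudolabels, so the sketch assumes a hypothetical rule that flags a fixed `contamination` fraction of target graphs farthest from the center $c$ as anomalous.

```python
import numpy as np

def pseudolabels(z_tgt, c, contamination):
    """Hypothetical thresholding rule: label the `contamination` fraction of target
    graphs farthest from the SVDD center c as anomalous (1), the rest as normal (0)."""
    scores = np.sum((z_tgt - c) ** 2, axis=1)
    threshold = np.quantile(scores, 1.0 - contamination)
    return (scores > threshold).astype(int)

def centroid_alignment_loss(z_src, z_tgt, pseudo_tgt):
    """dist(mu_0^s, mu_0^t) - dist(mu_0^t, mu_1^t) with Euclidean distance.
    All source graphs are normal; pseudo_tgt holds the 0/1 target pseudolabels."""
    mu_src_normal = z_src.mean(axis=0)
    mu_tgt_normal = z_tgt[pseudo_tgt == 0].mean(axis=0)
    mu_tgt_anom = z_tgt[pseudo_tgt == 1].mean(axis=0)
    align = np.linalg.norm(mu_src_normal - mu_tgt_normal)   # pull normal centroids together
    separate = np.linalg.norm(mu_tgt_normal - mu_tgt_anom)  # push target classes apart
    return align - separate

c = np.zeros(2)
z_src = np.array([[0.1, 0.1], [-0.1, -0.1]])
z_tgt = np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1], [3.0, 3.0]])
pseudo = pseudolabels(z_tgt, c, contamination=0.2)
loss_ca = centroid_alignment_loss(z_src, z_tgt, pseudo)
```

If no target graph receives pseudolabel 1, the anomalous centroid is undefined (the mean of an empty array is NaN), so a practical implementation would need a fallback for that case.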
Adversarial Training: The optimization of (5) and (7) involves minimization w.r.t. the feature extractor parameters $\theta_f$, whereas (6) is maximized w.r.t. $\theta_f$ and minimized w.r.t. the domain classifier parameters $\theta_d$. In other words, objective (6) competes against objectives (5) and (7) when optimizing the overall objective (2). To obtain a good trade-off between such competing objectives, adversarial training techniques have been explored, with impressive results [18].
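For simplicity, we rewrite $\mathcal{L}$ as $\mathcal{L}(\theta_f, \theta_d)$, which contains two sets of parameters. Adversarial training attempts to find a saddle point $(\hat{\theta}_f, \hat{\theta}_d)$ such that

$$\hat{\theta}_f = \arg\min_{\theta_f} \mathcal{L}(\theta_f, \hat{\theta}_d) \quad \text{and} \quad \hat{\theta}_d = \arg\max_{\theta_d} \mathcal{L}(\hat{\theta}_f, \theta_d).$$

In other words, we perform a min-max optimization over (2):

$$\min_{\theta_f} \max_{\theta_d} \mathcal{L}(\theta_f, \theta_d). \qquad (8)$$

Hence, the trainable parameters can be optimized alternately as follows:

$$\theta_f \leftarrow \theta_f - \eta \left( \alpha \frac{\partial \mathcal{L}_{SC}}{\partial \theta_f} - \beta \frac{\partial \mathcal{L}_{DC}}{\partial \theta_f} + \gamma \frac{\partial \mathcal{L}_{CA}}{\partial \theta_f} \right), \qquad \theta_d \leftarrow \theta_d - \eta\, \beta \frac{\partial \mathcal{L}_{DC}}{\partial \theta_d}, \qquad (9)$$

where $\eta$ is the learning rate. We perform mini-batch training with gradient descent, and the corresponding pseudo-code is provided in Algorithm 1.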
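To make the alternating updates in (9) concrete without a full GNN, the pure-Python toy below applies them to a hypothetical scalar objective $L(a, b) = a^2 - b^2 + ab$, whose unique saddle point is $(0, 0)$. Here $a$ plays the role of $\theta_f$ and takes gradient-descent steps, while $b$ plays the role of $\theta_d$ and takes gradient-ascent steps on $L$. The toy only illustrates the update pattern; it says nothing about the convergence of ARMET itself.

```python
def alternating_gda(a, b, lr=0.1, steps=200):
    """Alternating descent (minimizing player a) and ascent (maximizing player b)
    on the toy objective L(a, b) = a**2 - b**2 + a*b."""
    for _ in range(steps):
        a = a - lr * (2 * a + b)  # descent step: dL/da = 2a + b
        b = b + lr * (a - 2 * b)  # ascent step using the updated a: dL/db = a - 2b
    return a, b

a_star, b_star = alternating_gda(1.0, -1.0)
```

With this learning rate both coordinates shrink towards the saddle point $(0, 0)$.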
Algorithm 1: Mini-Batch Algorithm of ARMET.
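Output: Predicted labels of target graphs $\mathcal{G}^t$
Initialise parameters $\theta_f$ and $\theta_d$
for epoch $= 1, \dots, E$ and not converged do
for iteration $= 1, \dots, I$ do
Sample a batch $\mathcal{B}^s \subset \mathcal{G}^s$ and $\mathcal{B}^t \subset \mathcal{G}^t$
Learn representations using $F$ for $\mathcal{B}^s$ and $\mathcal{B}^t$
Compute source classifier loss $\mathcal{L}_{SC}$ using (5)
Compute domain classifier loss $\mathcal{L}_{DC}$ using (6)
Backpropagate and update $\theta_d$ using (9)
Compute class aligner loss $\mathcal{L}_{CA}$ using (7)
Compute total loss $\mathcal{L}$ using (2)
Backpropagate and update $\theta_f$ using (9)
end for
end for
Use feature extractor $F$ with optimised parameters $\theta_f$ to extract features for $\mathcal{G}^t$
Use optimised source classifier $C_s$ to predict labels for $\mathcal{G}^t$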
VII. Experiment Setup
We aim to answer the following research questions (RQ) via experiments:
- RQ1: How does ARMET perform when compared to state-of-the-art graph level anomaly detection methods?
- RQ2: How does each component of ARMET affect the performance? (Ablation study)
- RQ3: How does the performance of ARMET change with different hyperparameter values? (Sensitivity analysis)
In addition, to get a better understanding of ARMET, we utilize t-SNE [42] to visualize the learned graph representations from both domains.
A. Benchmark Datasets
As summarized in Table I, we study the following datasets collected from various application fields of graph mining:
System Logs: We use four benchmark datasets for log anomaly detection, since converting logs into graphs and then leveraging GNNs to detect anomalies can achieve superior performance [2],[43]. Following [2], we construct four graph datasets: HDFS (HD) [44], BGL (BG), SPIRIT (SP), and THUNDERBIRD (TH) [45], where each dataset contains 5000 graphs, 5% of which are anomalous. Particularly, HD consists of Hadoop Distributed File System logs, while BG, SP, and TH contain system logs collected from three different supercomputing systems. For this group of datasets, we create 12 transfer tasks.
Letter Drawings: We use three benchmark graph datasets with letter drawings of varying levels of distortion: low (LL), medium (LM), and high (LH) [46]. Following the practice of downsampling classification datasets for anomaly detection [47], for each dataset the letters N, M, and W are selected as the normal class, comprising a total of 450 instances (150 instances per letter), while the letter F is chosen as the anomalous class (downsampled to 50 instances). For each graph, a node denotes the end point of a line, an edge represents a line, and a node attribute represents the node's two-dimensional coordinates. For these datasets, we create 6 transfer tasks.
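Assuming that every ordered (source, target) pair of distinct datasets within a group forms one transfer task, which matches the counts of 12 and 6 given above, the task lists and the letter downsampling can be sketched as follows. `downsample_letters` and its input format (a dict mapping each letter to a list of graphs) are hypothetical, and placeholder strings stand in for the actual graphs.

```python
import itertools
import random

log_datasets = ["HD", "BG", "SP", "TH"]
letter_datasets = ["LL", "LM", "LH"]

# Assumption: each ordered (source, target) pair of distinct datasets is one task.
log_tasks = list(itertools.permutations(log_datasets, 2))
letter_tasks = list(itertools.permutations(letter_datasets, 2))

def downsample_letters(graphs_by_letter, normal=("N", "M", "W"), anomaly="F",
                       n_per_normal_letter=150, n_anomalies=50, seed=0):
    """Hypothetical helper: keep 150 graphs of each normal letter (label 0)
    and 50 graphs of the anomalous letter (label 1), then shuffle."""
    rng = random.Random(seed)
    data = []
    for letter in normal:
        sampled = rng.sample(graphs_by_letter[letter], n_per_normal_letter)
        data += [(g, 0) for g in sampled]
    data += [(g, 1) for g in rng.sample(graphs_by_letter[anomaly], n_anomalies)]
    rng.shuffle(data)
    return data

# Placeholder strings stand in for the actual letter graphs.
graphs_by_letter = {letter: [f"{letter}{i}" for i in range(150)] for letter in "NMWF"}
dataset = downsample_letters(graphs_by_letter)
```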
Discussion on Dataset Selection: Some commonly used benchmark graph datasets, including BZR, DHFR and COX2 (small molecules), as well as IMDB-Binary and REDDIT (social networks), have been excluded from our analysis for the following reasons: 1) the UDA assumptions do not hold for them, as these datasets define different classification problems, i.e., they have different label semantics; 2) transferability cannot be guaranteed due to the large domain discrepancy between datasets; for instance, node attributes of graphs in these datasets have different dimensionalities and/or meanings, and the datasets have very different graph statistics; and 3) even supervised classification methods achieve only limited in-domain classification accuracy (below 0.7) on these datasets.
B. Baselines
We compare ARMET to the following baselines:
- Unsupervised graph level anomaly detection methods: We directly apply OCGIN [6], GLAM [5], and GLocalKD [7] on unlabeled target graphs.
- Traditional Domain Adaptation Methods: ADDA [19], CDAN [20], and DeepCoral [17] are state-of-the-art UDA methods for image classification. As they are not designed for graph-structured data, we modify these models for cross-domain graph level anomaly detection by following [32].
To indicate the performance ceiling of GLAD methods, we additionally report the results of iGAD [48], a supervised method that is not considered a direct competitor because it relies on labeled target data.
C. Evaluation
We employ the widely used area under the receiver operating characteristic curve (AUC ROC), area under the precision-recall curve (AUC PR), and F1-Score to evaluate and compare the different methods, where a higher value (closer to 1) indicates better anomaly detection accuracy. Particularly, we report the means and standard deviations of AUC ROC, AUC PR and F1-Score across 10 independent runs.
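For reference, the three metrics can be computed as in the NumPy sketch below: AUC ROC via its rank interpretation (the probability that a random anomaly scores higher than a random normal graph), AUC PR estimated by average precision, and F1 from thresholded predictions. The threshold of 0.3 in the usage lines is arbitrary; in practice, scikit-learn's `roc_auc_score`, `average_precision_score` and `f1_score` serve the same purpose.

```python
import numpy as np

def auc_roc(labels, scores):
    """AUC ROC as the probability that a random anomaly (label 1) scores higher
    than a random normal graph (label 0); ties count one half."""
    labels, scores = np.asarray(labels), np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AUC PR estimated by average precision: mean precision at each anomaly's rank."""
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits == 1].mean()

def f1_score(labels, predictions):
    """F1 = 2TP / (2TP + FP + FN) for binary predictions (1 = anomalous)."""
    labels, predictions = np.asarray(labels), np.asarray(predictions)
    tp = np.sum((labels == 1) & (predictions == 1))
    fp = np.sum((labels == 0) & (predictions == 1))
    fn = np.sum((labels == 1) & (predictions == 0))
    return 2 * tp / (2 * tp + fp + fn)

labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.9]
roc = auc_roc(labels, scores)
ap = average_precision(labels, scores)
f1 = f1_score(labels, (np.asarray(scores) > 0.3).astype(int))  # arbitrary threshold
```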
D. Implementation and Model Configuration
We use the publicly available implementations of OCGIN, GLAM, GLocalKD and iGAD with their recommended configurations. Besides, ADDA, CDAN and DeepCoral are adapted from their publicly available implementations. As they are not designed for graph-structured data, we modify these models for cross-domain graph level anomaly detection following the practice in [32]:
- CDAN: We replace the AlexNet encoder with a GIN plus a mean readout function;
- ADDA: Similarly, we replace the CNN encoder with a GIN plus a mean readout function;
- DeepCoral: We replace the CNN encoder (CaffeNet) with a GIN plus a mean readout function and apply CORAL loss to the last classification layer.
For ARMET, following the practice in [17], the hyperparameters $\alpha$, $\beta$ and $\gamma$ can be set such that, after training (e.g., 100 epochs), the losses associated with one-class classification ($\mathcal{L}_{SC}$), domain discrimination ($\mathcal{L}_{DC}$) and class alignment ($\mathcal{L}_{CA}$) are approximately of the same magnitude (after being multiplied by their corresponding weights). The rationale is that we aim to learn feature representations that are both label-discriminative and domain-invariant [17]. Notably, we found that for a given transfer task there is usually no single optimal set of weights, as also pointed out for multi-task learning in [49]. Although this hyperparameter tuning method works well, it is time-consuming and labor-intensive. For simplicity, we can instead adopt a de facto strategy for hyperparameter tuning in UDA, namely splitting the target dataset according to a 20:80 ratio, where the 20% of the data with labels is used as the validation set to select these three hyperparameters and the remaining 80%, without labels, is used as the test data.
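Both tuning strategies can be sketched as follows. `balance_weights` is a hypothetical helper for the magnitude-matching heuristic: given the losses observed after training, it returns weights that bring every weighted loss to the magnitude of a chosen reference loss (here $\mathcal{L}_{SC}$, whose weight becomes 1). Magnitudes are compared because $\mathcal{L}_{CA}$ can be negative. `split_target` performs the 20:80 split of target indices. The loss values in the usage lines are made up.

```python
import numpy as np

def balance_weights(final_losses, reference="SC"):
    """Hypothetical helper: weights that make |weight * loss| equal for all losses,
    matching the magnitude of the reference loss (whose weight becomes 1)."""
    target = abs(final_losses[reference])
    return {name: target / abs(value) for name, value in final_losses.items()}

def split_target(n_target, val_fraction=0.2, seed=0):
    """Randomly split target indices into a labeled validation part (20%)
    and an unlabeled test part (80%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_target)
    n_val = int(round(val_fraction * n_target))
    return idx[:n_val], idx[n_val:]

# Made-up loss values observed after training, keyed by the loss they belong to.
final_losses = {"SC": 0.05, "DC": 0.69, "CA": -2.5}
weights = balance_weights(final_losses)  # plays the role of alpha, beta, gamma
val_idx, test_idx = split_target(5000)   # e.g. a log dataset with 5000 graphs
```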
For fair comparisons, all GIN models used in the different domain adaptation methods are configured with the same backbone architecture, namely two layers (64 hidden units each), each followed by a ReLU activation. Besides, all neural network based methods are trained using mini-batch gradient descent, with a batch size of 512 on the log anomaly detection datasets and 128 on the other datasets, an initial learning rate of 0.01, a weight decay rate of 0.0001, and a maximum of 200 training epochs. The hyperparameter settings for ARMET are summarised in Table II. Other algorithm-specific hyperparameters are set in accordance with the recommendations of the original authors.
E. Training Hardware and Reproducibility
We implemented and ran all algorithms in Python 3.8, using PyTorch [50] and PyTorch Geometric [51] where applicable, on a workstation equipped with an Intel i7-11700KF CPU and an Nvidia RTX3070 GPU. For reproducibility, all code and datasets are made available on GitHub.
VIII. Experiment Results and Analysis
We answer the three research questions as follows.
A. Detection Accuracy (RQ1)
We perform the following analysis based on the experiment results in Tables III, IV and VI. The results in terms of AUC PR and F1-Score are consistent with those in terms of AUC ROC; therefore, they are either deferred to Table VIII or omitted.


1) Accuracy of Single-Domain GLAD
Table III demonstrates that the unsupervised/semi-supervised methods OCGIN, GLAM, and GLocalKD generally suffer from large performance degradation when the training data is contaminated with anomalies. Moreover, these methods deliver very unstable results across datasets even when the training data is clean. For example, GLocalKD yields high AUC ROC on BGL but very poor AUC ROC on HDFS. In contrast, Table IV shows that the supervised method iGAD achieves perfect results, with an AUC ROC of 1.0, in all cases. This indicates that these anomaly detection problems are solvable when sufficient labeled data is available in the target domain. However, this is unrealistic in many real-world scenarios, as fully labeled data is usually expensive and often impossible to obtain in practice.
2) Accuracy of Traditional UDA Methods
Table IV shows that when there is only one class in the source domain, ADDA, CDAN, and DeepCoral behave like random guessing (i.e., with AUC ROC close to 0.5), performing much worse than ARMET on most transfer tasks. We see the following potential factors contributing to their subpar performance. First, CDAN conditions on the cross-covariance between feature representations and classifier predictions, but this conditional distribution degenerates to the marginal distribution when there is only one class in the source domain. Second, DeepCoral aligns the second-order statistics of layer activations in the source encoder and the target encoder, which leads to random-guessing results: the source training data contains only normal graphs while the target training data includes both normal and abnormal graphs, so the statistics of their layer activations should differ by nature. Third, ADDA performs poorly as it utilises different encoders for the source and target domains, which suggests the importance of parameters-shared models when encoding graphs from two domains; moreover, its source encoder itself degrades to random guessing when the source domain contains only a single class, as the source classifier fails to acquire any informative knowledge. Furthermore, these traditional UDA methods are primarily designed for balanced multi-class classification problems, making them ill-suited for anomaly detection, which involves an extremely imbalanced binary classification task. As a reference, Table VI reports the performance of ADDA, CDAN, and DeepCoral when the source domain contains both labeled anomalous and labeled normal instances. Their performance is largely boosted by this auxiliary label information in most transfer tasks. However, even with auxiliary labels, they are still outperformed by ARMET in most cases.
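The first factor can be checked numerically. CDAN conditions its domain discriminator on the multilinear map $f \otimes g$ of features $f$ and classifier predictions $g$. The NumPy sketch below (random features, hypothetical shapes) shows that with a single source class the softmax output is identically 1, so $f \otimes g = f$ and the discriminator only sees the marginal feature distribution.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def multilinear_map(features, class_probs):
    """CDAN-style conditioning: flattened outer product f (x) g for each sample."""
    outer = np.einsum("bi,bj->bij", features, class_probs)
    return outer.reshape(len(features), -1)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
# With a single source class, the classifier has one output, so softmax is always 1.
one_class_probs = softmax(rng.normal(size=(4, 1)))
conditioned = multilinear_map(features, one_class_probs)
```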
3) Accuracy of ARMET
Table IV shows that ARMET achieves the best performance on 13 out of 18 transfer tasks and the second-best performance on the remaining tasks. Furthermore, it substantially outperforms unsupervised graph level anomaly detection methods on certain target datasets, demonstrating the benefit of leveraging label information from a different but related domain. For example, transferring knowledge from HD to SP leads to performance gains of 54%, 28%, and 38% over OCGIN, GLAM, and GLocalKD, respectively. Finally, ARMET achieves strong results even when the source domain contains only normal graphs, a setting in which traditional unsupervised domain adaptation methods give “random-guessing” results; this makes ARMET more applicable to real-world scenarios.
B. Ablation Study (RQ2)
As summarised in Table V, we compare ARMET with five stripped-down variants: SD is trained on the source domain and then directly applied to the target domain; /SF is ARMET trained without the structure feature extractor; /SC is ARMET trained without the source one-class classifier; /DC is ARMET trained without the domain classifier; and /CA is ARMET trained without the class aligner. Table VII leads to the following observations:
- SD versus ARMET. ARMET is always superior to the case where we train the model on the source domain and then directly apply it to the target domain. This exemplifies the benefits and necessity of performing transfer learning.
- /SF versus ARMET. ARMET consistently outperforms its counterpart without the structural feature extractor. This underlines the importance of considering the neighbourhood information in a set of graphs for CD-GLAD.
- /SC versus ARMET. This comparison shows that explicitly incorporating the source label information via a source classifier is typically beneficial.
- /DC versus ARMET. In most cases, explicitly aligning the embeddings of both domains via a domain classifier is favorable. In certain cases, the presence of a domain classifier may have a small adverse impact on performance. One possible reason is that the domain classifier overly distorts the embedding space by aligning the embedding space in a brute-force manner. Another possible reason is that the embeddings are also implicitly aligned via the parameters-shared feature extractor and the class aligner.
- /CA versus ARMET. The removal of the class aligner always leads to a performance degradation. This corroborates the importance of considering labels (or pseudo-labels) in the target domain when performing CD-GLAD.
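The five variants can be summarised as a table of switched-off components. The Python sketch below is a hypothetical encoding written for illustration: the dictionary keys, the helper `_without`, the function `total_loss`, and the assumption that the training objective is an unweighted sum of the source one-class, domain-classifier, and class-aligner losses are ours, not ARMET's actual implementation (the loss weighting is not restated in this section). The structure feature extractor changes the input representation rather than adding a loss term, so removing it (/SF) leaves the objective unchanged in this sketch. SD is modelled as training without any target graphs, which, under our assumption, also disables the two components that need target data.

```python
from typing import Dict

# Hypothetical component names, mirroring the ablation list above.
COMPONENTS = ("structure_features", "source_classifier", "domain_classifier", "class_aligner")
# Components assumed to contribute a term to the training objective.
LOSS_TERMS = ("source_classifier", "domain_classifier", "class_aligner")


def _without(*removed: str) -> Dict[str, bool]:
    """All components enabled except the ones listed in `removed`."""
    return {c: c not in removed for c in COMPONENTS}


# Hypothetical mapping from variant name to enabled components.
VARIANTS: Dict[str, Dict[str, bool]] = {
    "ARMET": _without(),
    # Assumption: SD sees no target graphs, so target-dependent parts are off.
    "SD": _without("domain_classifier", "class_aligner"),
    "/SF": _without("structure_features"),
    "/SC": _without("source_classifier"),
    "/DC": _without("domain_classifier"),
    "/CA": _without("class_aligner"),
}


def total_loss(variant: str, losses: Dict[str, float]) -> float:
    """Unweighted sum of the loss terms enabled for `variant` (assumed objective)."""
    flags = VARIANTS[variant]
    return sum(losses[name] for name in LOSS_TERMS if flags[name])


if __name__ == "__main__":
    # Made-up loss values, chosen only to show which terms each variant keeps.
    losses = {"source_classifier": 0.5, "domain_classifier": 0.25, "class_aligner": 0.125}
    for name in VARIANTS:
        print(name, total_loss(name, losses))
```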
C. Sensitivity Analysis (RQ3)
We examine the effects of the following hyperparameters on detection performance; Fig. 2 depicts selected results on four representative transfer tasks:
- The number of embedding dimensions in the parameters-shared feature extractor: most transfer tasks achieve the best performance with an embedding dimension of 64, and only small fluctuations are observed as the number of dimensions varies. An overly small embedding dimension may lead to suboptimal performance, while a large one introduces a considerable computational burden.
- The number of hidden layers in the GIN model in the parameters-shared feature extractor: optimal performance is obtained when or on most transfer tasks. Increasing the number of layers further usually degrades performance, a phenomenon widely known as over-smoothing [52].
- The number of neighbours in the KNN graph: most transfer tasks obtain the best performance over a wide range of values (namely ). However, further increasing the number of neighbours may cause large performance fluctuations.
Fig. 2. Sensitivity analysis of hyperparameters on four representative transfer tasks (HD→SP, TH→BG, HD→TH, LM→LL). Average results over five runs.
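To make the role of the neighbourhood size concrete, the following Python sketch builds an undirected k-nearest-neighbour graph over a set of vectors, one per graph. It assumes Euclidean distance and that each vector summarises one graph; the function name `knn_graph` and these choices are illustrative assumptions rather than a description of how ARMET's structure feature extractor builds its KNN graph.

```python
import math
from typing import List, Set, Tuple


def knn_graph(points: List[List[float]], k: int) -> Set[Tuple[int, int]]:
    """Undirected k-nearest-neighbour graph over `points`.

    Each point is linked to its k closest other points (Euclidean distance,
    ties broken by index); edges are stored as (i, j) with i < j, so an edge
    chosen by both endpoints appears only once.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        # Rank all other points by distance to point i.
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


if __name__ == "__main__":
    pts = [[0.0], [1.0], [2.0], [10.0]]  # point 3 is isolated from the rest
    print(sorted(knn_graph(pts, 1)))  # -> [(0, 1), (1, 2), (2, 3)]
    print(sorted(knn_graph(pts, 2)))  # -> [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

With k = 1 the isolated point 3 is linked only to point 2; with k = 2 it is also linked to point 1. In general, larger k links points that are farther apart, so the neighbourhood structure can change substantially as k grows.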
D. Visualisation With t-SNE
Fig. 3 visualizes the domain alignment and anomaly separation process on the transfer task LM→LL. The target normal graphs (green dots) are progressively adapted to the source normal graphs (blue dots), while the target anomalous graphs (red crosses) become increasingly separable in the embedding space.

Fig. 3. t-SNE visualization of the graph representation space during domain adaptation on task LM→LL. From left to right: after zero, two, five, and one hundred epochs, respectively.
IX. Discussion on Transferability
As shown in Table VII, when comparing ARMET with its counterpart without explicit transfer learning (namely the column ‘SD’, which represents the scenario where we train the model on the source domain and then directly apply it to the target domain), the performance gains from transfer learning are consistently non-negative. Moreover, when comparing ARMET with the best-performing single-domain graph level anomaly detectors trained directly on the target domain, the performance gains are often positive (in 13 out of 18 cases, see Table IV). This further demonstrates the benefits of transfer learning.
The underlying reason why transfer learning is beneficial in this context is that the degree of “relatedness” among the System Logs datasets (i.e., BG, HD, SP, and TH) is sufficiently high, as is the degree of “relatedness” among the Letter Drawings datasets (i.e., LL, LM, and LH). In cross-domain graph-level anomaly detection, it is crucial to ensure that the semantic meanings of normal patterns in the source and target domains are approximately the same. For instance, in System Logs data, these are the normal patterns of system operations, while in Letter Drawings data, they represent the same set of letters.
As in other typical unsupervised domain adaptation settings, we assume that the source and target domains are different yet related. However, the extent of “relatedness” in this study is ensured based on our domain knowledge rather than a quantifiable transferability metric. To our knowledge, how to quantify the extent of this “relatedness” (namely transferability across datasets) remains a long-standing problem [53]. Notably, when this “relatedness” is low, negative transfer may occur [54]. Moreover, transferability is usually asymmetrical: e.g., the performance gain for the transfer task SP→TH is 0.44 (namely 1.0 minus 0.56 in terms of ROC AUC), whereas the gain for TH→SP is only 0.23 in terms of ROC AUC. Therefore, caution is warranted (especially in safety-critical fields such as healthcare) when the transferability from the source domain to the target domain cannot be guaranteed, either by a quantifiable metric or by domain knowledge.
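The gain and asymmetry figures quoted above are simple differences of ROC AUC values. The Python sketch below reproduces that arithmetic using only the numbers stated in this section; the helper names (`transfer_gain`, `is_negative_transfer`, `asymmetry`) are hypothetical, and for TH→SP the gain of 0.23 is entered directly because the underlying ROC AUC values are not restated here. A negative gain would indicate negative transfer [54].

```python
import math
from typing import Dict, Tuple

Task = Tuple[str, str]  # (source domain, target domain)


def transfer_gain(auc_with_transfer: float, auc_without_transfer: float) -> float:
    """ROC AUC gained by transfer; a negative value signals negative transfer."""
    return auc_with_transfer - auc_without_transfer


def is_negative_transfer(gain: float) -> bool:
    """True if transferring from the source hurt performance on the target."""
    return gain < 0.0


def asymmetry(gains: Dict[Task, float], a: str, b: str) -> float:
    """Absolute difference between the gains of a->b and b->a."""
    return abs(gains[(a, b)] - gains[(b, a)])


# Only the values reported in the text above.
gains: Dict[Task, float] = {
    ("SP", "TH"): transfer_gain(1.0, 0.56),
    ("TH", "SP"): 0.23,
}

if __name__ == "__main__":
    print(round(gains[("SP", "TH")], 2))  # -> 0.44
    print(round(asymmetry(gains, "SP", "TH"), 2))  # -> 0.21
    print(any(is_negative_transfer(g) for g in gains.values()))  # -> False
```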
X. Conclusion
This paper studies the problem of cross-domain graph level anomaly detection, wherein a set of unlabelled graphs from the target domain and a set of normal graphs from a different but related domain are given. We propose ARMET, a theoretically motivated, novel method to solve this widely encountered but largely understudied problem. Specifically, ARMET consists of four components: a feature extractor, an adversarial domain classifier, a one-class classifier, and a class aligner. It is the first attempt to combine graph neural networks and adversarial domain adaptation techniques for performing graph level anomaly detection tasks. Extensive experiments demonstrate the efficacy of ARMET and its superiority to single-domain graph level anomaly detection methods and traditional unsupervised domain adaptation methods. Moreover, extensive ablation studies validate the benefits of incorporating each component into ARMET. Understanding and quantifying the transferability across different graph domains should be addressed in future research.
Footnotes
- 1 [Online]. Available: https://github.com/lingxiaoshawn/glam
- 2 [Online]. Available: https://github.com/RongrongMa/GLocalKD
- 3 [Online]. Available: https://github.com/graph-level-anomalies/iGAD
- 4 [Online]. Available: https://github.com/yuhui-zh15/pytorch-adda
- 5 [Online]. Available: https://github.com/agrija9/deep-unsupervised-domain-adaptation
- 6 [Online]. Available: https://github.com/ZhongLIFR/ARMET/
References
- [1]C. Qiu, M. Kloft, S. Mandt, and M. Rudolph, “Raising the bar in graph-level anomaly detection,” 2022, arXiv:2205.13845.
- [2]Z. Li, J. Shi, and M. Van Leeuwen, “Graph neural networks based log anomaly detection and explanation,” in Proc. IEEE/ACM 46th Int. Conf. Softw. Eng.: Companion Proc., 2024, pp. 306–307.
- [3]H. G. Vogel, W. H. Vogel, H. G. Vogel, G. Müller, J. Sandow, and B. A. Schölkens, Drug Discovery and Evaluation: Pharmacological Assays, vol. 2. Berlin, Germany: Springer, 1997.
- [4]T. Lanciano, F. Bonchi, and A. Gionis, “Explainable classification of brain networks via contrast subgraphs,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2020, pp. 3308–3318.
- [5]L. Zhao, S. Sawlani, A. Srinivasan, and L. Akoglu, “Graph anomaly detection with unsupervised GNNs,” 2022, arXiv:2210.09535.
- [6]L. Zhao and L. Akoglu, “On using classification datasets to evaluate graph outlier detection: Peculiar observations and new insights,” Big Data, vol. 11, no. 3, pp. 151–180, 2021.
- [7]R. Ma, G. Pang, L. Chen, and A. van den Hengel, “Deep graph-level anomaly detection by glocal knowledge distillation,” in Proc. 15th ACM Int. Conf. Web Search Data Mining, 2022, pp. 704–714.
- [8]G. Wilson and D. J. Cook, “A survey of unsupervised deep domain adaptation,” ACM Trans. Intell. Syst. Technol., vol. 11, no. 5, pp. 1–46, 2020.
- [9]E. Nie, S. Liang, H. Schmid, and H. Schütze, “Cross-lingual retrieval augmented prompt for low-resource languages,” in Proc. Findings Assoc. Computat. Linguistics, 2023, pp. 8320–8340.
- [10]S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” Mach. Learn., vol. 79, pp. 151–175, 2010.
- [11]S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
- [12]C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in Proc. 27th Int. Conf. Artif. Neural Netw., Rhodes, Greece, Springer, 2018, pp. 270–279.
- [13]F. Chen, Y.-C. Wang, B. Wang, and C.-C. J. Kuo, “Graph representation learning: A survey,” APSIPA Trans. Signal Inf. Process., vol. 9, 2020, Art. no. e15.
- [14]L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: A survey,” Data Mining Knowl. Discov., vol. 29, pp. 626–688, 2015.
- [15]X. Ma et al., “A comprehensive survey on graph anomaly detection with deep learning,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 12, pp. 12012–12038, Dec. 2023.
- [16]H. T. Nguyen, P. J. Liang, and L. Akoglu, “Anomaly detection in large labeled multi-graph databases,” 2020, arXiv:2010.03600.
- [17]B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep domain adaptation,” in Proc. Eur. Conf. Comput. Vis., Amsterdam, The Netherlands, Springer, 2016, pp. 443–450.
- [18]Y. Ganin et al., “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, 2016.
- [19]E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 7167–7176.
- [20]M. Long, Z. Cao, J. Wang, and M. I. Jordan, “Conditional adversarial domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 1647–1657.
- [21]E. Vural, “Domain adaptation on graphs by learning graph topologies: Theoretical analysis and an algorithm,” Turkish J. Elect. Eng. Comput. Sci., vol. 27, no. 3, pp. 1619–1635, 2019.
- [22]Y. Zhang, G. Song, L. Du, S. Yang, and Y. Jin, “DANE: Domain adaptive network embedding,” 2019, arXiv:1906.00684.
- [23]M. Wu, S. Pan, C. Zhou, X. Chang, and X. Zhu, “Unsupervised domain adaptive graph convolutional networks,” in Proc. Web Conf., 2020, pp. 1457–1467.
- [24]M. Pilanci and E. Vural, “Domain adaptation on graphs by learning aligned graph bases,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 2, pp. 587–600, Feb. 2022.
- [25]X. Shen, Q. Dai, F.-L. Chung, W. Lu, and K.-S. Choi, “Adversarial deep network embedding for cross-network node classification,” in Proc. AAAI Conf. Artif. Intell., 2020, pp. 2991–2999.
- [26]X. Shen, Q. Dai, S. Mao, F.-L. Chung, and K.-S. Choi, “Network together: Node classification via cross-network deep network embedding,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 5, pp. 1935–1948, May 2021.
- [27]G. Song, Y. Zhang, L. Xu, and H. Lu, “Domain adaptive network embedding,” IEEE Trans. Big Data, vol. 8, no. 5, pp. 1220–1232, Oct. 2020.
- [28]K. Ding, K. Shu, X. Shan, J. Li, and H. Liu, “Cross-domain graph anomaly detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 6, pp. 2406–2415, Jun. 2022.
- [29]Q. Dai, X.-M. Wu, J. Xiao, X. Shen, and D. Wang, “Graph transfer learning via adversarial domain adaptation with graph convolution,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 5, pp. 4908–4922, May 2023.
- [30]J. Li, W. Liu, Y. Zhou, J. Yu, D. Tao, and C. Xu, “Domain-invariant graph for adaptive semi-supervised domain adaptation,” ACM Trans. Multimedia Comput., Commun., Appl., vol. 18, no. 3, pp. 1–18, 2022.
- [31]J. Xiao, Q. Dai, X. Xie, Q. Dou, K.-W. Kwok, and J. Lam, “Domain adaptive graph infomax via conditional adversarial networks,” IEEE Trans. Netw. Sci. Eng., vol. 10, no. 1, pp. 35–52, Jan./Feb. 2022.
- [32]Q. Wang, G. Pang, M. Salehi, W. Buntine, and C. Leckie, “Cross-domain graph anomaly detection via anomaly-aware contrastive alignment,” 2022, arXiv:2212.01096.
- [33]R. Cai, F. Wu, Z. Li, P. Wei, L. Yi, and K. Zhang, “Graph domain adaptation: A generative view,” 2021, arXiv:2106.07482.
- [34]M. Wu and M. Rostami, “Unsupervised domain adaptation for graph-structured data using class-conditional distribution alignment,” 2023, arXiv:2301.12361.
- [35]Y. You, T. Chen, Z. Wang, and Y. Shen, “Graph domain adaptation via theory-grounded spectral regularization,” in Proc. 11th Int. Conf. Learn. Representations, 2023.
- [36]K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?,” 2018, arXiv:1810.00826.
- [37]F. Errica, M. Podda, D. Bacciu, and A. Micheli, “A fair comparison of graph neural networks for graph classification,” 2019, arXiv:1912.09893.
- [38]X. Ma, T. Zhang, and C. Xu, “GCAN: Graph convolutional adversarial network for unsupervised domain adaptation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 8266–8276.
- [39]L. Ruff et al., “Deep one-class classification,” in Proc. Int. Conf. Mach. Learn., PMLR, 2018, pp. 4393–4402.
- [40]N. Xiao and L. Zhang, “Dynamic weighted learning for unsupervised domain adaptation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 15242–15251.
- [41]K. Saito, D. Kim, P. Teterwak, S. Sclaroff, T. Darrell, and K. Saenko, “Tune it the right way: Unsupervised validation of domain adaptation via soft neighborhood density,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9184–9193.
- [42]L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, no. 11, pp. 2579–2605, 2008.
- [43]Y. Xie, H. Zhang, and M. A. Babar, “LogGD: Detecting anomalies from system logs by graph neural networks,” 2022, arXiv:2209.07869.
- [44]W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale system problems by mining console logs,” in Proc. ACM SIGOPS 22nd Symp. Operating Syst. Princ., 2009, pp. 117–132.
- [45]A. Oliner and J. Stearley, “What supercomputers say: A study of five system logs,” in Proc. 37th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw., 2007, pp. 575–584.
- [46]K. Riesen , “IAM graph database repository for graph based pattern recognition and machine learning,” in Proc. Int. Workshop Structural, Syntactic, Statist. Pattern Recognit., 2008, pp. 287–297.
- [47]G. O. Campos et al., “On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study,” Data Mining Knowl. Discov., vol. 30, pp. 891–927, 2016.
- [48]G. Zhang et al., “Dual-discriminative graph neural network for imbalanced graph-level anomaly detection,” in Proc. Adv. Neural Inf. Process. Syst., 2022, pp. 24144–24157.
- [49]A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7482–7491.
- [50]A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 8026–8037.
- [51]M. Fey and J. E. Lenssen, “Fast graph representation learning with pytorch geometric,” 2019, arXiv:1903.02428.
- [52]D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun, “Measuring and relieving the over-smoothing problem for graph neural networks from the topological view,” in Proc. AAAI Conf. Artif. Intell., 2020, pp. 3438–3445.
- [53]S. Ibrahim, N. Ponomareva, and R. Mazumder, “Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discov. Databases, Springer, 2022, pp. 693–709.
- [54]W. Zhang, L. Deng, L. Zhang, and D. Wu, “A survey on negative transfer,” IEEE/CAA J. Automatica Sinica, vol. 10, no. 2, pp. 305–329, Feb. 2023.