IEEE Transactions on Dependable and Secure Computing

Keywords

Feature Extraction, Correlation, Anomaly Detection, Training, Adaptation Models, Noise Measurement, Estimation, Cyber Physical System, Unsupervised, Dynamic Graph, Dual Autoencoder, Cyber Physical Systems, Unsupervised Deep Learning, Detection In Cyber Physical Systems, Original Features, Directed Graph, Gaussian Mixture Model, Correlated Features, Estimation Network, Training Adaptations, Anomaly Detection Methods, Original Reconstruction, Training Set, Validation Set, Weight Matrix, Comparative Experiments, Geographic Information System, Multilayer Perceptron, Generative Adversarial Networks, Graph Attention Network, Graph Convolutional Network, Results Of Scenario, Accurate Characterization, Autoencoder Model, K0 Values, Area Under Curve, Energy Based Model, GAN Based Models, Transformer Based Methods

Abstract

Cyber-physical systems (CPS) require strong security to operate safely. Anomaly detection is one of the mainstream security technologies, and its core is data analysis and learning. Unsupervised deep-learning-based anomaly detection methods (DL-ADMs) can be used in scenarios that collect large amounts of unlabeled data, which matches the actual needs of CPS. However, existing methods pay too little attention to the correlation among data and do not explore their implicit relationships, and their adaptive training is deficient. Therefore, we propose Adaptive-Correlation-aware Unsupervised Deep Learning (ACUDL) for anomaly detection in CPS. It constructs a directed graph structure to represent the implicit correlation among data and adaptively updates it with a dynamic graph; it then designs a dual-autoencoder to extract the original non-correlation, correlation, and reconstruction features, and builds an estimation network using the Gaussian mixture model (GMM) to estimate the anomaly energy. Experimental results on several CPS data scenarios show that ACUDL adapts well to application scenarios with different data characteristics and achieves better overall results than several recent DL-ADMs.

I   Introduction

Cyber-physical systems (CPS) have been successfully used in many fields, such as intelligent living, industrial control, and digital healthcare [1]. CPS security is a vital issue that needs to be addressed [2]. Within it, anomaly detection is an essential security measure that identifies attacks through real-time detection [3]. Because CPS operate in dynamic and uncertain environments, they produce massive amounts of high-dimensional, low-quality, and noisy data, which makes anomaly detection for CPS a challenging task [4]. Many deep-learning-based anomaly detection methods (DL-ADMs) have been proposed and have achieved excellent performance in many CPS scenarios [5]. Moreover, since it is difficult to obtain sufficient data labels in most CPS scenarios, anomaly detection usually adopts an unsupervised mode for model training [6]. However, existing methods, such as the AutoEncoder (AE) [7] and the Generative Adversarial Network (GAN) [8], ignore the implicit correlations among the complex CPS data and therefore achieve suboptimal performance [9].

For example, as shown in Fig. 1, the Intelligent Cruise Control System (ICCS), Geographic Information System (GIS), and Smart Healthcare System (SHS) are typical CPS scenarios. In ICCS, the vehicle speed and obstacle position fall within a specific range. In GIS or SHS, many data have similar flow curves (Fig. 1(d)). These phenomena may imply some implicit correlation, while noise hinders accurate correlation analysis. Thus, it is critical to design an adaptive method that can reduce the impact of noise and effectively capture the correlations among the complex CPS data.

Fig. 1. Some representative CPS scenarios.

In this paper, we use a graph to represent and adaptively update the correlations among CPS data. To highlight our motivation, we take SHS as an example and choose the Arrhythmia dataset [10] to explore an ElectroCardioGraph Monitoring System (ECGMS) for correlation analysis. The statistical information and experimental settings are shown in Table II of Section IV-A. We compare three DL-ADMs: the AE for the Original non-correlation Features (OF-AE), the AE for the Correlation Features (CF-AE), and the Dual-AE (D-AE), which combines OF-AE and CF-AE. CF-AE and D-AE use a dynamic graph and a Graph ATtention network (GAT). All three models use the GMM-based estimation network for anomaly detection (i.e., D-AE+GMM is our method). The experimental results are shown in Table I (parameter settings are in Section IV-C, and the complete results can be seen in Table V). We can see that CF-AE, which uses only the correlation features, performs better than OF-AE, and that D-AE achieves the best results. The results suggest that it is essential to capture the correlation features, and that adequate extraction and fusion of the two types of features can further improve the detection performance.

TABLE I Detection Results of the Three Models

Moreover, we have noticed that the correlation features in normal samples differ from those in abnormal samples [11]. Therefore, we can employ an unsupervised strategy to extract the correlation features. However, because of the complex CPS data background, the initially mined correlation features may not reflect reality correctly and should be adaptively updated.

Based on the above analysis, we propose an end-to-end Adaptive-Correlation-aware Unsupervised Deep Learning (ACUDL) method for anomaly detection in CPS, as shown in Fig. 2. First, we employ a directed graph to represent the correlation and carry out the adaptive correlation update based on a dynamic graph. Second, we design a D-AE to encode the correlation and non-correlation features, calculate the reconstruction error, and extract the reconstruction features. Finally, we merge the non-correlation, correlation, and reconstruction features and build a GMM-based estimation network to estimate the anomaly energy for detection in CPS scenarios. Our main contributions are summarized as follows:

  1. We design the adaptive correlation update by KNN and dynamic graph to minimize the negative effect of noise.
  2. We build a D-AE to extract adequate latent correlation, non-correlation and reconstruction features, and fuse them into the GMM to accurately estimate the latent distribution. The detection results in different CPS scenarios are superior to other methods.

Fig. 2. Model motivation.

II   Related Work

A. Supervised, Semi-Supervised and Unsupervised Anomaly Detection

Supervised methods require a large amount of labeled training data [12], and semi-supervised methods are sensitive to noise [13]. Therefore, supervised and semi-supervised methods are not suitable for CPS environments.

Unsupervised methods use only unlabeled data for training [14], which is more consistent with CPS data scenarios. There are four main categories: reconstruction-based [15], support-domain-based [16], cluster-based [17], and hybrid methods. The reconstruction-based methods are not designed directly for anomaly detection tasks, so their results are not satisfactory. The support-domain-based and cluster-based methods are sensitive to parameters. The hybrid methods, such as the Deep Structured Energy-Based Model (DSEBM) [15] and the Deep AE-based GMM (DAGMM) [14], combine various strengths and achieve relatively better results. However, these methods mainly focus on the original features and ignore the implicit correlation features.

B. Representative DL-ADMs in CPS Applications

1) AE-based ADMs (AE-ADMs): These are the most popular type of DL-ADMs, and AE-based variant designs are currently very common. Zhai et al. put forward DSEBM, which can reduce the training loss in the information fusion and connects to the regularized AE to complete complex data sampling and model training [15]. It has been applied to ICCS, SHS, etc. However, DSEBM requires clear boundaries between normal and abnormal data. Jie et al. propose an AE-based Deep Feature Correspondence (DFC) model with a generic feature extraction network and an elaborate feature estimation network, and design a self-feature enhancement strategy and a multi-context residual learning module to boost the anomaly detection performance [18].

However, these methods ignore the distribution estimation. Therefore, in Reference [19], MOCCA learns the features extracted from each layer based on AE and estimates the distribution to enhance the anomaly detection performance. Moreover, since GMM can accurately estimate the latent distribution [16], DAGMM, another energy-based model, significantly outperforms the other AE models [14]. DAGMM has been used in traditional IoT security and SHS, and can compensate for lost information by using the reconstruction error (RE) as part of the latent features, but it requires a high-quality training set. Therefore, the ACUDL presented in this paper builds on DAGMM and exploits the advantages of the above models.

2) GAN-based ADMs (GAN-ADMs): As another important branch of DL-ADMs, the advantage of GAN-ADMs is estimating the probability distribution of the latent features. In Reference [20], AnoGAN, the first GAN-ADM, has been used in the SHS. However, it is not suitable for high-dimensional data scenarios. Zenati et al. proposed Adversarially Learned Anomaly Detection (ALAD) based on AE and bi-directional GAN [21], which has been applied to the CPS of environmental monitoring and precise instrument monitoring [22]. However, the training process of ALAD is complex. Moreover, the training instability of GANs also weakens their practical effectiveness. Because GAN ensembles often outperform single GANs, Han et al. construct GAN ensembles for anomaly detection in SHS and image analysis (namely EGBADen, Efficient GAN-based anomaly detection), but this approach still suffers from training instability [23].

3) GNN and dynamic graph: Recently, GNN and the Graph Convolutional Network (GCN) have used graph structure to build learning models of pairwise relationships and have been applied to social networks, recommendation systems, anomaly detection [13],[24], etc. A recent method that has received wide attention is OCGNN (One-Class GNN) [25], which combines Deep-SVDD and GCN to improve the feature mining ability. The Graph ATtention network (GAT) can efficiently focus on the critical information and has been widely used [26].

However, various CPS data scenarios do not have explicit graph relations. Moreover, GCN cannot guarantee that the structure information is optimal [27]. The dynamic graph, mainly based on GCN (DGCN), can automatically update the structure information and is already well used for intelligent recommendation [28], social networks [29], SHS [30], etc. In CPS, however, work on adaptive updates using dynamic graphs remains rare, even though the complex CPS data require an adaptive mechanism to capture the most accurate latent features.

As graph structure is more suitable for extracting correlation features, ACUDL uses a dynamic graph to design the adaptive correlation update and builds a D-AE with GAT to adequately extract the correlation and original non-correlation features.

III   Method

The end-to-end Adaptive-Correlation-aware Unsupervised Deep Learning (ACUDL) method can adaptively build correlation directly from data and extract the features for anomaly detection in CPS. The model framework is shown in Fig. 3.

  1. After CPS data preprocessing, the correlations among data are adaptively constructed and updated by KNN and dynamic graph.
  2. We design a D-AE combining the original feature encoder and graph encoder, which can adequately extract the non-correlation and correlation features, and obtain reconstruction features with a corresponding decoder.
  3. ACUDL uses a GMM-based estimation network to realize the anomaly detection tasks in CPS by accurately estimating the probability distribution with the original non-correlation, correlation, and reconstruction features.
  4. We use Adam Optimizer for model training.

Fig. 3. The model framework.

A. Definitions and Problem Statement

Definition 1.

The directed graph is denoted as $G = \{V, E, X\}$, where $V = \{v_i \mid i = 1, 2, \ldots, N_V\}$ is the node set, $E = \{e_i = \langle v_{i1}, v_{i2} \rangle \mid v_{i1}, v_{i2} \in V,\ i = 1, 2, \ldots, N_E\}$ is the directed-edge set, and $X \in \mathbb{R}^{N_V \times N_F}$ is the feature matrix, each row of which is the feature vector of a node.

Definition 2.

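Anomaly Detection (AD): Given a sample set to be tested, $T = \{t_i \mid i = 1, 2, \ldots, N_T\}$, each $t_i$ can be represented as an $N_F$-dimensional feature vector, $X_i \in \mathbb{R}^{N_F}$. AD aims to learn a function, $f(X_i): \mathbb{R}^{N_F} \rightarrow \mathbb{R}^1$, to judge whether $t_i$ is an abnormal sample based on the preset threshold $\delta$:

$$AD(t_i) = \begin{cases} \text{Abnormal}, & \text{if } f(X_i) \geq \delta \\ \text{Normal}, & \text{otherwise} \end{cases} \tag{1}$$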
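As a minimal sketch of Definition 2, the decision rule in Formula (1) can be written as a thresholding step over any scoring function. The names `detect` and `toy_score`, and the use of `>=` at the threshold, are illustrative assumptions; in ACUDL the score would be the anomaly energy of Section III-D.

```python
from typing import Callable, List, Sequence


def detect(samples: Sequence[Sequence[float]],
           score_fn: Callable[[Sequence[float]], float],
           delta: float) -> List[str]:
    """Label each sample by comparing its score f(X_i) with the threshold delta."""
    return ["Abnormal" if score_fn(x) >= delta else "Normal" for x in samples]


def toy_score(x: Sequence[float]) -> float:
    """Hypothetical stand-in for f: squared distance from the origin."""
    return sum(v * v for v in x)


labels = detect([[0.1, 0.2], [3.0, 4.0]], toy_score, delta=1.0)
print(labels)
```

Keeping the scoring function separate from the threshold makes it easy to tune $\delta$ on a validation set without retraining the model.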

B. Adaptive Correlation Update

To accurately extract the correlation, we build a directed graph structure that relates the samples and update it adaptively; an example is shown in Fig. 4.

1) Constructing the initial correlation directed graph, $G_0$: For each sample $x_i \in X$, we choose its $K_0$ nearest neighbors, $NB_i = \{x_{ik} \mid k = 1, 2, \ldots, K_0\}$, using the KNN algorithm, $KNN(X, K_g)$, where $g = 0, 1, 2, \ldots$ is the iteration number. Then, a directed edge points from each $x_{ik}$ to $x_i$. Finally, the directed graph $G_0$ is constructed.

2) Updating the correlation based on the dynamic graph: Based on $G_0$, we design a simple and efficient dynamic graph to update the correlation adaptively. The key is to dynamically adjust $K$ and obtain the latest $G$ from $K$ and the previous $G$ under the constraint of the training loss.

Fig. 4. Correlation constructing and adaptive update.

The basic process of adaptive correlation update mainly consists of three steps:

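(i) First, update $K_g$:

$$K_g = \begin{cases} K_0, & g = 1 \\ \dfrac{1}{2 N_V} \sum_{i}^{N_V} \left( |G_{g-1}(i)| + |G_{g-2}(i)| \right), & g \geq 2 \end{cases} \tag{2}$$

where $G_g(i)$ represents the $i$-th vector of $G_g$.

(ii) Then, perform KNN for each sample $x_i \in X$ to obtain the adaptive update graph, $G_{K_g}$, based on $K_g$:

$$G_{K_g} = KNN(X, K_g) \tag{3}$$

(iii) Finally, update the correlation directed graph, $G_g$:

$$G_g = (1 - \varepsilon_g) G_0 + \varepsilon_g G_{K_g} \tag{4}$$

$$\varepsilon_g = \frac{1}{\omega_g} \sum_{i}^{N_V} \left| \tilde{G}_{K_g}(i) - \tilde{G}_0(i) \right| \tag{5}$$

where $\tilde{G}_0$ and $\tilde{G}_{K_g}$ are the variants of $G_0$ and $G_{K_g}$, respectively, in which all non-zero elements are set to 1, and $\omega_g$ is the number of non-zero elements of $\tilde{G}_0 \cup \tilde{G}_{K_g}$.

3) The graph error, $E_g$, which is one term of the loss function of ACUDL, is used as the objective function of the adaptive correlation update:

$$E_g = \| G_g - G_0 \|_2 \tag{6}$$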

The basic process of the adaptive correlation update is shown in Algorithm 1. Based on the adaptive adjustment of $\varepsilon_g$, $G$ can be adaptively updated according to Formula (4) to obtain the most realistic correlation features.

 INPUT: Training set: X
        Neighbor number: K_g, g = 0, 1, 2, …
 OUTPUT: Directed graph: G_g, g = 0, 1, 2, …
 FUNCTIONS: KNN algorithm: KNN()
            Affinity calculation: AFF()
 BEGIN:
   Initialize G_0 based on K_0 and KNN()
   FOR each g DO  // g is the iteration number
     Update K_g by Formula (2)
     FOR each x_i ∈ X DO
       NB_i = KNN(x_i, X, K_g)  // Find the K_g nearest neighbors of x_i
       D = AFF(x_i, NB_i)  // Calculate the affinities of x_i with x_ik
       G(i, k) = d_k  // d_k ∈ D, k ∈ [1, K_g]; construct and update G
     END FOR
     Update G_g by Formula (4)
     Model training
   END FOR
 END.
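The graph construction and update steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative reading, not the authors' implementation: it assumes Euclidean distance for KNN, a binary adjacency matrix in place of the unspecified affinity function AFF(), rounding of $K_g$ to an integer, $|G(i)|$ read as the number of non-zero entries in row $i$, and the Frobenius norm for the graph error. The function names `knn_graph`, `next_k`, `blend`, and `graph_error` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def knn_graph(X: Sequence[Sequence[float]], k: int) -> Matrix:
    """G[i][j] = 1 if x_j is one of the k nearest neighbours of x_i (edge x_j -> x_i)."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other samples by Euclidean distance to x_i (assumed metric).
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(X[i], X[j]))
        for j in order[:k]:
            G[i][j] = 1.0
    return G


def next_k(G_prev1: Matrix, G_prev2: Matrix) -> int:
    """Formula (2), g >= 2: average non-zero count per row of the two previous graphs."""
    n = len(G_prev1)
    nonzeros = sum(1 for G in (G_prev1, G_prev2) for row in G for v in row if v != 0)
    return max(1, round(nonzeros / (2 * n)))  # rounding to an integer is an assumption


def blend(G0: Matrix, GK: Matrix) -> Tuple[Matrix, float]:
    """Formulas (4)-(5): mix G0 and the new KNN graph with weight eps_g."""
    n = len(G0)
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Entries present in exactly one of the two binarized graphs.
    differ = sum(1 for i, j in cells if (G0[i][j] != 0) != (GK[i][j] != 0))
    # Entries present in at least one of the two binarized graphs (omega_g).
    union = sum(1 for i, j in cells if G0[i][j] != 0 or GK[i][j] != 0)
    eps = differ / union if union else 0.0
    Gg = [[(1 - eps) * G0[i][j] + eps * GK[i][j] for j in range(n)] for i in range(n)]
    return Gg, eps


def graph_error(Gg: Matrix, G0: Matrix) -> float:
    """Formula (6), read here as the Frobenius norm of G_g - G_0."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(Gg, G0) for a, b in zip(ra, rb)))


X = [[0.0], [1.0], [2.0], [10.0]]   # four toy 1-D samples
G0 = knn_graph(X, k=1)
GK = knn_graph(X, k=2)
Gg, eps = blend(G0, GK)
print(eps, graph_error(Gg, G0), next_k(G0, G0))
```

In this toy case every edge of `G0` also appears in `GK`, so half of the union differs and the two graphs are mixed with equal weight.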

C. Dual-AutoEncoder (D-AE)

The D-AE contains an original feature encoder, a graph encoder for correlation features, and a decoder. Its critical modules are the Multi-Layer Perceptron (MLP) and GAT.

Original feature encoder: It adopts an MLP, composed of Fully Connected (FC) layers, to extract the nonlinear original non-correlation features [14]:

$$Z_l^O = MLP(Z_{l-1}^O W^O + D^O) \tag{7}$$

where $Z_{l-1}^O = [z_{(l-1)1}^O, \ldots, z_{(l-1)i}^O, \ldots, z_{(l-1)N_V}^O]^T$ is the input, and $W^O = [w_1^O, \ldots, w_l^O, \ldots, w_L^O]$ and $D^O = [d_1^O, \ldots, d_l^O, \ldots, d_L^O]$ are the weight and bias matrices, respectively. $l = 1, 2, \ldots, L$ is the index of the network layer. $Z_0^O = X$ is the initial input, $Z_L^O \in \mathbb{R}^{N_V \times N_C}$ is the final output, and $N_C$ is the number of neural cells in the last layer. The ACTivation function (ACT()) in the "$1 \sim l-1$" layers is Tanh() or ReLU().
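A minimal sketch of the layer-by-layer forward pass in Formula (7) is given below in plain Python. The helper names `dense` and `mlp_encode` and the toy weights are hypothetical, and the linear activation on the last layer is an assumption, since the text only specifies Tanh() or ReLU() for the earlier layers.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def dense(Z: Matrix, W: Matrix, d: Sequence[float],
          act: Callable[[float], float]) -> Matrix:
    """One fully connected layer: act(Z W + d), with Z (N, in), W (in, out), d (out,)."""
    return [[act(sum(z[i] * W[i][j] for i in range(len(W))) + d[j])
             for j in range(len(d))] for z in Z]


def mlp_encode(X: Matrix, weights: List[Matrix], biases: List[List[float]]) -> Matrix:
    """Apply Formula (7) layer by layer, starting from Z_0 = X."""
    Z = X
    last = len(weights) - 1
    for l, (W, d) in enumerate(zip(weights, biases)):
        # Tanh on hidden layers; a linear last layer is an assumption of this sketch.
        act = math.tanh if l < last else (lambda v: v)
        Z = dense(Z, W, d, act)
    return Z


X = [[1.0, 2.0]]                                       # one sample, N_F = 2
weights = [[[1.0, 0.0], [0.0, 1.0]], [[1.0], [1.0]]]   # hypothetical toy weights
biases = [[0.0, 0.0], [0.0]]
Z_L = mlp_encode(X, weights, biases)                   # shape (1, N_C) with N_C = 1
print(Z_L)
```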

Graph encoder for correlation features: It uses a GAT to capture the correlation features among samples and implements shared attention, $\alpha_{i,j}$, on each sample:

$$\alpha_{i,j} = ATT(x_i, x_j) = ACT(\omega^T [W^C x_i \,\|\, W^C x_j]) \tag{8}$$

where $x_j \in NB_i$. $\alpha_{i,j}$ is randomly initialized, adaptively updated during training, and used to indicate the importance of the nodes. $\omega^T \in \mathbb{R}^{N_C}$ and $W^C \in \mathbb{R}^{\frac{N_C}{2} \times N_F}$ are the coefficient vector and weight matrix of GAT, respectively [26]. "$\|$" is the concatenation operation.

Then, after Softmax(), we can extract the correlation feature, $z_i^C$, by the weighted sum function:

$$z_i^C = \sum_{j=1}^{K} \frac{e^{\alpha_{i,j}}}{\sum_{k=1}^{K} e^{\alpha_{i,k}}} x_j \tag{9}$$

Finally,

$$Z^C = [z_1^C, \ldots, z_i^C, \ldots, z_{N_V}^C]^T \in \mathbb{R}^{N_V \times N_C} \tag{10}$$
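The attention computation in Formulas (8) and (9) can be illustrated for a single node with the sketch below. It assumes LeakyReLU with slope 0.2 for ACT(), as in the original GAT, and follows Formula (9) as printed by summing the raw neighbor vectors $x_j$; a full implementation would presumably project them to $N_C$ dimensions. The names `matvec`, `leaky_relu`, and `attention_feature` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Matrix-vector product W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(v: float, slope: float = 0.2) -> float:
    """Assumed choice for ACT() in Formula (8)."""
    return v if v > 0 else slope * v


def attention_feature(x_i: Sequence[float],
                      neighbours: Sequence[Sequence[float]],
                      W: Sequence[Sequence[float]],
                      omega: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (z_i, softmax weights) for one node and its KNN neighbours."""
    h_i = matvec(W, x_i)
    scores = []
    for x_j in neighbours:
        concat = h_i + matvec(W, x_j)            # [W x_i || W x_j]
        scores.append(leaky_relu(sum(o * c for o, c in zip(omega, concat))))
    top = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbours[0])
    z_i = [sum(w * x_j[a] for w, x_j in zip(weights, neighbours)) for a in range(dim)]
    return z_i, weights


W = [[1.0, 0.0]]                                 # hypothetical (N_C/2) x N_F matrix
neighbours = [[0.0, 2.0], [4.0, 0.0]]
z_uniform, w_uniform = attention_feature([1.0, 0.0], neighbours, W, omega=[0.0, 0.0])
z_skewed, w_skewed = attention_feature([1.0, 0.0], neighbours, W, omega=[1.0, -1.0])
print(z_uniform, w_skewed)
```

With a zero coefficient vector all neighbors receive equal weight, so the output reduces to the neighbor mean; a non-zero $\omega$ shifts weight toward the neighbors with higher attention scores.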

Feature fusion and decoder: The fusion features are obtained through an FC layer: $Z_f = Fusion(Z_L^O, Z^C) = Z_L^O \oplus Z^C$, where $\oplus$ represents element-wise matrix addition.

Then, the decoder reconstructs the initial data, $X$, with $Z_f$ as input, and obtains the reconstructed data, $\hat{X}$, the reconstruction error, $RE$, and the reconstruction features, $Z_r$. The MLP is similar to Formula (7):

$$RE = Euc(X, \hat{X}) \tag{11}$$

$$Z_r = [Cos(X, \hat{X}), Euc(X, \hat{X})] \tag{12}$$

where Cos() and Euc() are the cosine distance and Euclidean distance functions, respectively.
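The fusion step and the reconstruction features of Formulas (11) and (12) can be sketched as follows. The sketch assumes that both distances are computed per sample (row by row) and that the cosine distance is one minus the cosine similarity; the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def fuse(Z_o: Matrix, Z_c: Matrix) -> Matrix:
    """Element-wise addition of the two encoder outputs."""
    return [[a + b for a, b in zip(ro, rc)] for ro, rc in zip(Z_o, Z_c)]


def euc(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Euclidean distance between a sample and its reconstruction."""
    return math.dist(x, x_hat)


def cos_distance(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Cosine distance, taken here as 1 - cosine similarity (assumption)."""
    dot = sum(a * b for a, b in zip(x, x_hat))
    norm = math.hypot(*x) * math.hypot(*x_hat)
    return 1.0 - dot / norm if norm > 0 else 1.0


def reconstruction_features(X: Matrix, X_hat: Matrix) -> Matrix:
    """One row [Cos(x, x_hat), Euc(x, x_hat)] per sample, as in Formula (12)."""
    return [[cos_distance(x, xh), euc(x, xh)] for x, xh in zip(X, X_hat)]


Z_f = fuse([[1.0, 2.0]], [[3.0, 4.0]])
Z_r = reconstruction_features([[3.0, 4.0], [1.0, 0.0]],
                              [[3.0, 4.0], [0.0, 1.0]])
print(Z_f, Z_r)
```

A perfectly reconstructed sample yields features close to zero, while an orthogonal reconstruction yields a cosine distance of 1.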

D. GMM-Based Estimation Network

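We construct a GMM-based estimation network with multiple FC layers. $Z = [Z_r, Z_f]$ is the input.

Then, the mixture membership, $P \in \mathbb{R}^{N_V \times M}$, is estimated by the MLP, where $M$ is the number of mixture components. The ACT() in the last layer is Softmax(). The mean vector, $\mu$, and covariance matrix, $\Sigma$, of the GMM are calculated by the following formulas:

$$\mu_m = \frac{\sum_{n=1}^{N_V} p_{n,m} Z_n}{\sum_{n=1}^{N_V} p_{n,m}} \tag{13}$$

$$\Sigma_m = \frac{\sum_{n=1}^{N_V} p_{n,m} (Z_n - \mu_m)(Z_n - \mu_m)^T}{\sum_{n=1}^{N_V} p_{n,m}} \tag{14}$$

Finally, the anomaly energy, $E$, is calculated based on $P$:

$$E = -\log \left( \sum_{m=1}^{M} \frac{\sum_{n=1}^{N_V} p_{n,m}}{N_V} \cdot \frac{\exp\left( -\frac{1}{2} (Z_n - \mu_m)^T \Sigma_m^{-1} (Z_n - \mu_m) \right)}{\sqrt{|2\pi\Sigma_m|}} \right) \tag{15}$$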
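Formulas (13) to (15) can be illustrated with the plain-Python sketch below. It assumes the energy takes the DAGMM form, i.e., the negative log of the mixture density, and adds a small diagonal ridge to each covariance for numerical stability, which is not stated in the text. A Cholesky factorization is used so that neither an explicit inverse nor a determinant routine is needed; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gmm_params(Z: Matrix, P: Matrix, ridge: float = 1e-6
               ) -> Tuple[List[float], List[List[float]], List[Matrix]]:
    """Formulas (13)-(14): mixture weights, means, and covariances from memberships P."""
    n, d, M = len(Z), len(Z[0]), len(P[0])
    phi, mus, sigmas = [], [], []
    for m in range(M):
        w = sum(P[i][m] for i in range(n))
        mu = [sum(P[i][m] * Z[i][a] for i in range(n)) / w for a in range(d)]
        cov = [[sum(P[i][m] * (Z[i][a] - mu[a]) * (Z[i][b] - mu[b])
                    for i in range(n)) / w + (ridge if a == b else 0.0)
                for b in range(d)] for a in range(d)]
        phi.append(w / n)
        mus.append(mu)
        sigmas.append(cov)
    return phi, mus, sigmas


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def energy(z: Sequence[float], phi: Sequence[float],
           mus: Sequence[Sequence[float]], sigmas: Sequence[Matrix]) -> float:
    """Formula (15): negative log-likelihood of z under the mixture."""
    d = len(z)
    total = 0.0
    for w, mu, cov in zip(phi, mus, sigmas):
        L = cholesky(cov)
        y: List[float] = []                      # forward substitution: L y = z - mu
        for i in range(d):
            y.append((z[i] - mu[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
        maha = sum(v * v for v in y)             # (z - mu)^T Sigma^{-1} (z - mu)
        log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
        total += w * math.exp(-0.5 * maha - 0.5 * (d * math.log(2 * math.pi) + log_det))
    return -math.log(total)


Z = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy latent features
P = [[1.0], [1.0], [1.0], [1.0]]                       # single component, M = 1
phi, mus, sigmas = gmm_params(Z, P)
e_center = energy([0.5, 0.5], phi, mus, sigmas)
e_far = energy([5.0, 5.0], phi, mus, sigmas)
print(e_center, e_far)
```

Samples far from every component mean receive a higher energy, which is what the threshold in Formula (1) relies on.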

E. Objective Function

The objective function is:

$$L = RE + \lambda_1 E + \lambda_2 E_g + \lambda_3 \|Z\|_2^2 \tag{16}$$

where the last term is a regularization term that makes the distributions of normal and abnormal samples as distinct as possible, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weight parameters.
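As a small illustration of how the four terms of Formula (16) combine, the sketch below evaluates the loss from scalar values of $RE$, $E$, and $E_g$ and a latent matrix $Z$. The function name `total_loss` and all numeric values, including the $\lambda$ weights, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import List, Sequence


def total_loss(re: float, energy: float, graph_err: float,
               Z: List[List[float]], lambdas: Sequence[float]) -> float:
    """L = RE + lambda_1 * E + lambda_2 * E_g + lambda_3 * ||Z||_2^2."""
    lam1, lam2, lam3 = lambdas
    reg = sum(v * v for row in Z for v in row)   # squared L2 norm of the latent features
    return re + lam1 * energy + lam2 * graph_err + lam3 * reg


loss = total_loss(re=1.0, energy=2.0, graph_err=3.0,
                  Z=[[1.0, 2.0], [2.0, 0.0]],
                  lambdas=(0.1, 0.01, 0.001))
print(loss)
```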

ACUDL makes full use of the dynamic graph and D-AE to adaptively capture the original non-correlation, correlation, and reconstruction features while minimizing noise interference. It then conducts probability distribution estimation based on the GMM to learn a more accurate sample distribution and identifies abnormal data by calculating the anomaly energy.

IV   Experiments

A. Experimental Setup

The experiments are written in Python 3.6, modeled in TensorFlow 1.14, and conducted on an Ubuntu-based server:

CPU: Intel Xeon CPU E5-2640 (16-Core, 2.4GHz)

GPU: 8×GeForce RTX 2080Ti Graphics Card

Memory: 128G RAM

We chose three scenarios with different characteristics:

SHS: We continue to use the SHS scenario from the Introduction, built with the Arrhythmia [10] and ECG5000 [31] datasets. They are high-dimensional, multi-type time series datasets. The Arrhythmia dataset aims to distinguish cardiac arrhythmia and classify it into one of 16 groups. The data are pre-processed by extracting each heartbeat and making all heartbeats equal in length using interpolation. The difference is that Arrhythmia is small-scale, and ECG5000 is larger-scale.

GIS: GIS collects the ground information to detect whether the positioning information is normal. We use the Satellite dataset to build this scenario [32], which is small-scale, low-dimensional, and multi-type. Each sample contains the 3×3 pixel values in the four spectral bands (converted to ASCII), with 36 features.

ICCS: ICCS collects image information to judge whether there is an obstacle in front of the vehicle to trigger the emergency braking. The data collected by the sensors usually have noise. The collected abnormal data generally have a certain similarity. We build the ICCS scenario with the CIFAR-10 dataset [33], a large-scale, high-dimensional, and multi-type image dataset. The classes are entirely mutually exclusive. There is no overlap between automobiles and trucks.

We adopt the experimental strategy of “training set - validation set - test set”. The statistical information and experimental settings are shown in Table II:

TABLE II Statistical Information of These Datasets

B. Comparative Models and Performance Metrics

We select several existing DL-ADMs used in CPS for comparison with ACUDL: AnoGAN [20], ALAD [21],[22] and EGBADen [23] are GAN-based; DSEBM [15], DAGMM [14], DFC [18] and MOCCA [19] are AE-based; OCGNN [25] is GNN-based. These methods are discussed in Section II-B. In addition, we add another baseline, GOAD [34], which is a transformation-based method; its transformations, however, can introduce redundant information that deviates from the data distribution.

We use four metrics to evaluate these methods: AUC (Area Under Curve), AP (Average Precision), F1-score (F1), and the Detection Time (DT) of each sample.
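For readers who want to reproduce the first three metrics, the sketch below computes AUC, AP, and F1 from anomaly scores and binary labels (1 = abnormal) in plain Python. It is an illustrative implementation that assumes AUC is the ROC AUC, not the evaluation code used in the experiments: the AP routine ignores tied scores, and the scores, labels, and 0.5 threshold are made-up toy values.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: probability that an abnormal sample outscores a normal one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at the rank of each abnormal sample."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def f1_score(pred: Sequence[int], labels: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the abnormal class."""
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


scores = [0.9, 0.8, 0.3, 0.1]          # toy anomaly scores
labels = [1, 0, 1, 0]                  # toy ground truth
pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]
print(auc(scores, labels), average_precision(scores, labels), f1_score(pred, labels))
```

AUC and AP are threshold-free and depend only on the ranking of the scores, whereas F1 depends on the chosen threshold.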

C. Parameter Settings

The parameter settings of ACUDL are shown in Tables III and IV, where the initial value of K , K 0, and the value of M in GMM are tested and set in the “Parameter sensitivity experiment ”. Other parameters are set by a large number of experiments. All the results are obtained from the mean and standard deviation of the experiments under five random seeds. The learning rate of Adam is set as 0.001.

TABLE III The Parameter Setting of ACUDL

TABLE IV Parameter Setting of Modules in ACUDL

D. Parameter Sensitivity Experiments of ACUDL

Parameter (K0) sensitivity experiment: $K_0$ is critical for KNN and determines whether the adaptive correlation analysis is accurate. The experimental results with different values of $K_0$ under different scenarios are shown in Fig. 5. In most scenarios, the overall performance of ACUDL does not depend on the value of $K_0$. In the GIS scenario, there are certain fluctuations in the AP and F1 curves when $K_0 \in [11, 15]$. It is mainly because ECG5000 is a small time series dataset, and ACUDL may be affected to some extent. These results mean that ACUDL is relatively insensitive to $K_0$ as a whole.

Fig. 5. Parameter (K 0) sensitivity experimental results.

Parameter (M) sensitivity experiment: In GMM, $M$ is the key parameter for estimating the probability distribution, so we perform this experiment to set an appropriate value of $M$. The experimental results are shown in Fig. 6. In most application scenarios, the overall performance of ACUDL does not depend on the value of $M$. In the SHS (Arrhythmia) scenario, there are certain fluctuations in the F1 curves when $M \in [2, 6]$. This is also because Arrhythmia is a smaller time series dataset than ECG5000. These results mean that ACUDL is relatively insensitive to $M$ as a whole.

Fig. 6. Parameter (M ) sensitivity experimental results.

Based on these results, the values of K 0 and M are set as shown in Table IV in the subsequent experiments.

E. Comparative Experiment of Anomaly Detection

The parameter settings of the comparative models mentioned in Section IV-B are mainly based on the experimental settings in their respective references and are fine-tuned to the best results obtained over many tests in each scenario of this experiment. Notably, we conduct the one-vs-all comparative experiment on the CIFAR-10 set. The experimental results are shown in Table V and Fig. 7. We can see that AE-based models get the worst results, while ACUDL achieves the best results in each application scenario as a whole.

1) Overall, GAN-based models (ALAD, AnoGAN and EGBADen) are inferior to the other methods because of their instability and high data quality requirements. For example, the AP and F1 results of ALAD and EGBADen in the SHS (Arrhythmia) scenario and the AUC results of the three models in the ICCS scenario are not ideal.

2) By avoiding the shortcomings of AE-based and GAN-based models, DAGMM and OCGNN work better than DSEBM. They are almost equally effective and have their own advantages in different scenarios. For example, the results of DAGMM are better in the SHS (ECG5000) scenario but worse in the SHS (Arrhythmia) scenario, and the results of OCGNN are better in the GIS scenario but worse in the ICCS scenario. The relatively better results of DAGMM are primarily due to the use of GMM, while those of OCGNN are primarily due to the introduction of GCN.

3) The latest methods, MOCCA and DFC, achieve desirable results in different scenarios thanks to their respective strengths. For example, DFC gets the second-best results in the SHS (Arrhythmia) scenario, and MOCCA gets the best F1 results in the GIS and ICCS scenarios. However, they are still weaker than ACUDL, and overall they take more detection time than the other methods except ALAD.

4) GOAD is a transformation-based method that has received particular attention in the last two years. However, it is sensitive to data scenarios, and its AP and F1 results in the SHS (ECG5000) and ICCS scenarios are very unsatisfactory, even though its AUC result in the ICCS scenario is better than those of the other methods except ACUDL.

5) The average one-vs-all results in the ICCS scenario show that ACUDL outperforms the other models. The detailed one-vs-all results show that, overall, ACUDL and GOAD get the best AUC results and are comparable, while ACUDL gets the best AP and F1 results with favorable detection times. Therefore, the combined analysis of the average and detailed results indicates that ACUDL obtains the best results in this scenario as a whole.

6) As for the detection time, the efficiency of ACUDL is at a middle level. Combined with the results of the other performance metrics, ACUDL obtains the best overall detection performance.

TABLE V Detection Results of These Comparative Models (The Best Results are Marked in Bold, the Second-Best Results are Underlined, and the Worst Results are Italicized)

Fig. 7. One-vs-all experimental results on ICCS (CIFAR-10 dataset).

All the results can verify that ACUDL combines the advantages of all the other methods, effectively avoids their disadvantages, and focuses on the correlation feature extraction and feature fusion during the adaptive training. Hence it achieves the best results as a whole.

F. Ablation Experiment of ACUDL

In this experiment, the previous experiment in “INTRODUCTION ” is extended to different application scenarios. The experimental results are shown in Table VI.

1) In every application scenario, the results of both CF-AE and OF-AE are reasonable. In particular, the results of CF-AE are significantly better than those of OF-AE in the SHS (Arrhythmia) scenario, and the results of CF-AE and OF-AE are comparable in the other three scenarios, although OF-AE has only a moderate AUC result in the ICCS scenario.

2) D-AE (ACUDL) achieves the best results in all scenarios. In particular, it significantly outperforms CF-AE and OF-AE in the SHS (Arrhythmia) and ICCS scenarios.

TABLE VI Detection Results of the Three Models

These results show that the two modules of the original features extraction and the correlation feature extraction working together can achieve better results. ACUDL is designed as an adaptive end-to-end model that makes each module fully functional in different CPS scenarios. Moreover, the fusion of the non-correlation features, correlation features and reconstruction features can further extract more effective and comprehensive features and improve the detection performance.

G. Noise Experiment of ACUDL

This experiment mainly tests the detection results of ACUDL in different application scenarios after adding noise to prove the scheme's robustness. The experimental results are shown in Table VII. In any application scenario, the effects are reduced when the noise samples are added, but all of them are relatively stable with different noise levels. Even though the AUC and AP results have the largest drops in SHS (Arrhythmia) scenario, they are still within acceptable limits. Moreover, as shown in Table V, they are comparable to the results obtained by baselines without adding noise.

TABLE VII Noise Experimental Results of ACUDL

These results indicate that, after noise injection, ACUDL can still adequately capture the correlation features based on adaptive training with dynamic graph to minimize the noise impact to the greatest extent.

H. Visualization Experiment

To further verify the quality of the latent features obtained by each model, this experiment makes a visualization comparison in the SHS (ECG5000) scenario. We select some of the better-performing models from the “Comparative experiment of anomaly detection”. The embeddings extracted from each model are used as the input of the t-SNE tool [35] to generate 2D visualization results, as shown in Fig. 8. The distribution of DVAE's embeddings is messy; AnoGAN and OCGNN have some overlapping regions; EGBADen has few overlapping regions, but the abnormal samples are divided into three sub-regions. DAGMM and ACUDL have similar effects, but ACUDL's normal/abnormal boundary is clearer. Moreover, the visualization results are basically consistent with the detection results in Table V. The results mean that ACUDL can accurately capture the correlation features through adaptive training and separate the normal and abnormal latent distributions more effectively, which provides a good prerequisite for anomaly detection.

Fig. 8. Visualization results of comparative models in the SHS (ECG5000) scenario (The blue dots represent normal samples, and the orange dots represent abnormal samples).

I. Simulation Experiment of ACUDL

This experiment simulates the SHS to verify that the adaptive correlation modeling of ACUDL has effective practical effects. In each sub-experiment, we randomly capture 1654 normal samples and 176 samples in different normal distributions, and randomly select 1480 normal samples to form the training set, 827 normal samples and 88 abnormal samples to form the validation set, 828 normal samples and the remaining 88 abnormal samples to form the test set. The experimental setup and procedure are the same as above. The results are shown in Table VIII.

TABLE VIII Detection Results of ACUDL in SHS Simulation Scenario

We can see that the detection performance maintains an ideal and stable level in each sub-experiment. It indicates that ACUDL can be applied to different data characteristics due to the dynamic-graph-based adaptive training to obtain more accurate and comprehensive features.

J. Comparative Application Experiment

We select two typical CPS application scenarios to perform the application experiment:

Smart Grid (SG): We use the Decentral Smart Grid Control dataset (DSGC) [36]. It corresponds to an augmented version of the “Electrical Grid Stability Simulated Dataset”, which performs the local stability analysis of the 4-node star system. It contains the attributes of reaction time of participant, the value for electricity producer, power consumed, price elasticity, stability label, etc.

Smart Home (SH): We use the REFIT dataset [37], including 20 household data from Loughborough, U.K., over the period 2013-2014. We use the data of freezers in House-1. It is a time series dataset, and has two classes, namely the power demands of freezers in the kitchen and the garage, respectively.

Each dataset does not have explicit correlations, and the statistical information is shown in Table X. The main parameter settings of ACUDL for the two datasets are shown in Tables IX and XI. The experimental setup and procedure are the same as above. The results are shown in Table XII.

TABLE IX Parameter Setting of Modules for the Two Application Scenarios

TABLE X Statistical Information of the Two Datasets

TABLE XI The Parameter Setting of ACUDL for the Two Application Scenarios

TABLE XII Detection Results of These Comparative Models in the Two Scenarios (The Best Results are Marked in Bold, the Second-Best Results are Underlined, and the Worst Results are Italicized)

We can see that ACUDL achieves better results than the comparative methods in the two scenarios. Combined with the results in the “Comparative experiment of anomaly detection”, this shows that ACUDL can effectively use various features to perform anomaly detection well in application scenarios with or without correlation characteristics.

V. Conclusion

The DL-ADM is one of the mainstream methods in various CPS application scenarios. Besides many explicit original features, CPS data also contain implicit correlation features. Moreover, the complex CPS data background makes the traditional static GNN unable to obtain accurate features. Worse yet, few current works involve data correlation analysis and adaptive training, resulting in unsatisfactory performance. Therefore, ACUDL works in an end-to-end mode: it designs the D-AE, trains adaptively with a dynamic graph to extract correlation features accurately, and combines them with the non-correlation features and reconstruction features to estimate the probability distribution and complete the anomaly detection tasks in CPSs. Experimental results show that, overall, ACUDL outperforms the compared DL-ADMs in different CPS data scenarios. They also highlight that ACUDL, as an adaptive end-to-end model, adapts well to complex CPS data scenarios with different characteristics. Next, we will continue to capture various correlation features more accurately and focus on building a more robust feedback model for more CPS scenarios.
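
As an illustration of the energy-based scoring step summarized above, the following sketch computes the sample energy of a feature vector under a Gaussian mixture, E(z) = -log Σ_k φ_k N(z | μ_k, Σ_k), in the form used by DAGMM [14]. It is a sketch under stated assumptions rather than the ACUDL implementation: the function name `gmm_sample_energy`, the regularization constant `eps`, and the toy mixture parameters are hypothetical, and the way ACUDL's estimation network produces the mixture parameters is not reproduced here.

```python
import numpy as np


def gmm_sample_energy(z, phi, mu, cov, eps=1e-6):
    """Return E(z) = -log sum_k phi_k N(z | mu_k, cov_k) for each row of z.

    z: (n, d) features, phi: (K,) mixture weights,
    mu: (K, d) component means, cov: (K, d, d) component covariances.
    """
    n, d = z.shape
    log_terms = []
    for k in range(len(phi)):
        cov_k = cov[k] + eps * np.eye(d)           # small ridge for stability
        diff = z - mu[k]
        solved = np.linalg.solve(cov_k, diff.T).T  # cov_k^{-1} (z - mu_k)
        maha = np.einsum("nd,nd->n", diff, solved)
        _, logdet = np.linalg.slogdet(cov_k)
        log_norm = -0.5 * (d * np.log(2.0 * np.pi) + logdet)
        log_terms.append(np.log(phi[k]) + log_norm - 0.5 * maha)
    log_terms = np.stack(log_terms, axis=1)        # shape (n, K)
    # Log-sum-exp over components, shifted by the row maximum for stability.
    m = log_terms.max(axis=1, keepdims=True)
    log_likelihood = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))
    return -log_likelihood


# Toy mixture (assumption): two unit-covariance components in two dimensions.
phi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
cov = np.stack([np.eye(2), np.eye(2)])
z = np.array([[0.0, 0.0], [10.0, -10.0]])

energies = gmm_sample_energy(z, phi, mu, cov)
print(energies)  # the second point lies far from both components
```

Samples with high energy have low likelihood under the mixture, so a threshold on the energy, chosen for example on the validation set, separates anomalies from normal samples.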

References


  • [1] I. F. Akyildiz and A. Kak, “The Internet of space Things/CubeSats: A ubiquitous cyber-physical system for the connected world,” Comput. Netw., vol. 150, pp. 134–149, 2019.
  • [2] M. Farajzadeh-Zanjani, E. Hallaji, R. Razavi-Far, and M. Saif, “Generative-adversarial class-imbalance learning for classifying cyber-attacks and faults - a cyber-physical power system,” IEEE Trans. Dependable Secure Comput., vol. 19, no. 6, pp. 4068–4081, Nov./Dec. 2021.
  • [3] W. Yan, L. K. Mestha, and M. Abbaszadeh, “Attack detection for securing cyber physical systems,” IEEE Internet Things J., vol. 6, no. 5, pp. 8471–8481, Oct. 2019.
  • [4] Y. Yuan et al., “Data driven discovery of cyber physical systems,” Nature Commun., vol. 10, 2019, Art. no. 4894.
  • [5] J. Zhang, L. Pan, Q. L. Han, C. Chen, S. Wen, and Y. Xiang, “Deep learning based attack detection for cyber-physical system CyberSecurity: A survey,” IEEE-CAA J. Automatica Sinica, vol. 9, no. 3, pp. 377–391, Mar. 2022.
  • [6] G. J. Qi and J. B. Luo, “Small data challenges in Big Data era: A survey of recent progress on unsupervised and semi-supervised methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 4, pp. 2168–2187, Apr. 2022.
  • [7] Y. Bao et al., “Computer vision and deep learning–based data anomaly detection method for structural health monitoring,” Struct. Health Monit., vol. 18, no. 2, pp. 401–421, 2019.
  • [8] T. Schlegl et al., “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in Proc. Int. Conf. Inf. Process. Med. Imag., Boone, USA, 2017, pp. 146–157.
  • [9] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 1, pp. 4–24, Jan. 2021.
  • [10] UC Irvine Machine Learning Repository, “Arrhythmia dataset,” 1998. [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Arrhythmia
  • [11] H. Y. Fan et al., “Correlation-aware deep generative model for unsupervised anomaly detection,” in Proc. Pacific-Asia Conf. Knowl. Discov. Data, Singapore, 2020, pp. 688–700.
  • [12] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: A survey,” Data Mining Knowl. Discov., vol. 29, no. 3, pp. 628–688, 2015.
  • [13] G. J. Qi and J. B. Luo, “Small data challenges in Big Data era: A survey of recent progress on unsupervised and semi-supervised methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 2168–2187, Apr. 2022.
  • [14] B. Zong et al., “Deep autoencoding Gaussian mixture model for unsupervised anomaly detection,” in Proc. 6th Int. Conf. Learn. Representations, Vancouver, BC, Canada, 2018, pp. 1–19.
  • [15] S. F. Zhai et al., “Deep structured energy based models for anomaly detection,” in Proc. 33rd Int. Conf. Mach. Learn., Daytona Beach, FL, USA, 2016, pp. 1100–1109.
  • [16] K. Liu et al., “Generalized zero-shot learning for action recognition with web-scale video data,” World Wide Web-Internet Web Inf. Syst., vol. 22, no. 2, pp. 807–824, 2019.
  • [17] N. Ding et al., “Real-time anomaly detection based on long short-term memory and Gaussian mixture model,” Comput. Elect. Eng., vol. 79, 2019, Art. no. 106458.
  • [18] J. Yang, Y. Shi, and Z. Qi, “Learning deep feature correspondence for unsupervised anomaly detection and segmentation,” Pattern Recognit., vol. 132, 2022, Art. no. 108874.
  • [19] F. V. Massoli, F. Falchi, A. Kantarci, Ş. Akti, H. K. Ekenel, and G. Amato, “MOCCA: Multilayer one-class classification for anomaly detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 6, pp. 2313–2323, Jun. 2022.
  • [20] T. Schlegl et al., “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in Proc. Int. Conf. Inf. Process. Med. Imag., Boone, MA, USA, 2017, pp. 146–157.
  • [21] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in Proc. IEEE Int. Conf. Data Mining, Singapore, 2018, pp. 727–736.
  • [22] O. Knapp et al., “Adversarially learned anomaly detection on CMS open data: Re-discovering the top quark,” Eur. Phys. J. Plus, vol. 136, 2021, Art. no. 236.
  • [23] X. Han, X. H. Chen, and L. P. Liu, “GAN ensemble for anomaly detection,” in Proc. 35th AAAI Conf. Artif. Intell., 2021, pp. 4090–4097.
  • [24] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. 5th Int. Conf. Learn. Representations, Toulon, France, 2017, pp. 1–14.
  • [25] X. H. Wang et al., “One-class graph neural networks for anomaly detection in attributed networks,” Neural Comput. Appl., vol. 33, no. 18, pp. 12073–12085, 2021.
  • [26] P. Veličković et al., “Graph attention networks,” in Proc. 6th Int. Conf. Learn. Representations, Vancouver, BC, Canada, 2018, pp. 1–12.
  • [27] W. F. Liu et al., “Human activity recognition by manifold regularization based dynamic graph convolutional networks,” Neurocomputing, vol. 444, pp. 217–225, 2021.
  • [28] Z. Q. Pan, W. Y. Chen, and H. H. Chen, “Dynamic graph learning for session-based recommendation,” Mathematics, vol. 9, no. 12, 2021, Art. no. 1420.
  • [29] S. H. Cheong, Y. W. Si, and R. K. Wong, “Online force-directed algorithms for visualization of dynamic graphs,” Inf. Sci., vol. 556, pp. 223–255, 2021.
  • [30] S. C. Fu et al., “Dynamic graph learning convolutional networks for semi-supervised classification,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 17, no. 1, 2021, Art. no. 4.
  • [31] UCR Time Series Classification Repository, “ECG5000 dataset,” 2000. [Online]. Available: http://www.timeseriesclassification.com/description.php?Dataset=ECG5000
  • [32] UC Irvine Machine Learning Repository, “Satellite dataset,” 1993. [Online]. Available: https://archive-beta.ics.uci.edu/ml/datasets/statlog+landsat+satellite
  • [33] A. Krizhevsky, “CIFAR-10 dataset,” 2009. [Online]. Available: http://www.cs.toronto.edu/~kriz/cifar.html
  • [34] L. Bergman and Y. Hoshen, “Classification-based anomaly detection for general data,” in Proc. 8th Int. Conf. Learn. Representations, 2020, pp. 1–12.
  • [35] L. Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
  • [36] UC Irvine Machine Learning Repository, “Electrical grid stability simulated dataset,” 2018. [Online]. Available: https://www.kaggle.com/datasets/pcbreviglieri/smart-grid-stability
  • [37] Smart Homes and Energy Demand Reduction, “REFIT data set,” 2017. [Online]. Available: https://www.refitsmarthomes.org/datasets/

Liang Xi received the PhD degree in computer applied technology from the Harbin University of Science and Technology, Harbin, China, in 2012. He is currently a professor with the Harbin University of Science and Technology. His current research interests include artificial intelligence, network security, machine learning, etc. He was a recipient of the National Science Foundation of China in 2012, the Chunhui Project Foundation of the Education Department of China in 2023, the Natural Science Foundation of Heilongjiang Province in 2018 and 2022, the Innovative Talents project of Common University in Heilongjiang Province in 2015, etc.
Dehua Miao is currently working toward the MS degree in computer science and technology with the Harbin University of Science and Technology, Harbin, China. His current research interests include artificial intelligence, network security, and machine learning.
Menghan Li is currently working toward the MS degree in computer science and technology with the Harbin University of Science and Technology, Harbin, China. Her current research interests include artificial intelligence, network security, and machine learning.
Ruidong Wang received the MS degree in computer science and technology from the Harbin University of Science and Technology, Harbin, China, in 2020. He is currently working toward the PhD degree in computer science and technology with the Harbin University of Science and Technology. His current research interests include artificial intelligence, network security, and machine learning.
Han Liu received the MS degree in computer science and technology from the Harbin University of Science and Technology, Harbin, China, in 2021. He is currently working toward the PhD degree in computer science and technology with the Harbin University of Science and Technology. His current research interests include artificial intelligence, network security, and machine learning.
Xunhua Huang received the MS degree in computer science and technology from the Harbin University of Science and Technology, Harbin, China, in 2021. He is currently working toward the PhD degree in computer science and technology with the Harbin University of Science and Technology. His current research interests include artificial intelligence, network security, and machine learning.