Abstract
I. Introduction
Cyber-Physical Systems (CPSs) have been successfully used in many fields, such as intelligent living, industrial control, and digital healthcare [1]. CPS security is a vital problem that needs to be addressed [2], and anomaly detection is an essential security means that identifies attacks through real-time detection [3]. Owing to dynamic and uncertain environments, CPSs produce massive high-dimensional, low-quality, and noisy data, which makes anomaly detection for CPSs a challenging task [4]. Many deep-learning-based anomaly detection methods (DL-ADMs) have been proposed and have achieved excellent performance in CPS scenarios [5]. Moreover, since sufficient data labels are difficult to obtain in most CPS scenarios, anomaly detection usually adopts unsupervised model training [6]. However, existing methods such as the AutoEncoder (AE) [7] and the Generative Adversarial Network (GAN) [8] ignore the implicit correlations among the complex CPS data and therefore achieve suboptimal performance [9].
For example, as shown in Fig. 1, the Intelligent Cruise Control System (ICCS), the Geographic Information System (GIS), and the Smart Healthcare System (SHS) are typical CPS scenarios. In the ICCS, the vehicle speed and obstacle position lie within specific ranges. In the GIS or SHS, many data streams have similar flow curves (Fig. 1(d)). These phenomena may imply some implicit correlations, while noise also hurts accurate correlation analysis. Thus, it is critical to design an adaptive method that reduces the impact of noise and effectively captures the correlations among the complex CPS data.
Fig. 1. Some representative CPS scenarios.
In this paper, we utilize a graph to represent and adaptively update the correlations among CPS data. To highlight our motivation, we take the SHS as an example and choose the Arrhythmia dataset [10] to explore an ElectroCardioGraph Monitoring System (ECGMS) for correlation analysis. The statistical information and experimental settings are shown in Table II of Section IV-A. We use three DL-ADMs for the comparison: the AE for the Original non-correlation Features (OF-AE), the AE for the Correlation Features (CF-AE) with a dynamic graph, and the Dual-AE (D-AE) combining OF-AE and CF-AE. CF-AE and D-AE use a dynamic graph and a Graph ATtention network (GAT). All three models use the GMM-based estimation network for anomaly detection (i.e., D-AE+GMM is our method). The experimental results are shown in Table I (parameter settings are given in Section IV-C; the complete results can be seen in Table V). We can see that CF-AE, using only the correlation features, obtains better performance than OF-AE, and D-AE achieves the best results. This suggests that capturing the correlation features is essential, and that the adequate extraction and fusion of the two types of features can further improve the detection performance.

Moreover, we have noticed that the correlation features of normal samples differ from those of abnormal samples [11]. Therefore, we can employ an unsupervised strategy to extract the correlation features. However, due to the complex CPS data background, the initially mined correlations may not reflect reality correctly and should be adaptively updated.
Based on the above analysis, we propose an end-to-end Adaptive-Correlation-aware Unsupervised Deep Learning model (ACUDL) for anomaly detection in CPS, as shown in Fig. 2. First, we employ a directed graph to represent the correlations and carry out adaptive correlation updates based on a dynamic graph. Second, we design a D-AE to encode the correlation and non-correlation features, calculate the reconstruction error, and extract the reconstruction features. Finally, we merge the non-correlation, correlation, and reconstruction features and build a GMM-based estimation network to estimate the anomaly energy for detection in CPS scenarios. Our main contributions are summarized as follows:
- We design the adaptive correlation update based on KNN and a dynamic graph to minimize the negative effect of noise.
- We build a D-AE to adequately extract the latent correlation, non-correlation, and reconstruction features, and fuse them into a GMM to accurately estimate the latent distribution. The detection results in different CPS scenarios are superior to those of other methods.

Fig. 2. Model motivation.
II. Related Work
A. Supervised, Semi-Supervised and Unsupervised Anomaly Detection
Supervised methods require a large amount of labeled training data [12], and semi-supervised methods are sensitive to noise [13]. Thus, neither is well suited to CPS environments.
Unsupervised methods use only unlabeled data for training [14], which is more consistent with CPS data scenarios. They fall mainly into four categories: reconstruction-based [15], support-domain-based [16], cluster-based [17], and hybrid methods. Reconstruction-based methods are not designed directly for anomaly detection tasks, so their effects are not satisfactory; support-domain-based and cluster-based methods are sensitive to parameters; hybrid methods, such as the Deep Structured Energy-Based Model (DSEBM) [15] and the Deep AE-based GMM (DAGMM) [14], take advantage of various strengths and achieve relatively better results. However, these methods mainly focus on the original features and ignore the implicit correlation features.
B. Representative DL-ADMs in CPS Applications
1) AE-based ADMs (AE-ADMs): AE-ADMs are the most popular type of DL-ADMs, and AE-based variant designs are currently widespread. Zhai et al. put forward the DSEBM, which reduces the training loss in information fusion and connects to a regularized AE to complete complex data sampling and model training [15]. It has been applied to the ICCS, SHS, etc. However, DSEBM requires clear boundaries between normal and abnormal data. Yang et al. propose an AE-based Deep Feature Correspondence (DFC) model with a generic feature extraction network and an elaborated feature estimation network, and design a self-feature enhancement strategy and a multi-context residual learning module to boost the anomaly detection performance [18].
However, these methods ignore distribution estimation. Therefore, in Reference [19], MOCCA learns the features extracted from each AE layer and estimates their distribution to enhance the anomaly detection performance. Moreover, since the GMM can accurately estimate a latent distribution [16], DAGMM, another energy-based model, significantly outperforms the other AE models [14]. DAGMM has been used in traditional IoT security and the SHS, and it can compensate for lost information by treating the reconstruction error (RE) as part of the latent features, but it requires a high-quality training set. Therefore, the ACUDL presented in this paper builds on DAGMM and exploits the advantages of the above models.
2) GAN-based ADMs (GAN-ADMs): As another important branch of DL-ADMs, GAN-ADMs have the advantage of estimating the probability distribution of the latent features. In Reference [20], AnoGAN, the first GAN-ADM, was used in the SHS, but it is not suitable for high-dimensional data scenarios. Zenati et al. proposed Adversarially Learned Anomaly Detection (ALAD) based on an AE and a bi-directional GAN [21], which has been applied to the CPSs of environmental monitoring and precise instrument monitoring [22]; however, the training process of ALAD is complex. Moreover, the training instability of GANs also weakens their practical effectiveness. Because GAN ensembles often outperform single GANs, Han et al. construct GAN ensembles for anomaly detection in the SHS and image analysis (namely EGBADen, Efficient GAN-Based Anomaly Detection), but this approach still suffers from training instability [23].
3) GNN and Dynamic Graph: Recently, GNNs and Graph Convolutional Networks (GCNs) have used graph structures to model pairwise relationships and have been applied to social networks, recommendation systems, anomaly detection [13], [24], etc. A recent, widely discussed method is OCGNN (One-Class GNN) [25], which combines Deep-SVDD and GCN to improve the feature mining ability. The Graph ATtention network (GAT) can efficiently focus on critical information and has been widely used [26].
However, many CPS data scenarios do not have explicit graphical relations, and the GCN cannot guarantee that the structure information is optimal [27]. Dynamic graph methods, mainly based on the GCN (DGCN), can automatically update the structure information and have already been well used for intelligent recommendation [28], social networks [29], the SHS [30], etc. In CPS, however, works on adaptive updating with dynamic graphs have rarely appeared, while the complex CPS data require an adaptive mechanism to capture the most accurate latent features.
Since the graph structure is well suited to extracting correlation features, the ACUDL in this paper uses a dynamic graph to design the adaptive correlation update and builds a D-AE with GAT to adequately extract the correlation and original non-correlation features.
III. Method
The end-to-end Adaptive-Correlation-aware Unsupervised Deep Learning model (ACUDL) adaptively builds correlations directly from data and extracts the features required for anomaly detection in CPS. The model framework is shown in Fig. 3.
- After CPS data preprocessing, the correlations among the data are adaptively constructed and updated with KNN and a dynamic graph.
- We design a D-AE combining an original feature encoder and a graph encoder, which adequately extracts the non-correlation and correlation features and obtains reconstruction features with a corresponding decoder.
- ACUDL uses a GMM-based estimation network to realize the anomaly detection tasks in CPS by accurately estimating the probability distribution from the original non-correlation, correlation, and reconstruction features.
- We use the Adam optimizer for model training.

Fig. 3. The model framework.
A. Definitions and Problem Statement
Definition 1.
The directed graph is denoted as $G = \{V, E, X\}$, where $V = \{v_i \mid i = 1, 2, \ldots, N_V\}$ is the node set, $E = \{e_i = \langle v_{i1}, v_{i2} \rangle \mid v_{i1}, v_{i2} \in V,\ i = 1, 2, \ldots, N_E\}$ is the directed-edge set, and $X \in \mathbb{R}^{N_V \times N_F}$ is the feature matrix, each row of which is the feature vector of a node.
Definition 2.
Anomaly Detection (AD): Given a sample set to be tested, $T = \{t_i \mid i = 1, 2, \ldots, N_T\}$, each $t_i$ can be represented as an $N_F$-dimensional feature vector, $t_i \in \mathbb{R}^{N_F}$. AD aims to learn a function, $f(\cdot)$, that judges whether $t_i$ is an abnormal sample based on the preset threshold $\epsilon$: $t_i$ is judged abnormal if its anomaly score exceeds $\epsilon$, and normal otherwise.
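To make the decision rule concrete, the following minimal Python sketch (all names are illustrative, and choosing the threshold as a percentile of validation-set anomaly energies is an assumption rather than the paper's procedure) shows how anomaly energies and a preset threshold $\epsilon$ yield the binary AD decision:

```python
import numpy as np

def detect(energies: np.ndarray, threshold: float) -> np.ndarray:
    """Flag a sample as abnormal (1) when its anomaly energy exceeds the preset threshold."""
    return (energies > threshold).astype(int)

# Illustrative usage: pick the threshold as a high percentile of energies
# computed on (mostly normal) validation data -- an assumed choice only.
rng = np.random.default_rng(0)
val_energies = rng.normal(loc=0.0, scale=1.0, size=1000)                      # stand-in for E(t_i) on validation data
test_energies = np.concatenate([rng.normal(0, 1, 95), rng.normal(6, 1, 5)])   # mostly normal + a few anomalies

eps = np.percentile(val_energies, 99)        # preset threshold epsilon
labels = detect(test_energies, eps)          # 1 = abnormal, 0 = normal
print(f"threshold={eps:.3f}, flagged {labels.sum()} of {labels.size} samples")
```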
B. Adaptive Correlation Update
To accurately extract the correlations, we build a directed graph structure that relates the samples and adaptively update it; an example is shown in Fig. 4.
1) Constructing the initial correlation directed-graph, $G_0$: For each sample $x_i \in X$, we choose its $K_0$ nearest neighbors, $NB_i = \{x_{ik} \mid k = 1, 2, \ldots, K_0\}$, employing the KNN algorithm $KNN(X, K_g)$, where $g = 0, 1, 2, \ldots$ is the iteration number. Then a directed edge points from each $x_{ik}$ to $x_i$. Finally, the initial directed graph $G_0$ is constructed.
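A minimal sketch of this construction step, assuming plain Euclidean k-NN and an adjacency-matrix representation of $G_0$ (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def build_knn_digraph(X: np.ndarray, k: int) -> np.ndarray:
    """Return an N x N adjacency matrix A where A[j, i] = 1 means a directed
    edge from neighbor x_j to sample x_i (the x_ik -> x_i direction)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)               # exclude self-neighbors
    A = np.zeros((n, n))
    for i in range(n):
        nb = np.argsort(d2[i])[:k]             # K_0 nearest neighbors of x_i
        A[nb, i] = 1.0                         # edges point from x_ik to x_i
    return A

# Illustrative usage on random data.
X = np.random.default_rng(1).normal(size=(50, 8))
G0 = build_knn_digraph(X, k=5)
print(G0.sum(axis=0)[:5])                      # each node has K_0 = 5 in-edges
```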
2) Updating the correlation based on the dynamic graph: Based on $G_0$, we design a simple and efficient dynamic graph to update the correlations adaptively.
The key is to dynamically adjust $K$ and obtain the latest $G$ from $K$ and the previous $G$ under the constraint of the training loss.
Fig. 4. Correlation constructing and adaptive update.
The basic process of adaptive correlation update mainly consists of three steps:
(i) Update $K_g$ first according to Formula (2), where $G_g(i)$ represents the $i$-th vector of $G_g$.
(ii) Then, perform KNN for each sample $x_i \in X$ to obtain the adaptive update graph based on $K_g$.
(iii) Finally, update the correlation directed-graph $G_g$ according to Formula (3), in which the binary variants of $G_g$ and the adaptive update graph are obtained by setting all of their non-zero elements to 1, and the normalization term is the number of non-zero elements in the union of the two variants.
3) Graph Error: The graph error, $E_g$, which is one item of the loss function of ACUDL, is used as the objective function of the adaptive correlation update.
The basic process of the adaptive correlation update is shown in Algorithm 1. Based on the adaptive adjustment of $\varepsilon_g$, $G$ can be adaptively updated according to Formula (3) to obtain the most realistic correlation features.
Algorithm 1: Adaptive Correlation Update
INPUT: Neighbor number: $K_g$, $g = 0, 1, 2, \ldots$
OUTPUT: Directed graph: $G_g$, $g = 0, 1, 2, \ldots$
FUNCTIONS: KNN algorithm: KNN()
Affinity calculation: AFF()
BEGIN:
Initialize G_0 based on K_0 and KNN()
FOR each g DO  // g is the iteration number
  Update K_g by Formula (2)
  FOR each x_i ∈ X DO
    NB_i = KNN(x_i, X, K_g)  // find the K_g nearest neighbors of x_i
    D = AFF(x_i, NB_i)  // calculate the affinities of x_i with each x_ik
    G(i, k) = d_k  // d_k ∈ D, k ∈ [1, K_g]; construct and update G
  END FOR
  Update G_g by Formula (3)
  Model training
END FOR
END.
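The following Python sketch mirrors the control flow of Algorithm 1. Because Formulas (2) and (3) are not reproduced here, the $K_g$ update and the graph-merge rule below are simple placeholders (a decaying $K$ and an edge union, respectively); scikit-learn's NearestNeighbors stands in for KNN() and a Gaussian kernel for AFF():

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def affinity(xi, neighbors, sigma=1.0):
    """AFF(): Gaussian affinities between x_i and its neighbors (an illustrative choice)."""
    d = np.linalg.norm(neighbors - xi, axis=1)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def adaptive_correlation_update(X, k0=5, n_iters=3):
    n = X.shape[0]
    G = np.zeros((n, n))
    k = k0
    for g in range(n_iters):                       # g is the iteration number
        if g > 0:
            k = max(2, k - 1)                      # placeholder for Formula (2)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)                  # idx[:, 0] is the sample itself
        G_new = np.zeros((n, n))
        for i in range(n):
            nb = idx[i, 1:]                        # K_g nearest neighbors of x_i
            G_new[nb, i] = affinity(X[i], X[nb])   # weighted edges x_ik -> x_i
        # Placeholder for Formula (3): merge previous and new graphs by edge union.
        G = np.where((G > 0) | (G_new > 0), np.maximum(G, G_new), 0.0)
        # ... model training with the current graph would happen here ...
    return G

G = adaptive_correlation_update(np.random.default_rng(2).normal(size=(60, 10)))
print(G.shape, (G > 0).sum())
```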
C. Dual-AutoEncoder (D-AE)
The D-AE contains an original feature encoder, a graph encoder for correlation features, and a decoder. Its critical modules are the Multi-Layer Perceptron (MLP) and the GAT.
Original feature encoder: It adopts an MLP composed of Fully Connected (FC) layers for nonlinear extraction of the original non-correlation features [14]:
$$h^{(l)} = \mathrm{ACT}\big(W^{(l)} h^{(l-1)} + b^{(l)}\big), \quad l = 1, 2, \ldots, L,$$
where $h^{(l-1)}$ is the input of the $l$-th layer, $W^{(l)}$ and $b^{(l)}$ are the corresponding weight matrix and bias, $h^{(0)} = X$ is the initial input, $h^{(L)}$ is the final output, and $N_C$ is the number of neural cells in the last layer. The ACTivation function ($\mathrm{ACT}(\cdot)$) in layers $1 \sim L-1$ is $\mathrm{Tanh}(\cdot)$ or $\mathrm{ReLU}(\cdot)$.
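A minimal numpy sketch of such an FC encoder forward pass (the layer widths and the Tanh choice are illustrative assumptions; the paper's actual settings are in Tables III and IV):

```python
import numpy as np

def act(x):
    """Tanh activation used in the hidden (1..L-1) layers."""
    return np.tanh(x)

def mlp_encoder(X, weights, biases):
    """Forward pass of an FC encoder: Tanh in hidden layers, linear last layer (N_C cells)."""
    h = X
    for l, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if l < len(weights) - 1:        # ACT() only in layers 1..L-1
            h = act(h)
    return h

rng = np.random.default_rng(9)
dims = [36, 64, 32, 8]                  # illustrative layer widths; last layer has N_C = 8 cells
weights = [rng.normal(scale=0.1, size=(dims[i], dims[i + 1])) for i in range(len(dims) - 1)]
biases = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]

X = rng.normal(size=(4, 36))            # e.g., four 36-feature samples
Z_o = mlp_encoder(X, weights, biases)   # original (non-correlation) latent features
print(Z_o.shape)                        # (4, 8)
```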
Graph encoder for correlation features: It uses a GAT to capture the correlation features among samples and computes a shared attention coefficient, $\alpha_{i,j}$, for each sample pair:
$$\alpha_{i,j} = \mathrm{LeakyReLU}\big(a^{T}\,[\,W x_i \,\|\, W x_j\,]\big),$$
where $x_j \in NB_i$. $\alpha_{i,j}$ is randomly initialized, adaptively updated during training, and used to indicate the importance of the nodes; $a$ and $W$ are the coefficient vector and weight matrix of the GAT, respectively [26]; "$\|$" is the concatenation operation.
Then, after applying $\mathrm{Softmax}(\cdot)$ over the neighbors, we extract the correlation feature of $x_i$ as the weighted sum $\sum_{x_j \in NB_i} \mathrm{Softmax}_j(\alpha_{i,j})\, W x_j$.
Finally, the correlation features of all samples together form the correlation feature matrix output by the graph encoder.
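A numpy sketch of the single-head attention and weighted-sum step described above, following the standard GAT formulation of [26] (the LeakyReLU slope and all dimensions are illustrative):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_correlation_feature(x_i, neighbors, W, a):
    """Single-head GAT step: attention of x_i over its neighbors NB_i, then weighted sum."""
    h_i = W @ x_i
    h_nb = neighbors @ W.T                                   # projected neighbor features
    # Shared attention over the concatenation [W x_i || W x_j].
    scores = np.array([leaky_relu(a @ np.concatenate([h_i, h_j])) for h_j in h_nb])
    alpha = softmax(scores)                                  # Softmax() over neighbors
    return (alpha[:, None] * h_nb).sum(axis=0)               # correlation feature of x_i

rng = np.random.default_rng(3)
d_in, d_out, k = 8, 4, 5
x_i = rng.normal(size=d_in)
NB_i = rng.normal(size=(k, d_in))                            # K_g nearest neighbors of x_i
W = rng.normal(size=(d_out, d_in))                           # GAT weight matrix
a = rng.normal(size=2 * d_out)                               # GAT coefficient vector
print(gat_correlation_feature(x_i, NB_i, W, a).shape)        # (4,)
```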
Feature fusion and Decoder: The fusion features, $Z_f$, are obtained by passing the combination of the two encoder outputs through an FC layer, where the combination operator is element-by-element matrix addition.
Then, the decoder reconstructs the initial data $X$ with $Z_f$ as input and obtains the reconstruction data $X'$, the reconstruction error $RE$, and the reconstruction features $Z_r$. The decoder MLP is similar to Formula (7), and $Z_r$ is computed from the distances between $X$ and $X'$, where $\mathrm{Cos}(\cdot)$ and $\mathrm{Euc}(\cdot)$ are the cosine distance and Euclidean distance functions, respectively.
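A sketch of the fusion and reconstruction-feature step, assuming element-wise addition of the two latent matrices, a single-FC-layer decoder, and DAGMM-style relative-Euclidean and cosine distance features; all shapes and names are placeholders:

```python
import numpy as np

def cosine_distance(x, x_rec):
    num = (x * x_rec).sum(axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(x_rec, axis=1) + 1e-12
    return 1.0 - num / den

def euclidean_distance(x, x_rec):
    return np.linalg.norm(x - x_rec, axis=1) / (np.linalg.norm(x, axis=1) + 1e-12)

rng = np.random.default_rng(4)
N, D, H = 16, 36, 8
Z_o = rng.normal(size=(N, H))                 # non-correlation features (original encoder)
Z_c = rng.normal(size=(N, H))                 # correlation features (graph encoder)
W_f, b_f = rng.normal(size=(H, H)), np.zeros(H)

Z_f = np.tanh((Z_o + Z_c) @ W_f + b_f)        # fusion: element-wise addition + FC layer
W_d, b_d = rng.normal(size=(H, D)), np.zeros(D)
X = rng.normal(size=(N, D))
X_rec = Z_f @ W_d + b_d                        # decoder (single FC layer stand-in)

RE = ((X - X_rec) ** 2).mean()                 # reconstruction error
Z_r = np.stack([euclidean_distance(X, X_rec),  # reconstruction features from Euc() and Cos()
                cosine_distance(X, X_rec)], axis=1)
print(Z_f.shape, Z_r.shape, round(float(RE), 3))
```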
D. GMM-Based Estimation Network
We construct a GMM-based estimation network with multiple FC layers; its input is $Z = [Z_r, Z_f]$.
Then, the mixture membership, $\hat{\gamma}$, is estimated by an MLP whose last-layer $\mathrm{ACT}(\cdot)$ is $\mathrm{Softmax}(\cdot)$, where $M$ is the number of mixture components. The mean vector, $\mu_m$, and covariance matrix, $\Sigma_m$, of the GMM are calculated by the following formulas, as in DAGMM [14]:
$$\mu_m = \frac{\sum_{i=1}^{N}\hat{\gamma}_{i,m}\, z_i}{\sum_{i=1}^{N}\hat{\gamma}_{i,m}}, \qquad \Sigma_m = \frac{\sum_{i=1}^{N}\hat{\gamma}_{i,m}\,(z_i-\mu_m)(z_i-\mu_m)^{T}}{\sum_{i=1}^{N}\hat{\gamma}_{i,m}}, \quad m = 1, 2, \ldots, M.$$
Finally, the anomaly energy, $E$, is calculated based on the estimated GMM probability density $P$, i.e., $E(z) = -\log P(z)$, so that samples with low likelihood under the learned distribution receive high energy.
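The sketch below illustrates how the mixture parameters and the per-sample energy could be computed from the latent features $Z$ in the DAGMM style assumed above; the membership MLP is replaced by random softmax weights purely for illustration:

```python
import numpy as np

def gmm_parameters(Z, gamma):
    """Estimate mixture weights phi, means mu, covariances Sigma from soft memberships."""
    N, M = gamma.shape
    phi = gamma.mean(axis=0)                                     # (M,)
    mu = (gamma.T @ Z) / gamma.sum(axis=0)[:, None]              # (M, D)
    Sigma = np.zeros((M, Z.shape[1], Z.shape[1]))
    for m in range(M):
        diff = Z - mu[m]
        Sigma[m] = (gamma[:, m, None, None] * diff[:, :, None] * diff[:, None, :]).sum(0)
        Sigma[m] /= gamma[:, m].sum()
        Sigma[m] += 1e-6 * np.eye(Z.shape[1])                    # numerical stability
    return phi, mu, Sigma

def anomaly_energy(Z, phi, mu, Sigma):
    """E(z) = -log sum_m phi_m N(z | mu_m, Sigma_m), DAGMM-style sample energy."""
    log_probs = []
    for m in range(len(phi)):
        diff = Z - mu[m]
        inv = np.linalg.inv(Sigma[m])
        maha = np.einsum("nd,de,ne->n", diff, inv, diff)
        log_det = np.linalg.slogdet(2 * np.pi * Sigma[m])[1]
        log_probs.append(np.log(phi[m] + 1e-12) - 0.5 * (maha + log_det))
    return -np.log(np.exp(np.stack(log_probs)).sum(axis=0) + 1e-12)

rng = np.random.default_rng(5)
Z = rng.normal(size=(200, 4))                                    # stand-in for Z = [Z_r, Z_f]
gamma = rng.dirichlet(np.ones(3), size=200)                      # stand-in for Softmax(MLP(Z))
phi, mu, Sigma = gmm_parameters(Z, gamma)
E = anomaly_energy(Z, phi, mu, Sigma)
print(E.shape, float(E.mean()))
```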
E. Objective Function
The objective function is composed of the reconstruction error, the graph error $E_g$, the anomaly energy, and a regularization item; the last item makes the distributions of normal and abnormal samples as distinct as possible. $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight parameters.
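Since the exact formula and weighting are not reproduced here, the following sketch only illustrates one plausible way to combine the loss items named in the text (reconstruction error, anomaly energy, graph error, and a covariance regularizer) with the weights $\lambda_1$, $\lambda_2$, $\lambda_3$; the assignment of weights to terms is an assumption:

```python
import numpy as np

def acudl_style_loss(re, energy, graph_error, cov_diag,
                     lam1=0.1, lam2=0.005, lam3=0.1):
    """Illustrative combination of the loss items named in the paper (weighting is assumed)."""
    reg = np.sum(1.0 / (cov_diag + 1e-12))     # penalize degenerate covariances (DAGMM-style)
    return re + lam1 * energy + lam2 * graph_error + lam3 * reg

loss = acudl_style_loss(re=0.42, energy=3.1, graph_error=0.8,
                        cov_diag=np.array([0.5, 0.7, 0.9]))
print(round(float(loss), 4))
```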
ACUDL makes full use of the dynamic graph and the D-AE to adaptively capture the original non-correlation, correlation, and reconstruction features while minimizing noise interference, and then performs probability distribution estimation with the GMM to learn a more accurate sample distribution and identify abnormal data by anomaly energy calculation.
IV. Experiments
A. Experimental Setup
The experiments are written in Python 3.6, implemented in TensorFlow 1.14, and conducted on an Ubuntu server with the following configuration:
CPU: Intel Xeon CPU E5-2640 (16-Core, 2.4GHz)
GPU: 8×GeForce RTX 2080Ti Graphics Card
Memory: 128G RAM
We chose three scenarios with different characteristics:
SHS: We continue to use the SHS scenario from the Introduction, built on the Arrhythmia [10] and ECG5000 [31] datasets. Both are high-dimensional, multi-type time-series datasets. The Arrhythmia dataset aims to distinguish cardiac arrhythmia and classify it into one of 16 groups. The data are pre-processed by extracting each heartbeat and making all heartbeats equal length using interpolation (a minimal resampling sketch is given after the scenario descriptions). The difference between the two is that Arrhythmia is small-scale, while ECG5000 is larger-scale.
GIS: A GIS collects ground information to detect whether the positioning information is normal. We use the Satellite dataset to build this scenario [32]; it is small-scale, low-dimensional, and multi-type. Each sample contains the 3×3 pixel values in the four spectral bands (converted to ASCII), giving 36 features.
ICCS: An ICCS collects image information to judge whether there is an obstacle in front of the vehicle and to trigger emergency braking. The data collected by the sensors usually contain noise, and the collected abnormal data generally have a certain similarity. We build the ICCS scenario with the CIFAR-10 dataset [33], a large-scale, high-dimensional, and multi-type image dataset. The classes are entirely mutually exclusive; there is no overlap between automobiles and trucks.
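As noted for the SHS datasets above, each extracted heartbeat is resampled to a fixed length by interpolation. A minimal sketch of such resampling with np.interp follows (the target length of 140 matches ECG5000's published segment length but is otherwise an assumption):

```python
import numpy as np

def resample_heartbeat(beat: np.ndarray, target_len: int = 140) -> np.ndarray:
    """Linearly interpolate a variable-length heartbeat onto a fixed-length grid."""
    old_grid = np.linspace(0.0, 1.0, num=len(beat))
    new_grid = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_grid, old_grid, beat)

beats = [np.sin(np.linspace(0, np.pi, n)) for n in (97, 120, 153)]   # toy variable-length beats
X = np.stack([resample_heartbeat(b) for b in beats])
print(X.shape)                                                        # (3, 140)
```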
We adopt the experimental strategy of "training set - validation set - test set". The statistical information and experimental settings are shown in Table II.

B. Comparative Models and Performance Metrics
We select several currently known DL-ADMs used in CPS for comparison with ACUDL: AnoGAN [20], ALAD [21], [22], and EGBADen [23] are GAN-based; DSEBM [15], DAGMM [14], DFC [18], and MOCCA [19] are AE-based; OCGNN [25] is GNN-based. These methods are discussed in Section II-B. In addition, we add another baseline, GOAD [34], a transformation-based method, although its transformations can introduce redundant information that deviates from the data distribution.
We use four metrics to evaluate these methods: AUC (Area Under the ROC Curve), AP (Average Precision), F1-score (F1), and the Detection Time (DT) per sample.
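A sketch of how these metrics could be computed with scikit-learn (the score threshold used for F1 and the timing approach are illustrative choices):

```python
import time
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

rng = np.random.default_rng(6)
y_true = rng.integers(0, 2, size=500)                        # 1 = abnormal, 0 = normal
scores = y_true * rng.normal(2, 1, 500) + (1 - y_true) * rng.normal(0, 1, 500)

auc = roc_auc_score(y_true, scores)                          # AUC
ap = average_precision_score(y_true, scores)                 # AP
y_pred = (scores > np.percentile(scores, 80)).astype(int)    # illustrative threshold for F1
f1 = f1_score(y_true, y_pred)                                # F1

t0 = time.perf_counter()
_ = (scores > 0).astype(int)                                 # stand-in for per-sample inference
dt = (time.perf_counter() - t0) / len(scores)                # DT: detection time per sample
print(f"AUC={auc:.3f} AP={ap:.3f} F1={f1:.3f} DT={dt*1e6:.2f} us/sample")
```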
C. Parameter Settings
The parameter settings of ACUDL are shown in Tables III and IV, where the initial value of $K$ ($K_0$) and the value of $M$ in the GMM are tested and set in the parameter sensitivity experiments. Other parameters are set through extensive experiments. All results are reported as the mean and standard deviation over five random seeds. The learning rate of Adam is set to 0.001.


D. Parameter Sensitivity Experiments of ACUDL
Parameter ($K_0$) sensitivity experiment: $K_0$ is critical for KNN and determines whether the adaptive correlation analysis is accurate. The experimental results with different values of $K_0$ under different scenarios are shown in Fig. 5. In most scenarios, the overall performance of ACUDL does not depend on the value of $K_0$. In the GIS scenario, there are certain fluctuations in the AP and F1 curves when $K_0 \in [11, 15]$, mainly because ECG5000 is a small time-series dataset and ACUDL may be affected to some extent. These results mean that ACUDL is relatively insensitive to $K_0$ as a whole.
Fig. 5. Parameter (K 0) sensitivity experimental results.
Parameter ($M$) sensitivity experiment: In the GMM, the value of $M$ is the key parameter for estimating the probability distribution, so we perform this experiment to set an appropriate value of $M$. The experimental results are shown in Fig. 6. In most application scenarios, the overall performance of ACUDL does not depend on the value of $M$. In the SHS (Arrhythmia) scenario, there are certain fluctuations in the F1 curves when $M \in [2, 6]$, because Arrhythmia is a smaller time-series dataset than ECG5000. These results mean that ACUDL is relatively insensitive to $M$ as a whole.
Fig. 6. Parameter (M ) sensitivity experimental results.
Based on these results, the values of $K_0$ and $M$ are set as shown in Table IV in the subsequent experiments.
E. Comparative Experiment of Anomaly Detection
The parameter settings of the comparative models mentioned in Section IV-B mainly follow the experimental settings in their respective references and are fine-tuned to the optimal results obtained by extensive tests in each scenario of this experiment. Notably, we conduct a one-vs-all comparative experiment on the CIFAR-10 set. The experimental results are shown in Table V and Fig. 7. We can see that the GAN-based models generally obtain the worst results, while ACUDL achieves the best results in each application scenario as a whole.
1) Overall, the GAN-based models (ALAD, AnoGAN, and EGBADen) are inferior to the other methods due to training instability and high data-quality requirements. For example, the AP and F1 results of ALAD and EGBADen in the SHS (Arrhythmia) scenario and the AUC results of the three models in the ICCS scenario are not ideal.
2) By avoiding the shortcomings of the AE- and GAN-based models, DAGMM and OCGNN work better than DSEBM. They are almost equally effective and have their respective advantages in different scenarios: the results of DAGMM are better in the SHS (ECG5000) scenario but worse in the SHS (Arrhythmia) scenario, while the results of OCGNN are better in the GIS scenario but worse in the ICCS scenario. The relatively better results of DAGMM are primarily due to the use of the GMM, while those of OCGNN are primarily due to the introduction of the GCN.
3) The latest methods, MOCCA and DFC, achieve desirable results in different scenarios owing to their respective strengths: DFC gets the second-best results in the SHS (Arrhythmia) scenario, and MOCCA gets the best F1 results in the GIS and ICCS scenarios. However, they are still weaker than ACUDL, and overall they take more detection time than the other methods except ALAD.
4) GOAD is a transformation-based method that has received particular attention in the last two years. However, it is sensitive to data scenarios, and its AP and F1 results in the SHS (ECG5000) and ICCS scenarios are very unsatisfactory, even though its AUC result in the ICCS scenario is better than those of all other methods except ACUDL.
5) The average one-vs-all results in the ICCS scenario show that ACUDL outperforms the other models. The detailed one-vs-all results show that, overall, ACUDL and GOAD obtain the best and comparable AUC results, while ACUDL obtains the best AP and F1 results with ideal detection times. Therefore, the comprehensive analysis of the average and detailed results proves that ACUDL obtains the best results in this scenario as a whole.
6) As for the detection time, the efficiency of ACUDL is at a middle level. Combined with the results of the other performance metrics, ACUDL obtains the optimal detection performance.


Fig. 7. One-vs-all experimental results on ICCS (CIFAR-10 dataset).
All the results verify that ACUDL combines the advantages of the other methods, effectively avoids their disadvantages, and focuses on correlation feature extraction and feature fusion during adaptive training; hence it achieves the best results as a whole.
F. Ablation Experiment of ACUDL
In this experiment, the preliminary experiment from the Introduction is extended to the different application scenarios. The experimental results are shown in Table VI.
1) In every application scenario, the results of both CF-AE and OF-AE are reasonable. In particular, the results of CF-AE are significantly better than those of OF-AE in the SHS (Arrhythmia) scenario, and the results of CF-AE and OF-AE are comparable in the other three scenarios, although OF-AE only achieves a moderate AUC result in the ICCS scenario.
2) D-AE (ACUDL) achieves the best results in all scenarios; in particular, it significantly outperforms CF-AE and OF-AE in the SHS (Arrhythmia) and ICCS scenarios.

These results show that the original-feature extraction module and the correlation-feature extraction module working together achieve better results. ACUDL is designed as an adaptive end-to-end model that makes each module fully functional in different CPS scenarios. Moreover, the fusion of the non-correlation, correlation, and reconstruction features extracts more effective and comprehensive features and further improves the detection performance.
G. Noise Experiment of ACUDL
This experiment tests the detection results of ACUDL in the different application scenarios after adding noise, to demonstrate the scheme's robustness. The experimental results are shown in Table VII. In every application scenario, the performance drops when noise samples are added, but it remains relatively stable across different noise levels. Even though the AUC and AP results have the largest drops in the SHS (Arrhythmia) scenario, they are still within acceptable limits. Moreover, as shown in Table V, they are comparable to the results obtained by the baselines without added noise.

These results indicate that, after noise injection, ACUDL can still adequately capture the correlation features through adaptive training with the dynamic graph, minimizing the noise impact to the greatest extent.
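A sketch of the kind of noise injection assumed in this experiment, using additive Gaussian noise at several relative levels (the exact noise model and levels behind Table VII are not specified here):

```python
import numpy as np

def add_gaussian_noise(X: np.ndarray, level: float, seed: int = 0) -> np.ndarray:
    """Perturb features with zero-mean Gaussian noise scaled to `level` times each feature's std."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(scale=level * X.std(axis=0, keepdims=True), size=X.shape)

X = np.random.default_rng(7).normal(size=(100, 36))
for level in (0.05, 0.10, 0.20):                 # illustrative noise levels
    X_noisy = add_gaussian_noise(X, level)
    print(level, round(float(np.abs(X_noisy - X).mean()), 4))
```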
H. Visualization Experiment
To further verify the quality of the latent features obtained by each model, this experiment makes a visualization comparison in the SHS (ECG5000) scenario. We select the models with better performance in the comparative experiment of anomaly detection. The embeddings extracted from each model are used as the input of the t-SNE tool [35] to generate 2-D visualization results, as shown in Fig. 8. The distribution of DVAE's embeddings is messy; AnoGAN and OCGNN have some overlapping regions; EGBADen has few overlapping regions, but its abnormal samples are divided into three sub-regions. DAGMM and ACUDL have similar effects, but ACUDL's normal/abnormal boundary is more apparent. Moreover, the visualization results are basically consistent with the detection results in Table V. These results mean that ACUDL can accurately capture the correlation features through adaptive training and separate the normal and abnormal latent distributions more effectively, providing a good prerequisite for anomaly detection.
Fig. 8. Visualization results of comparative models in the SHS (ECG5000) scenario (The blue dots represent normal samples, and the orange dots represent abnormal samples).
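A minimal sketch of generating such 2-D embeddings with scikit-learn's t-SNE [35]; the model embeddings are replaced by random latent features purely for illustration:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
Z_normal = rng.normal(0, 1, size=(300, 16))          # stand-in for normal-sample embeddings
Z_abnormal = rng.normal(3, 1, size=(60, 16))         # stand-in for abnormal-sample embeddings
Z = np.vstack([Z_normal, Z_abnormal])
labels = np.array([0] * 300 + [1] * 60)

emb2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)
plt.scatter(emb2d[labels == 0, 0], emb2d[labels == 0, 1], s=8, label="normal")
plt.scatter(emb2d[labels == 1, 0], emb2d[labels == 1, 1], s=8, label="abnormal")
plt.legend()
plt.savefig("tsne_embeddings.png", dpi=150)
```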
I. Simulation Experiment of ACUDL
This experiment simulates the SHS to verify that the adaptive correlation modeling of ACUDL is effective in practice. In each sub-experiment, we randomly generate 1654 normal samples and 176 abnormal samples drawn from different normal distributions, and we randomly select 1480 normal samples to form the training set, 827 normal samples and 88 abnormal samples to form the validation set, and the remaining 828 normal samples and 88 abnormal samples to form the test set. The experimental setup and procedure are the same as above. The results are shown in Table VIII.

We can see that the detection performance maintains an ideal and stable level in each sub-experiment. It indicates that ACUDL can be applied to different data characteristics due to the dynamic-graph-based adaptive training to obtain more accurate and comprehensive features.
J. Comparative Application Experiment
We select two typical CPS application scenarios to perform the application experiment:
Smart Grid (SG): We use the Decentral Smart Grid Control (DSGC) dataset [36]. It is an augmented version of the "Electrical Grid Stability Simulated Dataset", which performs local stability analysis of a 4-node star system. Its attributes include the reaction time of each participant, the power produced or consumed, the price elasticity, and the stability label.
Smart Home (SH): We use the REFIT dataset [37], which includes data from 20 households in Loughborough, U.K., collected over 2013-2014. We use the freezer data of House 1. It is a time-series dataset with two classes, namely the power demands of the freezers in the kitchen and in the garage.
Neither dataset has explicit correlations; their statistical information is shown in Table X. The main parameter settings of ACUDL for the two datasets are shown in Tables IX and XI. The experimental setup and procedure are the same as above. The results are shown in Table XII.




We can see that ACUDL achieves better results than the comparative methods in both scenarios. Combined with the results of the comparative experiment of anomaly detection, this shows that ACUDL can effectively use the various features to perform anomaly detection well in application scenarios with or without correlation characteristics.
V. Conclusion
DL-ADMs are among the mainstream methods in various CPS application scenarios. Besides many explicit original features, there are implicit correlation features among CPS data. Moreover, the complex CPS data background prevents traditional static GNNs from obtaining accurate features. Worse yet, few existing works involve data correlation analysis and adaptive training, resulting in unsatisfactory performance. Therefore, ACUDL adopts an end-to-end mode, designs the D-AE, adaptively trains with a dynamic graph to extract the correlation features accurately, and combines the non-correlation and reconstruction features to estimate the probability distribution and complete the anomaly detection tasks in CPSs. Experimental results show that ACUDL outperforms the currently known models in different CPS data scenarios as a whole. They also highlight that ACUDL, as an adaptive end-to-end model, can adapt well to complex CPS data scenarios with different characteristics. In future work, we will continue to capture various correlation features more accurately and focus on building a more robust feedback model for more CPS scenarios.
References
- [1]I. F. Akyildiz and A. Kak, “The Internet of space Things/CubeSats: A ubiquitous cyber-physical system for the connected world,” Comput. Netw., vol. 150, pp. 134–149, 2019.
- [2]M. Farajzadeh-Zanjani, E. Hallaji, R. Razavi-Far, and M. Saif, “Generative-adversarial class-imbalance learning for classifying cyber-attacks and faults - a cyber-physical power system,” IEEE Trans. Dependable Secure Comput., vol. 19, no. 6, pp. 4068–4081, Nov./Dec. 2021.
- [3]W. Yan, L. K. Mestha, and M. Abbaszadeh, “Attack detection for securing cyber physical systems,” IEEE Internet Things J., vol. 6, no. 5, pp. 8471–8481, Oct. 2019.
- [4]Y. Yuan et al., “Data driven discovery of cyber physical systems,” Nature Commun., vol. 10, 2019, Art. no. 4894.
- [5]J. Zhang, L. Pan, Q. L. Han, C. Chen, S. Wen, and Y. Xiang, “Deep learning based attack detection for cyber-physical system CyberSecurity: A survey,” IEEE-CAA J. Automatica Sinica, vol. 9, no. 3, pp. 377–391, Mar. 2022.
- [6]G. J. Qi and J. B. Luo, “Small data challenges in Big Data era: A survey of recent progress on unsupervised and semi-supervised methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 4, pp. 2168–2187, Apr. 2022.
- [7]Y. Bao et al., “Computer vision and deep learning–based data anomaly detection method for structural health monitoring,” Struct. Health Monit., vol. 18, no. 2, pp. 401–421, 2019.
- [8]T. Schlegl et al., “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in Proc. Int. Conf. Inf. Process. Med. Imag., Boone, NC, USA, 2017, pp. 146–157.
- [9]Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 1, pp. 4–24, Jan. 2021.
- [10]UC Irvine Machine Learning Repository, “Arrhythmia dataset,” 1998. [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Arrhythmia
- [11]H. Y. Fan et al., “Correlation-aware deep generative model for unsupervised anomaly detection,” in Proc. Pacific-Asia Conf. Knowl. Discov. Data, Singapore, 2020, pp. 688–700.
- [12]L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: A survey,” Data Mining Knowl. Discov., vol. 29, no. 3, pp. 628–688, 2015.
- [13]G. J. Qi and J. B. Luo, “Small data challenges in Big Data era: A survey of recent progress on unsupervised and semi-supervised methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 2168–2187, Apr. 2022.
- [14]B. Zong et al., “Deep autoencoding Gaussian mixture model for unsupervised anomaly detection,” in Proc. 6th Int. Conf. Learn. Representations, Vancouver, BC, Canada, 2018, pp. 1–19.
- [15]S. F. Zhai et al., “Deep structured energy based models for anomaly detection,” in Proc. 33rd Int. Conf. Mach. Learn., Daytona Beach, FL, USA, 2016, pp. 1100–1109.
- [16]K. Liu et al., “Generalized zero-shot learning for action recognition with web-scale video data,” World Wide Web-Internet Web Inf. Syst., vol. 22, no. 2, pp. 807–824, 2019.
- [17]N. Ding et al., “Real-time anomaly detection based on long short-term memory and Gaussian mixture model,” Comput. Elect. Eng., vol. 79, 2019, Art. no. UNSP 106458.
- [18]J. Yang, Y. Shi, and Z. Qi, “Learning deep feature correspondence for unsupervised anomaly detection and segmentation,” Pattern Recognit., vol. 132, 2022, Art. no. 108874.
- [19]F. V. Massoli, F. Falchi, A. Kantarci, Ş. Akti, H. K. Ekenel, and G. Amato, “MOCCA: Multilayer one-class classification for anomaly detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 6, pp. 2313–2323, Jun. 2022.
- [20]T. Schlegl et al., “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in Proc. Int. Conf. Inf. Process. Med. Imag., Boone, NC, USA, 2017, pp. 146–157.
- [21]H. Zenati, M. Romain, C. -S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in Proc. IEEE Int. Conf. Data Mining, Singapore, 2018, pp. 727–736.
- [22]O. Knapp et al., “Adversarially learned anomaly detection on CMS open data: Re-discovering the top quark,” Eur. Phys. J. Plus, vol. 136, 2021, Art. no. 236.
- [23]X. Han, X. H. Chen, and L. P. Liu, “GAN ensemble for anomaly detection,” in Proc. 35 AAAI Conf. Artif. Intell., 2021, pp. 4090–4097.
- [24]T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. 5th Int. Conf. Learn. Representations, Toulon, France, 2017, pp. 1–14.
- [25]X. H. Wang et al., “One-class graph neural networks for anomaly detection in attributed networks,” Neural Comput. Appl., vol. 33, no. 18, pp. 12073–12085, 2021.
- [26]P. Veličković et al., “Graph attention networks,” in Proc. 6th Int. Conf. Learn. Representations, Vancouver, BC, Canada, 2018, pp. 1–12.
- [27]W. F. Liu et al., “Human activity recognition by manifold regularization based dynamic graph convolutional networks,” Neurocomputing, vol. 444, pp. 217–225, 2021.
- [28]Z. Q. Pan, W. Y. Chen, and H. H. Chen, “Dynamic graph learning for session-based recommendation,” Mathematics, vol. 9, no. 12, 2021, Art. no. 1420.
- [29]S. H. Cheong, Y. W. Si, and R. K. Wong, “Online force-directed algorithms for visualization of dynamic graphs,” Inf. Sci., vol. 556, pp. 223–255, 2021.
- [30]S. C. Fu et al., “Dynamic graph learning convolutional networks for semi-supervised classification,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 17, no. 1, 2021, Art. no. 4.
- [31]UCR Time Series Classification Repository, “ECG5000 dataset,” 2000. [Online]. Available: http://www.timeseriesclassification.com/description.php?Dataset=ECG5000
- [32]UC Irvine Machine Learning Repository, “Satellite dataset,” 1993. [Online]. Available: https://archive-beta.ics.uci.edu/ml/datasets/statlog+landsat+satellite
- [33]A. Krizhevsky, “CIFAR-10 dataset,” 2009. [Online]. Available: http://www.cs.toronto.edu/∼kriz/cifar.html
- [34]L. Bergman and Y. Hoshen, “Classification-based anomaly detection for general data,” in Proc. 8th Int. Conf. Learn. Representations, 2020, pp. 1–12.
- [35]L. Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
- [36]UC Irvine Machine Learning Repository, “Electrical grid stability simulated dataset,” 2018. [Online]. Available: https://www.kaggle.com/datasets/pcbreviglieri/smart-grid-stability
- [37]Smart Homes and Energy Demand Reduction, “REFIT data set,” 2017. [Online]. Available: https://www.refitsmarthomes.org/datasets/





