Abstract
Graph neural networks (GNNs) have drawn increasing attention in recent years by addressing many machine learning challenges on graphs, ranging from node classification to graph classification. However, GNN models are known to be vulnerable to adversarial attacks, which typically modify edges or node features using gradient-based methods or deep reinforcement learning. Previous research focuses mainly on attacking node classification; only a few works attack graph classification. Unlike those few previous works, which attack graph classification globally, we consider a practical setting in which the perturbations available to the attacker are highly constrained and localized: the attacker controls only a subgraph with a few nodes in a much larger target graph. Our attack algorithm uses a Monte Carlo Tree Search to perturb only the features of a few selected neighboring nodes. Our extensive experimental evaluation demonstrates the effectiveness of the proposed method: the performance of the GNN model degrades significantly after perturbing only a few node features, and our attack consistently outperforms the baseline attacks. These results indicate that GNN-based graph classification methods can be attacked by changing just a few node features, without modifying the graph structure. We believe that the fragility of these GNN techniques for graph classification raises further questions about their suitability in adversarial domains such as cybersecurity.