Abstract
The application of machine learning (ML) libraries has tremendously increased in many domains, including autonomous driving systems, medical, and critical industries. Vulnerabilities of such libraries could result in irreparable consequences. However, the characteristics of software security vulnerabilities have not been well studied. In this paper, to bridge this gap, we take the first step toward characterizing and understanding the security vulnerabilities of seven well-known ML libraries, including TensorFlow, PyTorch, Scikit-learn, Mlpack, Pandas, Numpy, and Scipy. To do so, we collected 683 security vulnerabilities to explore four major factors: 1) vulnerability types, 2) root causes, 3) symptoms, and 4) fixing patterns of security vulnerabilities in the studied ML libraries. The findings of this study can help developers and researchers understand the characteristics of security vulnerabilities across the studied ML libraries.