Abstract
Recent hardware trojan detection (HTD) techniques employ supervised machine learning to detect hardware trojans in integrated circuits (ICs) manufactured in untrusted foundries. However, supervised machine learning models are vulnerable to adversarial attacks, through which an adversary could use the resulting adversarial samples to modify a hardware trojan and circumvent these detection techniques. In this work, adversarial attacks are mounted against published supervised machine learning models that are trained on labeled side-channel analysis data for hardware trojan detection in ICs. These models are shown to be susceptible to feature-space adversarial attacks, with more than 50% of adversarial samples evading detection. Additionally, to mitigate the attack, adversarial learning techniques are applied to the attacked models to reinforce their resilience, yielding more than a 50% increase in resilience against adversarial attacks while keeping the accuracy loss of the retrained model below 10%.
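The attack-and-retrain loop summarized above can be sketched in miniature. The sketch below is illustrative only, not the paper's actual pipeline: it assumes synthetic Gaussian features standing in for side-channel measurements, a plain-numpy logistic regression as the detector, and an FGSM-style linear perturbation as the feature-space attack; all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for side-channel features (e.g. summary statistics of
# power traces); class 0 = trojan-free IC, class 1 = trojan-infected IC.
d = 4
X = np.vstack([rng.normal(0.0, 1.0, (300, d)),   # trojan-free
               rng.normal(2.0, 1.0, (300, d))])  # trojan-infected
y = np.array([0] * 300 + [1] * 300)

idx = rng.permutation(600)
tr, te = idx[:400], idx[400:]          # simple train/test split

def train_logreg(X, y, lr=0.1, iters=2000):
    """Plain-numpy logistic regression trained by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                       # gradient of the logistic loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def predict(w, b, X):
    return (X @ w + b > 0).astype(int)

w, b = train_logreg(X[tr], y[tr])

# Feature-space attack (FGSM-style for a linear model): shift each
# trojan sample against the sign of the weights, toward the benign side.
eps = 1.2
adv_te = X[te][y[te] == 1] - eps * np.sign(w)
evasion_before = np.mean(predict(w, b, adv_te) == 0)

# Adversarial retraining: augment the training set with attacked trojan
# samples (still labeled as trojan) and retrain the detector.
adv_tr = X[tr][y[tr] == 1] - eps * np.sign(w)
X_aug = np.vstack([X[tr], adv_tr])
y_aug = np.concatenate([y[tr], np.ones(len(adv_tr), dtype=int)])
w2, b2 = train_logreg(X_aug, y_aug)

evasion_after = np.mean(predict(w2, b2, adv_te) == 0)
acc_clean = np.mean(predict(w2, b2, X[te]) == y[te])
print(evasion_before, evasion_after, acc_clean)
```

On this toy setup the perturbed trojan samples evade the original detector at a high rate, while the retrained detector catches most of the same samples at a modest cost in clean accuracy, mirroring the trade-off the abstract reports.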