EFDA-JET-CP(14)05/12
Feature Selection for Disruption Prediction from Scratch in JET by using Genetic Algorithms and Probabilistic Predictors
Recently, a probabilistic classifier has been developed at JET to be used as a predictor from scratch. It has been applied to a database of 1237 JET ITER-like wall discharges (201 of them disruptive) with good results: a success rate of 94% and a false alarm rate of 4.21%. To select the features that give good enough results in terms of success rate and false alarm rate, a combinatorial analysis of 14 candidate features was performed: all possible combinations of 2 to 7 features were tested, which amounts to 9893 different predictors. An important drawback of this analysis was the computation time, estimated at 1731 hours (~2.4 months). Genetic algorithms (GA) are search algorithms that simulate the process of natural selection. In this article, GA and Venn predictors are combined with the objective not only of finding good enough features among the 14 available ones but also of reducing the computational time. Five different performance metrics have been evaluated as the GA fitness function. The F1-score needed 15 generations to reach the highest fitness value, equivalent to assessing 420 predictors in 73.5 hours. Accuracy required 12 generations (336 predictors, 58.8 hours). Matthews correlation coefficient (MCC) found the most relevant features after 8 generations (224 predictors, 39.2 hours). Markedness required 7 generations (196 predictors, 34.3 hours), and the best metric was Informedness (the difference between success rate and false alarm rate), with just 6 generations (168 predictors, 29.4 hours). In all cases, the results show a success rate of 94% and a false alarm rate of 4.21%.
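The search-cost figures quoted in the abstract can be checked with a few lines of Python. This is a sketch, not code from the study: the population size of 28 predictors per generation and the cost of 0.175 hours per predictor are not stated explicitly but are inferred from the ratios given (420 predictors / 15 generations; 73.5 h / 420 predictors), and the names `exhaustive_count` and `ga_cost` are hypothetical.

```python
from math import comb

N_FEATURES = 14
K_MIN, K_MAX = 2, 7
POP_SIZE = 28                # assumption: inferred from 420 predictors / 15 generations
HOURS_PER_PREDICTOR = 0.175  # assumption: inferred from 73.5 h / 420 predictors (~10.5 min)


def exhaustive_count(n: int = N_FEATURES, k_min: int = K_MIN, k_max: int = K_MAX) -> int:
    """Number of feature subsets whose size lies in [k_min, k_max]."""
    return sum(comb(n, k) for k in range(k_min, k_max + 1))


def ga_cost(generations: int) -> tuple[int, float]:
    """(predictors evaluated, hours) for a GA run of the given number of generations."""
    n_pred = generations * POP_SIZE
    return n_pred, n_pred * HOURS_PER_PREDICTOR


if __name__ == "__main__":
    total = exhaustive_count()
    print(f"exhaustive: {total} predictors, {total * HOURS_PER_PREDICTOR:.0f} h")
    for metric, gens in [("F1", 15), ("Accuracy", 12), ("MCC", 8),
                         ("Markedness", 7), ("Informedness", 6)]:
        n_pred, hours = ga_cost(gens)
        print(f"{metric:<12} {gens:2d} generations {n_pred:3d} predictors {hours:5.1f} h")
```

The sum of binomial coefficients C(14, k) for k = 2..7 reproduces the 9893 predictors of the exhaustive search, and the per-generation cost reproduces each of the GA timings, which is what makes the inferred constants plausible.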
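The five fitness measures can all be computed from a binary confusion matrix of disruptive versus non-disruptive discharges. The sketch below assumes that the success rate corresponds to the true positive rate and the false alarm rate to the false positive rate; in disruption prediction the success rate is often defined with additional timing criteria, so this mapping is an assumption. The `Confusion` class and `metrics` function are hypothetical names, and the formulas are the standard textbook definitions rather than code taken from the study.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Confusion:
    tp: int  # disruptive discharges that triggered an alarm
    fn: int  # disruptive discharges that were missed
    fp: int  # non-disruptive discharges that triggered a (false) alarm
    tn: int  # non-disruptive discharges correctly left without alarm


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict[str, float]:
    tpr = _ratio(c.tp, c.tp + c.fn)  # success rate (assumed equal to recall)
    fpr = _ratio(c.fp, c.fp + c.tn)  # false alarm rate
    ppv = _ratio(c.tp, c.tp + c.fp)  # precision
    npv = _ratio(c.tn, c.tn + c.fn)  # negative predictive value
    mcc_den = sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fn + c.fp + c.tn),
        "mcc": _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
        "markedness": ppv + npv - 1.0,
        "informedness": tpr - fpr,  # equivalently TPR + TNR - 1
    }
```

Under the same TPR/FPR mapping, the quoted 94% success rate and 4.21% false alarm rate correspond to an Informedness of 0.94 - 0.0421 ≈ 0.898. Note also that MCC squared equals Informedness times Markedness, which links three of the five measures.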
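A genetic search over feature subsets can be sketched as follows. Only the overall setup comes from the text (14 candidate features, subsets of 2 to 7 features, one trained predictor per fitness evaluation); the population size of 28 is inferred from 420 predictors over 15 generations, and the operators (tournament selection, uniform crossover, bit-flip mutation, a size-repair step and single-individual elitism) are illustrative choices, not necessarily those used in the study. `toy_fitness` and its `TARGET` set are hypothetical stand-ins for training a Venn predictor and scoring it with one of the five metrics.

```python
import random
from typing import Callable, FrozenSet, List, Optional, Tuple

N_FEATURES = 14
K_MIN, K_MAX = 2, 7
POP_SIZE = 28  # assumption: inferred from 420 predictors / 15 generations

Subset = FrozenSet[int]


def repair(genes: set, rng: random.Random) -> Subset:
    """Add or drop random features until the subset size lies in [K_MIN, K_MAX]."""
    while len(genes) < K_MIN:
        genes.add(rng.randrange(N_FEATURES))
    while len(genes) > K_MAX:
        genes.discard(rng.choice(sorted(genes)))
    return frozenset(genes)


def random_subset(rng: random.Random) -> Subset:
    return frozenset(rng.sample(range(N_FEATURES), rng.randint(K_MIN, K_MAX)))


def crossover(a: Subset, b: Subset, rng: random.Random) -> Subset:
    # Uniform crossover: each feature bit is copied from a randomly chosen parent.
    child = {f for f in range(N_FEATURES) if f in (a if rng.random() < 0.5 else b)}
    return repair(child, rng)


def mutate(s: Subset, rng: random.Random, p: float = 1 / N_FEATURES) -> Subset:
    # Bit-flip mutation: each feature is toggled with probability p.
    genes = set(s)
    for f in range(N_FEATURES):
        if rng.random() < p:
            genes ^= {f}
    return repair(genes, rng)


def tournament(pop: List[Subset], scores: List[float], rng: random.Random, k: int = 3) -> Subset:
    # Pick k individuals at random and return the fittest of them.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def run_ga(fitness: Callable[[Subset], float], generations: int,
           seed: int = 0) -> Tuple[Optional[Subset], float, List[float]]:
    """Return (best subset, best fitness, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [random_subset(rng) for _ in range(POP_SIZE)]
    best, best_score, history = None, float("-inf"), []
    for _ in range(generations):
        scores = [fitness(s) for s in pop]  # one trained predictor per call in the real study
        i = max(range(POP_SIZE), key=lambda j: scores[j])
        history.append(scores[i])
        if scores[i] > best_score:
            best, best_score = pop[i], scores[i]
        # Elitism: the current best survives unchanged into the next generation.
        pop = [pop[i]] + [
            mutate(crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng), rng)
            for _ in range(POP_SIZE - 1)
        ]
    return best, best_score, history


# Hypothetical stand-in for "train a Venn predictor on subset s and score it".
TARGET = frozenset({0, 3, 5, 9})


def toy_fitness(s: Subset) -> float:
    return len(s & TARGET) - 0.25 * len(s - TARGET)


if __name__ == "__main__":
    best, score, history = run_ga(toy_fitness, generations=10)
    print(sorted(best), score, history)
```

Because every generation evaluates the whole population, the number of fitness calls is generations × 28, which is how the cost of each metric in the abstract scales with the number of generations it needs; with elitism and a deterministic fitness, the best score per generation never decreases.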