Model search algorithm based on genetic algorithms (GA).

TerGA is a model search algorithm based on genetic algorithms (GA). An “individual” (i.e. a genome) in this context is a set of features that are combined using a selected “language” to compute a score, and this score constitutes the prediction model. Depending on the type of fitting (i.e. evaluation) function that is maximized, the features are weighted by specific coefficients. In short, the algorithm crosses, mutates and evolves different “individuals” and evaluates their fitness to the “environment”, which is represented by the variable to be predicted.
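
The loop described above can be sketched on toy data in base R. This is a simplified illustration and not the terga2 implementation: the synthetic data, the helper names (new_individual, fitness, cross_ind, mutate_ind), the ternary score (sum of features with coefficient 1 minus sum of features with coefficient -1) and the pure elite selection are all assumptions made for the example.

```r
# Toy GA over ternary models; a sketch under stated assumptions, not terga2 itself.
set.seed(42)

# Hypothetical data: 20 features (rows) x 60 samples (columns).
n_feat <- 20
n_samp <- 60
X <- matrix(runif(n_feat * n_samp), nrow = n_feat)
y <- ifelse(X[1, ] - X[2, ] > 0, 1, -1)  # the class depends on features 1 and 2

# An individual: coefficients in {-1, 0, 1} with k non-zero entries.
new_individual <- function(k) {
  ind <- integer(n_feat)
  ind[sample(n_feat, k)] <- sample(c(-1L, 1L), k, replace = TRUE)
  ind
}

# Fitness: accuracy of the sign of the score (assumed ternary score).
fitness <- function(ind) {
  score <- as.vector(ind %*% X)
  mean(ifelse(score > 0, 1, -1) == y)
}

# Crossing: each position is inherited from one of the two parents at random.
cross_ind <- function(a, b) ifelse(runif(n_feat) < 0.5, a, b)

# Mutation: set one random position to a random value in {-1, 0, 1}.
mutate_ind <- function(ind) {
  pos <- sample(n_feat, 1)
  ind[pos] <- sample(c(-1L, 0L, 1L), 1)
  ind
}

pop <- lapply(1:30, function(i) new_individual(k = 2))
best_per_gen <- numeric(0)
for (gen in 1:20) {
  fits <- vapply(pop, fitness, numeric(1))
  best_per_gen <- c(best_per_gen, max(fits))
  parents <- pop[order(fits, decreasing = TRUE)[1:10]]  # elite selection only
  children <- lapply(1:20, function(i) {
    p <- sample(parents, 2)
    mutate_ind(cross_ind(p[[1]], p[[2]]))
  })
  pop <- c(parents, children)  # elites survive, so the best fitness never drops
}
best <- pop[[which.max(vapply(pop, fitness, numeric(1)))]]
```

Unlike terga2, this sketch does not constrain model size after crossing and mutation, and it does not mix elite and random selection.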

Usage

terga2(
  sparsity = c(1:10),
  max.nb.features = 1000,
  language = "terinter",
  objective = "auc",
  evalToFit = "accuracy_",
  k_penalty = 0,
  estimate_coefs = FALSE,
  scoreFormula = scoreRatio,
  epsilon = "NULL",
  size_pop = 100,
  size_pop_random = size_pop,
  final.pop.perc = 100,
  in_pop = "NULL",
  popSourceFile = "NULL",
  popSaveFile = "NULL",
  individual_vec = individual_vec_v2,
  randomSigns = FALSE,
  unique_vars = FALSE,
  select_perc = 25,
  selector = list(selector_v1, selector_v2),
  select_percByMethod = list(50, 50),
  cross = TRUE,
  crosser = crossingIndividual_v3,
  mutate = TRUE,
  mutate_size = 75,
  mutate_rate = 50,
  mutator = mutator_v2,
  evolver = "v2m",
  nb_generations = 100,
  convergence = TRUE,
  convergence_steps = 10,
  evolve_k1 = TRUE,
  plot = FALSE,
  verbose = FALSE,
  warnings = FALSE,
  debug = FALSE,
  print_ind_method = "short",
  parallelize.folds = TRUE,
  nCores = 4,
  seed = "NULL",
  maxTime = Inf,
  experiment.id = "NULL",
  experiment.description = "NULL",
  experiment.save = "nothing"
)

Arguments

language: The language used by the different algorithms; one of bin, bininter, ter, terinter (default:"terinter").
size_pop_random: the number of individuals initialized randomly. This is used by the metal algorithm (i.e. aggregator method).
sparsity:: number of features in a given model (default:1:10). This is a vector with the model-size range (number of features used by a model).
objective:: The task to be learned, which can be either classification (auc) or regression (cor) (default:auc).
evalToFit:: The model performance attribute to use as fitting score (default:"accuracy_"). The choices are c("accuracy_", "auc_", "precision_","recall_","f_score_") for the classification task. For the regression task it can be either rho, rho-squared or the standard error of the regression (which is minimized).
k_penalty:: Model-size penalization effect applied on the fit score (default: 0).
estimate_coefs:: _deprecated_ A particular option for the regression mode with the aic objective (default:FALSE)
max.nb.features:: If this number is smaller than the number of variables in the dataset, the max.nb.features most significant features will be selected and the dataset will be restricted (default:1000).
size_pop:: the number of individuals in a population to be evolved (default:100)
final.pop.perc:: What percentage of the final population should be returned (default:100)
in_pop:: a specific population of models that can be evolved. This is particularly useful for the metal algorithm (default:NULL).
popSourceFile:: It is possible to load a population of models that has already been learned. With this option we can specify such a file (default:NULL).
popSaveFile:: Once the population of models has evolved, we can store it in another file (default:NULL).
scoreFormula:: a function that contains the ratio formula or another specific one (default:scoreRatio).
epsilon:: a very small value to be used with the ratio language (useCustomLanguage) (default: NULL). When NULL, it is computed as the minimum value of X divided by 10.
individual_vec:: The function that is used to generate an individual (default:individual_vec_v2).
randomSigns:: When generating an individual composed of a set of features, we can set the coefficients of the variables to -1 or 1 at random (default:FALSE).
unique_vars:: When performing operations on multiple individuals, the same feature may end up appearing multiple times in one individual. If set to TRUE, such an individual will be destroyed (default:FALSE).
select_perc:: The percentage of the population to be selected for crossing/mutation (default:25).
selector:: During the selection process, the parent population can be selected using different strategies. For instance the default process is performed using both elite and random selection (default:list(selector_v1, selector_v2)).
select_percByMethod:: A list containing the percentage of individuals that each of the methods specified in selector should get (default:list(50, 50)).
cross:: A switch which activates the crossing operator (default:TRUE).
crosser:: The method that should be applied to cross individuals together (default:crossingIndividual_v3).
mutate:: A switch which activates the mutation operator (default:TRUE).
mutate_size:: The percentage of individuals in the population to be mutated (default:75).
mutate_rate:: The percentage of features in an individual to be mutated (default:50).
mutator:: The method that should be applied to mutate individuals (default:mutator_v2). The operations can be deletion, insertion or changing the coefficient (from -1 to 1 and vice versa).
evolver:: The method that will be used to evolve the individuals together. This is the core of the algorithm and can be one of several implementations, such as c("v1", "v2", "v3","v4"); the default in the usage above is "v2m".
nb_generations:: The maximum number of generations to evolve the population (default:100).
convergence:: A switch which activates the automatic convergence of the algorithm when the best individual is not improving (default:TRUE).
convergence_steps:: The number of generations after which we consider convergence (default:10).
evolve_k1:: Whether or not to evaluate exhaustively the features for model size = 1. This will take a lot of time if the dataset is large, thus the possibility to evolve this using the GA is interesting. (default:TRUE)
plot:: Plot graphics indicating the evolution of the simulation (default:FALSE)
verbose:: Print out information on the progress of the algorithm (default:FALSE).
warnings:: Print out warnings when running (default:FALSE).
debug:: Print out detailed information on the progress of the algorithm (default:FALSE)
print_ind_method:: One of c("short","graphical") indicates how to print a model and subsequently a population during the run (default:"short").
parallelize.folds:: parallelize folds when cross-validating (default:TRUE).
nCores:: The number of cores used to execute the program (default:4). If nCores = 1 then the program runs in non-parallel mode.
seed:: The seed to be used for reproducibility. If seed=NULL then it is not taken into account (default:NULL).
maxTime:: We can use a time limit to evolve a population (default:Inf).
experiment.id:: The id of the experiment that is to be used in the plots and comparative analyses (default is the learner's name, when not specified)
experiment.description:: A longer description of the experiment. This is important when many experiments are run and can also be printed by the printExperiment function.
experiment.save:: Data from an experiment can be saved with different levels of completeness, with options to be selected from c("nothing", "minimal", "full") (default:"nothing").
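
Two of the numeric arguments above, epsilon and k_penalty, can be illustrated with a short worked example. The epsilon default follows the description given for that argument; the ratio score (numerator features over denominator features plus epsilon) and the linear penalty fit - k_penalty * k are assumptions made for illustration, as are the matrix X and the helper penalized_fit.

```r
# Hypothetical abundance matrix: 4 features (rows) x 3 samples (columns).
X <- matrix(c(0.20, 0.05, 0.10,
              0.40, 0.30, 0.02,
              0.01, 0.25, 0.50,
              0.39, 0.40, 0.38), nrow = 4, byrow = TRUE)

# Default described for epsilon = NULL: the minimum value of X divided by 10.
epsilon <- min(X) / 10

# Assumed ratio score: features 1 and 2 in the numerator, feature 3 in the
# denominator; epsilon keeps the denominator away from zero.
num <- colSums(X[1:2, , drop = FALSE])
den <- X[3, ]
ratio_score <- num / (den + epsilon)

# Assumed linear model-size penalty applied to a fit score for a model of size k.
penalized_fit <- function(fit, k, k_penalty = 0) fit - k_penalty * k

# With the default k_penalty = 0 the fit is unchanged; with 0.01 a model with
# 5 features loses 0.05.
fit_default   <- penalized_fit(0.90, k = 5)
fit_penalized <- penalized_fit(0.90, k = 5, k_penalty = 0.01)
```

A positive k_penalty therefore favours smaller models when two models have similar raw fit.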

Value

an object of the classifier class, containing a list of parameters
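
A minimal sketch of building such an object, using only arguments listed in the usage above. It assumes that the package providing terga2 is already attached; the argument values are arbitrary choices for illustration, and the code does nothing if terga2 is not available.

```r
# Build a classifier object only if terga2() is available in the session.
clf <- NULL
if (exists("terga2", mode = "function")) {
  clf <- terga2(
    sparsity       = 1:5,         # search models with 1 to 5 features
    language       = "terinter",
    objective      = "auc",
    evalToFit      = "accuracy_",
    size_pop       = 100,
    nb_generations = 50,
    nCores         = 1,           # non-parallel mode
    seed           = 1            # for reproducibility
  )
  print(class(clf))
  str(clf, max.level = 1)         # inspect the top-level structure
}
```

Setting nCores = 1 and a fixed seed keeps the run sequential and repeatable, which is convenient when first exploring the parameters.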