
terga1 is a model search algorithm based on genetic algorithms (GA). In this context, a “genome” or “individual” is a combination of features that are combined into a score, and that score is the prediction model. Depending on the type of fitting function being maximized, the features are weighted by specific coefficients. In short, the algorithm repeatedly crosses, mutates and evolves “individuals” and evaluates their fitness to the “environment”, which is represented by the variable to be predicted.
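As a rough illustration of the crossing step, the sketch below combines two parent models into a child. It is a hypothetical example, not the terga1 source. It assumes that an individual is a named integer vector that maps feature names to coefficients of -1 or +1, and `cross_parents` is an illustrative helper.

```r
# Hypothetical sketch (not the terga1 code): crossing two individuals.
# An individual is a named integer vector: names = features, values = coefficients.
cross_parents <- function(parent1, parent2, k) {
  # Pool both parents' features; a feature shared by both keeps parent1's coefficient
  pool <- c(parent1, parent2[setdiff(names(parent2), names(parent1))])
  k <- min(k, length(pool))
  # The child inherits k distinct features drawn at random from the pool
  pool[sample(names(pool), k)]
}

set.seed(1)
p1 <- c(gene_a = 1L, gene_b = -1L, gene_c = 1L)
p2 <- c(gene_c = -1L, gene_d = 1L, gene_e = -1L)
child <- cross_parents(p1, p2, k = 3)
child
```

Drawing the child from the union of both parents keeps it inside the parents' combined feature space. terga1 may use a different crossing rule.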

Usage

terga1(
  sparsity = c(1:10),
  size_pop = 100,
  size_world = "NULL",
  max.nb.features = 1000,
  popSourceFile = "NULL",
  popSaveFile = "NULL",
  language = "terinter",
  scoreFormula = scoreRatio,
  epsilon = "NULL",
  unique_vars = FALSE,
  objective = "auc",
  k_penalty = 0,
  evalToFit = "fit_",
  estimate_coefs = FALSE,
  intercept = "NULL",
  select_type = "mixed",
  select_perc1 = 20,
  select_perc2 = 30,
  perc_best_ancestor = 10,
  mutate_size = 70,
  mutate_rate = 50,
  nb_generations = 100,
  convergence = TRUE,
  convergence_steps = 10,
  evolve_k1 = TRUE,
  plot = FALSE,
  verbose = TRUE,
  warnings = FALSE,
  debug = FALSE,
  print_ind_method = "short",
  parallelize.folds = TRUE,
  nCores = 4,
  seed = "NULL",
  experiment.id = "NULL",
  experiment.description = "NULL",
  experiment.save = "nothing"
)

Arguments

language:

the language used to build the models: one of "bin", "bininter", "ter", "terinter" or "ratio" (default: "terinter").
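The sketch below shows how each language could turn a model's features into a score. The definitions are assumptions based on the usual reading of these names: binary coefficients in {0, 1}, ternary coefficients in {-1, 0, 1}, a ratio of positive over negative features, and "inter" variants that compare the score with an intercept. `model_score` and `predict_class` are hypothetical helpers, and X is assumed to hold samples in rows and features in named columns.

```r
# Hypothetical sketch (not the package code). X: samples x features, with column names.
# coefs: named vector of the non-zero coefficients of the features in the model.
model_score <- function(X, coefs,
                        language = c("bin", "bininter", "ter", "terinter", "ratio"),
                        epsilon = NULL) {
  language <- match.arg(language)
  pos <- names(coefs)[coefs > 0]   # features with a positive coefficient
  neg <- names(coefs)[coefs < 0]   # features with a negative coefficient (none for bin)
  sum_pos <- rowSums(X[, pos, drop = FALSE])
  sum_neg <- rowSums(X[, neg, drop = FALSE])
  switch(language,
    bin = , bininter = sum_pos,                # assumed: sum of the selected features
    ter = , terinter = sum_pos - sum_neg,      # assumed: positive minus negative
    ratio = {                                  # assumed: positive over negative
      if (is.null(epsilon)) stop("epsilon is needed for the ratio language")
      sum_pos / (sum_neg + epsilon)
    }
  )
}

# Assumed rule for the "inter" languages: class 1 when the score exceeds the intercept
predict_class <- function(score, intercept = 0) ifelse(score > intercept, 1, -1)

X <- matrix(c(1, 2, 0, 4, 3, 1), nrow = 2,
            dimnames = list(NULL, c("f1", "f2", "f3")))
model_score(X, c(f1 = 1, f2 = 1), "bin")         # 1 6
model_score(X, c(f1 = 1, f3 = -1), "terinter")   # -2 1
predict_class(model_score(X, c(f1 = 1, f3 = -1), "terinter"))  # -1 1
```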

sparsity:

the number of features in a model. It is given as a vector so that models of several sizes are searched (default: 1:10).
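To make the roles of `sparsity` and `size_pop` concrete, here is a sketch of a random first generation in which each individual's size is drawn from the sparsity vector. `random_population` is a hypothetical helper. Drawing sizes uniformly and coefficients from {-1, 1} are assumptions, and terga1 may distribute model sizes differently.

```r
# Hypothetical sketch: build size_pop random individuals whose sizes come from `sparsity`.
random_population <- function(features, size_pop = 100, sparsity = 1:10) {
  lapply(seq_len(size_pop), function(i) {
    # sample.int avoids sample()'s special case when sparsity has length 1
    k <- sparsity[sample.int(length(sparsity), 1)]
    k <- min(k, length(features))
    chosen <- sample(features, k)   # k distinct features
    setNames(sample(c(-1L, 1L), k, replace = TRUE), chosen)
  })
}

set.seed(10)
features <- paste0("gene_", 1:20)
pop <- random_population(features, size_pop = 10, sparsity = 2:4)
table(lengths(pop))   # number of individuals of each size
```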

size_pop:

the number of individuals in a population to be evolved.

size_world:

this is the number of features in the dataset.

max.nb.features:

restricts the search to the subset of the top most significant features (default:1000)
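A pre-filter of this kind could look like the sketch below. It ranks features by a two-sample Wilcoxon test between the classes and keeps the `max.nb.features` smallest p-values. The choice of test, the helper name `top_features` and the samples-in-rows layout of X are assumptions, not the package's documented behavior.

```r
# Hypothetical sketch: keep the max.nb.features most significant features.
# X: samples x features (with column names); y: class labels (1 versus the rest).
top_features <- function(X, y, max.nb.features = 1000) {
  pvals <- apply(X, 2, function(x) {
    # ties trigger a warning about exact p-values; it is harmless for ranking
    suppressWarnings(wilcox.test(x[y == 1], x[y != 1])$p.value)
  })
  keep <- order(pvals)[seq_len(min(max.nb.features, length(pvals)))]
  X[, keep, drop = FALSE]
}

set.seed(42)
y <- rep(c(1, -1), each = 20)
X <- matrix(rnorm(40 * 5), nrow = 40, dimnames = list(NULL, paste0("f", 1:5)))
X[y == 1, "f3"] <- X[y == 1, "f3"] + 5   # make f3 clearly informative
colnames(top_features(X, y, max.nb.features = 2))
```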

popSourceFile:

A population of models to use as the first generation to be evolved (default:NULL).

popSaveFile:

(??)

scoreFormula:

a function that computes the model score, such as the ratio formula or another specific formula (default: scoreRatio)

epsilon:

a small value used with the ratio language (default: NULL). When NULL, it is computed as the minimum value of X divided by 10.
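The default epsilon rule is simple arithmetic. The sketch below computes it and plugs it into a ratio score. `default_epsilon` is a hypothetical helper, X is assumed to be strictly positive with samples in rows, and the ratio form sum(positive features) / (sum(negative features) + epsilon) is an assumption about the ratio language.

```r
# Hypothetical sketch of the epsilon default: min(X) / 10 when no value is supplied.
default_epsilon <- function(X, epsilon = NULL) {
  if (is.null(epsilon)) epsilon <- min(X) / 10
  epsilon
}

X <- matrix(c(0.5, 2, 0.2, 4, 3, 1), nrow = 2,
            dimnames = list(NULL, c("f1", "f2", "f3")))
eps <- default_epsilon(X)   # min(X) is 0.2, so epsilon is 0.02
# Assumed ratio score for a model with f1 in the numerator and f2 in the
# denominator; epsilon keeps the denominator away from zero.
ratio <- X[, "f1"] / (X[, "f2"] + eps)
ratio
```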

unique_vars:

logical (default: FALSE) indicating whether unique variables can be used in a model or population.

objective:

one of "auc", "cor" or "aic". Besides class prediction, terga can also fit regression models (default: "auc").
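The sketch below shows one way to compute the two main fitness values for a model score: the AUC for class prediction and a correlation for regression. `auc_fitness` and `cor_fitness` are hypothetical helpers. Using the rank-based (Mann-Whitney) AUC and Spearman correlation are assumptions, and the aic objective is not covered.

```r
# Hypothetical sketch of objective functions evaluated on a model score.
# AUC: probability that a random positive sample scores higher than a random
# negative one (ties count one half), i.e. the Mann-Whitney statistic.
auc_fitness <- function(score, y) {
  pos <- score[y == 1]
  neg <- score[y != 1]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Regression: correlation between the score and a continuous target
cor_fitness <- function(score, y, method = "spearman") {
  cor(score, y, method = method)
}

auc_fitness(c(0.9, 0.2, 0.3, 0.1), c(1, 1, -1, -1))   # 0.75
cor_fitness(c(1, 2, 3, 4, 5), c(2, 4, 7, 8, 10))      # 1
```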

estimate_coefs:

whether to estimate non-ternary coefficients, used with the aic objective (default:FALSE)

intercept:

the intercept for a given model (default:NULL)

evalToFit:

The model performance attribute to use as fitting score (default:"fit_"). Other choices are c("auc_","accuracy_","precision_","recall_","f_score_")

k_penalty:

Penalization of the fit by the model sparsity k (default: 0)
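The interplay between `evalToFit` and `k_penalty` can be sketched as follows. The model is a hypothetical list carrying attributes named like the `evalToFit` choices (`fit_`, `auc_`, `accuracy_`, ...). The linear penalty `fit - k_penalty * k`, with k the number of features, is an assumption rather than the package's documented formula.

```r
# Hypothetical sketch: read the attribute named by evalToFit and penalize it by model size.
penalized_fit <- function(model, evalToFit = "fit_", k_penalty = 0) {
  k <- length(model$coefs)             # model sparsity
  model[[evalToFit]] - k_penalty * k   # assumed linear penalty
}

model <- list(coefs = c(gene_a = 1, gene_b = -1, gene_c = 1),
              fit_ = 0.85, auc_ = 0.85, accuracy_ = 0.80)
penalized_fit(model)                                              # 0.85
penalized_fit(model, evalToFit = "accuracy_", k_penalty = 0.01)   # 0.77
```

With the default k_penalty = 0 the chosen attribute is used unchanged; under this assumed form, a positive value favors sparser models of similar performance.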

select_type:

the selection operator type: one of "mixed", "elite" or "tournoi" (tournament) (default: "mixed")

select_perc1:

percentage of individuals to be selected with the elite operator (default: 20)

select_perc2:

percentage of individuals to be selected with the tournoi (tournament) operator (default: 30)

perc_best_ancestor:

percentage of the best ancestors used to seed the new population (default: 10)
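The selection operators and the seeding with best ancestors can be sketched as below. `select_elite`, `select_tournament` and `select_mixed` are hypothetical helpers. Reading "tournoi" as a tournament between two random individuals, and "mixed" as the union of an elite share (`select_perc1`) and a tournament share (`select_perc2`), are assumptions.

```r
# Hypothetical sketch of selection operators. `fitness` has one value per
# individual; each function returns the indices of the selected individuals.
n_from_perc <- function(n, perc) max(1, round(n * perc / 100))

# Elite: keep the top perc % of individuals by fitness
select_elite <- function(fitness, perc) {
  order(fitness, decreasing = TRUE)[seq_len(n_from_perc(length(fitness), perc))]
}

# Tournament ("tournoi"): repeatedly pick tour_size random individuals, keep the fittest
select_tournament <- function(fitness, perc, tour_size = 2) {
  vapply(seq_len(n_from_perc(length(fitness), perc)), function(i) {
    contestants <- sample.int(length(fitness), tour_size)
    contestants[which.max(fitness[contestants])]
  }, integer(1))
}

# Mixed: an elite share plus a tournament share
select_mixed <- function(fitness, select_perc1 = 20, select_perc2 = 30) {
  c(select_elite(fitness, select_perc1), select_tournament(fitness, select_perc2))
}

set.seed(5)
fitness <- c(0.60, 0.90, 0.70, 0.50, 0.80, 0.55, 0.65, 0.75, 0.85, 0.95)
select_elite(fitness, 20)   # 10 2
select_mixed(fitness)       # 2 elite picks followed by 3 tournament winners
# perc_best_ancestor could be applied like an elite selection on the previous generation
ancestors <- select_elite(fitness, perc = 10)   # 10
```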

mutate_size:

percentage of individuals in the population to be mutated

mutate_rate:

percentage of features in an individual to be mutated
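One possible reading of the two mutation parameters is sketched below: `mutate_size` sets how many individuals are mutated, and `mutate_rate` sets how many of each one's features are swapped for features not yet in the model. The helpers `mutate_individual` and `mutate_population` and the individual representation (a named vector of -1/+1 coefficients) are hypothetical.

```r
# Hypothetical sketch: replace mutate_rate % of an individual's features with new ones.
mutate_individual <- function(ind, features, mutate_rate = 50) {
  n_mut <- max(1, round(length(ind) * mutate_rate / 100))
  candidates <- setdiff(features, names(ind))   # features not yet in the model
  n_mut <- min(n_mut, length(candidates))
  if (n_mut == 0) return(ind)
  out_idx <- sample.int(length(ind), n_mut)     # positions to overwrite
  names(ind)[out_idx] <- candidates[sample.int(length(candidates), n_mut)]
  ind[out_idx] <- sample(c(-1L, 1L), n_mut, replace = TRUE)   # new random coefficients
  ind
}

# Mutate mutate_size % of the individuals in the population
mutate_population <- function(pop, features, mutate_size = 70, mutate_rate = 50) {
  n <- round(length(pop) * mutate_size / 100)
  for (i in sample.int(length(pop), n)) {
    pop[[i]] <- mutate_individual(pop[[i]], features, mutate_rate)
  }
  pop
}

set.seed(3)
features <- paste0("gene_", 1:20)
ind <- c(gene_1 = 1L, gene_2 = -1L, gene_3 = 1L, gene_4 = -1L)
mutate_individual(ind, features, mutate_rate = 50)   # 2 of the 4 features replaced
```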

plot:

plot graphics indicating the evolution of the simulation (default:FALSE)

convergence:

whether to stop the run when the best individual is no longer improving (default:TRUE).

convergence_steps:

the number of consecutive generations without improvement of the best individual after which the run is considered converged (default:10).
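The convergence rule described by `convergence` and `convergence_steps` can be sketched as a check on the history of best fitness values, one value per generation. `has_converged` is a hypothetical helper and is not part of the documented interface.

```r
# Hypothetical sketch: converged when the best fitness has not improved
# during the last convergence_steps generations.
has_converged <- function(best_history, convergence_steps = 10) {
  n <- length(best_history)
  if (n <= convergence_steps) return(FALSE)   # not enough generations yet
  recent <- best_history[(n - convergence_steps + 1):n]
  max(recent) <= best_history[n - convergence_steps]
}

has_converged(c(0.50, 0.60, 0.70, 0.70, 0.70, 0.70), convergence_steps = 3)  # TRUE
has_converged(c(0.50, 0.60, 0.70, 0.70, 0.80), convergence_steps = 3)        # FALSE
```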

evolve_k1:

whether to evolve the single-feature models (k_sparse = 1) with the GA instead of evaluating every feature exhaustively. Exhaustive evaluation can take a long time on large datasets, hence the option to evolve these models with the GA (default:TRUE)
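For comparison, the exhaustive alternative can be sketched as scoring every feature as a one-feature model. This is a hypothetical helper. It uses the AUC as the fitness, chooses the coefficient sign so that the score points to class 1, and assumes that X holds samples in rows. Its cost grows linearly with the number of features, which is why it becomes slow on large datasets.

```r
# Hypothetical sketch: evaluate every feature as a one-feature model (k = 1).
evaluate_k1 <- function(X, y) {
  auc <- function(s) {
    pos <- s[y == 1]
    neg <- s[y != 1]
    mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
  }
  res <- vapply(colnames(X), function(f) {
    a <- auc(X[, f])
    # flip the sign when the feature is higher in the other class
    c(coef = if (a >= 0.5) 1 else -1, auc = max(a, 1 - a))
  }, c(coef = 0, auc = 0))
  res <- t(res)   # one row per feature
  res[order(res[, "auc"], decreasing = TRUE), , drop = FALSE]
}

y <- rep(c(1, -1), each = 10)
X <- cbind(f1 = c(11:20, 1:10), f2 = c(1:10, 11:20), f3 = rep(1, 20))
evaluate_k1(X, y)
```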

verbose:

print out information on the progress of the algorithm (default:TRUE)

warnings:

Print out warnings when running (default:FALSE).

debug:

print debug information (default:FALSE)

print_ind_method:

One of c("short","graphical") indicates how to print a model and subsequently a population during the run (default:"short").

parallelize.folds:

parallelize folds when cross-validating (default:TRUE)

nb_generations:

maximum number of generations to evolve the population.

nCores:

the number of cores used to run the program. If nCores=1, then the program runs in non-parallel mode (default: 4).

seed:

the seed used for reproducibility. If seed=NULL, then it is ignored (default:NULL).

experiment.id:

The id of the experiment to be used in the plots and comparative analyses (default is the learner's name, when not specified)

experiment.description:

A longer description of the experiment. This is important when many experiments are run, and it can be printed by the printExperiment function.

experiment.save:

Data from an experiment can be saved with different levels of completeness, selected from c("nothing", "minimal", "full") (default: "nothing")

Value

an object containing a list of parameters for this classifier

Details

terga1: Model search algorithm based on genetic algorithms (GA)