preprocess.Rd
Transforms a transactional table into an id-aggregated table, with configurable aggregation methods for numeric and categorical columns.
preprocess(df, samplesize = NA, numeric_operation_list = c("mean"), categories = NULL, target = NA, target_agg = "mean", verbose = TRUE)
Argument | Description |
---|---|
df | data.frame, the data to preprocess |
samplesize | numeric, the fraction of ids used to create a sub-sample of the input df |
numeric_operation_list | list, a list of the aggregation functions to apply to numeric columns |
categories | list, a list of the categorical columns to aggregate |
target | character, the column to use as a response variable for supervised learning |
target_agg | character, the aggregation function to use to aggregate the target column |
verbose | logical, whether information about the preprocessing should be printed |
A data frame of id-level attributes with a single row per unique id, e.g. customer attributes if the id represents customer IDs.
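
To illustrate the shape of the result, the sketch below builds a small id-aggregated table with base R only. It is not the implementation of preprocess(): the input table, its column names (id, amount, channel), the per-level counting of the categorical column, and the output column names are all assumptions made for the example.

```r
# Hypothetical transactional table: several rows per id.
transactions <- data.frame(
  id      = c(1, 1, 2, 2, 2),
  amount  = c(10, 30, 5, 15, 10),
  channel = c("web", "store", "web", "web", "store"),
  stringsAsFactors = FALSE
)

# Numeric column: one aggregated value per id (here the mean).
numeric_agg <- aggregate(amount ~ id, data = transactions, FUN = mean)
names(numeric_agg)[2] <- "amount_mean"

# Categorical column: number of rows per level, per id (an assumed
# aggregation, chosen only to show one row per id).
channel_counts <- as.data.frame.matrix(
  table(transactions$id, transactions$channel)
)
names(channel_counts) <- paste0("channel_", names(channel_counts))
channel_counts$id <- as.numeric(rownames(channel_counts))

# Combine into a single table with one row per unique id.
id_table <- merge(numeric_agg, channel_counts, by = "id")
print(id_table)
```

Each id appears exactly once in id_table, with one column per aggregated numeric value and one per categorical level.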