Transforms a transactional table into an id-aggregated table, with customizable aggregation methods for numeric and categorical columns.
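This minimal base-R sketch shows the core idea of id aggregation for numeric columns: every numeric column is collapsed to one value per id, once for each requested function. It is not the package's implementation. The id column name `id`, the helper `aggregate_numeric`, and the `<column>_<function>` naming scheme are all assumptions made for illustration.

```r
# Hypothetical transactional data: several rows per id.
# The id column name "id" is an assumption for this sketch.
tx <- data.frame(
  id     = c(1, 1, 2, 2, 2, 3),
  amount = c(10, 20, 5, 5, 20, 7)
)

# Hypothetical helper: apply each function named in `ops` to every numeric
# column (except the id), producing one row per unique id.
aggregate_numeric <- function(df, id_col = "id", ops = c("mean")) {
  num_cols <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], id_col)
  ids <- sort(unique(df[[id_col]]))
  out <- data.frame(ids)
  names(out) <- id_col
  for (col in num_cols) {
    for (op in ops) {
      f <- match.fun(op)  # look up the function by its name, e.g. "mean"
      vals <- tapply(df[[col]], df[[id_col]], f)
      # Align the per-id results with `ids` and drop the names.
      out[[paste(col, op, sep = "_")]] <- as.numeric(vals[as.character(ids)])
    }
  }
  out
}

agg <- aggregate_numeric(tx, ops = c("mean", "max"))
print(agg)
```

Passing the functions by name (`"mean"`, `"max"`) mirrors the character default `c("mean")` in the signature, and lets the output column names record which function produced each value.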

Usage

preprocess(
  df,
  samplesize = NA,
  numeric_operation_list = c("mean"),
  categories = NULL,
  target = NA,
  target_agg = "mean",
  verbose = TRUE
)

Arguments

df

data.frame, the data to preprocess

samplesize

numeric, the fraction of ids used to create a sub-sample of the input df

numeric_operation_list

character, a vector of the names of the aggregation functions to apply to numeric columns

categories

list, a list of the categorical columns to aggregate

target

character, the column to use as a response variable for supervised learning

target_agg

character, the aggregation function applied to the target column

verbose

logical, whether information about the preprocessing should be reported
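Sub-sampling by `samplesize` operates on ids rather than rows, so it can be pictured as keeping every row for a random fraction of the unique ids; no id ends up only partly represented. The sketch below is a base-R illustration, not the package's code. The helper `sample_ids` and the column name `id` are hypothetical, and treating `NA` as "use every id" is an assumption based only on the default `samplesize = NA`.

```r
# Hypothetical helper illustrating id-level sub-sampling (not the package's code).
# Assumes the id column is called "id" and that NA means "keep every id".
sample_ids <- function(df, frac, id_col = "id") {
  if (is.na(frac)) return(df)
  ids <- unique(df[[id_col]])
  n_keep <- max(1, round(frac * length(ids)))
  # Index with sample.int(): sample(ids, n) misbehaves when ids has length 1.
  keep <- ids[sample.int(length(ids), n_keep)]
  # Keep all rows of the sampled ids, so every sampled id stays complete.
  df[df[[id_col]] %in% keep, , drop = FALSE]
}

set.seed(1)
tx <- data.frame(id = rep(1:10, each = 3), amount = seq_len(30))
sub <- sample_ids(tx, 0.5)
table(sub$id)
```

With ten ids and `frac = 0.5`, five ids are kept, each with all three of its rows.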

Value

A data frame of id attributes (e.g. customer attributes when the id is a customer ID), with a single row per unique id.
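Because the result has a single row per unique id, categorical columns must also be collapsed to one row per id. One common approach, assumed here and not confirmed by this page, is to count the occurrences of each level per id and join those counts to the numeric aggregates. The helper `count_levels` and the columns `id` and `channel` are hypothetical.

```r
# Hypothetical transactional data with one categorical column.
tx <- data.frame(
  id      = c(1, 1, 2, 2, 2, 3),
  channel = c("web", "shop", "web", "web", "shop", "shop"),
  amount  = c(10, 20, 5, 5, 20, 7)
)

# Hypothetical helper: one count column per level of `col`, one row per id.
count_levels <- function(df, col, id_col = "id") {
  ids <- sort(unique(df[[id_col]]))
  tab <- table(factor(df[[id_col]], levels = ids), df[[col]])
  counts <- as.data.frame.matrix(tab)
  names(counts) <- paste(col, names(counts), sep = "_")
  rownames(counts) <- NULL
  out <- data.frame(ids, counts)
  names(out)[1] <- id_col
  out
}

# Numeric aggregate (mean per id) joined with the categorical counts.
amount_mean <- aggregate(amount ~ id, data = tx, FUN = mean)
result <- merge(amount_mean, count_levels(tx, "channel"), by = "id")
print(result)
```

Whatever aggregation is used per column, the join on the id is what guarantees the documented shape: exactly one row per unique id.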