Preprocess Function — preprocess • citrus

Transforms a transactional table into an id-aggregated table, with configurable aggregation methods for numeric and categorical columns.

preprocess(
  df,
  samplesize = NA,
  numeric_operation_list = c("mean"),
  categories = NULL,
  target = NA,
  target_agg = "mean",
  verbose = TRUE
)

Arguments

df	data.frame, the data to preprocess
samplesize	numeric, the fraction of ids used to create a sub-sample of the input df
numeric_operation_list	list, a list of the aggregation functions to apply to numeric columns
categories	list, a list of the categorical columns to aggregate
target	character, the column to use as a response variable for supervised learning
target_agg	character, the aggregation function to use to aggregate the target column
verbose	logical, whether information about the preprocessing should be printed
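
The samplesize argument is described as a fraction of ids, which implies that sampling happens at the id level and not at the row level. The following base R sketch illustrates that idea only; it is not the package's implementation. The data frame `transactions`, its columns, and the rounding rule (`ceiling`) are assumptions made for illustration.

```r
# Hypothetical transactional data: several rows per id.
transactions <- data.frame(
  id = rep(c("c1", "c2", "c3", "c4"), times = c(2, 3, 1, 2)),
  amount = c(10, 20, 5, 15, 25, 40, 8, 12),
  stringsAsFactors = FALSE
)

# Assumed fraction of ids to keep (plays the role of samplesize).
fraction <- 0.5

set.seed(1)
ids <- unique(transactions$id)

# Sample ids, not rows, so every transaction of a chosen id is kept.
sampled_ids <- sample(ids, size = ceiling(fraction * length(ids)))
subsample <- transactions[transactions$id %in% sampled_ids, ]
```

Sampling whole ids keeps each selected id's transaction history complete, so later per-id aggregates are computed on the same rows they would be in the full data.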

Value

A data frame of id-level attributes (for example, customer attributes when the ids are customer IDs), with a single row per unique id.
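
To make the "single row per unique id" shape concrete, here is a minimal base R sketch of the same kind of aggregation. It does not call preprocess and does not reproduce its internals. The example data, the output column names, and the use of the most frequent value for the categorical column are all assumptions.

```r
# Hypothetical transactional data: several rows per id.
transactions <- data.frame(
  id = c("a", "a", "b", "b", "b"),
  amount = c(10, 20, 5, 15, 25),
  channel = c("web", "web", "store", "web", "store"),
  stringsAsFactors = FALSE
)

# Numeric column: mean per id, analogous to numeric_operation_list = c("mean").
numeric_agg <- aggregate(amount ~ id, data = transactions, FUN = mean)
names(numeric_agg)[2] <- "amount_mean"

# Categorical column: most frequent value per id (one possible choice,
# assumed here; the package may aggregate categories differently).
top_level <- function(x) names(which.max(table(x)))
categorical_agg <- aggregate(channel ~ id, data = transactions, FUN = top_level)
names(categorical_agg)[2] <- "channel_top"

# Combine into an id-level table with one row per unique id.
id_table <- merge(numeric_agg, categorical_agg, by = "id")
```

Here `id_table` has two rows, one for each of the ids "a" and "b", regardless of how many transactions each id had in the input.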