Discretize a data frame

Converts all numerical variables into factor or character, depending on the 'stringsAsFactors' parameter, using an equal-frequency criterion. The thresholds for each segment of each variable are taken from the output of the discretize_get_bins function, which returns a data frame containing the thresholds for each variable. That result must be passed as the 'data_bins' parameter. Note that the returned data frame contains the non-transformed variables plus the transformed ones. More information about converting numerical variables into categorical ones can be found at: https://livebook.datascienceheroes.com/data-preparation.html#data_types
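
As a rough illustration of what an equal-frequency criterion means, the sketch below bins a small vector with base R's quantile() and cut(). It is not the package's internal implementation; the names x, n_bins, breaks and x_binned are made up for this example.

```r
# Illustrative sketch only: equal-frequency binning with base R.
# This is NOT how discretize_get_bins / discretize_df are implemented internally.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # hypothetical numeric variable
n_bins <- 5                            # hypothetical number of segments

# Quantile cut points split the sorted values into n_bins groups of similar size.
breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)

# include.lowest = TRUE keeps the minimum value inside the first bin;
# ordered_result = TRUE returns an ordered factor.
x_binned <- cut(x, breaks = breaks, include.lowest = TRUE, ordered_result = TRUE)

# Count how many values fall in each bin.
table(x_binned)
```

With heavily tied data the quantile cut points can coincide, in which case cut() raises an error about non-unique breaks; a real implementation has to handle that case.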

discretize_df(data, data_bins, stringsAsFactors = T)

Arguments

data	Input data frame
data_bins	Data frame generated by the 'discretize_get_bins' function. It contains the variable name and the thresholds for each bin, or segment.
stringsAsFactors	Logical value indicating whether the discretization result is a factor (TRUE) or a character (FALSE). When TRUE, the segments are ordered. TRUE by default.

Value

Data frame with the transformed variables

Examples

if (FALSE) {
# heart_disease is a sample data set shipped with the funModeling package.
library(funModeling)

# Get the bin thresholds for each variable. If 'input' is missing, it runs for all numerical variables.
d_bins <- discretize_get_bins(data = heart_disease,
  input = c("resting_blood_pressure", "oldpeak"), n_bins = 5)

# The bins can now be applied to the same data frame or to a new one
# (for example, in a predictive model whose data changes over time).
heart_disease_discretized <- discretize_df(data = heart_disease, data_bins = d_bins, stringsAsFactors = TRUE)

}
