Get the data frame thresholds for discretization

Takes a data frame and returns another data frame holding the thresholds (cut points) of each bin (or segment), which are then used to discretize each variable.

discretize_get_bins(data, n_bins = 5, input = NULL)

Arguments

data	Data frame source
n_bins	The number of desired bins (or segments) that each variable will have.
input	Vector of strings containing the names of the variables to process. If empty, the function runs for all numerical variables whose number of unique values is higher than the value of the 'n_bins' parameter. NA values are handled automatically by converting them into a separate category (more info at https://livebook.datascienceheroes.com/data-preparation.html#treating-missing-values-in-numerical-variables). This function must be used together with discretize_df. If a different number of bins per variable is needed, the function must be called more than once.

Value

Data frame containing the thresholds or cuts to bin every variable

Examples

if (FALSE) {
# Getting the bin thresholds for each variable. If input is missing, it will run for all
# numerical variables.
d_bins=discretize_get_bins(data=heart_disease,
                           input=c("resting_blood_pressure", "oldpeak"),
                           n_bins=5)

# Now the thresholds can be applied to the same data frame, or to a new one (for example, in a
# predictive model whose data changes over time)
heart_disease_discretized=discretize_df(data=heart_disease, data_bins=d_bins, stringsAsFactors=TRUE)

# Checking results
df_status(heart_disease_discretized)
}
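
The note about calling the function more than once when variables need different numbers of bins can be sketched as follows. This is only one possible approach, not necessarily the recommended one: it assumes the funModeling package (which provides these functions and the heart_disease data) is installed, and that the output of one discretize_df call can be passed to a second call. The names bins_3, bins_5 and step_1 are made up for illustration.

```r
# Sketch only: assumes the funModeling package is installed.
library(funModeling)

# One call per desired number of bins (bins_3 and bins_5 are illustrative names).
bins_3=discretize_get_bins(data=heart_disease, input="oldpeak", n_bins=3)
bins_5=discretize_get_bins(data=heart_disease, input="resting_blood_pressure", n_bins=5)

# Apply each set of thresholds in turn; the second call starts from the
# output of the first, so both variables end up discretized.
step_1=discretize_df(data=heart_disease, data_bins=bins_3, stringsAsFactors=TRUE)
heart_disease_discretized=discretize_df(data=step_1, data_bins=bins_5, stringsAsFactors=TRUE)
```

Variables not listed in either call are left unchanged.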
