discretize_df.Rd
Converts all numerical variables into factor or character, depending on the 'stringsAsFactors' parameter,
based on equal frequency criteria. The thresholds for each segment in each variable are generated by the
discretize_get_bins function, which returns a data frame containing the thresholds for each variable.
This result must be passed as the 'data_bins' parameter.
Note that the returned data frame contains the non-transformed variables plus the transformed ones.
More info about converting numerical into categorical variables
can be found at: https://livebook.datascienceheroes.com/data-preparation.html#data_types
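The two-step idea described above (compute thresholds once, then apply them to any data frame) can be sketched in base R. This is only an illustration of equal frequency binning and not the package's internal code: the vector `x`, the use of `quantile` and `cut`, and the choice to open the outer limits with -Inf and Inf are assumptions made for this example.

```r
# Illustrative sketch only: equal frequency binning in base R.
# It mimics the two-step workflow, not funModeling's implementation.
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
n_bins <- 5

# Step 1: compute the thresholds once (similar in spirit to discretize_get_bins).
cuts <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)

# Assumption for this sketch: open the outer limits so that more extreme
# values in new data still fall into the first or last bin.
cuts[1] <- -Inf
cuts[length(cuts)] <- Inf

# Step 2: apply the stored thresholds (similar in spirit to discretize_df).
x_binned <- cut(x, breaks = cuts, ordered_result = TRUE)
bin_counts <- table(x_binned)

# The same thresholds can be reused on data that was not seen in step 1.
new_x <- c(-5, 100)
new_binned <- cut(new_x, breaks = cuts, ordered_result = TRUE)
```

Because the thresholds are stored separately from the data, the same cut points can be applied later to new observations, which is the reason the function takes 'data_bins' as an input instead of recomputing it.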
discretize_df(data, data_bins, stringsAsFactors = T)
Argument | Description |
---|---|
data | Input data frame. |
data_bins | Data frame generated by the 'discretize_get_bins' function. It contains the variable name and the thresholds for each bin, or segment. |
stringsAsFactors | Boolean variable which indicates whether the discretization result is character or factor. When TRUE, the segments are ordered. TRUE by default. |
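To show why the ordering mentioned for 'stringsAsFactors' matters, here is a small base R sketch comparing a character vector with an ordered factor. The labels "low", "mid" and "high" are hypothetical and are not the segment labels the function produces.

```r
# Illustrative sketch only: character versus ordered factor segments.
# The labels below are hypothetical, chosen for readability.
segments_chr <- c("low", "mid", "high")

# As character, sorting is alphabetical, so the natural order is lost.
sorted_chr <- sort(segments_chr)

# As an ordered factor, the declared level order is kept.
segments_fct <- factor(segments_chr, levels = c("low", "mid", "high"), ordered = TRUE)
sorted_fct <- as.character(sort(segments_fct))

# Ordered factors also support comparisons between segments.
low_below_high <- segments_fct[1] < segments_fct[3]
```

An ordered result is usually the more convenient choice for plots and tables, where segments should appear from lowest to highest; the character form may be preferable when the output is exported as plain text.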
Data frame with the transformed variables
if (FALSE) {
# Get the bin thresholds for each variable. If 'input' is missing, it will run for all numerical variables.
d_bins = discretize_get_bins(data = heart_disease, input = c("resting_blood_pressure", "oldpeak"), n_bins = 5)

# Now they can be applied to the same data frame, or to a new one (for example, in a predictive model whose data changes over time).
heart_disease_discretized = discretize_df(data = heart_disease, data_bins = d_bins, stringsAsFactors = TRUE)
}