When building models from a data frame (xgboost, random forest, regression, etc.), each model has its own constraints on data types. Many errors at model-creation time are caused only by the data format. Given a model name, this function reports which of that model's constraints the data does not satisfy, so the problems can be anticipated and corrected before the model is created. It is closely related to `data_integrity`.

data_integrity_model(data, model_name, MAX_UNIQUE = 35)

Arguments

data

data frame or a single vector

model_name

Model name. All available models can be listed by printing the `metadata_models` data frame.

MAX_UNIQUE

Maximum number of unique values a categorical variable may have before it is flagged as high cardinality. Above 35 values it is normally necessary to reduce the number of distinct values.

Examples

# Example 1:
data_integrity_model(data=heart_disease, model_name="pca")

# Example 2:
# changing the default threshold used to flag a variable as high cardinality
data_integrity_model(data=iris, model_name="xgboost", MAX_UNIQUE=50)
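
To illustrate what the `MAX_UNIQUE` threshold means, the following base R sketch counts distinct values in each categorical column and lists those above the threshold. The helper `flag_high_cardinality` and the toy data frame `df` are hypothetical and are not part of the package. The sketch only approximates the idea and makes no claim about how `data_integrity_model` is implemented, including whether it compares with `>` or `>=`.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# returns the names of categorical columns with more than `max_unique` distinct values.
flag_high_cardinality <- function(data, max_unique = 35) {
  # Treat factor and character columns as categorical
  is_cat <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  # Count distinct non-missing values in each categorical column
  n_unique <- vapply(
    data[is_cat],
    function(x) length(unique(x[!is.na(x)])),
    integer(1)
  )
  # as.character() turns the NULL produced by an empty selection into character(0)
  as.character(names(n_unique)[n_unique > max_unique])
}

# Toy data: `id` has 50 distinct values, `group` has 2, `value` is numeric
df <- data.frame(
  id = as.character(1:50),
  group = rep(c("a", "b"), 25),
  value = seq_len(50),
  stringsAsFactors = FALSE
)

flag_high_cardinality(df)                   # "id"
flag_high_cardinality(df, max_unique = 50)  # character(0)
```

Raising the threshold to 50 stops `id` from being flagged, which mirrors the effect of passing `MAX_UNIQUE=50` in Example 2.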

Value

an `integritymodel` object