For each variable it returns the quantity and percentage of zeros (q_zeros and p_zeros respectively), the same metrics for NA values (q_na/p_na), and the same for infinite values (q_inf/p_inf). The last two columns indicate the data type and the quantity of unique values. The 'status' function is the evolution of 'df_status'. The main change is that the proportion columns keep their decimal values instead of being expressed as percentages. For example, p_na=0.04 now means 4%. This makes the result easier to embed in a data process flow and to take actions based on this number.

status(data)

Arguments

data

data frame, tibble or a single vector

Value

Tibble with metrics

Examples

status(heart_disease)
#>                  variable q_zeros   p_zeros q_na       p_na q_inf p_inf    type
#> 1                     age       0 0.0000000    0 0.00000000     0     0 integer
#> 2                  gender       0 0.0000000    0 0.00000000     0     0  factor
#> 3              chest_pain       0 0.0000000    0 0.00000000     0     0  factor
#> 4  resting_blood_pressure       0 0.0000000    0 0.00000000     0     0 integer
#> 5       serum_cholestoral       0 0.0000000    0 0.00000000     0     0 integer
#> 6     fasting_blood_sugar     258 0.8514851    0 0.00000000     0     0  factor
#> 7         resting_electro     151 0.4983498    0 0.00000000     0     0  factor
#> 8          max_heart_rate       0 0.0000000    0 0.00000000     0     0 integer
#> 9             exer_angina     204 0.6732673    0 0.00000000     0     0 integer
#> 10                oldpeak      99 0.3267327    0 0.00000000     0     0 numeric
#> 11                  slope       0 0.0000000    0 0.00000000     0     0 integer
#> 12      num_vessels_flour     176 0.5808581    4 0.01320132     0     0 integer
#> 13                   thal       0 0.0000000    2 0.00660066     0     0  factor
#> 14 heart_disease_severity     164 0.5412541    0 0.00000000     0     0 integer
#> 15           exter_angina     204 0.6732673    0 0.00000000     0     0  factor
#> 16      has_heart_disease       0 0.0000000    0 0.00000000     0     0  factor
#>    unique
#> 1      41
#> 2       2
#> 3       4
#> 4      50
#> 5     152
#> 6       2
#> 7       3
#> 8      91
#> 9       2
#> 10     40
#> 11      3
#> 12      4
#> 13      3
#> 14      5
#> 15      2
#> 16      2
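
The description mentions taking actions based on these numbers. A minimal sketch of one such action follows, assuming the column names shown in the output above. The 0.01 cutoff and the names `na_threshold`, `high_na` and `clean` are hypothetical, chosen only for illustration.

```r
library(funModeling)  # provides status() and the heart_disease data set

st <- status(heart_disease)

# Hypothetical cutoff: flag variables with more than 1% of NA values.
# p_na is a proportion (0.01 means 1%), so no division by 100 is needed.
na_threshold <- 0.01

# Names of the variables whose share of NA values exceeds the cutoff
high_na <- st$variable[st$p_na > na_threshold]
high_na

# Keep only the columns at or below the cutoff
clean <- heart_disease[, !(names(heart_disease) %in% high_na), drop = FALSE]
```

With the output shown above, only num_vessels_flour (p_na = 0.0132) is over this cutoff, while thal (p_na = 0.0066) is kept.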