Using machine learning to predict risk of type 2 diabetes

classification logistic regression LASSO regression random forest decision tree naive bayes knn xgboost

Type 2 diabetes is one of the most prevalent chronic diseases in the United States, affecting the health of millions of people, and putting an enormous financial burden on the US economy.

Mark Y
2024-02-14

Introduction

This “assignment” was inspired by the work of Xie Z, Nikolayeva O, Luo J, Li D: Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques. Their paper can be accessed via this link.

My objective is to practice and learn how to build predictive models using machine learning techniques, in the spirit of the original study, but using the most recent survey data (2022). It would be a bonus if my models came close to the performance of Dr Xie’s.

To recap, the original definition of an individual with Type 2 Diabetes is:
- an individual aged 30 years or older (respondents younger than 30 years old were excluded as they most likely had Type 1 diabetes),
- an individual who had been told by a healthcare professional that he/she had Type 2 diabetes,
- respondents who had pre-diabetes, or respondents who had diabetes while pregnant, were excluded from the study.
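
As a minimal sketch of how the definition above could be encoded, the snippet below applies it to a made-up data frame. The `toy` data and the `diabetes` column name are illustrative assumptions; only the `DIABETE4` codes (1 = yes, 2 = only during pregnancy, 3 = no, 4 = pre-diabetes, 7/9 = don't know/refused) come from the BRFSS codebook. Note that the recoding used later in this post labels codes 2 and 4 as "no" rather than excluding them, so this is not the exact pipeline used here.

```r
library(dplyr)

# Toy data: made-up respondents, using the DIABETE4 codes from the BRFSS codebook
toy <- tibble(
  age80    = c(25, 45, 52, 61, 38, 70, 33),
  diabete4 = c(1,  1,  2,  3,  4,  7,  3)
)

cohort <- toy %>%
  filter(age80 >= 30) %>%            # drop respondents younger than 30
  filter(diabete4 %in% c(1, 3)) %>%  # drop gestational (2), pre-diabetes (4), unknown/refused (7, 9)
  mutate(diabetes = factor(if_else(diabete4 == 1, "yes", "no"),
                           levels = c("no", "yes")))

count(cohort, diabetes)
```

Filtering before labelling keeps the excluded groups out of the "no" class entirely, which is the main difference from simply mapping every non-1 code to "no".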

rm(list = ls())
sessionInfo()
# Set packages and dependencies
pacman::p_load("tidyverse", #for tidy data science practice
               "tidymodels", "workflows",# for tidy machine learning
               "pacman", #package manager
               "devtools", #developer tools
               "Hmisc", "skimr", "broom", "modelr",#for EDA
               "jtools", "huxtable", "interactions", # for EDA
               "ggthemes", "ggstatsplot", "GGally",
               "scales", "gridExtra", "patchwork", "ggalt", "vip",
               "ggstance", "ggfortify", # for ggplot
               "DT", "plotly", #interactive Data Viz
               # Let's install some ML-related packages that support tidymodels
               "usemodels", "poissonreg", "agua", "sparklyr", "dials", # modeling helpers and computational engines
               "doParallel", # for parallel processing (speedy computation)
               "ranger", "xgboost", "glmnet", "kknn", "earth", "klaR", "discrim", "naivebayes", # model engines (random forest, boosting, LASSO, kNN, naive Bayes)
               "janitor", "lubridate", "haven")

Data Source

I obtained the latest Behavioral Risk Factor Surveillance System data (BRFSS 2022) from the Centers for Disease Control and Prevention.

The Behavioral Risk Factor Surveillance System (BRFSS) is the US’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.

The BRFSS 2022 data from the CDC is distributed as a SAS transport (.XPT) file. I imported it into R using read_xpt from the haven package. It has 445,132 rows representing individual survey responses and 328 columns representing variables.

df <- read_xpt("LLCP2022.XPT")

I included most of the independent variables from the original study, as well as several new variables of interest. Below is a summary of dependent and independent variables used:

Variable | Description | Values
diabete4 | (Ever told) (you had) diabetes? | yes, no
bmi5cat | Four categories of BMI (body mass index) | 1. underweight, 2. normal weight, 3. overweight, 4.
smoker3 | Four levels of smoker status | 1. everyday smoker, 2. someday smoker, 3. former smoker, 4. non-smoker
cvdstrk3 | (Ever told) (you had) a stroke? | 1. yes, 2. no
cvdcrhd4 | (Ever told) (you had) angina or coronary heart disease? | 1. yes, 2. no

GENHLTH Question: Would you say that in general your health is: 1 Excellent 71,878 16.15 17.40 2 Very good 148,444 33.35 31.84 3 Good 143,598 32.26 32.48 4 Fair 60,273 13.54 13.69 5 Poor 19,741 4.43 4.29 7 Don’t know/Not Sure 810 0.18 0.19 9 Refused 385 0.09 0.10 BLANK Not asked or Missing 3 . .

_AGEG5YR 1 Age 18 to 24 Notes: 18 <= AGE <= 24 26,941 6.05 11.90 2 Age 25 to 29 Notes: 25 <= AGE <= 29 21,990 4.94 7.72 3 Age 30 to 34 Notes: 30 <= AGE <= 34 25,807 5.80 9.38 4 Age 35 to 39 Notes: 35 <= AGE <= 39 28,526 6.41 7.63 5 Age 40 to 44 Notes: 40 <= AGE <= 44 29,942 6.73 8.41 6 Age 45 to 49 Notes: 45 <= AGE <= 49 28,531 6.41 6.49 7 Age 50 to 54 Notes: 50 <= AGE <= 54 33,644 7.56 7.72 8 Age 55 to 59 Notes: 55 <= AGE <= 59 36,821 8.27 7.31 9 Age 60 to 64 Notes: 60 <= AGE <= 64 44,511 10.00 8.67 10 Age 65 to 69 Notes: 65 <= AGE <= 69 47,099 10.58 6.98 11 Age 70 to 74 Notes: 70 <= AGE <= 74 43,472 9.77 6.32 12 Age 75 to 79 Notes: 75 <= AGE <= 79 32,518 7.31 4.37 13 Age 80 or older Notes: 80 <= AGE <= 99 36,251 8.14 4.94 14 Don’t know/Refused/Missing Notes: 7 <= AGE <= 9 9,079 2.04 2.15

_BMI5CAT Question: Four-categories of Body Mass Index (BMI) 1 Underweight Notes: _BMI5 < 1850 (_BMI5 has 2 implied decimal places) 6,778 1.71 2.03 2 Normal Weight Notes: 1850 <= _BMI5 < 2500 116,976 29.52 30.50 3 Overweight Notes: 2500 <= _BMI5 < 3000 139,995 35.32 34.14 4 Obese Notes: 3000 <= _BMI5 < 9999 132,577 33.45 33.32

CHECKUP1 Question: About how long has it been since you last visited a doctor for a routine checkup? 1 Within past year (anytime less than 12 months ago) 350,944 78.84 74.97 2 Within past 2 years (1 year but less than 2 years ago) 41,919 9.42 10.74 3 Within past 5 years (2 years but less than 5 years ago) 24,882 5.59 6.75 4 5 or more years ago 19,079 4.29 5.13 7 Don’t know/Not sure 5,063 1.14 1.39 8 Never 2,509 0.56 0.83 9 Refused 733 0.16 0.20

INCOME3 Question: Is your annual household income from all sources: (If respondent refuses at any income level, code ´Refused.´) 1 Less than $10,000 10,341 2.39 2.95 2 Less than $15,000 ($10,000 to < $15,000) 11,031 2.55 2.43 3 Less than $20,000 ($15,000 to < $20,000) 14,300 3.31 3.44 4 Less than $25,000 ($20,000 to < $25,000) 20,343 4.71 4.71 5 Less than $35,000 ($25,000 to < $35,000) 42,294 9.79 9.92 6 Less than $50,000 ($35,000 to < $50,000) 46,831 10.84 10.20 7 Less than $75,000 ($50,000 to < $75,000) 59,148 13.69 12.42 8 Less than $100,000? ($75,000 to < $100,000) 48,436 11.21 10.42 9 Less than $150,000? ($100,000 to < $150,000)? 50,330 11.65 11.19 10 Less than $200,000? ($150,000 to < $200,000) 22,553 5.22 5.39 11 $200,000 or more 23,478 5.43 6.13 77 Don’t know/Not sure 36,114 8.36 10.44 99 Refused 47,001 10.87 10.37 BLANK Not asked or Missing 12,932 . .

FLUSHOT7 Question: During the past 12 months, have you had either flu vaccine that was sprayed in your nose or flu shot injected into your arm? 1 Yes 209,256 52.11 44.53 2 No—Go to Section 15.03 PNEUVAC4 188,755 47.01 54.46 7 Don’t know/Not Sure—Go to Section 15.03 PNEUVAC4 2,455 0.61 0.69 9 Refused—Go to Section 15.03 PNEUVAC4 1,073 0.27 0.32 BLANK 43,593 . .

EMPLOY1 Question: Are you currently…? 1 Employed for wages 186,004 42.38 47.34 2 Self-employed 38,768 8.83 9.46 3 Out of work for 1 year or more 8,668 1.97 2.54 4 Out of work for less than 1 year 8,044 1.83 2.56 5 A homemaker 17,477 3.98 4.94 6 A student 11,111 2.53 4.80 7 Retired 137,083 31.23 20.46 8 Unable to work 26,737 6.09 6.41 9 Refused 5,044 1.15 1.48 BLANK Not asked or Missing 6,196 . .

SEXVAR Question: Sex of Respondent 1 Male—Code=1 if LANDSEX1=1 or CELLSEX1=1 or COLGSEX1=1 209,239 47.01 48.69 2 Female—Code=2 if LANDSEX1=2 or CELLSEX1=2 or COLGSEX1=2 235,893 52.99 51.31

MARITAL Question: Are you: (marital status) 1 Married 227,424 51.09 49.33 2 Divorced 57,516 12.92 10.20 3 Widowed 48,019 10.79 7.03 4 Separated 8,702 1.95 2.36 5 Never married 80,001 17.97 24.71 6 A member of an unmarried couple 18,668 4.19 5.20 9 Refused 4,794 1.08 1.18 BLANK Not asked or Missing 8 . .

SLEPTIM1 Question: On average, how many hours of sleep do you get in a 24-hour period? 1 - 24 Number of hours [1-24] 439,679 98.78 98.57 77 Don’t know/Not Sure 4,792 1.08 1.23 99 Refused 658 0.15 0.21 BLANK Missing 3 . .

CVDCRHD4 Question: (Ever told) (you had) angina or coronary heart disease? 1 Yes 26,551 5.96 4.40 2 No 414,176 93.05 94.67 7 Don’t know/Not sure 4,044 0.91 0.84 9 Refused 359 0.08 0.10 BLANK Not asked or Missing 2 .

PRIMINSR Question: What is the current primary source of your health insurance? 1 A plan purchased through an employer or union (including plans purchased through another person´s employer) 161,388 36.26 39.07 2 A private nongovernmental plan that you or another family member buys on your own 36,931 8.30 9.28 3 Medicare 135,848 30.52 20.78 4 Medigap 536 0.12 0.15 5 Medicaid 29,072 6.53 8.51 6 Children´s Health Insurance Program (CHIP) 188 0.04 0.06 7 Military related health care: TRICARE (CHAMPUS) / VA health care / CHAMP- VA 15,373 3.45 3.28 8 Indian Health Service 1,385 0.31 0.17 9 State sponsored health plan 12,878 2.89 2.76 10 Other government program 10,630 2.39 2.70 88 No coverage of any type 23,018 5.17 8.07 77 Don’t know/Not Sure 9,890 2.22 3.22 99 Refused 7,991 1.80 1.95 BLANK Not asked or Missing 4 . .

MENTHLTH Question: Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good? 1 - 30 Number of days Notes: _ _ Number of days 170,836 38.38 41.49 88 None 265,229 59.58 56.10 77 Don’t know/Not sure 6,589 1.48 1.76 99 Refused 2,475 0.56 0.65 BLANK Not asked or Missing 3 . .

CHCKDNY2 Question: Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease? 1 Yes 20,315 4.56 3.68 2 No 422,891 95.00 95.87 7 Don’t know / Not sure 1,581 0.36 0.35 9 Refused 343 0.08 0.10 BLANK Not asked or Missing 2 . .

_TOTINDA Question: Adults who reported doing physical activity or exercise during the past 30 days other than their regular job 1 Had physical activity or exercise Notes: EXERANY2 = 1 337,559 75.83 75.85 2 No physical activity or exercise in last 30 days Notes: EXERANY2 = 2 106,480 23.92 23.85 9 Don’t know/Refused/Missing Notes: EXERANY2 = 7 or 9 or Missing 1,093 0.25 0.29

ADDEPEV3 Question: (Ever told) (you had) a depressive disorder (including depression, major depression, dysthymia, or minor depression)? 1 Yes 91,410 20.54 20.47 2 No 350,910 78.83 78.74 7 Don’t know/Not sure 2,140 0.48 0.62 9 Refused 665 0.15 0.17 BLANK Not asked or Missing 7 . .

RENTHOM1 Question: Do you own or rent your home? 1 Own 310,708 69.80 66.63 2 Rent 108,332 24.34 25.81 3 Other arrangement 21,463 4.82 6.11 7 Don’t know/Not Sure 1,099 0.25 0.49 9 Refused 3,521 0.79 0.96 BLANK Not asked or Missing Notes: Due to the nature of the data or the size of the table for display, this information is not printed for this report 9 . .

EXERANY2 Question: During the past month, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise? 1 Yes 337,559 75.83 75.85 2 No 106,480 23.92 23.85 7 Don’t know/Not Sure 724 0.16 0.18 9 Refused 367 0.08 0.11 BLANK Not asked or Missing 2 . .

BLIND Question: Are you blind or do you have serious difficulty seeing, even when wearing glasses? 1 Yes 23,658 5.56 5.78 2 No 399,910 94.04 93.75 7 Don’t know/Not Sure 1,042 0.25 0.27 9 Refused 667 0.16 0.20 BLANK Not asked or Missing 19,855 . .

DECIDE Question: Because of a physical, mental, or emotional condition, do you have serious difficulty concentrating, remembering, or making decisions? 1 Yes 50,100 11.81 13.34 2 No 370,792 87.42 85.81 7 Don’t know/Not Sure 2,266 0.53 0.56 9 Refused 988 0.23 0.29 BLANK Not asked or Missing 20,986 . .

_HLTHPLN Question: Adults who had some form of health insurance 1 Have some form of insurance Notes: PRIMINSR=1, 2, 3, 4, 5, 6, 7, 8, 9, 10 404,229 90.81 86.77 2 Do not have some form of health insurance Notes: PRIMINSR=88 23,018 5.17 8.07 9 Don’t know, refused or missing insurance response Notes: PRIMINSR=77, 99, or missing 17,885 4.02 5.16

DIABETE4 Question: (Ever told) (you had) diabetes? (If ´Yes´ and respondent is female, ask ´Was this only when you were pregnant?´. If Respondent says pre-diabetes or borderline diabetes, use response code 4.)

1 Yes 61,158 13.74 12.04 2 Yes, but female told only during pregnancy—Go to Section 08.01 AGE 3,836 0.86 1.01 3 No—Go to Section 08.01 AGE 368,722 82.83 84.34 4 No, pre-diabetes or borderline diabetes—Go to Section 08.01 AGE 10,329 2.32 2.27 7 Don’t know/Not Sure—Go to Section 08.01 AGE 763 0.17 0.23 9 Refused—Go to Section 08.01 AGE 321 0.07 0.11 BLANK Not asked or Missing 3 . .

_SMOKER3 Question: Four-level smoker status: Everyday smoker, Someday smoker, Former smoker, Non-smoker 1 Current smoker - now smokes every day Notes: SMOKE100 = 1 and SMOKEDAY = 1 36,003 8.09 8.09 2 Current smoker - now smokes some days Notes: SMOKE100 = 1 and SMOKEDAY = 2 13,938 3.13 3.54 3 Former smoker Notes: SMOKE100 = 1 and SMOKEDAY = 3 113,774 25.56 21.87 4 Never smoked Notes: SMOKE100 = 2 245,955 55.25 57.07 9 Don’t know/Refused/Missing Notes: SMOKE100 = 1 and SMOKEDAY = 9 or SMOKE100 = 7 or 9 or Missing 35,462 7.97 9.44

_DRNKWK2 Question: Calculated total number of alcoholic beverages consumed per week 0 Did not drink Notes: DROCDY4_=0 or AVEDRNK3=88 188,832 42.42 41.91 1 - 98999 Number of drinks per week Notes: 0 < DROCDY4_ < 990 206,595 46.41 44.78 99900 Don’t know/Not sure/Refused/Missing Notes: AVEDRNK3=.,77,99 or DROCDY4_=900 49,705 11.17 13.32

DRNKANY6 Question: Adults who reported having had at least one drink of alcohol in the past 30 days. 1 Yes Notes: 1 <= ALCDAY4 <= 231 210,891 47.38 46.04 2 No Notes: ALCDAY4=888 187,667 42.16 41.60 7 Don’t know/Not Sure Notes: ALCDAY4=777 3,447 0.77 0.94 9 Refused/Missing Notes: ALCDAY4=999, Missing 43,127 9.69 11.43

_CURECI2 Question: Adults who are current e-cigarette users 1 Not currently using E-cigarettes Notes: ECIGNOW2=1, 4 387,356 87.02 83.59 2 Current E-cigarette user Notes: ECIGNOW2=2,3 22,116 4.97 6.76 9 Don’t know/Refused/Missing Notes: ECIGNOW2=7,9, or missing 35,660 8.01 9.64

_RFSMOK3 Question: Adults who are current smokers 1 No Notes: _SMOKER3 = 3 or 4 359,729 80.81 78.93 2 Yes Notes: _SMOKER3 = 1 or 2 49,941 11.22 11.62 9 Don’t know/Refused/Missing Notes: _SMOKER3 = 9 35,462 7.97 9.44

_HADSIGM Question: Colonoscopy and sigmoidoscopy are exams to check for colon cancer. Have you ever had either of these exams? 1 Yes 213,158 72.82 68.17 2 No—Go to Section 11.06 COLNCNCR 76,372 26.09 30.53 7 Don’t know/Not Sure—Go to Section 11.06 COLNCNCR 1,811 0.62 0.74 9 Refused—Go to Section 11.06 COLNCNCR 1,378 0.47 0.55 BLANK Not asked or Missing Notes: Section 08.01, AGE, is less than 45; 152,413 . .

_INCOMG1 Question: Income categories 1 Less than $15,000 Notes: INCOME3=1,2 21,372 4.80 5.17 2 $15,000 to < $25,000 Notes: INCOME3=3,4 34,643 7.78 7.83 3 $25,000 to < $35,000 Notes: INCOME3=5 42,294 9.50 9.54 4 $35,000 to < $50,000 Notes: INCOME3=6 46,831 10.52 9.81 5 $50,000 to < $100,000 Notes: INCOME3=7,8 107,584 24.17 21.96 6 $100,000 to < $200,000 Notes: INCOME3=9,10 72,883 16.37 15.95 7 $200,000 or more Notes: INCOME3=11 23,478 5.27 5.89 9 Don’t know/Not sure/Missing Notes: INCOME3=77, 99, or missing 96,047 21.58 23.84

_EDUCAG Question: Level of education completed 1 Did not graduate High School Notes: EDUCA = 1 or 2 or 3 26,011 5.84 11.63 2 Graduated High School Notes: EDUCA = 4 108,990 24.48 27.39 3 Attended College or Technical School Notes: EDUCA = 5 120,252 27.01 30.04 4 Graduated from College or Technical School Notes: EDUCA = 6 187,496 42.12 30.34 9 Don’t know/Not sure/Missing Notes: EDUCA = 9 or Missing 2,383 0.54 0.60

_CHLDCNT Question: Number of children in household 1 No children in household Notes: CHILDREN = 88 321,907 72.32 64.10 2 One child in household Notes: CHILDREN = 01 46,241 10.39 13.23 3 Two children in household Notes: CHILDREN = 02 37,923 8.52 10.83 4 Three children in household Notes: CHILDREN = 03 15,975 3.59 4.78 5 Four children in household Notes: CHILDREN = 04 5,521 1.24 1.66 6 Five or more children in household Notes: 05 <= CHILDREN < 88 3,100 0.70 0.97 9 Don’t know/Not sure/Missing Notes: CHILDREN = 99 14,464 3.25 4.43 BLANK 1 . .

_BMI5 Question: Body Mass Index (BMI)

WTKG3 Question: Reported weight in kilograms

HTM4 Question: Reported height in meters

_AGE80 Question: Imputed Age value collapsed above 80 18 - 24 Imputed Age 18 to 24 26,943 6.05 11.90 25 - 29 Imputed Age 25 to 29 22,000 4.94 7.73 30 - 34 Imputed Age 30 to 34 25,840 5.81 9.41 35 - 39 Imputed Age 35 to 39 28,771 6.46 7.79 40 - 44 Imputed Age 40 to 44 30,403 6.83 8.68 45 - 49 Imputed Age 45 to 49 29,580 6.65 6.86 50 - 54 Imputed Age 50 to 54 37,404 8.40 8.54 55 - 59 Imputed Age 55 to 59 38,059 8.55 7.44 60 - 64 Imputed Age 60 to 64 44,681 10.04 8.71 65 - 69 Imputed Age 65 to 69 47,642 10.70 7.07 70 - 74 Imputed Age 70 to 74 44,940 10.10 6.53 75 - 79 Imputed Age 75 to 79 32,616 7.33 4.40 80 - 99 Imputed Age 80 or older 36,253 8.14 4.94

_RACEPR1 Question: Computed race groups used for internet prevalence tables 1 White only, non-Hispanic Notes: _RACE=1 or _RACE=9 and _IMPRACE=1 333,514 74.92 59.20 2 Black only, non-Hispanic Notes: _RACE=2 or _RACE=9 and _IMPRACE=2 35,876 8.06 11.62 3 American Indian or Alaskan Native only, Non-Hispanic Notes: _RACE=3 or _RACE=9 and _IMPRACE=4 7,120 1.60 1.21 4 Asian only, non-Hispanic Notes: _RACE=4 or _RACE=9 and _IMPRACE=3 13,487 3.03 6.11 5 Native Hawaiian or other Pacific Islander only, Non-Hispanic Notes: _RACE=5 2,414 0.54 0.48 6 Multiracial, non-Hispanic Notes: _RACE=6 9,744 2.19 3.12 7 Hispanic Notes: _RACE=7 or _RACE=9 and _IMPRACE==5 42,977 9.65 18.25

_DRDXAR2 Question: Respondents who have had a doctor diagnose them as having some form of arthritis 1 Diagnosed with arthritis Notes: HAVARTH4 = 1 151,148 34.16 26.64 2 Not diagnosed with arthritis Notes: HAVARTH4 = 2 291,351 65.84 73.36 BLANK Don´t know/Not Sure/Refused/Missing Notes: HAVARTH4 = 7 or 9 or Missing 2,633 . .

ASTHMA3 Question: (Ever told) (you had) asthma? 1 Yes 66,694 14.98 15.17 2 No—Go to Section 07.06 CHCSCNC1 376,665 84.62 84.34 7 Don’t know/Not Sure—Go to Section 07.06 CHCSCNC1 1,494 0.34 0.42 9 Refused—Go to Section 07.06 CHCSCNC1 277 0.06 0.08 BLANK Not asked or Missing 2 . .

_DENVST3 Question: Adults who have visited a dentist, dental hygenist or dental clinic within the past year 1 Yes Notes: LASTDEN4=1 292,408 65.69 62.66 2 No Notes: LASTDEN4=2 or 3 or 4 145,703 32.73 35.42 9 Don’t know/Not Sure Or Refused/Missing Notes: LASTDEN4=7 or 9 or Missing 7,017 1.58 1.93 BLANK Missing 4 . .

SDHISOLT Question: How often do you feel socially isolated from others? Is it… 1 Always 8,098 3.19 4.06 2 Usually 13,178 5.19 5.63 3 Sometimes 53,072 20.91 21.62 4 Rarely 70,617 27.82 26.18 5 Never 106,160 41.83 41.21 7 Don’t know/Not Sure 1,696 0.67 0.79 9 Refused 969 0.38 0.50 BLANK Not asked or Missing 191,342 . .

LSATISFY Question: In general, how satisfied are you with your life? 1 Very satisfied 114,252 44.89 42.07 2 Satisfied 123,445 48.51 50.46 3 Dissatisfied 10,758 4.23 4.67 4 Very dissatisfied 3,062 1.20 1.38 7 Don’t know/Not sure 1,864 0.73 0.90 9 Refused 1,107 0.43 0.51 BLANK Not asked or Missing 190,644 . .

DIFFWALK Question: Do you have serious difficulty walking or climbing stairs? 1 Yes 68,081 16.10 13.75 2 No 353,039 83.47 85.78 7 Don’t know/Not Sure 1,221 0.29 0.28 9 Refused 636 0.15 0.19 BLANK Not asked or Missing 22,155 . .

DIFFDRES Question: Do you have difficulty dressing or bathing? 1 Yes 16,813 3.98 3.85 2 No 404,404 95.77 95.81 7 Don’t know/Not Sure 488 0.12 0.15 9 Refused 548 0.13 0.19 BLANK Not asked or Missing 22,879 . .

DEAF Question: Are you deaf or do you have serious difficulty hearing? 1 Yes 38,946 9.13 7.06 2 No 385,539 90.40 92.44 7 Don’t know/Not Sure 1,246 0.29 0.27 9 Refused 757 0.18 0.23 BLANK Not asked or Missing 18,644 . .

PHYSHLTH Question: Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good? 1 - 30 Number of days 166,386 37.38 36.75 88 None 267,819 60.17 60.54 77 Don’t know/Not sure 8,875 1.99 2.21 99 Refused 2,047 0.46 0.50 BLANK Not asked or Missing 5 . .

CDASSIST Question: As a result of confusion or memory loss, how often do you need assistance with these day-to-day activities? 1 Always 304 4.09 4.57 2 Usually 281 3.78 4.90 3 Sometimes 1,354 18.22 21.00 4 Rarely—Go to Module 13.05 CDSOCIAL 1,447 19.47 19.25 5 Never—Go to Module 13.05 CDSOCIAL 3,954 53.21 49.12 7 Don’t know/Not sure—Go to Module 13.05 CDSOCIAL 78 1.05 1.04 9 Refused—Go to Module 13.05 CDSOCIAL 13 0.17 0.13 BLANK Not asked or Missing Notes: Section 08.01, AGE, is less than 45; or Module 13.01, CIMEMLOS, is coded 2 or 9 437,701 . .

CVDSTRK3 Question: (Ever told) (you had) a stroke. 1 Yes 19,239 4.32 3.56 2 No 424,336 95.33 96.01 7 Don’t know/Not sure 1,274 0.29 0.35 9 Refused 281 0.06 0.08 BLANK Not asked or Missing 2 . .
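
The codebook notes that _BMI5 is stored with 2 implied decimal places, so the raw value has to be divided by 100 before use. The sketch below works through that arithmetic for one made-up respondent. It assumes, as the cleaning code in this post does, that WTKG3 and HTM4 are rescaled the same way; the raw values are invented for illustration.

```r
# Raw values as stored in the BRFSS file (made-up respondent)
raw <- data.frame(`_BMI5` = 2744, WTKG3 = 8212, HTM4 = 173, check.names = FALSE)

bmi  <- raw$`_BMI5` / 100  # 27.44 kg/m^2
wtkg <- raw$WTKG3 / 100    # 82.12 kg
htm  <- raw$HTM4 / 100     # 1.73 m

# BMI recomputed from weight and height should be close to the stored value
bmi_check <- wtkg / htm^2

# _BMI5CAT cut points from the codebook, applied to the rescaled value
bmi_cat <- cut(bmi, breaks = c(-Inf, 18.5, 25, 30, Inf), right = FALSE,
               labels = c("underweight", "normal weight", "overweight", "obese"))
bmi_cat
```

Using `right = FALSE` makes each interval closed on the left, which matches the codebook ranges such as 2500 <= _BMI5 < 3000 for the overweight category.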

data <-
  df %>% 
  dplyr::select("DIABETE4", # response variable
                # personal health
                "_BMI5CAT", "_BMI5", #bmi cat, bmi numeric
                "_SMOKER3", "CVDSTRK3", "CVDCRHD4", #smoke, stroke, heart disease
                "_CURECI2", # e-cig
                #demographics
                # age, income cat, employ, gender, marital, education, home (rent/own)
                "_AGEG5YR", "INCOME3", "EMPLOY1", "SEXVAR", "MARITAL", "_EDUCAG", "RENTHOM1",
                # number children, age numeric, race
                "_CHLDCNT", "_AGE80", "_RACEPR1",
                #self assessment
                "GENHLTH", "PRIMINSR", "MENTHLTH", "BLIND", "DECIDE", "_HLTHPLN", "WTKG3", "HTM4", 
                "DIFFWALK", "DIFFDRES", "DEAF", "PHYSHLTH",
                #habits
                "SLEPTIM1", "_TOTINDA", "EXERANY2",  "_DRNKWK2", "DRNKANY6", 
                #medical (CVDCRHD4 is already selected under personal health above)
                "CHECKUP1", "FLUSHOT7", "CHCKDNY2", "ADDEPEV3",
                "_DRDXAR2", "ASTHMA3", "_DENVST3") %>% 
  janitor::clean_names() %>% 
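  mutate(diabete4 = as.factor(case_when(diabete4 == 1 ~ "yes",
                              # codes 2 (told only during pregnancy) and 4 (pre-diabetes) are labelled "no" here
                              diabete4 == 2 ~ "no",
                              diabete4 == 3 ~ "no",
                              diabete4 == 4 ~ "no")
                              # codes 7, 9 and blank fall through to NA and are removed by na.omit() below
                              ),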
         bmi5cat = factor(bmi5cat),
         bmi5 = as.numeric(bmi5/100),
         smoker3 = as.factor(case_when(smoker3 == 1 ~ "smoker",
                                       smoker3 == 2 ~ "smoker",
                                       smoker3 == 3 ~ "former smoker",
                                       smoker3 == 4 ~ "non-smoker")
                             ),
         cvdstrk3 = as.factor(case_when(cvdstrk3 == 7 ~ NA_character_,
                                        cvdstrk3 == 9 ~ NA_character_,
                                        .default = as.factor(cvdstrk3)
                                      )
                            ),
         cvdcrhd4 = as.factor(case_when(cvdcrhd4 == 7 ~ NA_character_,
                                        cvdcrhd4 == 9 ~ NA_character_,
                                        .default = as.factor(cvdcrhd4)
                                      )
                            ),
         cureci2 = as.factor(case_when(cureci2 == 9 ~ NA_character_,
                                       .default = as.factor(cureci2)
                                      )
                            ),
         ageg5yr = case_when(ageg5yr == 14 ~ NA_character_,
                                       .default = as.character(ageg5yr)
                                       ),
         ageg5yr = as.numeric(ageg5yr),
                             
         income3 = as.factor(case_when(income3 == 77 ~ NA_character_,
                                       income3 == 99 ~ NA_character_,
                                       .default = as.factor(income3)
                                       )
                             ),
         employ1 = as.factor(case_when(employ1 == 9 ~ NA_character_,
                                       .default = as.factor(employ1)
                                       )
                             ),
         sexvar = as.factor(sexvar),
         marital = as.factor(case_when(marital == 9 ~ NA_character_,
                                       .default = as.factor(marital)
                                       )),
         educag = as.factor(case_when(educag == 9 ~ NA_character_,
                                      .default = as.factor(educag)
                                      )
                            ),
         renthom1 = as.factor(case_when(renthom1 == 7 ~ NA_character_,
                                        renthom1 == 9 ~ NA_character_,
                                        .default = as.factor(renthom1)
                                      )
                            ),
         chldcnt = as.factor(case_when(chldcnt == 1 ~ "0",
                                       chldcnt == 2 ~ "1",
                                       chldcnt == 3 ~ "2",
                                       chldcnt == 4 ~ "3",
                                       chldcnt == 5 ~ "4",
                                       chldcnt == 6 ~ "5 or more",
                                       chldcnt == 9 ~ NA_character_)
                            ),
         age80 = as.numeric(age80),
         racepr1 = as.factor(racepr1),
         genhlth = as.factor(case_when(genhlth == 9 ~ NA_character_,
                                       .default = as.factor(genhlth)
                                       )
                             ),
         priminsr = as.factor(case_when(priminsr == 88 ~ "11", # no coverage
                                        priminsr == 77 ~ NA_character_,
                                        priminsr == 99 ~ NA_character_,
                                        .default = as.factor(priminsr)
                                        )
                              ),
         menthlth = as.numeric(ifelse(menthlth == 88, 0, menthlth)), #filter out 77 and 99 later
         blind = as.factor(case_when(blind == 7 ~ NA_character_,
                                     blind == 9 ~ NA_character_,
                                     .default = as.factor(blind)
                                     )
                           ),
         decide = as.factor(case_when(decide == 7 ~ NA_character_,
                                     decide == 9 ~ NA_character_,
                                     .default = as.factor(decide)
                                     )
                            ),
         hlthpln = as.factor(case_when(hlthpln == 9 ~ NA_character_,
                                       .default = as.factor(hlthpln)
                                       )
                             ),
         wtkg3 = as.numeric(wtkg3 / 100),
         htm4 = as.numeric(htm4 / 100),
         diffwalk = as.factor(case_when(diffwalk == 7 ~ NA_character_,
                                     diffwalk == 9 ~ NA_character_,
                                     .default = as.factor(diffwalk)
                                     )
                            ),
         diffdres = as.factor(case_when(diffdres == 7 ~ NA_character_,
                                     diffdres == 9 ~ NA_character_,
                                     .default = as.factor(diffdres)
                                     )
                            ),
         deaf = as.factor(case_when(deaf == 7 ~ NA_character_,
                                     deaf == 9 ~ NA_character_,
                                     .default = as.factor(deaf)
                                     )
                            ),
         physhlth = as.numeric(ifelse(physhlth == 88, 0, physhlth)), #filter out 77 and 99 later

         sleptim1 = as.numeric(sleptim1), # filter out 77 and 99
         totinda = as.factor(case_when(totinda == 9 ~ NA_character_,
                                       .default = as.factor(totinda)
                                       )
                             ),
         exerany2 = as.factor(case_when(exerany2 == 7 ~ NA_character_, # don't know / not sure
                                        exerany2 == 9 ~ NA_character_, # refused
                                        .default = as.factor(exerany2)
                                        )
                             ),
         drnkwk2 = as.numeric(ifelse(drnkwk2 == 99900, NA_real_, drnkwk2) # 99900 = don't know / refused / missing
                              ),
         drnkany6 = as.factor(case_when(drnkany6 == 7 ~ NA_character_,
                                        drnkany6 == 9 ~ NA_character_,
                                        .default = as.factor(drnkany6)
                                     )
                            ),
         checkup1 = as.factor(case_when(checkup1 == 7 ~ NA_character_,
                                        checkup1 == 8 ~ NA_character_,
                                        checkup1 == 9 ~ NA_character_,
                                       .default = as.factor(checkup1)
                                       )
                             ),
         flushot7 = as.factor(case_when(flushot7 == 7 ~ NA_character_,
                                        flushot7 == 9 ~ NA_character_,
                                        .default = as.factor(flushot7)
                                     )
                            ),
         chckdny2 = as.factor(case_when(chckdny2 == 7 ~ NA_character_,
                                        chckdny2 == 9 ~ NA_character_,
                                        .default = as.factor(chckdny2)
                                     )
                            ),
         addepev3 = as.factor(case_when(addepev3 == 7 ~ NA_character_,
                                        addepev3 == 9 ~ NA_character_,
                                        .default = as.factor(addepev3)
                                     )
                            ),
         drdxar2 = as.factor(drdxar2),
         asthma3 = as.factor(case_when(asthma3 == 7 ~ NA_character_,
                                        asthma3 == 9 ~ NA_character_,
                                        .default = as.factor(asthma3)
                                     )
                            ),
         denvst3 = as.factor(case_when(denvst3 == 9 ~ NA_character_,
                                       .default = as.factor(denvst3)
                                     )
                            )
         )
data <-
  data %>% 
  filter(ageg5yr > 2 & age80 >= 30 & menthlth < 77 & physhlth < 77 & sleptim1 < 77) %>%  # age >= 30 years (type 2 diabetes definition); also drops the 77 (don't know) and 99 (refused) codes
  mutate(ageg5yr = as.factor(ageg5yr)
         ) %>% 
  na.omit()

skim(data)
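
The cleaning step above repeats the same pattern many times: turn the "don't know" and "refused" codes into NA, then convert the column to a factor. As a hedged sketch, that pattern could be wrapped in a small helper. The function `recode_sentinel()` and the `toy` data below are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: set the listed sentinel codes to NA, then make a factor
recode_sentinel <- function(x, na_codes = c(7, 9)) {
  x <- as.numeric(x)
  x[x %in% na_codes] <- NA
  factor(x)
}

# Made-up responses using BRFSS-style codes
toy <- tibble(
  cvdstrk3 = c(1, 2, 7, 2, 9),
  blind    = c(2, 2, 1, 9, 2),
  income3  = c(5, 77, 11, 99, 3)
)

toy_clean <- toy %>%
  mutate(
    across(c(cvdstrk3, blind), recode_sentinel),               # default sentinels 7 and 9
    income3 = recode_sentinel(income3, na_codes = c(77, 99))   # two-digit sentinels
  )

summary(toy_clean)
```

Passing the sentinel codes as an argument keeps columns with different conventions (7/9 versus 77/99) on the same code path, which makes it harder to forget a code for one variable.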

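# write_csv(data, "diabetes_cleaned_data.csv")
# Reload the cleaned data. read_csv() parses the coded factor columns as numbers,
# so they are included in the numeric correlation check below.
data <- read_csv("diabetes_cleaned_data.csv")
# check correlation between numeric columns
data %>% 
  select(where(is.numeric)) %>% 
  as.matrix() %>% 
  rcorr() %>% 
  tidy() %>% 
  arrange(desc(abs(estimate)))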
  ┌───────────────────────────────────────────────────────┐
  │ column1    column2     estimate        n      p.value │
  ├───────────────────────────────────────────────────────┤
  │ exerany2   totinda     1          243049     0        │
  │ age80      ageg5yr     0.995      243049     0        │
  │ wtkg3      bmi5        0.859      243049     0        │
  │ bmi5       bmi5cat     0.826      243049     0        │
  │ wtkg3      bmi5cat     0.738      243049     0        │
  │ htm4       sexvar     -0.698      243049     0        │
  │ age80      employ1     0.611      243049     0        │
  │ employ1    ageg5yr     0.61       243049     0        │
  │ hlthpln    priminsr    0.601      243049     0        │
  │ physhlth   genhlth     0.499      243049     0        │
  │ htm4       wtkg3       0.48       243049     0        │
  │ physhlth   diffwalk   -0.44       243049     0        │
  │ educag     income3     0.433      243049     0        │
  │ addepev3   menthlth   -0.42       243049     0        │
  │ diffwalk   genhlth    -0.418      243049     0        │
  │ diffdres   diffwalk    0.388      243049     0        │
  │ decide     menthlth   -0.379      243049     0        │
  │ employ1    income3    -0.373      243049     0        │
  │ wtkg3      sexvar     -0.355      243049     0        │
  │ priminsr   income3    -0.35       243049     0        │
  │ genhlth    income3    -0.344      243049     0        │
  │ drdxar2    age80      -0.34       243049     0        │
  │ drdxar2    ageg5yr    -0.338      243049     0        │
  │ drnkany6   drnkwk2    -0.332      243049     0        │
  │ physhlth   diffdres   -0.331      243049     0        │
  │ addepev3   decide      0.33       243049     0        │
  │ physhlth   menthlth    0.323      243049     0        │
  │ renthom1   income3    -0.323      243049     0        │
  │ diffwalk   employ1    -0.318      243049     0        │
  │ marital    income3    -0.315      243049     0        │
  │ drdxar2    diffwalk    0.311      243049     0        │
  │ drdxar2    employ1    -0.306      243049     0        │
  │ diffwalk   income3     0.305      243049     0        │
  │ renthom1   marital     0.298      243049     0        │
  │ totinda    genhlth     0.289      243049     0        │
  │ exerany2   genhlth     0.289      243049     0        │
  │ totinda    diffwalk   -0.287      243049     0        │
  │ exerany2   diffwalk   -0.287      243049     0        │
  │ drnkany6   income3    -0.281      243049     0        │
  │ menthlth   genhlth     0.281      243049     0        │
  │ denvst3    income3    -0.271      243049     0        │
  │ diffdres   genhlth    -0.264      243049     0        │
  │ drdxar2    genhlth    -0.264      243049     0        │
  │ flushot7   age80      -0.264      243049     0        │
  │ flushot7   ageg5yr    -0.263      243049     0        │
  │ decide     genhlth    -0.261      243049     0        │
  │ physhlth   decide     -0.257      243049     0        │
  │ totinda    physhlth    0.253      243049     0        │
  │ exerany2   physhlth    0.253      243049     0        │
  │ checkup1   hlthpln     0.247      243049     0        │
  │ genhlth    employ1     0.246      243049     0        │
  │ genhlth    bmi5        0.244      243049     0        │
  │ totinda    income3    -0.242      243049     0        │
  │ exerany2   income3    -0.242      243049     0        │
  │ physhlth   income3    -0.239      243049     0        │
  │ diffwalk   decide      0.238      243049     0        │
  │ drdxar2    physhlth   -0.235      243049     0        │
  │ genhlth    educag     -0.235      243049     0        │
  │ denvst3    educag     -0.232      243049     0        │
  │ decide     income3     0.227      243049     0        │
  │ checkup1   age80      -0.225      243049     0        │
  │ checkup1   ageg5yr    -0.223      243049     0        │
  │ physhlth   employ1     0.219      243049     0        │
  │ diffwalk   ageg5yr    -0.219      243049     0        │
  │ diffwalk   age80      -0.218      243049     0        │
  │ deaf       ageg5yr    -0.216      243049     0        │
  │ deaf       age80      -0.214      243049     0        │
  │ addepev3   genhlth    -0.214      243049     0        │
  │ flushot7   checkup1    0.214      243049     0        │
  │ totinda    educag     -0.213      243049     0        │
  │ exerany2   educag     -0.213      243049     0        │
  │ diffdres   decide      0.212      243049     0        │
  │ priminsr   educag     -0.211      243049     0        │
  │ genhlth    cvdcrhd4   -0.207      243049     0        │
  │ addepev3   physhlth   -0.207      243049     0        │
  │ genhlth    bmi5cat     0.204      243049     0        │
  │ ageg5yr    cvdcrhd4   -0.2        243049     0        │
  │ age80      cvdcrhd4   -0.199      243049     0        │
  │ age80      renthom1   -0.194      243049     0        │
  │ drnkany6   genhlth     0.192      243049     0        │
  │ diffwalk   blind       0.191      243049     0        │
  │ renthom1   ageg5yr    -0.191      243049     0        │
  │ htm4       income3     0.191      243049     0        │
  │ drnkany6   educag     -0.187      243049     0        │
  │ wtkg3      genhlth     0.185      243049     0        │
  │ priminsr   employ1     0.185      243049     0        │
  │ diffwalk   bmi5       -0.185      243049     0        │
  │ employ1    cvdcrhd4   -0.184      243049     0        │
  │ priminsr   renthom1    0.184      243049     0        │
  │ diffdres   income3     0.183      243049     0        │
  │ chckdny2   genhlth    -0.182      243049     0        │
  │ diffwalk   menthlth   -0.181      243049     0        │
  │ denvst3    genhlth     0.18       243049     0        │
  │ deaf       employ1    -0.18       243049     0        │
  │ racepr1    age80      -0.179      243049     0        │
  │ drdxar2    income3     0.179      243049     0        │
  │ drnkany6   employ1     0.178      243049     0        │
  │ blind      income3     0.178      243049     0        │
  │ racepr1    ageg5yr    -0.178      243049     0        │
  │ income3    ageg5yr    -0.177      243049     0        │
  │ flushot7   employ1    -0.177      243049     0        │
  │ denvst3    renthom1    0.177      243049     0        │
  │ drnkany6   diffwalk   -0.176      243049     0        │
  │ age80      income3    -0.176      243049     0        │
  │ totinda    diffdres   -0.174      243049     0        │
  │ exerany2   diffdres   -0.174      243049     0        │
  │ diffdres   employ1    -0.174      243049     0        │
  │ denvst3    priminsr    0.173      243049     0        │
  │ diffdres   menthlth   -0.173      243049     0        │
  │ renthom1   educag     -0.173      243049     0        │
  │ menthlth   income3    -0.172      243049     0        │
  │ diffwalk   educag      0.172      243049     0        │
  │ blind      genhlth    -0.171      243049     0        │
  │ checkup1   employ1    -0.169      243049     0        │
  │ racepr1    renthom1    0.168      243049     0        │
  │ diffwalk   cvdcrhd4    0.168      243049     0        │
  │ denvst3    checkup1    0.167      243049     0        │
  │ decide     blind       0.165      243049     0        │
  │ diffwalk   cvdstrk3    0.164      243049     0        │
  │ totinda    bmi5        0.164      243049     0        │
  │ exerany2   bmi5        0.164      243049     0        │
  │ drnkany6   totinda     0.161      243049     0        │
  │ drnkany6   exerany2    0.161      243049     0        │
  │ genhlth    cvdstrk3   -0.161      243049     0        │
  │ addepev3   diffwalk    0.16       243049     0        │
  │ racepr1    income3    -0.159      243049     0        │
  │ menthlth   age80      -0.158      243049     0        │
  │ menthlth   ageg5yr    -0.157      243049     0        │
  │ chckdny2   diffwalk    0.156      243049     0        │
  │ deaf       diffwalk    0.156      243049     0        │
  │ sleptim1   ageg5yr     0.156      243049     0        │
  │ sleptim1   age80       0.155      243049     0        │
  │ age80      cureci2    -0.154      243049     0        │
  │ denvst3    flushot7    0.153      243049     0        │
  │ menthlth   renthom1    0.153      243049     0        │
  │ ageg5yr    cureci2    -0.153      243049     0        │
  │ physhlth   cvdcrhd4   -0.152      243049     0        │
  │ cvdcrhd4   cvdstrk3    0.151      243049     0        │
  │ flushot7   educag     -0.151      243049     0        │
  │ physhlth   blind      -0.15       243049     0        │
  │ employ1    cvdstrk3   -0.15       243049     0        │
  │ hlthpln    age80      -0.149      243049     0        │
  │ hlthpln    ageg5yr    -0.149      243049     0        │
  │ asthma3    addepev3    0.148      243049     0        │
  │ drdxar2    checkup1    0.148      243049     0        │
  │ addepev3   sexvar     -0.147      243049     0        │
  │ flushot7   hlthpln     0.147      243049     0        │
  │ chckdny2   employ1    -0.147      243049     0        │
  │ denvst3    hlthpln     0.146      243049     0        │
  │ priminsr   marital     0.145      243049     0        │
  │ chckdny2   cvdcrhd4    0.145      243049     0        │
  │ denvst3    totinda     0.145      243049     0        │
  │ denvst3    exerany2    0.145      243049     0        │
  │ genhlth    renthom1    0.145      243049     0        │
  │ drdxar2    diffdres    0.144      243049     0        │
  │ diffdres   blind       0.144      243049     0        │
  │ asthma3    genhlth    -0.143      243049     0        │
  │ chckdny2   physhlth   -0.143      243049     0        │
  │ drnkany6   physhlth    0.143      243049     0        │
  │ decide     renthom1   -0.142      243049     0        │
  │ addepev3   income3     0.14       243049     0        │
  │ hlthpln    educag     -0.139      243049     0        │
  │ educag     employ1    -0.139      243049     0        │
  │ drnkwk2    sexvar     -0.138      243049     0        │
  │ drnkany6   ageg5yr     0.137      243049     0        │
  │ drdxar2    cvdcrhd4    0.137      243049     0        │
  │ drnkany6   age80       0.137      243049     0        │
  │ drdxar2    deaf        0.136      243049     0        │
  │ totinda    employ1     0.135      243049     0        │
  │ exerany2   employ1     0.135      243049     0        │
  │ drdxar2    addepev3    0.133      243049     0        │
  │ htm4       racepr1    -0.133      243049     0        │
  │ addepev3   diffdres    0.133      243049     0        │
  │ hlthpln    racepr1     0.133      243049     0        │
  │ priminsr   genhlth     0.132      243049     0        │
  │ sleptim1   menthlth   -0.131      243049     0        │
  │ denvst3    diffwalk   -0.131      243049     0        │
  │ chckdny2   ageg5yr    -0.131      243049     0        │
  │ chckdny2   age80      -0.131      243049     0        │
  │ diffwalk   bmi5cat    -0.13       243049     0        │
  │ decide     educag      0.13       243049     0        │
  │ hlthpln    renthom1    0.129      243049     0        │
  │ denvst3    marital     0.129      243049     0        │
  │ totinda    menthlth    0.128      243049     0        │
  │ exerany2   menthlth    0.128      243049     0        │
  │ physhlth   cvdstrk3   -0.128      243049     0        │
  │ asthma3    physhlth   -0.128      243049     0        │
  │ deaf       genhlth    -0.128      243049     0        │
  │ totinda    bmi5cat     0.127      243049     0        │
  │ exerany2   bmi5cat     0.127      243049     0        │
  │ ageg5yr    cvdstrk3   -0.127      243049     0        │
  │ drdxar2    decide      0.127      243049     0        │
  │ age80      cvdstrk3   -0.126      243049     0        │
  │ physhlth   educag     -0.126      243049     0        │
  │ hlthpln    income3    -0.126      243049     0        │
  │ racepr1    educag     -0.126      243049     0        │
  │ drdxar2    totinda    -0.125      243049     0        │
  │ drdxar2    exerany2   -0.125      243049     0        │
  │ wtkg3      ageg5yr    -0.125      243049     0        │
  │ genhlth    age80       0.124      243049     0        │
  │ genhlth    ageg5yr     0.124      243049     0        │
  │ wtkg3      age80      -0.123      243049     0        │
  │ addepev3   renthom1   -0.122      243049     0        │
  │ htm4       employ1    -0.122      243049     0        │
  │ blind      employ1    -0.122      243049     0        │
  │ totinda    decide     -0.122      243049     0        │
  │ exerany2   decide     -0.122      243049     0        │
  │ priminsr   racepr1     0.122      243049     0        │
  │ drnkwk2    htm4        0.121      243049     0        │
  │ drnkany6   htm4       -0.121      243049     0        │
  │ drdxar2    chckdny2    0.121      243049     0        │
  │ deaf       blind       0.121      243049     0        │
  │ drdxar2    flushot7    0.121      243049     0        │
  │ decide     employ1    -0.121      243049     0        │
  │ asthma3    menthlth   -0.121      243049     0        │
  │ income3    cvdstrk3    0.121      243049     0        │
  │ drdxar2    bmi5       -0.119      243049     0        │
  │ diffwalk   wtkg3      -0.119      243049     0        │
  │ sexvar     income3    -0.118      243049     0        │
  │ physhlth   bmi5        0.118      243049     0        │
  │ decide     priminsr   -0.118      243049     0        │
  │ denvst3    menthlth    0.115      243049     0        │
  │ menthlth   marital     0.115      243049     0        │
  │ denvst3    physhlth    0.114      243049     0        │
  │ sleptim1   employ1     0.114      243049     0        │
  │ checkup1   priminsr    0.114      243049     0        │
  │ age80      marital    -0.114      243049     0        │
  │ asthma3    diffwalk    0.112      243049     0        │
  │ addepev3   bmi5       -0.112      243049     0        │
  │ marital    ageg5yr    -0.11       243049     0        │
  │ asthma3    bmi5       -0.109      243049     0        │
  │ denvst3    decide     -0.109      243049     0        │
  │ deaf       cvdcrhd4    0.108      243049     0        │
  │ addepev3   ageg5yr     0.107      243049     0        │
  │ blind      educag      0.107      243049     0        │
  │ deaf       decide      0.107      243049     0        │
  │ menthlth   cureci2     0.106      243049     0        │
  │ addepev3   age80       0.106      243049     0        │
  │ blind      menthlth   -0.106      243049     0        │
  │ flushot7   renthom1    0.106      243049     0        │
  │ denvst3    drnkany6    0.106      243049     0        │
  │ asthma3    decide      0.105      243049     0        │
  │ drdxar2    bmi5cat    -0.105      243049     0        │
  │ physhlth   renthom1    0.105      243049     0        │
  │ diffwalk   priminsr   -0.105      243049     0        │
  │ diffdres   cvdstrk3    0.104      243049     0        │
  │ asthma3    drdxar2     0.104      243049     0        │
  │ drnkany6   priminsr    0.104      243049     0        │
  │ diffwalk   renthom1   -0.103      243049     0        │
  │ deaf       income3     0.102      243049     0        │
  │ totinda    ageg5yr     0.102      243049     0        │
  │ exerany2   ageg5yr     0.102      243049     0        │
  │ genhlth    marital     0.101      243049     0        │
  │ totinda    age80       0.101      243049     0        │
  │ exerany2   age80       0.101      243049     0        │
  │ totinda    wtkg3       0.0998     243049     0        │
  │ exerany2   wtkg3       0.0998     243049     0        │
  │ hlthpln    marital     0.0995     243049     0        │
  │ physhlth   deaf       -0.0994     243049     0        │
  │ educag     marital    -0.0992     243049     0        │
  │ menthlth   sexvar      0.0985     243049     0        │
  │ drdxar2    drnkany6   -0.0982     243049     0        │
  │ physhlth   priminsr    0.0982     243049     0        │
  │ htm4       ageg5yr    -0.0969     243049     0        │
  │ racepr1    marital     0.0968     243049     0        │
  │ flushot7   priminsr    0.0964     243049     0        │
  │ htm4       age80      -0.0962     243049     0        │
  │ totinda    renthom1    0.096      243049     0        │
  │ exerany2   renthom1    0.096      243049     0        │
  │ diffdres   bmi5       -0.0957     243049     0        │
  │ renthom1   cureci2     0.0953     243049     0        │
  │ totinda    priminsr    0.0952     243049     0        │
  │ exerany2   priminsr    0.0952     243049     0        │
  │ addepev3   totinda    -0.095      243049     0        │
  │ addepev3   exerany2   -0.095      243049     0        │
  │ addepev3   htm4        0.0946     243049     0        │
  │ addepev3   cureci2    -0.0939     243049     0        │
  │ drnkany6   sexvar      0.0939     243049     0        │
  │ blind      cvdstrk3    0.0938     243049     0        │
  │ decide     marital    -0.0934     243049     0        │
  │ menthlth   bmi5        0.0933     243049     0        │
  │ educag     bmi5       -0.0933     243049     0        │
  │ flushot7   racepr1     0.0932     243049     0        │
  │ chckdny2   income3     0.0929     243049     0        │
  │ income3    cvdcrhd4    0.0917     243049     0        │
  │ totinda    blind      -0.0917     243049     0        │
  │ exerany2   blind      -0.0917     243049     0        │
  │ diffdres   educag      0.0916     243049     0        │
  │ drdxar2    cvdstrk3    0.0909     243049     0        │
  │ decide     cvdstrk3    0.0909     243049     0        │
  │ drdxar2    blind       0.0908     243049     0        │
  │ flushot7   income3    -0.0906     243049     0        │
  │ drdxar2    menthlth   -0.0905     243049     0        │
  │ drnkany6   diffdres   -0.09       243049     0        │
  │ ageg5yr    bmi5       -0.09       243049     0        │
  │ addepev3   marital    -0.0899     243049     0        │
  │ asthma3    sexvar     -0.0893     243049     0        │
  │ chckdny2   cvdstrk3    0.0892     243049     0        │
  │ menthlth   priminsr    0.0888     243049     0        │
  │ drdxar2    educag      0.0887     243049     0        │
  │ diffdres   cvdcrhd4    0.0884     243049     0        │
  │ age80      bmi5       -0.0877     243049     0        │
  │ wtkg3      employ1    -0.087      243049     0        │
  │ chckdny2   totinda    -0.0869     243049     0        │
  │ chckdny2   exerany2   -0.0869     243049     0        │
  │ checkup1   sexvar     -0.0869     243049     0        │
  │ denvst3    diffdres   -0.0868     243049     0        │
  │ checkup1   diffwalk    0.0867     243049     0        │
  │ chckdny2   diffdres    0.0865     243049     0        │
  │ educag     bmi5cat    -0.0865     243049     0        │
  │ drdxar2    htm4        0.0859     243049     0        │
  │ chckdny2   drnkany6   -0.0857     243049     0        │
  │ drnkany6   renthom1    0.0851     243049     0        │
  │ racepr1    employ1    -0.0848     243049     0        │
  │ decide     cureci2    -0.0848     243049     0        │
  │ diffdres   renthom1   -0.0845     243049     0        │
  │ totinda    htm4       -0.0845     243049     0        │
  │ exerany2   htm4       -0.0845     243049     0        │
  │ denvst3    blind      -0.0843     243049     0        │
  │ drdxar2    sexvar     -0.0839     243049     0        │
  │ addepev3   blind       0.083      243049     0        │
  │ denvst3    bmi5        0.0826     243049     0        │
  │ deaf       sexvar      0.0822     243049     0        │
  │ income3    bmi5       -0.082      243049     0        │
  │ deaf       diffdres    0.0819     243049     0        │
  │ diffwalk   htm4        0.0816     243049     0        │
  │ totinda    cvdcrhd4   -0.0816     243049     0        │
  │ exerany2   cvdcrhd4   -0.0816     243049     0        │
  │ totinda    cvdstrk3   -0.0803     243049     0        │
  │ exerany2   cvdstrk3   -0.0803     243049     0        │
  │ blind      renthom1   -0.0802     243049     0        │
  │ checkup1   genhlth    -0.0801     243049     0        │
  │ drnkany6   decide     -0.0801     243049     0        │
  │ asthma3    bmi5cat    -0.0796     243049     0        │
  │ drdxar2    racepr1     0.0788     243049     0        │
  │ physhlth   age80       0.078      243049     0        │
  │ physhlth   wtkg3       0.0774     243049     0        │
  │ physhlth   ageg5yr     0.0773     243049     0        │
  │ menthlth   educag     -0.0771     243049     0        │
  │ physhlth   bmi5cat     0.077      243049     0        │
  │ drnkany6   bmi5        0.0769     243049     0        │
  │ asthma3    diffdres    0.0768     243049     0        │
  │ checkup1   marital     0.0768     243049     0        │
  │ addepev3   bmi5cat    -0.0766     243049     0        │
  │ checkup1   renthom1    0.0763     243049     0        │
  │ hlthpln    employ1    -0.0763     243049     0        │
  │ denvst3    wtkg3       0.0763     243049     0        │
  │ checkup1   htm4        0.0762     243049     0        │
  │ checkup1   cvdcrhd4    0.076      243049     0        │
  │ drnkany6   blind      -0.0749     243049     0        │
  │ checkup1   drnkwk2     0.0748     243049     0        │
  │ htm4       educag      0.0748     243049     0        │
  │ deaf       educag      0.0742     243049     0        │
  │ flushot7   cureci2     0.0741     243049     0        │
  │ chckdny2   blind       0.074      243049     0        │
  │ sexvar     cvdcrhd4    0.074      243049     0        │
  │ drnkany6   cvdstrk3   -0.0736     243049     0        │
  │ flushot7   marital     0.0735     243049     0        │
  │ drnkwk2    income3     0.0734     243049     0        │
  │ asthma3    htm4        0.0732     243049     0        │
  │ blind      priminsr   -0.0729     243049     0        │
  │ blind      cvdcrhd4    0.0727     243049     0        │
  │ drnkany6   cvdcrhd4   -0.0716     243049     0        │
  │ asthma3    income3     0.0716     243049     0        │
  │ deaf       cvdstrk3    0.0711     243049     0        │
  │ diffdres   wtkg3      -0.071      243049     0        │
  │ totinda    deaf       -0.0709     243049     0        │
  │ exerany2   deaf       -0.0709     243049     0        │
  │ flushot7   cvdcrhd4    0.0708     243049     0        │
  │ drdxar2    hlthpln     0.0708     243049     0        │
  │ flushot7   sleptim1   -0.0705     243049     0        │
  │ marital    cureci2     0.0702     243049     0        │
  │ diffdres   priminsr   -0.0699     243049     0        │
  │ denvst3    racepr1     0.0695     243049     0        │
  │ diffwalk   marital    -0.0695     243049     0        │
  │ chckdny2   deaf        0.0693     243049     0        │
  │ genhlth    racepr1     0.069      243049     0        │
  │ denvst3    addepev3   -0.0683     243049     0        │
  │ sleptim1   genhlth    -0.0682     243049     0        │
  │ renthom1   bmi5        0.0672     243049     0        │
  │ sexvar     employ1     0.0668     243049     0        │
  │ denvst3    cureci2     0.0665     243049     0        │
  │ blind      ageg5yr    -0.0665     243049     0        │
  │ drnkwk2    ageg5yr    -0.0663     243049     0        │
  │ sleptim1   decide      0.0661     243049     0        │
  │ drnkwk2    age80      -0.0657     243049     0        │
  │ blind      age80      -0.0655     243049     0        │
  │ htm4       menthlth   -0.0654     243049     0        │
  │ educag     cvdstrk3    0.0653     243049     0        │
  │ drnkwk2    employ1    -0.0652     243049     0        │
  │ drnkany6   racepr1     0.065      243049     0        │
  │ totinda    marital     0.0635     243049     0        │
  │ exerany2   marital     0.0635     243049     0        │
  │ sleptim1   racepr1    -0.0634     243049     0        │
  │ ageg5yr    bmi5cat    -0.0633     243049     0        │
  │ employ1    cureci2    -0.0631     243049     0        │
  │ decide     bmi5       -0.0629     243049     0        │
  │ denvst3    bmi5cat     0.0628     243049     0        │
  │ blind      racepr1    -0.0626     243049     0        │
  │ sexvar     bmi5cat    -0.0625     243049     0        │
  │ drdxar2    wtkg3      -0.0623     243049     0        │
  │ diffwalk   sexvar     -0.0623     243049     0        │
  │ asthma3    renthom1   -0.0618     243049     0        │
  │ chckdny2   decide      0.0618     243049     0        │
  │ checkup1   physhlth   -0.0618     243049     0        │
  │ chckdny2   flushot7    0.0611     243049     0        │
  │ decide     cvdcrhd4    0.0607     243049     0        │
  │ age80      bmi5cat    -0.0607     243049     0        │
  │ checkup1   cureci2     0.0606     243049     0        │
  │ chckdny2   checkup1    0.0606     243049     0        │
  │ denvst3    age80      -0.06       243049     0        │
  │ drnkany6   marital     0.0599     243049     0        │
  │ educag     cureci2    -0.0598     243049     0        │
  │ htm4       genhlth    -0.0589     243049     0        │
  │ sleptim1   renthom1   -0.0588     243049     0        │
  │ denvst3    ageg5yr    -0.0587     243049     0        │
  │ addepev3   priminsr   -0.0585     243049     0        │
  │ htm4       marital    -0.0577     243049     0        │
  │ sleptim1   bmi5       -0.0571     243049     0        │
  │ physhlth   marital     0.0568     243049     0        │
  │ addepev3   chckdny2    0.0562     243049     0        │
  │ sleptim1   physhlth   -0.0561     243049     0        │
  │ checkup1   sleptim1   -0.0561     243049     0        │
  │ asthma3    wtkg3      -0.0561     243049     0        │
  │ denvst3    sleptim1   -0.0559     243049     0        │
  │ denvst3    sexvar     -0.0559     243049     0        │
  │ sleptim1   wtkg3      -0.0557     243049     0        │
  │ totinda    sexvar      0.0555     243049     0        │
  │ exerany2   sexvar      0.0555     243049     0        │
  │ menthlth   bmi5cat     0.0553     243049     0        │
  │ totinda    racepr1     0.0548     243049     0        │
  │ exerany2   racepr1     0.0548     243049     0        │
  │ deaf       priminsr   -0.0544     243049     0        │
  │ denvst3    cvdstrk3   -0.054      243049     0        │
  │ diffdres   ageg5yr    -0.0537     243049     0        │
  │ diffdres   age80      -0.0535     243049     0        │
  │ blind      marital    -0.0535     243049     0        │
  │ drnkwk2    cureci2     0.0535     243049     0        │
  │ drnkany6   bmi5cat     0.0534     243049     0        │
  │ checkup1   bmi5       -0.0533     243049     0        │
  │ diffdres   bmi5cat    -0.0532     243049     0        │
  │ checkup1   bmi5cat    -0.0528     243049     0        │
  │ educag     ageg5yr    -0.0524     243049     0        │
  │ drnkany6   deaf       -0.0522     243049     0        │
  │ flushot7   menthlth    0.0522     243049     0        │
  │ flushot7   sexvar     -0.052      243049     0        │
  │ htm4       renthom1   -0.0517     243049     0        │
  │ checkup1   cvdstrk3    0.0516     243049     0        │
  │ age80      educag     -0.0515     243049     0        │
  │ chckdny2   bmi5       -0.0513     243049     0        │
  │ htm4       decide      0.0512     243049     0        │
  │ asthma3    totinda    -0.0511     243049     0        │
  │ asthma3    exerany2   -0.0511     243049     0        │
  │ drdxar2    priminsr   -0.0508     243049     0        │
  │ sleptim1   cureci2    -0.0506     243049     0        │
  │ sleptim1   bmi5cat    -0.0504     243049     0        │
  │ physhlth   htm4       -0.0503     243049     0        │
  │ menthlth   cvdstrk3   -0.0499     243049     0        │
  │ decide     age80       0.0496     243049     0        │
  │ flushot7   deaf        0.0495     243049     0        │
  │ asthma3    blind       0.049      243049     0        │
  │ marital    sexvar      0.0488     243049     0        │
  │ decide     ageg5yr     0.0487     243049     0        │
  │ checkup1   deaf        0.0484     243049     0        │
  │ addepev3   cvdstrk3    0.0482     243049     0        │
  │ flushot7   htm4        0.0482     243049     0        │
  │ addepev3   employ1    -0.048      243049     0        │
  │ addepev3   wtkg3      -0.048      243049     0        │
  │ wtkg3      menthlth    0.0477     243049     0        │
  │ asthma3    ageg5yr     0.0476     243049     0        │
  │ income3    bmi5cat    -0.0476     243049     0        │
  │ asthma3    age80       0.0475     243049     0        │
  │ educag     cvdcrhd4    0.0473     243049     0        │
  │ asthma3    sleptim1    0.0472     243049     0        │
  │ htm4       blind       0.0472     243049     0        │
  │ diffdres   marital    -0.0471     243049     0        │
  │ drnkwk2    diffwalk    0.0471     243049     0        │
  │ addepev3   sleptim1    0.0467     243049     0        │
  │ addepev3   drnkany6   -0.0467     243049     0        │
  │ hlthpln    cureci2     0.0466     243049     0        │
  │ priminsr   cureci2     0.0464     243049     0        │
  │ hlthpln    menthlth    0.0462     243049     0        │
  │ decide     sexvar     -0.0456     243049     0        │
  │ drnkany6   menthlth    0.0456     243049     0        │
  │ decide     racepr1    -0.0449     243049     0        │
  │ denvst3    cvdcrhd4   -0.0447     243049     0        │
  │ wtkg3      educag     -0.0443     243049     0        │
  │ chckdny2   menthlth   -0.0442     243049     0        │
  │ checkup1   educag     -0.0441     243049     0        │
  │ flushot7   diffwalk    0.0441     243049     0        │
  │ wtkg3      cvdcrhd4   -0.0438     243049     0        │
  │ flushot7   drnkwk2     0.0431     243049     0        │
  │ chckdny2   bmi5cat    -0.043      243049     0        │
  │ physhlth   sexvar      0.0429     243049     0        │
  │ priminsr   cvdstrk3   -0.0426     243049     0        │
  │ drnkwk2    bmi5       -0.0412     243049     0        │
  │ checkup1   drnkany6   -0.0406     243049     0        │
  │ asthma3    chckdny2    0.0404     243049     0        │
  │ income3    cureci2    -0.0404     243049     0        │
  │ asthma3    cvdstrk3    0.04       243049     0        │
  │ asthma3    drnkany6   -0.0397     243049     0        │
  │ sleptim1   marital    -0.0388     243049     0        │
  │ wtkg3      racepr1    -0.0388     243049     0        │
  │ asthma3    marital    -0.0386     243049     0        │
  │ cvdcrhd4   bmi5cat    -0.0384     243049     0        │
  │ chckdny2   drnkwk2     0.0382     243049     0        │
  │ asthma3    cvdcrhd4    0.0382     243049     0        │
  │ decide     bmi5cat    -0.0382     243049     0        │
  │ chckdny2   educag      0.038      243049     0        │
  │ addepev3   checkup1    0.0375     243049     0        │
  │ drnkwk2    genhlth    -0.0372     243049     0        │
  │ flushot7   wtkg3       0.037      243049     0        │
  │ employ1    bmi5cat    -0.0369     243049     0        │
  │ renthom1   cvdstrk3   -0.0369     243049     0        │
  │ menthlth   cvdcrhd4   -0.0367     243049     0        │
  │ cvdcrhd4   bmi5       -0.0359     243049     0        │
  │ checkup1   menthlth    0.0358     243049     0        │
  │ flushot7   decide     -0.0358     243049     0        │
  │ addepev3   cvdcrhd4    0.0355     243049     0        │
  │ marital    bmi5        0.0353     243049     0        │
  │ drnkwk2    hlthpln     0.0353     243049     0        │
  │ addepev3   deaf        0.0348     243049     0        │
  │ racepr1    cvdcrhd4    0.0346     243049     0        │
  │ genhlth    cureci2     0.0345     243049     0        │
  │ hlthpln    sexvar     -0.0342     243049     0        │
  │ denvst3    chckdny2   -0.0339     243049     0        │
  │ checkup1   diffdres    0.0335     243049     0        │
  │ totinda    hlthpln     0.0333     243049     0        │
  │ exerany2   hlthpln     0.0333     243049     0        │
  │ checkup1   racepr1     0.0331     243049     0        │
  │ htm4       priminsr   -0.0331     243049     0        │
  │ deaf       racepr1     0.0329     243049     0        │
  │ priminsr   cvdcrhd4   -0.0329     243049     0        │
  │ denvst3    deaf       -0.0326     243049     0        │
  │ renthom1   bmi5cat     0.0326     243049     0        │
  │ priminsr   age80       0.0326     243049     0        │
  │ hlthpln    cvdcrhd4    0.0325     243049     0        │
  │ priminsr   ageg5yr     0.0325     243049     0        │
  │ sleptim1   diffdres    0.0324     243049     0        │
  │ denvst3    asthma3    -0.032      243049     0        │
  │ drdxar2    drnkwk2     0.0315     243049     0        │
  │ racepr1    bmi5        0.0315     243049     0        │
  │ flushot7   totinda     0.0313     243049     0        │
  │ flushot7   exerany2    0.0313     243049     0        │
  │ hlthpln    decide     -0.0313     243049     0        │
  │ flushot7   cvdstrk3    0.0312     243049     0        │
  │ drnkwk2    bmi5cat    -0.0311     243049     0        │
  │ drnkwk2    physhlth   -0.031      243049     0        │
  │ denvst3    employ1     0.0308     243049     0        │
  │ employ1    bmi5       -0.0302     243049     0        │
  │ drnkwk2    totinda    -0.0301     243049     0        │
  │ drnkwk2    exerany2   -0.0301     243049     0        │
  │ racepr1    bmi5cat     0.0301     243049     0        │
  │ asthma3    checkup1    0.03       243049     0        │
  │ chckdny2   htm4        0.03       243049     0        │
  │ chckdny2   wtkg3      -0.0298     243049     0        │
  │ wtkg3      renthom1    0.0298     243049     0        │
  │ deaf       menthlth   -0.0297     243049     0        │
  │ diffdres   racepr1    -0.0287     243049     0        │
  │ sleptim1   blind       0.0286     243049     0        │
  │ menthlth   racepr1     0.0279     243049     0        │
  │ asthma3    deaf        0.0279     243049     0        │
  │ physhlth   cureci2     0.0279     243049     0        │
  │ wtkg3      decide     -0.0277     243049     0        │
  │ drnkwk2    menthlth    0.0276     243049     0        │
  │ chckdny2   priminsr   -0.0274     243049     0        │
  │ htm4       bmi5cat     0.0272     243049     0        │
  │ sleptim1   hlthpln    -0.0271     243049     0        │
  │ sleptim1   educag      0.0267     243049     0        │
  │ drnkwk2    racepr1    -0.0261     243049     0        │
  │ renthom1   employ1    -0.026      243049     0        │
  │ asthma3    employ1    -0.0258     243049     0        │
  │ blind      bmi5       -0.0258     243049     0        │
  │ wtkg3      income3     0.0256     243049     0        │
  │ deaf       hlthpln     0.0252     243049     0        │
  │ priminsr   sexvar     -0.025      243049     0        │
  │ addepev3   flushot7    0.0243     243049     0        │
  │ drnkwk2    cvdcrhd4    0.0242     243049     0        │
  │ htm4       cvdcrhd4   -0.024      243049     0        │
  │ diffdres   htm4        0.0237     243049     0        │
  │ denvst3    drnkwk2     0.0237     243049     0        │
  │ chckdny2   hlthpln     0.0236     243049     0        │
  │ asthma3    flushot7    0.0236     243049     0        │
  │ deaf       htm4       -0.0235     243049     0        │
  │ educag     sexvar      0.0232     243049     0        │
  │ checkup1   totinda    -0.0232     243049     0        │
  │ checkup1   exerany2   -0.0232     243049     0        │
  │ drnkwk2    wtkg3       0.023      243049     0        │
  │ drnkany6   hlthpln     0.0229     243049     0        │
  │ asthma3    cureci2    -0.0219     243049     0        │
  │ diffwalk   hlthpln     0.0217     243049     0        │
  │ flushot7   genhlth    -0.0215     243049     0        │
  │ hlthpln    genhlth     0.0212     243049     0        │
  │ menthlth   employ1     0.0212     243049     0        │
  │ htm4       bmi5       -0.0209     243049     0        │
  │ denvst3    drdxar2    -0.0209     243049     0        │
  │ drnkany6   cureci2    -0.0206     243049     0        │
  │ addepev3   educag      0.0203     243049     0        │
  │ sleptim1   diffwalk    0.02       243049     0        │
  │ renthom1   sexvar      0.0198     243049     0        │
  │ htm4       cvdstrk3    0.0197     243049     0        │
  │ chckdny2   renthom1   -0.0196     243049     0        │
  │ htm4       cureci2     0.0196     243049     0        │
  │ blind      sexvar     -0.0192     243049     0        │
  │ diffwalk   racepr1    -0.0189     243049     0        │
  │ asthma3    drnkwk2     0.0188     243049     0        │
  │ drnkwk2    cvdstrk3    0.0186     243049     0        │
  │ sleptim1   deaf       -0.0184     243049     0        │
  │ physhlth   racepr1     0.0181     243049     0        │
  │ totinda    cureci2     0.018      243049     0        │
  │ exerany2   cureci2     0.018      243049     0        │
  │ cureci2    cvdcrhd4    0.0178     243049     0        │
  │ marital    cvdstrk3   -0.0174     243049     0        │
  │ hlthpln    blind      -0.0169     243049     0        │
  │ checkup1   blind       0.0165     243049     4.44e-16 │
  │ hlthpln    cvdstrk3    0.0165     243049     4.44e-16 │
  │ asthma3    racepr1    -0.0164     243049     6.66e-16 │
  │ diffdres   cureci2    -0.0163     243049     8.88e-16 │
  │ deaf       wtkg3      -0.0163     243049     8.88e-16 │
  │ age80      sexvar      0.0162     243049     1.33e-15 │
  │ drdxar2    renthom1    0.0162     243049     1.33e-15 │
  │ sexvar     ageg5yr     0.0161     243049     2.44e-15 │
  │ drdxar2    cureci2     0.0157     243049     1.15e-14 │
  │ flushot7   bmi5        0.0154     243049     3.02e-14 │
  │ blind      cureci2    -0.0152     243049     7.48e-14 │
  │ addepev3   racepr1     0.015      243049     1.51e-13 │
  │ blind      bmi5cat    -0.0145     243049     7.6e-13  │
  │ asthma3    priminsr   -0.014      243049     5.39e-12 │
  │ chckdny2   cureci2     0.0139     243049     6.99e-12 │
  │ sleptim1   cvdcrhd4   -0.0136     243049     1.86e-11 │
  │ cvdstrk3   bmi5       -0.0136     243049     2.08e-11 │
  │ drnkwk2    diffdres    0.0134     243049     3.85e-11 │
  │ chckdny2   marital    -0.0133     243049     5.55e-11 │
  │ asthma3    educag      0.013      243049     1.36e-10 │
  │ deaf       bmi5cat    -0.0126     243049     5.65e-10 │
  │ flushot7   bmi5cat     0.0125     243049     6.21e-10 │
  │ sleptim1   cvdstrk3   -0.0124     243049     1.08e-09 │
  │ asthma3    hlthpln     0.0123     243049     1.16e-09 │
  │ cvdstrk3   bmi5cat    -0.0121     243049     2.7e-09  │
  │ deaf       renthom1    0.012      243049     3.46e-09 │
  │ racepr1    sexvar      0.0114     243049     1.75e-08 │
  │ deaf       cureci2     0.0114     243049     2.1e-08  │
  │ checkup1   wtkg3      -0.0112     243049     2.98e-08 │
  │ sleptim1   sexvar      0.0108     243049     8.95e-08 │
  │ sexvar     cureci2    -0.0107     243049     1.47e-07 │
  │ deaf       marital     0.0106     243049     1.53e-07 │
  │ wtkg3      cureci2     0.0106     243049     1.95e-07 │
  │ marital    cvdcrhd4    0.0104     243049     2.68e-07 │
  │ denvst3    htm4        0.0103     243049     3.46e-07 │
  │ priminsr   bmi5        0.0102     243049     4.84e-07 │
  │ racepr1    cureci2     0.0102     243049     5e-07    │
  │ sexvar     bmi5       -0.00998    243049     8.71e-07 │
  │ genhlth    sexvar      0.00989    243049     1.08e-06 │
  │ flushot7   drnkany6    0.00988    243049     1.11e-06 │
  │ wtkg3      priminsr   -0.00984    243049     1.24e-06 │
  │ diffdres   sexvar     -0.00974    243049     1.56e-06 │
  │ racepr1    cvdstrk3    0.00958    243049     2.31e-06 │
  │ sleptim1   htm4       -0.00956    243049     2.43e-06 │
  │ chckdny2   sexvar     -0.00928    243049     4.76e-06 │
  │ totinda    sleptim1   -0.00884    243049     1.3e-05  │
  │ exerany2   sleptim1   -0.00884    243049     1.3e-05  │
  │ flushot7   blind      -0.00877    243049     1.55e-05 │
  │ sleptim1   priminsr   -0.00876    243049     1.56e-05 │
  │ drdxar2    sleptim1    0.00838    243049     3.59e-05 │
  │ drnkwk2    marital     0.00827    243049     4.58e-05 │
  │ drdxar2    marital     0.00786    243049     0.000107 │
  │ flushot7   physhlth   -0.00778    243049     0.000125 │
  │ chckdny2   racepr1     0.00753    243049     0.000204 │
  │ chckdny2   sleptim1   -0.00697    243049     0.000594 │
  │ sexvar     cvdstrk3    0.00694    243049     0.000623 │
  │ drnkwk2    priminsr    0.00692    243049     0.000651 │
  │ drnkany6   wtkg3       0.00666    243049     0.00102  │
  │ sleptim1   income3     0.00662    243049     0.0011   │
  │ drnkwk2    renthom1   -0.00644    243049     0.0015   │
  │ drnkwk2    educag      0.00577    243049     0.00443  │
  │ marital    bmi5cat     0.0057     243049     0.00497  │
  │ diffdres   hlthpln     0.00558    243049     0.00598  │
  │ cureci2    cvdstrk3    0.0052     243049     0.0103   │
  │ drnkwk2    blind       0.00506    243049     0.0126   │
  │ addepev3   drnkwk2     0.00486    243049     0.0165   │
  │ drnkany6   sleptim1   -0.00404    243049     0.0466   │
  │ deaf       bmi5       -0.00397    243049     0.05     │
  │ cureci2    bmi5cat    -0.00376    243049     0.0639   │
  │ drnkwk2    sleptim1   -0.00373    243049     0.0657   │
  │ htm4       hlthpln    -0.00354    243049     0.0806   │
  │ priminsr   bmi5cat     0.0032     243049     0.115    │
  │ physhlth   hlthpln    -0.00249    243049     0.219    │
  │ wtkg3      hlthpln    -0.00207    243049     0.308    │
  │ wtkg3      cvdstrk3   -0.00206    243049     0.309    │
  │ hlthpln    bmi5        0.002      243049     0.325    │
  │ addepev3   hlthpln     0.00197    243049     0.332    │
  │ checkup1   income3     0.00189    243049     0.351    │
  │ wtkg3      blind       0.00189    243049     0.351    │
  │ marital    employ1     0.00154    243049     0.448    │
  │ drnkwk2    decide     -0.0015     243049     0.459    │
  │ flushot7   diffdres   -0.00143    243049     0.48     │
  │ cureci2    bmi5        0.00127    243049     0.532    │
  │ hlthpln    bmi5cat    -0.0012     243049     0.554    │
  │ renthom1   cvdcrhd4    0.000892   243049     0.66     │
  │ checkup1   decide      0.000616   243049     0.761    │
  │ drnkwk2    deaf        0.000379   243049     0.852    │
  │ diffwalk   cureci2    -0.000232   243049     0.909    │
  │ wtkg3      marital    -0.000111   243049     0.956    │
  └───────────────────────────────────────────────────────┘

Column names: column1, column2, estimate, n, p.value
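
With n = 243,049 observations per pair, even very small correlations clear the usual significance thresholds, so the p.value column says more about sample size than about strength of association. The sketch below applies the standard t-test for a correlation coefficient to two rows of the table. It assumes the estimates are Pearson correlations, and the helper name `cor_p_value` is my own, not a package function.

```r
# Two-sided p-value for a correlation r estimated from n observations,
# using t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

n <- 243049
p_small  <- cor_p_value(0.0165, n)    # hlthpln vs cvdstrk3
p_border <- cor_p_value(-0.00397, n)  # deaf vs bmi5

# Share of variance explained by the "highly significant" pair
r_squared <- 0.0165^2
```

`p_border` lands at roughly 0.05, in line with the table, while `r_squared` is below 0.03%: statistically detectable, but practically negligible. The p-values printed as 0 or 4.44e-16 look like floating-point limits (4.44e-16 is twice the double-precision machine epsilon) rather than exact values.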

# split data
set.seed(2024021401)
data_split <-
  data %>% 
  dplyr::sample_frac(size = 0.05, replace = FALSE) %>% # use a 5% sample of the data due to limited computing power
  initial_split(strata = diabete4) # strata by diabete4
data_train <-
  data_split %>% 
  training()
data_test <-
  data_split %>% 
  testing()
data_fold <-
  data_train %>% 
  vfold_cv(v = 10, strata = diabete4)

# split the full data set (no subsampling), reusing the same seed
set.seed(2024021401)
data_split_big <-
  data %>% 
  initial_split(strata = diabete4) # strata by diabete4
data_train_big <-
  data_split_big %>% 
  training()
data_test_big <-
  data_split_big %>% 
  testing()
data_fold_big <-
  data_train_big %>% 
  vfold_cv(v = 10, strata = diabete4)
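
Because both splits and the folds are stratified by `diabete4`, each partition should carry roughly the same share of diabetes cases. A minimal sketch of how one might check this, using hypothetical toy data (not the BRFSS data) with a 20% positive class:

```r
library(rsample)

set.seed(1)
# Toy data: 20% positive class (hypothetical, for illustration only)
toy <- data.frame(
  diabete4 = factor(rep(c("yes", "no"), times = c(200, 800))),
  x = rnorm(1000)
)

toy_split <- initial_split(toy, strata = diabete4) # default prop = 3/4
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Share of the positive class in each partition
prop_train <- mean(toy_train$diabete4 == "yes")
prop_test  <- mean(toy_test$diabete4 == "yes")

# Share of the positive class in the held-out part of each CV fold
toy_folds <- vfold_cv(toy_train, v = 10, strata = diabete4)
fold_props <- vapply(
  toy_folds$splits,
  function(s) mean(assessment(s)$diabete4 == "yes"),
  numeric(1)
)
```

The same three quantities computed on `data_split` and `data_fold` would show whether the 5% subsample kept the class balance of the full data.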

# preprocessing recipes
base_rec <-
  recipes::recipe(formula = diabete4 ~.,
                  data = data_train) %>% 
  step_zv(all_predictors())

dummy_rec <-
  base_rec %>% 
  step_dummy(all_nominal_predictors())

normal_rec <-
  dummy_rec %>% 
  step_normalize(all_predictors())


log_rec <-
  base_rec %>% 
  step_log(all_numeric_predictors())
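
One thing to watch with `log_rec`: `step_log()` takes the natural log as-is, so any numeric predictor containing zeros (an assumption about the data; a count of days would be a typical case) turns into `-Inf`. The sketch below, on hypothetical toy data, shows the effect and one possible guard using the `offset` argument of `step_log()`.

```r
library(recipes)

# Hypothetical toy data with a zero in the numeric predictor
toy <- data.frame(
  y = factor(c("no", "yes", "no", "yes")),
  days = c(0, 2, 5, 30)
)

plain_rec <- recipe(y ~ ., data = toy) %>%
  step_log(all_numeric_predictors())

offset_rec <- recipe(y ~ ., data = toy) %>%
  step_log(all_numeric_predictors(), offset = 1) # computes log(x + 1)

plain_days  <- bake(prep(plain_rec), new_data = NULL)$days
offset_days <- bake(prep(offset_rec), new_data = NULL)$days
```

`plain_days` starts with `-Inf`, while `offset_days` stays finite. Whether an offset is appropriate depends on how the zero values should be interpreted.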

# model specifications
# random forest
rf_spec <-
  rand_forest(trees = 1000L) %>% 
  set_engine("ranger",
             importance = "permutation") %>% 
  set_mode("classification")

rf_spec_for_tuning <-
  rf_spec %>% 
  set_args(mtry = tune(),
           min_n = tune())

# Classification Tree Model
ct_spec <- 
  decision_tree() %>%
  set_engine(engine = 'rpart') %>%
  set_mode('classification') 

ct_spec_for_tuning <-
  ct_spec %>% 
  set_args(tree_depth = tune(),
           min_n = tune(), 
           cost_complexity = tune())

# knn
knn_spec <-
  nearest_neighbor() %>% 
  set_engine("kknn") %>% 
  set_mode("classification")

knn_spec_for_tuning <-
  knn_spec %>% 
  set_args(neighbors = tune(),
           weight_func = tune(),
           dist_power = tune())

# xgboost
xgb_spec <-
  boost_tree(trees = 1000L) %>% 
  set_engine("xgboost") %>% 
  set_mode("classification")

xgb_spec_for_tuning <-
  xgb_spec %>% 
  set_args(tree_depth = tune(),
           min_n = tune(),
           loss_reduction = tune(),
           sample_size = tune(),
           mtry = tune(),
           learn_rate = tune())

# naive Bayes (naive_Bayes() is provided by the discrim package)
naive_spec <-
  naive_Bayes() %>%
  set_engine("naivebayes",
             usepoisson = TRUE) %>%
  set_mode("classification")

naive_spec_for_tuning <-
  naive_spec %>% 
  set_args(smoothness = tune(),
           Laplace = tune())

# Logistic Regression Model
logistic_spec <- 
  logistic_reg() %>%
  set_engine(engine = 'glm') %>%
  set_mode('classification') 

# Lasso Logistic Regression Model

logistic_lasso_spec <-
  logistic_reg(mixture = 1, penalty = 1) %>% # mixture = 1: pure lasso (L1); penalty = 1: fixed lambda for the untuned spec
  set_engine(engine = 'glmnet') %>%
  set_mode('classification') 


logistic_lasso_spec_for_tuning <- 
  logistic_lasso_spec %>% 
  set_args(penalty = tune()) # tune the lasso penalty (lambda) instead of fixing it

# workflow sets: cross each recipe with each model specification
base_set <- #works
  workflow_set(
    list(base_rec, dummy_rec, log_rec), #preprocessor
    list(rf_spec, ct_spec,
         rf_spec_for_tuning, ct_spec_for_tuning), #model
    cross = TRUE) #default is cross = TRUE

dummy_set <- #works
  workflow_set(
    list(dummy_rec),
    list(knn_spec, xgb_spec, logistic_spec,
         knn_spec_for_tuning, xgb_spec_for_tuning),
    cross = TRUE)

normal_set <-
  workflow_set(
    list(normal_rec),
    list(logistic_lasso_spec,
         logistic_lasso_spec_for_tuning),
    cross = TRUE)

naive_set <- #works
  workflow_set(
    list(base_rec, log_rec),
    list(naive_spec,
         naive_spec_for_tuning),
    cross = TRUE)

model_set <-
  bind_rows(base_set, dummy_set, normal_set, naive_set)
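
With `cross = TRUE`, every recipe is paired with every model specification, so `base_set` holds 3 x 4 = 12 workflows, `dummy_set` 1 x 5 = 5, `normal_set` 1 x 2 = 2 and `naive_set` 2 x 2 = 4, for 23 workflows in `model_set`. The sketch below shows the crossing on hypothetical toy data. Naming the list elements is my own addition (the lists above are unnamed); it gives each workflow a readable `wflow_id` of the form `preproc_model`.

```r
library(recipes)
library(parsnip)
library(workflowsets)

# Hypothetical toy data, only needed to define the recipes
toy <- data.frame(
  y = factor(c("no", "yes", "no", "yes")),
  x = c(1, 2, 3, 4)
)

plain_rec <- recipe(y ~ ., data = toy)
norm_rec  <- plain_rec %>% step_normalize(all_numeric_predictors())

# 2 recipes x 2 model specs = 4 workflows
toy_set <- workflow_set(
  preproc = list(plain = plain_rec, norm = norm_rec),
  models = list(
    glm  = logistic_reg(),
    tree = decision_tree(mode = "classification")
  ),
  cross = TRUE
)

n_workflows <- nrow(toy_set)
ids <- toy_set$wflow_id
```

Readable ids make it easier to tell tuned and untuned variants apart once the results of many workflows are ranked side by side.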