Data-Informed Thinking + Doing

Inferential Statistics

Drawing inferences about a population from a random sample of NHANES data, using R, Python, and Julia.

Inferential statistics is the branch of statistics concerned with drawing conclusions about a population from information obtained in a random sample of it. It uses probability theory and statistical methods to analyze the sample data and to quantify how confidently the findings can be generalized to the larger population. With carefully selected, representative samples, researchers can estimate population parameters, test hypotheses, assess the significance of relationships, and make predictions.

This is valuable in overcoming the limitations imposed by time, cost, and logistics: it lets us make meaningful claims about populations without having to examine every individual within them.
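As a minimal sketch of that idea (not the NHANES analysis itself), the Python snippet below simulates a hypothetical population of heights, draws a simple random sample, and estimates the population mean with a normal-approximation 95% confidence interval. The population parameters (mean 168 cm, standard deviation 10 cm), the sample size of 500, and the helper name `mean_confidence_interval` are all assumptions made for illustration.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, lower, upper) using a normal approximation.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error


# Seeded generator so the sketch is reproducible
rng = random.Random(42)

# Hypothetical population: 100,000 simulated heights in centimetres
population = [rng.gauss(168.0, 10.0) for _ in range(100_000)]

# Simple random sample without replacement
sample = rng.sample(population, k=500)

estimate, lower, upper = mean_confidence_interval(sample)
population_mean = statistics.fmean(population)

print(f"sample estimate: {estimate:.2f} cm, 95% CI: ({lower:.2f}, {upper:.2f})")
print(f"population mean: {population_mean:.2f} cm")
```

Because the sample is random, the interval will usually, though not always, contain the population mean; rerunning with a different seed shows that sampling variability.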

Let’s look at the inferences we can draw from the NHANES dataset.

Getting Started

If you are interested in reproducing this work, here are the versions of R, Python, and Julia used, along with the packages for each. The work also adopts Leland Wilkinson’s Grammar of Graphics approach to data visualization. Finally, my coding style here is deliberately verbose, so that it is easy to trace where functions, methods, and variables originate, and to make this a learning experience for everyone, including me.

R.version.string
[1] "R version 4.2.3 (2023-03-15)"
# devtools must already be installed: install.packages("devtools")
library(devtools)
# Pin each package to the version used in this post
devtools::install_version("NHANES", version = "2.1.0", repos = "https://cran.us.r-project.org")
devtools::install_version("dplyr", version = "1.1.1", repos = "https://cran.us.r-project.org")
devtools::install_version("ggplot2", version = "3.4.2", repos = "https://cran.us.r-project.org")
devtools::install_version("infer", version = "1.0.4", repos = "https://cran.us.r-project.org")
library(NHANES)
library(dplyr)
library(ggplot2)
library(infer)
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun  6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
!pip install pandas==2.0.3
!pip install plotnine==0.12.1
import pandas
import plotnine
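To confirm that the Python environment actually matches the pinned versions above, a small check along these lines may help. This is only a sketch: the `PINNED` dictionary and the `check_pins` helper are hypothetical names introduced here, and the check relies on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Versions pinned in this post (hypothetical helper data, copied from the pip commands)
PINNED = {"pandas": "2.0.3", "plotnine": "0.12.1"}


def check_pins(pins):
    """Map each package name to (wanted, installed, matches).

    `installed` is None when the package is not installed at all.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, matches) in check_pins(PINNED).items():
    status = "ok" if matches else "MISMATCH"
    print(f"{name}: wanted {wanted}, installed {installed} [{status}]")
```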
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
using Pkg
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.5.0")
Pkg.add(name="CategoricalArrays", version="0.10.8")
Pkg.add(name="Colors", version="0.12.8")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.4.0")
using CSV
using DataFrames
using CategoricalArrays
using Colors
using Cairo
using Gadfly

Importing and Examining Dataset

nhanes_r <- read.csv("../../dataset/nhanes.csv")
str(object=nhanes_r)
'data.frame':	10000 obs. of  77 variables:
 $ X               : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ID              : int  51624 51624 51624 51625 51630 51638 51646 51647 51647 51647 ...
 $ SurveyYr        : chr  "2009_10" "2009_10" "2009_10" "2009_10" ...
 $ Gender          : chr  "male" "male" "male" "male" ...
 $ Age             : int  34 34 34 4 49 9 8 45 45 45 ...
 $ AgeDecade       : chr  " 30-39" " 30-39" " 30-39" " 0-9" ...
 $ AgeMonths       : int  409 409 409 49 596 115 101 541 541 541 ...
 $ Race1           : chr  "White" "White" "White" "Other" ...
 $ Race3           : chr  NA NA NA NA ...
 $ Education       : chr  "High School" "High School" "High School" NA ...
 $ MaritalStatus   : chr  "Married" "Married" "Married" NA ...
 $ HHIncome        : chr  "25000-34999" "25000-34999" "25000-34999" "20000-24999" ...
 $ HHIncomeMid     : int  30000 30000 30000 22500 40000 87500 60000 87500 87500 87500 ...
 $ Poverty         : num  1.36 1.36 1.36 1.07 1.91 1.84 2.33 5 5 5 ...
 $ HomeRooms       : int  6 6 6 9 5 6 7 6 6 6 ...
 $ HomeOwn         : chr  "Own" "Own" "Own" "Own" ...
 $ Work            : chr  "NotWorking" "NotWorking" "NotWorking" NA ...
 $ Weight          : num  87.4 87.4 87.4 17 86.7 29.8 35.2 75.7 75.7 75.7 ...
 $ Length          : num  NA NA NA NA NA NA NA NA NA NA ...
 $ HeadCirc        : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Height          : num  165 165 165 105 168 ...
 $ BMI             : num  32.2 32.2 32.2 15.3 30.6 ...
 $ BMICatUnder20yrs: chr  NA NA NA NA ...
 $ BMI_WHO         : chr  "30.0_plus" "30.0_plus" "30.0_plus" "12.0_18.5" ...
 $ Pulse           : int  70 70 70 NA 86 82 72 62 62 62 ...
 $ BPSysAve        : int  113 113 113 NA 112 86 107 118 118 118 ...
 $ BPDiaAve        : int  85 85 85 NA 75 47 37 64 64 64 ...
 $ BPSys1          : int  114 114 114 NA 118 84 114 106 106 106 ...
 $ BPDia1          : int  88 88 88 NA 82 50 46 62 62 62 ...
 $ BPSys2          : int  114 114 114 NA 108 84 108 118 118 118 ...
 $ BPDia2          : int  88 88 88 NA 74 50 36 68 68 68 ...
 $ BPSys3          : int  112 112 112 NA 116 88 106 118 118 118 ...
 $ BPDia3          : int  82 82 82 NA 76 44 38 60 60 60 ...
 $ Testosterone    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ DirectChol      : num  1.29 1.29 1.29 NA 1.16 1.34 1.55 2.12 2.12 2.12 ...
 $ TotChol         : num  3.49 3.49 3.49 NA 6.7 4.86 4.09 5.82 5.82 5.82 ...
 $ UrineVol1       : int  352 352 352 NA 77 123 238 106 106 106 ...
 $ UrineFlow1      : num  NA NA NA NA 0.094 ...
 $ UrineVol2       : int  NA NA NA NA NA NA NA NA NA NA ...
 $ UrineFlow2      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Diabetes        : chr  "No" "No" "No" "No" ...
 $ DiabetesAge     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ HealthGen       : chr  "Good" "Good" "Good" NA ...
 $ DaysPhysHlthBad : int  0 0 0 NA 0 NA NA 0 0 0 ...
 $ DaysMentHlthBad : int  15 15 15 NA 10 NA NA 3 3 3 ...
 $ LittleInterest  : chr  "Most" "Most" "Most" NA ...
 $ Depressed       : chr  "Several" "Several" "Several" NA ...
 $ nPregnancies    : int  NA NA NA NA 2 NA NA 1 1 1 ...
 $ nBabies         : int  NA NA NA NA 2 NA NA NA NA NA ...
 $ Age1stBaby      : int  NA NA NA NA 27 NA NA NA NA NA ...
 $ SleepHrsNight   : int  4 4 4 NA 8 NA NA 8 8 8 ...
 $ SleepTrouble    : chr  "Yes" "Yes" "Yes" NA ...
 $ PhysActive      : chr  "No" "No" "No" NA ...
 $ PhysActiveDays  : int  NA NA NA NA NA NA NA 5 5 5 ...
 $ TVHrsDay        : chr  NA NA NA NA ...
 $ CompHrsDay      : chr  NA NA NA NA ...
 $ TVHrsDayChild   : int  NA NA NA 4 NA 5 1 NA NA NA ...
 $ CompHrsDayChild : int  NA NA NA 1 NA 0 6 NA NA NA ...
 $ Alcohol12PlusYr : chr  "Yes" "Yes" "Yes" NA ...
 $ AlcoholDay      : int  NA NA NA NA 2 NA NA 3 3 3 ...
 $ AlcoholYear     : int  0 0 0 NA 20 NA NA 52 52 52 ...
 $ SmokeNow        : chr  "No" "No" "No" NA ...
 $ Smoke100        : chr  "Yes" "Yes" "Yes" NA ...
 $ Smoke100n       : chr  "Smoker" "Smoker" "Smoker" NA ...
 $ SmokeAge        : int  18 18 18 NA 38 NA NA NA NA NA ...
 $ Marijuana       : chr  "Yes" "Yes" "Yes" NA ...
 $ AgeFirstMarij   : int  17 17 17 NA 18 NA NA 13 13 13 ...
 $ RegularMarij    : chr  "No" "No" "No" NA ...
 $ AgeRegMarij     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ HardDrugs       : chr  "Yes" "Yes" "Yes" NA ...
 $ SexEver         : chr  "Yes" "Yes" "Yes" NA ...
 $ SexAge          : int  16 16 16 NA 12 NA NA 13 13 13 ...
 $ SexNumPartnLife : int  8 8 8 NA 10 NA NA 20 20 20 ...
 $ SexNumPartYear  : int  1 1 1 NA 1 NA NA 0 0 0 ...
 $ SameSex         : chr  "No" "No" "No" NA ...
 $ SexOrientation  : chr  "Heterosexual" "Heterosexual" "Heterosexual" NA ...
 $ PregnantNow     : chr  NA NA NA NA ...
head(x=nhanes_r, n=8)
  X    ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3    Education MaritalStatus    HHIncome HHIncomeMid Poverty HomeRooms HomeOwn       Work Weight Length HeadCirc Height BMI BMICatUnder20yrs      BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1 BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1 UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100  Smoke100n SmokeAge Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation PregnantNow
1 1 51624  2009_10   male  34     30-39       409 White  <NA>  High School       Married 25000-34999       30000     1.4         6     Own NotWorking     87     NA       NA    165  32             <NA>    30.0_plus    70      113       85    114     88    114     88    112     82           NA        1.3     3.5       352         NA        NA         NA       No          NA      Good               0              15           Most   Several           NA      NA         NA             4          Yes         No             NA     <NA>       <NA>            NA              NA             Yes         NA           0       No      Yes     Smoker       18       Yes            17           No          NA       Yes     Yes     16               8              1      No   Heterosexual        <NA>
2 2 51624  2009_10   male  34     30-39       409 White  <NA>  High School       Married 25000-34999       30000     1.4         6     Own NotWorking     87     NA       NA    165  32             <NA>    30.0_plus    70      113       85    114     88    114     88    112     82           NA        1.3     3.5       352         NA        NA         NA       No          NA      Good               0              15           Most   Several           NA      NA         NA             4          Yes         No             NA     <NA>       <NA>            NA              NA             Yes         NA           0       No      Yes     Smoker       18       Yes            17           No          NA       Yes     Yes     16               8              1      No   Heterosexual        <NA>
3 3 51624  2009_10   male  34     30-39       409 White  <NA>  High School       Married 25000-34999       30000     1.4         6     Own NotWorking     87     NA       NA    165  32             <NA>    30.0_plus    70      113       85    114     88    114     88    112     82           NA        1.3     3.5       352         NA        NA         NA       No          NA      Good               0              15           Most   Several           NA      NA         NA             4          Yes         No             NA     <NA>       <NA>            NA              NA             Yes         NA           0       No      Yes     Smoker       18       Yes            17           No          NA       Yes     Yes     16               8              1      No   Heterosexual        <NA>
4 4 51625  2009_10   male   4       0-9        49 Other  <NA>         <NA>          <NA> 20000-24999       22500     1.1         9     Own       <NA>     17     NA       NA    105  15             <NA>    12.0_18.5    NA       NA       NA     NA     NA     NA     NA     NA     NA           NA         NA      NA        NA         NA        NA         NA       No          NA      <NA>              NA              NA           <NA>      <NA>           NA      NA         NA            NA         <NA>       <NA>             NA     <NA>       <NA>             4               1            <NA>         NA          NA     <NA>     <NA>       <NA>       NA      <NA>            NA         <NA>          NA      <NA>    <NA>     NA              NA             NA    <NA>           <NA>        <NA>
5 5 51630  2009_10 female  49     40-49       596 White  <NA> Some College   LivePartner 35000-44999       40000     1.9         5    Rent NotWorking     87     NA       NA    168  31             <NA>    30.0_plus    86      112       75    118     82    108     74    116     76           NA        1.2     6.7        77      0.094        NA         NA       No          NA      Good               0              10        Several   Several            2       2         27             8          Yes         No             NA     <NA>       <NA>            NA              NA             Yes          2          20      Yes      Yes     Smoker       38       Yes            18           No          NA       Yes     Yes     12              10              1     Yes   Heterosexual        <NA>
6 6 51638  2009_10   male   9       0-9       115 White  <NA>         <NA>          <NA> 75000-99999       87500     1.8         6    Rent       <NA>     30     NA       NA    133  17             <NA>    12.0_18.5    82       86       47     84     50     84     50     88     44           NA        1.3     4.9       123      1.538        NA         NA       No          NA      <NA>              NA              NA           <NA>      <NA>           NA      NA         NA            NA         <NA>       <NA>             NA     <NA>       <NA>             5               0            <NA>         NA          NA     <NA>     <NA>       <NA>       NA      <NA>            NA         <NA>          NA      <NA>    <NA>     NA              NA             NA    <NA>           <NA>        <NA>
7 7 51646  2009_10   male   8       0-9       101 White  <NA>         <NA>          <NA> 55000-64999       60000     2.3         7     Own       <NA>     35     NA       NA    131  21             <NA> 18.5_to_24.9    72      107       37    114     46    108     36    106     38           NA        1.6     4.1       238      1.322        NA         NA       No          NA      <NA>              NA              NA           <NA>      <NA>           NA      NA         NA            NA         <NA>       <NA>             NA     <NA>       <NA>             1               6            <NA>         NA          NA     <NA>     <NA>       <NA>       NA      <NA>            NA         <NA>          NA      <NA>    <NA>     NA              NA             NA    <NA>           <NA>        <NA>
8 8 51647  2009_10 female  45     40-49       541 White  <NA> College Grad       Married 75000-99999       87500     5.0         6     Own    Working     76     NA       NA    167  27             <NA> 25.0_to_29.9    62      118       64    106     62    118     68    118     60           NA        2.1     5.8       106      1.116        NA         NA       No          NA     Vgood               0               3           None      None            1      NA         NA             8           No        Yes              5     <NA>       <NA>            NA              NA             Yes          3          52     <NA>       No Non-Smoker       NA       Yes            13           No          NA        No     Yes     13              20              0     Yes       Bisexual        <NA>
tail(x=nhanes_r, n=8)
          X    ID SurveyYr Gender Age AgeDecade AgeMonths   Race1   Race3      Education MaritalStatus    HHIncome HHIncomeMid Poverty HomeRooms HomeOwn    Work Weight Length HeadCirc Height BMI BMICatUnder20yrs      BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1 BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1 UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100  Smoke100n SmokeAge Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation PregnantNow
9993   9993 71908  2011_12 female  66     60-69        NA   White   White   College Grad       Widowed 65000-74999       70000    4.55         8     Own Working   88.7     NA       NA    159  35             <NA>    30.0_plus    76      114       70    110     74    114     68    114     72           26       1.86     6.5        29       0.66        94       0.63       No          NA Excellent               0               0           None      None            2       2         22             6           No         No             NA     2_hr  0_to_1_hr            NA              NA              No          1           5     <NA>       No Non-Smoker       NA      <NA>            NA         <NA>          NA        No     Yes     18               1             NA      No           <NA>        <NA>
9994   9994 71909  2011_12   male  28     20-29        NA Mexican Mexican 9 - 11th Grade  NeverMarried   5000-9999        7500    0.46         3    Rent Working   92.3     NA       NA    177  29             <NA> 25.0_to_29.9    68      124       65    124     62    126     64    122     66          490       1.22     3.9        97       0.94        NA         NA       No          NA      <NA>              NA              NA           <NA>      <NA>           NA      NA         NA             6           No        Yes             NA     1_hr       2_hr            NA              NA            <NA>         NA          NA      Yes      Yes     Smoker       18      <NA>            NA         <NA>          NA      <NA>    <NA>     NA              NA             NA    <NA>           <NA>        <NA>
9995   9995 71909  2011_12   male  28     20-29        NA Mexican Mexican 9 - 11th Grade  NeverMarried   5000-9999        7500    0.46         3    Rent Working   92.3     NA       NA    177  29             <NA> 25.0_to_29.9    68      124       65    124     62    126     64    122     66          490       1.22     3.9        97       0.94        NA         NA       No          NA      <NA>              NA              NA           <NA>      <NA>           NA      NA         NA             6           No        Yes             NA     1_hr       2_hr            NA              NA            <NA>         NA          NA      Yes      Yes     Smoker       18      <NA>            NA         <NA>          NA      <NA>    <NA>     NA              NA             NA    <NA>           <NA>        <NA>
9996   9996 71909  2011_12   male  28     20-29        NA Mexican Mexican 9 - 11th Grade  NeverMarried   5000-9999        7500    0.46         3    Rent Working   92.3     NA       NA    177  29             <NA> 25.0_to_29.9    68      124       65    124     62    126     64    122     66          490       1.22     3.9        97       0.94        NA         NA       No          NA      <NA>              NA              NA           <NA>      <NA>           NA      NA         NA             6           No        Yes             NA     1_hr       2_hr            NA              NA            <NA>         NA          NA      Yes      Yes     Smoker       18      <NA>            NA         <NA>          NA      <NA>    <NA>     NA              NA             NA    <NA>           <NA>        <NA>
9997   9997 71910  2011_12 female   0       0-9         5   White   White           <NA>          <NA> 75000-99999       87500    3.37        10     Own    <NA>    6.7     68       42     NA  NA             <NA>         <NA>    NA       NA       NA     NA     NA     NA     NA     NA     NA           NA         NA      NA        NA         NA        NA         NA     <NA>          NA      <NA>              NA              NA           <NA>      <NA>           NA      NA         NA            NA         <NA>       <NA>             NA     <NA>       <NA>            NA              NA            <NA>         NA          NA     <NA>     <NA>       <NA>       NA      <NA>            NA         <NA>          NA      <NA>    <NA>     NA              NA             NA    <NA>           <NA>        <NA>
9998   9998 71911  2011_12   male  27     20-29        NA Mexican Mexican   College Grad       Married 75000-99999       87500    3.25        10     Own Working   96.7     NA       NA    176  31             <NA>    30.0_plus    74      133       74    122     76    132     82    134     66          509       1.06     5.7        63       0.60        NA         NA       No          NA      Good               0               2           None      None           NA      NA         NA             6           No         No              3     1_hr  0_to_1_hr            NA              NA             Yes          5           4     <NA>       No Non-Smoker       NA       Yes            22           No          NA        No     Yes     21               1              1      No   Heterosexual        <NA>
9999   9999 71915  2011_12   male  60     60-69        NA   White   White   College Grad  NeverMarried 65000-74999       70000    5.00         4     Own Working   78.4     NA       NA    169  28             <NA> 25.0_to_29.9    76      147       73    150     72    148     74    146     72          505       0.93     4.9       218       1.25        NA         NA      Yes          56      Good               0               2           None      None           NA      NA         NA             6           No         No              1     2_hr       1_hr            NA              NA             Yes         NA           0     <NA>       No Non-Smoker       NA      <NA>            NA         <NA>          NA        No     Yes     19               2             NA      No           <NA>        <NA>
10000 10000 71915  2011_12   male  60     60-69        NA   White   White   College Grad  NeverMarried 65000-74999       70000    5.00         4     Own Working   78.4     NA       NA    169  28             <NA> 25.0_to_29.9    76      147       73    150     72    148     74    146     72          505       0.93     4.9       218       1.25        NA         NA      Yes          56      Good               0               2           None      None           NA      NA         NA             6           No         No             NA     2_hr       1_hr            NA              NA             Yes         NA           0     <NA>       No Non-Smoker       NA      <NA>            NA         <NA>          NA        No     Yes     19               2             NA      No           <NA>        <NA>
nhanes_py = pandas.read_csv("../../dataset/nhanes.csv")
nhanes_py.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 77 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        10000 non-null  int64  
 1   ID                10000 non-null  int64  
 2   SurveyYr          10000 non-null  object 
 3   Gender            10000 non-null  object 
 4   Age               10000 non-null  int64  
 5   AgeDecade         9667 non-null   object 
 6   AgeMonths         4962 non-null   float64
 7   Race1             10000 non-null  object 
 8   Race3             5000 non-null   object 
 9   Education         7221 non-null   object 
 10  MaritalStatus     7231 non-null   object 
 11  HHIncome          9189 non-null   object 
 12  HHIncomeMid       9189 non-null   float64
 13  Poverty           9274 non-null   float64
 14  HomeRooms         9931 non-null   float64
 15  HomeOwn           9937 non-null   object 
 16  Work              7771 non-null   object 
 17  Weight            9922 non-null   float64
 18  Length            543 non-null    float64
 19  HeadCirc          88 non-null     float64
 20  Height            9647 non-null   float64
 21  BMI               9634 non-null   float64
 22  BMICatUnder20yrs  1274 non-null   object 
 23  BMI_WHO           9603 non-null   object 
 24  Pulse             8563 non-null   float64
 25  BPSysAve          8551 non-null   float64
 26  BPDiaAve          8551 non-null   float64
 27  BPSys1            8237 non-null   float64
 28  BPDia1            8237 non-null   float64
 29  BPSys2            8353 non-null   float64
 30  BPDia2            8353 non-null   float64
 31  BPSys3            8365 non-null   float64
 32  BPDia3            8365 non-null   float64
 33  Testosterone      4126 non-null   float64
 34  DirectChol        8474 non-null   float64
 35  TotChol           8474 non-null   float64
 36  UrineVol1         9013 non-null   float64
 37  UrineFlow1        8397 non-null   float64
 38  UrineVol2         1478 non-null   float64
 39  UrineFlow2        1476 non-null   float64
 40  Diabetes          9858 non-null   object 
 41  DiabetesAge       629 non-null    float64
 42  HealthGen         7539 non-null   object 
 43  DaysPhysHlthBad   7532 non-null   float64
 44  DaysMentHlthBad   7534 non-null   float64
 45  LittleInterest    1564 non-null   object 
 46  Depressed         1427 non-null   object 
 47  nPregnancies      2604 non-null   float64
 48  nBabies           2416 non-null   float64
 49  Age1stBaby        1884 non-null   float64
 50  SleepHrsNight     7755 non-null   float64
 51  SleepTrouble      7772 non-null   object 
 52  PhysActive        8326 non-null   object 
 53  PhysActiveDays    4663 non-null   float64
 54  TVHrsDay          4859 non-null   object 
 55  CompHrsDay        4863 non-null   object 
 56  TVHrsDayChild     653 non-null    float64
 57  CompHrsDayChild   653 non-null    float64
 58  Alcohol12PlusYr   6580 non-null   object 
 59  AlcoholDay        4914 non-null   float64
 60  AlcoholYear       5922 non-null   float64
 61  SmokeNow          3211 non-null   object 
 62  Smoke100          7235 non-null   object 
 63  Smoke100n         7235 non-null   object 
 64  SmokeAge          3080 non-null   float64
 65  Marijuana         4941 non-null   object 
 66  AgeFirstMarij     2891 non-null   float64
 67  RegularMarij      4941 non-null   object 
 68  AgeRegMarij       1366 non-null   float64
 69  HardDrugs         5765 non-null   object 
 70  SexEver           5767 non-null   object 
 71  SexAge            5540 non-null   float64
 72  SexNumPartnLife   5725 non-null   float64
 73  SexNumPartYear    4928 non-null   float64
 74  SameSex           5768 non-null   object 
 75  SexOrientation    4842 non-null   object 
 76  PregnantNow       1696 non-null   object 
dtypes: float64(43), int64(3), object(31)
memory usage: 5.9+ MB
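One detail in the `info()` output is worth a closer look: `LittleInterest` and `Depressed` report only 1564 and 1427 non-null values, and rows that R displays as `None` show up as `NaN` in pandas. The likely cause is that `pandas.read_csv` in pandas 2.x includes the literal string "None" in its default list of missing-value markers, so a genuine answer category is read as missing. The sketch below shows one way to avoid that, assuming the CSV was exported from R and uses `NA` as its only missing-value marker; the tiny inline CSV is hypothetical and stands in for `nhanes.csv`.

```python
import io

import pandas

# Hypothetical three-row CSV mimicking two NHANES columns:
# "NA" marks a missing value, while "None" is a real answer category.
csv_text = (
    "ID,LittleInterest,Depressed\n"
    "1,Most,Several\n"
    "2,None,None\n"
    "3,NA,NA\n"
)

# keep_default_na=False switches off pandas' built-in list of missing-value strings;
# na_values=["NA"] then re-enables only the marker assumed to be used in the file.
demo = pandas.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=["NA"],
)

print(demo["LittleInterest"].tolist())
print(demo.notna().sum())
```

Applying the same two arguments to the `nhanes.csv` read would change the non-null counts reported above for any column that contains the category "None", so the outputs shown in this post would no longer match exactly.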
nhanes_py.head(n=8)
   Unnamed: 0     ID SurveyYr  Gender  Age AgeDecade  AgeMonths  Race1 Race3     Education MaritalStatus     HHIncome  HHIncomeMid  Poverty  HomeRooms HomeOwn        Work  Weight  Length  HeadCirc  Height    BMI BMICatUnder20yrs       BMI_WHO  Pulse  BPSysAve  BPDiaAve  BPSys1  BPDia1  BPSys2  BPDia2  BPSys3  BPDia3  Testosterone  DirectChol  TotChol  UrineVol1  UrineFlow1  UrineVol2  UrineFlow2 Diabetes  DiabetesAge HealthGen  DaysPhysHlthBad  DaysMentHlthBad LittleInterest Depressed  nPregnancies  nBabies  Age1stBaby  SleepHrsNight SleepTrouble PhysActive  PhysActiveDays TVHrsDay CompHrsDay  TVHrsDayChild  CompHrsDayChild Alcohol12PlusYr  AlcoholDay  AlcoholYear SmokeNow Smoke100   Smoke100n  SmokeAge Marijuana  AgeFirstMarij RegularMarij  AgeRegMarij HardDrugs SexEver  SexAge  SexNumPartnLife  SexNumPartYear SameSex SexOrientation PregnantNow
0           1  51624  2009_10    male   34     30-39      409.0  White   NaN   High School       Married  25000-34999      30000.0     1.36        6.0     Own  NotWorking    87.4     NaN       NaN   164.7  32.22              NaN     30.0_plus   70.0     113.0      85.0   114.0    88.0   114.0    88.0   112.0    82.0           NaN        1.29     3.49      352.0         NaN        NaN         NaN       No          NaN      Good              0.0             15.0           Most   Several           NaN      NaN         NaN            4.0          Yes         No             NaN      NaN        NaN            NaN              NaN             Yes         NaN          0.0       No      Yes      Smoker      18.0       Yes           17.0           No          NaN       Yes     Yes    16.0              8.0             1.0      No   Heterosexual         NaN
1           2  51624  2009_10    male   34     30-39      409.0  White   NaN   High School       Married  25000-34999      30000.0     1.36        6.0     Own  NotWorking    87.4     NaN       NaN   164.7  32.22              NaN     30.0_plus   70.0     113.0      85.0   114.0    88.0   114.0    88.0   112.0    82.0           NaN        1.29     3.49      352.0         NaN        NaN         NaN       No          NaN      Good              0.0             15.0           Most   Several           NaN      NaN         NaN            4.0          Yes         No             NaN      NaN        NaN            NaN              NaN             Yes         NaN          0.0       No      Yes      Smoker      18.0       Yes           17.0           No          NaN       Yes     Yes    16.0              8.0             1.0      No   Heterosexual         NaN
2           3  51624  2009_10    male   34     30-39      409.0  White   NaN   High School       Married  25000-34999      30000.0     1.36        6.0     Own  NotWorking    87.4     NaN       NaN   164.7  32.22              NaN     30.0_plus   70.0     113.0      85.0   114.0    88.0   114.0    88.0   112.0    82.0           NaN        1.29     3.49      352.0         NaN        NaN         NaN       No          NaN      Good              0.0             15.0           Most   Several           NaN      NaN         NaN            4.0          Yes         No             NaN      NaN        NaN            NaN              NaN             Yes         NaN          0.0       No      Yes      Smoker      18.0       Yes           17.0           No          NaN       Yes     Yes    16.0              8.0             1.0      No   Heterosexual         NaN
3           4  51625  2009_10    male    4       0-9       49.0  Other   NaN           NaN           NaN  20000-24999      22500.0     1.07        9.0     Own         NaN    17.0     NaN       NaN   105.4  15.30              NaN     12.0_18.5    NaN       NaN       NaN     NaN     NaN     NaN     NaN     NaN     NaN           NaN         NaN      NaN        NaN         NaN        NaN         NaN       No          NaN       NaN              NaN              NaN            NaN       NaN           NaN      NaN         NaN            NaN          NaN        NaN             NaN      NaN        NaN            4.0              1.0             NaN         NaN          NaN      NaN      NaN         NaN       NaN       NaN            NaN          NaN          NaN       NaN     NaN     NaN              NaN             NaN     NaN            NaN         NaN
4           5  51630  2009_10  female   49     40-49      596.0  White   NaN  Some College   LivePartner  35000-44999      40000.0     1.91        5.0    Rent  NotWorking    86.7     NaN       NaN   168.4  30.57              NaN     30.0_plus   86.0     112.0      75.0   118.0    82.0   108.0    74.0   116.0    76.0           NaN        1.16     6.70       77.0       0.094        NaN         NaN       No          NaN      Good              0.0             10.0        Several   Several           2.0      2.0        27.0            8.0          Yes         No             NaN      NaN        NaN            NaN              NaN             Yes         2.0         20.0      Yes      Yes      Smoker      38.0       Yes           18.0           No          NaN       Yes     Yes    12.0             10.0             1.0     Yes   Heterosexual         NaN
5           6  51638  2009_10    male    9       0-9      115.0  White   NaN           NaN           NaN  75000-99999      87500.0     1.84        6.0    Rent         NaN    29.8     NaN       NaN   133.1  16.82              NaN     12.0_18.5   82.0      86.0      47.0    84.0    50.0    84.0    50.0    88.0    44.0           NaN        1.34     4.86      123.0       1.538        NaN         NaN       No          NaN       NaN              NaN              NaN            NaN       NaN           NaN      NaN         NaN            NaN          NaN        NaN             NaN      NaN        NaN            5.0              0.0             NaN         NaN          NaN      NaN      NaN         NaN       NaN       NaN            NaN          NaN          NaN       NaN     NaN     NaN              NaN             NaN     NaN            NaN         NaN
6           7  51646  2009_10    male    8       0-9      101.0  White   NaN           NaN           NaN  55000-64999      60000.0     2.33        7.0     Own         NaN    35.2     NaN       NaN   130.6  20.64              NaN  18.5_to_24.9   72.0     107.0      37.0   114.0    46.0   108.0    36.0   106.0    38.0           NaN        1.55     4.09      238.0       1.322        NaN         NaN       No          NaN       NaN              NaN              NaN            NaN       NaN           NaN      NaN         NaN            NaN          NaN        NaN             NaN      NaN        NaN            1.0              6.0             NaN         NaN          NaN      NaN      NaN         NaN       NaN       NaN            NaN          NaN          NaN       NaN     NaN     NaN              NaN             NaN     NaN            NaN         NaN
7           8  51647  2009_10  female   45     40-49      541.0  White   NaN  College Grad       Married  75000-99999      87500.0     5.00        6.0     Own     Working    75.7     NaN       NaN   166.7  27.24              NaN  25.0_to_29.9   62.0     118.0      64.0   106.0    62.0   118.0    68.0   118.0    60.0           NaN        2.12     5.82      106.0       1.116        NaN         NaN       No          NaN     Vgood              0.0              3.0            NaN       NaN           1.0      NaN         NaN            8.0           No        Yes             5.0      NaN        NaN            NaN              NaN             Yes         3.0         52.0      NaN       No  Non-Smoker       NaN       Yes           13.0           No          NaN        No     Yes    13.0             20.0             0.0     Yes       Bisexual         NaN
nhanes_py.tail(n=8)
      Unnamed: 0     ID SurveyYr  Gender  Age AgeDecade  AgeMonths    Race1    Race3       Education MaritalStatus     HHIncome  HHIncomeMid  Poverty  HomeRooms HomeOwn     Work  Weight  Length  HeadCirc  Height   BMI BMICatUnder20yrs       BMI_WHO  Pulse  BPSysAve  BPDiaAve  BPSys1  BPDia1  BPSys2  BPDia2  BPSys3  BPDia3  Testosterone  DirectChol  TotChol  UrineVol1  UrineFlow1  UrineVol2  UrineFlow2 Diabetes  DiabetesAge  HealthGen  DaysPhysHlthBad  DaysMentHlthBad LittleInterest Depressed  nPregnancies  nBabies  Age1stBaby  SleepHrsNight SleepTrouble PhysActive  PhysActiveDays TVHrsDay CompHrsDay  TVHrsDayChild  CompHrsDayChild Alcohol12PlusYr  AlcoholDay  AlcoholYear SmokeNow Smoke100   Smoke100n  SmokeAge Marijuana  AgeFirstMarij RegularMarij  AgeRegMarij HardDrugs SexEver  SexAge  SexNumPartnLife  SexNumPartYear SameSex SexOrientation PregnantNow
9992        9993  71908  2011_12  female   66     60-69        NaN    White    White    College Grad       Widowed  65000-74999      70000.0     4.55        8.0     Own  Working    88.7     NaN       NaN   159.0  35.1              NaN     30.0_plus   76.0     114.0      70.0   110.0    74.0   114.0    68.0   114.0    72.0         26.00        1.86     6.47       29.0       0.659       94.0       0.627       No          NaN  Excellent              0.0              0.0            NaN       NaN           2.0      2.0        22.0            6.0           No         No             NaN     2_hr  0_to_1_hr            NaN              NaN              No         1.0          5.0      NaN       No  Non-Smoker       NaN       NaN            NaN          NaN          NaN        No     Yes    18.0              1.0             NaN      No            NaN         NaN
9993        9994  71909  2011_12    male   28     20-29        NaN  Mexican  Mexican  9 - 11th Grade  NeverMarried    5000-9999       7500.0     0.46        3.0    Rent  Working    92.3     NaN       NaN   177.3  29.4              NaN  25.0_to_29.9   68.0     124.0      65.0   124.0    62.0   126.0    64.0   122.0    66.0        490.43        1.22     3.90       97.0       0.942        NaN         NaN       No          NaN        NaN              NaN              NaN            NaN       NaN           NaN      NaN         NaN            6.0           No        Yes             NaN     1_hr       2_hr            NaN              NaN             NaN         NaN          NaN      Yes      Yes      Smoker      18.0       NaN            NaN          NaN          NaN       NaN     NaN     NaN              NaN             NaN     NaN            NaN         NaN
9994        9995  71909  2011_12    male   28     20-29        NaN  Mexican  Mexican  9 - 11th Grade  NeverMarried    5000-9999       7500.0     0.46        3.0    Rent  Working    92.3     NaN       NaN   177.3  29.4              NaN  25.0_to_29.9   68.0     124.0      65.0   124.0    62.0   126.0    64.0   122.0    66.0        490.43        1.22     3.90       97.0       0.942        NaN         NaN       No          NaN        NaN              NaN              NaN            NaN       NaN           NaN      NaN         NaN            6.0           No        Yes             NaN     1_hr       2_hr            NaN              NaN             NaN         NaN          NaN      Yes      Yes      Smoker      18.0       NaN            NaN          NaN          NaN       NaN     NaN     NaN              NaN             NaN     NaN            NaN         NaN
9995        9996  71909  2011_12    male   28     20-29        NaN  Mexican  Mexican  9 - 11th Grade  NeverMarried    5000-9999       7500.0     0.46        3.0    Rent  Working    92.3     NaN       NaN   177.3  29.4              NaN  25.0_to_29.9   68.0     124.0      65.0   124.0    62.0   126.0    64.0   122.0    66.0        490.43        1.22     3.90       97.0       0.942        NaN         NaN       No          NaN        NaN              NaN              NaN            NaN       NaN           NaN      NaN         NaN            6.0           No        Yes             NaN     1_hr       2_hr            NaN              NaN             NaN         NaN          NaN      Yes      Yes      Smoker      18.0       NaN            NaN          NaN          NaN       NaN     NaN     NaN              NaN             NaN     NaN            NaN         NaN
9996        9997  71910  2011_12  female    0       0-9        5.0    White    White             NaN           NaN  75000-99999      87500.0     3.37       10.0     Own      NaN     6.7    67.6      42.2     NaN   NaN              NaN           NaN    NaN       NaN       NaN     NaN     NaN     NaN     NaN     NaN     NaN           NaN         NaN      NaN        NaN         NaN        NaN         NaN      NaN          NaN        NaN              NaN              NaN            NaN       NaN           NaN      NaN         NaN            NaN          NaN        NaN             NaN      NaN        NaN            NaN              NaN             NaN         NaN          NaN      NaN      NaN         NaN       NaN       NaN            NaN          NaN          NaN       NaN     NaN     NaN              NaN             NaN     NaN            NaN         NaN
9997        9998  71911  2011_12    male   27     20-29        NaN  Mexican  Mexican    College Grad       Married  75000-99999      87500.0     3.25       10.0     Own  Working    96.7     NaN       NaN   175.8  31.3              NaN     30.0_plus   74.0     133.0      74.0   122.0    76.0   132.0    82.0   134.0    66.0        509.00        1.06     5.72       63.0       0.600        NaN         NaN       No          NaN       Good              0.0              2.0            NaN       NaN           NaN      NaN         NaN            6.0           No         No             3.0     1_hr  0_to_1_hr            NaN              NaN             Yes         5.0          4.0      NaN       No  Non-Smoker       NaN       Yes           22.0           No          NaN        No     Yes    21.0              1.0             1.0      No   Heterosexual         NaN
9998        9999  71915  2011_12    male   60     60-69        NaN    White    White    College Grad  NeverMarried  65000-74999      70000.0     5.00        4.0     Own  Working    78.4     NaN       NaN   168.8  27.5              NaN  25.0_to_29.9   76.0     147.0      73.0   150.0    72.0   148.0    74.0   146.0    72.0        505.13        0.93     4.94      218.0       1.253        NaN         NaN      Yes         56.0       Good              0.0              2.0            NaN       NaN           NaN      NaN         NaN            6.0           No         No             1.0     2_hr       1_hr            NaN              NaN             Yes         NaN          0.0      NaN       No  Non-Smoker       NaN       NaN            NaN          NaN          NaN        No     Yes    19.0              2.0             NaN      No            NaN         NaN
9999       10000  71915  2011_12    male   60     60-69        NaN    White    White    College Grad  NeverMarried  65000-74999      70000.0     5.00        4.0     Own  Working    78.4     NaN       NaN   168.8  27.5              NaN  25.0_to_29.9   76.0     147.0      73.0   150.0    72.0   148.0    74.0   146.0    72.0        505.13        0.93     4.94      218.0       1.253        NaN         NaN      Yes         56.0       Good              0.0              2.0            NaN       NaN           NaN      NaN         NaN            6.0           No         No             NaN     2_hr       1_hr            NaN              NaN             Yes         NaN          0.0      NaN       No  Non-Smoker       NaN       NaN            NaN          NaN          NaN        No     Yes    19.0              2.0             NaN      No            NaN         NaN
nhanes_jl = CSV.File("../../dataset/nhanes.csv") |> DataFrames.DataFrame
10000×77 DataFrame
   Row │ Column1  ID     SurveyYr  Gender   Age    AgeDecade  AgeMonths  Race1     Race3     Education       MaritalStatus  HHIncome     HHIncomeMid  Poverty  HomeRooms  HomeOwn  Work        Weight   Length   HeadCirc  Height   BMI      BMICatUnder20yrs  BMI_WHO       Pulse    BPSysAve  BPDiaAve  BPSys1   BPDia1   BPSys2   BPDia2   BPSys3   BPDia3   Testosterone  DirectChol  TotChol  UrineVol1  UrineFlow1  UrineVol2  UrineFlow2  Diabetes  DiabetesAge  HealthGen  DaysPhysHlthBad  DaysMentHlthBad  LittleInterest  Depressed  nPregnancies  nBabies  Age1stBaby  SleepHrsNight  SleepTrouble  PhysActive  PhysActiveDays  TVHrsDay  CompHrsDay  TVHrsDayChild  CompHrsDayChild  Alcohol12PlusYr  AlcoholDay  AlcoholYear  SmokeNow  Smoke100  Smoke100n   SmokeAge  Marijuana  AgeFirstMarij  RegularMarij  AgeRegMarij  HardDrugs  SexEver  SexAge   SexNumPartnLife  SexNumPartYear  SameSex  SexOrientation  PregnantNow
       │ Int64    Int64  String7   String7  Int64  String7    String3    String15  String15  String15        String15       String15     String7      String7  String3    String7  String15    String7  String7  String7   String7  String7  String15          String15      String3  String3   String3   String3  String3  String3  String3  String3  String3  String7       String7     String7  String3    String7     String3    String7     String3   String3      String15   String3          String3          String7         String7    String3       String3  String3     String3        String3       String3     String3         String15  String15    String3        String3          String3          String3     String3      String3   String3   String15    String3   String3    String3        String3       String3      String3    String3  String3  String7          String3         String3  String15        String7
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │       1  51624  2009_10   male        34   30-39     409        White     NA        High School     Married        25000-34999  30000        1.36     6          Own      NotWorking  87.4     NA       NA        164.7    32.22    NA                30.0_plus     70       113       85        114      88       114      88       112      82       NA            1.29        3.49     352        NA          NA         NA          No        NA           Good       0                15               Most            Several    NA            NA       NA          4              Yes           No          NA              NA        NA          NA             NA               Yes              NA          0            No        Yes       Smoker      18        Yes        17             No            NA           Yes        Yes      16       8                1               No       Heterosexual    NA
     2 │       2  51624  2009_10   male        34   30-39     409        White     NA        High School     Married        25000-34999  30000        1.36     6          Own      NotWorking  87.4     NA       NA        164.7    32.22    NA                30.0_plus     70       113       85        114      88       114      88       112      82       NA            1.29        3.49     352        NA          NA         NA          No        NA           Good       0                15               Most            Several    NA            NA       NA          4              Yes           No          NA              NA        NA          NA             NA               Yes              NA          0            No        Yes       Smoker      18        Yes        17             No            NA           Yes        Yes      16       8                1               No       Heterosexual    NA
     3 │       3  51624  2009_10   male        34   30-39     409        White     NA        High School     Married        25000-34999  30000        1.36     6          Own      NotWorking  87.4     NA       NA        164.7    32.22    NA                30.0_plus     70       113       85        114      88       114      88       112      82       NA            1.29        3.49     352        NA          NA         NA          No        NA           Good       0                15               Most            Several    NA            NA       NA          4              Yes           No          NA              NA        NA          NA             NA               Yes              NA          0            No        Yes       Smoker      18        Yes        17             No            NA           Yes        Yes      16       8                1               No       Heterosexual    NA
     4 │       4  51625  2009_10   male         4   0-9       49         Other     NA        NA              NA             20000-24999  22500        1.07     9          Own      NA          17       NA       NA        105.4    15.3     NA                12.0_18.5     NA       NA        NA        NA       NA       NA       NA       NA       NA       NA            NA          NA       NA         NA          NA         NA          No        NA           NA         NA               NA               NA              NA         NA            NA       NA          NA             NA            NA          NA              NA        NA          4              1                NA               NA          NA           NA        NA        NA          NA        NA         NA             NA            NA           NA         NA       NA       NA               NA              NA       NA              NA
     5 │       5  51630  2009_10   female      49   40-49     596        White     NA        Some College    LivePartner    35000-44999  40000        1.91     5          Rent     NotWorking  86.7     NA       NA        168.4    30.57    NA                30.0_plus     86       112       75        118      82       108      74       116      76       NA            1.16        6.7      77         0.094       NA         NA          No        NA           Good       0                10               Several         Several    2             2        27          8              Yes           No          NA              NA        NA          NA             NA               Yes              2           20           Yes       Yes       Smoker      38        Yes        18             No            NA           Yes        Yes      12       10               1               Yes      Heterosexual    NA
     6 │       6  51638  2009_10   male         9   0-9       115        White     NA        NA              NA             75000-99999  87500        1.84     6          Rent     NA          29.8     NA       NA        133.1    16.82    NA                12.0_18.5     82       86        47        84       50       84       50       88       44       NA            1.34        4.86     123        1.538       NA         NA          No        NA           NA         NA               NA               NA              NA         NA            NA       NA          NA             NA            NA          NA              NA        NA          5              0                NA               NA          NA           NA        NA        NA          NA        NA         NA             NA            NA           NA         NA       NA       NA               NA              NA       NA              NA
     7 │       7  51646  2009_10   male         8   0-9       101        White     NA        NA              NA             55000-64999  60000        2.33     7          Own      NA          35.2     NA       NA        130.6    20.64    NA                18.5_to_24.9  72       107       37        114      46       108      36       106      38       NA            1.55        4.09     238        1.322       NA         NA          No        NA           NA         NA               NA               NA              NA         NA            NA       NA          NA             NA            NA          NA              NA        NA          1              6                NA               NA          NA           NA        NA        NA          NA        NA         NA             NA            NA           NA         NA       NA       NA               NA              NA       NA              NA
     8 │       8  51647  2009_10   female      45   40-49     541        White     NA        College Grad    Married        75000-99999  87500        5        6          Own      Working     75.7     NA       NA        166.7    27.24    NA                25.0_to_29.9  62       118       64        106      62       118      68       118      60       NA            2.12        5.82     106        1.116       NA         NA          No        NA           Vgood      0                3                None            None       1             NA       NA          8              No            Yes         5               NA        NA          NA             NA               Yes              3           52           NA        No        Non-Smoker  NA        Yes        13             No            NA           No         Yes      13       20               0               Yes      Bisexual        NA
   ⋮   │    ⋮       ⋮       ⋮         ⋮       ⋮        ⋮          ⋮         ⋮         ⋮            ⋮               ⋮             ⋮            ⋮          ⋮         ⋮         ⋮         ⋮          ⋮        ⋮        ⋮         ⋮        ⋮            ⋮               ⋮           ⋮        ⋮         ⋮         ⋮        ⋮        ⋮        ⋮        ⋮        ⋮          ⋮            ⋮          ⋮         ⋮          ⋮           ⋮          ⋮          ⋮           ⋮           ⋮             ⋮                ⋮               ⋮             ⋮           ⋮           ⋮         ⋮             ⋮             ⋮            ⋮             ⋮            ⋮          ⋮             ⋮               ⋮                ⋮             ⋮            ⋮          ⋮         ⋮          ⋮          ⋮          ⋮            ⋮             ⋮             ⋮           ⋮         ⋮        ⋮            ⋮               ⋮            ⋮           ⋮              ⋮
  9994 │    9994  71909  2011_12   male        28   20-29     NA         Mexican   Mexican   9 - 11th Grade  NeverMarried    5000-9999   7500         0.46     3          Rent     Working     92.3     NA       NA        177.3    29.4     NA                25.0_to_29.9  68       124       65        124      62       126      64       122      66       490.43        1.22        3.9      97         0.942       NA         NA          No        NA           NA         NA               NA               NA              NA         NA            NA       NA          6              No            Yes         NA              1_hr      2_hr        NA             NA               NA               NA          NA           Yes       Yes       Smoker      18        NA         NA             NA            NA           NA         NA       NA       NA               NA              NA       NA              NA
  9995 │    9995  71909  2011_12   male        28   20-29     NA         Mexican   Mexican   9 - 11th Grade  NeverMarried    5000-9999   7500         0.46     3          Rent     Working     92.3     NA       NA        177.3    29.4     NA                25.0_to_29.9  68       124       65        124      62       126      64       122      66       490.43        1.22        3.9      97         0.942       NA         NA          No        NA           NA         NA               NA               NA              NA         NA            NA       NA          6              No            Yes         NA              1_hr      2_hr        NA             NA               NA               NA          NA           Yes       Yes       Smoker      18        NA         NA             NA            NA           NA         NA       NA       NA               NA              NA       NA              NA
  9996 │    9996  71909  2011_12   male        28   20-29     NA         Mexican   Mexican   9 - 11th Grade  NeverMarried    5000-9999   7500         0.46     3          Rent     Working     92.3     NA       NA        177.3    29.4     NA                25.0_to_29.9  68       124       65        124      62       126      64       122      66       490.43        1.22        3.9      97         0.942       NA         NA          No        NA           NA         NA               NA               NA              NA         NA            NA       NA          6              No            Yes         NA              1_hr      2_hr        NA             NA               NA               NA          NA           Yes       Yes       Smoker      18        NA         NA             NA            NA           NA         NA       NA       NA               NA              NA       NA              NA
  9997 │    9997  71910  2011_12   female       0   0-9       5          White     White     NA              NA             75000-99999  87500        3.37     10         Own      NA          6.7      67.6     42.2      NA       NA       NA                NA            NA       NA        NA        NA       NA       NA       NA       NA       NA       NA            NA          NA       NA         NA          NA         NA          NA        NA           NA         NA               NA               NA              NA         NA            NA       NA          NA             NA            NA          NA              NA        NA          NA             NA               NA               NA          NA           NA        NA        NA          NA        NA         NA             NA            NA           NA         NA       NA       NA               NA              NA       NA              NA
  9998 │    9998  71911  2011_12   male        27   20-29     NA         Mexican   Mexican   College Grad    Married        75000-99999  87500        3.25     10         Own      Working     96.7     NA       NA        175.8    31.3     NA                30.0_plus     74       133       74        122      76       132      82       134      66       509           1.06        5.72     63         0.6         NA         NA          No        NA           Good       0                2                None            None       NA            NA       NA          6              No            No          3               1_hr      0_to_1_hr   NA             NA               Yes              5           4            NA        No        Non-Smoker  NA        Yes        22             No            NA           No         Yes      21       1                1               No       Heterosexual    NA
  9999 │    9999  71915  2011_12   male        60   60-69     NA         White     White     College Grad    NeverMarried   65000-74999  70000        5        4          Own      Working     78.4     NA       NA        168.8    27.5     NA                25.0_to_29.9  76       147       73        150      72       148      74       146      72       505.13        0.93        4.94     218        1.253       NA         NA          Yes       56           Good       0                2                None            None       NA            NA       NA          6              No            No          1               2_hr      1_hr        NA             NA               Yes              NA          0            NA        No        Non-Smoker  NA        NA         NA             NA            NA           No         Yes      19       2                NA              No       NA              NA
 10000 │   10000  71915  2011_12   male        60   60-69     NA         White     White     College Grad    NeverMarried   65000-74999  70000        5        4          Own      Working     78.4     NA       NA        168.8    27.5     NA                25.0_to_29.9  76       147       73        150      72       148      74       146      72       505.13        0.93        4.94     218        1.253       NA         NA          Yes       56           Good       0                2                None            None       NA            NA       NA          6              No            No          NA              2_hr      1_hr        NA             NA               Yes              NA          0            NA        No        Non-Smoker  NA        NA         NA             NA            NA           No         Yes      19       2                NA              No       NA              NA
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             9985 rows omitted

Inferential Statistics

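Null hypothesis (H0): The default claim of no effect or no difference; usually the claim that is not interesting.
Alternative hypothesis (HA): The claim corresponding to the research hypothesis.
The goal is to determine whether the data provide enough evidence to reject the null hypothesis.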
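
For the home-ownership comparison later in this section, the two hypotheses can be written out explicitly. This is only a sketch: the symbols p_male and p_female (the population proportions of homeowners in each group) are notation introduced here, and a two-sided alternative is assumed because the text does not state a direction.

```latex
\begin{aligned}
H_0 &: p_{\text{male}} - p_{\text{female}} = 0 \\
H_A &: p_{\text{male}} - p_{\text{female}} \neq 0
\end{aligned}
```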

colnames(NHANES)
 [1] "ID"               "SurveyYr"         "Gender"           "Age"              "AgeDecade"        "AgeMonths"        "Race1"            "Race3"            "Education"        "MaritalStatus"    "HHIncome"         "HHIncomeMid"      "Poverty"          "HomeRooms"        "HomeOwn"          "Work"             "Weight"           "Length"           "HeadCirc"         "Height"           "BMI"              "BMICatUnder20yrs" "BMI_WHO"          "Pulse"            "BPSysAve"         "BPDiaAve"         "BPSys1"           "BPDia1"           "BPSys2"           "BPDia2"           "BPSys3"           "BPDia3"           "Testosterone"     "DirectChol"       "TotChol"          "UrineVol1"        "UrineFlow1"       "UrineVol2"        "UrineFlow2"       "Diabetes"         "DiabetesAge"      "HealthGen"        "DaysPhysHlthBad"  "DaysMentHlthBad"  "LittleInterest"   "Depressed"        "nPregnancies"     "nBabies"          "Age1stBaby"       "SleepHrsNight"    "SleepTrouble"     "PhysActive"       "PhysActiveDays"   "TVHrsDay"         "CompHrsDay"       "TVHrsDayChild"    "CompHrsDayChild"  "Alcohol12PlusYr"  "AlcoholDay"       "AlcoholYear"      "SmokeNow"         "Smoke100"         "Smoke100n"        "SmokeAge"         "Marijuana"        "AgeFirstMarij"    "RegularMarij"     "AgeRegMarij"      "HardDrugs"        "SexEver"          "SexAge"           "SexNumPartnLife"  "SexNumPartYear"   "SameSex"          "SexOrientation"   "PregnantNow"     
# Create bar plot for Home Ownership by Gender
ggplot(NHANES, aes(x = Gender, fill = HomeOwn)) +
    # Set the position to fill
    geom_bar(position = "fill") +
    ylab("Relative frequencies")

# Density plot of SleepHrsNight colored by SleepTrouble
ggplot(NHANES, aes(x = SleepHrsNight, color = SleepTrouble)) + 
    # Double the default smoothing bandwidth
    geom_density(adjust = 2) + 
    # Facet by HealthGen
    facet_wrap(~ HealthGen)
Warning: Removed 2245 rows containing non-finite values (`stat_density()`).

# Keep Gender and HomeOwn, restricted to owners and renters
homes <- NHANES %>%
    select(Gender, HomeOwn) %>%
    filter(HomeOwn %in% c("Own", "Rent"))

diff_orig <- homes %>%   
    # Group by gender
    group_by(Gender) %>%
    # Summarize proportion of homeowners
    summarize(prop_own = mean(HomeOwn == "Own")) %>%
    # Summarize difference in proportion of homeowners
    summarize(obs_diff_prop = diff(prop_own)) # male - female
  
# See the result
diff_orig
# A tibble: 1 × 1
  obs_diff_prop
          <dbl>
1      -0.00783
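
The same observed statistic can be computed in Python. The block below is a minimal sketch on a small, made-up data frame (the `toy` rows are hypothetical and are not NHANES records), so its result will not match the -0.00783 reported above; it only mirrors the filter, group, and difference steps.

```python
import pandas as pd

# Hypothetical toy data standing in for the Gender and HomeOwn columns
toy = pd.DataFrame({
    "Gender":  ["female", "female", "female", "female",
                "male", "male", "male", "male"],
    "HomeOwn": ["Own", "Own", "Own", "Rent",
                "Own", "Own", "Rent", "Other"],
})

# Keep owners and renters only, mirroring filter(HomeOwn %in% c("Own", "Rent"))
homes = toy[toy["HomeOwn"].isin(["Own", "Rent"])]

# Proportion of owners within each gender
prop_own = homes["HomeOwn"].eq("Own").groupby(homes["Gender"]).mean()

# Male minus female, the same order as in the R code
obs_diff_prop = prop_own["male"] - prop_own["female"]
print(obs_diff_prop)
```

Subtracting by label rather than relying on row order makes the direction of the difference explicit.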
# Perform 10 permutations
homeown_perm <- homes %>%
    specify(HomeOwn ~ Gender, success = "Own") %>%
    hypothesize(null = "independence") %>%
    generate(reps = 10, type = "permute")
Dropping unused factor levels Other from the supplied response variable 'HomeOwn'.
# Print results to console
homeown_perm
Response: HomeOwn (factor)
Explanatory: Gender (factor)
Null Hypothesis: independence
# A tibble: 97,120 × 3
# Groups:   replicate [10]
   HomeOwn Gender replicate
   <fct>   <fct>      <int>
 1 Own     male           1
 2 Own     male           1
 3 Own     male           1
 4 Own     male           1
 5 Own     female         1
 6 Own     male           1
 7 Own     male           1
 8 Own     female         1
 9 Own     female         1
10 Rent    female         1
# ℹ 97,110 more rows
# Perform 100 permutations
homeown_perm <- homes %>%
    specify(HomeOwn ~ Gender, success = "Own") %>%
    hypothesize(null = "independence") %>% 
    generate(reps = 100, type = "permute") %>% 
    calculate(stat = "diff in props", order = c("male", "female"))
Dropping unused factor levels Other from the supplied response variable 'HomeOwn'.
# Dotplot of 100 permuted differences in proportions
ggplot(homeown_perm, aes(x = stat)) + 
    geom_dotplot(binwidth = 0.001)

# Perform 1000 permutations
homeown_perm <- homes %>%
    # Specify HomeOwn vs. Gender, with "Own" as success
    specify(HomeOwn ~ Gender, success = "Own") %>%
    # Use a null hypothesis of independence
    hypothesize(null = "independence") %>% 
    # Generate 1000 repetitions (by permutation)
    generate(reps = 1000, type = "permute") %>% 
    # Calculate the difference in proportions (male then female)
    calculate(stat = "diff in props", order = c("male", "female"))
Dropping unused factor levels Other from the supplied response variable 'HomeOwn'.
# Density plot of 1000 permuted differences in proportions
ggplot(homeown_perm, aes(x = stat)) + 
    geom_density()
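
The permutation procedure can also be sketched without infer. The block below is an illustration in Python with NumPy on simulated data: the sample sizes, the 0.65 ownership rate, and the seed are assumptions made for the example, not NHANES values. It shuffles the gender labels, recomputes the difference in proportions each time, and then compares the observed difference with the permuted ones. The two-sided p-value at the end is an added step that the R code above does not compute.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Simulated toy sample (assumption): 1 = owns home, 0 = rents
is_male = np.array([True] * 50 + [False] * 50)
owns = np.concatenate([
    rng.binomial(1, 0.65, size=50),  # males
    rng.binomial(1, 0.65, size=50),  # females
])

def diff_in_props(owns, is_male):
    """Proportion of owners among males minus that among females."""
    return owns[is_male].mean() - owns[~is_male].mean()

obs_diff = diff_in_props(owns, is_male)

# Shuffling the gender labels breaks any link between gender and ownership,
# which is what the null hypothesis of independence asserts
n_reps = 1000
perm_diffs = np.array([
    diff_in_props(owns, rng.permutation(is_male)) for _ in range(n_reps)
])

# Share of permuted differences at least as extreme as the observed one
p_value = np.mean(np.abs(perm_diffs) >= np.abs(obs_diff))
print(obs_diff, p_value)
```

Each shuffle keeps the group sizes fixed, so the permuted differences are centered near zero and trace out the distribution expected if gender and ownership were unrelated.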

