Inferential Statistics
Drawing inferences about a population from a random sample, illustrated with NHANES data in R, Python, and Julia.
Inferential statistics is the branch of statistics concerned with drawing conclusions about a population from a random sample of it. It uses probability theory and statistical methods to analyze the sample data and to quantify how well the findings generalize to the larger population. Given a representative sample, researchers can estimate population parameters, test hypotheses, assess the significance of relationships, and make predictions.
This matters because time, cost, and logistics usually rule out examining every individual in a population. Inference lets us make well-founded claims about the whole from a carefully chosen part.
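To make the idea of estimating a population parameter from a sample concrete, here is a minimal sketch in Python using only the standard library. The population is simulated (it is not NHANES data), and the mean of 7.0, the spread of 1.3, and the sample size of 500 are arbitrary assumptions chosen for illustration. The interval uses the normal approximation, which is one common choice rather than the only one.

```python
import math
import random
import statistics

# Hypothetical population: 100,000 simulated values (NOT NHANES data).
rng = random.Random(42)
population = [rng.gauss(7.0, 1.3) for _ in range(100_000)]

# Draw a simple random sample without replacement.
sample = rng.sample(population, 500)

# Point estimate and an approximate 95% confidence interval for the mean.
# 1.96 is the 97.5th percentile of the standard normal distribution.
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"population mean: {statistics.mean(population):.3f}")
print(f"sample mean:     {sample_mean:.3f}")
print(f"approx. 95% CI:  ({ci_low:.3f}, {ci_high:.3f})")
```

In practice the population mean is unknown; it is printed here only because the population was simulated, which makes it possible to see how close the sample estimate lands.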
Let’s look at the inferences we can draw from the NHANES dataset.
Getting Started
If you want to reproduce this work, the versions of R, Python, and Julia used, along with the packages for each, are listed below. The visualizations follow Leland Wilkinson’s Grammar of Graphics. My coding style here is deliberately verbose, so that it is easy to trace where functions, methods, and variables originate, and to make this a learning experience for everyone, including me.
R.version.string
[1] "R version 4.2.3 (2023-03-15)"
require(devtools)
devtools::install_version("NHANES", version = "2.1.0", repos = "http://cran.us.r-project.org")
devtools::install_version("dplyr", version="1.1.1", repos="http://cran.us.r-project.org")
devtools::install_version("ggplot2", version="3.4.2", repos="http://cran.us.r-project.org")
devtools::install_version("infer", version = "1.0.4", repos = "http://cran.us.r-project.org")
library(NHANES)
library(dplyr)
library(ggplot2)
library(infer)
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun 6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
!pip install pandas==2.0.3
!pip install plotnine==0.12.1
import pandas
import plotnine
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin22.4.0)
CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Environment:
DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
using Pkg
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.5.0")
Pkg.add(name="CategoricalArrays", version="0.10.8")
Pkg.add(name="Colors", version="0.12.8")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.4.0")
using CSV
using DataFrames
using CategoricalArrays
using Colors
using Cairo
using Gadfly
Importing and Examining Dataset
nhanes_r <- read.csv("../../dataset/nhanes.csv")
str(object=nhanes_r)
'data.frame': 10000 obs. of 77 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ ID : int 51624 51624 51624 51625 51630 51638 51646 51647 51647 51647 ...
$ SurveyYr : chr "2009_10" "2009_10" "2009_10" "2009_10" ...
$ Gender : chr "male" "male" "male" "male" ...
$ Age : int 34 34 34 4 49 9 8 45 45 45 ...
$ AgeDecade : chr " 30-39" " 30-39" " 30-39" " 0-9" ...
$ AgeMonths : int 409 409 409 49 596 115 101 541 541 541 ...
$ Race1 : chr "White" "White" "White" "Other" ...
$ Race3 : chr NA NA NA NA ...
$ Education : chr "High School" "High School" "High School" NA ...
$ MaritalStatus : chr "Married" "Married" "Married" NA ...
$ HHIncome : chr "25000-34999" "25000-34999" "25000-34999" "20000-24999" ...
$ HHIncomeMid : int 30000 30000 30000 22500 40000 87500 60000 87500 87500 87500 ...
$ Poverty : num 1.36 1.36 1.36 1.07 1.91 1.84 2.33 5 5 5 ...
$ HomeRooms : int 6 6 6 9 5 6 7 6 6 6 ...
$ HomeOwn : chr "Own" "Own" "Own" "Own" ...
$ Work : chr "NotWorking" "NotWorking" "NotWorking" NA ...
$ Weight : num 87.4 87.4 87.4 17 86.7 29.8 35.2 75.7 75.7 75.7 ...
$ Length : num NA NA NA NA NA NA NA NA NA NA ...
$ HeadCirc : num NA NA NA NA NA NA NA NA NA NA ...
$ Height : num 165 165 165 105 168 ...
$ BMI : num 32.2 32.2 32.2 15.3 30.6 ...
$ BMICatUnder20yrs: chr NA NA NA NA ...
$ BMI_WHO : chr "30.0_plus" "30.0_plus" "30.0_plus" "12.0_18.5" ...
$ Pulse : int 70 70 70 NA 86 82 72 62 62 62 ...
$ BPSysAve : int 113 113 113 NA 112 86 107 118 118 118 ...
$ BPDiaAve : int 85 85 85 NA 75 47 37 64 64 64 ...
$ BPSys1 : int 114 114 114 NA 118 84 114 106 106 106 ...
$ BPDia1 : int 88 88 88 NA 82 50 46 62 62 62 ...
$ BPSys2 : int 114 114 114 NA 108 84 108 118 118 118 ...
$ BPDia2 : int 88 88 88 NA 74 50 36 68 68 68 ...
$ BPSys3 : int 112 112 112 NA 116 88 106 118 118 118 ...
$ BPDia3 : int 82 82 82 NA 76 44 38 60 60 60 ...
$ Testosterone : num NA NA NA NA NA NA NA NA NA NA ...
$ DirectChol : num 1.29 1.29 1.29 NA 1.16 1.34 1.55 2.12 2.12 2.12 ...
$ TotChol : num 3.49 3.49 3.49 NA 6.7 4.86 4.09 5.82 5.82 5.82 ...
$ UrineVol1 : int 352 352 352 NA 77 123 238 106 106 106 ...
$ UrineFlow1 : num NA NA NA NA 0.094 ...
$ UrineVol2 : int NA NA NA NA NA NA NA NA NA NA ...
$ UrineFlow2 : num NA NA NA NA NA NA NA NA NA NA ...
$ Diabetes : chr "No" "No" "No" "No" ...
$ DiabetesAge : int NA NA NA NA NA NA NA NA NA NA ...
$ HealthGen : chr "Good" "Good" "Good" NA ...
$ DaysPhysHlthBad : int 0 0 0 NA 0 NA NA 0 0 0 ...
$ DaysMentHlthBad : int 15 15 15 NA 10 NA NA 3 3 3 ...
$ LittleInterest : chr "Most" "Most" "Most" NA ...
$ Depressed : chr "Several" "Several" "Several" NA ...
$ nPregnancies : int NA NA NA NA 2 NA NA 1 1 1 ...
$ nBabies : int NA NA NA NA 2 NA NA NA NA NA ...
$ Age1stBaby : int NA NA NA NA 27 NA NA NA NA NA ...
$ SleepHrsNight : int 4 4 4 NA 8 NA NA 8 8 8 ...
$ SleepTrouble : chr "Yes" "Yes" "Yes" NA ...
$ PhysActive : chr "No" "No" "No" NA ...
$ PhysActiveDays : int NA NA NA NA NA NA NA 5 5 5 ...
$ TVHrsDay : chr NA NA NA NA ...
$ CompHrsDay : chr NA NA NA NA ...
$ TVHrsDayChild : int NA NA NA 4 NA 5 1 NA NA NA ...
$ CompHrsDayChild : int NA NA NA 1 NA 0 6 NA NA NA ...
$ Alcohol12PlusYr : chr "Yes" "Yes" "Yes" NA ...
$ AlcoholDay : int NA NA NA NA 2 NA NA 3 3 3 ...
$ AlcoholYear : int 0 0 0 NA 20 NA NA 52 52 52 ...
$ SmokeNow : chr "No" "No" "No" NA ...
$ Smoke100 : chr "Yes" "Yes" "Yes" NA ...
$ Smoke100n : chr "Smoker" "Smoker" "Smoker" NA ...
$ SmokeAge : int 18 18 18 NA 38 NA NA NA NA NA ...
$ Marijuana : chr "Yes" "Yes" "Yes" NA ...
$ AgeFirstMarij : int 17 17 17 NA 18 NA NA 13 13 13 ...
$ RegularMarij : chr "No" "No" "No" NA ...
$ AgeRegMarij : int NA NA NA NA NA NA NA NA NA NA ...
$ HardDrugs : chr "Yes" "Yes" "Yes" NA ...
$ SexEver : chr "Yes" "Yes" "Yes" NA ...
$ SexAge : int 16 16 16 NA 12 NA NA 13 13 13 ...
$ SexNumPartnLife : int 8 8 8 NA 10 NA NA 20 20 20 ...
$ SexNumPartYear : int 1 1 1 NA 1 NA NA 0 0 0 ...
$ SameSex : chr "No" "No" "No" NA ...
$ SexOrientation : chr "Heterosexual" "Heterosexual" "Heterosexual" NA ...
$ PregnantNow : chr NA NA NA NA ...
head(x=nhanes_r, n=8)
X ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3 Education MaritalStatus HHIncome HHIncomeMid Poverty HomeRooms HomeOwn Work Weight Length HeadCirc Height BMI BMICatUnder20yrs BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1 BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1 UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100 Smoke100n SmokeAge Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation PregnantNow
1 1 51624 2009_10 male 34 30-39 409 White <NA> High School Married 25000-34999 30000 1.4 6 Own NotWorking 87 NA NA 165 32 <NA> 30.0_plus 70 113 85 114 88 114 88 112 82 NA 1.3 3.5 352 NA NA NA No NA Good 0 15 Most Several NA NA NA 4 Yes No NA <NA> <NA> NA NA Yes NA 0 No Yes Smoker 18 Yes 17 No NA Yes Yes 16 8 1 No Heterosexual <NA>
2 2 51624 2009_10 male 34 30-39 409 White <NA> High School Married 25000-34999 30000 1.4 6 Own NotWorking 87 NA NA 165 32 <NA> 30.0_plus 70 113 85 114 88 114 88 112 82 NA 1.3 3.5 352 NA NA NA No NA Good 0 15 Most Several NA NA NA 4 Yes No NA <NA> <NA> NA NA Yes NA 0 No Yes Smoker 18 Yes 17 No NA Yes Yes 16 8 1 No Heterosexual <NA>
3 3 51624 2009_10 male 34 30-39 409 White <NA> High School Married 25000-34999 30000 1.4 6 Own NotWorking 87 NA NA 165 32 <NA> 30.0_plus 70 113 85 114 88 114 88 112 82 NA 1.3 3.5 352 NA NA NA No NA Good 0 15 Most Several NA NA NA 4 Yes No NA <NA> <NA> NA NA Yes NA 0 No Yes Smoker 18 Yes 17 No NA Yes Yes 16 8 1 No Heterosexual <NA>
4 4 51625 2009_10 male 4 0-9 49 Other <NA> <NA> <NA> 20000-24999 22500 1.1 9 Own <NA> 17 NA NA 105 15 <NA> 12.0_18.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA No NA <NA> NA NA <NA> <NA> NA NA NA NA <NA> <NA> NA <NA> <NA> 4 1 <NA> NA NA <NA> <NA> <NA> NA <NA> NA <NA> NA <NA> <NA> NA NA NA <NA> <NA> <NA>
5 5 51630 2009_10 female 49 40-49 596 White <NA> Some College LivePartner 35000-44999 40000 1.9 5 Rent NotWorking 87 NA NA 168 31 <NA> 30.0_plus 86 112 75 118 82 108 74 116 76 NA 1.2 6.7 77 0.094 NA NA No NA Good 0 10 Several Several 2 2 27 8 Yes No NA <NA> <NA> NA NA Yes 2 20 Yes Yes Smoker 38 Yes 18 No NA Yes Yes 12 10 1 Yes Heterosexual <NA>
6 6 51638 2009_10 male 9 0-9 115 White <NA> <NA> <NA> 75000-99999 87500 1.8 6 Rent <NA> 30 NA NA 133 17 <NA> 12.0_18.5 82 86 47 84 50 84 50 88 44 NA 1.3 4.9 123 1.538 NA NA No NA <NA> NA NA <NA> <NA> NA NA NA NA <NA> <NA> NA <NA> <NA> 5 0 <NA> NA NA <NA> <NA> <NA> NA <NA> NA <NA> NA <NA> <NA> NA NA NA <NA> <NA> <NA>
7 7 51646 2009_10 male 8 0-9 101 White <NA> <NA> <NA> 55000-64999 60000 2.3 7 Own <NA> 35 NA NA 131 21 <NA> 18.5_to_24.9 72 107 37 114 46 108 36 106 38 NA 1.6 4.1 238 1.322 NA NA No NA <NA> NA NA <NA> <NA> NA NA NA NA <NA> <NA> NA <NA> <NA> 1 6 <NA> NA NA <NA> <NA> <NA> NA <NA> NA <NA> NA <NA> <NA> NA NA NA <NA> <NA> <NA>
8 8 51647 2009_10 female 45 40-49 541 White <NA> College Grad Married 75000-99999 87500 5.0 6 Own Working 76 NA NA 167 27 <NA> 25.0_to_29.9 62 118 64 106 62 118 68 118 60 NA 2.1 5.8 106 1.116 NA NA No NA Vgood 0 3 None None 1 NA NA 8 No Yes 5 <NA> <NA> NA NA Yes 3 52 <NA> No Non-Smoker NA Yes 13 No NA No Yes 13 20 0 Yes Bisexual <NA>
tail(x=nhanes_r, n=8)
X ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3 Education MaritalStatus HHIncome HHIncomeMid Poverty HomeRooms HomeOwn Work Weight Length HeadCirc Height BMI BMICatUnder20yrs BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1 BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1 UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100 Smoke100n SmokeAge Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation PregnantNow
9993 9993 71908 2011_12 female 66 60-69 NA White White College Grad Widowed 65000-74999 70000 4.55 8 Own Working 88.7 NA NA 159 35 <NA> 30.0_plus 76 114 70 110 74 114 68 114 72 26 1.86 6.5 29 0.66 94 0.63 No NA Excellent 0 0 None None 2 2 22 6 No No NA 2_hr 0_to_1_hr NA NA No 1 5 <NA> No Non-Smoker NA <NA> NA <NA> NA No Yes 18 1 NA No <NA> <NA>
9994 9994 71909 2011_12 male 28 20-29 NA Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500 0.46 3 Rent Working 92.3 NA NA 177 29 <NA> 25.0_to_29.9 68 124 65 124 62 126 64 122 66 490 1.22 3.9 97 0.94 NA NA No NA <NA> NA NA <NA> <NA> NA NA NA 6 No Yes NA 1_hr 2_hr NA NA <NA> NA NA Yes Yes Smoker 18 <NA> NA <NA> NA <NA> <NA> NA NA NA <NA> <NA> <NA>
9995 9995 71909 2011_12 male 28 20-29 NA Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500 0.46 3 Rent Working 92.3 NA NA 177 29 <NA> 25.0_to_29.9 68 124 65 124 62 126 64 122 66 490 1.22 3.9 97 0.94 NA NA No NA <NA> NA NA <NA> <NA> NA NA NA 6 No Yes NA 1_hr 2_hr NA NA <NA> NA NA Yes Yes Smoker 18 <NA> NA <NA> NA <NA> <NA> NA NA NA <NA> <NA> <NA>
9996 9996 71909 2011_12 male 28 20-29 NA Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500 0.46 3 Rent Working 92.3 NA NA 177 29 <NA> 25.0_to_29.9 68 124 65 124 62 126 64 122 66 490 1.22 3.9 97 0.94 NA NA No NA <NA> NA NA <NA> <NA> NA NA NA 6 No Yes NA 1_hr 2_hr NA NA <NA> NA NA Yes Yes Smoker 18 <NA> NA <NA> NA <NA> <NA> NA NA NA <NA> <NA> <NA>
9997 9997 71910 2011_12 female 0 0-9 5 White White <NA> <NA> 75000-99999 87500 3.37 10 Own <NA> 6.7 68 42 NA NA <NA> <NA> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA <NA> NA <NA> NA NA <NA> <NA> NA NA NA NA <NA> <NA> NA <NA> <NA> NA NA <NA> NA NA <NA> <NA> <NA> NA <NA> NA <NA> NA <NA> <NA> NA NA NA <NA> <NA> <NA>
9998 9998 71911 2011_12 male 27 20-29 NA Mexican Mexican College Grad Married 75000-99999 87500 3.25 10 Own Working 96.7 NA NA 176 31 <NA> 30.0_plus 74 133 74 122 76 132 82 134 66 509 1.06 5.7 63 0.60 NA NA No NA Good 0 2 None None NA NA NA 6 No No 3 1_hr 0_to_1_hr NA NA Yes 5 4 <NA> No Non-Smoker NA Yes 22 No NA No Yes 21 1 1 No Heterosexual <NA>
9999 9999 71915 2011_12 male 60 60-69 NA White White College Grad NeverMarried 65000-74999 70000 5.00 4 Own Working 78.4 NA NA 169 28 <NA> 25.0_to_29.9 76 147 73 150 72 148 74 146 72 505 0.93 4.9 218 1.25 NA NA Yes 56 Good 0 2 None None NA NA NA 6 No No 1 2_hr 1_hr NA NA Yes NA 0 <NA> No Non-Smoker NA <NA> NA <NA> NA No Yes 19 2 NA No <NA> <NA>
10000 10000 71915 2011_12 male 60 60-69 NA White White College Grad NeverMarried 65000-74999 70000 5.00 4 Own Working 78.4 NA NA 169 28 <NA> 25.0_to_29.9 76 147 73 150 72 148 74 146 72 505 0.93 4.9 218 1.25 NA NA Yes 56 Good 0 2 None None NA NA NA 6 No No NA 2_hr 1_hr NA NA Yes NA 0 <NA> No Non-Smoker NA <NA> NA <NA> NA No Yes 19 2 NA No <NA> <NA>
nhanes_py = pandas.read_csv("../../dataset/nhanes.csv")
nhanes_py.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 77 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 10000 non-null int64
1 ID 10000 non-null int64
2 SurveyYr 10000 non-null object
3 Gender 10000 non-null object
4 Age 10000 non-null int64
5 AgeDecade 9667 non-null object
6 AgeMonths 4962 non-null float64
7 Race1 10000 non-null object
8 Race3 5000 non-null object
9 Education 7221 non-null object
10 MaritalStatus 7231 non-null object
11 HHIncome 9189 non-null object
12 HHIncomeMid 9189 non-null float64
13 Poverty 9274 non-null float64
14 HomeRooms 9931 non-null float64
15 HomeOwn 9937 non-null object
16 Work 7771 non-null object
17 Weight 9922 non-null float64
18 Length 543 non-null float64
19 HeadCirc 88 non-null float64
20 Height 9647 non-null float64
21 BMI 9634 non-null float64
22 BMICatUnder20yrs 1274 non-null object
23 BMI_WHO 9603 non-null object
24 Pulse 8563 non-null float64
25 BPSysAve 8551 non-null float64
26 BPDiaAve 8551 non-null float64
27 BPSys1 8237 non-null float64
28 BPDia1 8237 non-null float64
29 BPSys2 8353 non-null float64
30 BPDia2 8353 non-null float64
31 BPSys3 8365 non-null float64
32 BPDia3 8365 non-null float64
33 Testosterone 4126 non-null float64
34 DirectChol 8474 non-null float64
35 TotChol 8474 non-null float64
36 UrineVol1 9013 non-null float64
37 UrineFlow1 8397 non-null float64
38 UrineVol2 1478 non-null float64
39 UrineFlow2 1476 non-null float64
40 Diabetes 9858 non-null object
41 DiabetesAge 629 non-null float64
42 HealthGen 7539 non-null object
43 DaysPhysHlthBad 7532 non-null float64
44 DaysMentHlthBad 7534 non-null float64
45 LittleInterest 1564 non-null object
46 Depressed 1427 non-null object
47 nPregnancies 2604 non-null float64
48 nBabies 2416 non-null float64
49 Age1stBaby 1884 non-null float64
50 SleepHrsNight 7755 non-null float64
51 SleepTrouble 7772 non-null object
52 PhysActive 8326 non-null object
53 PhysActiveDays 4663 non-null float64
54 TVHrsDay 4859 non-null object
55 CompHrsDay 4863 non-null object
56 TVHrsDayChild 653 non-null float64
57 CompHrsDayChild 653 non-null float64
58 Alcohol12PlusYr 6580 non-null object
59 AlcoholDay 4914 non-null float64
60 AlcoholYear 5922 non-null float64
61 SmokeNow 3211 non-null object
62 Smoke100 7235 non-null object
63 Smoke100n 7235 non-null object
64 SmokeAge 3080 non-null float64
65 Marijuana 4941 non-null object
66 AgeFirstMarij 2891 non-null float64
67 RegularMarij 4941 non-null object
68 AgeRegMarij 1366 non-null float64
69 HardDrugs 5765 non-null object
70 SexEver 5767 non-null object
71 SexAge 5540 non-null float64
72 SexNumPartnLife 5725 non-null float64
73 SexNumPartYear 4928 non-null float64
74 SameSex 5768 non-null object
75 SexOrientation 4842 non-null object
76 PregnantNow 1696 non-null object
dtypes: float64(43), int64(3), object(31)
memory usage: 5.9+ MB
nhanes_py.head(n=8)
Unnamed: 0 ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3 Education MaritalStatus HHIncome HHIncomeMid Poverty HomeRooms HomeOwn Work Weight Length HeadCirc Height BMI BMICatUnder20yrs BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1 BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1 UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100 Smoke100n SmokeAge Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation PregnantNow
0 1 51624 2009_10 male 34 30-39 409.0 White NaN High School Married 25000-34999 30000.0 1.36 6.0 Own NotWorking 87.4 NaN NaN 164.7 32.22 NaN 30.0_plus 70.0 113.0 85.0 114.0 88.0 114.0 88.0 112.0 82.0 NaN 1.29 3.49 352.0 NaN NaN NaN No NaN Good 0.0 15.0 Most Several NaN NaN NaN 4.0 Yes No NaN NaN NaN NaN NaN Yes NaN 0.0 No Yes Smoker 18.0 Yes 17.0 No NaN Yes Yes 16.0 8.0 1.0 No Heterosexual NaN
1 2 51624 2009_10 male 34 30-39 409.0 White NaN High School Married 25000-34999 30000.0 1.36 6.0 Own NotWorking 87.4 NaN NaN 164.7 32.22 NaN 30.0_plus 70.0 113.0 85.0 114.0 88.0 114.0 88.0 112.0 82.0 NaN 1.29 3.49 352.0 NaN NaN NaN No NaN Good 0.0 15.0 Most Several NaN NaN NaN 4.0 Yes No NaN NaN NaN NaN NaN Yes NaN 0.0 No Yes Smoker 18.0 Yes 17.0 No NaN Yes Yes 16.0 8.0 1.0 No Heterosexual NaN
2 3 51624 2009_10 male 34 30-39 409.0 White NaN High School Married 25000-34999 30000.0 1.36 6.0 Own NotWorking 87.4 NaN NaN 164.7 32.22 NaN 30.0_plus 70.0 113.0 85.0 114.0 88.0 114.0 88.0 112.0 82.0 NaN 1.29 3.49 352.0 NaN NaN NaN No NaN Good 0.0 15.0 Most Several NaN NaN NaN 4.0 Yes No NaN NaN NaN NaN NaN Yes NaN 0.0 No Yes Smoker 18.0 Yes 17.0 No NaN Yes Yes 16.0 8.0 1.0 No Heterosexual NaN
3 4 51625 2009_10 male 4 0-9 49.0 Other NaN NaN NaN 20000-24999 22500.0 1.07 9.0 Own NaN 17.0 NaN NaN 105.4 15.30 NaN 12.0_18.5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN No NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 5 51630 2009_10 female 49 40-49 596.0 White NaN Some College LivePartner 35000-44999 40000.0 1.91 5.0 Rent NotWorking 86.7 NaN NaN 168.4 30.57 NaN 30.0_plus 86.0 112.0 75.0 118.0 82.0 108.0 74.0 116.0 76.0 NaN 1.16 6.70 77.0 0.094 NaN NaN No NaN Good 0.0 10.0 Several Several 2.0 2.0 27.0 8.0 Yes No NaN NaN NaN NaN NaN Yes 2.0 20.0 Yes Yes Smoker 38.0 Yes 18.0 No NaN Yes Yes 12.0 10.0 1.0 Yes Heterosexual NaN
5 6 51638 2009_10 male 9 0-9 115.0 White NaN NaN NaN 75000-99999 87500.0 1.84 6.0 Rent NaN 29.8 NaN NaN 133.1 16.82 NaN 12.0_18.5 82.0 86.0 47.0 84.0 50.0 84.0 50.0 88.0 44.0 NaN 1.34 4.86 123.0 1.538 NaN NaN No NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 7 51646 2009_10 male 8 0-9 101.0 White NaN NaN NaN 55000-64999 60000.0 2.33 7.0 Own NaN 35.2 NaN NaN 130.6 20.64 NaN 18.5_to_24.9 72.0 107.0 37.0 114.0 46.0 108.0 36.0 106.0 38.0 NaN 1.55 4.09 238.0 1.322 NaN NaN No NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 6.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 8 51647 2009_10 female 45 40-49 541.0 White NaN College Grad Married 75000-99999 87500.0 5.00 6.0 Own Working 75.7 NaN NaN 166.7 27.24 NaN 25.0_to_29.9 62.0 118.0 64.0 106.0 62.0 118.0 68.0 118.0 60.0 NaN 2.12 5.82 106.0 1.116 NaN NaN No NaN Vgood 0.0 3.0 NaN NaN 1.0 NaN NaN 8.0 No Yes 5.0 NaN NaN NaN NaN Yes 3.0 52.0 NaN No Non-Smoker NaN Yes 13.0 No NaN No Yes 13.0 20.0 0.0 Yes Bisexual NaN
nhanes_py.tail(n=8)
Unnamed: 0 ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3 Education MaritalStatus HHIncome HHIncomeMid Poverty HomeRooms HomeOwn Work Weight Length HeadCirc Height BMI BMICatUnder20yrs BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1 BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1 UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100 Smoke100n SmokeAge Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation PregnantNow
9992 9993 71908 2011_12 female 66 60-69 NaN White White College Grad Widowed 65000-74999 70000.0 4.55 8.0 Own Working 88.7 NaN NaN 159.0 35.1 NaN 30.0_plus 76.0 114.0 70.0 110.0 74.0 114.0 68.0 114.0 72.0 26.00 1.86 6.47 29.0 0.659 94.0 0.627 No NaN Excellent 0.0 0.0 NaN NaN 2.0 2.0 22.0 6.0 No No NaN 2_hr 0_to_1_hr NaN NaN No 1.0 5.0 NaN No Non-Smoker NaN NaN NaN NaN NaN No Yes 18.0 1.0 NaN No NaN NaN
9993 9994 71909 2011_12 male 28 20-29 NaN Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500.0 0.46 3.0 Rent Working 92.3 NaN NaN 177.3 29.4 NaN 25.0_to_29.9 68.0 124.0 65.0 124.0 62.0 126.0 64.0 122.0 66.0 490.43 1.22 3.90 97.0 0.942 NaN NaN No NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.0 No Yes NaN 1_hr 2_hr NaN NaN NaN NaN NaN Yes Yes Smoker 18.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9994 9995 71909 2011_12 male 28 20-29 NaN Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500.0 0.46 3.0 Rent Working 92.3 NaN NaN 177.3 29.4 NaN 25.0_to_29.9 68.0 124.0 65.0 124.0 62.0 126.0 64.0 122.0 66.0 490.43 1.22 3.90 97.0 0.942 NaN NaN No NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.0 No Yes NaN 1_hr 2_hr NaN NaN NaN NaN NaN Yes Yes Smoker 18.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9995 9996 71909 2011_12 male 28 20-29 NaN Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500.0 0.46 3.0 Rent Working 92.3 NaN NaN 177.3 29.4 NaN 25.0_to_29.9 68.0 124.0 65.0 124.0 62.0 126.0 64.0 122.0 66.0 490.43 1.22 3.90 97.0 0.942 NaN NaN No NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.0 No Yes NaN 1_hr 2_hr NaN NaN NaN NaN NaN Yes Yes Smoker 18.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9996 9997 71910 2011_12 female 0 0-9 5.0 White White NaN NaN 75000-99999 87500.0 3.37 10.0 Own NaN 6.7 67.6 42.2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9997 9998 71911 2011_12 male 27 20-29 NaN Mexican Mexican College Grad Married 75000-99999 87500.0 3.25 10.0 Own Working 96.7 NaN NaN 175.8 31.3 NaN 30.0_plus 74.0 133.0 74.0 122.0 76.0 132.0 82.0 134.0 66.0 509.00 1.06 5.72 63.0 0.600 NaN NaN No NaN Good 0.0 2.0 NaN NaN NaN NaN NaN 6.0 No No 3.0 1_hr 0_to_1_hr NaN NaN Yes 5.0 4.0 NaN No Non-Smoker NaN Yes 22.0 No NaN No Yes 21.0 1.0 1.0 No Heterosexual NaN
9998 9999 71915 2011_12 male 60 60-69 NaN White White College Grad NeverMarried 65000-74999 70000.0 5.00 4.0 Own Working 78.4 NaN NaN 168.8 27.5 NaN 25.0_to_29.9 76.0 147.0 73.0 150.0 72.0 148.0 74.0 146.0 72.0 505.13 0.93 4.94 218.0 1.253 NaN NaN Yes 56.0 Good 0.0 2.0 NaN NaN NaN NaN NaN 6.0 No No 1.0 2_hr 1_hr NaN NaN Yes NaN 0.0 NaN No Non-Smoker NaN NaN NaN NaN NaN No Yes 19.0 2.0 NaN No NaN NaN
9999 10000 71915 2011_12 male 60 60-69 NaN White White College Grad NeverMarried 65000-74999 70000.0 5.00 4.0 Own Working 78.4 NaN NaN 168.8 27.5 NaN 25.0_to_29.9 76.0 147.0 73.0 150.0 72.0 148.0 74.0 146.0 72.0 505.13 0.93 4.94 218.0 1.253 NaN NaN Yes 56.0 Good 0.0 2.0 NaN NaN NaN NaN NaN 6.0 No No NaN 2_hr 1_hr NaN NaN Yes NaN 0.0 NaN No Non-Smoker NaN NaN NaN NaN NaN No Yes 19.0 2.0 NaN No NaN NaN
nhanes_jl = CSV.File("../../dataset/nhanes.csv") |> DataFrames.DataFrame
10000×77 DataFrame
Row │ Column1 ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3 Education MaritalStatus HHIncome HHIncomeMid Poverty HomeRooms HomeOwn Work Weight Length HeadCirc Height BMI BMICatUnder20yrs BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1 BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1 UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100 Smoke100n SmokeAge Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation PregnantNow
│ Int64 Int64 String7 String7 Int64 String7 String3 String15 String15 String15 String15 String15 String7 String7 String3 String7 String15 String7 String7 String7 String7 String7 String15 String15 String3 String3 String3 String3 String3 String3 String3 String3 String3 String7 String7 String7 String3 String7 String3 String7 String3 String3 String15 String3 String3 String7 String7 String3 String3 String3 String3 String3 String3 String3 String15 String15 String3 String3 String3 String3 String3 String3 String3 String15 String3 String3 String3 String3 String3 String3 String3 String3 String7 String3 String3 String15 String7
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1 51624 2009_10 male 34 30-39 409 White NA High School Married 25000-34999 30000 1.36 6 Own NotWorking 87.4 NA NA 164.7 32.22 NA 30.0_plus 70 113 85 114 88 114 88 112 82 NA 1.29 3.49 352 NA NA NA No NA Good 0 15 Most Several NA NA NA 4 Yes No NA NA NA NA NA Yes NA 0 No Yes Smoker 18 Yes 17 No NA Yes Yes 16 8 1 No Heterosexual NA
2 │ 2 51624 2009_10 male 34 30-39 409 White NA High School Married 25000-34999 30000 1.36 6 Own NotWorking 87.4 NA NA 164.7 32.22 NA 30.0_plus 70 113 85 114 88 114 88 112 82 NA 1.29 3.49 352 NA NA NA No NA Good 0 15 Most Several NA NA NA 4 Yes No NA NA NA NA NA Yes NA 0 No Yes Smoker 18 Yes 17 No NA Yes Yes 16 8 1 No Heterosexual NA
3 │ 3 51624 2009_10 male 34 30-39 409 White NA High School Married 25000-34999 30000 1.36 6 Own NotWorking 87.4 NA NA 164.7 32.22 NA 30.0_plus 70 113 85 114 88 114 88 112 82 NA 1.29 3.49 352 NA NA NA No NA Good 0 15 Most Several NA NA NA 4 Yes No NA NA NA NA NA Yes NA 0 No Yes Smoker 18 Yes 17 No NA Yes Yes 16 8 1 No Heterosexual NA
4 │ 4 51625 2009_10 male 4 0-9 49 Other NA NA NA 20000-24999 22500 1.07 9 Own NA 17 NA NA 105.4 15.3 NA 12.0_18.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA No NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
5 │ 5 51630 2009_10 female 49 40-49 596 White NA Some College LivePartner 35000-44999 40000 1.91 5 Rent NotWorking 86.7 NA NA 168.4 30.57 NA 30.0_plus 86 112 75 118 82 108 74 116 76 NA 1.16 6.7 77 0.094 NA NA No NA Good 0 10 Several Several 2 2 27 8 Yes No NA NA NA NA NA Yes 2 20 Yes Yes Smoker 38 Yes 18 No NA Yes Yes 12 10 1 Yes Heterosexual NA
6 │ 6 51638 2009_10 male 9 0-9 115 White NA NA NA 75000-99999 87500 1.84 6 Rent NA 29.8 NA NA 133.1 16.82 NA 12.0_18.5 82 86 47 84 50 84 50 88 44 NA 1.34 4.86 123 1.538 NA NA No NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 5 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
7 │ 7 51646 2009_10 male 8 0-9 101 White NA NA NA 55000-64999 60000 2.33 7 Own NA 35.2 NA NA 130.6 20.64 NA 18.5_to_24.9 72 107 37 114 46 108 36 106 38 NA 1.55 4.09 238 1.322 NA NA No NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
8 │ 8 51647 2009_10 female 45 40-49 541 White NA College Grad Married 75000-99999 87500 5 6 Own Working 75.7 NA NA 166.7 27.24 NA 25.0_to_29.9 62 118 64 106 62 118 68 118 60 NA 2.12 5.82 106 1.116 NA NA No NA Vgood 0 3 None None 1 NA NA 8 No Yes 5 NA NA NA NA Yes 3 52 NA No Non-Smoker NA Yes 13 No NA No Yes 13 20 0 Yes Bisexual NA
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
9994 │ 9994 71909 2011_12 male 28 20-29 NA Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500 0.46 3 Rent Working 92.3 NA NA 177.3 29.4 NA 25.0_to_29.9 68 124 65 124 62 126 64 122 66 490.43 1.22 3.9 97 0.942 NA NA No NA NA NA NA NA NA NA NA NA 6 No Yes NA 1_hr 2_hr NA NA NA NA NA Yes Yes Smoker 18 NA NA NA NA NA NA NA NA NA NA NA NA
9995 │ 9995 71909 2011_12 male 28 20-29 NA Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500 0.46 3 Rent Working 92.3 NA NA 177.3 29.4 NA 25.0_to_29.9 68 124 65 124 62 126 64 122 66 490.43 1.22 3.9 97 0.942 NA NA No NA NA NA NA NA NA NA NA NA 6 No Yes NA 1_hr 2_hr NA NA NA NA NA Yes Yes Smoker 18 NA NA NA NA NA NA NA NA NA NA NA NA
9996 │ 9996 71909 2011_12 male 28 20-29 NA Mexican Mexican 9 - 11th Grade NeverMarried 5000-9999 7500 0.46 3 Rent Working 92.3 NA NA 177.3 29.4 NA 25.0_to_29.9 68 124 65 124 62 126 64 122 66 490.43 1.22 3.9 97 0.942 NA NA No NA NA NA NA NA NA NA NA NA 6 No Yes NA 1_hr 2_hr NA NA NA NA NA Yes Yes Smoker 18 NA NA NA NA NA NA NA NA NA NA NA NA
9997 │ 9997 71910 2011_12 female 0 0-9 5 White White NA NA 75000-99999 87500 3.37 10 Own NA 6.7 67.6 42.2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
9998 │ 9998 71911 2011_12 male 27 20-29 NA Mexican Mexican College Grad Married 75000-99999 87500 3.25 10 Own Working 96.7 NA NA 175.8 31.3 NA 30.0_plus 74 133 74 122 76 132 82 134 66 509 1.06 5.72 63 0.6 NA NA No NA Good 0 2 None None NA NA NA 6 No No 3 1_hr 0_to_1_hr NA NA Yes 5 4 NA No Non-Smoker NA Yes 22 No NA No Yes 21 1 1 No Heterosexual NA
9999 │ 9999 71915 2011_12 male 60 60-69 NA White White College Grad NeverMarried 65000-74999 70000 5 4 Own Working 78.4 NA NA 168.8 27.5 NA 25.0_to_29.9 76 147 73 150 72 148 74 146 72 505.13 0.93 4.94 218 1.253 NA NA Yes 56 Good 0 2 None None NA NA NA 6 No No 1 2_hr 1_hr NA NA Yes NA 0 NA No Non-Smoker NA NA NA NA NA No Yes 19 2 NA No NA NA
10000 │ 10000 71915 2011_12 male 60 60-69 NA White White College Grad NeverMarried 65000-74999 70000 5 4 Own Working 78.4 NA NA 168.8 27.5 NA 25.0_to_29.9 76 147 73 150 72 148 74 146 72 505.13 0.93 4.94 218 1.253 NA NA Yes 56 Good 0 2 None None NA NA NA 6 No No NA 2_hr 1_hr NA NA Yes NA 0 NA No Non-Smoker NA NA NA NA NA No Yes 19 2 NA No NA NA
9985 rows omitted
Inferential Statistics
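Null hypothesis (H0): The claim that is not interesting, typically that there is no effect, no difference, or no association in the population.
Alternative hypothesis (HA): The claim corresponding to the research hypothesis.
The goal is to assess whether the sample provides enough evidence to reject the null hypothesis in favor of the alternative.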
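For the home ownership comparison worked through in this section, the two hypotheses can be written out as follows. The symbols p_male and p_female are notation introduced here for illustration (the population proportions of homeowners among males and females), and the two-sided alternative is an assumption, since the text does not state a direction.

```latex
H_0:\; p_{\text{male}} - p_{\text{female}} = 0
\qquad\text{versus}\qquad
H_A:\; p_{\text{male}} - p_{\text{female}} \neq 0
```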
colnames(NHANES)
[1] "ID" "SurveyYr" "Gender" "Age" "AgeDecade" "AgeMonths" "Race1" "Race3" "Education" "MaritalStatus" "HHIncome" "HHIncomeMid" "Poverty" "HomeRooms" "HomeOwn" "Work" "Weight" "Length" "HeadCirc" "Height" "BMI" "BMICatUnder20yrs" "BMI_WHO" "Pulse" "BPSysAve" "BPDiaAve" "BPSys1" "BPDia1" "BPSys2" "BPDia2" "BPSys3" "BPDia3" "Testosterone" "DirectChol" "TotChol" "UrineVol1" "UrineFlow1" "UrineVol2" "UrineFlow2" "Diabetes" "DiabetesAge" "HealthGen" "DaysPhysHlthBad" "DaysMentHlthBad" "LittleInterest" "Depressed" "nPregnancies" "nBabies" "Age1stBaby" "SleepHrsNight" "SleepTrouble" "PhysActive" "PhysActiveDays" "TVHrsDay" "CompHrsDay" "TVHrsDayChild" "CompHrsDayChild" "Alcohol12PlusYr" "AlcoholDay" "AlcoholYear" "SmokeNow" "Smoke100" "Smoke100n" "SmokeAge" "Marijuana" "AgeFirstMarij" "RegularMarij" "AgeRegMarij" "HardDrugs" "SexEver" "SexAge" "SexNumPartnLife" "SexNumPartYear" "SameSex" "SexOrientation" "PregnantNow"
# Create bar plot for Home Ownership by Gender
ggplot(NHANES, aes(x = Gender, fill = HomeOwn)) +
# Set the position to fill, so each bar is scaled to 1 and shows proportions
geom_bar(position = "fill") +
ylab("Relative frequencies")
# Density plot of SleepHrsNight colored by SleepTrouble
ggplot(NHANES, aes(x = SleepHrsNight, color = SleepTrouble)) +
# Double the default smoothing bandwidth (adjust = 2)
geom_density(adjust = 2) +
# Facet by HealthGen
facet_wrap(~ HealthGen)
Warning: Removed 2245 rows containing non-finite values (`stat_density()`).
# Keep only Gender and HomeOwn, restricted to owners and renters
homes <- NHANES %>%
select(Gender, HomeOwn) %>%
filter(HomeOwn %in% c("Own", "Rent"))
diff_orig <- homes %>%
# Group by gender
group_by(Gender) %>%
# Summarize proportion of homeowners
summarize(prop_own = mean(HomeOwn == "Own")) %>%
# Summarize difference in proportion of homeowners
summarize(obs_diff_prop = diff(prop_own)) # male - female
# See the result
diff_orig
# A tibble: 1 × 1
obs_diff_prop
<dbl>
1 -0.00783
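To make the arithmetic behind obs_diff_prop concrete, here is a minimal Python sketch of the same calculation. The records below are hypothetical toy values, not NHANES data, and the helper name prop_own is made up for this illustration.

```python
# Toy records (hypothetical, NOT NHANES values): (gender, home ownership)
records = [
    ("female", "Own"), ("female", "Own"), ("female", "Rent"), ("female", "Own"),
    ("male", "Own"), ("male", "Rent"), ("male", "Rent"), ("male", "Own"),
]


def prop_own(rows, gender):
    """Proportion of the given gender whose home ownership value is "Own"."""
    status = [own for g, own in rows if g == gender]
    return sum(s == "Own" for s in status) / len(status)


# Same ordering as the R code above: male minus female
obs_diff_prop = prop_own(records, "male") - prop_own(records, "female")
print(obs_diff_prop)  # 2/4 - 3/4 = -0.25
```

A negative value means the proportion of homeowners is lower among males than among females in the sample, which is also the sign of the NHANES result above.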
# Perform 10 permutations
homeown_perm <- homes %>%
specify(HomeOwn ~ Gender, success = "Own") %>%
hypothesize(null = "independence") %>%
generate(reps = 10, type = "permute")
Dropping unused factor levels Other from the supplied response variable 'HomeOwn'.
# Print results to console
homeown_perm
Response: HomeOwn (factor)
Explanatory: Gender (factor)
Null Hypothesis: independence
# A tibble: 97,120 × 3
# Groups: replicate [10]
HomeOwn Gender replicate
<fct> <fct> <int>
1 Own male 1
2 Own male 1
3 Own male 1
4 Own male 1
5 Own female 1
6 Own male 1
7 Own male 1
8 Own female 1
9 Own female 1
10 Rent female 1
# ℹ 97,110 more rows
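Each replicate in the output above is a reshuffled copy of the data. A rough Python sketch of what one such replicate looks like, under the assumption that the home ownership labels are shuffled while the gender labels stay in place, is shown below. The data are hypothetical toy values and the seed is arbitrary.

```python
import random

rng = random.Random(42)  # arbitrary fixed seed so the sketch is reproducible

# Toy data (hypothetical, NOT NHANES values)
gender = ["female"] * 4 + ["male"] * 4
home_own = ["Own", "Own", "Rent", "Own", "Own", "Rent", "Rent", "Own"]

# One permutation: shuffle the ownership labels, leave gender untouched.
# This breaks any link between the two variables, which is exactly what
# a null hypothesis of independence describes.
permuted = home_own[:]
rng.shuffle(permuted)

# One replicate: the shuffled response paired with the original grouping
replicate = list(zip(permuted, gender))

# The overall number of owners is unchanged; only who "owns" differs
print(permuted.count("Own"), home_own.count("Own"))
```

Because a permutation only reorders labels, the overall counts of "Own" and "Rent" are identical in every replicate; only their allocation between the two genders varies.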
# Perform 100 permutations
homeown_perm <- homes %>%
specify(HomeOwn ~ Gender, success = "Own") %>%
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "diff in props", order = c("male", "female"))
Dropping unused factor levels Other from the supplied response variable 'HomeOwn'.
# Dotplot of 100 permuted differences in proportions
ggplot(homeown_perm, aes(x = stat)) +
geom_dotplot(binwidth = 0.001)
# Perform 1000 permutations
homeown_perm <- homes %>%
# Specify HomeOwn vs. Gender, with "Own" as success
specify(HomeOwn ~ Gender, success = "Own") %>%
# Use a null hypothesis of independence
hypothesize(null = "independence") %>%
# Generate 1000 repetitions (by permutation)
generate(reps = 1000, type = "permute") %>%
# Calculate the difference in proportions (male then female)
calculate(stat = "diff in props", order = c("male", "female"))
Dropping unused factor levels Other from the supplied response variable 'HomeOwn'.
# Density plot of 1000 permuted differences in proportions
ggplot(homeown_perm, aes(x = stat)) +
geom_density()
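The specify / hypothesize / generate / calculate pipeline can be mimicked in a few lines of plain Python. This is only a sketch under stated assumptions: the data are synthetic (500 people per gender with made-up ownership counts), the function names diff_in_props and permuted_diffs are hypothetical, and the shuffling scheme is a simple stand-in rather than a description of how the infer package works internally.

```python
import random


def diff_in_props(gender, home_own):
    """Proportion of owners among males minus proportion among females."""
    owns = {"male": [], "female": []}
    for g, h in zip(gender, home_own):
        owns[g].append(h == "Own")
    p_male = sum(owns["male"]) / len(owns["male"])
    p_female = sum(owns["female"]) / len(owns["female"])
    return p_male - p_female


def permuted_diffs(gender, home_own, reps, seed=0):
    """Difference in proportions for `reps` random shuffles of home_own."""
    rng = random.Random(seed)
    shuffled = list(home_own)
    stats = []
    for _ in range(reps):
        rng.shuffle(shuffled)  # break any gender/ownership association
        stats.append(diff_in_props(gender, shuffled))
    return stats


# Synthetic data (hypothetical counts, NOT NHANES values)
gender = ["female"] * 500 + ["male"] * 500
home_own = ["Own"] * 330 + ["Rent"] * 170 + ["Own"] * 325 + ["Rent"] * 175

observed = diff_in_props(gender, home_own)
null_stats = permuted_diffs(gender, home_own, reps=1000)

# Share of permuted differences at least as far from zero as the observed one
share_as_extreme = sum(abs(s) >= abs(observed) for s in null_stats) / len(null_stats)
print(observed, share_as_extreme)
```

The list null_stats plays the role of the stat column plotted above: it approximates how much the difference in proportions varies when gender and home ownership are unrelated, so the observed difference can be judged against it.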