Hypothesis Testing
Testing statistical inferences on delivery data using R, Python, and Julia.
Hypothesis testing gives us a systematic way to make inferences about on-time delivery. We formulate a null and an alternative hypothesis about a factor that may influence delivery performance, collect data, and analyze it to determine whether there is enough evidence to reject the null hypothesis.
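As a minimal sketch of that workflow, the Python snippet below runs a one-sample, right-tailed z-test for a proportion using only the standard library. The counts (70 late shipments out of 1,000), the hypothesized late rate of 6%, and the 5% significance level are made-up values for illustration, not results from the delivery dataset, and the function name one_sample_proportion_z_test is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Right-tailed z-test of H0: p = p0 against HA: p > p0."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Right-tailed p-value from the standard normal distribution.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical values, chosen only for illustration.
late_count = 70
shipment_count = 1000
hypothesized_late_rate = 0.06
alpha = 0.05

z_statistic, p_value = one_sample_proportion_z_test(
    late_count, shipment_count, hypothesized_late_rate
)
reject_null = p_value < alpha
print(f"z = {z_statistic:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

With these illustrative numbers the p-value is above the significance level, so we would fail to reject the null hypothesis.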
For on-time delivery, hypothesis testing lets us evaluate whether factors such as transportation mode, order volume, or geographic location have a statistically significant effect on delivery performance. The resulting conclusions help us make informed decisions, optimize operations, and identify strategies to improve on-time delivery rates in supply chain management and logistics.
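To illustrate how a factor such as transportation mode could be tested, here is a rough Python sketch of a two-sided, two-proportion z-test with a pooled standard error. The late and total counts for the two shipment modes are hypothetical placeholders, not counts from the dataset, and two_proportion_z_test is a name introduced only for this example.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(late_a, n_a, late_b, n_b):
    """Two-sided z-test of H0: p_a = p_b using a pooled proportion."""
    p_a = late_a / n_a
    p_b = late_b / n_b
    # Under H0 both groups share one late rate, so pool the counts.
    pooled = (late_a + late_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical late/total counts per shipment mode (illustration only).
air_late, air_total = 60, 900
ocean_late, ocean_total = 10, 100

z_statistic, p_value = two_proportion_z_test(
    air_late, air_total, ocean_late, ocean_total
)
print(f"z = {z_statistic:.3f}, p-value = {p_value:.4f}")
```

A pooled standard error is used because the null hypothesis assumes the two modes share a single late rate; other tests (for example a chi-square test of independence) would be equally reasonable choices here.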
Let’s look at how we can apply this technique to the delivery dataset.
Getting Started
To reproduce this work, use the versions of R, Python, and Julia listed below, along with the respective packages for each. The plots follow Leland Wilkinson’s approach to data visualization (Grammar of Graphics). Finally, my coding style here is deliberately verbose, so that it is easy to trace where functions, methods, and variables originate, and to make this a learning experience for everyone, including me.
cat(R.version$version.string, R.version$nickname)
R version 4.2.3 (2023-03-15) Shortstop Beagle
require(devtools)
devtools::install_version("fst", version = "0.9.8", repos = "http://cran.us.r-project.org")
devtools::install_version("dplyr", version = "1.1.2", repos = "http://cran.us.r-project.org")
devtools::install_version("tibble", version = "3.2.1", repos = "http://cran.us.r-project.org")
devtools::install_version("ggplot2", version = "3.4.2", repos = "http://cran.us.r-project.org")
library(fst)
library(dplyr)
library(tibble)
library(ggplot2)
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun 6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
!pip install pandas==2.0.3
!pip install plotnine==0.12.1
import random
import datetime
import pandas
import plotnine
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin22.4.0)
CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Environment:
DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
using Pkg
Pkg.add(name="RData", version="1.0.0")
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.5.0")
Pkg.add(name="CategoricalArrays", version="0.10.8")
Pkg.add(name="Colors", version="0.12.8")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.4.0")
using Dates # Included in Base
using CSV
using DataFrames
using CategoricalArrays
using Colors
using Cairo
using Gadfly
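using MLJ # Not in the Pkg.add list above; install it first if it is missing
using GLM # Not in the Pkg.add list above; install it first if it is missing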
late_shipments <- read.fst("../../dataset/late-shipments.fst")
str(late_shipments)
'data.frame': 1000 obs. of 26 variables:
$ id : num 73003 41222 52354 28471 16901 ...
$ country : chr "Vietnam" "Kenya" "Zambia" "Nigeria" ...
$ managed_by : chr "PMO - US" "PMO - US" "PMO - US" "PMO - US" ...
$ fulfill_via : chr "Direct Drop" "Direct Drop" "Direct Drop" "Direct Drop" ...
$ vendor_inco_term : chr "EXW" "EXW" "EXW" "EXW" ...
$ shipment_mode : chr "Air" "Air" "Air" "Air" ...
$ late_delivery : num 0 0 0 1 0 0 0 0 0 0 ...
$ late : chr "No" "No" "No" "Yes" ...
$ product_group : chr "ARV" "HRDT" "HRDT" "HRDT" ...
$ sub_classification : chr "Adult" "HIV test" "HIV test" "HIV test" ...
$ vendor : chr "HETERO LABS LIMITED" "Orgenics, Ltd" "Orgenics, Ltd" "Orgenics, Ltd" ...
$ item_description : chr "Efavirenz/Lamivudine/Tenofovir Disoproxil Fumarate 600/300/300mg, tablets, 30 Tabs" "HIV 1/2, Determine Complete HIV Kit, 100 Tests" "HIV 1/2, Determine Complete HIV Kit, 100 Tests" "HIV 1/2, Determine Complete HIV Kit, 100 Tests" ...
$ molecule_test_type : chr "Efavirenz/Lamivudine/Tenofovir Disoproxil Fumarate" "HIV 1/2, Determine Complete HIV Kit" "HIV 1/2, Determine Complete HIV Kit" "HIV 1/2, Determine Complete HIV Kit" ...
$ brand : chr "Generic" "Determine" "Determine" "Determine" ...
$ dosage : chr "600/300/300mg" "N/A" "N/A" "N/A" ...
$ dosage_form : chr "Tablet - FDC" "Test kit" "Test kit" "Test kit" ...
$ unit_of_measure_per_pack: num 30 100 100 100 60 20 100 30 30 25 ...
$ line_item_quantity : num 19200 6100 1364 2835 112 ...
$ line_item_value : num 201600 542900 109120 252315 1618 ...
$ pack_price : num 10.5 89 80 89 14.4 ...
$ unit_price : num 0.35 0.89 0.8 0.89 0.24 1.6 0.8 0.55 0.12 0.45 ...
$ manufacturing_site : chr "Hetero Unit III Hyderabad IN" "Alere Medical Co., Ltd." "Alere Medical Co., Ltd." "Alere Medical Co., Ltd." ...
$ first_line_designation : chr "Yes" "Yes" "Yes" "Yes" ...
$ weight_kilograms : num 2719 3497 553 1352 1701 ...
$ freight_cost_usd : num 4085 40917 7845 31284 4289 ...
$ line_item_insurance_usd : num 207.24 895.78 112.18 353.75 2.67 ...
summary(late_shipments)
Numeric columns:
column                       Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
id                             92    19492    39631    40308    63148    82005
late_delivery                0.00     0.00     0.00     0.07     0.00     1.00
unit_of_measure_per_pack        1       30       60       82      100     1000
line_item_quantity              1      450     2744    14291    10000   333334
line_item_value                 0    10067    62318   151129   219520  2801262
pack_price                      0        7       24       40       70     1243
unit_price                    0.0      0.1      0.5      1.4      0.9     24.9
weight_kilograms                1      136      844     2102     2403   154780
freight_cost_usd               14     1900     5887    11342    15533   289653
line_item_insurance_usd         0       15       95      233      329     4939
Character columns (each with Length: 1000, Class: character, Mode: character):
country, managed_by, fulfill_via, vendor_inco_term, shipment_mode, late, product_group, sub_classification, vendor, item_description, molecule_test_type, brand, dosage, dosage_form, manufacturing_site, first_line_designation
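For readers following along in Python, the sketch below computes the same six statistics that R's summary() reports for a numeric column, using only the standard library. It uses just the first ten unit_price values shown in the str() output above, so the numbers will not match the full-dataset summary, and the summary_stats name is introduced only for this example. As far as I know, method="inclusive" matches the linear interpolation that R's quantile() uses by default (type 7).

```python
from statistics import mean, quantiles

# First ten unit_price values shown in the str() output above.
unit_price = [0.35, 0.89, 0.8, 0.89, 0.24, 1.6, 0.8, 0.55, 0.12, 0.45]

# n=4 returns the three quartile cut points; "inclusive" treats the data
# as containing the minimum and maximum of the population.
first_quartile, median, third_quartile = quantiles(
    unit_price, n=4, method="inclusive"
)

summary_stats = {
    "Min.": min(unit_price),
    "1st Qu.": first_quartile,
    "Median": median,
    "Mean": mean(unit_price),
    "3rd Qu.": third_quartile,
    "Max.": max(unit_price),
}

for name, value in summary_stats.items():
    print(f"{name:<8} {value:.4f}")
```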