Data-Informed Thinking + Doing

Averaging Predictions Via Random Forest and Extra Trees

Ensemble methods on decision trees for regression tasks—using R, Python, and Julia.


Appendix A: Environment, Language & Package Versions, and Coding Style

If you are interested in reproducing this work, here are the versions of R, Python, and Julia that I used, along with the packages for each. My coding style here is deliberately verbose, so that it is easy to trace where functions, methods, and variables originate, and to make this a learning experience for everyone, including me.

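# Report the R version, operating system, and CPU.
# The CPU details require the benchmarkme package to be installed.
cat(
    R.version$version.string, "-", R.version$nickname,
    "\nOS:", Sys.info()["sysname"], R.version$platform,
    "\nCPU:", benchmarkme::get_cpu()$no_of_cores, "x", benchmarkme::get_cpu()$model_name
)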

R version 4.2.3 (2023-03-15) - Shortstop Beagle
OS: Darwin x86_64-apple-darwin17.0
CPU: 8 x Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz

# Install pinned package versions so the results can be reproduced.
library(devtools)
devtools::install_version("dplyr", version="1.1.4", repos="http://cran.us.r-project.org")
devtools::install_version("ggplot2", version="3.5.0", repos="http://cran.us.r-project.org")
devtools::install_version("Metrics", version="0.1.4", repos="http://cran.us.r-project.org")

library(package=dplyr)
library(package=ggplot2)
library(package=Metrics)

import sys
import platform
import os
import cpuinfo  # third-party module, provided by the py-cpuinfo package
print(
    "Python", sys.version,
    "\nOS:", platform.system(), platform.platform(),
    "\nCPU:", os.cpu_count(), "x", cpuinfo.get_cpu_info()["brand_raw"]
)

Python 3.11.4 (v3.11.4:d2340ef257, Jun  6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
OS: Darwin macOS-10.16-x86_64-i386-64bit
CPU: 8 x Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz

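# Notebook syntax: the leading "!" runs pip in a shell from Jupyter/IPython.
# From a regular terminal, drop the "!".
!pip install numpy==1.25.1
!pip install pandas==2.0.3
!pip install scipy==1.11.1

import numpy
import pandas
from scipy import stats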
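
If you want to confirm that your environment matches the versions pinned above, here is a minimal sketch using the standard library's importlib.metadata. The names PINNED, installed_version, and check_pins are my own illustrative helpers, not part of any package, and the check is a plain string comparison that does not understand version ranges.

```python
from importlib import metadata

# Versions pinned in this appendix (copied from the pip commands above).
PINNED = {"numpy": "1.25.1", "pandas": "2.0.3", "scipy": "1.11.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, lookup=installed_version):
    """Map each package name to a tuple of (pinned, installed, matches)."""
    report = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in check_pins(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: pinned {wanted}, installed {found} -> {status}")
```

A package that is not installed is reported as installed None with a MISMATCH status, so missing packages and wrong versions both show up in the same report.
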
using InteractiveUtils
InteractiveUtils.versioninfo()

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk-21.jdk/Contents/Home/lib/server

using Pkg
Pkg.add(name="HTTP", version="1.10.2")
Pkg.add(name="CSV", version="0.10.13")
Pkg.add(name="DataFrames", version="1.6.1")
Pkg.add(name="CategoricalArrays", version="0.10.8")
Pkg.add(name="StatsBase", version="0.34.2")

using HTTP
using CSV
using DataFrames
using CategoricalArrays
using StatsBase
