Preface
Welcome
I Foundations
1 Introduction to R
1.1 Transitioning to R
1.2 Installing R
1.3 Programming in R
1.4 Data cleaning and processing
2 Producing your first plot
3 Basic R Markdown
4 Basic data wrangling
5 Types of data
6 Tidy data
7 Joins
8 Collaborating with git and GitHub
8.1 Getting Started
8.2 Creating a repository
8.2.1 Starting with an empty project
8.2.2 Starting with an existing project
8.3 Making commits and pushing to the remote
8.3.1 Merge conflicts
8.4 Collaborating
8.4.1 Branching
8.4.2 Forking
8.4.3 Stashing
8.5 General workflow advice
9 A collaborative exploratory data analysis example
II Data visualization & communication
10 Introduction to visualizations
11 Visual perception
12 Color
13 Refining your plots
14 Geographic data
15 Visualizing uncertainty
16 Tables and fonts
17 Websites in R Markdown
18 Flexdashboards
19 Shiny
III Functional programming
20 Data types
21 Iteration
22 Batch load and processing data
23 List columns
24 Writing functions
25 Package development
IV Machine learning
26 Inference vs. prediction
27 Ethics in machine learning
28 Cross validation
29 Cloud computing
30 Extending lm: ridge, lasso and elastic net regression
31 Feature Engineering
31.1 Basics of {recipes}
31.2 Creating a recipe
31.2.1 Order matters
31.3 Encoding categorical data
31.3.1 Transformations beyond dummy coding
31.3.2 Handling new levels
31.3.3 Final thoughts on encoding categorical data
31.4 Dealing with low variance predictors
31.5 Missing data
31.5.1 Omission
31.5.2 Encoding and simple imputation
31.5.3 Modeling the missingness
31.5.4 A few words of caution
31.6 Transformations
31.6.1 Box-Cox and similar transformations
31.6.2 An applied example
31.7 Nonlinearity
31.7.1 Polynomial transformations
31.7.2 Splines
31.8 Interactions
31.8.1 Creating interactions “by hand”
31.8.2 Creating interactions with {recipes}
31.9 PCA
31.9.1 PCA with {recipes}
31.10 Wrapping up
32 K-nearest neighbor
33 Decision Trees
33.0.1 A simple decision tree
33.1 Determining optimal splits
33.2 Visualizing decision trees
33.3 Fitting a decision tree
33.3.1 Load the data
33.4 Tuning decision trees
33.4.1 Decision tree hyperparameters
33.4.2 Conducting the grid search
33.4.3 Finalizing our model fit
34 Bagging and Random Forests
34.0.1 Bagging “by hand”
34.1 Bagged trees
34.1.1 Working with out-of-bag samples
34.1.2 Tuning with OOB samples
34.2 Random Forests
34.2.1 Fitting random forests
34.3 Feature and model interpretation
35 Boosting and Boosted Trees
35.1 Gradient descent
35.2 Boosted trees
35.2.1 Hyperparameters and engines
35.3 Fitting boosted tree models
35.3.1 Model tuning
35.3.2 Wrapping up
36 Model stacking
Social Data Science with R
12 Color