---
title: "Duncan Data Analysis"
author: "J. Fox and S. Weisberg"
date: "2018-04-24"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This R Markdown file reproduces the data analysis described in Section 1.5 of _An R Companion to Applied Linear Regression, 3rd Edition_. See the text for further explanation.

# Preliminaries

As a first step, we load the **car** package and examine the first few rows of the `Duncan` data:

```{r}
library("car") # load car and carData packages
head(Duncan, n=10)
dim(Duncan)
```

We obtain summary statistics for the variables in `Duncan`:

```{r}
summary(Duncan)
```

As a first graph, we view a histogram of the variable `prestige`:

```{r, fig.width=5, fig.height=5}
with(Duncan, hist(prestige))
```

## 1.5.1 Examining the Data

The `scatterplotMatrix()` function in the **car** package produces scatterplots for all pairs of variables. A few relatively remote points are labeled with their case names, in this instance occupations.

```{r fig.height=8, fig.width=8}
scatterplotMatrix(~ prestige + education + income, id=list(n=3), data=Duncan)
```

## 1.5.2 Regression Analysis

We use the `lm()` function to fit a linear regression model to the data:

```{r}
(duncan.model <- lm(prestige ~ education + income, data=Duncan))
```

```{r}
summary(duncan.model)
```

The `brief()` and `S()` functions, both in the **car** package, provide alternative summaries of a regression fit; a short illustration appears at the end of this document.

## 1.5.3 Regression Diagnostics

The `rstudent()` function returns studentized residuals, and the `densityPlot()` function fits an adaptive kernel density estimator to the distribution of the studentized residuals:

```{r fig.height=5, fig.width=5}
densityPlot(rstudent(duncan.model))
```

The `qqPlot()` function can be used as a check for non-normal errors, comparing the studentized residuals to a t-distribution:

```{r fig.height=5, fig.width=5}
qqPlot(duncan.model, id=list(n=3))
```

The `outlierTest()` function tests for outliers in the regression:

```{r}
outlierTest(duncan.model)
```

The `influenceIndexPlot()` function displays influence measures in index plots:

```{r fig.height=6, fig.width=6}
influenceIndexPlot(duncan.model, vars=c("Cook", "hat"), id=list(n=3))
```

Added-variable plots for Duncan's regression, looking for influential cases:

```{r fig.height=4, fig.width=8}
avPlots(duncan.model, id=list(cex=0.75, n=3, method="mahal"))
```

Component-plus-residual plots for the regression, checking for nonlinearity:

```{r fig.height=4, fig.width=8}
crPlots(duncan.model, smooth=list(span=0.7))
```

Tests for non-constant error variance:

```{r}
ncvTest(duncan.model)
ncvTest(duncan.model, var.formula = ~ income + education)
```

Removing the cases `"minister"` and `"conductor"`:

```{r}
whichNames(c("minister", "conductor"), Duncan)
duncan.model.2 <- update(duncan.model, subset=-c(6, 16))
summary(duncan.model.2)
```

Comparing the regressions with and without these two cases:

```{r}
compareCoefs(duncan.model, duncan.model.2)
```
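
As mentioned in Section 1.5.2, the `brief()` and `S()` functions in the **car** package provide alternative summaries of a regression fit. A minimal illustration applied to the original model, assuming the chunks above have been run (this demonstration is not part of the text's Section 1.5 analysis):

```{r}
brief(duncan.model) # abbreviated summary of the fitted model
S(duncan.model)     # modified alternative to summary() from the car package
```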