R as a modeling tool

Models 101

Diego J. Lizcano, Ph.D.
Unillanos, Villavicencio

Download

Models in R

Simplification of a real system.

It allows us to understand how the parameters interact or are affected if we vary something.

Mean model
Linear regression model
Logistic regression model

Mean model

How does the sample size affect the distribution of the mean? We will use the iris dataset from R and the petal length column.

str(iris)
iter <- 1000 # Numero de iteraciones
n <- 1 # numero de datos que muestreo. Variar hasta 150
means <- rep(NA, iter) # aca almaceno las medias de cada iteracion

for (i in 1:iter){
  d <- sample(iris$Petal.Length, n) # sample toma n (1) datos 
  means[i] <- mean(d) # saca la media y guarda en means 
}

hist(means)
abline(v=mean(iris$Petal.Length), lty=2, lwd=3, col="blue")

Algebra of linear regression model

\[ \begin{aligned} y = \alpha + \beta x + \epsilon \end{aligned} \]

where \(\alpha\) and \(\beta\) are model parameters and \(\epsilon\) is the model error.
\(\alpha\) corresponds to the intercept
\(\beta\) corresponds to the coefficient of x (slope).

- when \(\beta\) = 0, there is no significant relationship between the variables x and y.

Linear regression model

We will use the iris dataset from R.

str(iris)

model1 <- lm(Sepal.Length~Petal.Length, data=iris)
summary(model1)
library(ggplot2)
ggplot(model1, aes(x=Petal.Length, y=Sepal.Length)) + geom_point() + 
  geom_smooth(method = "lm") # try geom_smooth()

newdato <- data.frame(Petal.Length=seq(4, 7, by=0.1))
predict(model1, newdata = newdato) # predice sepalo cuando petalo es de 4 a 7

Linear regression model with 2 or more predictors

We will use the iris dataset from R. Use + to combine effects. : for A:B interactions; and asterisk (*) for effects and interactions, eg A*B = A+B+A:B

str(iris)

model1a <- lm(Sepal.Length~Petal.Length * Petal.Width, data=iris)
summary(model1a)
library(lattice)
newdato<-expand.grid(list(Petal.Length = seq(4, 7, length.out=100), 
                          Petal.Width=seq(1, 2.5, length.out=100)))
newdato$Sepal.Length<-predict(model1a, newdata = newdato) # predice sepalo con petalo de 4 a 7 y 1 a 2.5

levelplot(Sepal.Length~Petal.Length + Petal.Width, data=newdato,
  xlab = "Petal.Length", ylab = "Petal.Width",
  main = "Surface of Sepal.Length")

Logistic regression model

We will use the mtcars dataset from R.

data(mtcars)

model2 <- glm(vs~mpg, data=mtcars, family = binomial(link = "logit"))
summary(model2)
library(ggplot2)
ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() + 
  geom_smooth(method = "glm", method.args = list(family = "binomial"))

newdato2 <- data.frame(mpg= seq(20, 30, by=0.5))
predict(model2, newdata = newdato2, type='response') # predice vs cuando mpg es de 20 a 30