R as a modeling tool

Models 101

Diego J. Lizcano, Ph.D.
Unillanos, Villavicencio

Download

Models in R

Simplification of a real system.

It allows us to understand how the parameters interact or are affected if we vary something.

  • Mean model
  • Linear regression model
  • Logistic regression model

Mean model

How does the sample size affect the distribution of the mean? We will use the iris dataset from R and the petal length column.

str(iris)
iter <- 1000 # Numero de iteraciones
n <- 1 # numero de datos que muestreo. Variar hasta 150
means <- rep(NA, iter) # aca almaceno las medias de cada iteracion

for (i in 1:iter){
  d <- sample(iris$Petal.Length, n) # sample toma n (1) datos 
  means[i] <- mean(d) # saca la media y guarda en means 
}

hist(means)
abline(v=mean(iris$Petal.Length), lty=2, lwd=3, col="blue")

Algebra of linear regression model


\[ \begin{aligned} y = \alpha + \beta x + \epsilon \end{aligned} \]

  • where \(\alpha\) and \(\beta\) are model parameters and \(\epsilon\) is the model error.

  • \(\alpha\) corresponds to the intercept

  • \(\beta\) corresponds to the coefficient of x (slope).

- when \(\beta\) = 0, there is no significant relationship between the variables x and y.

Linear regression model

We will use the iris dataset from R.

str(iris)

model1 <- lm(Sepal.Length~Petal.Length, data=iris)
summary(model1)
library(ggplot2)
ggplot(model1, aes(x=Petal.Length, y=Sepal.Length)) + geom_point() + 
  geom_smooth(method = "lm") # try geom_smooth()

newdato <- data.frame(Petal.Length=seq(4, 7, by=0.1))
predict(model1, newdata = newdato) # predice sepalo cuando petalo es de 4 a 7

Linear regression model with 2 or more predictors

We will use the iris dataset from R. Use + to combine effects. : for A:B interactions; and asterisk (*) for effects and interactions, eg A*B = A+B+A:B

str(iris)

model1a <- lm(Sepal.Length~Petal.Length * Petal.Width, data=iris)
summary(model1a)
library(lattice)
newdato<-expand.grid(list(Petal.Length = seq(4, 7, length.out=100), 
                          Petal.Width=seq(1, 2.5, length.out=100)))
newdato$Sepal.Length<-predict(model1a, newdata = newdato) # predice sepalo con petalo de 4 a 7 y 1 a 2.5

levelplot(Sepal.Length~Petal.Length + Petal.Width, data=newdato,
  xlab = "Petal.Length", ylab = "Petal.Width",
  main = "Surface of Sepal.Length")

Logistic regression model

We will use the mtcars dataset from R.

data(mtcars)

model2 <- glm(vs~mpg, data=mtcars, family = binomial(link = "logit"))
summary(model2)
library(ggplot2)
ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() + 
  geom_smooth(method = "glm", method.args = list(family = "binomial"))

newdato2 <- data.frame(mpg= seq(20, 30, by=0.5))
predict(model2, newdata = newdato2, type='response') # predice vs cuando mpg es de 20 a 30