
Diego J. Lizcano, Ph.D.
Unillanos, Villavicencio
A model is a simplification of a real system.
It lets us understand how the parameters interact, or how they respond when we vary something.
How does the sample size affect the distribution of the mean? We will use the iris dataset from R and the petal length column.
str(iris)
iter <- 1000 # number of iterations
n <- 1 # number of values drawn per sample; vary up to 150
means <- rep(NA, iter) # stores the mean from each iteration
for (i in seq_len(iter)) {
  d <- sample(iris$Petal.Length, n) # sample() draws n values (here 1)
  means[i] <- mean(d) # compute the mean and store it in means
}
hist(means)
abline(v=mean(iris$Petal.Length), lty=2, lwd=3, col="blue")
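To see the effect of the sample size without editing n by hand, the loop above can be repeated for several sizes. This is only a sketch: the seed, the sizes in `sizes`, and the name `sd_of_means` are assumptions chosen for illustration, not part of the original exercise.

```r
set.seed(42) # assumed seed, only so the run is reproducible
iter <- 1000
sizes <- c(1, 5, 30, 100) # illustrative sample sizes (assumption)

# For each sample size, draw `iter` samples and record the spread of their means
sd_of_means <- sapply(sizes, function(n) {
  means <- replicate(iter, mean(sample(iris$Petal.Length, n)))
  sd(means)
})

data.frame(n = sizes, sd_of_means = sd_of_means)
```

The standard deviation of the simulated means should shrink as n grows, which is the narrowing you see in the histogram when n is increased.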
\[ \begin{aligned} y = \alpha + \beta x + \epsilon \end{aligned} \]
where \(\alpha\) and \(\beta\) are model parameters and \(\epsilon\) is the model error.
\(\alpha\) is the intercept.
\(\beta\) is the coefficient of \(x\) (the slope).
We will use the iris dataset from R.
str(iris)
model1 <- lm(Sepal.Length~Petal.Length, data=iris)
summary(model1)
library(ggplot2)
ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) + geom_point() +
  geom_smooth(method = "lm") # also try plain geom_smooth()
newdato <- data.frame(Petal.Length = seq(4, 7, by = 0.1))
predict(model1, newdata = newdato) # predicts sepal length for petal lengths from 4 to 7
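The link between the fitted coefficients and the equation \(y = \alpha + \beta x\) can be checked by computing the predictions by hand. A minimal sketch, in which the names `alpha`, `beta`, `x_new`, `by_hand` and `by_predict` are introduced here only for illustration:

```r
# Refit the same model so the block runs on its own
model1 <- lm(Sepal.Length ~ Petal.Length, data = iris)

alpha <- coef(model1)[["(Intercept)"]]  # estimated intercept
beta <- coef(model1)[["Petal.Length"]]  # estimated slope

x_new <- seq(4, 7, by = 0.1)
by_hand <- alpha + beta * x_new # the model equation without the error term
by_predict <- predict(model1, newdata = data.frame(Petal.Length = x_new))

all.equal(by_hand, unname(by_predict)) # both approaches should agree
```

Here predict() does nothing more than evaluate the fitted line at the new values of Petal.Length.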
We will use the iris dataset from R. In a model formula, use + to combine effects, : for an interaction (A:B), and * for main effects plus their interaction, e.g. A*B = A + B + A:B.
str(iris)
model1a <- lm(Sepal.Length~Petal.Length * Petal.Width, data=iris)
summary(model1a)
library(lattice)
newdato <- expand.grid(list(Petal.Length = seq(4, 7, length.out = 100),
                            Petal.Width = seq(1, 2.5, length.out = 100)))
# predicts sepal length for petal lengths from 4 to 7 and petal widths from 1 to 2.5
newdato$Sepal.Length <- predict(model1a, newdata = newdato)
levelplot(Sepal.Length ~ Petal.Length + Petal.Width, data = newdato,
          xlab = "Petal.Length", ylab = "Petal.Width",
          main = "Surface of Sepal.Length")
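The formula shorthand used for this model can be verified by fitting it both ways and comparing the coefficients. A small sketch; the names `m_star` and `m_long` are hypothetical and used only here:

```r
# Shorthand: main effects plus interaction
m_star <- lm(Sepal.Length ~ Petal.Length * Petal.Width, data = iris)

# The same model written out term by term
m_long <- lm(Sepal.Length ~ Petal.Length + Petal.Width + Petal.Length:Petal.Width,
             data = iris)

names(coef(m_star))                   # intercept, two main effects, one interaction
all.equal(coef(m_star), coef(m_long)) # the two fits should be identical
```

Writing the terms out is useful when you want only part of the expansion, for example the interaction without one of the main effects.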
We will use the mtcars dataset from R.
data(mtcars)
model2 <- glm(vs~mpg, data=mtcars, family = binomial(link = "logit"))
summary(model2)
library(ggplot2)
ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() +
geom_smooth(method = "glm", method.args = list(family = "binomial"))
newdato2 <- data.frame(mpg = seq(20, 30, by = 0.5))
predict(model2, newdata = newdato2, type = "response") # predicts the probability that vs = 1 for mpg from 20 to 30
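With the logit link, the model is linear on the log-odds scale, \(\log(p / (1 - p)) = \alpha + \beta \, mpg\), and type = "response" converts that back to a probability. The sketch below shows the two scales side by side; the names `eta` and `p` are assumptions introduced for illustration.

```r
# Refit the same model so the block runs on its own
model2 <- glm(vs ~ mpg, data = mtcars, family = binomial(link = "logit"))
newdato2 <- data.frame(mpg = seq(20, 30, by = 0.5))

eta <- predict(model2, newdata = newdato2, type = "link")   # log-odds scale
p <- predict(model2, newdata = newdato2, type = "response") # probability scale

# plogis() is the inverse logit, so it should map eta onto p
all.equal(unname(plogis(eta)), unname(p))
```

Predictions on the link scale can take any real value, while the response-scale predictions always fall between 0 and 1.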