博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
A function to help graphical model checks of lm and ANOVA(转)
阅读量:5966 次
发布时间:2019-06-19

本文共 3205 字,大约阅读时间需要 10 分钟。

As always a more colourful version of this post is available on .

Even if LM are very simple models at the basis of many more complex ones, LM still have some assumptions that if not met would render any interpretation from the models plainly wrong. In my field of research most people were taught about checking ANOVA assumptions using tests like Levene & co. This is however not the best way to check if my model meet its assumptions as p-values depend on the sample size, with small sample size we will almost never reject the null hypothesis while with big sample even small deviation will lead to significant p-values (). As ANOVA and linear models are two different ways to look at the same model () we can check ANOVA assumptions using graphical check from a linear model. In R this is easily done using plot(model), but people often ask me what amount of deviation makes me reject a model. One easy way to see if the model checking graphs are off the charts is to simulate data from the model, fit the model to these newly simulated data and compare the graphical checks from the simulated data with the real data. If you cannot differentiate between the simulated and the real data then your model is fine, if you can then try again!

Below is a little function that implement this idea:

 

lm.test<-function(m  require(plyr)  #the model frame  dat<-model.frame(m)  #the model matrix  f<-formula(m)  modmat<-model.matrix(f,dat)  #the standard deviation of the residuals  sd.resid<-sd(resid(m  #sample size  n<-dim(dat)[1]  #get the right-hand side of the formula    #rhs<-all.vars(update(f, 0~.))  #simulate 8 response vectors from model  ys<-lapply(1:8,function(x) rnorm(n,modmat%*%coef(m),sd.resid))  #refit the models  ms<-llply(ys,function(y) lm(y~modmat[,-1]))  #put the residuals and fitted values in a list  df<-llply(ms,function(x) data.frame(Fitted=fitted(x),Resid=resid(x)))  #select a random number from 2 to 8  rnd<-sample(2:8,1)  #put the original data into the list  df<-c(df[1:(rnd-1)],list(data.frame(Fitted=fitted(m),Resid=resid(m))),df[rnd:8])  #plot   par(mfrow=c(3,3))  l_ply(df,function(x){    plot(Resid~Fitted,x,xlab="Fitted",ylab="Residuals")    abline(h=0,lwd=2,lty=2)  })  l_ply(df,function(x){    qqnorm(x$Resid)    qqline(x$Resid)  })  out<-list(Position=rnd)  return(out)}

 

This function print the two basic plots: one looking at the spread of the residuals around the fitted values, the other one look at the normality of the residuals. The function return the position of the real model in the 3×3 window, counting from left to right and from top to bottom (ie position 1 is upper left graph).

Let’s try the function:

 

#a simulated data frame of independent variablesdat<-data.frame(Temp=runif(100,0,20),Treatment=gl(n = 5,k = 20))contrasts(dat$Treatment)<-"contr.sum"#the model matrixmodmat<-model.matrix(~Temp*Treatment,data=dat)#the coefficientcoeff<-rnorm(10,0,4)#simulate response datadat$Biomass<-rnorm(100,modmat%*%coeff,1)#the modelm<-lm(Biomass~Temp*Treatment,dat)#model checkchk<-lm.test(m)

 

Can you find which one is the real one? I could not, here is the answer:

 

chk$Position[1] 4

Happy and safe modelling!

 

转自:

转载于:https://www.cnblogs.com/payton/p/4368700.html

你可能感兴趣的文章
Django Tips
查看>>
HTML5教程:1.1 迎接新的Web时代
查看>>
wuzhicms刷新按钮的功能开发
查看>>
为apache添加SSL支持
查看>>
MTK DRM常见问题介绍
查看>>
U盘安装ubuntu server 10.4
查看>>
【微度子】客户端
查看>>
Spark cluster 部署
查看>>
requireJS学习笔记
查看>>
arch开机自动认证
查看>>
浅谈 G1 GC 日志格式
查看>>
Linux常用的基本命令10
查看>>
我的友情链接
查看>>
bitcoin转账api,python2.7
查看>>
bash功能特性二 命令别名和历史命令
查看>>
交换机ACL配置
查看>>
Lync Server多SIP域环境和简单URL地址部署
查看>>
Pecl和Pear的区别和联系?
查看>>
几点建议,让Redis在你的系统中发挥更大作用
查看>>
如何在Linux操作系统定时重启Tomcat服务?
查看>>