Alright, let's get down to business.
Consider two datasets: dataset A = (1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0) and dataset B = (1,0,1,0,0,0,1,1,0,0,1,1,1,1,0,0,1,0,1,1).
Question: given dataset A, can we make inferences about "theta", the probability that a flipped coin lands heads?
How does a Bayesian carry out the inference?
step 1: assume dataset A consists of iid Bernoulli trials, so the number of heads is binomial (look, this is usually given in a textbook problem; in reality it is not necessarily true, yet people tend to take for granted that dataset A follows this distribution).
step 2: pick a prior for "theta", maybe uniform in most cases. Makes sense, right?
step 3: with the prior and the likelihood, the Bayesian can deduce the posterior distribution of "theta" (a quick sketch of this step follows the list).
step 4: done. Give me five....!
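To make step 3 concrete: with a uniform Beta(1,1) prior and a binomial likelihood, the posterior for theta is Beta(1 + sum(y), 1 + n - sum(y)). Here is a minimal R sketch for dataset A (the variable names are mine, just for illustration):
y.A <- c(1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)   # dataset A
n <- length(y.A)                                    # number of trials
s <- sum(y.A)                                       # number of heads
alpha.post <- 1 + s                                 # Beta(1,1) prior + s successes
beta.post <- 1 + n - s                              # ... + (n - s) failures
alpha.post/(alpha.post + beta.post)                 # posterior mean of theta
qbeta(c(0.025, 0.975), alpha.post, beta.post)       # 95% posterior interval for theta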
However, dataset A has been modeled as a fixed number of iid Bernoulli trials with a uniform prior on the probability of success theta, and the data may not actually follow the assumed model. I did this a lot without giving even one second of thought to whether the assumption really holds. The apparent autocorrelation in dataset A (long runs of 0's and 1's) is evidence that the model is flawed. To quantify that evidence, we can simulate the posterior predictive distribution of a test statistic T(y^{rep}): assume dataset A consists of iid Bernoulli trials, simulate the posterior predictive distribution of switch.number (the number of times the data change from 0 to 1, or from 1 to 0, either way), and then calculate the probability that a simulated switch.number is less than or equal to the observed one (in dataset A, the observed switch.number is 3; a short snippet that computes it directly follows). Let's do the experiment! (Just for dataset A; as for dataset B, readers, you are smart enough to DIY!)
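As a quick sanity check on the observed statistic, here is a small R sketch (the helper name count.switches is mine) that counts the switches directly for either dataset:
count.switches <- function(y) sum(diff(y) != 0)    # number of 0->1 or 1->0 changes
A <- c(1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)
B <- c(1,0,1,0,0,0,1,1,0,0,1,1,1,1,0,0,1,0,1,1)
count.switches(A)    # 3, the observed switch.number for dataset A
count.switches(B)    # dataset B: DIY!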
The following histogram shows the simulated posterior predictive distribution of switch.number.
R program:
y <- c(1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0) # the original data, assumed to be iid Bernoulli (binomial); whether that assumption is valid is still pending, which we investigate with the following program:
set.seed(40)                                           # for reproducibility
theta.post <- rbeta(1,sum(y)+1,length(y)-sum(y)+1)     # one draw of theta from the Beta(sum(y)+1, length(y)-sum(y)+1) posterior
set.seed(40)
y.rep <- array(rbinom(2000,1,theta.post),c(20,100))    # 100 replicated datasets, each of 20 Bernoulli trials
switch.number <- numeric(100)                          # one switch count per replicated dataset
for (j in 1:100) {        # loop over the 100 replicated datasets
  for (i in 1:19) {       # compare consecutive observations
    if (y.rep[i,j]!=y.rep[i+1,j])
      switch.number[j] <- switch.number[j]+1
  }
}
hist(switch.number,probability=TRUE,breaks=c(1.5:13.5),main = "posterior predictive distribution of number of switches",xlab="switch.number")
lines(density(switch.number),col="red",lwd=3)
abline(v=3, lwd=3,col="blue")           # mark the observed switch.number (3) for dataset A
p.value <- sum(switch.number<=3)/100    # proportion of replications with at most 3 switches
p.value
result: p.value = 0.02, i.e., only 2 of the 100 replicated datasets had 3 or fewer switches. So few switches are very unlikely under the assumed model, which is evidence against the assumption that dataset A comes from iid Bernoulli (binomial) trials.
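One caveat: the program above reuses a single posterior draw of theta for all 100 replications. Strictly, the posterior predictive distribution draws a fresh theta for each replicated dataset. A minimal sketch of that variant (variable names are mine; the exact p-value will depend on the seed):
set.seed(40)
theta.draws <- rbeta(100, sum(y)+1, length(y)-sum(y)+1)         # one posterior draw of theta per replication
y.rep2 <- sapply(theta.draws, function(th) rbinom(20, 1, th))   # 20 x 100 matrix of replicated data
switch.number2 <- apply(y.rep2, 2, function(col) sum(diff(col) != 0))
mean(switch.number2 <= 3)                                       # posterior predictive p-value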