Random Walker

11/05/2013

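My Colleague Liwen

Liwen is on the strategic planning team in our company's marketing department. My impression is that she pays close attention to her own career development and to the people and environment around her. I remember that on her first day she went from cube to cube introducing herself to each of us, and asked whether we were Excel experts, saying she would be coming to us for advice. On top of that, she is very active about networking: in just her third week she emailed us to organize a lunch and learn for the Chinese colleagues in marketing and distribution. She prepares carefully before every lunch and learn, drawing a little map for people coming to our conference room for the first time, and she writes up a meeting summary afterwards. As I recall, she has lunch with several different people every week, and within a very short time she had gotten to know quite a few people. She is especially interested in what the people around us are doing; the questions she asks most often are "What project are you working on?" and "How are you getting along with your boss?" Because of this, everyone rather likes talking with her about their work, venting a little, or just chatting. She observes the people around her closely and then tells me things like who is good at getting things done, who seems to have a good temper, who has strong technical skills, and so on.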

11/04/2013

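Clothes Make the Store

Today I went with my good friend QQ to H-mart, a Korean supermarket. The clearest impression it gave me was that it is clean and tidy. The store is about the same size as Super 88, but you can clearly feel that every section is laid out in a very orderly way, and each section has staff doing demonstrations; we bought two big boxes of beef precisely because we saw the food a staff member was demonstrating. Korean cuisine has a great many side dishes, right? The supermarket packs and seals each one with great care, and even the meat is packaged so that it looks ready to eat as soon as you open it. The pretty transparent packaging (some bags are even decorated with little green leaves) gives the impression: our food is really fresh~. By comparison, at Super 88 the goods are displayed messily and the surfaces of the food are often greasy, so you don't even feel like picking anything up for a closer look. You even end up buying expired food fairly often.

Super 88 is in a prime location in Boston with very heavy customer traffic, and it is only a 15-minute drive. H-mart is in a very remote suburb, a full 40-minute drive away. Yet H-mart is the store I go to often and really enjoy browsing. Super 88 actually has better conditions for attracting more customers, but because it is run carelessly, many customers would rather go to other small shops to buy fresher vegetables.

For a store that provides consumers with daily necessities and food every day, what customers generally care about is: (1) the freshness of vegetables, fruit, and fresh meat (2) price (3) the shopping environment (4) product packaging (5) service at the checkout counter.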

11/02/2013

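I talked to my cousin on the phone today. He is still job hunting and says there are fewer interview opportunities now. Job hunting goes in cycles; toward the end of the year there are generally somewhat fewer openings than usual. It is the same in the US. To help limit the time he spends playing video games, I made a deal with him that he will submit a piece of his work next Thursday or Friday (he does animation design, a pretty cool profession, right?), and it has to be even better than the last one he gave me. Looking forward to it!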

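Currently reading: The Willpower Instinct (《自控力》) by Kelly McGonigal, Ph.D.

The first step to improving self-control: learn to meditate for 5-15 minutes every day (you can start with 5 minutes, then work up to 10-15 minutes as soon as you can).

Sit still and stay put. Sit in a chair with your feet flat on the ground, or sit cross-legged on a cushion. Keep your back straight and rest your hands on your knees. It is important not to fidget while meditating; this is the basic foundation of self-control.
Turn your attention to your breath. As you breathe in, silently say "inhale" in your mind; as you breathe out, silently say "exhale". When you notice your mind wandering, bring your attention back to the breath. This repeated practice keeps the prefrontal cortex in high gear and makes the brain regions that handle stress and impulses more stable.
Notice how it feels to breathe, and notice how your mind wanders. After a few minutes you can stop silently saying "inhale" and "exhale" and focus on the breathing itself. You will notice the sensation of air flowing in and out through your nose and mouth, and feel your chest and belly expand as you breathe in and contract as you breathe out.

Meditation is not about thinking of nothing; it is about not getting too distracted and not forgetting your original goal. If you cannot concentrate while meditating, don't worry; you just need more practice bringing your attention back to the breath.
Core idea: willpower is actually three powers, "I will", "I won't", and "I want", which work together to help us become better versions of ourselves.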

6/09/2013

I'm Back!

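Somehow it has already been two years since I last blogged. Actually "somehow" isn't quite the right word, since I know perfectly well why. The reasons are various, but mainly laziness. Too lazy to think much, let alone write; thinking is always faster than writing, after all. Sometimes a thing just passes through my mind and is gone, and looking back on it is, by and large, a strenuous business. Writing means putting words down, and sometimes I find I don't know which word to use (I haven't written in Chinese for a long time). Fortunately the input method has pinyin suggestions, otherwise there would be a pile of wrong characters (I guess this is why shows like "汉字英雄" and "汉字拼写大会" are so popular right now).

All right, enough explaining; I'm acting like some big star with a crowd of fans eagerly waiting for a blog update. I felt like writing, is that not allowed?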

4/23/2009

Parameter Expansion in Bayesian Hierarchical Modeling

This is the project I did for my Bayesian Statistics course. Below is a snapshot of my part of the paper.

Abstract: Hierarchical models are designed to facilitate the simultaneous estimation of several parameters over similar units. However, some problems pertinent to Bayesian hierarchical modeling remain unsolved: if the standard deviation for the second layer of the hierarchical model (also called the between-study standard deviation) has a broad peak at zero, noninformative priors such as the flat uniform prior or IG(0.01, 0.01), which are commonly adopted in research, may lead to insensitivity in the estimation of this smoothing variance. In addition, under this prior setting, EM and Gibbs sampling may converge poorly. In this paper, we propose a multiplicative parameter-expansion method to reparameterize the hierarchical model in the context of Bayesian inference, which facilitates convergence and has good properties. Simulation-based illustrations are given to show the two-fold essence of this method.

Want the paper "Parameter Expansion in Bayesian Hierarchical Modeling"? Simply send an email to stefanie.cao@gmail.com with the title "require paper hierarchical modeling".

4/01/2009

Bayesian Changing-Point Analysis on US Stock Price During Financial Crisis

This is the project I did for my Bayesian Statistics course. Below is a snapshot of my paper.

Abstract: Breaks in the stock market are usually motivated by an exogenous change in the surrounding economic fluctuation that precipitates a change in regression regimes. However, different industries react differently to economic factors (such as unemployment, inflation, prime interest, and oil price), both in response time and in degree of severity. In this article, Bayesian Changing-Point analysis is used to detect change points in stock prices in four industries: automobile, finance, hi-tech, and fast-moving consumer goods. The time span is set to be Jan. 1, 2007 to Dec. 30, 2008, which is commonly believed to cover the whole process of the financial crisis. With this model, we can check the change point location and the corresponding posterior probability. Inferences for the regression coefficients before and after the change points indicate alterations in sensitivity to economic factors in those industries.

Want the report? Send me an email at stefanie.cao@gmail.com with the title "Bayesian Change Point".

3/13/2009

Is p value telling the truth?

Consider a series of experimental tests of drug efficacy, denoted by D1, D2, ..., D20. Suppose we have two hypotheses: H0: the drug is not effective. H1: the drug is effective. If the cut-off point is preset at 0.05, and one test result is p.value=0.045 and another is p.value=0.016, we statistically reject H0 in both cases and conclude that both drugs are effective. As mentioned above, we need to do 20 tests under the same hypothesis structure, so we obtain 20 p values. Suppose the p values are as listed below:
Table 1 Drug Testing

Drug      D1     D2     D3     D4     D5     D6     D7     D8     D9     D10
p value   0.41   0.31   0.049  0.045  0.016  0.21   0.30   0.209  0.102  0.122

Drug      D11    D12    D13    D14    D15    D16    D17    D18    D19    D20
p value   0.121  0.003  0.091  0.40   0.273  0.192  0.167  0.311  0.28   0.22
But the question is: how strong is the evidence that a drug judged not effective truly comes from the non-effective group?
For more details, see my report. Want it? Just send an email to stefanie.cao@gmail.com with the title "require for p value report".
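One way to see why a single cutoff can mislead across 20 tests is a small R sketch using the p values from Table 1. This is only one angle on the question, not the analysis from the report: it assumes the 20 tests are independent, and the variable names are made up for illustration. It counts the drugs flagged at the 0.05 cutoff, works out how many false positives to expect if every H0 were true, and applies a Bonferroni adjustment with `p.adjust`.

```r
# p values from Table 1, in the order D1, ..., D20
p.values <- c(0.41, 0.31, 0.049, 0.045, 0.016, 0.21, 0.30, 0.209, 0.102, 0.122,
              0.121, 0.003, 0.091, 0.40, 0.273, 0.192, 0.167, 0.311, 0.28, 0.22)
names(p.values) <- paste0("D", 1:20)

alpha   <- 0.05
n.tests <- length(p.values)

# Drugs flagged when each test is judged on its own against the cutoff
naive.hits <- names(p.values)[p.values < alpha]

# If every H0 were true (assumption: independent tests), how many
# p values below alpha would we expect, and how likely is at least one?
expected.false.pos <- n.tests * alpha
prob.any.false.pos <- 1 - (1 - alpha)^n.tests

# Bonferroni adjustment: multiply each p value by the number of tests (capped at 1)
p.bonf    <- p.adjust(p.values, method = "bonferroni")
bonf.hits <- names(p.values)[p.bonf < alpha]

naive.hits
expected.false.pos
prob.any.false.pos
bonf.hits
```

With 20 tests at the 0.05 level, about one p value below the cutoff is expected by chance alone even when no H0 is false, so a p value just under 0.05 is weak evidence on its own.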

2/16/2009

Is the dataset coming from binomial trials? (let bayesian check)

Source: Bayesian Data Analysis, Andrew Gelman, John B. Carlin, et al. (2004, second edition), page 163

Alright, let's get down to business.
Consider two datasets: dataset A = (1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0), and dataset B = (1,0,1,0,0,0,1,1,0,0,1,1,1,1,0,0,1,0,1,1).
Question: given dataset A, can we make inferences about "theta", the probability that a flipped coin lands heads?
How does a Bayesian carry out the inference?
step 1: assume dataset A comes from a binomial distribution (look, this is usually given in textbook problems; in reality it is not necessarily true, yet people tend to take for granted that dataset A follows a binomial distribution)
step 2: choose a prior for "theta", uniform in most cases. Makes sense, right?
step 3: with the prior and the likelihood, a Bayesian can derive the posterior distribution of "theta".
step 4: done. Give me five....!
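The four steps can be sketched in a few lines of R for dataset A. This is a minimal sketch that assumes the uniform prior from step 2, which is a Beta(1, 1) distribution, so the posterior of theta is Beta(sum(y) + 1, n - sum(y) + 1). The variable names are made up for illustration.

```r
# Dataset A: 1 = success, 0 = failure
y <- c(1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)
n <- length(y)   # number of trials
s <- sum(y)      # number of successes

# Uniform prior = Beta(1, 1); with a binomial likelihood the posterior is
# Beta(s + 1, n - s + 1)
a.post <- s + 1
b.post <- n - s + 1

# Posterior summaries for theta
post.mean <- a.post / (a.post + b.post)
cred.int  <- qbeta(c(0.025, 0.975), a.post, b.post)  # central 95% interval

post.mean
cred.int
```

Note that this posterior depends on the data only through the number of successes, so the order of the 0s and 1s plays no role in the inference about theta.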

However, dataset A has been modeled as a fixed number of iid Bernoulli trials with a uniform prior on the probability of success, theta, and the data may not actually follow that assumed distribution. I have done this many times without spending even a second on whether the underlying assumption really holds. The observed autocorrelation in dataset A is evidence that the model is flawed. To quantify the evidence, we can obtain the posterior predictive distribution of T(y^{rep}) by simulation: assuming dataset A consists of iid Bernoulli trials, we simulate replicated datasets, compute the switch.number for each (the switch.number is the number of times the data change from 0 to 1 or from 1 to 0), and then calculate the probability that the replicated switch.number is at most the actual switch.number (in dataset A, the switch.number is 3). Let's do the experiment! (This is just for dataset A; as for dataset B, readers, you are smart enough to DIY!)

The following is a histogram of the simulated posterior predictive distribution of switch.number.

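R program:
# Original data, assumed to be iid Bernoulli (binomial) trials.
# Whether that assumption is valid is what the program below investigates.
y <- c(1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)

# Draw theta from its posterior under a uniform prior: Beta(sum(y) + 1, n - sum(y) + 1).
# Note: this single draw of theta is shared by all 100 replications; a full
# posterior predictive check would draw a fresh theta for each replication.
set.seed(40)
theta.post <- rbeta(1, sum(y) + 1, length(y) - sum(y) + 1)

# Simulate 100 replicated datasets of length 20, one per column.
set.seed(40)
y.rep <- array(rbinom(2000, 1, theta.post), c(20, 100))

# Count the number of switches (0 to 1 or 1 to 0) in each replicated dataset.
switch.number <- numeric(100)
for (j in 1:100) {
  for (i in 1:19) {
    if (y.rep[i, j] != y.rep[i + 1, j]) {
      switch.number[j] <- switch.number[j] + 1
    }
  }
}

# Histogram of the replicated switch counts; the breaks cover every possible count (0 to 19).
hist(switch.number, probability = TRUE, breaks = seq(-0.5, 19.5, by = 1),
     main = "posterior predictive distribution of number of switches",
     xlab = "switch.number")
lines(density(switch.number), col = "red", lwd = 3)
abline(v = 3, lwd = 3, col = "blue")  # observed number of switches in dataset A

# Share of replications with no more switches than the 3 observed in dataset A
p.value <- sum(switch.number <= 3) / 100
p.value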

result: p.value=0.02, meaning that only about 2% of the replicated datasets have as few switches as dataset A. There is little support for the assumption that dataset A comes from iid binomial (Bernoulli) trials.
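For comparison, here is a hedged sketch of the same check in which theta is drawn afresh from its posterior for every replicated dataset, rather than once for all of them. The helper names `count.switches` and `ppc.switches`, the seed, and the choice of 1000 replications are assumptions for illustration; they are not part of the original program, and the p value will not match the 0.02 above exactly.

```r
# Number of switches (0 to 1 or 1 to 0) in a binary sequence
count.switches <- function(x) sum(x[-1] != x[-length(x)])

# Posterior predictive check for the number of switches, assuming iid
# Bernoulli trials and a uniform prior on theta (hypothetical helper)
ppc.switches <- function(y, n.rep = 1000) {
  n     <- length(y)
  t.obs <- count.switches(y)
  # One posterior draw of theta per replicated dataset
  theta <- rbeta(n.rep, sum(y) + 1, n - sum(y) + 1)
  # Simulate a replicated dataset for each theta and count its switches
  t.rep <- vapply(theta,
                  function(th) count.switches(rbinom(n, 1, th)),
                  integer(1))
  list(t.obs = t.obs, t.rep = t.rep, p.value = mean(t.rep <= t.obs))
}

set.seed(1)  # arbitrary seed, for reproducibility only
y <- c(1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)  # dataset A
res <- ppc.switches(y)
res$t.obs    # observed number of switches
res$p.value  # share of replications with at most that many switches
```

Because the check is wrapped in a function, the same call can be reused on any other 0/1 sequence.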