Wednesday, December 5, 2012

Global Warming Redux

Matt Asher, a popular blogger on statistics has taken a fresh look at global warming.  He has come to the conclusion that based on the temperature data for the last 131 years, as provided by NASA, that the change in temperature could be explained by random noise and most importantly that the trend in temperature is not statistically significant.  He does this by using an approach of randomly sampling annual changes in temperature from a distribution that has a mean of zero and a standard deviation equal to the historic standard deviation in changes in temperature.  In his starting model, he then simulates this approach many times and shows that 56% of the time the simulated  change in temperature over the entire period is more severe than the actual historic change in temperature.  Based on this, he concludes that the change in temperature over the last 131 years could be explained by random fluctuations.
But is he correct?
I decided to test the data myself using a Bayesian approach.

My core model is an moving average time series model.  This model basically says that the temperature in one year equals the temperature in the prior year + random change.  The random change is a function of the excess of the prior year’s temperature over the mean expected temperature for that year.  The model suggests that extreme temperatures in one year are followed by temperatures closer to the mean in the following year. However, the mean temperature can change over time.  The key issue is to test whether the mean  temperature has a positive or negative trend.

I also allow for secular changes in the temperature trend assumption and in the standard deviation assumption.

The formal model is
[Temp in year t] = [temp in year t-1] + c + b(t - (n+1)/2) + theta * (difference between temperature in year t-1 and mean expected temperature in year t-1)
c= annual trend in change in temperature (the Trend Factor)
b= annual change in Trend Factor
theta = regression to the mean factor
t=1,…, n;  n=131
I also assume that the change in temperature has a standard deviation that can change over time (parameter d  - see code at the bottom of the post).
The Bayesian approach treats every parameter above as a random variable.  It estimates a distribution for the value “c” which is the Trend Factor.  I further calculate the probability that c is greater than zero (called “prob.positive.trend”).  The results of this analysis show that the prob.positive.trend is estimated as 97%!
This suggests that there is very strong evidence that there is a secular trend in the change in temperature.
If I simplify the model to:
[Temp in year t] = [temp in year t-1] + c  + theta * (change in temperature in year t-1)   
and assume that the standard deviation of temperature remains constant, I get  a similar result for the prob.positive.trend.
The estimated distributions of each of the parameters are shown below (click to enlarge):
(You will note that “prob.positive.trend” has a bimodal distribution at 0 and 1.  This value is set to 1 if c>0 and 0 if c<0.  97% of the time c>0)
So why do my results contradict Asher’s?  Firstly Asher shows that if you start taking into account correlations from one year to the next that his 56% probability drops to 39%.  Although the way he does this is a little odd and that potentially is the source of the problem.  He samples data in 2 or 3 year batches resulting in the correlation being “broken” after this period.  It also might be that the Bayesian model is more sensitive than the non-parametric approach followed by Asher. Finally, he does not incorporate the regression to the mean feature of changes in temperature.  This is a significant feature in temperature trends. and constrains a "path" of temperatures from deviating too wildly from the mean.
In summary, proponents for global warning are still safe.  The data from NASA appears to strongly support global warming trends.
Please feel free to leave a comment on this page and let me know why you think the two approaches (mine and Asher’s) lead to different conclusions despite using the same data.
The code for doing the calculations is done in R and JAGS.  The R code is:
temps <- read.csv(file='C:\\\\temperature.csv')
#temps$Temp = average annual temperature calculated from NASA data for last 131 years
beta.dat.jags <- list(n=nrow(temps), y=temps$Temp, start.temp = temps$Temp[1] ) <- jags.model("temp.jags",
                       data = beta.dat.jags,
                       n.chains = 2,
                       n.adapt = 500000)
data.samples = coda.samples(,
                            c('theta', 'c', 'b','d','sigma.mean','prob.positive.trend'),
The results produced by the “summary” function are (click to enlarge):

The JAGS code is:
model {
  for (t in 1:n){
    y[t] ~ dnorm(m[t], tau[t])
    tau[t] <- 1/pow(sigma[t],2)
    sigma[t] <- sigma.unadjusted + d * (t - (1+n)/2)
  for (t in 2:n){
    m[t] <- y[t-1] + c + b * (t - (1+n)/2) + theta * eps[t-1]
    eps[t] <- y[t] - m[t]
  m[1]  <- start.temp - eps[1]
  eps[1] ~ dnorm(0,0001)
  theta ~ dnorm(0,0001)
  c ~ dnorm(0,0001)
  b ~ dnorm(0,0001)
  d ~ dnorm(0,0001)
  prob.positive.trend <- step(c)
  sigma.mean <- mean(sigma[])
  sigma.unadjusted ~ dunif(0,100)

Tuesday, March 22, 2011

Breast Cancer Recurrence New York Times Article

There is an interesting article in the New York times about a doctor who refused to discuss the probability of recurrence of breast cancer with one of his patients recovering from breast cancer therapy.  The article is written by the patient’s husband who deems the doctor’s attitude to be quite bizarre.  So far there are 6 pages of comments to this article.  What is notable is that in both the original article and the comments, no one has yet attempted to provide an estimate of the recurrence probability. 

Well, here goes: the following table shows the estimates of the recurrence rates of 5, 10 and 15 year periods.  The results are from an analysis conducted by Freedman et al.

Years after First Occurrence 5 10 15
Local recurrence 2% 5% 7%
Elsewhere recurrence 1% 2% 6%
Contralateral recurrence 13%

“Local recurrence” means occurrence in the same quadrant as the first cancer.  Contralateral means the other breast.

When the NY Times writer asked the doctor what the probability of recurrence was, the doctor replied that it didn’t matter. It would either happen or it wouldn’t.  The writer was stunned by the response.  I am not surprised by the doctor.  The doctor, despite being an breast cancer expert, probably did not know the answer, and if he did, most likely did not understand the significance of the question.  I have discussed on numerous occasions in this blog that the medical profession is generally very weak in understanding probability and risks and this affects its ability to optimize treatment.  The doctor’s answer is yet another example.  Probability should be a required course for medical students.  The risks of a disease of occurring, disappearing or reappearing after a treatment protocol should have a direct impact on deciding about whether to follow the original treatment protocol at all and what kind of monitoring should follow.

Sunday, March 6, 2011

Winning at Rock-Paper-Scissors

The New York Times has a web program that let’s you play Rock-Paper-Scissors online.  The interesting thing about this game is that it can be shown that the most efficient strategy is to to play a completely random game.  If you don’t play a random game, then your opponent can estimate your strategy and eventually begin to predict your moves.  This is the idea behind the the New York Times game, which is frustratingly difficult to beat.


The concept in developing a good R-P-S strategy is to understand that it is difficult for humans to select a truly random sequence.  There is a natural tendency to favor some choices, or  avoid some choices or think that long runs of a choice should not occur, so that the chance of a repeat of the last choice goes down the longer the prior run.  In a random process, the chance of particular choice (e.g. Rock) is independent of how many Rock choices have been made in the past run.  There are all sorts of other subtleties that are discussed by the R-P-S community (yes, there is  such a thing together with organized competition).  These include:

  1. Men tend to lead with Rock
  2. People who know how to play R-P-S, know rule 1, so they tend to lead with paper
  3. Women tend to lead with scissors.
  4. Scissors is chosen less than the others, on average
  5. People tend to switch to the last move that beat them.
  6. and so on….


Thus if you can estimate the probability that a player is following a a non-random strategy, then you can develop a winning strategy.  Of course by developing a winning strategy, your opponent can theoretically figure out a strategy to beat you.  This sort of recursiveness could go on forever….   

Saturday, January 15, 2011

The Mismeasure of Man

Here is a fascinating discussion by Ralph Horwitz, MD, professor of medicine at Stanford, on the subtleties of statistics in medicine:

Sunday, January 2, 2011

Age and Happiness

A couple of weeks back, The Economist magazine published a cover article on age and happiness.  The idea of the  article was based on the following chart:

The article went on to say that the level of happiness tends show a “U” shape and improves with age from about 45 onwards.  Andrew Gellman, a statistician at Columbia University, looked at some of the same data and produced the following graph:


He concludes that the that the “U” shape is not nearly as clear as The Economist suggests.  Reference to other studies suggests that the “U” shape might have more to do with weak scientific method.


I think for me, the more important question, is what really is “happiness”.  For a fun read, try Happier by Tal Ben-Shahar,

Tuesday, October 19, 2010

Vivitrol, the FDA and Quitters, Inc.

If you are a fan of Stephen King, you know how easy it is to get sucked into the strange and perverted worlds that he creates. It happens word by word, sentence by sentence.  Sooner or later you find a general sense of uneasiness and then queasiness creeping into you.  When this happens to me, I sometimes need to remind myself that this is just fiction and would never happen in reality.  And with this mental safety line, I can pull myself to comfort.  It’s fiction not reality, fiction, not reality…. 

Perhaps one of my favorite Stephen King stories is Quitter’s, Inc.   The main character is Dick Morrison, a man who is suffering from overeating, working too hard and smoking.  At a friend’s recommendation, he signs a contract with a company called Quitter’s Inc. which promises that he will never smoke again.  He soon finds out that the company is run by a mobster and their strategy is simple:  if Morrison is to smoke again, Quitter’s Inc. will do severe harm to those he loves most. Perhaps torture, maybe cutting off a loved one’s finger or worse.  Reading this, I grabbed my mental safety line and left this disturbing story tucked neatly away in my subconscious, where it can do no harm.

That is until I stumbled upon the FDA and Vivitrol.  A few days ago, I was emailed an article from the New England Journal of Medicine suggesting that the FDA needs to regulate organizations providing genetic screening tests.  The essence of the article was that individuals should not be trusted with information about their own health.  That should be left up to the FDA and such individuals’ physicians, because, presumably, they know better.  So I decided to go to the FDA website and see what it is that they do and whether they had my best interests in mind. I began by looking at a page listing drugs that have been recently approved and I picked Vivitrol because it sounded nice.  It turns out that Vivitrol was approved a few years back, but has just received a modified and expanded indication approval.  Vivitrol or naltrexone is used for treating alcohol dependence and more recently for opioid use.

For the original FDA approval process, Vivitrol’s manufacturer tried to make the case to the FDA that Vivitrol has a positive impact for alcoholics by reducing both the frequency and amount of alcohol used by patients.  An “event rate of heavy drinking” measure was used as an index to reflect both the incidence and severity of drinking.  The FDA felt that, whereas the index was novel, it was too complex to understand and did not match what treating physicians might consider  a desirable outcome.  Instead, the FDA proposed that the target measure should be a reduction to zero days of heavy drinking.  So far so good.  The FDA seems to be on track and diligent about its process.  But here is where things get weird.  Heavy drinking is defined by the FDA has having five or more drinks a day for men and four or more for women.  So if you are an male alcoholic, and you reduce your intake to four drinks a day, that is considered a success!  This seems to me more like treading water than success and intuitively, the four-drinks-a-day patient is likely to fall off the wagon in the near future.  Thus even before we do a lick of testing, the question of the appropriateness of the target for a successful outcome is surely open to debate.

The FDA then analyzed the results of Vivitrol on patients who were abstaining from drinking when they started the test program and patients who were still drinking at the start of the program.  The overwhelming majority of patients fall in the latter category. The FDA, in its approval,  concluded that Vivitrol had no positive impact on the latter group, but that it does positively effect the former group.  The FDA states:

…the proportion of patients meeting the definition of treatment success greatly increased, and a difference between Vivitrol and placebo groups was suggested for patients abstinent at baseline.  The efficacy results are summarized in the table below:

Actual number of heavy drinking days per month N (%)
  Placebo 190 mg 380 mg
All patients (abstinent and non-abstinent at baseline)  


11 (5%)

15 (7%)

14 (7%)

Patients abstinent at baseline    


2 (11%)

6 (35%)

6 (35%)


The table refers to the drug study test used by the FDA to assess the drug.  So what does this table mean?  “N” is the number of positive outcomes, the percentages represent the ratio of N to the number total patients with positive and negative outcomes, “190 mg” is the first dose option for Vivitrol and “380 mg” is the second dose option for Vivitrol.  Some more digging showed that  of those that took Vivitrol in the test at the 380 mg dose, only 17 fell into the category “Patients  at abstinent baseline” (i.e. not drinking alcohol at the start of the test period) and who were prescribed the 380 mg dose.  Of these 6 or 35% were deemed successful outcomes and did not drink “heavily” during the trial period.  This compares to the Placebo group from which 2 or 11% did not drink “heavily”.  Since the FDA only approved the drug for people in this “abstinent at baseline” category, it is apparent that the approval process for the 380 mg dose is solely based on a single trial of 17 patients “abstinent at baseline”.    Four more patients in this group abstained from “heavy” drinking compared to the Placebo group, and voila, the drug is approved.  Four patients.  Four. One, two, three, four. 4.  IV. That’s all it took to get this drug approved with a stamp of approval from the FDA that Vivitrol “greatly” increases success. 

Even with this scant evidence, let’s test whether the FDA’s conclusions are reasonable.  We need to look no further than the FDA’s own statistician who reviewed the drug test study.  The statistician wrote a dense 38 page paper of which only a single paragraph discusses the “special / subgroup” of patients that were “abstinent at baseline”.  The paragraph states, “Since the number of patients abstinent at baseline was very small, I explored the possible effect of treatment on the subgroup …. without any attempt to draw a formal statistical inference.”  The statistician then produces a lengthier version of the table shown above. 

The FDA’s approval states the drug “greatly” increased success.  But the FDA’s own statistician states  the sample is too small and no “formal statistical inference” is even attempted. With a bit of magic, the FDA turned its own statistician’s analysis upside down to rationalize the approval of this drug.

Now we start entering into Stephen King territory.  For the next thing that I did was go to the Vivitrol’s website and read the “Highlights of Prescribing Information”.  The document warns of potential side effects of the drug, including:

  • Hepatoxicity
  • Injection site reaction
  • Eosinophilic pneumonia
  • Hypersensitivity including anaphylaxis
  • Anorexia
  • Somnolence
  • Depression and suicidality

The document also tells you that “Any attempt by a patient to overcome the blockade produced by VIVITROL by taking opioids is very dangerous and may lead to fatal overdose.”  Is this the equivalent of “Fall off the wagon and die”?  I believe it was after reading this that Quitter’s Inc. uncomfortably bubbled up from my subconscious.  Vivitrol’s website is also kind enough to tell you how often one might expect a side effect.  Here are some of them:

  • Nausea – 33% of patients in the trial
  • Injection site conditions – 69%
  • Headache – 25%

What can be done about this sorry situation?  Most importantly, the drug efficacy study should be done in conjunction with analyzing the potential dangers and side effects of the drug, rather than as two separate studies.  I have discussed a conceptual model of ranking  good vs. bad outcomes here.  Secondly, particularly when dealing with behavioral issues such as abstaining from drinking, I propose a new approach:  the anti-placebo.  Whereas a placebo is designed to do nothing, an anti-placebo is designed to do something.  In particular, it should mimic the side effects of the drug under investigation and cause a bit of pain and anguish.  For example, how about creating a control group and giving each person in the group something like a shot of dishwashing liquid, or a sucker punch in the gut?  Yes, this does have severe side effects, but would you really want to go boozing it it up with your buddies at the local hot spot, if your stomach were in pain?  This is a barbaric test, yet how much more barbaric is it than subjecting a patient to hepatoxicity, pneumonia, and suicidality, not to mention the potential risk of a “fatal overdose” if the patient falls off the opioid wagon?  A painful belly does not seem too bad when compared to death or suicide.   Simply put, perhaps Vivitrol’s limited positive impact has more to do with making its users sick and unmotivated to drink than any biomechanical mechanism which causes a reduction in alcohol intake? This issue was not pursued by the FDA, but should have been.

All this brings me back to my original quest of getting a better understanding of the FDA and whether it has my best interests in mind. I will let the above analysis speak for itself.

The FDA is not run by the mob. It is run by the government.  Given the statistical evidence of the efficacy of this drug, the line in Vivitrol’s Highlights of Prescribing Information, “Any attempt by a patient to overcome the blockade produced by VIVITROL by taking opioids is very dangerous and may lead to fatal overdose,” could have been written by Stephen King.  I shudder now and reach for my mental safety line, but all I grab is “It’s reality not fiction, reality not fiction…”