Tuesday, October 19, 2010

Vivitrol, the FDA and Quitters, Inc.

If you are a fan of Stephen King, you know how easy it is to get sucked into the strange and perverted worlds that he creates. It happens word by word, sentence by sentence.  Sooner or later you find a general sense of uneasiness and then queasiness creeping into you.  When this happens to me, I sometimes need to remind myself that this is just fiction and would never happen in reality.  And with this mental safety line, I can pull myself to comfort.  It’s fiction not reality, fiction, not reality…. 

Perhaps one of my favorite Stephen King stories is Quitters, Inc. The main character is Dick Morrison, a man who is suffering from overeating, working too hard and smoking. At a friend’s recommendation, he signs a contract with a company called Quitters, Inc., which promises that he will never smoke again. He soon finds out that the company is run by a mobster and that its strategy is simple: if Morrison smokes again, Quitters, Inc. will do severe harm to those he loves most. Perhaps torture, maybe cutting off a loved one’s finger or worse. Reading this, I grabbed my mental safety line and left this disturbing story tucked neatly away in my subconscious, where it could do no harm.

That is, until I stumbled upon the FDA and Vivitrol. A few days ago, I was emailed an article from the New England Journal of Medicine suggesting that the FDA needs to regulate organizations providing genetic screening tests. The essence of the article was that individuals should not be trusted with information about their own health. That should be left up to the FDA and such individuals’ physicians, because, presumably, they know better. So I decided to go to the FDA website and see what it is that they do and whether they had my best interests in mind. I began by looking at a page listing drugs that have been recently approved and I picked Vivitrol because it sounded nice. It turns out that Vivitrol was approved a few years back, but has just received a modified and expanded indication approval. Vivitrol, or naltrexone, is used for treating alcohol dependence and, more recently, opioid dependence.

For the original FDA approval process, Vivitrol’s manufacturer tried to make the case to the FDA that Vivitrol has a positive impact for alcoholics by reducing both the frequency and amount of alcohol used by patients. An “event rate of heavy drinking” measure was used as an index to reflect both the incidence and severity of drinking. The FDA felt that, although the index was novel, it was too complex to understand and did not match what treating physicians might consider a desirable outcome. Instead, the FDA proposed that the target measure should be a reduction to zero days of heavy drinking. So far so good. The FDA seems to be on track and diligent about its process. But here is where things get weird. Heavy drinking is defined by the FDA as having five or more drinks a day for men and four or more for women. So if you are a male alcoholic, and you reduce your intake to four drinks a day, that is considered a success! This seems to me more like treading water than success and, intuitively, the four-drinks-a-day patient is likely to fall off the wagon in the near future. Thus even before we do a lick of testing, the question of the appropriateness of the target for a successful outcome is surely open to debate.

The FDA then analyzed the results of Vivitrol on patients who were abstaining from drinking when they started the test program and patients who were still drinking at the start of the program. The overwhelming majority of patients fall in the latter category. The FDA, in its approval, concluded that Vivitrol had no positive impact on the latter group, but that it does positively affect the former group. The FDA states:

…the proportion of patients meeting the definition of treatment success greatly increased, and a difference between Vivitrol and placebo groups was suggested for patients abstinent at baseline.  The efficacy results are summarized in the table below:

Actual number of heavy drinking days per month, N (%)

                                                          Placebo    190 mg     380 mg
All patients (abstinent and non-abstinent at baseline)    11 (5%)    15 (7%)    14 (7%)
Patients abstinent at baseline                            2 (11%)    6 (35%)    6 (35%)

The table comes from the drug trial the FDA used to assess Vivitrol. So what does this table mean? “N” is the number of positive outcomes, the percentages represent the ratio of N to the total number of patients with positive and negative outcomes, “190 mg” is the first dose option for Vivitrol and “380 mg” is the second dose option. Some more digging showed that, of those who took Vivitrol at the 380 mg dose, only 17 fell into the category “Patients abstinent at baseline” (i.e. not drinking alcohol at the start of the test period). Of these, 6 or 35% were deemed successful outcomes and did not drink “heavily” during the trial period. This compares to the Placebo group, from which 2 or 11% did not drink “heavily”. Since the FDA only approved the drug for people in this “abstinent at baseline” category, it is apparent that the approval of the 380 mg dose is based solely on a single trial with 17 patients “abstinent at baseline”. Four more patients in this group abstained from “heavy” drinking compared to the Placebo group, and voila, the drug is approved. Four patients. Four. One, two, three, four. 4. IV. That’s all it took to get this drug approved with a stamp of approval from the FDA that Vivitrol “greatly” increases success.
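
As a rough check on how much weight four extra patients can bear, here is a minimal sketch of a Fisher exact test on the “abstinent at baseline” subgroup. It is not part of the FDA review. The 380 mg figures (6 of 17) come from the numbers above; the placebo denominator is not stated, so the sketch assumes 18 patients, which is roughly what “2 (11%)” implies. The function name is made up for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k successes landing in the first row
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Add up every table that is at least as unlikely as the observed one
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= observed * (1 + 1e-9))

# 380 mg arm: 6 successes out of 17 (from the text).
# Placebo arm: 2 successes out of an ASSUMED 18 (inferred from "2 (11%)").
p_value = fisher_exact_two_sided(6, 17 - 6, 2, 18 - 2)
print(f"two-sided p-value: {p_value:.3f}")
```

Under these assumptions the p-value comes out at roughly 0.12, which is above the conventional 0.05 threshold. With samples this small, a gap of four patients could plausibly be chance.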

Even with this scant evidence, let’s test whether the FDA’s conclusions are reasonable. We need to look no further than the FDA’s own statistician who reviewed the drug test study. The statistician wrote a dense 38-page paper, of which only a single paragraph discusses the “special / subgroup” of patients who were “abstinent at baseline”. The paragraph states, “Since the number of patients abstinent at baseline was very small, I explored the possible effect of treatment on the subgroup …. without any attempt to draw a formal statistical inference.” The statistician then produces a lengthier version of the table shown above.

The FDA’s approval states the drug “greatly” increased success.  But the FDA’s own statistician states  the sample is too small and no “formal statistical inference” is even attempted. With a bit of magic, the FDA turned its own statistician’s analysis upside down to rationalize the approval of this drug.

Now we start entering Stephen King territory. For the next thing that I did was go to Vivitrol’s website and read the “Highlights of Prescribing Information”. The document warns of potential side effects of the drug, including:

  • Hepatotoxicity
  • Injection site reaction
  • Eosinophilic pneumonia
  • Hypersensitivity including anaphylaxis
  • Anorexia
  • Somnolence
  • Depression and suicidality

The document also tells you that “Any attempt by a patient to overcome the blockade produced by VIVITROL by taking opioids is very dangerous and may lead to fatal overdose.” Is this the equivalent of “Fall off the wagon and die”? I believe it was after reading this that Quitters, Inc. uncomfortably bubbled up from my subconscious. Vivitrol’s website is also kind enough to tell you how often one might expect a side effect. Here are some of them:

  • Nausea – 33% of patients in the trial
  • Injection site conditions – 69%
  • Headache – 25%

What can be done about this sorry situation? Most importantly, the drug efficacy study should be done in conjunction with analyzing the potential dangers and side effects of the drug, rather than as two separate studies. I have discussed a conceptual model of ranking good vs. bad outcomes here. Secondly, particularly when dealing with behavioral issues such as abstaining from drinking, I propose a new approach: the anti-placebo. Whereas a placebo is designed to do nothing, an anti-placebo is designed to do something. In particular, it should mimic the side effects of the drug under investigation and cause a bit of pain and anguish. For example, how about creating a control group and giving each person in the group something like a shot of dishwashing liquid, or a sucker punch in the gut? Yes, this does have severe side effects, but would you really want to go boozing it up with your buddies at the local hot spot if your stomach were in pain? This is a barbaric test, yet how much more barbaric is it than subjecting a patient to hepatotoxicity, pneumonia, and suicidality, not to mention the potential risk of a “fatal overdose” if the patient falls off the opioid wagon? A painful belly does not seem too bad when compared to death or suicide. Simply put, perhaps Vivitrol’s limited positive impact has more to do with making its users too sick and unmotivated to drink than with any biochemical mechanism that reduces alcohol intake. This issue was not pursued by the FDA, but should have been.

All this brings me back to my original quest of getting a better understanding of the FDA and whether it has my best interests in mind. I will let the above analysis speak for itself.

The FDA is not run by the mob. It is run by the government.  Given the statistical evidence of the efficacy of this drug, the line in Vivitrol’s Highlights of Prescribing Information, “Any attempt by a patient to overcome the blockade produced by VIVITROL by taking opioids is very dangerous and may lead to fatal overdose,” could have been written by Stephen King.  I shudder now and reach for my mental safety line, but all I grab is “It’s reality not fiction, reality not fiction…”     

Wednesday, September 8, 2010

Tossing Coins on WolframAlpha

For those of you who are not familiar with WolframAlpha, it is a wonderful website that is part search engine, part supercomputer. The site allows you to enter mathematical or science-related questions on all sorts of subjects.

Take, for instance, doing a search for “Coin Toss.” The site gives you a bunch of interesting probabilities associated with tossing coins as well as other related probability questions. One of the related items is the probability of winning the Powerball lottery. The probability of winning the grand prize is 1 in 195,249,054. Does this mean that it is a good bet to buy a Powerball ticket if the prize exceeds $195 million? Would it be a good financial decision to buy a ticket if the payout is $365 million, the maximum that Powerball has ever reached?

It turns out that providing a satisfactory answer to this question needs a lot more than just probability analysis.  I will be providing some answers in the near future, but let me know your thoughts in the meantime.
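
A naive first pass at the question is the expected value of a ticket, counting only the jackpot. The sketch below assumes a $1 ticket price (an assumption; the price is not stated above) and ignores smaller prizes, taxes, shared jackpots and lump-sum discounts, which is part of why probability alone does not settle the matter.

```python
JACKPOT_ODDS = 195_249_054  # 1 in 195,249,054, from the text
TICKET_PRICE = 1.00         # ASSUMPTION: hypothetical ticket price in dollars

def naive_expected_value(jackpot, odds=JACKPOT_ODDS, price=TICKET_PRICE):
    """Expected profit per ticket, counting only the jackpot."""
    return jackpot / odds - price

for jackpot in (195_000_000, 365_000_000):
    value = naive_expected_value(jackpot)
    print(f"${jackpot:,} jackpot: {value:+.2f} per ticket")
```

On this crude measure the break-even jackpot is simply the odds multiplied by the ticket price.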

Friday, August 27, 2010

Expected Moral Value

“CAMBRIDGE, Mass.—Scientists at Harvard University have found that humans can make difficult moral decisions using the same brain circuits as those used in more mundane choices related to money and food...

It seems that our capacity for complex, life-and-death decisions depends on brain structures that originally evolved for making more basic, self-interested decisions about things like obtaining calories," says Shenhav, a doctoral student in psychology at Harvard. "Many of the brain regions we find to be active in major moral decisions have been shown to perform similar functions when people and animals make commonplace decisions about ordinary goods such as money and food”

Do humans make complex moral decisions with the same neurological processes used to determine whether we should get fries or mashed potatoes with our meal?  See US News & World Report for more on this.

Sunday, August 1, 2010

Common Sense

Pierre Laplace: "The theory of probabilities is basically just common sense reduced to calculus; it makes one appreciate with exactness that which accurate minds feel with a sort of instinct, often without being able to account for it."

Thursday, July 22, 2010

New York Times on the Probability of Rare Events

The New York Times has published a short article about the commonness of rare events.  The article discusses the fraudulent scheme of a psychic sending out emails predicting the outcome of a ball game.  After 8 successful ballgame predictions, the psychic asks for $10 before providing you the next prediction for the ninth game.  The article states that the psychic is guessing, so how did he do it? 

This is a version of a well-known scheme that works like this. The psychic sends out a batch of 256 emails for the first ball game, 50% of them predicting the first team will win and 50% predicting the second will win. Half of the recipients receive a correct answer and half receive the wrong answer. The next week, the psychic sends an email only to those who received the correct answer in the first week. Again half of these are told the first team will win, half are told the second team will win. This continues for eight weeks. At the end of the eight weeks, one recipient will have received 8 correct predictions and a request for $10 for the next prediction. If the psychic starts with 25,600 emails, after 8 ballgames there will be 100 recipients who have received perfect predictions.
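
The halving is easy to trace in a few lines of Python. This is only an illustrative sketch; the function name is invented here.

```python
def perfect_recipients(initial_emails, games):
    """Count recipients who have seen only correct predictions after each game."""
    remaining = initial_emails
    history = [remaining]
    for _ in range(games):
        remaining //= 2  # half of the list was sent the losing team
        history.append(remaining)
    return history

print(perfect_recipients(256, 8))         # list shrinks from 256 down to 1
print(perfect_recipients(25_600, 8)[-1])  # 100
```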

The psychic’s success rate can be dramatically improved if he weights his predictions to the favorites for each game. For example, suppose the average game has one team with betting odds of winning of 1.5:1, and that these odds are a reasonable predictor of the outcome. This means that the psychic only has to send out 59.5 emails (instead of 256) on average to produce one recipient with 8 good predictions. This is calculated as 59.5*(1.5/(1.5+1))^8 = 1. If the psychic is choosing only one game from a weekly roster of games, he could choose even more lopsided games and further improve the chance of producing a winning series of predictions.
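
The 59.5 figure can be reproduced as follows. This is a small sketch that takes the stated odds at face value; the function name is hypothetical.

```python
def emails_needed(odds_for_favorite, games):
    """Average emails needed for one recipient to see `games` correct picks in a row."""
    p_win = odds_for_favorite / (odds_for_favorite + 1)  # 1.5:1 odds -> 0.6
    return 1 / p_win ** games

print(round(emails_needed(1.5, 8), 1))  # 59.5
print(emails_needed(1.0, 8))            # 256.0, the even-odds case
```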

A version of this occurs naturally in the financial markets, although with less of an intent to deceive.  Assume that there are 1,024 traders.  If each makes a key investment decision every 6 months, over a four year period, there will be roughly four that have a perfect record, assuming that each correct decision occurs with a probability of 50%.  These four traders are likely to be considered Wall Street geniuses.

Now real life is always more complicated than one might expect. With traders, there is a strong likelihood that at least some of them are smarter than the rest and are more likely to make good decisions. Let’s assume that “Good” traders are 60% correct in their decisions, “Average” traders are 50% correct, and “Poor” traders are 40% correct. Let’s further assume that the split of Good, Average and Poor traders is 10%, 80%, 10%. The question that is most interesting is: if you come across a trader with 8 correct decisions, what is the probability that that trader comes from the “Good” category? This is the key decision an investor must make when looking to place his or her funds with an investment manager or trader (or a manager must make before paying out fat trader bonuses).

This can be solved in the following way:

Column key:
a = Probability of a correct decision
b = Proportion of traders
c = Probability of 8 correct decisions
d = Proportion of all traders who are in the class and have 8 correct decisions
e = Probability that a trader with 8 correct decisions comes from the class

Trader strength      a       b       c       d        e
Good               60%     10%    1.7%    0.2%    34.5%
Average            50%     80%    0.4%    0.3%    64.2%
Poor               40%     10%    0.1%    0.0%     1.3%
Total                     100%            0.5%   100.0%

Column c = a^8

Column d = b * c

Column e = d / (Total of column d)

The model suggests that if you come across a trader with an astounding run of 8 correct decisions, you only have a 34.5% chance that that trader is “Good” and not “Average” or “Poor”.
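
For readers who want to vary the assumptions, here is a minimal sketch of the same calculation in Python. The dictionary layout and function name are illustrative choices, not anything standard.

```python
# (probability of a correct decision, proportion of traders) for each class
classes = {
    "Good": (0.60, 0.10),
    "Average": (0.50, 0.80),
    "Poor": (0.40, 0.10),
}

def posterior_given_streak(classes, streak=8):
    """Probability of each class, given a run of `streak` correct decisions."""
    # Column d: proportion of traders times probability of the streak
    joint = {name: share * p ** streak for name, (p, share) in classes.items()}
    total = sum(joint.values())
    # Column e: each class's share of all traders who achieved the streak
    return {name: value / total for name, value in joint.items()}

for name, prob in posterior_given_streak(classes).items():
    print(f"{name}: {prob:.1%}")
```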

Sunday, July 18, 2010

Success and Failure in Medical Testing

There is nothing quite as unsettling or at times scary as getting back a medical test result that suggests that you have a particular disease or one of your bodily parts is not functioning correctly. Your despair is undoubtedly deepened when your doctor inevitably tells you that the test that has uncovered your sad predicament is in fact very accurate. “97% accurate,” the doctor might say.

Such bad news is faced by people everyday. Yet a little understanding of probability can restore some hope to the situation. This is particularly the case for tests that are conducted for unusual diseases or for which you have no history of potential exposure. So how do you figure out what your real risk is?

The first thing to notice is that the doctor is quoting how often the test avoids a false positive, namely how often it correctly says you do not have the disease when in fact you don’t. What is not being quoted is the conditional probability that you have the disease, given that you have a positive test result. This might sound like the same thing, but in fact, the two probabilities are very different.

To make further headway into the issue, it is important to know what the probability of occurrence of the stated disease is. Let’s take as an example testing for cancer. The following example uses made up assumptions to demonstrate the point, and would not represent actual risks.

Say the cancer test says that you do not have cancer 97% of the time, when in fact you do not, and says that you do have cancer 99% of the time, when in fact you do have cancer. The first measure (97%) is called the “specificity” of the test. The second measure (99%) is called the “sensitivity” of the test.

Now assume that two in a hundred people of the test taking age do in fact have cancer. Armed with this information, we can now find the solution with the help of a magic probability box. The box looks at the intersection of having and not having cancer, with the results of the test.

We start by cross multiplying the proportion of the population having and not having cancer with the sensitivity and specificity of the test, respectively.

                         Have Cancer    No Cancer     Total
Test says “No Cancer”    2% x 1%        98% x 97%
Test says “Cancer”       2% x 99%       98% x 3%

Next we calculate the results and add up the rows and columns.

                         Have Cancer    No Cancer    Total
Test says “No Cancer”    0.02%          95.06%       95.08%
Test says “Cancer”       1.98%          2.94%        4.92%
Total                    2.00%          98.00%       100.00%

Note that the bottom row shows the proportion of the population having and not having cancer and that the total in the two calculated columns and rows both add up to 100%. That is, the four calculated cells add up to 100% and these four cells represent the entire universe of “event” outcomes.

Finally, we are ready to answer the question: if the test says cancer, do you in fact have cancer? The probability of this is

= Prob(having cancer and the test says you have cancer) / Prob(test says you have cancer)

= 1.98% / 4.92%

= 40.24%

Now what is the probability that, if the test says cancer, you do not have cancer? The probability of this is

= Prob(not having cancer and the test says you have cancer) / Prob(test says you have cancer)

= 2.94% / 4.92%

= 59.76%

We are left with a remarkably counter-intuitive result: you are more likely not to have cancer than to have cancer, despite the so-called “accuracy” (sensitivity and specificity) of the test appearing to be quite high, namely 99% and 97%. Why does this happen? It occurs because the incidence of the disease in the population (2%) is lower than the false positive rate of the test (3%). It is more likely that the test is wrong than that you have cancer.
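
The magic probability box boils down to a short function. The sketch below uses the made-up numbers from this example; the function name is an illustrative choice.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of having the disease, given a positive test result."""
    true_positive = prevalence * sensitivity               # 2% x 99%
    false_positive = (1 - prevalence) * (1 - specificity)  # 98% x 3%
    return true_positive / (true_positive + false_positive)

ppv = positive_predictive_value(prevalence=0.02, sensitivity=0.99, specificity=0.97)
print(f"P(cancer | test says cancer) = {ppv:.2%}")
print(f"P(no cancer | test says cancer) = {1 - ppv:.2%}")
```

Lowering the prevalence in the call makes the result drop quickly, which is why the effect is strongest for unusual diseases.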

The implications of this simple example are profound in how we conduct our lives. There is of course the unnecessary worry and concern that results of medical tests produce. More problematic is if the conduct of the test has a cost. This is particularly the case where the cost is not just monetary but physical.

Take, for instance, testing for Down Syndrome with an amniocentesis. An amniocentesis collects amniotic fluid, the fluid in the womb, and tests the fetal cells for Down Syndrome. The test is done in the 16th to 18th week of pregnancy. The test increases the risk of miscarriage by about 0.75%. The test is often recommended for pregnant women 35 years and older. The risk of Down increases with age and is about 1/250 at the time of the test for a 35-year-old and 1/20 for a 45-year-old.

Let’s assume we have 250 women of the same age and risk class having an amniocentesis and we want to find out the probability that the test will be successful. The question is how to define success. This might be as much a qualitative as a quantitative issue, but one way to do it might be:

  • Success = discover Down, Down exists
  • Failure = Do not have Down but have a miscarriage as a result of the test
  • Neutral = Other outcomes are deemed to be “neutral” and we take the opinion that these are neither good nor bad. This category relates to pregnant women who do not have a Down pregnancy and do not have a miscarriage from the test.

To simplify matters, we assume that the sensitivity and specificity of the test are both 100%. Now we create a new magic probability box and look at the risks for a 35 year old, making the following assumptions:

  • probability of a miscarriage from test for all pregnancies: 0.75%
  • probability of Down in population of 35-year-old pregnant women at time of test: 1/250

           Miscarriage from Test    No Miscarriage from Test    Total
Down       0.75% x (1/250)          99.25% x (1/250)            1/250
No Down    0.75% x (249/250)        99.25% x (249/250)          249/250

Completing the table gives:

           Miscarriage from Test    No Miscarriage from Test    Total
Down       0.00003                  0.00397                     0.004
No Down    0.00747                  0.98853                     0.996
Total      0.0075                   0.9925                      1.000

For a group of 250 women, we have

  • Successes = 250*0.004 =1
  • Failures = 250*0.00747 = 1.8675 = miscarriage for women without Down fetus
  • Total successes and failures = 1 + 1.8675 = 2.8675

We note that there are more unnecessary miscarriages than Down discoveries!

We now seek the probability of “success” of the test, given that the test produces either a success or failure, according to our prior description.

  • Prob (success, given a success or failure in the test)
    • = 1 / (1+1.8675)
    • = 34.9%
  • Prob (Failure, given a success or failure in the test)
    • = 1.8675 / (1+1.8675)
    • = 65.1%

The result is quite dramatic if you consider that failures result in an unnecessary miscarriage, and it raises the question of whether the test produces more harm than good. To answer that question, probability calculations alone will not suffice.

For a 45 year old, the success probability increases to 87.5%.
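
Both figures can be reproduced with a short sketch that follows the definitions of success and failure used above, assuming (as in the text) perfect sensitivity and specificity. The function name is made up for illustration.

```python
def amnio_success_probability(p_down, p_miscarriage=0.0075):
    """P(success | success or failure), using the definitions in this post."""
    successes = p_down                         # Down exists and is discovered
    failures = (1 - p_down) * p_miscarriage    # no Down, but the test causes a miscarriage
    return successes / (successes + failures)

print(f"35 year old: {amnio_success_probability(1 / 250):.1%}")  # 34.9%
print(f"45 year old: {amnio_success_probability(1 / 20):.1%}")   # 87.5%
```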

The advice most doctors give their patients on deciding whether to do the test is to quote the extra risk of miscarriage of 0.75%. This seems very small. However, when you translate this small extra risk to the conditional probability of success or failure, the matter becomes far from clear. The issues become even more complex when you make a value judgment of whether the benefit of a discovery of a Down pregnancy is a better or worse outcome than the causation of miscarriage.

DISCLAIMER: The above analysis does not recommend that you should not get tested for cancer (where I made up the probabilities above), or not have an amniocentesis. The above is just a simplified model of how you can think about these matters, and could be quite inappropriate for your circumstances. Individuals should discuss these matters with their doctors, who might be aware of many other factors. You should not rely only on simplified models when making life-altering decisions. Speak to your doctor!!

Saturday, July 17, 2010

Random Variables, Probabilities and the Power of the Unknown

I have started this blog as a recreational investigation into the world of random variables, randomness and probabilities. I will be exploring interesting ways in which probability and randomness affect our lives and will be looking at ways of solving easy, difficult and unusual probability questions. The blog will explore ways of predicting the future, and shedding light on the unknown. Probability techniques will be discussed as a way of improving critical thinking and uncovering flaws in our day-to-day decision making processes.

I hope this blog will be the product of a collaborative process between me and the blog's readers. As such, I encourage all readers to contribute ideas, post questions and challenges, offer new ideas and solutions.

Feel free to email me at x@randomvariable.us with any thoughts or questions.