What is the difference between McNemar's test and the chi-squared test, and how do you know when to use which?



I have tried reading up on this in various sources, but I am still not sure which test would be appropriate for my case. There are three different questions I am asking of my dataset:

  1. Subjects are tested for infection with X at different times. I want to know whether the proportions of positives for X afterwards are related to the proportions of positives for X before:

                 After   
               |no  |yes|
    Before|No  |1157|35 |
          |Yes |220 |13 |
    
    results of chi-squared test: 
    Chi^2 =  4.183     d.f. =  1     p =  0.04082 
    
    results of McNemar's test: 
    Chi^2 =  134.2     d.f. =  1     p =  4.901e-31

    Since the data are, to my knowledge, repeated measurements, I should use McNemar's test, which tests whether the proportion of positives for X has changed.

    But my questions seem to require the chi-squared test: to test whether the proportion of positives for X afterwards is related to the proportion of positives for X before.

    I am not even sure whether I correctly understand the difference between McNemar's test and the chi-squared test. What would be the right test if my question were: "Is the proportion of people infected with X different afterwards than before?"

  2. A similar case, except that instead of before and after, I measure two different infections at a single point in time:

            Y   
          |no  |yes|
    X|No  |1157|35 |
     |Yes |220 |13 |

    Which test would be correct here if the question is: "Are higher proportions of one infection related to higher proportions of Y?"

  3. If my question were: "Is infection Y at time t2 related to infection X at time t1?", which test would be appropriate?

                  Y at t2   
                |no  |yes|
    X at t1|No  |1157|35 |
           |Yes |220 |13 |

In all of these cases I have used McNemar's test, but I have my doubts whether this is the right test to answer my questions. I use R. Could I use a binomial glm instead? Would that be analogous to the chi-squared test?


Have you tried reading the mcnemar-test threads here: stats.stackexchange.com/questions/tagged/mcnemar-test ?
ttnphns

What do you mean by "relationship between two probabilities"?
Michael M

@ttnphns I looked through them, but could not map them onto my question. After thinking about it some more, it seems I can ask two questions based on Q1: chi-squared would tell me whether the proportion of +ve X after is related to the proportion of +ve X before, while McNemar would tell me whether there has been a change in the proportions. Am I right?
Anto

You cannot use a standard χ2 test of independence here, because each person is represented by two values; the samples are therefore not independent random samples.
Michael M

Thanks @MichaelMayer. I used McNemar until I saw this. Where McNemar is explained, it also says what a chi-squared would answer in the same case. I am quite at a loss: by every test on that page I should choose chi-squared, but since these are measurements on the same subjects, I should choose McNemar!
Anto

Answers:



It is very unfortunate that McNemar's test is so hard for people to understand. I even notice that the top of its Wikipedia page states that the explanation on the page is hard for people to understand. The typical short explanation of McNemar's test is either that it is a "within-subjects chi-squared test", or that it is a "test of the marginal homogeneity of a contingency table". I don't find either very helpful. First, it is not clear what is meant by a "within-subjects chi-squared", since you always measure your subjects twice (once for each variable) and are trying to determine the relationship between those variables. Nor does "marginal homogeneity" clarify much. (Tragically, this answer may be confusing as well. If so, it may help to read my second attempt further below.)

Let's see if we can work through a line of reasoning for your top example to figure out whether (and if so, why) McNemar's test is appropriate. You have posted:

                 After   
               |no  |yes|
    Before|No  |1157|35 |
          |Yes |220 |13 |

This is a contingency table, so a chi-squared analysis suggests itself. Moreover, you want to understand the relationship between Before and After, and the chi-squared test checks for a relationship between the variables, so at first glance it seems that the chi-squared test must be the analysis that answers your question.

However, note that we can also display these data as follows:

    [long-format listing: one row per subject, with that subject's Before and After values]

Looking at the data this way, you might think you could run a regular old t-test. But a t-test is not quite right. There are two problems: First, since each row lists data measured on the same subject, we would not want a between-subjects t-test, but rather a within-subjects t-test. Second, since these data are distributed as a binomial, the variance is a function of the mean. This means that once the sample mean has been estimated there is no additional uncertainty to worry about (i.e., you do not have to estimate the variance separately), so you do not need to refer to the t-distribution but can use the z-distribution. (To learn more about this, it may help to read my answer here: The z-test vs. the χ2 test.) Thus, what we would need is a within-subjects z-test. That is, we need a within-subjects z-test of the equality of proportions.

However, although you want to know whether Before and After are independent, you almost certainly want to know if the treatment works (a question chi-squared does not answer). This is very similar to any number of treatment vs. control studies where you want to see if the means are equal, except that in this case your measurements are yes/no and they are within-subjects. Consider a more typical t-test situation with blood pressure measured before and after some treatment. Those whose bp was above your sample average beforehand will almost certainly tend to be among the higher bps afterwards, but you don't want to know about the consistency of the rankings, you want to know if the treatment led to a change in mean bp. Your situation here is directly analogous. Specifically, you want to run a within-subjects z-test of equality of proportions. That is what McNemar's test is.

So, having realized that we want to conduct McNemar's test, how does it work? Running a between-subjects z-test is easy, but how do we run a within-subjects version? The key to understanding how to do a within-subjects test of proportions is to examine the contingency table, which decomposes the proportions:

                 After
               |no  |yes |total
    Before|No  |1157|  35| 1192
          |Yes | 220|  13|  233
     total     |1377|  48| 1425
Obviously the Before proportions are the row totals divided by the overall total, and the After proportions are the column totals divided by overall total. When we look at the contingency table we can see that those are, for example:
Before proportion yes = (220 + 13)/1425,  After proportion yes = (35 + 13)/1425
What is interesting to note here is that 13 observations were yes both before and after. They end up as part of both proportions, but as a result of being in both calculations they add no distinct information about the change in the proportion of yeses. Moreover they are counted twice, which is invalid. Likewise, the overall total ends up in both calculations and adds no distinct information. By decomposing the proportions we are able to recognize that the only distinct information about the before and after proportions of yeses exists in the 220 and 35, so those are the numbers we need to analyze. This was McNemar's insight. In addition, he realized that under the null, this is a binomial test of 220/(220+35) against a null proportion of .5. (There is an equivalent formulation that is distributed as a chi-squared, which is what R outputs.)
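To make the "within-subjects z-test" idea concrete, here is a small sketch that computes it directly from the two discordant counts above; squaring z reproduces the McNemar chi-squared reported in the question:

```r
# Discordant counts from the contingency table above
b <- 35    # No before, Yes after
c <- 220   # Yes before, No after

# Within-subjects z-test of equal proportions
z <- (b - c) / sqrt(b + c)
z^2                    # 134.2157, the (uncorrected) McNemar chi-squared
2 * pnorm(-abs(z))     # two-sided p-value, about 4.9e-31
```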

There is another discussion of McNemar's test, with extensions to contingency tables larger than 2x2, here.


Here is an R demo with your data:

mat = as.table(rbind(c(1157, 35), 
                     c( 220, 13) ))
colnames(mat) <- rownames(mat) <- c("No", "Yes")
names(dimnames(mat)) = c("Before", "After")
mat
margin.table(mat, 1)
margin.table(mat, 2)
sum(mat)

mcnemar.test(mat, correct=FALSE)
#  McNemar's Chi-squared test
# 
# data:  mat
# McNemar's chi-squared = 134.2157, df = 1, p-value < 2.2e-16
binom.test(c(220, 35), p=0.5)
#  Exact binomial test
# 
# data:  c(220, 35)
# number of successes = 220, number of trials = 255, p-value < 2.2e-16
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
#  0.8143138 0.9024996
# sample estimates:
# probability of success 
#              0.8627451 

If we didn't take the within-subjects nature of your data into account, we would have a slightly less powerful test of the equality of proportions:

prop.test(rbind(margin.table(mat, 1), margin.table(mat, 2)), correct=FALSE)
#  2-sample test for equality of proportions without continuity
#  correction
# 
# data:  rbind(margin.table(mat, 1), margin.table(mat, 2))
# X-squared = 135.1195, df = 1, p-value < 2.2e-16
# alternative hypothesis: two.sided
# 95 percent confidence interval:
#  0.1084598 0.1511894
# sample estimates:
#    prop 1    prop 2 
# 0.9663158 0.8364912 

That is, X-squared = 135.1195 instead of chi-squared = 134.2157. In this case, these differ very little, because you have a lot of data and only 13 cases are overlapping as discussed above. (Another, and more important, problem here is that this counts your data twice, i.e., N = 2850 instead of N = 1425.)


Here are the answers to your concrete questions:

  1. The correct analysis is McNemar's test (as discussed extensively above).
  2. This version is trickier, and the phrasing "are higher proportions of one infection related to higher proportions of Y" is ambiguous. There are two possible questions:

    • It is perfectly reasonable to want to know if the patients who get one of the infections tend to get the other, in which case you would use the chi-squared test of independence. This question is asking whether susceptibility to the two different infections is independent (perhaps because they are contracted via different physiological pathways) or not (perhaps they are contracted due to a generally weakened immune system).
    • It is also perfectly reasonable to want to know if the same proportion of patients tend to get both infections, in which case you would use McNemar's test. The question here is about whether the infections are equally virulent.
  3. Since this is once again the same infection, of course they will be related. I gather that this version is not before and after a treatment, but just at some later point in time. Thus, you are asking if the background infection rates are changing organically, which is again a perfectly reasonable question. At any rate, the correct analysis is McNemar's test.
    Edit: It would seem I misinterpreted your third question, perhaps due to a typo. I now interpret it as two different infections at two separate timepoints. Under this interpretation, the chi-squared test would be appropriate.

@Alexis As far as I can make out, you and gung seem to be talking past each other. Even the so-called "unpaired" or "independent samples" t-test, or the "one-way" or "independent samples ANOVA", actually requires paired data in gung's sense: for each subject, you must record both a categorical group membership variable and a continuous outcome variable. (If the group membership variable has two levels, we usually use the unpaired t-test; for 3+ levels you need one-way ANOVA).
Silverfish

When explaining which test to use, I show both ways of looking at it - if you have observations of a continuous variable, one for each subject, and the subjects come from 2 (or 3+) groups and you're interested in differences between groups, then use the independent-samples t-test (or one-way ANOVA). Then confirm your choice by looking at your data table: do you have, for each subject, two pieces of information: category for group membership and the continuous variable. We can even turn things around and say the t-test is a kind of test of association between binary and continuous variable.
Silverfish

Paired t-test (or correlated samples ANOVA) is used if, for each subject, you have two (or 3+) continuous readings, taken under different conditions, and you want to test for differences between conditions. This is "paired" in a different sense. But in this question, we have two categorical variables recorded for each subject. Looking at the data table, the recorded values of those categorical variables must come in pairs. But this doesn't mean that the study design itself is paired. This is confusing (as gung notes). But if you know your study design, this can resolve it (as alexis notes)
Silverfish

@Silverfish If you have two observations (of the same nominal variable) made on each subject, in what sense is that not a paired design?
Alexis

@Alexis It's that "of the same variable" which is key - and potentially confusing. You might know it represents the same variable, albeit under different conditions or at different times, but depending on the way we lay the data table out, they may appear to be recorded as different variables (eg a separate "before" and "after" variable).
Silverfish


Well, it seems I've made a hash of this. Let me try to explain this again, in a different way and we'll see if it might help clear things up.

The traditional way to explain McNemar's test vs. the chi-squared test is to ask if the data are "paired" and to recommend McNemar's test if the data are paired and the chi-squared test if the data are "unpaired". I have found that this leads to a lot of confusion (this thread being an example!). In place of this, I have found that it is most helpful to focus on the question you are trying to ask, and to use the test that matches your question. To make this more concrete, let's look at a made-up scenario:

You walk around a statistics conference and for each statistician you meet, you record whether they are from the US or the UK. You also record whether they have high blood pressure or normal blood pressure.

Here are the data:

mat = as.table(rbind(c(195,   5),
                     c(  5, 195) ))
colnames(mat)        = c("US", "UK")
rownames(mat)        = c("Hi", "Normal")
names(dimnames(mat)) = c("BP", "Nationality")
mat
#         Nationality
# BP        US  UK
#   Hi     195   5
#   Normal   5 195

At this point, it is important to figure out what question we want to ask of our data. There are three different questions we could ask here:

  1. We might want to know if the categorical variables BP and Nationality are associated or independent;
  2. We might wonder if high blood pressure is more common amongst US statisticians than it is amongst UK statisticians;
  3. Finally, we might wonder if the proportion of statisticians with high blood pressure is equal to the proportion of US statisticians that we talked to. This refers to the marginal proportions of the table. These are not printed by default in R, but we can get them thusly (notice that, in this case, they are exactly the same):

    margin.table(mat, 1)/sum(mat)
    # BP
    #    Hi Normal 
    #   0.5    0.5 
    margin.table(mat, 2)/sum(mat)
    # Nationality
    #  US  UK 
    # 0.5 0.5 

As I said, the traditional approach, discussed in many textbooks, is to determine which test to use based on whether the data are "paired" or not. But this is very confusing: is this contingency table "paired"? If we compare the proportion with high blood pressure between US and UK statisticians, you are comparing two proportions (albeit of the same variable) measured on different sets of people. On the other hand, if you want to compare the proportion with high blood pressure to the proportion US, you are comparing two proportions (albeit of different variables) measured on the same set of people. These data are both "paired" and "unpaired" at the same time (albeit with respect to different aspects of the data). This leads to confusion. To try to avoid this confusion, I argue that you should think in terms of which question you are asking. Specifically, if you want to know:

  1. If the variables are independent: use the chi-squared test.
  2. If the proportion with high blood pressure differs by nationality: use the z-test for difference of proportions.
  3. If the marginal proportions are the same: use McNemar's test.
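Question 2 in this list is a between-groups comparison; a minimal sketch of that z-test (via prop.test(), which reports the equivalent chi-squared form) on these data might look like this:

```r
# Proportion with high BP among US statisticians (195 of 200)
# vs. among UK statisticians (5 of 200), read off the columns of mat
prop.test(x = c(195, 5), n = c(200, 200))
# sample estimates: 0.975 vs. 0.025; the difference is highly significant
```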

Someone might disagree with me here, arguing that because the contingency table is not "paired", McNemar's test cannot be used to test the equality of the marginal proportions and that the chi-squared test should be used instead. Since this is the point of contention, let's try both to see if the results make sense:

chisq.test(mat)
#  Pearson's Chi-squared test with Yates' continuity correction
# 
# data:  mat
# X-squared = 357.21, df = 1, p-value < 2.2e-16
mcnemar.test(mat)
#  McNemar's Chi-squared test
# 
# data:  mat
# McNemar's chi-squared = 0, df = 1, p-value = 1

The chi-squared test yields a p-value of approximately 0. That is, it says that the probability of getting data as far or further from equal marginal proportions, if the marginal proportions actually were equal is essentially 0. But the marginal proportions are exactly the same, 50%=50%, as we saw above! The results of the chi-squared test just don't make any sense in light of the data. On the other hand, McNemar's test yields a p-value of 1. That is, it says that you will have a 100% chance of finding marginal proportions this close to equality or further from equality, if the true marginal proportions are equal. Since the observed marginal proportions cannot be closer to equal than they are, this result makes sense.

Let's try another example:

mat2 = as.table(rbind(c(195, 195),
                      c(  5,   5) ))
colnames(mat2)        = c("US", "UK")
rownames(mat2)        = c("Hi", "Normal")
names(dimnames(mat2)) = c("BP", "Nationality")
mat2
#         Nationality
# BP        US  UK
#   Hi     195 195
#   Normal   5   5
margin.table(mat2, 1)/sum(mat2)
# BP
#     Hi Normal 
#  0.975  0.025 
margin.table(mat2, 2)/sum(mat2)
# Nationality
#  US  UK 
# 0.5 0.5 

In this case, the marginal proportions are very different: 97.5% vs. 50%. Let's try the two tests again to see how their results compare to the observed large difference in marginal proportions:

chisq.test(mat2)
#  Pearson's Chi-squared test
# 
# data:  mat2
# X-squared = 0, df = 1, p-value = 1
mcnemar.test(mat2)
#  McNemar's Chi-squared test with continuity correction
# 
# data:  mat2
# McNemar's chi-squared = 178.605, df = 1, p-value < 2.2e-16

This time, the chi-squared test gives a p-value of 1, meaning that the marginal proportions are as equal as they can be. But we saw that the marginal proportions are very obviously not equal, so this result doesn't make any sense in light of our data. On the other hand, McNemar's test yields a p-value of approximately 0. In other words, it is extremely unlikely to get data with marginal proportions as far from equality as these, if they truly are equal in the population. Since our observed marginal proportions are far from equal, this result makes sense.

The fact that the chi-squared test yields results that make no sense given our data suggests there is something wrong with using the chi-squared test here. Of course, the fact that McNemar's test provided sensible results doesn't prove that it is valid, it may just have been a coincidence, but the chi-squared test is clearly wrong.

Let's see if we can work through the argument for why McNemar's test might be the right one. I will use a third dataset:

mat3 = as.table(rbind(c(190,  15),
                      c( 60, 135) ))
colnames(mat3)        = c("US", "UK")
rownames(mat3)        = c("Hi", "Normal")
names(dimnames(mat3)) = c("BP", "Nationality")
mat3
#         Nationality
# BP        US  UK
#   Hi     190  15
#   Normal  60 135
margin.table(mat3, 1)/sum(mat3)
# BP
#     Hi Normal 
# 0.5125 0.4875 
margin.table(mat3, 2)/sum(mat3)
# Nationality
#    US    UK 
# 0.625 0.375 

This time we want to compare 51.25% to 62.5% and wonder if in the population the true marginal proportions might have been the same. Because we are comparing two proportions, the most intuitive option would be to use a z-test for the equality of two proportions. We can try that here:

prop.test(x=c(205, 250), n=c(400, 400))
#  2-sample test for equality of proportions with continuity correction
# 
# data:  c(205, 250) out of c(400, 400)
# X-squared = 9.8665, df = 1, p-value = 0.001683
# alternative hypothesis: two.sided
# 95 percent confidence interval:
#   -0.18319286 -0.04180714
# sample estimates:
# prop 1 prop 2 
# 0.5125 0.6250 

(To use prop.test() to test the marginal proportions, I had to enter the numbers of 'successes' and the total number of 'trials' manually, but you can see from the last line of the output that the proportions are correct.) This suggests that it is unlikely to get marginal proportions this far from equality if they were actually equal, given the amount of data we have.

Is this test valid? There are two problems here: the test believes we have 800 observations, when we actually have only 400; and it does not take into account that these two proportions are not independent, in the sense that they were measured on the same people.

Let's see if we can take this apart and find another way. From the contingency table, we can see that the marginal proportions are:

% high BP: (190 + 15)/400,  % US: (190 + 60)/400
What we see here is that the 190 US statisticians with high blood pressure show up in both marginal proportions. They are both being counted twice and contributing no information about the differences in the marginal proportions. Moreover, the 400 total shows up in both denominators as well. All of the unique and distinctive information is in the two off-diagonal cell counts (15 and 60). Whether the marginal proportions are the same or different is due only to them. Whether an observation is equally likely to fall into either of those two cells is distributed as a binomial with probability π=.5 under the null. That was McNemar's insight. In fact, McNemar's test is essentially just a binomial test of whether observations are equally likely to fall into those two cells:
binom.test(x=15, n=(15+60))
#  Exact binomial test
# 
# data:  15 and (15 + 60)
# number of successes = 15, number of trials = 75, p-value = 1.588e-07
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
#   0.1164821 0.3083261
# sample estimates:
# probability of success 
#                    0.2 

In this version, only the informative observations are used and they are not counted twice. The p-value here is much smaller, 0.0000001588, which is often the case when the dependency in the data is taken into account. That is, this test is more powerful than the z-test of difference of proportions. We can further see that the above version is essentially the same as McNemar's test:

mcnemar.test(mat3, correct=FALSE)
#  McNemar's Chi-squared test
# 
# data:  mat3
# McNemar's chi-squared = 27, df = 1, p-value = 2.035e-07

If the fact that these two results are not identical is confusing: McNemar's test, as typically presented and as implemented in R, squares the difference and compares it to the chi-squared distribution, which is not an exact test like the binomial above:

(15-60)^2/(15+60)
# [1] 27
1-pchisq(27, df=1)
# [1] 2.034555e-07

Thus, when you want to check whether the marginal proportions of a contingency table are equal, McNemar's test (or the exact binomial test computed manually) is correct. It uses only the relevant information, without illegitimately using any data twice. It does not just 'happen' to yield results that make sense in light of the data.

I continue to believe that trying to figure out whether a contingency table is "paired" is unhelpful. I suggest using the test that matches the question you are asking of the data.


You got my vote. :)
Alexis


The question of which test to use (the contingency-table χ2 test versus McNemar's χ2 test) for a null hypothesis of no association between two binary variables is simply a question of whether your data are paired/dependent or unpaired/independent:

Binary Data in Two Independent Samples
In this case, you would use a contingency table χ2 test.

For example, you might have a sample of 20 statisticians from the USA, and a separate independent sample of 37 statisticians from the UK, and have a measure of whether these statisticians are hypertensive or normotensive. Your null hypothesis is that both UK and US statisticians have the same underlying probability of being hypertensive (i.e. that knowing whether one is from the USA or from the UK tells one nothing about the probability of hypertension). Of course it is possible that you could have the same sample size in each group, but that does not change the fact of the samples being independent (i.e. unpaired).

Binary Data in Paired Samples
In this case you would use McNemar's χ2 test.

For example, you might have individually-matched case-control study data sampled from an international statistician conference, where 30 statisticians with hypertension (cases) and 30 statisticians without hypertension (controls; who are individually matched by age, sex, BMI & smoking status to particular cases), are retrospectively assessed for professional residency in the UK versus residency elsewhere. The null is that the probability of residing in the UK among cases is the same as the probability of residing in the UK as controls (i.e. that knowing about one's hypertensive status tells one nothing about one's UK residence history).

In fact, McNemar's test analyzes pairs of data. Specifically, it analyzes discordant pairs: the r and s in χ2 = (|r - s| - 1)^2 / (r + s) are the counts of discordant pairs.
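As a sketch, plugging the discordant-pair counts from gung's mat3 example above (r = 15, s = 60) into the continuity-corrected statistic reproduces R's default mcnemar.test result:

```r
# Discordant pair counts from gung's mat3 above
r <- 15; s <- 60

# Continuity-corrected McNemar statistic: (|r - s| - 1)^2 / (r + s)
chi2 <- (abs(r - s) - 1)^2 / (r + s)
chi2   # 25.81333

# Matches R's default (continuity-corrected) mcnemar.test on that table
mat3 <- as.table(rbind(c(190,  15),
                       c( 60, 135)))
mcnemar.test(mat3)$statistic   # 25.81333
```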

Anto, in your example, your data are paired (same variable measured twice in same subject) and therefore McNemar's test is the appropriate choice of test for association.

[gung and I disagreed for a time about an earlier answer.]

Quoted References
"Assuming that we are still interested in comparing proportions, what can we do if our data are paired, rather than independent?... In this situation, we use McNemar's test."–Pagano and Gauvreau, Principles of Biostatistics, 2nd edition, page 349. [Emphasis added]

"The expression is better known as the McNemar matched-pair test statistic (McNemar, 1949), and has been a mainstay of matched-pair analysis."—Rothman, Greenland, & Lash. Modern Epidemiology, page 286. [Emphasis added]

"The paired t test and repeated measures of analysis of variance can be used to analyze experiments in which the variable being studied can be measured on an interval scale (and satisfies other assumptions required of parametric methods). What about experiments, analogous to the ones in Chapter 5, where the outcome is measured on a nominal scale? This problem often arises when asking whether or not an individual responded to a treatment or when comparing the results of two different diagnostic tests that are classified positive or negative in the same individuals. We will develop a procedure to analyze such experiments, McNemar's test for changes, in the context of one such study."—Glantz, Primer of Biostatistics, 7th edition, page 200. [Emphasis added. Glantz works through an example of a misapplication of the contingency-table χ2 test to paired data on page 201.]

"For matched case-control data with one control per case, the resultant analysis is simple, and the appropriate statistical test is McNemar's chi-squared test... note that for the calculation of both the odds ratio and the statistic, the only contributors are the pairs which are disparate in exposure, that is the pairs where the case was exposed but the control was not, and those where the control was exposed but the case was not."—Elwood. Critical Appraisal of Epidemiological Studies and Clinical Trials, 1st edition, pages 189–190. [Emphasis added]



My understanding of McNemar's test is as follows: it is used to see whether an intervention has made a significant difference to a binary outcome. In your example, a group of subjects is checked for infection and the response is recorded as yes or no. All subjects are then given some intervention, say an antibiotic drug. They are then checked again for infection and the response is recorded as yes/no again. The (pairs of) responses can be put in the contingency table:

             After   
           |no  |yes|
Before|No  |1157|35 |
      |Yes |220 |13 |

And McNemar's test would be appropriate for this.

It is clear from the table that many more subjects converted from 'yes' to 'no' (220/(220+13), or 94.4%) than from 'no' to 'yes' (35/(1157+35), or 2.9%). Given these proportions, McNemar's p-value (4.901e-31) appears more correct than the chi-squared p-value (0.04082).

If the contingency table represents two different infections (question 2), then chi-squared would be more appropriate.

Your third question is ambiguous: you first state that you are relating Y at t2 to Y at t1, but in the table you write X at t1 vs. Y at t2. Y at t2 vs. Y at t1 is the same as your first question, and hence McNemar's test is needed, while X at t1 vs. Y at t2 means different events are being compared, and hence chi-squared would be more appropriate.

Edit: As mentioned by Alexis in the comments, matched case-control data are also analyzed with McNemar's test. For example, 1425 cancer patients are recruited for a study, and for each patient a matched control is also recruited. All of these (1425 * 2) subjects are checked for the infection. The results for each pair can be shown in a similar table:

             Normal   
           |no  |yes|
Cancer|No  |1157|35 |
      |Yes |220 |13 |

More clearly:

                                    Normal:
                                    No infection   Infection  
Cancer patient:     No infection    1157            35      
                    Infection       220             13      

It shows that it was much more often the case that the cancer patient had the infection and the matched control did not, rather than the reverse. The significance of this can be tested with McNemar's test.

If these patients and controls were not matched but independent, one could only make the following table and do a chi-squared test:

            Infection
            No    Yes
Cancer  No  1377   48
        Yes 1192  233

More clearly:

                No infection        Infection
No cancer       1377                48
Cancer          1192                233

Note that these numbers are the same as the margins of the first table:

> addmargins(mat)
      After
Before   No  Yes  Sum
   No  1157   35 1192
   Yes  220   13  233
   Sum 1377   48 1425

That must be the reason for the use of terms like 'marginal frequencies' and 'marginal homogeneity' in connection with McNemar's test.

Interestingly, the addmargins function can also help decide which test to use. If the grand total is half the number of subjects observed (indicating that pairing has been done), then McNemar's test is applicable; otherwise the chi-squared test is appropriate:

> addmargins(mat)
      Normal
Cancer   No  Yes  Sum
   No  1157   35 1192
   Yes  220   13  233
   Sum 1377   48 1425
> 
> addmargins(mat3)
      Infection
Cancer   No  Yes  Sum
   No  1377   48 1425
   Yes 1192  233 1425
   Sum 2569  281 2850

The R code for the above tables, following the answers above:

mat = as.table(rbind(c(1157, 35), 
                     c( 220, 13) ))
colnames(mat) <- rownames(mat) <- c("No", "Yes")
names(dimnames(mat)) = c("Cancer", "Normal")

mat3 = as.table(rbind(c(1377,  48), 
                      c(1192, 233) ))
colnames(mat3) <- rownames(mat3) <- c("No", "Yes")
names(dimnames(mat3)) = c("Cancer", "Infection")

The following pseudocode may also help clarify the difference:

subject_id      result_first_observation    result_second_observation   
1               no                          yes                     
2               yes                         no                      
...

mcnemar.test(table(result_first_observation, result_second_observation))



pair_id     result_case_subject     result_control_subject  
1           no                      yes                     
2           yes                     no                      
...

mcnemar.test(table(result_case_subject, result_control_subject))



subject_id      result_first_test       result_second_test
1               yes                     no
2               no                      yes
..

chisq.test(table(result_first_test, result_second_test))

Edit:

The mid-p variant of performing McNemar's test ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716987/ ) is interesting. It compares b and c of the contingency table, i.e., the number who changed from yes to no versus the number who changed from no to yes (ignoring those who remained yes or no throughout the study). It can be performed using a binomial test in Python, as shown at https://gist.github.com/kylebgorman/c8b3fb31c1552ecbaafb

It is closely related to binom.test(b, b+c, 0.5), since under purely random change one would expect b to equal c.
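A hedged R sketch of the mid-p idea, assuming the common definition (subtract the point probability of the observed discordant count from the exact two-sided binomial p-value), using b = 35 and c = 220 from the first table:

```r
b <- 35; c <- 220; n <- b + c

# Exact (conditional) McNemar p-value: two-sided binomial test at p = 0.5
p_exact <- binom.test(b, n, p = 0.5)$p.value

# Mid-p version: 2 * P(X <= min(b, c)) minus the point probability itself
k <- min(b, c)
p_mid <- 2 * pbinom(k, n, 0.5) - dbinom(k, n, 0.5)

p_mid < p_exact   # TRUE: the mid-p value is slightly smaller (less conservative)
```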


Not only for intervention analysis: it is used to analyze matched case-control data in an observational sense as well.
Alexis

Given the description / setup prior to the table for Q3, I suspect the "X" is a typo, but that was a good catch & this is a useful contribution to the thread +1.
gung - Reinstate Monica

@mso Edited Q3: it is X at t1! Otherwise, as you say, it is not different from Q1. This Q is over a year old, and I'm surprised to see someone coming back to it with the same thoughts that confused me. Following with much interest!
Anto

My apologies, the OP has clarified Q3, evidently it is 2 different diseases at 2 different times. Again, good catch.
gung - Reinstate Monica
Licensed under cc by-sa 3.0 with attribution required.