
The chi-square test of homogeneity tests whether different columns (or rows) of data in a table come from the same population (i.e., whether the differences between them are consistent with being explained by sampling error alone). For example, in a table showing political party preference in the rows and states in the columns, the test has the null hypothesis that each state has the same party preferences.

Example of a chi-square test of homogeneity

Consider the table below. It shows that people who live with others are marginally more likely to be on a diet, but are much less likely to watch what they eat and drink and much more likely to eat and drink whatever they feel like. However, only 32 people in the table are classified as living alone, so it is likely that these results reflect a relatively high degree of sampling error. A chi-square test of homogeneity tests whether differences in a table like this are consistent with sampling error. For this table the test gives a p-value of 0.052. As this is greater than 0.05, by convention the conclusion is that the difference is due to sampling error, although the closeness of 0.05 to 0.052 makes this very much a “line ball” conclusion.

The table below shows the counts, which is the number of people in each combination of living status and diet practice, along with totals.

The table above shows that across both groups of people (i.e., those living alone and those living with others), we have observed that 27/327 = 8.3% of people are on a diet. If we assume that these two groups of people are the same in terms of their diet practices, we would then expect that in each group 8.3% of people will be on a diet. That is, among the 32 people that live alone, we would expect 8.3% of them, 2.6 people on average, to be on a diet, and 24.4 of the people living with others to be on a diet. Performing the same calculations for the remaining rows of the table, we get the table of expected counts below.
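
As a check on that arithmetic, below is a minimal sketch of the expected-count calculation for the "on a diet" row, using only the marginal totals quoted above (327 people in total, 32 living alone, 27 on a diet); the variable names are mine, not the article's.

```python
# Expected counts under homogeneity for the "on a diet" row, using only
# the marginal totals quoted in the text. Variable names are illustrative.
total_people = 327
living_alone = 32
living_with_others = total_people - living_alone        # 295
on_a_diet = 27                                           # dieters across both groups combined

diet_rate = on_a_diet / total_people                     # roughly 0.083, i.e. 8.3%
expected_alone = diet_rate * living_alone                # roughly 2.6 people
expected_with_others = diet_rate * living_with_others    # roughly 24.4 people

print(round(expected_alone, 1), round(expected_with_others, 1))  # 2.6 24.4
```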

The extent to which the observed counts differ from the expected counts reflects sampling error. A chi-square statistic can be calculated which summarizes the overall extent of this sampling error. We start by calculating the cell chi-square value: where O is the observed value in a cell and E is the expected value, the formula is (O – E)²/E. If we repeat the same calculation for all of the cells, excluding the totals, and add them up, this gives a value of 5.8.

To compute a p-value, we need to know the degrees of freedom. This is given by (R – 1) * (C – 1), where R is the number of rows, excluding totals, and C is the number of columns, excluding totals. In this example the degrees of freedom are then (3 – 1) * (2 – 1) = 2. As this is a chi-square test, we can look up the test statistic and the degrees of freedom in the chi-square distribution and get a p-value of 0.055. Earlier in the article it was stated that the p-value was 0.052 rather than 0.055; the difference is due to rounding errors in the calculation. In this example the table evaluated had two columns and three rows, excluding totals. The same calculation can be applied to larger tables.
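
To make those two steps concrete, here is a short sketch: a generic helper that sums the cell chi-square values (chi_square_statistic is my own name, not from the article), followed by the lookup of the p-value for the statistic of 5.8 with 2 degrees of freedom using SciPy's chi-square distribution.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_statistic(observed, expected):
    """Sum the cell chi-square values (O - E)**2 / E over the body of the
    table (totals excluded)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return ((observed - expected) ** 2 / expected).sum()

# Converting the statistic and degrees of freedom quoted above into a p-value:
statistic = 5.8                    # sum of the cell chi-square values
df = (3 - 1) * (2 - 1)             # (R - 1) * (C - 1) = 2
p_value = chi2.sf(statistic, df)   # upper tail of the chi-square distribution
print(round(p_value, 3))           # 0.055
```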


The difference between the chi-square tests of independence and of homogeneity

The actual calculation for the chi-square test of homogeneity is identical to that of the chi-square test of independence; the data input, a contingency table, is also the same. Why then are they different tests? The chi-square test of independence assumes that sampling error plays a role in both which column categories were selected in the data and which row categories were selected. The test of homogeneity, by contrast, is derived from the assumption that the sample sizes for the columns (or, equivalently, only the rows) have been pre-specified. To put it a different way, if the sample sizes of either the rows or the columns of the table are fixed, the theoretical assumptions of the chi-square test of independence are violated and the test of homogeneity should be applied instead – but you get the same conclusion regardless.

The chi-square statistic is a non-parametric (distribution-free) tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the chi-square is robust with respect to the distribution of the data.
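
As an illustration of this point, the sketch below calls the single routine that SciPy provides for both tests, scipy.stats.chi2_contingency, on a contingency table. The cell counts are hypothetical placeholders, not the article's data; only the marginal totals quoted earlier are preserved.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3-by-2 table of counts, laid out like the example: diet
# practice in rows, living status in columns. Only the marginal totals
# quoted in the article (32 living alone, 295 living with others,
# 27 on a diet, 327 overall) are preserved; the individual cell counts
# are invented for illustration.
table = np.array([
    [ 4,  23],   # on a diet
    [10, 130],   # watch what they eat and drink
    [18, 142],   # eat and drink whatever they feel like
])

# The same routine serves as both the test of independence and the test of
# homogeneity; only the sampling assumption behind it differs.
statistic, p_value, df, expected = chi2_contingency(table, correction=False)
print(round(statistic, 2), round(p_value, 3), df)
print(np.round(expected, 1))   # expected counts under either null hypothesis
```

Because the expected counts depend only on the marginal totals, the first row of the expected table printed here reproduces the 2.6 and 24.4 computed earlier, even though the cell counts themselves are invented; whether the column totals are regarded as fixed by design (homogeneity) or as random (independence) changes the interpretation of the result, not the arithmetic.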
