Intraclass correlation and aggregation

Imagine that:

  • You have a sample of 1000 teams each with 10 members.
  • You measured team functioning by asking each team member how well they think their team is functioning using a reliable multi-item numeric scale.
  • You want to describe the extent to which the measure of team effectiveness is a property of the team member's idiosyncratic belief or a property of a shared belief about the team.

In this and related situations (e.g., aggregating to organisations), many researchers report the intraclass correlation (e.g., Table 1 in Campion & Medsker, 1993). Thus, my questions are:

  1. What descriptive labels would you attach to different values of the intra-class correlation? I.e., the aim is to actually relate the values of the intra-class correlation to qualitative language such as: "When the intraclass correlation is greater than x, it suggests that the attitudes are modestly/moderately/strongly shared across team members."
  2. Do you think the intraclass correlation is the appropriate statistic or would you use a different strategy?

I think (1) is not a statistical question but a subject-area one. E.g., in the described example it would be up to those who study group psychology to determine appropriate language for the strength of ICCs. This is analogous to a Pearson correlation -- what constitutes 'strong' differs depending on whether one is working in, for example, sociology or physics.

(2) is to an extent also subject-area specific -- it depends on what researchers are aiming to measure and describe. But from a statistical point of view the ICC is a reasonable metric for within-team relatedness. However, I agree with Mike that when you say you'd like to

"describe the extent to which the measure of team effectiveness is a property of the team member's idiosyncratic belief or a property of a shared belief about the team"

then it is probably more appropriate to use variance components in their raw form than to convert them into an ICC.

To clarify, think of the ICC as calculated within a mixed model. For a single-level mixed model with random group-level intercepts $b_i \sim N(0, \sigma^2_b)$ and within-group errors $\epsilon_{ij} \stackrel{\mathrm{iid}}{\sim} N(0, \sigma^2)$, $\sigma^2_b$ describes the amount of variation between teams and $\sigma^2$ describes variation within teams. Then, for a single team, we get a response covariance matrix of $\sigma^2 \mathbf{I} + \sigma^2_b \mathbf{1}\mathbf{1}'$ which when converted to a correlation matrix is $\frac{\sigma^2}{\sigma^2 + \sigma^2_b} \mathbf{I} + \frac{\sigma^2_b}{\sigma^2 + \sigma^2_b} \mathbf{1}\mathbf{1}'$. So, $\frac{\sigma^2_b}{\sigma^2 + \sigma^2_b} = \mathrm{ICC}$ describes the level of correlation between effectiveness responses within a team, but it sounds as though you may be more interested in $\sigma^2$ and $\sigma^2_b$, or perhaps $\frac{\sigma^2}{\sigma^2_b}$.
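As a rough illustration of the quantities above, here is a minimal sketch that simulates balanced team data from the random-intercept model $y_{ij} = \mu + b_i + \epsilon_{ij}$ and recovers $\sigma^2_b$, $\sigma^2$ and the ICC with the one-way ANOVA (method-of-moments) estimator. The function name `variance_components`, the grand mean of 3.0 and the "true" variances 0.25 and 1.0 are assumptions made up for the example; a fitted mixed model (e.g. REML) would be the more usual route in practice.

```python
import numpy as np

def variance_components(scores):
    """Estimate (sigma2_between, sigma2_within, icc) from a balanced
    2-D array: rows = teams, columns = team members."""
    scores = np.asarray(scores, dtype=float)
    n_teams, k = scores.shape
    team_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    # Between-team and within-team mean squares from a one-way ANOVA.
    msb = k * np.sum((team_means - grand_mean) ** 2) / (n_teams - 1)
    msw = np.sum((scores - team_means[:, None]) ** 2) / (n_teams * (k - 1))
    sigma2_within = msw
    # E[MSB] = sigma2 + k * sigma2_b, so solve for sigma2_b (floored at 0).
    sigma2_between = max((msb - msw) / k, 0.0)
    icc = sigma2_between / (sigma2_between + sigma2_within)
    return sigma2_between, sigma2_within, icc

# Simulated data: 1000 teams of 10, hypothetical true values (ICC = 0.2).
rng = np.random.default_rng(0)
n_teams, k = 1000, 10
true_sigma2_b, true_sigma2 = 0.25, 1.0
b = rng.normal(0.0, np.sqrt(true_sigma2_b), size=(n_teams, 1))  # team effects
eps = rng.normal(0.0, np.sqrt(true_sigma2), size=(n_teams, k))  # member noise
scores = 3.0 + b + eps

s2b, s2w, icc = variance_components(scores)
print(f"sigma2_b = {s2b:.3f}, sigma2 = {s2w:.3f}, ICC = {icc:.3f}")
```

Reporting `s2b` and `s2w` side by side keeps the scale of the original measure, which the ICC alone discards.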

1) With correlations, you can never really give sensible cut-offs, but I'd say the usual rules of thumb for an ordinary correlation apply.

2) Regarding the appropriateness of the ICC: depending on the data, the ICC is equivalent to an F-test (see, e.g., Commenges & Jacqmin, 1994 and Kistner & Muller, 2004). So in essence, the mixed model framework can tell you at least as much about your hypothesis, and allows for simultaneously testing more hypotheses than the ICC.
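One concrete way to see the link to the F-test, assuming a balanced one-way design with $k$ members per team: the ANOVA estimator of ICC(1) is a monotone function of the F statistic, $\mathrm{ICC}(1) = (F - 1)/(F + k - 1)$. The sketch below checks that identity numerically; the helper names and the simulated data are assumptions for illustration only, not something taken from the cited papers.

```python
import numpy as np

def one_way_f(scores):
    """Return (F, MSB, MSW) for a balanced array: rows = groups."""
    scores = np.asarray(scores, dtype=float)
    n_groups, k = scores.shape
    group_means = scores.mean(axis=1)
    msb = k * np.sum((group_means - scores.mean()) ** 2) / (n_groups - 1)
    msw = np.sum((scores - group_means[:, None]) ** 2) / (n_groups * (k - 1))
    return msb / msw, msb, msw

def icc1_from_f(f_stat, k):
    """ANOVA-based ICC(1) written in terms of the F statistic."""
    return (f_stat - 1.0) / (f_stat + k - 1.0)

# Hypothetical data: 200 groups of 10 with a modest group effect.
rng = np.random.default_rng(1)
k = 10
scores = 0.5 * rng.normal(size=(200, 1)) + rng.normal(size=(200, k))

f_stat, msb, msw = one_way_f(scores)
icc_direct = (msb - msw) / (msb + (k - 1) * msw)  # usual ICC(1) formula
icc_via_f = icc1_from_f(f_stat, k)                # same value, via F
print(f"F = {f_stat:.3f}, ICC(1) = {icc_direct:.3f}")
```

Because the mapping is monotone, F = 1 corresponds to ICC(1) = 0, and testing F against 1 is a test of ICC(1) against 0 in this setting.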

Cronbach's $\alpha$ is also directly related to the ICC, and is another measure that is (was?) often reported, albeit in the context of agreement between items within a group. This approach comes from psychological questionnaires, where a cut-off of 0.7 is often used to decide whether the questions really group into the studied factors.
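To make the relation concrete: for a respondents-by-items matrix, Cronbach's $\alpha$ coincides with the average-measures consistency ICC from a two-way ANOVA without replication, $(MS_{rows} - MS_{error})/MS_{rows}$, often labelled ICC(3,k). The following is a small sketch of that equivalence on made-up data; the function names and the simulated questionnaire are assumptions for illustration.

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha for rows = respondents, columns = items."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the sum score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def icc_consistency_average(x):
    """Average-measures consistency ICC: (MS_rows - MS_error) / MS_rows."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_error = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows

# Hypothetical questionnaire: 300 respondents, 6 items sharing one factor.
rng = np.random.default_rng(2)
factor = rng.normal(size=(300, 1))
items = factor + rng.normal(size=(300, 6))

alpha = cronbach_alpha(items)
icc_avg = icc_consistency_average(items)
print(f"alpha = {alpha:.4f}, consistency ICC = {icc_avg:.4f}")
```

The two numbers agree up to floating-point error, so the familiar 0.7 convention for $\alpha$ is a statement about this particular form of the ICC, not about ICC(1).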

Paul Bliese has an article discussing the intraclass correlation in teams research. He writes that

In [his extensive] experience with U.S. Army [teams] data ...he never encountered ICC(1) values greater than .30 [, and that he] typically [sees] values between .05 and .20.

He goes on to suggest that he would be

surprised to find ICC(1) values greater than .30 in most applied field research.

I have read articles that cite this article, arguably inappropriately, suggesting an ICC(1) value of greater than .05 is needed to justify aggregation.


  • Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis.