What is the Difference Between Cause and Correlation?
Cause and correlation are terms that are often confused or used incorrectly. A correlation means a relationship between two or more things: when one increases, the other increases, or when one increases, the other decreases. A cause is something that results in an effect; for example, heating water to a certain temperature will make it boil. The crucial point is that a correlation between two things does not necessarily mean that one causes the other. If there is a relationship between two phenomena, A and B, it could be that A causes B, or it could be that B is responsible for A; other possibilities are that some other factor is the reason for both A and B, or that they have independent causes that just happen to run in parallel.
Researchers trying to find reasons for various things will often use statistical methods to establish correlations: this may be the first step toward establishing the cause. Scientists and statisticians can use a formula to determine the strength of a relationship between two phenomena. This gives a figure, known as the square of the correlation coefficient, or R2, which always lies between 0 and 1, with a value closer to 1 indicating a stronger correlation.
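As a rough illustration of the calculation described above, the following Python sketch computes a correlation coefficient and its square. The article does not name a specific formula, so the commonly used Pearson coefficient is assumed here; the function names and the sample figures are invented purely for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Square of the correlation coefficient; lies between 0 and 1."""
    return pearson_r(xs, ys) ** 2


# Hypothetical data, invented for illustration only
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]

print("r   =", pearson_r(hours_studied, exam_score))
print("R2  =", r_squared(hours_studied, exam_score))
```

Note that squaring discards the sign: a relationship in which one quantity falls as the other rises gives a negative coefficient but the same R2 as an equally strong rising relationship.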
When the R2 value is high, this relationship may merit further investigation; however, researchers should beware of jumping to conclusions. It is possible to identify all sorts of strong, but meaningless, correlations. In one well-known example, the R2 for the number of highway fatalities in the US between 1996 and 2000 and the quantity of lemons imported from Mexico during the same period is 0.97, a very strong correlation, but it is extremely unlikely that one causes the other.
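One way to see how such meaningless correlations can arise is to note that any two quantities that each change steadily over the same years will correlate with each other. The sketch below uses made-up figures (not the real highway or lemon data) for two unrelated series, one rising and one falling, and shows that both track the calendar year as closely as they track each other. The helper function and all numbers are assumptions for illustration.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient (assumed formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Invented figures for five consecutive years; neither series affects the other
years = [1, 2, 3, 4, 5]
series_a = [120, 131, 139, 152, 160]  # happens to rise over time
series_b = [48, 45, 43, 40, 37]       # happens to fall over time

print("R2 between the two series:", r_squared(series_a, series_b))
print("R2 of series A with time: ", r_squared(years, series_a))
print("R2 of series B with time: ", r_squared(years, series_b))
```

In this toy case all three values are high, which suggests that the passage of time, rather than any link between the two series, is doing the work.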
A correlation, particularly when reported in the media, is often described as a “link,” which can be misleading, as it can be taken to mean that one of the factors causes the other. For example, a study that found that men who drank four cups of green tea a day had a lower risk of stroke than those who did not drink it might generate the headline "Green Tea Cuts Stroke Risk." This implies that drinking green tea will directly lower the risk of stroke, but that isn't proven by the study. Other factors could have influenced the results: for instance, the study was conducted on men in Japan, who have different diets and exercise habits than men in Western countries. While there could be a more direct causal relationship here, a broader study would be needed and more variables would need to be considered.
If factor A is responsible for factor B, there will be a strong correlation between the two, but the reverse is not necessarily the case. Proving beyond reasonable doubt that A is responsible for B requires much more than a high R2 value. Having established a strong relationship, researchers will then need to come up with ideas as to how A might affect B, and then test these ideas by experiment. It is often the case that more than one possible cause can be identified. In these instances, a good method is to conduct experiments in which all but one of the factors remain constant, and then determine from this the factor that is responsible for the effect.
For example, a plant that grows in a temperate climate may be dormant during the winter, and start growing in spring. One theory would be that increased average temperatures trigger growth, while another might be that longer periods of daylight are responsible. To determine which is the case, one sample of plants might be subjected to increasing temperatures and constant hours of daylight, while another might experience constant temperature and increasing daylight. The cause could then be determined from which set of plants starts growing. If neither set begins to grow, a third experiment might be performed, in which both temperature and daylight are increased; if this results in growth, then the researchers might conclude that a combination of both factors is required.
In some cases, a given cause will always result in a particular effect; for example, the Earth’s gravity will always make an object fall if no other force is acting on it. In other cases, however, the effect is not guaranteed. It is known that ionizing radiation and certain chemicals are causes of cancer, but not everyone exposed to these factors will develop the disease, as there is an element of chance involved. Both factors can alter DNA, and sometimes this will result in a cell becoming cancerous, but this will not happen every time. If, however, one were to plot levels of exposure to these factors against the incidence of cancer in a large sample of otherwise similar people, a strong correlation would be expected.
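The idea that a cause can act only by chance and still produce a strong correlation can be sketched with a small simulation. In the Python example below, the exposure levels, the risk figures, and the group size are all invented assumptions, not real epidemiological values: each simulated person develops the disease with a probability that rises with exposure, yet most exposed people remain healthy.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed measure of correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def risk(level):
    """Hypothetical chance of disease at a given exposure level."""
    return 0.02 + 0.03 * level


rng = random.Random(0)          # fixed seed so the run is repeatable
people_per_level = 5000         # invented group size
levels = [0, 1, 2, 3, 4]        # invented exposure levels

incidence = []
for level in levels:
    # Each person independently develops the disease with probability risk(level)
    cases = sum(1 for _ in range(people_per_level) if rng.random() < risk(level))
    incidence.append(cases / people_per_level)

print("incidence by exposure level:", incidence)
print("correlation of exposure with incidence:", pearson_r(levels, incidence))
```

Even at the highest exposure level in this sketch, only a minority of simulated people become ill, but the incidence rises so consistently with exposure that the correlation across groups is very strong.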
Although researchers have criteria for pursuing possible causes of a phenomenon based on the strength of the correlations, the factor with the highest R2 value is not necessarily the one responsible. Scientists and researchers will reject factors that show a weak correlation, but, as noted, completely irrelevant factors can produce a very high R2, as can factors that appear for the same reason as the thing being investigated. The likelihood of A causing B is therefore not necessarily proportional to the strength of the correlation.
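The case of two factors that "appear for the same reason" can also be simulated. In the following sketch, which is only a toy model with invented parameters, a hidden common factor drives both A and B. Neither A nor B influences the other, yet they correlate strongly; once the common factor's contribution is subtracted from each, the correlation all but disappears.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed measure of correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000

# Hidden common factor (hypothetical), plus independent noise for A and for B
common = [rng.gauss(0, 1) for _ in range(n)]
a = [z + rng.gauss(0, 0.5) for z in common]
b = [z + rng.gauss(0, 0.5) for z in common]

r_ab = pearson_r(a, b)

# Remove the common factor's contribution from each variable
a_resid = [x - z for x, z in zip(a, common)]
b_resid = [y - z for y, z in zip(b, common)]
r_resid = pearson_r(a_resid, b_resid)

print("correlation of A with B:                 ", r_ab)
print("correlation after removing common factor:", r_resid)
```

In real research the common factor is rarely known this precisely, so the subtraction step here stands in for the much harder work of identifying and controlling for confounding variables.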
Confusion of Cause and Correlation
A lot of confusion between cause and correlation results from the way findings are reported in the media. A relationship might be described as a “cause”: for example, it might be reported that violent video games cause violent behavior when all that has been found is a correlation. It may be that aggressive people are more likely to play violent games, in which case such people would behave more aggressively with or without the influence of the games.
Research has shown that violent games may influence aggression. It also shows that a number of other factors may be responsible for violent behavior, among them poorer socioeconomic status, mental illness, abusive childhoods, and bad parenting. Such games may increase the likelihood of violent behavior in an individual with a predisposition toward aggression resulting from other factors, but stating that violent video games cause violent behavior is not justified by the known facts.
Health is another area where confusion can arise. Those who read or hear of the many things that have been reported as causing, or being linked to, cancer might never eat, drink, or leave their homes again. A “cause” may only be a correlation, and a “link” is just that: it does not identify a definite cause of cancer. A great deal of research is under way into the reasons why cancer develops, and scientists frequently find links, but when these are reported in the media, people should look or listen carefully for qualifying words like “may,” “might increase,” or “could have an effect,” before drawing any conclusions.
I always remember a great example of this, which was that there seemed to be a higher incidence of a particular cancer in the UK around nuclear power stations. This led to concerns that these stations caused the cancer.
I think after some research, what was actually discovered was that power stations tend to be built on the coast (perhaps needing the water for cooling) and that old people tend to retire to the coast. So what was actually happening was that a greater density of old people led to a greater density of illnesses which old people get, including cancer.
Even with strong correlations we must be very cautious; there must be more supporting evidence before we even suggest that one factor increases a risk.
For instance: there is a correlation between obesity and breast cancer, but there is no observational clinical evidence to support the idea that obesity causes breast cancer or has any significant effect on any cancer. Yet it is accepted by many clinicians to be a "cause".
You might as well say that there is a very high correlation between wearing a bra and getting breast cancer. Both are equally valid.
@DentalFloss, I can definitely see why causality and correlation would be confusing to people in psychology. The difference between cause and correlation is also important in biology. People often like to talk about things involving evolution as though they were caused by something specific, like a natural disaster or an act of humans or another species. While sometimes things do cause other things in the natural world, it is often difficult for biologists to do more than speculate about correlation, rather than identify things as definite causes.
When I studied psychology, my professor said one of the things she expected every single one of us to get out of that class was the difference between cause and correlation. Specifically, that data correlation does not equal cause.