Simpson's paradox

Wikipedia definition:
In probability and statistics, Simpson's paradox (or the Yule-Simpson effect) is an apparent paradox in which a correlation (trend) present in different groups is reversed when the groups are combined. This result is often encountered in social-science and medical-science statistics, and it occurs when frequency data are hastily given causal interpretations. Simpson's Paradox disappears when any causal relations are derived systematically – i.e. through formal analysis.



It is easiest to understand through examples.

1) David Justice had a higher batting average than Derek Jeter in both 1995 and 1996, yet his average over the two years combined is lower.

               1995            1996            Combined
Derek Jeter    12/48   .250    183/582 .314    195/630 .310
David Justice  104/411 .253    45/140  .321    149/551 .270
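
The reversal can be checked directly from the hits and at-bats in the table. The following is only a minimal sketch: the names `records`, `average`, and `combined` are made up for illustration and do not come from the original post.

```python
from fractions import Fraction

# (hits, at-bats) per season, copied from the table above.
# The dictionary layout and all names here are illustrative assumptions.
records = {
    "Derek Jeter":   {"1995": (12, 48),   "1996": (183, 582)},
    "David Justice": {"1995": (104, 411), "1996": (45, 140)},
}


def average(hits, at_bats):
    """Batting average for a single season, kept exact as a fraction."""
    return Fraction(hits, at_bats)


def combined(seasons):
    """Batting average over all seasons: total hits / total at-bats."""
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return Fraction(total_hits, total_at_bats)


jeter = records["Derek Jeter"]
justice = records["David Justice"]

# Justice is ahead in each individual season ...
justice_wins_each_year = all(
    average(*justice[year]) > average(*jeter[year]) for year in ("1995", "1996")
)
# ... but not once the two seasons are pooled.
justice_wins_combined = combined(justice) > combined(jeter)

for name, seasons in records.items():
    per_year = ", ".join(
        f"{year}: {float(average(*seasons[year])):.3f}" for year in sorted(seasons)
    )
    print(f"{name}: {per_year}, combined: {float(combined(seasons)):.3f}")

print("Justice ahead in every season:", justice_wins_each_year)
print("Justice ahead combined:", justice_wins_combined)
```

The combined average is a weighted average of the yearly ones, weighted by at-bats. Most of Jeter's at-bats fall in 1996, the year both players hit better, while most of Justice's fall in 1995, so the pooled numbers favor Jeter.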


2) A comparison of two treatments for kidney stones.

              Treatment A      Treatment B
Success rate  78% (273/350)    83% (289/350)

Looking only at the overall success rate, Treatment B appears to be better.

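But once the cases are split by stone size, the picture changes completely: Treatment A has the higher success rate for both small and large stones.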
              Treatment A      Treatment B
Small stones  93% (81/87)      87% (234/270)
Large stones  73% (192/263)    69% (55/80)
Both          78% (273/350)    83% (289/350)
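
The same check works for the kidney stone data, and it also shows where the reversal comes from: the two treatments were applied to very different mixes of cases. This is a rough sketch using only the counts in the table above; `outcomes`, `rate`, `overall`, and `large_share` are hypothetical names chosen for illustration.

```python
# (successes, cases) per treatment and stone size, copied from the table above.
# The layout and all names here are illustrative assumptions.
outcomes = {
    "A": {"small": (81, 87),   "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, cases):
    """Success rate within one group."""
    return successes / cases


def overall(groups):
    """Success rate after pooling the small-stone and large-stone groups."""
    total_successes = sum(successes for successes, _ in groups.values())
    total_cases = sum(cases for _, cases in groups.values())
    return total_successes / total_cases


def large_share(groups):
    """Fraction of a treatment's cases that were large stones."""
    total_cases = sum(cases for _, cases in groups.values())
    return groups["large"][1] / total_cases


for treatment, groups in outcomes.items():
    print(
        f"Treatment {treatment}: "
        f"small {rate(*groups['small']):.0%}, "
        f"large {rate(*groups['large']):.0%}, "
        f"both {overall(groups):.0%}, "
        f"share of large stones {large_share(groups):.0%}"
    )
```

About three quarters of Treatment A's cases were large stones, the harder group for both treatments, against roughly a quarter for Treatment B. That difference in case mix drags A's pooled rate below B's even though A does better within each group.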





At best, Simpson's Paradox is used to argue that association is not causation.
At worst, Simpson's Paradox is used to argue that induction is impossible in observational studies.







References.
http://en.wikipedia.org/wiki/Simpson's_paradox
web.augsburg.edu/~schield/MiloPapers/99ASA.pdf



It has been a while since I last used Tistory..
Writing posts on it is really inconvenient.. ;;
