Twyman's law
Twyman's law is the principle that "the more unusual or interesting the data, the more likely they are to have been the result of an error of one kind or another". It is named after William Anthony Twyman and has been described as one of the most important laws of data analysis.[1][2][3]
The law is based on the fact that errors in data measurement and analysis can lead to observed quantities that are wildly different from typical values. These errors are usually more common than real changes of similar magnitude in the underlying process being measured. For example, if an analyst at a software company notices that the number of users has doubled overnight, the most likely explanation is a bug in logging, rather than a true increase in users.[2]
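In practice, the law is often applied as a data-quality check: before a surprising metric movement is accepted as real, it is screened against historical variability and, if implausibly large, investigated as a likely logging or processing error. The sketch below illustrates this reasoning in Python under illustrative assumptions; the function flag_suspicious_change, the daily_active_users series, and the z-score threshold are hypothetical and are not taken from the cited sources.

```python
from statistics import mean, stdev

def flag_suspicious_change(history, latest, z_threshold=4.0):
    """Flag a new metric value whose day-over-day change lies far outside
    historical variability. Per Twyman's law, such a value is treated as a
    likely measurement or logging error until verified.

    `history` is a list of past daily values (hypothetical example data);
    the z-score threshold is an illustrative choice, not a standard.
    """
    changes = [b - a for a, b in zip(history, history[1:])]
    latest_change = latest - history[-1]
    mu, sigma = mean(changes), stdev(changes)
    # Zero historical variability makes any change look infinitely unusual.
    z = (latest_change - mu) / sigma if sigma > 0 else float("inf")
    return z > z_threshold

# Hypothetical daily active-user counts, followed by a sudden overnight doubling.
daily_active_users = [10_120, 10_340, 10_180, 10_450, 10_390, 10_510]
todays_count = 21_000

if flag_suspicious_change(daily_active_users, todays_count):
    print("Unusually large jump: check logging/ETL before reporting a real increase.")
```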
The law also extends to cases in which the data reflect a real effect, but one driven by factors other than what was intended to be measured. For example, when schools show unusually large improvements in test scores, subsequent investigation often reveals that the gains were driven by cheating or other fraud rather than by improved learning.[4]
References
- Marsh, Catherine; Elliott, Jane. Exploring Data. Polity. p. 46. ISBN 978-0-7456-2283-5.
- Kohavi, Ron; Tang, Diane; Xu, Ya (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press. p. 39. ISBN 978-1-108-72426-5.
- Ehrenberg, A. S. C.; Twyman, W. A. (1967). "On Measuring Television Audiences". Journal of the Royal Statistical Society. Series A (General). 130 (1): 1–60. doi:10.2307/2344037. ISSN 0035-9238. JSTOR 2344037.
- "When test scores are too good to be true", The Hechinger Report, 2011-03-07