From “Observations” to “Data”: The Changing Language of Science

By Neuroskeptic | March 19, 2016 10:32 am

Today we hear a lot about scientific data – data sharing, data integrity, and Big Data are all hot topics in science. Yet is science really about “data”? Did scientists in the past talk about it as much as we do?

To find out, I ran some PubMed searches covering papers published over the last century, from 1915 to 2015. I searched for “data” and for other terms that can be used to refer to scientific findings. Here’s a graph of the percentage of biomedical journal articles published each year that have each of these words in the title.
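The post doesn't say exactly how the searches were run. Here is a rough sketch of how such yearly percentages could be gathered. It assumes NCBI's E-utilities `esearch` endpoint and PubMed's `[ti]` (title) and `[dp]` (publication date) field tags. The helper names `title_query_url`, `year_query_url` and `title_share` are hypothetical. The sketch only builds the query URLs and does the percentage arithmetic. It doesn't fetch anything.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities esearch endpoint; the post does not name its search interface.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def _esearch_url(term: str) -> str:
    # rettype=count asks esearch for the hit count only; retmode=json returns it as JSON.
    params = {"db": "pubmed", "term": term, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def title_query_url(word: str, year: int) -> str:
    """Hypothetical helper: records published in `year` with `word` in the title."""
    return _esearch_url(f"{word}[ti] AND {year}[dp]")


def year_query_url(year: int) -> str:
    """Hypothetical helper: all records published in `year` (the denominator)."""
    return _esearch_url(f"{year}[dp]")


def title_share(word_counts: dict[int, int], total_counts: dict[int, int]) -> dict[int, float]:
    """Percentage of each year's articles whose title contains the word.

    Years with no articles at all are skipped to avoid dividing by zero;
    years missing from word_counts are treated as having zero matches.
    """
    return {
        year: 100.0 * word_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0
    }


if __name__ == "__main__":
    for year in (1915, 2015):
        print(title_query_url("data", year))
        print(year_query_url(year))
```

Each URL would return a single count for one word and one year. Dividing the word count by the all-records count for the same year, as `title_share` does, normalises for the large growth in the number of papers published. That normalisation is why the graph shows percentages rather than raw counts.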


A hundred years ago, the term “data” was almost never seen in the titles of scientific articles. Instead, papers were commonly described as “notes” or “observations”. The term “results” was also used, but less often than it is today.

This first era lasted until the late 1940s, when “results” became the most popular term. “Data” and “findings” also gained somewhat in popularity, while “notes” became almost extinct. “Data” was quite popular by the late 1950s, but its usage then peaked and declined.

Finally, over the past few decades, we’ve seen a second rise of “data”, which has grown slowly but steadily since 1980 and has recently overtaken “results” as the most common of the words I examined. “Observations” has been in decline since 1960 and is now very rare.

What does this mean?

My impression is that what we’re seeing here is the gradual ‘specialization’ of science. In 1915, scientists seem to have preferred everyday terminology to describe their work. “Notes” and “observations” are not specifically scientific terms. A historian, a lawyer, or even a movie critic could use those words.

In the second era, after WW2, the term “results” gained in popularity. This is still an everyday word, although it has some special connotations in science. Today, the rise of “data” seems to reflect a reversal in the relationship between science and the rest of the world. My impression is that “data” is being used more and more widely in everyday discourse, but this is a borrowing, so to speak, from science, whereas previously science borrowed from everyday life.

  • Uncle Al

    Making observations to create theory is vastly different from elegantly confabulated theory that must be correct and need not be tested (M/String-theory, SUSY, dark matter, baryogenesis). The former is a 21st-century, grant-funding, high-risk endeavor, thus excluded by DCF/ROI. The latter is a cheap-to-realize sure thing. Management is rewarded for counting things, hence declared theory derived from prior accumulated Big Data (via somebody else’s wallet). Example:

    Superluminal neutrinos, explained. (* = repeat first four digits) arXiv:1312.4837, 1304.0038, 1210.5248, 1205.0145, 1204.0484, 1203.4052; 1202.3319, *.0469; 1201.6496, *.5847, *.4147, *.2085, *.1368, *.1322, *.0915; 1112.6217, *.4779, *.4714, *.3753, *.3050, *.2689, *.1222, *.0815, *.0527, *.0353, *.0300; 1111.7181, *.6579, *.6330, *.4994, *.4532, *.3888, *.2271, *.1574, *.0805, *.0093; 1110.6673, *.6577, *.6571, *.6408, *.4754, *.3581, *.3540, *.3071, *.2685, *.2463, *.2236, *.2219, *.2170, *.2146, *.2060, *.2015, *.1943, *.1875, *.1790, *.1330, *.1253, *.0931, *.0762, *.0521, *.0644, *.0456, *.0451, *.0449, *.0430, *.0392, *.0351, *.0245, *.0243, *.0239, *.0234; 1109.6312, *.6308, *.6296, *.6282, *.4897. Theory is the inexpensive part of physics that seeks monopoly through yelling.

    arXiv:1109.6562 (crackpots, by peer vote). Superluminal neutrinos were a loose fiberoptic connector to a clock. Cf: the Pioneer anomaly, arXiv:gr-qc/9808081 (1998) dwindling to arXiv:1103.522 (2011, Phong shading).

  • Денис Бурчаков

    Perhaps the rise of “data” also reflects the rise of computer use and evidence-based medicine? Chronologically they coincide. “Data” is a key term in biostatistics, which tells us whether data are good, bad, ugly or insufficient. And biostatistics has become far more accessible since 1985, thanks to relatively easy-to-use software packages, especially recent ones. The whole concept of evidence-based medicine revolves around heaps of data, the bigger the better. Observations may be scientific, but they are somewhat unique, because we “observe” one thing at a time. Data is a different beast: it is all about taking a heap of values and then calculating the mean, median, mode, SD, p, CI and so on, quantum satis.





About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

