Data Fatigue

By Sean Carroll | August 17, 2011 10:41 am

Hello out there in blog-land. I’ve been traveling (and working!) too much to actually blog, most recently at the terrific SciFoo Camp held at Google. This is an informal “unconference,” where on the first night participants scramble to a big whiteboard to suggest events for the next day and a half. I helped organize a session on “Time” that turned out to be popular, featuring short talks by Geoffrey West, Max Tegmark, David Eagleman, Mark Changizi, and Martin Rees. Other interesting sessions I went to talked about sleep, narratives, the brain, the Turing Test, and why the difficulty of putting chiral fermions on a lattice is evidence against the idea that we live in a computer simulation. (That last one was from David Tong.)

But just between you and me, while staring at the intimidating whiteboard full of interesting possibilities for what to do next, I was struck by a depressing insight: I am tired of data.

This isn’t to say that I am tired of experiments. We can’t learn anything about the world without looking at it, and my favorite areas of physics are bubbling along with provocative new results (or at least hints thereof). When an experiment takes data in the service of deciding some scientific question, that’s fine.

It’s the fetishization of data for its own sake that I find fatiguing. It’s hardly surprising that, surrounded by sci-tech folks at the Googleplex, one would be overwhelmed by talk of data collection, data visualization, data analysis, and so on. And good for them! We are being swamped by data in unprecedented forms and quantities, and it’s a crucially important task to sort it all out and understand how we can use it.

I’m just personally kind of exhausted by it all. (And it’s my blog, so if I want to bust out the occasional irrational rant, who will stop me?) Data — like theory! — is a tool we use in the quest for a higher goal — understanding. If people want to show me that they understand some unanticipated new phenomenon on the basis of some data that they collected and analyzed, I am as enthusiastic as ever. But my standards are rising for simply being impressed by new ways of gathering or visualizing data for its own sake.

At least, for the moment. Next time I see a really pretty picture, I’ll undoubtedly forget I said any of this.

CATEGORIZED UNDER: Personal, Technology
  • Mike

    Sean,

    Did you and Max talk about his new paper that comments on some of your recent work? I wouldn’t characterize his paper as “data-heavy.”

    Hope this is the subject of a new post — his paper and your work, not necessarily your conversation. ;)

  • http://terrybollinger.com Terry Bollinger

    A few simple observations about having abundant data, from someone who works in that world on a daily basis…

    Some of the more remarkable insights from the heyday of physics in the early 1900s were little more than novel ways of interpreting rules derived from simple but unexpected experimental results. Louis de Broglie’s remarkable insight about the wave nature of matter was not much more than a novel way of looking at a simple algebraic formula. Even Dirac’s astonishingly predictive equation came from floundering about on what was essentially a factoring problem: how to split the full mass-energy-momentum equivalence equation in a way that kept it nice and linear. These are by no means the only examples, either.
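
    For concreteness, the factoring problem Dirac faced can be written out. This is the standard textbook sketch, not a quote from Dirac:

```latex
% The relativistic energy-momentum relation Dirac wanted to linearize:
E^2 = p^2 c^2 + m^2 c^4
% Dirac's ansatz: a linear expression whose square reproduces the above,
E = c\,\boldsymbol{\alpha}\cdot\mathbf{p} + \beta m c^2
% Squaring and matching terms forces the coefficients to anticommute:
\alpha_i \alpha_j + \alpha_j \alpha_i = 2\delta_{ij}, \qquad
\alpha_i \beta + \beta \alpha_i = 0, \qquad \beta^2 = 1
% No ordinary numbers satisfy these relations, but 4x4 matrices do --
% and spin and antiparticles fell out of that "simple" piece of algebra.
```

    The algebra itself really is high-school level; all the depth is in asking what kind of objects the coefficients must be.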

    Which brings up an interesting point. Using only minimal data, no computers, and mostly rather simple math, the men and at least one irrepressible woman (three cheers for Emmy Noether!) of the early 1900s did more to move physics toward its goal of a fully completed theory than the entire sum of men, women, data, theorizing, and of course massive computational power that has followed in the years since.

    So what does that say about net intellectual efficiency since the early 1900s? Is it possible that, like a field of candy surrounding a hungry child who should be looking for real food, much of this data and power has become more of a distraction than a help?

    I am inclined to think exactly that. Speaking as a poor, humble computer type, one who perhaps is a bit more aware of the failings of my own field than those outside of it may be, my suggestion would be this: Never underestimate the power and insight buried within even the simplest experimentally validated physics equation.

    Whether you think physics is just math (Max Tegmark) or math is just physics (me, for whatever that’s worth), the point here is the same: Math, and especially simple math that ties very directly to rock-solid experimental physics, is always trying to tell us something important… and we usually don’t listen well.

    Is that still true, though? Or did that bold early band of miners like Dirac and Noether and Pauli and Einstein tap out all the easy veins of insight, leaving the future with only the hardest and most difficult of ores to mine, ones that can be brought to light only by using the most wondrously powerful machinery and the most subtle of methods?

    Many people think exactly that way. I don’t buy it, even for a moment.

    In fact, here’s a prediction I will make that I think is based on pretty good historical data: The next great insight on the nature of time will have exactly nothing to do with powerful computers, amazing analytical programs, or vastly subtle and incredibly complex mathematics. As in the past, that next great insight will instead turn out to be something as simple as, say, the earlier insight that time can vary from frame to frame.

    That’s one reason why I think Sean Carroll’s idea of “time conservation” — that is, of paired universes whose times in some sense “cancel out” when viewed from within the greater scheme of things — is so much more interesting than almost anything out of the exhausting and endlessly quivering threads of string theory. His time symmetry idea has simplicity to it, simplicity of a type that would have been readily comprehensible to those early physics miners with their minimal tools and less data and precious brain time.

    So maybe there is some hope out there, even with all that so pretty and so delicious and so strangely empty computer-generated physics eye candy lying about and pulling everyone’s attention every which way but loose. Maybe folks can start looking again for a few simple but truly novel ideas, perhaps even ideas still left buried within simple but rock-solid physics relationships. Maybe physics can get back on track to its original goal: devising a theory of all physics that at its heart believes the universe really is simple.

  • Mike

    Terry,

    Perhaps it’s like Wheeler said, “[i]t is my opinion that everything must be based on a simple idea. And it is my opinion that this idea, once we have finally discovered it, will be so compelling, so beautiful, that we will say to one another, yes, how could it have been any different.”

  • AI

    Most data is of little to no value and we simply ignore it without even realising it.

    Only a very tiny proportion of all data derivable from our physical reality is of any interest to us. For example no one analyzes the exact locations, shapes and sizes of each and every grain of sand on some particular beach. No one bothers with studying the exact pattern of leaves on every single tree in some forest. It’s only data that we believe can be distilled into some more generally applicable principles that we find valuable and worthy of study. For example studying shapes of quartz grains in general or patterns of oak leaves in general makes much more sense.

  • Matt
  • Sir Marmite Luney-Binns

    More fatigue for you Sean :)

    http://xkcd.com/930/

  • http://warrickball.blogspot.com/2011/02/more-on-bad-astro-code.html Warrick

    I think part of the problem is that usually we create data in order to test a hypothesis. You know, we formulate an experiment or run a simulation or take samples or something. Nowadays, though, it sometimes feels like we’re creating data and finding new ways of representing it without knowing what we’re trying to achieve thereby. We’re making answers without questions.

    This isn’t always bad. For a start, some data is surely useful and takes time to gather, even without knowing what we’ll use it for yet. These are answers which we’re pretty sure will end up with questions. Also we need some observation in order to ask the first questions. But I also feel like there’s a growing trend of making and presenting data without really setting out to or planning how to learn from it.

  • http://terrybollinger.com Terry Bollinger

    Mike,

    Wheeler/Feynman absolutely got the simplicity idea!

    Along those lines, anyone who has not read Feynman’s thesis and the various related descriptions of how those two worked together on it is missing a delightful story of high-risk exploration of truly nutty ideas. The nutty idea in Feynman’s thesis was first proposed by Wheeler, after Feynman floundered about badly and ended up inadvertently describing ordinary mirror reflection of photons in an amazingly complicated way. The alternative Wheeler proposed was that the recoil of an electron when it gives off a photon is actually the result of a photon sent backwards in time by whatever entity eventually receives the emitted photon, even if that event does not occur until billions of years in the future.

    Picture that the next time you shine a flashlight towards an empty region of the sky: According to Feynman’s thesis, no photon is emitted from your flashlight until it “finds” its recipient atom somewhere billions of light years out there… and your hand then receives a tiny, tiny nudge backwards as a backwards-traveling photon from that future atom arrives at the very moment your photon is emitted. That kind of theorizing makes a person feel downright connected! (Full disclosure: Feynman later backed off from the idea after vacuum polarization was discovered. The latter effect, however, was one of many problems that led Feynman towards his final masterpiece, QED, which handles vacuum polarization without even blinking.)

    The most striking aspect of the Wheeler/Feynman team is how Wheeler always seemed to come up with the truly weird ideas. In addition to backwards-traveling photons, Wheeler also once postulated that there is only one electron in the universe. This multiplexed-to-the-max electron travels to and fro a nearly infinite number of times from the start and the end of time, first as an electron going forward, then as a positron going backwards. Talk about a long commute! I should immediately note that it doesn’t work, because it implies equal quantities of matter and antimatter in the universe. But once again Wheeler’s rather whacky thinking deeply inspired Feynman. In fact, it was this very idea from Wheeler that triggered Feynman to derive his masterpiece QED framework, a framework in which electrons bounce off of energetic photons and then travel into the past as positrons.

    Feynman was the more intellectually conservative of the pair, but he was also more than willing to test out Wheeler’s ideas to see if he could make them go somewhere interesting and testable. Thus when anyone talks about “the greatest physicist of the late 1900s,” my candidate is always “Wheeler/Feynman.” It was the teaming of these two, not the isolated individuals, that produced some of the most genuinely original work, and the most beautifully simple insights, to be found in all of modern physics.

  • http://terrybollinger.com Terry Bollinger

    AI (Comment #4),

    You raise a very interesting set of questions with your observations about both similarities and differences, and how we perceive them. In fact, these are the sorts of questions that I think we have a lot of difficulty handling well precisely because they are so deeply ingrained.

    Here’s just one example of what I mean: In terms of the unfolding of time, why exactly do two of your oak leaves look so much alike? The easy answer is “because they are both from oak trees, and oak trees all have similar leaves.” But if you keep pressing the question in terms of time, you are driven backwards in time to look at the origins of both leaves. Do that far enough and you realize something a bit non-intuitive: The oak leaves are similar because both have chains of information that trace backward in time until at some point they merge, possibly at multiple junctions, into singular sets of ancestral genes that determine shape. Drawn out in spacetime, these chains take the form of fork-like Vs — very tangled and distorted Vs, granted, but ones with shared origins. The leaves are similar because at one point in the past they were the same object.

    These simple Vs are also the hidden mechanisms that enable information theory, because if you think about it, how does one radio receive a signal from another radio if at some time in the past the protocols for doing so did not originate on a single piece of paper or computer screen on which they were designed? In other words, earlier information transfers both beget and enable new, more subtle information transfers that rely on those earlier transfers to “understand” the data that has been sent. The earlier information divergences also form the kernels of the critical similarities that you noted, where leaves are similar enough to be recognized as members of a particular information tree, yet different enough to convey new information about the microclimate, tree branches, and plethora of insects that have subsequently played out their subtle melody of variation on that basic model. The past trees of the sand grains are more complex and encompass, for example, other deeper and more ancient shared models of why the earth itself generates bodies of quartz in solid form, and why ancient winds (it’s not water for sand, incidentally) have similar profiles that in turn create similar sizes and shapes of grains. But those grains also have their new messages encoded in their variations, telling micro and nano stories about the rocks from which they came and, for example, the valuable ores they may indicate.

    That’s not the full story, however. We live in a universe in which the spaces that are of most relevance to us have two or three dimensions, and that in turn means the impact of distant objects for the most part falls off as some power of that distance. (Note that this is not true for one-dimensional worlds! That is why a car accident five miles ahead of you on the only road in to work can have a very direct impact indeed on your local situation.)

    In a universe where effects tend overall to fade with distance, information trees that are rooted in the past also tend to fade in their overall impact. The exception occurs when some principle of conservation causes specific features to stand out or join together, so that the conserved feature fades more slowly or not at all with distance.

    A good example is volume in sand. Starting very deeply indeed, at the fundamental particle level, the shared identity of electrons causes them to exhibit a unique form of mutual repulsion that has nothing to do with charge. We call that effect “volume.” While it is by no means an absolutely conserved quantity (lemon squeezers affect it, and black holes are the ultimate annihilation of it), it is nonetheless a very strongly conserved quantity for most objects on the surface of the earth. So even as the more detailed trees of individual sand grains fade into oblivion (usually!) at the level of the beach goer, volume becomes additive. Additive volume enables another complex suite of properties having to do with how surface interactions between grains in media of air or water are able to convey, albeit weakly, another property of the sand grains: a much-weakened form of stiffness, or resistance to being moved. For sand in air and sand in water, the conveyance of the deeper stiffness of stone is quite weak, but in the mix of air, water, and sand at the edge of the waves it becomes noticeably higher.

    Such properties are often called “emergent,” but that may be too fancy a term for it. Such properties are often more like interesting and tasty recipes, with a bit of this ancient tree (“volume” from identical electrons) added to that ancient or more recent tree (“hot sand!”) to produce a set of properties that are expressed on the same scales that humans like most.

    As humans we have a unique ability to select among these ancient forests of information trees and use them both to simplify and to analyze the world around us. For a day on the beach, the emergent smorgasbord of a bit of volume, a bit of stiffness, a lot of give, and warmth conserved from the sun combines into a new expression that we like just fine for a relaxing day at the beach.

    And to close: Deep within all of this lies time, or at least time as poor Boltzmann defined it. We stand at the tips of a nearly infinite grove of information trees, some recent, many as ancient as the universe itself, all interleaving and fading and unifying in complicated and unexpected ways to give that remarkable result we think of so casually as “now.”

  • Andrew Reeves

    Sean,

    Can you link me to some of David Tong’s work describing why chiral fermions on a lattice might suggest we are not living in a computer simulation? I have always loved this topic and would like to read more about it. (I am not a physicist, but please do link to any actual physics articles, as I can typically understand a good deal of the math and concepts.)

    Thanks!

    • http://blogs.discovermagazine.com/cosmicvariance/sean/ Sean

      It’s not really “work,” I’m afraid, just a brief talk that David gave. The argument is simple if you know the chiral-fermion story. Computers on which we might be simulated are generally digital; but the inability to put chiral fermions on a lattice implies that reality is fundamentally analog. Therefore, reality is not a computer simulation. (I don’t buy the first assumption, but it’s an interesting argument.)

  • Andrew Reeves

    Thanks Sean, you’ve given me some great topics to start looking up and reading about. Maybe you could ask David Tong to do a guest post about this subject…?

    So just for the hell of it I’d like to ask a question. If we created an artificial intelligence in a simulated but simplified world using digital computers (where the world followed the same physics as we now know them but any uncertainty or randomness was generated by a pseudo-random number generator), is there a way that the simulated intelligence could determine they were in a digital reality versus analog?
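
    (The determinism such a setup relies on is easy to illustrate. A minimal, hypothetical Python sketch — the function name is invented for illustration — shows that a seeded pseudo-random generator replays exactly the same “random” history every run, so nothing observed from inside the history looks any less random for being reproducible:)

```python
import random

def simulated_history(seed, steps=5):
    """Replay a 'random' history from a fixed seed, as a simulator would."""
    rng = random.Random(seed)  # deterministic PRNG, isolated from global state
    return [rng.random() for _ in range(steps)]

# Two runs with the same seed are bit-for-bit identical:
run_a = simulated_history(42)
run_b = simulated_history(42)
assert run_a == run_b  # perfectly repeatable, yet statistically "random" inside
```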

    • http://blogs.discovermagazine.com/cosmicvariance/sean/ Sean

      One of the reasons I’m skeptical of the original argument is that all computers are “really” analog, given that they are constructed in the real world (which is analog by this definition). Digital-ness is just a useful approximation, in some sense.

      Here is another article by David on a similar theme: http://fqxi.org/community/forum/topic/897

  • http://none rob

    First time visitor; liberal arts guy fascinated by science, impressed with your work, and your respondents’ responses, much of which I can understand word-by-word, some I can actually comprehend in concept.

    I have a liberal arts reaction that extends your complaint about too much data collected apparently for its own sake (if not for the sake of grants?). I’d like to push it beyond data to the conclusions based on some of it–that is, logical conclusions that could easily be drawn but seem to be excluded from the debate. I’m talking about the Climate issue.

    Sci Am’s daily newsletter, about ten days ago, included an article on an aspect of Global Warming data, centering on man-made emissions, primarily those from hydrocarbons. It generated the usual mix of what, to a non-scientist, looks like sound responses on both sides, and a few weird ones on both sides.

    The importance of the debate is, what does it mean in practical terms for mankind? The immediate importance does NOT lie in the damage the (increasing) consumption of hydrocarbons poses over a pretty long time, it lies in the politico-economic question of whether the major powers, and even allies among those powers, are forced to go to war with one another to secure a share of a critically reduced critical resource. The more ‘developed’ the country, the smaller the percentage of current supply it can forgo before it reaches stall velocity: losing, say, 30% of supply doesn’t mean its citizenry lives on average 30% worse; it would mean economic collapse well before 30% is reached.

    I’m so turned off by the apparent behavior of Warmists toward nay-sayers in their own professional community, that I tend to discount anything they put forward. But the scientific argument itself is moot. The Warmists’ view should prevail, even if it’s wrong, because ANYTHING that tends to move governments away from oil is in the whole world’s interest.

    There won’t be enough oil resources, reachable at any plausible cost before critical shortages start, to cause the kind of cataclysmic changes the Warmists otherwise perhaps correctly predict. But we developed nations have plenty of the resources for war, and you can bet we’ll save the last increment of hydrocarbons to power the logistics system to deliver those resources, men and bombs, on target, and keep them up and running.

  • http://terrybollinger.com Terry Bollinger

    In #12 Andrew Reeves asked:

    > … is there a way that the simulated intelligence could determine
    > they were in a digital reality versus analog[?]

    Not really.

    The first problem is that it is always possible that the programmer could just insert a block of code to keep the virtual intelligence from ever wondering such things. But you have asked the question, so clearly that has not happened in your particular case, yes? So, moving on…

    The second problem is that the programmer could prevent the intelligence from ever seeing the data that would answer the question. The simplest way to do that would be to change its memory so that it distinctly recalls proving itself to be real anytime the question comes up. But again, you are not convinced yet, so that clearly has not happened in your particular case, yes? So, moving on…

    The third problem is the one Sean Carroll mentioned, which is that you really can’t make many safe assumptions about what the übercomputer in question can or cannot do. David Tong apparently took the heuristic of arguing that reality is “too analog” to be a computer. The weakness of that argument shows in the observation that up until the late 1950s the majority of useful, working computers were analog, not digital. Similarly, Feynman kicked off the field of quantum computing by observing that classical digital computers just can’t handle the strange implications of entanglement, at least not without introducing enormous delays in processing. I think in some ways Feynman’s argument is the stronger one for saying that we don’t live in any kind of computer that we would recognize. But of course, who is to say that the übercomputer is not a quantum computer?

    And that brings up a fourth issue, which to me is the strongest argument there is for saying that you and I don’t live in a simulated reality: Occam’s Razor.

    You see, it turns out there’s only one way to build a true quantum computer: By borrowing existing quantum effects from existing particles and atoms. There is no other way, because nothing in classical physics can emulate these remarkable “down at the bottom” effects.

    Now think of that in terms of Occam’s Razor: To build a universe with the complexity and unique effects that we see constantly at the quantum level, you have to… use up most of another universe that simply lends its own quantum effects to ours? Occam says ouch to that one; the structure has just become too complicated for no really good reason.

    So, here’s my answer: Our universe is its own best motherboard, not because it exhibits any particular signs of any particular type of computing, but because it exhibits a family of quantum effects that as best we can tell would require outrageously Rube Goldberg ideas and contraptions just to emulate them on some other-universe übercomputer.

    And to close, here’s a question: What is it about quantum effects that is so hard to emulate?

    Let me suggest an answer that I assure you won’t pop out from any computer-rich lattice methods that assume as a given in their construction a lattice of almost infinitely precise spatial and temporal locations. Lattice methods are programming methods, not physics, and it’s important to keep that distinction in one’s head when looking for fundamentals.

    Speaking intentionally only at a conceptual level, I would suggest that the most important reason why quantum effects are so very odd is that quantum systems are right at the ratty edge of having enough information available to exist. As a direct consequence, they become sort of… well… forgetful.

    For example, here’s a simple question: Why do atoms occupy space? Well, you can and should describe volume by using the uncertainty principle. However, another way to look at the same issue is simply to say that very-low-mass electrons — mass being one indicator of available information content — just don’t have enough information storage in them to remember where they are. Conversely, if you increase their effective mass through special relativity (velocity), their memories improve immensely. Sufficiently accelerated electrons have enough spatial precision even to probe for quarks within protons and neutrons. But the garden variety low-temperature (relatively speaking!) electrons? Very forgetful beasts. They can remember that “home” is somewhere around some atom most of the time, but they are flatly incapable of getting very precise about it.
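
    (The uncertainty-principle route to atomic volume mentioned above is the standard back-of-envelope estimate; as a sketch:)

```latex
% Confining an electron to radius r costs kinetic energy, since \Delta p \sim \hbar/r;
% adding the Coulomb attraction gives a rough ground-state energy
E(r) \approx \frac{\hbar^2}{2 m_e r^2} - \frac{e^2}{4\pi\varepsilon_0 r}
% Minimizing (dE/dr = 0) yields the Bohr radius:
r_{\min} = \frac{4\pi\varepsilon_0 \hbar^2}{m_e e^2} = a_0 \approx 0.53\ \text{\AA}
% Note r_min scales as 1/m: a heavier (less "forgetful") particle localizes
% more tightly, e.g. muonic hydrogen is roughly 200 times smaller.
```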

    That’s a good thing too! Otherwise we’d all be in the bottom of a black hole the size of a large piece of gravel. Forgetfulness means volume, and many other nifty effects as well.

    So for me that is another reason, an even stronger reason, to think we are “real as is.” A universe that is so severely stressed to remember where it is at the bottom of its physics sounds to me like a quite sincere universe that is just doing the best it can, without tricks.

    And within that argument there also lies an experimental option for your question: If you are truly dedicated to finding a crack in the armor of independent existence, some sort of data proof that our universe may not be quite what it looks like, the first place you should look is for something that violates the uncertainty principle. The uncertainty principle is the guard at the end of the tunnel that ends in nothing, and if you can find something beyond that, you have indeed violated the boundary of our universe and found evidence of something very strange going on on the other side. That, indeed, would be a revolutionary finding — and I should also note, one for which all of experimental physics has found exactly zero evidence, not even a hint of violation. Heisenberg’s Guard remains firmly in place.

  • http://devicerandom.org devicerandom

    For me it is the opposite. I like data in itself. When I was in academia, I realized that what I liked most was building maps of my data, and I still am fascinated by the idea of data collection, classification and preservation. I talked about it, sideways, here.

  • David

    I’m afraid I see a troubling trend in academia and in science in particular. In order to determine merit and allot funding, we have by necessity moved away from looking critically at what people are doing (that requires thinking, which is hard) and more towards straightforward quantification (that requires simple addition, so it is easy). If you observe or experiment, tell funders how much data you have acquired and that will serve as a benchmark, the more the better. If you simulate, tell funders how many equations your simulation can handle, and that will serve as a benchmark, the more the better. And if you have more, and you come up with an innovative way to store or display that more, you are golden.

  • Pingback: Kepler Data Visualization and Data Fatigue « Technical Communication at UAHuntsville()


Cosmic Variance

Random samplings from a universe of ideas.

About Sean Carroll

Sean Carroll is a Senior Research Associate in the Department of Physics at the California Institute of Technology. His research interests include theoretical aspects of cosmology, field theory, and gravitation. His most recent book is The Particle at the End of the Universe, about the Large Hadron Collider and the search for the Higgs boson. Here are some of his favorite blog posts, home page, and email: carroll [at] cosmicvariance.com .
