Deep time

By Sean Carroll | August 22, 2005 11:42 am

Dinosaur comics discusses entropy and the fate of the universe.

Dinosaur comics

T. Rex muses on the Poincare recurrence theorem and Boltzmann’s suggested resolution of the arrow of time problem, but Dromiceiomimus seems to have a better understanding of the lessons of modern cosmology. Utahraptor, meanwhile, argues that the universe is not manifestly ergodic, and insists that the entropy problem is not yet resolved.

  • Steinn Sigurdsson

    Shouldn’t this be in the “great errors civilians make” category?

Because given infinite time, we could still skip not only a finite subset of possibilities, we could skip an arbitrary number of infinite subsets of possibilities.

    In fact, annoyingly, we could just repeat a finite number of possibilities an infinite number of times and be exceedingly boring. Missing out on almost all of the infinite possibilities through sheer stubbornness.

    But now I feel like I am channeling Max, so I’d better stop… 😉


  • bittergradstudent

    Steinn: but doesn’t that violate the spirit of the approach Boltzmann took in formulating stat mech (that all microstates have equal probability)?

  • Steinn Sigurdsson

    All accessible microstates, surely.

    There may well be physically possible states, which are not accessible from (some given) initial conditions.

    Maybe if it can be shown that the ensemble of universes includes all possible initial conditions, then given infinite time all allowed microstates occur; though I’d like to see a proof by construction (ie I can’t convince myself that it is impossible to exclude some subset of conceivable microstates by any such evolution).

    Given some particular set of “initial condition” on a “small enough” space (which I think may still be infinite), I think a heuristic proof is possible that many possible microstates are never actually reached, even given infinite time.

    For a finite initial spatial extent, but infinite time, I think the proof is trivial.

    Actually, surely this is trivial: even under Boltzmann, we could have parity constraints – for example an infinite universe for an infinite time might still be such as to exclude anyone ever being lefthanded. It would be physically conceivable for people to be lefthanded, but a strong constraint could forbid any actual person from actually achieving left handedness in reality. And this would be purely arbitrary in that it could have been right handedness that was excluded.

    Anyway, you can see where I am going with this…

    As long as we don’t get into a semantic argument about “possible”.

  • Maynard Handley

    I’m so damn sick of people not understanding this.
    Here is an essay from my wiki (not yet on-line) that explains EXACTLY what is going on here.
    (It’s in mediawiki syntax, but even so should be pretty easy to read.
    The only thing that’s probably not clear is {{sc|pdf}} means PDF in small caps.)

    =The Second Law of Thermodynamics and Boltzmann’s H-Theorem=

    ==The issue==
    When one reads about statistical mechanics, both in textbooks and in popular
    works, some misunderstandings on the subject that date from the late 1800s
    and early 1900s still remain common. These tend to cluster around three points:

    * The second law of thermodynamics is absolute but a particular system can evolve towards a “less likely” state.
    * The reversibility and recurrence “paradoxes”.
    * Are the systems of classical mechanics ergodic?

    Physicists of the modern world have basically made their peace with the first
    point. Although it was a big deal in the late 1800s, and considered a blow
    against kinetic/atomic theory, no-one nowadays has a problem with the reality
    that ice-cubes in the sun melt, alongside the theoretical possibility that
    just once, this time, liquid water placed under a warm sun might freeze.
    Even so, the understanding of what this actually means mathematically
    is pretty limited and, if pressed, the details are usually wrong.

    The second point, even more so, is usually completely botched, and the
    horrible explanations given for it usually poison the understanding of the
    other two points. So with this in mind, let’s examine the issue properly.

    ==A little history of the H theorem==

    In the mid-1800s Clausius came up with the idea of entropy and the second
    law of thermodynamics. At this time, recall, the very idea of atoms was
    controversial; some felt that the concepts of thermodynamics were primal and
    did not need to be justified or derived from models of how the world was
    constructed, while at the same time others were pursuing the kinetic theory
    of gases and trying to use its successes to prove the existence of atoms.

    Against this background, the most significant thing yet proved about the
    kinetic theory of gases was Maxwell’s velocity distribution. But some people
    were unhappy with various aspects of the proof. The proof then, just like the
    proof one usually sees today, assumed that the velocities of interacting
    molecules were uncorrelated, something some felt was not justifiable.
    (On the other hand, by making this assumption, the proof showed the
    generality of the resultant distribution regardless of whatever details
    one might assume of the interaction of the molecules.)
    Boltzmann, to deal with this, came up with the Boltzmann Transport Equation,
    which more explicitly dealt with the interactions. It was fairly easy to show
    that a maxwellian distribution was static under this equation, in other words
    would not change with time. But Boltzmann wanted to show something more; that
    any other distribution would monotonically evolve towards the maxwellian distribution.

    To this end he defined a quantity (which we now call H), a property of a
    particular distribution, and proved two things:

    * dH/dt = 0 for a maxwellian distribution, and
    * dH/dt < 0 for any other distribution, so that H decreases until the distribution becomes maxwellian.

    This result (the H-theorem) was immediately attacked, most famously via the
    reversibility and recurrence paradoxes. The sad thing is that the same
    mishmash of poorly thought-out arguments and counter-arguments from that
    debate still appears in today’s textbooks. I remember being bugged by the
    sloppiness of these arguments back almost twenty years ago when I was an
    undergraduate.

    None of this is necessary — there’s a perfectly good, perfectly simple
    explanation for what’s going on that doesn’t require this handwaving.
    However to get to that point, we need a slight detour. I’m going to give the detour in
    more detail than is needed just to deal with this problem because the ideas are
    interesting, worth remembering, and best understood in a non-thermodynamics
    context that hasn’t been poisoned with invalid arguments.

    ==Data compression==

    Let’s switch to an apparently very different problem, the problem of data
    compression as performed by computers. Data compression consists of two parts.

    The first stage, called modeling, transforms fragments of the data in some
    way so as to generate a stream of so-called symbols. Modeling varies from
    compression scheme to compression scheme — in JPEG, for example, it
    involves, among other things, splitting the image into 8×8 blocks and
    performing a 2D DCT (something like a Fourier transform) on the data in each
    8×8 block.

    Modeling is specific to each compression scheme and the details do not
    matter to us. What matters is that after modeling the result is a stream
    of what we might abstractly call symbols. Suppose that our modeling results
    in symbols that can have values 0..255. The simplest way to store these values
    would simply be to use 8 bits for each symbol. This, however, would be far
    from optimal if some symbols are very much more common than other symbols.

    ===entropy coding===
    What is done in data compression is to encode the symbols using what is called
    entropy coding. Entropy coding comes in two forms, Huffman coding and
    Arithmetic coding.

    {{infobox1|A second part of the theory
    is that the bit stream you construct has to be readable, even though there
    are no markers between the (variable length) bit strings indicating where
    one stops and the next starts. This implies that the collection of bit strings
    you use has to possess what is called the prefix property.}}
    Huffman coding uses shorter strings of bits for symbols
    that are more common, and longer strings of bits for symbols that are less
    common. The theory tells you (given the probabilities of different symbols)
    the optimal way to map symbols onto bit strings.

    Arithmetic coding achieves the same goal as Huffman coding, namely using fewer
    bits to encode the more common symbols, in a way that is somewhat more
    efficient than Huffman coding, but quite a bit more difficult to understand.
    However it’s not relevant to our discussion.

    ===an example: compressing english text===
    So, given what we have said above, suppose we want to compress some data.
    To avoid getting bogged down in irrelevant details, let us assume that the
    data we want to compress is English language text encoded as 8-bit characters
    (7-bit ASCII plus the LATIN-1 high-bit extensions),
    and that we are going to ignore the modelling stage of compression.

    So the problem we have given ourselves is that we have symbols which are 8-bit
    ASCII characters, 0..255. Right away we know that some characters are going
    to be far more common than others. The characters with the high bit set (ie
    with a value >127) are highly unlikely; these encode diphthong and accented
    characters, punctuation symbols rarely used in English, and so on.
    Punctuation characters are less likely than many letters, and capital letters
    are less frequent than lower case letters.
    Certain letters are much more likely than other letters.

    ====the probability distribution function for english text====
    Compression is all about having an accurate mathematical model of the
    probability structure of the data.
    As a first approximation, we can consider the probability of each individual
    ASCII character. This gives us an array of 256 probabilities. In some vague
    sense that philosophers can argue over, there is presumably some sort of
    “ideal” probability distribution function ({{sc|pdf}}) for English language text
    that incorporates all text that has been and can be written, and that’s what
    our compression program is targeting. But, of course, we can’t just conjure
    up that ideal, so what we do is gather a large body of what we hope is
    representative English text, calculate the empirical (as opposed to ideal)
    statistics for that text, and treat those (sample) statistics as representative
    of all English text and thus equal to our philosophical ideal.
    We can then use these empirical probabilities to
    construct a Huffman code (or to drive an arithmetic coder), and we have a
    way to compress English ASCII text.
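To make the recipe concrete, here is a small Python sketch (my own illustration, with a toy corpus standing in for a real body of representative English text) of the empirical-statistics step:

```python
from collections import Counter

def empirical_pdf(sample_text):
    """Estimate a symbol pdf from a (hopefully representative) text sample."""
    counts = Counter(sample_text)
    total = len(sample_text)
    return {symbol: n / total for symbol, n in counts.items()}

# A toy stand-in for "a large body of representative English text".
sample = "the quick brown fox jumps over the lazy dog"
pdf = empirical_pdf(sample)   # e.g. the space character gets probability 8/43
```

In practice the sample would be megabytes of text rather than one sentence, but the calculation is exactly the same, and the resulting probabilities are what drive the Huffman or arithmetic coder.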

    ===the mathematical entropy associated with any discrete {{sc|pdf}}===
    Now let’s step back a little from this example and consider the general
    issue. As ”’mathematicians”’, we can define a quantity, named
    the ”’mathematical entropy”’, for any {{sc|pdf}}. The entropy is defined as

    S=-Sum[ probability(symbol)*lb( probability(symbol) ),
    summed over all symbols ]

    where lb() is the binary log (ie log to base 2) of a number.

    This may seem a bit much to take in, but really it’s not hard.
    Let’s assume we have four symbols, A, B, C, D, and that the probabilities are
    (A, 1/2) (B, 1/4), (C, 1/8), (D, 1/8)
    The entropy associated with this {{sc|pdf}} is 1*.5 + 2*.25 + 3*.125 + 3*.125 =1.75.

    Note that perfect entropy
    coding of a collection of symbols with some given {{sc|pdf}} means that each symbol
    will take, on average, -lb( probability(symbol) ) bits to encode.
    (Probabilities are less than one, the log is negative, so we add a minus
    sign to make the result positive.)
    So perfect entropy coding of our example would utilize
    1 bit to encode an A, 2 bits to encode a B, and 3 bits to encode a C or a D.
    It should be obvious from the above calculation that the entropy of the {{sc|pdf}}
    is nothing more than the average number of bits required per symbol to perfectly
    entropy encode a stream of data conforming to this {{sc|pdf}}.
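The definition translates directly into code; here is a sketch (mine, not from any particular library) that reproduces the 1.75-bit example above:

```python
from math import log2

def entropy(pdf):
    """Mathematical entropy of a discrete pdf, in bits per symbol."""
    return -sum(p * log2(p) for p in pdf.values() if p > 0)

example = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
print(entropy(example))   # 1.75 bits per symbol, as computed above
```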

    {{infobox| arithmetic coding |
    In fact arithmetic coding entropy encodes data using a non-integral number
    of bits per symbol, so we can actually approach perfect entropy coding in real
    computer programs. This is a pretty neat trick, and I’d recommend you read
    up on how it is done if you have time.}}
    You may wonder what happens when the probabilities of the symbols are not nice
    power-of-two probabilities as in the example. In that case, Huffman coding
    cannot generate perfect entropy coding results, because the length of a
    Huffman code is obviously some integer number of bits, while the perfect
    entropy code might call for some non-integral number of bits, say 3.7569…
    In this case the average number of bits required to Huffman encode the symbol
    stream will be larger than the entropy; the entropy is a lower bound, the
    absolute best we can do.
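To see the lower bound in action, here is a toy Huffman construction (a standard heap-based sketch of my own, not production code) applied to probabilities that are not powers of two:

```python
import heapq
from math import log2

def huffman_code_lengths(pdf):
    """Return the code length, in bits, that Huffman coding assigns each symbol."""
    # Each heap entry: (probability, tiebreak id, {symbol: length so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(sorted(pdf.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them gets one bit deeper in the code tree.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

pdf = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
lengths = huffman_code_lengths(pdf)            # {"A": 1, "B": 2, "C": 3, "D": 3}
avg_bits = sum(pdf[s] * lengths[s] for s in pdf)   # 1.9 bits per symbol
H = -sum(p * log2(p) for p in pdf.values())        # about 1.846 bits per symbol
```

Here the Huffman code spends 1.9 bits per symbol on average, a little above the entropy of about 1.846 bits; an arithmetic coder could get arbitrarily close to the bound.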

    There are, of course, different {{sc|pdf}}s for different
    sets of material we may consider compressing, for example the statistics,
    and thus the {{sc|pdf}}, associated with the set of all photos (information
    highly relevant to the design of a compression scheme like JPEG), are very different
    from the statistics for English language text.

    ===entropy is a property of a {{sc|pdf}}, not a finite sample from that {{sc|pdf}}===
    At this stage we now need to point out an essential point,
    ”’the”’ essential point to understanding this stuff, both in the context
    of data compression and later in the physics context:
    ”’The {{sc|pdf}} describing the distribution of symbols is a property of some abstract infinite stream of symbols, for example some vague idea of the set of all English text.”’
    Now the properties of a {{sc|pdf}} will almost certainly be measured empirically,
    using as large a collection as is feasible of the type of material we want
    to compress, for example a large collection of English documents.
    From the statistics of this sample stream, an estimate of the entropy of
    the {{sc|pdf}} governing these symbols is then a simple calculation.
    The {{sc|pdf}} is, however, some sort of ideal entity not linked to
    the particular sample material we used; the particular symbol stream
    used to design a compression algorithm is simply regarded as a
    representative sample from an infinite stream of symbols.

    ===a misleading concept. the “entropy” of a finite sample===
    Switch now from the idea of all English text to focus on a particular
    piece of English text, a particular file we wish to compress.
    For any ”’specific”’ piece of English text, we can compress the stream of
    symbols using an entropy coder and the {{sc|pdf}} for English text, and the
    compressed data will have some size, meaning some average number of bits
    per symbol.
    We can call this, if we want, the entropy of this ”’specific”’ piece of English
    text, but it is conceptually a very different thing from the mathematical
    entropy we defined for the English language {{sc|pdf}}. This specific entropy (ie the
    average number of bits required per symbol to represent the text) may be
    rather larger than the entropy of the English language {{sc|pdf}} (for example the
    text may be something written by James Joyce, or an article about words to
    use in scrabble), or this specific entropy may be less than that
    of the English language {{sc|pdf}} (for example the text may
    be written for children, and may utilize only short simple words with
    very little punctuation).
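The distinction can be made concrete with a toy model pdf (hypothetical probabilities, chosen purely for illustration): the specific entropy of a particular text is just the average cost, in bits per symbol, of coding that text with a coder built for the model pdf.

```python
from math import log2

# A toy stand-in for "the English-language pdf".
model = {"a": 0.5, "b": 0.25, "c": 0.25}
model_entropy = -sum(p * log2(p) for p in model.values())   # 1.5 bits/symbol

def specific_entropy(text, pdf):
    """Average bits per symbol a perfect entropy coder built for `pdf`
    spends on this specific piece of text."""
    return sum(-log2(pdf[ch]) for ch in text) / len(text)

simple_cost = specific_entropy("aaaa", model)    # 1.0 bit/symbol, below 1.5
unusual_cost = specific_entropy("bcbc", model)   # 2.0 bits/symbol, above 1.5
```

A text built from the cheap, common symbols compresses to fewer bits per symbol than the pdf entropy; a text built from the rare symbols costs more, just as with the children’s book and the Joyce examples.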

    ===if you want to learn more about data compression and entropy coding===
    {{infobox| correlation between symbols |
    The most important subject we have omitted from the discussion above,
    interesting but not relevant to where we are going with this, is
    exploitation of the correlation between
    successive symbols to reduce the number of bits required for compression,
    something that gets us into Markov models. (An obvious example is that the
    letter q is almost always followed by the letter u, and surely a compression
    scheme should be able to exploit that somehow.)
    While Markov models are a
    theoretically powerful method of doing so, there are severe practical
    problems with using them because of a combinatorial explosion in the number
    of probabilities one has to keep track of. The major goal of modeling
    is to attempt to restructure the data stream from its initial form, where
    there are obvious correlations between various pieces of data, to some
    intermediate form whose symbols are, as far as is practical, independent of
    each other. How best to do this clearly depends on the type of data and the
    techniques used for text, still images, video, general audio or speech are
    all very different.}}
    If you are interested in the details of entropy coding beyond what I’ve
    discussed, IMHO by far the best introduction is Chapter 2 of
    the JPEG2000 book by Taubman and Marcellin.
    (The rest of the book is concerned with the details of the modeling used by
    JPEG2000 — fascinating but very dense. It is an expensive book and, unless
    you are really interested in the subject, you probably won’t want to read most
    of it, so I’d suggest borrowing a copy from a library or a friend rather than
    buying it.)

    ==The H theorem refers to {{sc|pdf}}s, not samples==

    {| style=”float:right; margin-left: 1em; width:50%;” cellpadding=5 cellspacing=1 border=0
    |align=left width=100% style=”background-color:#f3f3ff; border:1px solid”|
    ”’physics entropy rather than cs entropy”’

    Note that the explanation above utilized logarithms to base 2 to calculate the
    entropy for the purposes of computer science. In physics, with a different set of
    concerns, we calculate entropy using logarithms to base e, but the essential points
    remain the same.
    Note also that the explanation above dealt with a discrete {{sc|pdf}}.
    There are interesting technical mathematical challenges when one goes from a
    discrete {{sc|pdf}} to a continuous {{sc|pdf}}, like for example, a gaussian, but
    we will ignore those and focus on the important thing which is that, after all
    the pain of proving the results, the bottom line is that our ideas from discrete
    {{sc|pdf}}s map over to continuous {{sc|pdf}}s pretty much as we’d expect.
    |}

    With the above detour out the way, let’s return to Boltzmann;
    perhaps you can already see what the fundamental issue is.
    Boltzmann’s theorem refers to ”'{{sc|pdf}}s”’. It says that the time evolution of a
    {{sc|pdf}} occurs in a certain way.
    Meanwhile the reversion and recurrence paradoxes refer to
    specific instances of a mechanical system, ”’not”’ to {{sc|pdf}}s. As such, what they
    do or don’t say is irrelevant to Boltzmann’s theorem.

    ===a rigorous mathematical view of the Boltzmann transport equation===
    More specifically we can say that, from the point of view of a nicely
    manageable mathematical structure, we want to talk about {{sc|pdf}}s.
    We can, as mathematicians, define a mathematical structure that is a function
    of space and time and that has as its value at each space-time point a value
    which is a probability density function for a velocity. This is a more careful,
    more explicit way of defining the function of Boltzmann’s transport equation.
    If we now define a way in which this {{sc|pdf}}-valued function evolves with time
    (the Boltzmann transport equation) we have a perfectly consistent well defined
    mathematical problem. We can now prove various properties of this
    mathematical system, one of which is that (assuming various properties of the
    specific transport equation we’re using) the entropy of the {{sc|pdf}} associated
    with each spatial point is monotonically non-decreasing.
    (This mathematical result holds for any {{sc|pdf}}, but is physically only useful for
    situations where a {{sc|pdf}} plausibly suggests itself.
    For the most part such situations are either equilibrium [ie the
    pdf is the Maxwell-Boltzmann distribution], or “different equilibrium at
    different points of space” eg a gas with some non-uniform temperature
    distribution. )
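The flavor of this monotonicity result can be illustrated with a toy discrete evolution (my own example; this is a simple doubly-stochastic mixing rule, not the Boltzmann transport equation itself), for which the entropy of the pdf provably never decreases:

```python
from math import log

def entropy(p):
    """Entropy (base e, as in physics) of a discrete distribution."""
    return -sum(x * log(x) for x in p if x > 0)

def mix(p, lam=0.1):
    """One step of a toy doubly-stochastic evolution: each state leaks a
    fraction lam of its probability equally to all states."""
    n = len(p)
    leak = sum(p) * lam / n
    return [x * (1 - lam) + leak for x in p]

p = [0.9, 0.05, 0.03, 0.02]          # a pdf far from equilibrium
entropies = []
for _ in range(50):
    entropies.append(entropy(p))
    p = mix(p)
# The entropies rise monotonically toward log(4), the uniform-pdf value.
```

Because the mixing matrix is doubly stochastic, each step can only smooth the pdf, so its entropy is non-decreasing, which is the discrete analogue of what the transport equation does to its pdf-valued function.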

    ===a real world view of a collection of molecules===
    OK, this is a fully consistent mathematical construction.
    However to some extent in the real world, we don’t deal with {{sc|pdf}}s,
    we deal with finite collections of real atoms or molecules.
    For example a finite collection of real gas molecules does ”’not”’ evolve
    according to the Boltzmann transport equation. The very idea makes no sense, since
    the entities referred to in the two situations (on the one hand a {{sc|pdf}}-valued
    function, on the other hand a large collection of positions and velocities)
    are completely different.
    A collection of real gas molecules evolves according to the laws of mechanics
    rather than the Boltzmann transport equation, and therefore is indeed subject to
    the issues of reversibility and recurrence, properties that can be proved for
    mechanical (hamiltonian) systems.

    Now, going back to the transport equation, the pdf that we associate with any
    particular point of space-time at equilibrium is, of course, the maxwellian
    distribution. With this distribution in mind, note that, just as we did with our
    specific piece of English text, we can calculate a ”’specific”’ entropy for a specific
    collection of gas molecules. Such a calculation would first calculate the appropriate
    “temperature” parameter for this collection of molecules, perhaps based on the
    standard deviation of the distribution of speeds of all the molecules. It would
    then loop over all the molecules, for each one calculating, for that molecule’s
    velocity, an appropriate probability from the maxwellian pdf, multiplying that
    probability by the log of that probability, and summing the results.
    Just as in the case of compressing a particular piece of English text, this calculation
    might result in a value higher or lower than the entropy of the maxwellian {{sc|pdf}} at
    the temperature we calculated for this system.

    ===connection between the mathematical ideal and the collection of molecules===
    The connection between the mathematical ideal and the real world is that
    # assuming the mathematical {{sc|pdf}} is chosen correctly, things happen in the real world as frequently or infrequently as the probabilities of the {{sc|pdf}}, ie sampling the properties of a large number of molecules and binning the results will give you values just like what you’d expect from the {{sc|pdf}}
    # the {{sc|pdf}} for most physical situations is astonishingly peaked, meaning that physical configurations of molecules that don’t match everyday experience have ridiculously low probabilities. (Compare, for example, the statistics of some randomly chosen piece of English text. We expect it to have statistics much like that of the English language pdf, but would not be surprised to learn that, for example, this piece of text utilizes 1% more “e”s or 5% fewer “w”s than the pdf tells us are the case for the entire universe of English language text. However when dealing with, say, of order 10^18 molecules that have had a chance to equilibrate, we would expect to wait much longer than the age of the universe before seeing deviations of order 1% between statistics calculated for our collection of molecules as compared to the appropriate value calculated from our {{sc|pdf}}.)
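The scale of these fluctuations is easy to estimate: for a symbol of probability p counted over n independent samples, the binomial standard deviation gives a relative deviation of sqrt(p(1-p)/n)/p. A quick sketch (illustrative numbers of my choosing, not measurements):

```python
from math import sqrt

def relative_fluctuation(p, n):
    """Expected relative deviation of an empirical frequency from its pdf
    value, for an event of probability p counted over n independent samples
    (binomial standard deviation divided by the mean)."""
    return sqrt(p * (1 - p) / n) / p

# Frequency of a common letter (~12%) over a few thousand characters of text:
text_dev = relative_fluctuation(0.12, 10_000)      # a few percent
# The same estimate for a property averaged over ~10^18 molecules:
gas_dev = relative_fluctuation(0.12, 10**18)       # of order 1e-9
```

Percent-level wobbles are routine for a page of text, while for a mole-sized collection of molecules the expected relative deviation is around a billionth, which is why macroscopic measurements look like they come straight from the pdf.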

    ==Reconciliation between thermodynamics and Boltzmann==

    So in summary what we can say is that
    # Boltzmann was right, in that the H-theorem does provide a mathematical proof of the monotonic increase of entropy AS HE DEFINED IT.
    # His opponents were right in that real mechanical systems, in theory (though hardly in practice) can reduce their entropy AS THEY DEFINED IT.
    # We would all be better off using a different word to distinguish the entropy of a {{sc|pdf}}, a nice, clearly defined mathematical construction, from the “entropy” of a specific mechanical system, a rather less well defined mathematical construction. (You can come up with a consistent mathematical definition for this “specific” entropy, but the result doesn’t quite mean what you probably think it means.)
    # The fact that the {{sc|pdf}} entropy is (in practical terms) equal to the (per-instance) entropy is an example of a not-infrequent situation in science: two conceptually very different mathematical ideas, when not well understood, are considered to be the same thing. At first this allows for progress, but once the field is understood, the conflating of the two ideas (which usually occurs through using language inexactly) is inexcusable. Unfortunately it is a rare case indeed where textbook writers are willing to break with the past and modify their language so as to undo this confusion.

    Another view of this is to bring classical thermodynamics into the mix.
    One mathematically consistent way to look at the world is via statistical mechanics, utilizing
    {{sc|pdf}}s and appropriately defined entropy as I have discussed.
    Another mathematically consistent viewpoint is axiomatic thermodynamics which takes
    concepts like temperature, entropy, and the second law as unprovable starting points.
    What is not consistent, and where one gets into trouble (claiming things like “the second
    law is only true on average”), is where one attempts to utilize the statistical mechanics
    viewpoint, but applies it not to the calculation of {{sc|pdf}}s, but to the calculation of
    the average properties of some ”’specific”’ collection of molecules.
    If you’re going to do this, you need to be very careful about exactly what you are claiming
    is a specific property of your collection of molecules vs what is a property of the set
    of all collections of molecules. The astute reader will realize that Gibbs’ ensembles are,
    essentially, a way to deal with this issue and that, though not using my language, he is
    concerned with calculating {{sc|pdf}}’s and their properties.

    =Zermelo’s Criticism of the H-Theorem=

    Along with the misguided attacks on the H-theorem that we have discussed
    (those that mistake the evolution of the {{sc|pdf}} for the evolution of the
    system), there is a more interesting attack, first presented by Zermelo.
    The argument goes thus:
    Liouville’s theorem tells us that under evolution via a Hamiltonian, the
    measure of a subset of phase space does not change. It’s a short step from
    this to showing that this means that the H of a mechanical system cannot
    change (for any {{sc|pdf}}). After thinking about this for a few seconds, this
    actually becomes quite reasonable, especially when thought of in the context
    of our description of file compression above. What we have is a system with
    a certain amount of uncertainty (the initial {{sc|pdf}}) along with deterministic
    evolution in time which is not adding any more uncertainty.
    (How can one reconcile this with Boltzmann’s proof of the H theorem?
    That proof includes an expression describing the scattering after
    interaction of two components, and reduces this to some sort of probabilistic
    expression. If the Hamiltonian is taken as gospel, this reduction must be
    invalid, and must be ignoring correlations in the components from earlier
    interactions that, although apparently small, are actually essential.)

    This is something of a kick in the pants, and strikes me as much
    more problematic than the earlier attacks on the H theorem.
    My take on this matter (and I’d love to be corrected if I am wrong) is that this
    can be viewed in two ways.
    * One could attempt to argue that H (or the equivalent, entropy) has not really increased because there exist fiendishly complicated correlations between the various components of the system; these correlations are, however, not in any way apparent to our eyes, and so the system appears to have become more disordered. It’s hard to keep this up, however, across all physical phenomena, for all of time. This argument is essentially claiming that the disorder of the world (and its increase) is only in our brains, not in reality.
    * Alternatively one could argue that, although these correlations between components grow for some amount of time, every so often something occurs that ruins the coherence, and that it is ultimately this something that is driving the second law. In the pre-quantum past this something was called “molecular disorder”, and now we might call it “collapse of the wave function”. This is the view I espouse and is, I suspect, what most physicists would agree with if pushed. What is interesting is that so important an issue, ”’the”’ driver of entropy increase, is simply not mentioned in the same elementary textbooks that make such a mess of explaining the supposed problems with the second law.

    The reader will, I trust, not have missed the remarkable similarity between
    this discussion and the general problem of the evolution of quantum systems.


    =Ergodicity=

    A final related issue that sometimes causes confusion, though more so in the past,
    is the issue of ergodicity. Ergodicity is the claim that a ”’specific”’ mechanical
    system, if left for long enough, evolves through all the states of the {{sc|pdf}},
    with the amount of time spent in the neighborhood of each state being
    proportional to the probability associated by the {{sc|pdf}} with that neighborhood.

    The ergodic assumption is, to clarify, not a part of the calculation of a {{sc|pdf}}
    or how a {{sc|pdf}} evolves in time; it is useful when trying to connect the abstract
    idea of a {{sc|pdf}} to the concrete reality of a specific physical system, the idea
    being something like: if the ergodic hypothesis is true, then the specific
    mechanical system (collection of gas molecules or whatever), evolves through
    enough states over a macroscopic period of time that what our senses and our
    instruments see is simply an average, moreover that average (over time, for
    this specific instance of the mechanical system), is the same as the average
    one calculates by averaging over the {{sc|pdf}}.
    Maxwell and Boltzmann on occasion justified what they were doing on the basis
    of the ergodic hypothesis.
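A classic toy example of quasi-ergodic behavior (my choice of illustration, not one Maxwell or Boltzmann used) is irrational rotation of a circle: a single trajectory passes arbitrarily close to every point, and the time average of a nice observable along the trajectory converges to its space average.

```python
from math import cos, pi, sqrt

# Irrational rotation on the circle: x -> x + alpha (mod 1). For irrational
# alpha the orbit never repeats and comes arbitrarily close to every point.
alpha = (sqrt(5) - 1) / 2          # golden-ratio rotation number
x, total = 0.0, 0.0
n_steps = 100_000
for _ in range(n_steps):
    total += cos(2 * pi * x)       # observable sampled along the trajectory
    x = (x + alpha) % 1.0
time_average = total / n_steps     # space average of cos over the circle is 0
```

After a hundred thousand steps the time average is tiny, matching the space average of the observable, which is exactly the kind of agreement the (quasi-)ergodic hypothesis was invoked to justify for mechanical systems.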

    If one is even slightly familiar with measure theory, the ergodic assumption
    appears to have to be false; one is trying to map a trajectory (a single
    continuous line) onto a volume, and measure theory tells us that while this
    can be done, it cannot be done with a continuous mapping. The bottom line is
    that mathematicians fairly easily proved that the ergodic assumption was
    false. However it appears that what Maxwell and Boltzmann meant by the ergodic
    assumption was not the exact ergodic assumption described above but something
    that looks pretty much the same to physicists but not to mathematicians, the quasi-ergodic
    assumption, which assumes that while the system will not pass through every
    state available, it will pass ”’arbitrarily close”’ to every available state.

    Even this less demanding quasi-ergodic assumption is not
    necessarily true for certain specific states of certain specific mechanical
    systems. One can imagine, for example, a collection of billiard balls arranged
    so carefully (as a lattice perhaps) in such a way that as time goes by they
    continue to bounce off each other, while retaining the lattice structure,
    forever. But this is clearly a somewhat pathological example.
    Mathematicians have delighted in looking at this problem in ever
    finer detail, asking if there are conditions one might place on either the
    initial state or the collection of forces (ie the hamiltonian/lagrangian)
    governing the time-evolution of the system, that will compel the system to
    either be or not be quasi-ergodic. Their conclusion seems to be that
    many interesting classical-mechanical systems are in fact quasi-ergodic.

    This discussion is, to be honest, quite irrelevant to our real-world interest in
    statistical mechanics. In the real world, what we want to know is to what
    extent the averages we can calculate easily (ie averages over a {{sc|pdf}}) will
    match what we measure (ie averages over some finite volume and some finite
    timespan of the evolution of a specific instance of a mechanical system);
    sometimes we are more ambitious and also want to know the extent of the
    deviations we might expect our real-world measurements to show from the
    {{sc|pdf}} averages. Ergodicity, as the mathematicians treat it, is
    basically useless for this task.
    * First of all, it is clear that minute perturbations to the mechanical system (for example, gravitational effects of other planets), while presumably having no effect on large-scale averages like density and pressure, have a significant effect on ergodicity or the lack thereof: return again to our finely balanced lattice of moving billiard balls. Ergodicity is an astonishingly brittle property.
    * Secondly, real-world systems are, of course, quantum mechanical, and while classical mechanics is frequently a fine approximation to their behavior, it is not at all obvious that a system proved to be ergodic (or not) as a classical system is actually so as a quantum mechanical system.
    * Thirdly, it is of little use to know that over a _long enough_ duration of time a time average matches a {{sc|pdf}} average. One wants to know the behavior over a specific duration of time, eg the duration during which one's experimental sensors are active.
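    The third point can be made numerically with a deliberately simple stand-in (my own hypothetical example, using independent draws rather than a real mechanical system): a finite-duration average deviates from the {{sc|pdf}} average, with a typical spread that shrinks only like one over the square root of the duration for weakly correlated data.

```python
# Hypothetical sketch: finite-duration averages scatter around the pdf
# average; the scatter shrinks roughly like 1/sqrt(duration).
import random
import statistics


def finite_time_averages(duration, n_trials, seed=0):
    """Sample means of `duration` uniform draws, repeated n_trials times."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(duration)) / duration
        for _ in range(n_trials)
    ]


for duration in (10, 1000):
    spread = statistics.stdev(finite_time_averages(duration, n_trials=500))
    print(duration, spread)
# the spread falls by roughly a factor of 10 as the duration grows 100-fold
```

    A real mechanical system would have correlated, not independent, samples, which generally makes the finite-duration deviations larger, reinforcing the point that long-time guarantees say little about a specific measurement window.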

    As far as I can tell, as real-world physicists we pretty much simply make
    the assumption that, for any system we care about, the myriad sources of
    randomness in the world (minute perturbations, quantum effects, the finite
    size and duration of measurements) all blend together in such a way that
    {{sc|pdf}} averages are expected to match experimental results. I know of no
    mathematical results that come even close to proving that this is actually
    expected to be the case for real-world conditions, though it seems like the
    sort of thing that could be proved if one were smart enough.

  • Saucy Wench

    Yeah, I read all that. *rolls eyes*

  • Steinn Sigurdsson

    Oh my, now we’re all, like, serious and stuff.

    First, let's remember that there is a qualitative difference between infinite time and merely a very long time.

    Now, let me present a toy model of the universe. Fans of some recent speculation may feel free to treat it as a real model of the universe.

    Consider a 2-D Euclidean sheet.
    Divide it into unit pixels.
    Without loss of generality, let each pixel have two states, 1 and 0.

    Let there be some initial time, T0, and let time advance in discrete steps, dT.

    Each pixel may change state according to some rules, based only on those nearby pixels in “causal contact” (ie after N steps, only pixels whose distance, s, is less than NdT away affect the pixel), for some metric on the space.

    Now, assume that “a priori” if you sample a patch of this space of k pixels, the probability of any microstate is just 1/2^k
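    The setup above can be sketched in a few lines. The update rule here (XOR of the four nearest neighbours) and the grid size are purely illustrative choices of mine; the comment deliberately leaves the rules unspecified. The rule is local, so after N steps a pixel has only been influenced by pixels within distance N, as required by the "causal contact" condition.

```python
# A minimal, hypothetical rendering of the toy universe: a 2-D grid of
# 0/1 pixels on a torus, discrete time steps, and a local update rule
# (XOR of the four nearest neighbours, chosen only for illustration).
def step(grid):
    """One time step dT: each pixel updates from its causal neighbourhood."""
    n = len(grid)   # assumes a square n x n grid
    return [
        [
            (grid[(i - 1) % n][j]
             ^ grid[(i + 1) % n][j]
             ^ grid[i][(j - 1) % n]
             ^ grid[i][(j + 1) % n])
            for j in range(n)
        ]
        for i in range(n)
    ]


grid = [[0] * 8 for _ in range(8)]
grid[4][4] = 1                 # a single excited pixel at T0
for _ in range(3):             # advance three steps of dT
    grid = step(grid)
print(sum(sum(row) for row in grid))   # excited pixels, all inside the light cone
```

    Because the rule is linear, this particular choice will never generate all microstates from a one-pixel seed, which already hints at Steinn's point that whether all states are reached depends on the initial condition and the rules.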

    Now, there are logically 4 possibilities:

    a) finite space and finite time

    b) finite space and infinite time

    c) infinite space and finite time

    d) infinite space and infinite time

    So, in any of the 4 possibilities above, is it logically possible for all finite substates to be generated?

    For a) and b) there is only a finite number of allowed states; in either case it is possible, but not necessary, that all states are accessed (given a long enough finite time).

    For c) clearly you can NOT access all states;

    So the only remaining option is d), for which we can ask whether all possible finite states are reached somewhere on the sheet at some time.

    The answer to that is “depends” – it depends on the sheet “initial condition” and on the rules.

    Further, I would confidently claim that for such a system, in fact for any infinite system, the answer to whether all states are reached, or whether some finite or infinite subset is never reached, is formally undecidable for most rules for changing states (because for many rules this reduces to the Turing halting problem).

    So there. We may have infinite time and either finite or infinite space, but not only is it logically possible that some states are never reached, the actual answer may be unobtainable.

    Any resemblance to holography or Wolfram’s speculations is a pure coincidence.

    The possibility of continuous states vs discrete states is interesting; in QFT there is an assumption of an asymptotically static and flat background space. This does not hold in reality, and given our actual cosmology combined with the finite speed of light, the question of truly continuous quantum states is somewhat ill defined.

  • Slacker

    Cox has a stupid face.



About Sean Carroll

Sean Carroll is a Senior Research Associate in the Department of Physics at the California Institute of Technology. His research interests include theoretical aspects of cosmology, field theory, and gravitation. His most recent book is The Particle at the End of the Universe, about the Large Hadron Collider and the search for the Higgs boson. Here are some of his favorite blog posts, home page, and email: carroll [at] .
