<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Should the Data be Public?</title>
	<atom:link href="http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/</link>
	<description>Random samplings from a universe of ideas.</description>
	<lastBuildDate>Mon, 09 Nov 2009 05:46:49 -0600</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Particle Physics 2.0? &#171; Charm &#38;c.</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-18025</link>
		<dc:creator>Particle Physics 2.0? &#171; Charm &#38;c.</dc:creator>
		<pubDate>Wed, 04 Apr 2007 02:20:28 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-18025</guid>
		<description>[...] A couple of issues are raised. One is whether the data should be made available to the public (in ASCII four-vectors or whatever); after all the taxpayers fund us, shouldn&#8217;t they get their money&#8217;s worth? I certainly agree that this is desirable, although extremely complicated. Our experimental architectures have not been designed to enable this in a simple manner (it can take literally months for a new collaboration member to learn to access data!), but if this was specified as a requirement from the beginning, as I believe it is for NASA projects, it could probably be done at the expense of a lot of physicist-years. However what is in question is not the data, but the analyses that follow, and even projects that release their data allow that what you extract from the data is your work. [...]</description>
		<content:encoded><![CDATA[<p>[...] A couple of issues are raised. One is whether the data should be made available to the public (in ASCII four-vectors or whatever); after all the taxpayers fund us, shouldn&#8217;t they get their money&#8217;s worth? I certainly agree that this is desirable, although extremely complicated. Our experimental architectures have not been designed to enable this in a simple manner (it can take literally months for a new collaboration member to learn to access data!), but if this was specified as a requirement from the beginning, as I believe it is for NASA projects, it could probably be done at the expense of a lot of physicist-years. However what is in question is not the data, but the analyses that follow, and even projects that release their data allow that what you extract from the data is your work. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tony Smith</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-17994</link>
		<dc:creator>Tony Smith</dc:creator>
		<pubDate>Sun, 09 Jul 2006 20:00:35 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-17994</guid>
		<description>Nathaniel, an &quot;experimentalist&quot;, said that he &quot;... disagree[s] with public data. ...&quot; because he will only get &quot;... After working for six years, Three papers. Out of two hundred authors ...&quot;.

Nathaniel goes on to say that &quot;... there&#039;s a simple solution that should satisfy you theorists nicely: JOIN THE EXPERIMENT! ...&quot;.

A flaw in Nathaniel&#039;s solution is that not every theorist/analyst will get to be affiliated with the experiment collaboration.

It seems to me that a more comprehensive, even simpler, solution would be to make the data public, in a format that is the work-product of Nathaniel and his fellow experimenters, by a paper authored by Nathaniel and his fellow experimenters.
Then, any theorist/analyst (whether or not affiliated) should cite that paper, so that Nathaniel et al would have a very high citation rating.

Further, if any theorist/analyst might ask Nathaniel et al for help in understanding the data, Nathaniel et al should be listed as coauthors for providing such help.

I have tried to follow that spirit in stuff that I have written. For example, in my writings about Fermilab T-quark data, I give explicit credit to Erich Ward Varnes whose 1997 UC Berkeley PhD thesis contained data that I found very useful.

Tony Smith
http://www.valdostamuseum.org/hamsmith/</description>
		<content:encoded><![CDATA[<p>Nathaniel, an &#8220;experimentalist&#8221;, said that he &#8220;&#8230; disagree[s] with public data. &#8230;&#8221; because he will only get &#8220;&#8230; After working for six years, Three papers. Out of two hundred authors &#8230;&#8221;.</p>
<p>Nathaniel goes on to say that &#8220;&#8230; there&#8217;s a simple solution that should satisfy you theorists nicely: JOIN THE EXPERIMENT! &#8230;&#8221;.</p>
<p>A flaw in Nathaniel&#8217;s solution is that not every theorist/analyst will get to be affiliated with the experiment collaboration.</p>
<p>It seems to me that a more comprehensive, even simpler, solution would be to make the data public, in a format that is the work-product of Nathaniel and his fellow experimenters, by a paper authored by Nathaniel and his fellow experimenters.<br />
Then, any theorist/analyst (whether or not affiliated) should cite that paper, so that Nathaniel et al would have a very high citation rating.</p>
<p>Further, if any theorist/analyst might ask Nathaniel et al for help in understanding the data, Nathaniel et al should be listed as coauthors for providing such help.</p>
<p>I have tried to follow that spirit in stuff that I have written. For example, in my writings about Fermilab T-quark data, I give explicit credit to Erich Ward Varnes whose 1997 UC Berkeley PhD thesis contained data that I found very useful.</p>
<p>Tony Smith<br />
<a href="http://www.valdostamuseum.org/hamsmith/" rel="nofollow">http://www.valdostamuseum.org/hamsmith/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nathaniel</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-17995</link>
		<dc:creator>Nathaniel</dc:creator>
		<pubDate>Fri, 30 Jun 2006 14:31:45 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-17995</guid>
		<description>I, too, am an experimentalist (neutrinos), and I disagree with public data.  Technical issues have already been discussed, but here&#039;s the rub:

After working for six years on MINOS, I will get ONE (count &#039;em) paper.  OK, I&#039;ll be fair. Three papers. Out of two hundred authors.  If the data were made available publicly, some theorist would come along, do a slightly more sophisticated analysis, and the paper wouldn&#039;t even get cited.

Even worse, to make the data public we now have to publish the methods and documentation on how to use the data (which will NEVER just be a list of 4-vectors; there are correlations and resolution functions on every experiment), and that means even more work for the experimentalists.

Don&#039;t get me wrong. I love what I do.  But I slave over computer code, measure crosstalk, invent calibration sources, crawl under dusty machines, travel, travel, travel, sit on interminable phone calls (every day) so that I can get those few weeks of analysing the data before everyone else.  Now I can&#039;t even do that?

Happily, there&#039;s a simple solution that should satisfy you theorists nicely: JOIN THE EXPERIMENT!   I need three more people in my calibration group to measure attenuation curves.  I need two more to get automated processing running and document things.   We need people to think deeply about statistics, and to make sure our MC models are good. We need people who understand the theory well to suggest what fits to make and the best way of presenting the data.  But, of course, that&#039;s a lot of work, so not many of you take us up on the offer.

---Nathaniel</description>
		<content:encoded><![CDATA[<p>I, too, am an experimentalist (neutrinos), and I disagree with public data.  Technical issues have already been discussed, but here&#8217;s the rub:</p>
<p>After working for six years on MINOS, I will get ONE (count &#8216;em) paper.  OK, I&#8217;ll be fair. Three papers. Out of two hundred authors.  If the data were made available publicly, some theorist would come along, do a slightly more sophisticated analysis, and the paper wouldn&#8217;t even get cited.</p>
<p>Even worse, to make the data public we now have to publish the methods and documentation on how to use the data (which will NEVER just be a list of 4-vectors; there are correlations and resolution functions on every experiment), and that means even more work for the experimentalists.</p>
<p>Don&#8217;t get me wrong. I love what I do.  But I slave over computer code, measure crosstalk, invent calibration sources, crawl under dusty machines, travel, travel, travel, sit on interminable phone calls (every day) so that I can get those few weeks of analysing the data before everyone else.  Now I can&#8217;t even do that?</p>
<p>Happily, there&#8217;s a simple solution that should satisfy you theorists nicely: JOIN THE EXPERIMENT!   I need three more people in my calibration group to measure attenuation curves.  I need two more to get automated processing running and document things.   We need people to think deeply about statistics, and to make sure our MC models are good. We need people who understand the theory well to suggest what fits to make and the best way of presenting the data.  But, of course, that&#8217;s a lot of work, so not many of you take us up on the offer.</p>
<p>&#8212;Nathaniel</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ars Mathematica &#187; Blog Archive &#187; Releasing LHC Data</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-17985</link>
		<dc:creator>Ars Mathematica &#187; Blog Archive &#187; Releasing LHC Data</dc:creator>
		<pubDate>Wed, 28 Jun 2006 06:15:51 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-17985</guid>
		<description>[...] I saw a story on Cosmic Variance that I found vaguely shocking. At the SUSY06 conference, there was a rancorous discussion about whether the data from the Large Hadron Collider should be made public. This is probably my ignorance about how high-energy physics works, but I have trouble believing that the answer is anything other than &#8220;of course&#8221; (perhaps after an embargo period to reward the people actually working on the detector). Some good news that comes out of the comment thread is that in astronomy such public data is readily available. [...]</description>
		<content:encoded><![CDATA[<p>[...] I saw a story on Cosmic Variance that I found vaguely shocking. At the SUSY06 conference, there was a rancorous discussion about whether the data from the Large Hadron Collider should be made public. This is probably my ignorance about how high-energy physics works, but I have trouble believing that the answer is anything other than &ldquo;of course&rdquo; (perhaps after an embargo period to reward the people actually working on the detector). Some good news that comes out of the comment thread is that in astronomy such public data is readily available. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Heffernan</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-18024</link>
		<dc:creator>David Heffernan</dc:creator>
		<pubDate>Sun, 25 Jun 2006 13:50:30 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-18024</guid>
		<description>On Belle we do make small amounts of data available on request, but only a fraction of the total data set. Students use it for high school science projects, for example.  Are there any other HEP experiments that do this?

I think the biggest problem with releasing data from the LHC experiments would be the sheer volume.  How much would CMS or ATLAS record in a day?  What kind of background reduction are people expecting here?</description>
		<content:encoded><![CDATA[<p>On Belle we do make small amounts of data available on request, but only a fraction of the total data set. Students use it for high school science projects, for example.  Are there any other HEP experiments that do this?</p>
<p>I think the biggest problem with releasing data from the LHC experiments would be the sheer volume.  How much would CMS or ATLAS record in a day?  What kind of background reduction are people expecting here?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paolo Bizzarri</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-18023</link>
		<dc:creator>Paolo Bizzarri</dc:creator>
		<pubDate>Sat, 24 Jun 2006 20:24:04 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-18023</guid>
		<description>Sean,

let me comment on this phrase of yours:

&quot;In principle I&#039;m in favor of releasing the data, in practice I doubt that it would work. Without an intimate knowledge of the idiosyncrasies of the detector, too many spurious results would be hard to resist.&quot;

My idea is that the release of data is really similar to the release of software code in open source projects (my professional field).

For example, for software products like Netscape/Mozilla/Firefox, the code was originally proprietary and secret; then, there was the decision to release the code of the product itself. The idea was to create a large community of developers, able to contribute to the improvement of the product.

However, for more than one year after the release of the code, the contribution from the developers outside the original development team was minimal. There was a lot of interest from other developers, but they were not able to provide any significant change to the code.

The reason was understood shortly afterwards. The code itself was only part of the knowledge that had built the product. Each line of code was the result of several decisions made by the developers, and contained assumptions that were not easy to make explicit.

In short, the code was the result of a long and complex process, and in order to contribute to the code, you first had to become part of the process. Only after the assumptions became clearer was it possible for other people to make significant contributions.

What relation do I see with the LHC data?

Data are the result of complex processes, where there is a lot of hidden knowledge that is necessary in order to understand what a number really means in a certain context. People outside the process cannot understand what the raw data really mean without a proper understanding of the process itself.

However, if the parallel I have made is at all meaningful, IT IS useful to make data available, as long as you understand that you have to make clear the process through which they are produced and processed.

Then, other people can make useful proposals on how to improve the understanding of data. In fact, making the process public has significantly improved the process itself.</description>
		<content:encoded><![CDATA[<p>Sean,</p>
<p>let me comment on this phrase of yours:</p>
<p>&#8220;In principle I&#8217;m in favor of releasing the data, in practice I doubt that it would work. Without an intimate knowledge of the idiosyncrasies of the detector, too many spurious results would be hard to resist.&#8221;</p>
<p>My idea is that the release of data is really similar to the release of software code in open source projects (my professional field).</p>
<p>For example, for software products like Netscape/Mozilla/Firefox, the code was originally proprietary and secret; then, there was the decision to release the code of the product itself. The idea was to create a large community of developers, able to contribute to the improvement of the product.</p>
<p>However, for more than one year after the release of the code, the contribution from the developers outside the original development team was minimal. There was a lot of interest from other developers, but they were not able to provide any significant change to the code.</p>
<p>The reason was understood shortly afterwards. The code itself was only part of the knowledge that had built the product. Each line of code was the result of several decisions made by the developers, and contained assumptions that were not easy to make explicit.</p>
<p>In short, the code was the result of a long and complex process, and in order to contribute to the code, you first had to become part of the process. Only after the assumptions became clearer was it possible for other people to make significant contributions.</p>
<p>What relation do I see with the LHC data?</p>
<p>Data are the result of complex processes, where there is a lot of hidden knowledge that is necessary in order to understand what a number really means in a certain context. People outside the process cannot understand what the raw data really mean without a proper understanding of the process itself.</p>
<p>However, if the parallel I have made is at all meaningful, IT IS useful to make data available, as long as you understand that you have to make clear the process through which they are produced and processed.</p>
<p>Then, other people can make useful proposals on how to improve the understanding of data. In fact, making the process public has significantly improved the process itself.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amara</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-18022</link>
		<dc:creator>Amara</dc:creator>
		<pubDate>Sat, 24 Jun 2006 19:25:27 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-18022</guid>
		<description>No one has yet mentioned the &lt;a href=&quot;http://marsrovers.nasa.gov/gallery/all/spirit.html&quot; rel=&quot;nofollow&quot;&gt;Mars Rover data&lt;/a&gt; (link for Spirit). That is one of the most visible and successful open planetary science databases that exist now.</description>
		<content:encoded><![CDATA[<p>No one has yet mentioned the <a href="http://marsrovers.nasa.gov/gallery/all/spirit.html" rel="nofollow">Mars Rover data</a> (link for Spirit). That is one of the most visible and successful open planetary science databases that exist now.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard E.</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-18021</link>
		<dc:creator>Richard E.</dc:creator>
		<pubDate>Sat, 24 Jun 2006 02:30:58 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-18021</guid>
		<description>I have been thinking more about this, and I think the argument that &quot;theorists will always make a hash of data analysis&quot; is bogus.

Again turning to the analogy with cosmology/astrophysics, I suspect many theorists in cosmology (myself included) are learning more about Bayesian statistics, priors, Markov Chains and all the rest of it than we would have ever dreamed. And I can certainly point to some deeply flawed papers in the literature that might never have seen the light if the raw data was not freely available.   However, theorists *can* learn this stuff, and don&#039;t like to look silly in public, so they have plenty of motivation for doing so.

In the end the theorists will either learn enough of the subtleties to do it themselves, or work with experimentalists who know how to perform the relevant analyses. Many (most?) papers are flawed in some way, and the community would react to the flurry of theorist-written data-driven papers by hiking its overall level of skepticism a notch or two. Just as it did with the arrival of the arXiv, which does an end-run around peer review (for what that is worth, but don&#039;t get me started).

As Sean pointed out above, one side-effect of the present system is that it is very hard for particle theorists to collaborate with experimentalists, if the whole collaboration needs to sign off on papers that any single member writes (and this is *after* the data is in the public domain).  Again speaking from my own experience, my foray into the world of data-analysis has largely been conducted in collaboration with someone who understood the issues involved at the outset (although not an &quot;experimentalist&quot; in the strict sense of the term), and it is a singularly productive mode of collaboration. To the extent that the &quot;rules&quot; of experimental particle physics discourage this sort of collaboration they are clearly counter-productive.

Secondly, the cosmological community has benefitted greatly from the development of the CosmoMC package which greatly simplifies the Monte Carlo Markov Chain analyses of cosmological data. (It is not theorist-proof however, as I have seen several publicly displayed figures that showed chains which, to my now practiced eye, were clearly unconverged).  My guess is that if more experimental particle physics data was made publicly available it would seed a small industry in the development of software tools that facilitated its analysis.</description>
		<content:encoded><![CDATA[<p>I have been thinking more about this, and I think the argument that &#8220;theorists will always make a hash of data analysis&#8221; is bogus.</p>
<p>Again turning to the analogy with cosmology/astrophysics, I suspect many theorists in cosmology (myself included) are learning more about Bayesian statistics, priors, Markov Chains and all the rest of it than we would have ever dreamed. And I can certainly point to some deeply flawed papers in the literature that might never have seen the light if the raw data was not freely available.   However, theorists *can* learn this stuff, and don&#8217;t like to look silly in public, so they have plenty of motivation for doing so.</p>
<p>In the end the theorists will either learn enough of the subtleties to do it themselves, or work with experimentalists who know how to perform the relevant analyses. Many (most?) papers are flawed in some way, and the community would react to the flurry of theorist-written data-driven papers by hiking its overall level of skepticism a notch or two. Just as it did with the arrival of the arXiv, which does an end-run around peer review (for what that is worth, but don&#8217;t get me started).</p>
<p>As Sean pointed out above, one side-effect of the present system is that it is very hard for particle theorists to collaborate with experimentalists, if the whole collaboration needs to sign off on papers that any single member writes (and this is *after* the data is in the public domain).  Again speaking from my own experience, my foray into the world of data-analysis has largely been conducted in collaboration with someone who understood the issues involved at the outset (although not an &#8220;experimentalist&#8221; in the strict sense of the term), and it is a singularly productive mode of collaboration. To the extent that the &#8220;rules&#8221; of experimental particle physics discourage this sort of collaboration they are clearly counter-productive.</p>
<p>Secondly, the cosmological community has benefitted greatly from the development of the CosmoMC package which greatly simplifies the Monte Carlo Markov Chain analyses of cosmological data. (It is not theorist-proof however, as I have seen several publicly displayed figures that showed chains which, to my now practiced eye, were clearly unconverged).  My guess is that if more experimental particle physics data was made publicly available it would seed a small industry in the development of software tools that facilitated its analysis.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: adam</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-18020</link>
		<dc:creator>adam</dc:creator>
		<pubDate>Sat, 24 Jun 2006 00:55:23 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-18020</guid>
		<description>I&#039;m on the side of the &#039;proprietary period then fully open&#039; model for data distribution (so that the team get to use the data in the short term, then all the raw data and data products get released; the problem here is serving potentially large amounts of raw data, of course, so there might be some fees for getting the raw data).

The data belongs to taxpayers, so far as I&#039;m concerned.</description>
		<content:encoded><![CDATA[<p>I&#8217;m on the side of the &#8216;proprietary period then fully open&#8217; model for data distribution (so that the team get to use the data in the short term, then all the raw data and data products get released; the problem here is serving potentially large amounts of raw data, of course, so there might be some fees for getting the raw data).</p>
<p>The data belongs to taxpayers, so far as I&#8217;m concerned.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: superweak</title>
		<link>http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/comment-page-1/#comment-18019</link>
		<dc:creator>superweak</dc:creator>
		<pubDate>Fri, 23 Jun 2006 20:05:49 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.discovermagazine.com/cosmicvariance/2006/06/23/should-the-data-be-public/#comment-18019</guid>
		<description>There&#039;s a catch here: by the time a dataset is understood enough to be ready for release, the collaborations will, as Sean puts it, have swept up the low-hanging fruit.  It takes &lt;em&gt;years&lt;/em&gt; for complex detectors to be fully understood, and the calibrations, systematics checks, and corrections are on the whole done by people using that information to do an analysis.  (Even the Quaero public interface for testing hypotheses against D0&#039;s data restricts you to a few well-understood samples.)  Any early public release of all the data would most likely result in lots of junk preprints as people saw badly-understood detector effects and called them new physics -- if CDF II had just gone and immediately published the four-vectors rolling out of its reconstruction software, I&#039;m sure someone would have noticed a huge excess of monojet+missing energy events.  Certainly there are theorists who are conversant with issues of triggers, fake rate, and such.  However they are not paid to sit around all day thinking about how the information from the detector could be &lt;em&gt;wrong&lt;/em&gt;, and experimentalists are.

In my experience experimentalists are extremely suspicious of results, since they know what actually goes into making them -- hence the tradition of requiring confirmation from an independent experiment for discovery claims.  Without some kind of (at least short-term) data encapsulation, problems will arise: imagine experiments looking at each other&#039;s data! (Related reasons are behind the rise of &quot;blind analyses,&quot; where a collaboration hides its data from &lt;em&gt;itself&lt;/em&gt;, for fear that it will find what it wants to find.)  Even if only a small fraction of a collaboration reads a paper thoroughly, that&#039;s still an awful lot of experience-years.

And finally, a point that vaguely amuses me: we are used to a feedback system where either (a) theorists predict something, experiment finds it, theorists claim vindication because the prediction was ante hoc instead of post hoc, or (b) experiment finds something unexpected, everyone scrambles to see how models could accommodate this result, some things can&#039;t and are excluded.  What happens to this waltz if the theorists get to look at the data at the same time the experimentalists do?

[Note: none of this implies that I don&#039;t think processed HEP data should be released after a (longish) while, or that short-term data release might not be a good thing if we find ourselves with a one-detector ILC.]</description>
		<content:encoded><![CDATA[<p>There&#8217;s a catch here: by the time a dataset is understood enough to be ready for release, the collaborations will, as Sean puts it, have swept up the low-hanging fruit.  It takes <em>years</em> for complex detectors to be fully understood, and the calibrations, systematics checks, and corrections are on the whole done by people using that information to do an analysis.  (Even the Quaero public interface for testing hypotheses against D0&#8217;s data restricts you to a few well-understood samples.)  Any early public release of all the data would most likely result in lots of junk preprints as people saw badly-understood detector effects and called them new physics &#8212; if CDF II had just gone and immediately published the four-vectors rolling out of its reconstruction software, I&#8217;m sure someone would have noticed a huge excess of monojet+missing energy events.  Certainly there are theorists who are conversant with issues of triggers, fake rate, and such.  However they are not paid to sit around all day thinking about how the information from the detector could be <em>wrong</em>, and experimentalists are.</p>
<p>In my experience experimentalists are extremely suspicious of results, since they know what actually goes into making them &#8212; hence the tradition of requiring confirmation from an independent experiment for discovery claims.  Without some kind of (at least short-term) data encapsulation, problems will arise: imagine experiments looking at each other&#8217;s data! (Related reasons are behind the rise of &#8220;blind analyses,&#8221; where a collaboration hides its data from <em>itself</em>, for fear that it will find what it wants to find.)  Even if only a small fraction of a collaboration reads a paper thoroughly, that&#8217;s still an awful lot of experience-years.</p>
<p>And finally, a point that vaguely amuses me: we are used to a feedback system where either (a) theorists predict something, experiment finds it, theorists claim vindication because the prediction was ante hoc instead of post hoc, or (b) experiment finds something unexpected, everyone scrambles to see how models could accommodate this result, some things can&#8217;t and are excluded.  What happens to this waltz if the theorists get to look at the data at the same time the experimentalists do?</p>
<p>[Note: none of this implies that I don't think processed HEP data should be released after a (longish) while, or that short-term data release might not be a good thing if we find ourselves with a one-detector ILC.]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
