A note on open genomics

By Razib Khan | August 28, 2012 12:25 am

A few months ago I purchased a decent desktop just to run ADMIXTURE and other packages for analyzing genomic data. More recently I set up a ~100 GB Dropbox account, and have started to “push” all of my output files from ADMIXTURE, PLINK, etc., as well as various scripts (Perl, shell, R, etc.), into the public folder. (More precisely, as I type this a script is running ADMIXTURE and moving the output files into the appropriate Dropbox folders, which Dropbox then syncs with the online folders.) I’m doing this for two reasons.
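The ADMIXTURE-to-Dropbox step described above can be sketched in Python. This is a minimal sketch, not the script actually running here. It assumes ADMIXTURE is on the PATH, is invoked as `admixture -jN file.bed K`, and writes `<prefix>.K.Q` and `<prefix>.K.P` into the current working directory. The function names and the destination folder are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def admixture_command(bed_path, k, threads=1):
    """Build the ADMIXTURE command line for one value of K (hypothetical helper)."""
    return ["admixture", f"-j{threads}", str(bed_path), str(k)]


def output_files(bed_path, k):
    """Expected outputs: <prefix>.<K>.Q and <prefix>.<K>.P in the working directory."""
    prefix = Path(bed_path).stem  # "data/hapmap3.bed" -> "hapmap3"
    return [f"{prefix}.{k}.Q", f"{prefix}.{k}.P"]


def run_and_publish(bed_path, k_values, dest_dir, threads=1, dry_run=False):
    """Run ADMIXTURE for each K, then move the results into dest_dir
    (e.g. a synced Dropbox folder; the path is an assumption).
    With dry_run=True nothing is executed or moved; the plan is returned."""
    dest = Path(dest_dir).expanduser()
    plan = []
    if not dry_run:
        dest.mkdir(parents=True, exist_ok=True)
    for k in k_values:
        cmd = admixture_command(bed_path, k, threads)
        plan.append(("run", cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop if ADMIXTURE fails
        for name in output_files(bed_path, k):
            target = str(dest / name)
            plan.append(("move", name, target))
            if not dry_run:
                shutil.move(name, target)
    return plan
```

A call such as `run_and_publish("hapmap3.bed", range(2, 9), "~/Dropbox/Public/admixture", threads=4)` would sweep K from 2 to 8. Running it first with `dry_run=True` shows what would be executed and where each file would land.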

First, I want to make the pipeline of data generation easier for me. Instead of running ADMIXTURE and then laboriously processing the output files in R to generate plots, I’ve now created a system where a few automated scripts start the ADMIXTURE runs, and another script then creates the input files for distruct, runs distruct, trims the output images, and converts them into PNGs. This should allow me to resurrect my side projects, even while I’m rather busy with the “main events” of my life.
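The step that turns ADMIXTURE output into distruct input can be sketched as follows. This is a hypothetical helper, not the script described in the post. It assumes the STRUCTURE-style layouts that distruct reads: an indivq line of the form `index label (%missing) pop : q1 ... qK` and a popq line of the form `pop: mean_q1 ... mean_qK n`. It also assumes you supply one population label per individual (for example, the family ID column of the PLINK `.fam` file) in the same order as the rows of the `.Q` file.

```python
from collections import defaultdict


def q_to_distruct(q_lines, pop_labels):
    """Convert ADMIXTURE .Q rows plus per-individual population labels into
    distruct-style indivq and popq lines (layouts assumed, see lead-in)."""
    rows = [[float(x) for x in line.split()] for line in q_lines if line.strip()]
    if len(rows) != len(pop_labels):
        raise ValueError("need one population label per individual")

    # Number populations 1..n in order of first appearance.
    pop_index = {}
    for label in pop_labels:
        pop_index.setdefault(label, len(pop_index) + 1)

    indivq = []
    by_pop = defaultdict(list)
    for i, (q, label) in enumerate(zip(rows, pop_labels), start=1):
        p = pop_index[label]
        by_pop[p].append(q)
        coeffs = " ".join(f"{x:.4f}" for x in q)
        # Individual label is just the running index; 0% missing is assumed.
        indivq.append(f"{i} {i} (0) {p} : {coeffs}")

    popq = []
    for p in sorted(by_pop):
        members = by_pop[p]
        k = len(members[0])
        # Population line: mean membership per cluster, then sample size.
        means = [sum(q[j] for q in members) / len(members) for j in range(k)]
        coeffs = " ".join(f"{x:.4f}" for x in means)
        popq.append(f"{p}: {coeffs} {len(members)}")

    return indivq, popq, pop_index
```

Writing the drawparams file, running distruct, and trimming and converting its PostScript output (for instance with ImageMagick's `convert -trim`) are left out of this sketch.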

Second, I am beginning to feel that the promise of the “genome blogging revolution” has kind of faded out. Granted, there’s only so much you can do with the same data sets, so I’m going to try to put together large pedigree files in my Dropbox account. But it seems like people need more of a push. Toward that end, I hope that distributing scripts that make the process more “turnkey” will stimulate people going forward.

Addendum: I know that some of the first paragraph is going to be gibberish to some readers. But I hope you’ll appreciate the outcomes of that gibberish!

  • Thomas

    Have you considered Google Drive/Docs if data limits start to be a problem in Dropbox? Their price per GB is substantially lower, in addition to the start-up amount of 20GB. Food for thought.

  • pconroy

    I think the era of Genome Blogging is just beginning. We might be nearing the end of the first phase, which is built on the limited set of SNPs assayed by the autosomal tests from 23andMe and FTDNA (Family Finder), but wait till the gusher of Exome and Full Sequence data gets going, and we could have “amateurs” discovering medically actionable results…

    Phase II is almost upon us. Many pioneers are already looking at their 23andMe Exome Pilot data for stuff right now…

  • jose

    Hopefully this is just the “Trough of Disillusionment” phase of the classic tech Hype Cycle. I actually do think the Hype Cycle is a useful high level model for thinking about the diffusion of new technology.

  • biologist

    Put your code on github and your data in git-annex.

  • Razib Khan

    #4, you’re right. I have a git account; I should start using it.


Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please be aware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com