A gene is all of the DNA elements required in cis for the properly regulated production of a set of RNAs whose sequences overlap in the genome.
I formulated that definition c. 1990, when I started teaching genetics to graduate students. I think that the course I actually taught was quite different from the plans leading to that formulation, but I remember sitting for several hours in a coffee shop in Newark airport and coming up with that definition. This was after the discovery of splicing, transposable elements, remote enhancers, overlapping genes, nested genes, long noncoding RNAs and many short noncoding RNAs, and I imagined discussing literature on each of these topics and its implications for how a gene might be defined. 1990 was before the label “tweet-length” could be applied to it, before the discovery of microRNAs and (most significantly) before complete genome sequences and high-throughput data in the style of ENCODE.
In 2014, as part of my plan to write more but shorter posts, I will also report the history of my own understanding of several of the issues that make defining “a gene” problematic.
Mark Gerstein almost immediately pointed out that he had published a very similar definition in 2007:
The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
See PubMed: PubMed ID 17567988 or
Gerstein lab: http://archive.gersteinlab.org/papers/e-print/grgenerev/preprint.pdf or
Genome Research: http://genome.cshlp.org/content/17/6/669.long