RNA Expression Analysis Using a 30 Base Pair Resolution Escherichia coli Genome Array

Douglas W. Selinger1, Kevin J. Cheung2, Rui Mei3, Eric M. Johansson3, Craig S. Richmond5, Frederick R. Blattner5, David J. Lockhart3,4, and George M. Church1*

1Department of Genetics, Harvard Medical School, 200 Longwood Ave, Boston, MA 02115, 2Harvard College, Cambridge, MA 02138, 3Affymetrix Inc., 3380 Central Expressway, Santa Clara, CA, 4Genomics Institute of the Novartis Research Foundation, 3115 Merryfield Row, San Diego, CA 92121, 5Laboratory of Genetics, University of Wisconsin, Madison, WI 53706. *Corresponding author (e-mail church@arep.med.harvard.edu).

High-density DNA microarrays allow the simultaneous quantitation of large numbers of transcripts. For smaller, completely sequenced organisms, every open reading frame (ORF) in the genome can be assayed. All of these analyses to date have focused strictly on ORFs, usually with a single assay per ORF, and with no attention paid to intergenic regions.

In this study, we describe the first use of a "genome" array, which has probes for both ORFs and intergenic regions in the sequenced model organism Escherichia coli. This array, synthesized by Affymetrix using a highly parallel light-directed in situ oligonucleotide synthesis method, contains almost 300,000 oligonucleotide probes of known sequence. This large number of oligos allows the genome to be sampled at an average resolution of 1 oligo probe every 30 bases. Intergenic regions are probed at a higher resolution with 1 probe every 6 bases compared with 1 every 60 for the ORFs. A "reverse complement" array has also been designed which allows the opposite strand to be probed in the same way.
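The mixed-resolution tiling described above can be illustrated with a minimal sketch. It assumes a toy genome with hand-picked ORF coordinates and simply places a probe start every 6 bases in intergenic regions and every 60 bases inside ORFs. The function name `tile_genome`, the interval representation, and the toy coordinates are hypothetical and are not taken from the array design itself; probe length, strand, and any control probes are ignored.

```python
from typing import List, Tuple

def tile_genome(
    genome_length: int,
    orfs: List[Tuple[int, int]],
    orf_step: int = 60,        # spacing inside ORFs, as stated in the text
    intergenic_step: int = 6,  # spacing in intergenic regions, as stated in the text
) -> List[Tuple[int, str]]:
    """Return (probe_start, region_type) pairs for one strand.

    `orfs` is assumed to be a sorted list of non-overlapping, half-open
    (start, end) intervals in 0-based coordinates. This is an illustrative
    assumption, not the layout used for the real array.
    """
    probes: List[Tuple[int, str]] = []
    cursor = 0
    for orf_start, orf_end in orfs:
        # Dense tiling of the intergenic stretch before this ORF.
        probes.extend((pos, "intergenic") for pos in range(cursor, orf_start, intergenic_step))
        # Sparse tiling inside the ORF.
        probes.extend((pos, "orf") for pos in range(orf_start, orf_end, orf_step))
        cursor = orf_end
    # Dense tiling of whatever follows the last ORF.
    probes.extend((pos, "intergenic") for pos in range(cursor, genome_length, intergenic_step))
    return probes

if __name__ == "__main__":
    # Toy example: a 1,000-base "genome" with a single ORF at [100, 700).
    toy_probes = tile_genome(1000, [(100, 700)])
    n_orf = sum(1 for _, kind in toy_probes if kind == "orf")
    n_intergenic = len(toy_probes) - n_orf
    print(f"{n_orf} ORF probes, {n_intergenic} intergenic probes")
    print(f"average spacing: {1000 / len(toy_probes):.1f} bases per probe")
```

The average spacing of such a design depends on the fraction of the sequence that is coding: the larger the coding fraction, the closer the overall average moves toward the sparser ORF spacing.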

A genome array with dense lateral coverage of the genome has a number of uses, including identifying transcript starts and stops, studying operon structure, identifying small and antisense RNAs, and potentially acting as a whole-genome RNA secondary structure readout. These applications are demonstrated using a software package we have developed called Genome Array Processing Software, or 'GAPS', which allows oligo-by-oligo analysis of array results. We have carried out a comparison of E. coli cells growing in log phase vs. stationary phase and have shown that the assay can sensitively and accurately identify ORFs which are known to be growth-phase regulated, as well as identify new putatively growth-phase regulated genes. Complete coverage of both strands has also produced the perhaps surprising result that the vast majority of the genome is transcribed at a detectable level.