Biomedical Sciences Graduate Program(s)
Biochemistry, Molecular Biology and Genetics
Structural, Computational Biology and Biophysics
Biomedical Engineering
Research Description
High-density oligonucleotide expression arrays have revolutionized our
approach to the discovery of gene function, biological networks and
diagnosis of disease. Because the number of features that can be fabricated
on an array is growing exponentially over time, a number of exciting new
types of arrays have recently emerged. These include: (1) genotyping
arrays, which detect single nucleotide polymorphisms (SNPs) across the
genome; (2) all-exon arrays, which can measure the expression levels of
alternative isoforms; (3) re-sequencing arrays, which allow researchers to
perform comparative sequence analysis; and (4) genomic tiling arrays, which
probe the non-repeat sequence of a genome at high resolution and have
already been applied to the discovery of unannotated transcription
(possibly functional non-coding RNA) (Kapranov P. et al., Kampa D. et
al., Cheng J. et al.), methylated DNA, transcription factor binding sites
(Cawley S. et al.), regions of chromatin modification (Bernstein B.E. et
al.) and origins of replication (Jeon Y. et al.). Unlike expression
arrays, however, these newer arrays leave less room to design features that
are sensitive and specific to their targets, leading to even greater
"probe-specific effects".
A major focus of our laboratory, in collaboration with scientists at
Affymetrix, is the application of physical models to the process of
hybridization, with the aim of improving array design and analysis. While
at Affymetrix, I worked with a probe modeling team on precise physical
modeling of probes, analyzing controlled-concentration spike-in data sets
(i.e., targets spiked in at 14 different pre-determined concentrations
in both simple and complex genomic sample backgrounds). We and other groups
working outside of Affymetrix have found that the scaled intensity versus
concentration profile for responsive probes satisfies a generalized Langmuir
adsorption isotherm form, which is expected from fundamental surface
physical chemistry. This form naturally accounts for (1) probe-specific
affinity as a function of the probe sequence, (2) saturation effects at
high concentrations of target, and (3) non-specific background
hybridization, which also depends on the affinity of the probes. These
models have been applied to the commercial arrays to (1) select probes that
are responsive to target concentrations (Mei R. et al.) and (2) improve
low-level statistical methods that estimate relative target concentration
from intensity data. Moreover, a reasonable percentage of the observed
variation in the fitted duplex hybridization energies is explained by
hydrogen-bond or nearest-neighbor models.
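As a rough illustration of the Langmuir form described above, the following
Python sketch computes probe intensity as a background term plus a capacity
scaled by fractional occupancy, and recovers the affinity and capacity from
noise-free data with a double-reciprocal linear fit. The single-site form, the
parameter names (affinity, capacity, background), and the fitting procedure
are all assumptions made for illustration; this is not the generalized model
or the analysis method used in the work described here.

```python
def langmuir_intensity(conc, affinity, capacity, background=0.0):
    """Intensity predicted by a simple (single-site) Langmuir isotherm.

    All parameter names are hypothetical:
      conc       -- target concentration (arbitrary units, >= 0)
      affinity   -- probe-specific binding constant K (1 / concentration units)
      capacity   -- intensity at full saturation of the feature
      background -- non-specific signal, treated here as a constant
    """
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    occupancy = affinity * conc / (1.0 + affinity * conc)  # fraction of sites bound
    return background + capacity * occupancy


def fit_langmuir(concs, intensities, background=0.0):
    """Estimate (affinity, capacity) with a double-reciprocal linear fit.

    Uses 1/(I - b) = 1/capacity + (1/(capacity * K)) * (1/c), so an ordinary
    least-squares line through (1/c, 1/(I - b)) gives both parameters.
    Assumes a known background, positive concentrations, and I > b; real
    array data are noisy and would call for a more robust nonlinear fit.
    """
    xs = [1.0 / c for c in concs]
    ys = [1.0 / (i - background) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    capacity = 1.0 / intercept
    affinity = intercept / slope
    return affinity, capacity


if __name__ == "__main__":
    # Synthetic, noise-free spike-in series with made-up parameter values.
    concs = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
    data = [langmuir_intensity(c, affinity=0.5, capacity=1000.0, background=50.0)
            for c in concs]
    print(fit_langmuir(concs, data, background=50.0))
```

In this simple form the intensity is approximately linear in concentration
when affinity times concentration is much less than one, and it approaches
background plus capacity at high concentration, which corresponds to the
saturation effect noted above.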
Nevertheless, fundamental mysteries remain. The Langmuir model fails in
that the apparent binding capacity of features, which should be a constant,
varies by 2-3 orders of magnitude. Following hybridization, there is a
stringent wash, which has not been explicitly modeled and may play an
important role in explaining this effect. Cross-hybridization of targets to
unintended probes has been observed in the expression arrays as spurious
signal from individual probes within a probe set, and its impact has been
minimized through the use of robust analysis methods. However, a better
understanding of cross-hybridization at the molecular level is particularly
important, for example, in interpreting genomic tiling array data.
Currently, no physical models exist that can predict the likelihood that a
given target will cross-hybridize to a probe and produce positive signal.
Another poorly explored area is the hybridization behavior of Affymetrix
probes as a function of the number of synthesized nucleotides. We will
address these questions by developing improved physical models to explain
existing publicly available controlled spike-in data and by working with
new spike-in data sets generated by scientists at Affymetrix.
In collaboration with the faculty here at UVA and through the use of
publicly available data, we will analyze both expression and genomic tiling
array data to better understand (and model) the role that transcription
factors and histone modifications play in transcription. We are also
interested in understanding how replication timing is driven by chromatin
structure, gene density, duplex stability, and the process of transcription,
and in helping to discover and characterize origins of replication (e.g.,
testing the hypothesis that efficient origins in mammalian systems contain
a consensus motif).