TITLE OF THE INVENTION
CLONE IDENTIFICATION BY DIRECT DETECTION OF NUCLEIC ACID BINDING PROTEINS FROM VERTEBRATE CELL EXPRESSION SYSTEMS AND APPLICATIONS THEREOF
This application claims priority from U.S. Provisional Application Serial
No. 60/330,633 filed October 26, 2001. The entirety of that provisional
application is incorporated herein by reference.
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates generally to methods for screening,
identifying and selecting DNA clones encoding polypeptides which, as expressed
in vertebrate cells, specifically bind nucleic acid (DNA and RNA) sequences, or
DNA clones expressing epigenetic factors that affect protein-nucleic acid
interactions. The present invention further relates to methods of combinatorial
expression of known genes in an array format for the purpose of assessing
functional linkage by nucleic acid binding assays.
Background of the Technology
For both DNA binding protein clone screening and for effective
polypeptide drug selection, it is desirable to express vertebrate polypeptides in
vertebrate systems because the binding constant and binding specificity of an
expressed polypeptide are largely influenced by post-translational modifications such as phosphorylation, glycosylation, cleavage and folding. These properties are
cell-type specific, rarely interchangeable and tightly attributable to the functional phenotype of a living cell. Therefore, significant efforts have been made to develop a vertebrate clone expression and selection system for expression cloning
of vertebrate genes.
A pioneering technique based on the subdivision of positive cDNA pools opened the possibility of clone selection in vertebrate cells. Wong, et al., Science,
228, 810-815 (1985); U. S. Patent No. 4,675,285 to Clark, et al. However, this method lacks a sensitive high throughput clone-selection application for identifying
the cDNA of DNA or RNA binding proteins. Application of this technology for
cloning of a DNA binding protein has been successful only when high level
transient gene expression was examined (Tsai, et al., Nature, 339, 446-451 (1989)).
Previous applications of polyacrylamide gel electrophoresis for detection of
nucleic acid binding proteins as a clone selection method have suffered numerous
disadvantages. See, e.g., Varshavsky, "Electrophoretic assay for DNA-Binding
Proteins", Methods in Enzymol., 151, 551-565 (1987). Alternative technologies which are suitable for selection of cDNA clones coding for nucleic acid binding proteins either completely avoid the expression of the protein of interest in living
cells (U.S. Patent No. 5,654,150) or apply indirect, reporter detection technologies
(U.S. Patent Nos. 5,989,814, 5,610,015, and 6,261,772; SenGupta, et al., Proc.
Natl. Acad. Sci. USA, 93, 8496-8501 (1996); and Inouye, et al., DNA Cell Biol., 7, 731-742 (1994)). Indirect detection technologies have been developed mainly because of the lack of a direct, sensitive high throughput detection method that
would decrease false-positive clone identifications. The present invention provides a vertebrate clone expression, screening and selection system for expression
cloning vertebrate genes encoding nucleic acid binding proteins and for selection of clones for epigenetic factors that influence interactions between proteins and nucleic acids. The present invention further provides a method for identification of
a majority of genes in a combinatorial gene expression format that is compatible with cDNA arrays and protein arrays. In this format of the invention functional
linkages of nucleic acid binding proteins can be efficiently assessed.
Some of the work disclosed herein is reported in Dobi, et al., "Mammalian
Expression Cloning of Nucleic Acid Binding Proteins by Agarose Thin-Layer
Gelshift Clone Selection", BioTechniques, 33, 868-872 (2002).
SUMMARY OF THE INVENTION
According to a first aspect of the invention, there is provided a method for detecting a cDNA clone encoding a vertebrate nucleic acid binding protein that
specifically binds to a tester nucleotide sequence. This method comprises the steps
of: (a) obtaining a library of vertebrate cDNA clones to be tested for encoding the
vertebrate nucleic acid binding protein, wherein each of the cDNA clones is
operably inserted into a vertebrate expression vector; (b) introducing a portion of the library of vertebrate cDNA clones into a culture of vertebrate host cells under
conditions such that a polypeptide encoded by a cDNA clone operably inserted into the vertebrate expression vector is expressed; and (c) contacting a polypeptide encoded by a cDNA clone that is expressed in step (b) with the tester nucleotide sequence under conditions such that the nucleic acid binding protein, if present,
specifically binds to the tester nucleotide sequence, thereby forming a specific protein-nucleic acid complex.
The invention method of detecting a cDNA clone encoding a vertebrate nucleic acid binding protein further comprises electrophoretically separating the specific protein-nucleic acid complex from protein and nucleic acid not in that
complex, using high-resolution gel electrophoresis according to invention methods
disclosed herein. In particular, the method of detecting a cDNA clone encoding a vertebrate nucleic acid binding protein further comprises the steps of (d)
electrophoretically separating the specific protein-nucleic acid complex from
protein and tester nucleotide sequence not present in the specific protein-nucleic
acid complex in a thin-layer polysaccharide hydrogel under conditions such that
migration of the specific protein-nucleic acid complex is detectably retarded compared to migration of the tester probe and to migration of the protein; and (e)
detecting the specific protein-nucleic acid complex, if any, that was separated from
the protein and the tester nucleotide sequence in step (d), by detecting retarded migration of protein or tester nucleotide sequence. In this step, detection of
specific protein-nucleic acid complex indicates that a cDNA clone encoding the
vertebrate nucleic acid binding protein has been detected in the tested portion of
the library of vertebrate cDNA clones.
In the invention method of detecting a cDNA clone encoding a vertebrate
nucleic acid binding protein, preferably the tested portion of the library of vertebrate cDNA clones consists of about 500 cDNA clones that are expressed in the culture of vertebrate host cells.
In another aspect, the invention method of detecting a cDNA clone encoding a vertebrate nucleic acid binding protein further comprises steps of subdividing the portion of the library of vertebrate cDNA clones in which a cDNA clone encoding said vertebrate nucleic acid binding protein has been detected into subportions and separately testing each of said subportions for the presence of a cDNA clone encoding the vertebrate nucleic acid binding protein by performing steps (b) through (e).
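The subdivision of a positive portion into subportions may be illustrated by the following minimal Python sketch. It is an illustration only and not a disclosed implementation: `pool_is_positive` is a hypothetical stand-in for performing steps (b) through (e) on a subportion, and the choice of ten subportions per round is an assumption.

```python
from typing import Callable, List, Sequence


def split_pool(pool: Sequence[str], n_parts: int) -> List[List[str]]:
    """Divide a pool of clone identifiers into up to n_parts non-empty subportions."""
    n_parts = min(n_parts, len(pool))
    return [list(pool[i::n_parts]) for i in range(n_parts)]


def find_positive_clones(
    pool: Sequence[str],
    pool_is_positive: Callable[[Sequence[str]], bool],
    n_parts: int = 10,  # assumed number of subportions per round
) -> List[str]:
    """Subdivide positive pools repeatedly until single positive clones remain.

    pool_is_positive is a hypothetical placeholder for expressing a subportion
    in host cells and assaying it for the specific protein-nucleic acid complex.
    """
    if not pool_is_positive(pool):
        return []
    if len(pool) == 1:
        return list(pool)
    hits: List[str] = []
    for sub in split_pool(pool, n_parts):
        hits.extend(find_positive_clones(sub, pool_is_positive, n_parts))
    return hits


# Simulated portion of about 500 clones, one of which encodes the binding protein.
library_portion = [f"clone_{i}" for i in range(500)]
target = "clone_137"
positives = find_positive_clones(library_portion, lambda p: target in p)
```

In this simulation each round discards the negative subportions, so the number of assays grows with the number of rounds rather than with the number of clones.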
In a preferred embodiment of the invention method of detecting a cDNA clone encoding a vertebrate nucleic acid binding protein, the method further comprises the following steps to control for varying concentrations of vector DNA in separately amplified portions of a library: step (f), prior to step (b), mixing a control DNA operably inserted into the vertebrate expression vector with the library of vertebrate cDNA clones at a known molar ratio, where this control DNA encodes a control DNA binding protein that specifically binds to a control nucleotide sequence; step (g), prior to step (b) and after step (f), amplifying the portion of the cDNA library by replication in a host cell; and (h) providing the control nucleotide sequence during the contacting of the polypeptide encoded by a cDNA clone that is expressed in step (b) with the tester nucleotide sequence. This contacting of the protein and nucleotide sequences is performed under conditions such that the control nucleic acid binding protein, if present, specifically binds to the control nucleotide sequence, thereby forming a specific control protein-nucleic acid complex. After this, the method includes step (i), detecting the specific control protein-nucleic acid complex, if any, that was separated from the control protein and the control nucleotide sequence in step (d), by detecting retarded
migration of the control protein or said control nucleotide sequence. Thereafter, this form of the method includes repeating steps (b)-(i) with a second portion of the library, after mixing the control DNA with the library, and determining the ratio of the amount of specific control protein-nucleic acid complex detected using proteins expressed from the portion of the library to the amount of the specific control protein-nucleic acid complex (detected under the same conditions) using proteins expressed from the second portion of the library. In this method, the ratio of the amount of control protein-nucleic acid complex in the two portions of the library indicates the ratio of cDNA clones in those two portions, after separate amplification of each portion as in step (g).
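The comparison of control complex amounts between two library portions amounts to a simple ratio. The Python sketch below is illustrative only; the function name and the band intensities (in arbitrary units) are hypothetical and are not taken from the Examples.

```python
def control_complex_ratio(control_signal_first: float, control_signal_second: float) -> float:
    """Ratio of control protein-nucleic acid complex detected in two library portions.

    Both inputs are hypothetical band intensities (arbitrary units) assumed to be
    measured under the same conditions. Per the method described above, the ratio
    indicates the ratio of cDNA clones in the two separately amplified portions.
    """
    if control_signal_first < 0 or control_signal_second <= 0:
        raise ValueError("signals must be non-negative and the second must be positive")
    return control_signal_first / control_signal_second


# Hypothetical intensities of the control band in the first and second portions.
ratio = control_complex_ratio(1200.0, 800.0)
```

A ratio of 1.5 in this hypothetical case would suggest that the first portion carried about 1.5 times as much vector DNA as the second after separate amplification.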
In the invention method, due to the sensitivity of detection methods of the invention (see Example 1, FIG. 2), for detecting a cDNA clone encoding a vertebrate nucleic acid binding protein one may use a vertebrate host cell that endogenously expresses the same vertebrate nucleic acid binding protein or another vertebrate nucleic acid binding protein that specifically binds to the tester nucleotide sequence.
In yet another aspect, the invention relates to a method for identifying a polynucleotide encoding a nucleic acid binding protein that specifically binds to a tester nucleotide sequence. This method comprises the steps of: (a) obtaining a protein encoded by a polynucleotide to be tested for encoding the nucleic acid binding protein; (b) contacting the protein with the tester nucleotide sequence under conditions such that the nucleic acid binding protein, if present, specifically binds to the tester nucleotide sequence, thereby forming a specific protein-nucleic acid complex; and (c) electrophoretically separating the specific protein-nucleic
acid complex from protein and tester nucleotide sequence not present in the specific protein-nucleic acid complex. This separation uses the invention method
employing a thin-layer polysaccharide hydrogel under conditions such that migration of the specific protein-nucleic acid complex is detectably retarded compared to migration of the tester probe and to migration of the protein. This method for identifying a polynucleotide encoding a nucleic acid binding protein
further comprises a step (d), detecting the specific protein-nucleic acid complex, if
any, that was separated from the protein and the tester nucleotide sequence in step
(c), by detecting retarded migration of the protein or the tester nucleotide sequence. In this step, detection of the specific protein-nucleic acid complex identifies the
polynucleotide as a polynucleotide encoding the nucleic acid binding protein.
Still another aspect of the invention relates to a method for identifying a
nucleic acid binding protein that specifically binds to a tester nucleotide sequence.
This method comprises the steps of: (a) obtaining a polypeptide preparation to be tested for the presence of the nucleic acid binding protein; (b)
contacting the polypeptide preparation with the tester nucleotide sequence under
conditions such that the nucleic acid binding protein, if present, specifically binds
to the tester nucleotide sequence, thereby forming a specific protein-nucleic acid complex; and (c) electrophoretically separating the specific protein-nucleic acid complex from protein and tester nucleotide sequence not present in the specific
protein-nucleic acid complex, according to the invention method, using a thin-layer
polysaccharide hydrogel; and (d) detecting the specific protein-nucleic acid complex, if any, that was separated from the protein and the tester nucleotide sequence in step (c), as above. In this step, detection of the specific protein-nucleic
acid complex indicates that the nucleic acid binding protein was present in the
polypeptide preparation.
In this method, the specific protein-nucleic acid complex may be detected
by detecting retarded migration of the tester nucleotide sequence in the specific protein-nucleic acid complex, for instance, where the tester nucleotide sequence
includes a labeling moiety and the specific protein-nucleic acid complex is detected
by detecting retarded migration of that labeling moiety. Alternatively, or in addition, the specific protein-nucleic acid complex may be detected by detecting
retarded migration of the protein in the specific protein-nucleic acid complex, for instance, where protein in the specific protein-nucleic acid complex is detected by
an antibody that specifically binds to the protein. When antibody is used to detect
protein in the complex, the antibody that specifically binds to the protein in the
specific protein-nucleic acid complex may be bound to the specific protein-nucleic
acid complex prior to the step (c) of electrophoretically separating the specific
protein-nucleic acid complex from protein and tester nucleotide sequence.
A variety of polypeptide sources may be tested for nucleic acid binding
proteins according to the invention methods. For instance, the polypeptide
preparation may be obtained by extraction of polypeptides from a vertebrate cell expressing either an endogenous or exogenous gene for the protein. In preferred
embodiments, the polypeptide preparation is obtained by expression of a cDNA
clone library in a vertebrate expression system, particularly a mammalian expression system. Preferably, the vertebrate expression system comprises an expression vector and a vertebrate host cell that expresses polypeptides from cDNA clones carried in the expression vector. Alternatively, the vertebrate
expression system comprises a vector and a cell-free coupled transcription-
translation system that expresses polypeptides from cDNA clones carried in the
vector.
Another aspect of the invention relates to a method for separating a specific
protein-nucleic acid complex from protein and nucleic acid not present in that specific protein-nucleic acid complex. This method comprises a step of
electrophoretically separating the specific protein-nucleic acid complex from the
nucleic acid and the protein in a thin-layer polysaccharide hydrogel under conditions such that migration of the specific protein-nucleic acid complex is detectably retarded compared to migration of the protein and to migration of the
nucleic acid. In this method, preferably the thin-layer polysaccharide hydrogel
comprises an agarose gel and has a thickness of less than or equal to about 3 mm.
More preferably the thin-layer polysaccharide hydrogel has a thickness in the range
of about 2.0 mm to about 2.5 mm. The thin-layer polysaccharide hydrogel
comprises about 0.5 % to about 2.5 % (w/v) agarose, preferably about 1.0 % (w/v)
agarose. This separation method of the invention is useful for detecting complexes
of proteins with DNA or with RNA.
Yet another aspect of the invention relates to a method for identifying an agent that inhibits or enhances formation of a specific protein-nucleic acid complex, using the separation method of the invention. More in particular, the
invention provides a method for identifying an agent that inhibits or enhances
formation of a specific protein-nucleic acid complex comprising a nucleic acid binding protein and a tester nucleotide sequence, which method comprises the steps of: (a) contacting the nucleic acid binding protein with the tester nucleotide
sequence in a first sample and a second sample, in the presence and absence,
respectively, of a putative agent to be tested for inhibiting or enhancing formation
of the specific protein-nucleic acid complex. This step is performed under conditions such that, in the absence of the putative agent, the nucleic acid binding protein specifically binds to the tester nucleotide sequence, thereby forming the specific protein-nucleic acid complex. Next is step (c), electrophoretically
separating the specific protein-nucleic acid complex formed in each of the first sample and the second sample, from protein and tester nucleotide sequence not
present in the specific protein-nucleic acid complex, according to the invention
method, above; and (d) determining the amount of the specific protein-nucleic acid
complex, if any, in each of the first sample and the second sample, that was
separated from the protein and the tester nucleotide sequence in step (c), as
described above. In this method, a decrease in the amount of the specific protein-nucleic acid complex in the presence of the putative agent in the first sample
compared to the amount of the specific protein-nucleic acid complex in the absence
of the putative agent in the second sample indicates that the putative agent inhibits
formation of said specific protein-nucleic acid complex; and an increase in the
amount of the specific protein-nucleic acid complex in the presence of the putative agent in the first sample compared to the amount of the specific protein-nucleic
acid complex in the absence of the putative agent in the second sample indicates
that the putative agent enhances formation of the specific protein-nucleic acid
complex.
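The decision rule stated above can be expressed as a short Python sketch. It is illustrative only: the function name and the `tolerance` parameter (a margin for measurement noise) are assumptions not found in the description, and the amounts are hypothetical band intensities in arbitrary units.

```python
def classify_agent(amount_with_agent: float, amount_without_agent: float,
                   tolerance: float = 0.0) -> str:
    """Classify a putative agent from the amount of specific complex in two samples.

    amount_with_agent:    complex detected in the first sample (agent present).
    amount_without_agent: complex detected in the second sample (agent absent).
    tolerance:            assumed noise margin; differences within it are not called.
    """
    if amount_with_agent < amount_without_agent - tolerance:
        return "inhibits"
    if amount_with_agent > amount_without_agent + tolerance:
        return "enhances"
    return "no detectable change"


# Hypothetical measurements for three putative agents.
results = [
    classify_agent(20.0, 100.0),
    classify_agent(150.0, 100.0),
    classify_agent(102.0, 100.0, tolerance=5.0),
]
```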
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention may be better understood with reference to the
accompanying drawings in which:
FIG. 1 shows a flowchart of a preferred embodiment of the invention method for identification of a DNA clone that expresses a nucleic acid binding protein for a selected nucleotide sequence ("tester probe").
FIG. 2 illustrates the sensitivity of the invention method for detection of
clones expressing two different DNA binding proteins, including detection of expression of a protein from a clone in a host cell that endogenously produces the
same protein. Expression vectors for either the Mus musculus SATB1 gene or the
Rattus norvegicus HMGI(Y) embryonic brain-specific cDNA were mixed with
cDNA library pools in the indicated ratios, and complexes of DNA with DNA
binding proteins were detected by electrophoresis as described in Example 1. The
autoradiograph shows that both proteins were clearly detected by the invention
method even at the lowest ratio (1:500) of target clone to library clones, producing the smallest number of cells expressing the target clones. (B) A clone expressing
mouse SATB1 protein was mixed into an expression library at a 1:500,000 ratio,
and the expressed protein pools were tested for the presence of SATB1 DNA-binding activity according to the invention method. A band of complexes of
the SATB1 protein with labeled target DNA is visible in the analysis of plate 7,
row E, lane 13.
FIG. 3 illustrates identification of the SATB1 protein-expressing clone in a library pool at a ratio of 1:500,000 using an antibody-protein-DNA complex in a
"supershift" assay, as described in Example 2, in which anti-SATB1 antibody is
added to the protein-DNA complexes before electrophoresis, resulting in
"supershifting" of the tester DNA to antibody-protein-DNA complexes that migrate more slowly than DNA or protein-DNA complexes.
FIG. 4 illustrates the capability of the invention methods for identifying an unknown binding protein, as described in Example 3. (A) Complexes of DNA and an unknown binding protein are detected in plate 6, row G, pool 8 (upper gel). The
clone was isolated and sequenced and identified as an isoform of the hnRNPD0
gene. Inclusion of a reference DNA for a known binding protein in each sample,
which produces a relatively constant intensity at the appropriate gelshift position
(lower gel), shows that the higher intensity at the hnRNPD0 gelshift position in the upper gel is not a false-positive clone identification resulting from a higher protein
concentration than in surrounding samples. (B) The effect of antibody specific to
the hnRNPD0 gene product in retarding migration of the hnRNPD0 protein-DNA complexes is demonstrated in a supershift assay.
DETAILED DESCRIPTION OF THE INVENTION
Recent advancements in gene discovery emphasize the importance of gene identification. However, for effective drug discovery and development, what
matters is only the ultimate functional phenotype resulting from gene expression.
The present invention emphasizes gene function as the measure of biological
relevance. The present inventor has recognized the importance of both genetic and epigenetic elements in the ultimate gene function and has provided technology to identify aspects of both of these elements, by developing a concept which the inventor calls the "Functionomics" paradigm. This term refers to the identification
of a gene, a group of genes, an epigenetic factor, a group of epigenetic factors or
any combination of the above, by detection of functional changes. In contrast, the
previously developed Genomics and Proteonomics paradigms identify changes in mRNA and protein levels respectively. Another previously described paradigm, "Functional Genomics", is a gene discovery paradigm. The purpose of the
Functional Genomics paradigm is to identify a gene by a functional assay.
The Functionomics paradigm of the present invention further develops the
Functional Genomics concept. The central hypothesis of the Functionomics
paradigm is that genetic or epigenetic function is the ultimate measure of
phenotype. Therefore, the aim of the Functionomics paradigm is identification of
both the epigenetic and genetic denominators of biological function. The
Functionomics aspect of the present invention provides the ability to assess
changes in RNA and DNA binding functions of a given protein and the ability to
identify both genetic and epigenetic factors underlying those functional changes.
The present Functionomics paradigm is capable of addressing the genotype-phenotype paradox. For example, 31,000 human genes have been identified by the
Human Genome Consortium, a figure close to the 19,000 genes of a nematode worm.
However, the phenotypic and functional differences of the compared species are
significant. The reason for these differences is that the product of a single gene is
diversified by epigenetic factors. First, mRNA editing and splicing can lead to
diversification. RNA-binding proteins are essential in this process. Second,
post-translational modification is a powerful mechanism to achieve functional diversification. In living cells, proteins and exogenous polypeptide drugs are subjected to phosphorylation, glycosylation, cleavage, folding, degradation,
protein-protein interaction and ligand interactions that determine biological activity, half-life and intracellular transport. Epigenetic processing is sharply different among various cell types and among various species. Therefore, the
functional properties of a gene product can be assessed objectively only if the gene
product is subjected to proper epigenetic modifications.
The present invention operates by measuring RNA and DNA binding
functions of gene products. These functions are central to RNA editing and transport, transcriptional regulation and viral life cycles. Moreover, RNA and
DNA binding functions are determined by both genetic and epigenetic elements specific to a given cell type and are fundamental functions. The invention provides
technology for directly assessing nucleic acid binding activity of a gene product,
thereby advantageously avoiding any indirect detection approaches.
Applications of the present invention technology may be considered either
"divisional" or "combinatorial", depending on whether a library of clones to be
examined is divided into separately tested samples or is combined from individual clones or less complex clone mixtures. Divisional applications are
suitable for selection of an unknown gene. Combinatorial applications are designed for selection of both clones and epigenetic factors.
In divisional applications the only initial requirements to perform a clone selection are a clone library and an RNA or DNA sequence that serves as a specific
binding site for a desired protein. To identify a DNA sequence coding for a binding protein, a DNA library is cloned in an expression format (for instance, in a plasmid or viral vector). The library is divided into pools of small numbers of
clones and desired vertebrate cells are transformed or transfected with each pool
separately to express each product of a recombinant gene in the post-translationally
modified form(s) characteristic of the desired host cells, according to principles well known in the art. After expression of the divided clones, the cells are lysed, protein extracts are tested by RNA or DNA binding assay, and clone selection is further performed as described below.
In combinatorial applications of the present invention, a gene of a known
RNA or DNA binding protein, in an expression-ready format, is combined with
either divided pools of an expression library, pre-selected expression clones of
post-translational modification enzymes, polypeptide drug variants, and/or with expression clones of suspected dimerization partners. Alternatively, the
transformed expression host cells are simply treated with ligand variants. Desired
host cells are co-transformed or co-transfected with any of the above combinations of
clones, and the protein products are co-expressed. Then an RNA or DNA-binding
assay is performed and analyzed by high throughput electrophoretic mobility shift assay ("EMSA"), or by column chromatography as described below.
Combinatorial applications of the invention include a combinatorial matrix
arrangement and co-expression of functionally linked genes which are purposely
organized in functional groups to allow high throughput, simultaneous assay of proteins of already isolated genes, or of native or artificial ligands, in an organized
fashion. Moreover, the elements of a matrix may correspond to elements of a
cDNA microarray or to a protein chip manifesting the functional assay format of the array technology. This "Functionom" matrix arrangement provides significant capacity for representation of spatial and temporal factors when it is applied to different cell types or organotypic cell cultures. This feature of the combinatorial
application allows identification of different sets of modification enzymes at
different stages of development or in different cell types, thereby permitting
identification of the particular factor(s) that lead to activation or inhibition of a particular DNA or RNA binding activity. Combinatorial applications are
compatible with both cDNA microarray and Proteonomics technologies and can provide fundamental evidence for the functional activity of genes or proteins
detected by microarray technologies.
More particularly, the present inventor has developed technology for direct
clone screening and selection in a vertebrate host for genes encoding nucleic acid
binding polypeptides or polypeptide drugs that act on either DNA or RNA binding
proteins. This invention permits rapid and efficient identification of such a clone
via the polypeptide product which is expressed in vertebrate cells and therefore
exhibits vertebrate post-translational modifications.
Selected embodiments of the present invention originally were developed
for cloning the gene for a particular gene repressor protein that binds to a specific
target sequence and thereby represses an associated gene during brain development. Proper phosphorylation of the repressor is necessary for specific
binding to the target DNA sequence. Previously, this cloning was attempted using
a phage display library, a phage expression library and the so-called "Yeast One-
Hybrid System" without success. Using the present invention, the cloning was
successfully and efficiently completed. The invention has therefore been shown to provide efficient cloning and screening of vertebrate genes for nucleic acid binding proteins, through expressing the gene product in its natural environment that
provides the proper post-translational modifications whereby the protein is fully
active. This feature is essential for precise functional characterization of a
vertebrate nucleic acid binding protein and for identification of its coding
sequence.
One main feature that the invention provides, therefore, is a complete clone screening and selection system for genes of DNA and RNA binding proteins expressed in vertebrate cells. In one aspect, the invention provides means for
optimizing vertebrate cell transformation or transfection conditions, by detecting or
measuring the nucleic acid binding function of an expressed control protein. In another aspect, the invention provides a high throughput electrophoretic mobility
shift assay (EMSA) for detection of complexes between a nucleic acid and a
nucleic acid binding protein, using a novel application of thin-layer agarose gels for
EMSA. This gel system combines a high vacuum and high temperature blotting
technique for agarose gel desiccation with the use of high efficiency transfer matrix (preferably, quaternary aminated nylon membranes) for blotting nucleic
acid-protein complexes from agarose gels. Another aspect of the invention
provides improvements in overcoming sample-to-sample variations in
signal-to-noise ratio during clone screening, by introducing a reference probe to
detect a known nucleic acid binding protein that is expressed in the vertebrate host cells, thereby providing a functional indication of the total protein level in each cell
lysate sample as a basis for normalization of those levels in comparing different
samples. The invention also provides high throughput chromatography screening methodology for nucleic acid binding proteins, permitting automated operation of the screening system.
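Normalization against a reference probe can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed procedure: the per-lane division of tester signal by reference signal, the median baseline and the two-fold threshold are all hypothetical choices, and the intensities are invented arbitrary units.

```python
from statistics import median
from typing import List, Sequence


def normalized_signals(tester: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Divide each lane's tester-probe signal by its reference-probe signal."""
    if len(tester) != len(reference):
        raise ValueError("tester and reference must have one value per lane")
    if any(r <= 0 for r in reference):
        raise ValueError("reference signals must be positive")
    return [t / r for t, r in zip(tester, reference)]


def flag_candidate_pools(tester: Sequence[float], reference: Sequence[float],
                         fold: float = 2.0) -> List[int]:
    """Return lane indices whose normalized signal exceeds fold times the median.

    The median baseline and the fold threshold are assumptions for illustration.
    """
    norm = normalized_signals(tester, reference)
    baseline = median(norm)
    return [i for i, value in enumerate(norm) if value > fold * baseline]


# Lanes 2 and 3 show the same raw tester signal, but lane 3 also has a high
# reference signal, which suggests a higher total protein level in that lysate.
tester_signal = [10.0, 12.0, 40.0, 40.0]
reference_signal = [100.0, 110.0, 105.0, 400.0]
flagged = flag_candidate_pools(tester_signal, reference_signal)
```

In this hypothetical data set only lane 2 is flagged, because the raw signal in lane 3 is accounted for by its elevated reference signal.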
The general vertebrate cloning and screening process for genes encoding
nucleic acid binding proteins includes steps of library division and protein expression, followed by a nucleic acid binding assay as a clone screening and selection step, where binding is analyzed by either EMSA or a chromatography
assay, preferably in a high throughput assay format.
1. Library division.
The genetic library for screening may be any DNA library, for example, a random permutation library, subtractive library or any desired homogenous or mixed DNA sequences. Preferably, the library is a primary library, to provide the
most accurate representation of the source genetic material.
The library is constructed in a vector, preferably a shuttle vector comprising
a strong promoter/enhancer construct that controls expression of the cloned DNA
sequence in the vertebrate cell or cell line desired to be the host for expression and
post-translational modification. Advantageously, the promoter is a strong
promoter. Preferably, the promoter and other elements of the expression system
will produce recombinant proteins at a level of at least about 1%, more preferably, about 10% of the total protein content. The vector may contain a
replication origin that allows replication of the vector in vertebrate cells, as well as preferably another replication origin for amplification of the DNA in a bacterial
host, and one or more selective marker genes, such as an antibiotic resistance gene.
Alternatively, the vector can be a phage/viral vector system and the divided library
amplification may be performed in non-bacterial cells or in vitro. For amplification in bacteria, cells are transformed with the library-vector construct.
The transformation efficiency, extent of representation and average insert length
may be determined to assess library quality.
Preferably, the vector DNA representing the primary library that is used to
transform vertebrate cells is produced under conditions that prevent methylation of the vector DNA. Numerous vertebrate genes are subjected to gene silencing by the
methylation of GC-nucleotide rich sequences. Therefore, it is preferable to
transform or transfect vertebrate cells with a non-methylated form of the vector
DNA. For example, it is advantageous to grow the library vector in dcm- and dam- bacterial strains.
The number of clones per polypeptide test sample is a critical parameter of
the direct vertebrate screening method of the invention. For instance, a library
aliquot of about 500,000 cfu (colony forming units) may be divided into 1000
wells with about 500 cfu/well clone representation. Greater library division of such an aliquot (into more cultures) is preferred, such as 2,500, 5,000 or 10,000
wells, that would result in 200, 100 or 50 clones/well representations respectively.
For libraries of better quality (in terms of representativeness and insert length, for
instance), the initial library size may be decreased below 500,000. In addition, if
robotic processing is available, then screening of larger libraries, over 500,000 clones, may be readily accomplished using methods of the present invention.
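By way of illustration only, the relationship between library size, number of wells and clone representation per well described above can be sketched as follows. The function name and constants are hypothetical and are used only to reproduce the exemplary figures given in the text.

```python
def clones_per_well(library_cfu: int, wells: int) -> float:
    """Average number of clones represented in each well after library division."""
    if wells <= 0:
        raise ValueError("wells must be positive")
    return library_cfu / wells


# Exemplary library aliquot from the text (about 500,000 cfu).
LIBRARY_CFU = 500_000

for wells in (1_000, 2_500, 5_000, 10_000):
    print(f"{wells:>6} wells -> about {clones_per_well(LIBRARY_CFU, wells):.0f} clones/well")
```

A lower clone representation per well raises the relative abundance of any positive clone in that well, at the cost of a proportionally larger number of cultures to process.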
Accurate estimation of the nucleic acid concentration is critical for efficient
transformation and screening of vertebrate expression host cells. For DNA vectors,
it is difficult to determine vector DNA concentrations with the required accuracy, since common techniques for measuring bacterial DNA content also detect contaminating RNA, which can cause 50-80% inaccuracy. During the transfection
step, a 50% error in DNA concentration can result in an order of magnitude drop in protein expression level. Therefore, the samples may be further treated (e.g., with
RNase) and/or further purified to remove contaminating RNA.
Alternatively, in another aspect the invention provides a method for determination of optimal nucleic acid concentration for cell transformation or
transfection, by empirically testing a control combination of nucleic acid binding
protein and "tester" nucleic acid sequence that binds to the control protein. The
gene encoding the control binding protein, for instance a protein that specifically binds to a particular tester DNA sequence, is cloned into the same vector carrying
the library clones to be screened. The vector DNA carrying the control gene is mixed with an excess of the library-vector DNA, for instance, at a 1:500 molar
ratio, respectively.
The mixture of control and library vector DNA is amplified, for instance, in
a bacterial host. Preferably, multiple (for example, 1-10) control cultures are
inoculated with about 500 cfu/culture of the control transformant mixture. The
transformant cells are incubated under conditions that select for transformants, for example, antibiotic selection conditions. Vector DNA is isolated from the bacterial
cell cultures and is purified by conventional methods, for instance, gel filtration
chromatography, or any purification technique that eliminates small molecule and
toxic protein contaminants.
After transfection of multiple cell cultures (as described below) with a
range of concentrations of the test DNA mixture, transformed cells are incubated to
allow protein expression. Care should be taken to avoid incubating a large number of cell cultures concentrated in a small area as such crowding produces locally high
concentrations of toxic gases that slow cell growth, alter cell metabolism and lower
the expression rate of a recombinant protein. Thus, the expression level of proteins is improved by either more distant relative positioning of cell culture plates or by moderate ventilation of the cell cultures during the incubation time. After incubation to allow library expression, the transformed vertebrate
cells are resuspended in lysis buffer to test for DNA binding. For example, sample
mixtures of control DNA:library vector DNA with ratios bracketing 1:500 are
carried through the amplification and purification procedure for use as test DNA mixtures in optimization of transfection of vertebrate host cells for which library
division produces about 500 clones/culture.
Conventional lysis buffers may be used, but the KCl concentration of the
buffer is preferably adjusted for every cell type tested, since coding regions frequently contain signal sequences that guide a recombinant protein into the cell
nucleus and, therefore, conditions that lyse the nucleus must be used. In general, 450-600 mM of KCl is sufficient to release the nuclear content of the cell.
However, it is preferable to verify that the nuclei of the selected host cells are
properly lysed to make the binding proteins accessible for the binding assay, for
instance, by visual inspection or biochemical assay (e.g., for DNA).
The cognate tester DNA for the control binding protein is mixed with
protein lysates of the control DNA-transformed cells and tested for DNA binding,
for instance, in an Electrophoretic Mobility Shift Assay (EMSA), such as the EMSA described below. The control vector DNA concentration that results in the greatest amount of protein-DNA complex (for instance, the highest band intensity in EMSA) is used as a concentration standard during a fluorometric DNA
concentration determination for all separately amplified portions of the library.
For instance, each of the 1000 aliquots of purified DNA of the library described above is diluted with the highest possible accuracy to the concentration of DNA
recorded for the control DNA dilution that gave optimum detection of the control DNA binding protein.
2. Protein expression.
Any vertebrate cell which can be genetically transformed or transfected to
express a desired library is suitable for use in the present invention. For optimal
post-translational modification of the expressed protein, any kind of vertebrate cells or cell lines may be used such as stem cells, glioma cell lines, epithelial cell
lines, primary cell cultures, organotypic cultures, provided that the expression host
provides a high yield of the recombinant protein relative to total cell protein
content. For delivery of the library clones, any technology may be used that provides efficient transformation and expression, including transformation,
transfection or, if a viral delivery system is used, infection. Preferably, the
combination delivery system and expression host provides an expression level of a
control protein in the total cellular protein of at least about 1%, more preferably at
least about 3% and most preferably, at least about 10%.
For example, to test the direct vertebrate screening method of the invention, human kidney epithelial cells were plated in 24 well plates at a density of about
25% confluence and were incubated under conditions that produced about 50%
confluence within 24 hr. Then, 1000 cell culture wells were transfected with the 1000 purified library vector DNA dilutions described above, adjusted for optimal DNA concentration by reference to the optimal control DNA concentration for
screening. Following transformation or transfection, the cells go through a short
recovery period (typically 3-5 hrs), and then recombinant proteins are expressed, for instance, during a 48 hr incubation period. Culture conditions were selected such that by the end of the incubation for protein expression the cells reach a density of about 200,000 cells per culture well. Then the cells were resuspended in
lysis buffer designed to keep proteins intact, support the subsequent DNA binding
reaction and not interfere with the EMSA, according to principles well known in
the art. A typical lysis buffer composition to meet these requirements is 20 mM HEPES (pH 7.9), 450 mM KCl, 1.5 mM MgCl2, 0.2 mM EDTA, 0.1% IGEPAL,
25% glycerol, 0.5 mM AEBSF, 10 mg/L Leupeptin, and 10 mg/L Aprotinin.
However, the required concentrations of lysis buffer components, especially the KCl and non-ionic detergent concentrations, may be optimized empirically for a
given cell type prior to the actual experiment by microscopically monitoring the lysis process.
To select conditions that provide sufficient DNA binding protein for
detection, the following exemplary calculations apply, which can be adapted to
cells of different sizes and varying expression levels, as needed. For example, the protein weight in 200,000 cells (in one well, under the exemplary conditions above), is 100 microgram. The lysis buffer volume is 50 microliter. Therefore, the
protein concentration is 2 microgram/microliter. For the binding assay described
below, the maximum applied volume ratio is 1/10, due to the necessary salt and
detergent dilution, and the binding reaction volume is 10 microliter. Therefore, by the addition of 1 microliter of cell lysate into the binding mixture, the added total protein amount is 2 microgram. If the transfection efficiency is 50% and the
representation of a positive clone is 1:500 (according to the clone input ratio), then 200 cells are expressing the positive clone (50% of 200,000, divided by 500, is 200). However,
the 10% recombinant protein expression level may be only a theoretical optimum. In practice this level may be lower, for instance, only about 1% recombinant
protein, because living cells may activate pathways to degrade and eliminate excessively represented proteins.
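The exemplary calculation above may be expressed, for illustration only, as the following sketch. The function and parameter names are hypothetical, and the default values are the exemplary figures from the text; they would need to be adapted for cells of different sizes or expression levels.

```python
def lysate_load(protein_per_well_ug=100.0, lysis_volume_ul=50.0, lysate_added_ul=1.0):
    """Return (lysate protein concentration in ug/ul, protein added to the binding reaction in ug)."""
    concentration = protein_per_well_ug / lysis_volume_ul
    return concentration, concentration * lysate_added_ul


def expressing_cells(cells_per_well=200_000, transfection_efficiency=0.5, clones_per_well=500):
    """Number of cells in a well expected to express one given clone."""
    return cells_per_well * transfection_efficiency / clones_per_well


concentration, load = lysate_load()
print(f"lysate: {concentration:.0f} ug/ul, protein per binding reaction: {load:.0f} ug")
print(f"cells expressing the positive clone: {expressing_cells():.0f}")
```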
In summary, in the above example, the positive clone is assumed to be
expressed in 200 transformed cells at a level of 1% of the total protein content of
those 200 cells. Because those 200 cells are 1/1000 of the 200,000 cells in the well, the representation of the desired DNA binding protein
is 1:100,000 relative to the total protein mass. The applied total protein mass in each binding reaction is 2 micrograms. One hundred-thousandth of 2
micrograms of protein is 20 picograms, which represents the amount of DNA-binding protein in the binding reaction mixture. Assuming that this protein has a
molecular weight of 100 kDa, 20 picograms of the protein corresponds to a
molar amount of 200 attomol (200 quintillionths of a mole). The amount of the tester DNA is 10 femtomol; therefore, using conventional
radiolabeling and detection technology for DNA binding protein assays, a signal
would be detectable only with DNA that is labeled with a specific activity of at
least about 5000 cpm/femtomol, where the quantitative binding of 200 attomol typically results in a detectable signal of about 1000 cpm.
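For illustration only, the conversion from protein representation to expected signal may be sketched as follows. The function and parameter names are hypothetical; the defaults are the exemplary values above, and the sketch assumes quantitative binding in which each binding protein molecule retains one labeled probe molecule.

```python
def expected_emsa_signal(
    total_protein_ug=2.0,
    expressing_cells=200,
    cells_per_well=200_000,
    expression_fraction=0.01,
    molecular_weight_da=100_000.0,
    specific_activity_cpm_per_fmol=5_000.0,
):
    """Return (binding protein mass in pg, amount in amol, expected signal in cpm)."""
    # Share of the total protein mass contributed by the binding protein (1:100,000 here).
    mass_fraction = (expressing_cells / cells_per_well) * expression_fraction
    mass_pg = total_protein_ug * 1e6 * mass_fraction  # 1 ug = 1e6 pg
    # pg divided by g/mol gives units of 1e-12 mol, and 1e-12 mol = 1e6 amol.
    amount_amol = mass_pg / molecular_weight_da * 1e6
    # Assumption: every protein molecule binds one labeled probe molecule.
    signal_cpm = (amount_amol / 1_000.0) * specific_activity_cpm_per_fmol
    return mass_pg, amount_amol, signal_cpm


mass_pg, amount_amol, signal_cpm = expected_emsa_signal()
print(f"{mass_pg:.0f} pg, {amount_amol:.0f} amol, {signal_cpm:.0f} cpm")
```

Changing `expressing_cells` (for example, by a finer library division) or `expression_fraction` shows how strongly the expected signal depends on those two parameters.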
However, at the 1:100,000 binding protein/background protein ratio assumed here, the abundance of non-specific binding proteins is likely to be high.
Thereby, the signal-to-noise ratio is likely to be decreased, and the specific signal is likely to drop below 200 attomol of bound DNA. In addition, abundant non-
specific binding proteins may distort the gel migration characteristics of the
specific complex and thereby cause a gel migration anomaly.
In summary, if a library that represents 500,000 individual clones is divided
into 1000 culture wells under the above typical conditions, then a nucleic acid binding polypeptide can be detected only by gel electrophoretic conditions that are highly sensitive by conventional standards. If a library of 500,000 clones is divided into 2,500, 5,000 or 10,000 wells, then binding protein detection may not
require sophisticated EMSA conditions or, for a primary screening, may not require gel electrophoresis at all. However, for any screening using a division of a
500,000 clone library into over 1,000 samples, an automated process using robotic
manipulators would be preferred.
3. Clone Screening by High Throughput Electrophoretic Mobility Shift Assay (EMSA).
For high throughput testing according to the invention screening method,
the generally used polyacrylamide gel electrophoresis system is inadequate.
Migration in this gel system is described by the "reptation with stretch" model which predicts increased migrational stress, presumably due to the pore size of the
gel matrix being similar to the size of migrating DNA-protein complexes. This
stress can lead to decreased stability of the complex during the gel migration.
Considering the modified Ogston sieving model, the application of a large pore gel
matrix is necessary. Agarose gels have a large pore matrix, but the resolution of conventional agarose gels is poor and transfer of the DNA-protein complex from
the gel to a membrane for analysis of results may take several hours. For instance,
U. S. Patent No. 4,983,268 issued to Kirkpatrick, et al. discloses high gel strength
low electroendosmosis agarose preparations suitable for use in the present
invention.
In addition to agarose, certain other polysaccharides are suitable for use in gel electrophoresis in the practice of the present invention. As used herein, the
term "polysaccharide" refers to a native or derivatized thermoreversible and/or
pH-reversible hydrogel-forming polysaccharide, preferably selected from among: agar, agarose, curdlan, gellan, konjac, pectin, pullulan, and to a lesser degree
alginate and carrageenan, and the like; as well as thermoreversible or pH-reversible
hydrogel combinations of any of the foregoing with either another polysaccharide
or a non-polysaccharide polymer which associates firmly with the gel-forming polymers so that it does not significantly dissociate under electrophoretic
conditions. (Linear polyacrylamide does not associate firmly with the gel-forming
polymers of this invention.) Methods for making and using such polysaccharide
hydrogels for electrophoresis are described, for instance, in U. S. Patent No. 5,143,646.
In another aspect, therefore, the present invention provides a method of separating nucleic acids, and particularly a method for detection of protein-nucleic
acid complexes, using thin-layer polysaccharide hydrogels, preferably agarose gels,
by combining the application of high temperature and high vacuum for gel drying
with the use of highly efficient binding substrate (for example, quaternary amine derivatized membrane) for nucleic acid transfer, for use in a high throughput
EMSA method that facilitates application of robotic technology.
To overcome the problems with conventional agarose and other polysaccharide hydrogels, the gel is prepared in a thin-layer format (for example,
less than or equal to about 3 mm, preferably about 2 to 2.5 mm), which improves
the resolution of the gels. Preferably, and more importantly for efficient detection, the transfer after electrophoresis of the protein-nucleic acid complex from the gel matrix onto a membrane surface is carried out by the simultaneous application of high vacuum and high temperature. At high temperature, the agarose gel material normally would collapse. However, if a thin layer agarose is prepared with, for
instance, a three square centimeter/milliliter surface/volume ratio, then by applying
a vacuum (for instance, less than about 20 torr), the resulting high evaporation rate keeps the agarose at low temperature despite use of a relatively high environmental
temperature for blotting (e.g., preferably about 80°C). A high blotting temperature
is necessary because, if high vacuum is applied without high temperature, then the
gel structure suffers lateral distortion. Also, high temperature facilitates a higher evaporation rate that keeps the gel temperature low and thereby prevents the gel
matrix from deformation and ensures a rapid and accurate transfer. This blotting process reduces the blotting time substantially, for instance to about 20 minutes,
allowing the completion of a gel running and blotting procedure within about
90 minutes.
In addition, the quality of the blotting is ensured by the use of an efficient
DNA binding substrate for the transfer membrane, preferably a quaternary amine
derivatized nylon membrane, that retains quantitatively the blotted nucleic acids
and complexes. By reducing the loaded sample volume, for instance to about 5 microliter, and by using wedge-shaped loading wells, the resolution of the gel is also markedly improved. As a general practice, maintaining the gel temperature at
below ambient temperature (preferably about 8°C) throughout the electrophoresis also improves detection of the complexes, by increasing their stability during migration.
The invention also provides means to reduce the effects of variability in the
signal to noise ratio that results from unequal protein concentrations among individual cell lysates, by introducing two radioactively labeled DNA probes that are tested in parallel for each cell lysate binding assay. One of these DNA probes is called a "reference" and the other one is called a "tester". The reference DNA
sequence is different from the sequence of the tester and binds a known DNA
binding protein that is expressed in the vertebrate cloning host. The binding level
of the reference DNA is proportional to the total protein concentration in a sample
and therefore provides a baseline binding intensity to which the binding of the
tester sequence can be compared and normalized. Thus, the ratio of free/bound
reference probe serves as a quantitative normalization value to assess the
significance of free/bound ratio differences of the tester in different lysate samples.
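One possible way to apply the reference probe as a normalization value is sketched below, for illustration only. The text does not prescribe a particular arithmetic; dividing the tester ratio by the reference ratio is an assumption of this sketch, and the function names and count values are hypothetical.

```python
def bound_free_ratio(bound_cpm: float, free_cpm: float) -> float:
    """Bound/free probe ratio for one binding reaction."""
    if free_cpm <= 0:
        raise ValueError("free_cpm must be positive")
    return bound_cpm / free_cpm


def normalized_tester_binding(tester_bound, tester_free, reference_bound, reference_free):
    """Tester bound/free ratio divided by the reference bound/free ratio of the same lysate.

    Assumption of this sketch: reference binding scales with the total protein
    concentration of the lysate, so the quotient compensates for unequal lysates.
    """
    reference = bound_free_ratio(reference_bound, reference_free)
    if reference == 0:
        raise ValueError("reference probe shows no binding; lysate cannot be normalized")
    return bound_free_ratio(tester_bound, tester_free) / reference


# Hypothetical counts (cpm) for two lysates with the same reference binding.
print(normalized_tester_binding(100, 900, 200, 800))  # background-level well
print(normalized_tester_binding(300, 700, 200, 800))  # candidate positive well
```

Wells can then be ranked by the normalized value rather than by raw tester band intensity.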
The DNA binding reaction mixture typically contains various inhibitors of non-specific binding, such as about 1 mg/ml of non-specific competitor DNA and
about 0.1% by weight of carrier protein (e.g., Serum Albumin), as well as
radioactively labeled tester or reference DNA (preferably about 10 femtomol) and
preferably about 2 microgram of protein lysate. It is preferable to keep the bound/free DNA ratio at or below about 50% during the reaction. Therefore, prior
to large scale analyses (for instance, 1000 assays with the reference and 1000
assays with the tester DNA, in the example above), optimal protein concentration
of the reaction mixture is assessed by titration against the reference DNA.
The above EMSA screening procedure can be performed under the
described conditions by manual processing by one person at a rate of 300 assays/day. Robotic technology can also be applied for this step.
4. Screening by high throughput chromatography.
The invention also provides methods for screening samples for DNA
binding proteins by high throughput chromatography, for instance, either instead of, or as a pre-screening step for, the above high throughput EMSA assay. Thus, if
a culture well from the library division contains a relatively small number of clones
(e.g., only about 100 cfu), then chromatography screening procedures can be
introduced. For chromatographic screening, the binding reaction should be
adjusted to provide approximately 50% saturation of the reference DNA, to ensure the optimum bound/free probe ratio of about 50%. Then the binding mixture is
passed through a chromatography column that separates DNA-protein complexes
from free probe, under conditions well known in the art. The amount of the free
probe and/or the bound fraction is assessed by either radioactive or non-radioactive
technologies. A decreased concentration of free probe is an indication of a positive clone (expressing a DNA binding protein) because the DNA binding protein of the
positive clone will increase DNA retention in the protein-bound fraction.
A titration reaction was performed using the above model system
conditions, to assess what representation of a clone of a DNA-binding protein in a
primary library distribution well would provide sufficient DNA retention to detect a positive signal by measuring only the bound fraction and/or the free probe. The results suggest that about a 1:100 representation of a DNA-binding clone per
culture is sufficient to detect either an increase in the protein-bound DNA or a
decrease in free probe, allowing screening by chromatography.
Application of reference and tester binding reactions is also recommended for chromatography screening. For instance, the chromatography procedure can be performed in a 96 well format with a non-radioactive detection method and can be operated automatically. Clone candidate wells selected by chromatography are
preferably further analyzed by EMSA, to confirm the presence and amount of DNA binding protein.
5. Data analysis and single clone isolation.
Both conventional autoradiography and known phospho-imaging recording
methods are suitable for recording of EMSA results. The results are preferably
evaluated independently by two investigators and then compared. If by both
detection techniques a positive clone-containing well is identified, then the clone candidate may be further tested, for instance, in three independent EMSA tests
comparing tester and reference bound/free ratios.
The present invention methods have been successfully applied to detect a
gene expressed in vertebrate cells which encodes a protein which binds to a tester
DNA that is similar to sequences known to be recognized by a particular binding protein. This test of the technology, conducted under the above conditions,
resulted in five culture wells (out of 1000 library portions) that contained positive
clone candidates. One out of the five clone candidates was further processed. The representation of a single positive clone in the total clones in the well was 1:500; therefore, the resulting clone pool required further division.
Bacterial cells were transformed with the plasmid DNA mixture from the pool
of 500 different clones and were divided among 100 wells with a 5 clones/well
representation. The procedures described under sections 1, 2, 3 and 5, above, were repeated, and one positive well containing at least one clone expressing the DNA binding protein was identified. The EMSA signal was clearly detectable and the
DNA-protein complex was clearly distinct from the background.
Bacterial cultures were again transformed with the plasmid mixture of the
positive culture well, and single colonies representing single clones were isolated.
Plasmid DNA was purified from the bacterial cells, and the plasmids were analyzed for insert size by PCR and by digestion with restriction endonucleases.
Kidney epithelial cells were transfected with the individual plasmids.
Finally, a single positive clone was identified by EMSA. The DNA
sequence of the positive clone was determined and a homology search was performed against deposited sequences in the GenBank. A single matching gene
was identified, with a 99% homology. The identified gene encodes a protein with
known affinity to nucleic acid sequences strikingly similar to the tester DNA
sequence.
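The successive pool divisions described above (500,000 clones into 1000 wells, one positive well of 500 clones into 100 wells, then single clones) can be modeled, for illustration only, by the following sketch. The helper names are hypothetical, the clone identifiers are arbitrary integers, and the `is_positive` callable merely stands in for the binding assay readout of a well.

```python
import random


def split_pool(clones, n_wells, rng):
    """Randomly distribute clones among n_wells wells of near-equal size."""
    shuffled = list(clones)
    rng.shuffle(shuffled)
    return [shuffled[i::n_wells] for i in range(n_wells)]


def narrow_to_positive_clones(library, is_positive, wells_per_round, rng):
    """Divide the pool, keep one positive well, and repeat for each round.

    After the last round the remaining clones are tested one by one, which
    corresponds to the isolation of single colonies.
    """
    pool = list(library)
    for n_wells in wells_per_round:
        wells = split_pool(pool, n_wells, rng)
        positive_wells = [w for w in wells if any(is_positive(c) for c in w)]
        if not positive_wells:
            return []
        pool = positive_wells[0]  # follow a single candidate well, as in the example
    return [c for c in pool if is_positive(c)]


rng = random.Random(0)
hits = narrow_to_positive_clones(range(500_000), lambda c: c == 123_456, (1_000, 100), rng)
print(hits)
```

In this model the second round leaves a pool of five clones, so only five individual transfections are needed to identify the single positive clone.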
The test operation of the invention was performed by EMSA. However, a 96 well format chromatography screening for nucleic acid binding polypeptides
also has been assessed and found suitable for high throughput screening.
The invention technology also can be applied for cloning of RNA-binding
proteins, by using RNA sequences in the binding tests, under conditions well known in the art.
The invention also is suitable for screening clones to isolate polypeptide
drugs that are effective against a DNA or RNA binding protein, for instance, a regulator protein of a virus or a cancer cell. Moreover, the effective polypeptide can be selected in the cell that is the natural target for the viral infection or can be
selected in the cancer cell of interest. The cloning technology can be applied with a viral delivery system, in which the library is delivered into the cells by a carrier virus such as HIV or adenovirus. Then a regulator of the viral replication or viral protein
expression (for example the TAR-binding protein in HIV) can be monitored in the
divided wells by RNA or DNA binding assay. The positive well where the viral regulator protein binding activity is decreased is further divided, and the clone
responsible for the inhibition is identified from the library.
The invention also can be applied to screening of receptor blocker drugs
that interfere with the DNA binding properties of a particular receptor. For
instance, the invention can be used for cloning of protein kinases, protein
glycosylases, proteolytic enzymes and enzymes assisting proper folding. If an
enzyme activates or inhibits a target protein with respect to its DNA or RNA
binding activity, then the gene of the target protein can be co-transfected with a
library. After co-expression of the divided library with the target protein, a binding assay will identify any positive well where the DNA or RNA binding activity of the
target protein is impaired. With further dilution the effective clone can be isolated and then characterized.
In summary, the invention can be used for cloning DNA and RNA binding proteins, to clone genes of anti-viral drugs, anti-cancer drugs, receptor blockers,
kinases, glycosylases, and peptidases, so long as the function of the desired gene
can be detected by monitoring a protein with DNA or RNA binding activity.
Advantages of the invention methods over presently known systems for cloning and screening of genes for nucleic acid binding proteins include the fact that the polypeptides are expressed in vertebrate cells or cell lines which provide
proper post-translational processing, such as phosphorylation, glycosylation,
cleavage and folding, appropriate for vertebrate cells. The nucleic acid binding
characteristics of these polypeptides are therefore optimal. In contrast, conventional phage expression library screening technology operates in bacterial
cells, whereas another known technology, the so-called "Yeast One-Hybrid
System", operates in yeast cells, thereby providing post-translational modifications
representative only of a bacterial or yeast cell, respectively. A yeast system is less than ideal for direct screening of clones for DNA-binding receptors, activators, suppressors and transcription factors, for example, from the human brain.
Accordingly, efforts to introduce genes for mammalian post-translational
modifications (for instance, protein kinases) into yeast cells have been made, to
compensate for this serious disadvantage, but reproduction in yeast of the total vertebrate environment provided by the present invention would presumably
require introduction of hundreds or thousands of kinases, glycosylases, proteolytic
enzymes and enzymes that assist in protein folding. Although mammalian
expression systems, such as the "Mammalian One-Hybrid System", have recently
been developed, these technologies rely on uncertain "reporter" technologies for clone detection and therefore identify a significant number of false positive clones.
The traditional way to identify a DNA binding protein is DNA-affinity chromatography, which requires grams of cell or tissue samples that are impossible to
access in many cases, for example, from human specimens. Once the protein is purified, there is still the challenge of clone isolation ahead. In contrast, the
present invention can operate in human cell lines of choice to clone the desired gene. Therefore, another advantage is the flexibility of the invention system. The procedure can use any kind of library, such as a subtractive library, a cDNA library or a
combinatorial random peptide library. The only initial information needed to start
a clone selection is a DNA or an RNA sequence for which identification of specific
binding proteins is desired. Alternatively, the system can be used to identify clones expressing any polypeptide that activates or inhibits a DNA or RNA binding
function of another protein.
Another advantage of the invention is that a broad range of vectors can be
chosen or constructed, and the approach is not limited to any particular components
with inherent limitations. Further, the invention methods can be completely
performed by automated systems, using conventional robotic stations for DNA manipulations, such as those used in current human genome analysis projects.
Thus, the invention technology is remarkably cost and time effective, and the
payoff is multiplied because, once (and only once) a cell lysate collection of divided library pools (for instance, 1000 portions) is prepared, those portions can be
tested for many different DNA binding proteins. Also, "lysate libraries" may be generated from multiple tissue samples at different stages of development, for
instance, from embryonic brain, newborn brain and adult brain, and used
comparatively to select the most effective polypeptide drugs for each patient age
range. The same approach can be extended to different tissue types or brain areas,
thereby permitting comparative drug selection to be performed in a
spatio-temporal fashion.
The invention provides efficient clone selection in vertebrate cells expressing both DNA and RNA binding proteins, as well as proteins that inhibit or
activate nucleic acid binding polypeptides. Thus, the invention can be used for
screening for polypeptide drugs that inhibit or facilitate the nucleic acid binding
properties of a given binding factor. For example, to identify a peptide drug that blocks the binding of a DNA-binding receptor (Vitamin D Receptor) in a brain
tumor cell, one can transform a random library into that particular cell line and
perform clone selection to identify clones encoding a polypeptide that blocks VDR
binding to its cognate element.
Qualitative and quantitative assessments of relative binding inhibition of
multiple positive clones can also be made using the invention methods. The
effectiveness of a polypeptide drug, for instance, an anti-tumor drug, is assessed
"on target". Moreover, the clone is on hand right away. If the drug is now
introduced for therapeutic purposes into the same tumor cells as used for screening, the post-translational modification of the cells will support the effectiveness of the
drug. Also, only those drugs will be selected which escape the proteolytic
inactivation sufficiently to provide an adequate half-life in the host cell. Therefore,
the cloning system of the invention gives an indication of complete drug effectiveness in a given cell type. This is a unique feature of the invention.
Another problem addressed by the invention is the cloning of RNA-binding proteins by direct detection. Hybrid systems recently applied for cloning RNA
binding proteins in mammalian cells rely on reporter detection, which is by definition "indirect". The acquired information therefore includes both signal and meaningless noise from effects related only to the reporter system.
Moreover, indirect detection technologies rely on artificial protein fusion constructs. Disadvantageous fusion of two proteins alone may result in a large
number of false positive clones and may hinder detection of real positives. In contrast, the present technology does not use fusion proteins or reporter systems,
but directly detects and follows the visible DNA or RNA-protein complex
throughout the entire procedure, focusing solely on the basic function, the
protein-nucleic acid interaction. One ultimate benefit of the approach presented here is
therefore the high reliability of positive signals.
The discovery and availability of the majority of genes in the human
genome presents a different application of the invention technology for clone
selection. This application uses combinatorial co-expression of previously
identified genes. In this combinatorial application, genes in expression-ready format are arranged in a matrix according to functional linkage groups. These
groups are then combined with genes coding for RNA or DNA binding proteins to
form combinatorial expression pools. Vertebrate cells are transfected separately with each combinatorial expression pool, and the proteins are co-expressed. The
tested binding proteins are assayed by the invention technology. A partner element
in an expression matrix may be represented on a microarray. Moreover, an
expression matrix element may represent the combination of multiple functionally linked partners in connection with an assayed DNA or RNA binding protein. All of these elements may be represented on a cDNA array. For example, assume an
unknown member of a protein family forms a heterodimer with a known protein in
order to bind an RNA element. Then, vectors coding for individual members of
this protein family are co-expressed with the vector expressing the known nucleic acid binding protein. This combination may be achieved with one complete combination of all involved expression vectors or by individual combinations of
the known binding protein and one or a few members of the protein family, or may
be achieved with a partially complete combination to assess functional
complementation.
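The combinations described above can be enumerated programmatically. The following is a minimal sketch only: the vector names (`known_RBP`, `famA` through `famD`) and both helper functions are hypothetical placeholders, not part of the disclosed method. It builds either one complete pool or individual pools of the known binding protein with one or a few family members.

```python
from itertools import combinations

def individual_pools(binding_vector, family_vectors, group_size=1):
    """Pools of the known binding-protein vector plus `group_size` family members."""
    return [(binding_vector,) + combo
            for combo in combinations(family_vectors, group_size)]

def complete_pool(binding_vector, family_vectors):
    """A single pool combining the known vector with every family member."""
    return (binding_vector,) + tuple(family_vectors)

# Hypothetical vector names, used for illustration only.
family = ["famA", "famB", "famC", "famD"]
single_member_pools = individual_pools("known_RBP", family)
two_member_pools = individual_pools("known_RBP", family, group_size=2)
all_in_one = complete_pool("known_RBP", family)
```

Each tuple would correspond to one expression matrix element, transfected separately into the vertebrate host.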
A benefit of the sensitive detection technology is that it allows high clone representation for detection of protein-nucleic acid binding (that is, high library
pool complexity). Therefore, numerous clones may be combined in one expression
matrix element. This number is lower, however, if co-expression of multiple proteins in the same cell is required for the detection of a change in nucleic acid
binding. The combinatorial matrix arrangement will reveal the coding sequence of
an unknown interaction partner.
Vectors representing cDNAs of complete signaling pathways may also be
combined and co-expressed in expression matrix elements to identify interacting members. In this arrangement the invention technology identifies a functionally important epigenetic signal that has an effect on DNA or RNA binding properties
of a protein. Furthermore, in replicated forms of a combinatorial matrix the
transfected cells may be stimulated by different epigenetic ligands, and the signaling pathway that leads to DNA or RNA binding of a protein can be identified by simply monitoring the RNA or DNA binding activity of that protein. In addition, this same matrix may be produced in multiple replicated formats, each
representing the functional characteristics of different cell or tissue types. All the combinatorial arrangements may be utilized to assay for functional verification of
down- or up-regulated gene expression detected by a cDNA microarray or by proteomics technology.
In another aspect the invention provides a method for detection of protein-
DNA complexes using antibodies for a specific DNA binding protein. Thus, the thin-layer agarose gel electrophoresis method of the invention is highly suitable for
small or large-scale analysis of antibody-protein-DNA complexes in a so-called
"supershift assay." The pore-size of traditionally applied polyacrylamide gels does
not allow the larger sized particles of antibody-protein-DNA complexes to migrate into the matrix and consequently fails to resolve these complexes. Instead, these complexes appear as a large unresolved aggregate at the loading well of the
polyacrylamide gel. Therefore, the interpretation of specific antibody recognition
of a DNA-protein complex is ambiguous using a polyacrylamide gel. By contrast,
the larger pore size of agarose thin layers allows a large antibody-protein-DNA
complex to migrate into the gel during electrophoresis and resolves the migrating
complex without ambiguity. Use of the gel shift in an array application for testing
large numbers of antibodies for specific protein recognition provides a high
throughput method for the proteomics paradigm.
For purposes of illustrating preferred embodiments of the present invention,
the following, non-limiting examples are included. These results demonstrate the
feasibility of DNA-binding protein gene detection using the invention methods.
EXAMPLE 1: Detection of a desired clone in a library of 500,000 clones
Overview. FIG. 1 shows a flowchart of a preferred embodiment of the
invention method for identification of a DNA clone that expresses a nucleic acid binding protein for a selected nucleotide sequence ("tester probe"), using electrophoresis in thin layer agarose to perform a "gel shift" assay of the invention. Typically, the first stage, "selection of a positive pool" containing a
desired clone, begins with preparing or otherwise obtaining a cDNA expression library containing about 500,000 clones (clone-forming-units = "cfu"), and
separating the library into about 1000 subfractions or "pools" (in practice, the use
of ten 96-well microculture plates allows convenient processing of 960 separate pools). Each pool is separately amplified in a microculture of a conventional
microbial host, typically E. coli. Plasmids are extracted and purified from each microculture, and each plasmid preparation is separately introduced (e.g.,
transfected) into a microculture of an appropriate vertebrate or mammalian
expression host. Lysates of transfected expression hosts are then tested for DNA
binding proteins by agarose thin-layer electrophoresis, according to the invention
method, using a reference (control) probe and a tester probe. The former contains a nucleotide sequence that binds a reference protein known to be expressed in the
expression host, either endogenously or by inclusion in the expression vector used
to prepare the clone library, while the latter contains a sequence for which a clone
encoding a cognate binding protein is being sought. Positive pools containing a clone expressing the desired binding protein are identified by detection of a band of tester probe whose migration is retarded due to complex formation with a binding protein, in a sample where the band of reference probe complexed with reference
protein appears at about the same level as other samples not showing tester probe complexes, thereby providing a control for consistency of protein and probe levels
across multiple samples.
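The selection criterion above can be sketched in code. This is an illustrative sketch only, assuming that band intensities have already been quantified per pool; the function name, the fold threshold and the tolerance are hypothetical values chosen for the example and are not specified in this disclosure. A pool is called positive when its retarded tester band stands out while its reference band stays close to the level seen in the other samples.

```python
from statistics import median

def call_positive_pools(tester, reference, tester_fold=3.0, ref_tolerance=0.25):
    """Return pool ids with a strong tester shift and a typical reference band.

    tester / reference: dict mapping pool id -> intensity of the shifted band.
    tester_fold and ref_tolerance are illustrative thresholds (assumptions).
    """
    tester_median = median(tester.values())
    ref_median = median(reference.values())
    positives = []
    for pool_id, signal in tester.items():
        shifted = signal > tester_fold * tester_median
        ref_ok = abs(reference[pool_id] - ref_median) <= ref_tolerance * ref_median
        if shifted and ref_ok:
            positives.append(pool_id)
    return positives

# Made-up intensities: P4 shows a tester band, but its reference band is also
# inflated, which suggests unequal protein or probe loading.
tester = {"P1": 10, "P2": 11, "P3": 95, "P4": 90, "P5": 9}
reference = {"P1": 100, "P2": 104, "P3": 98, "P4": 210, "P5": 101}
positive_pools = call_positive_pools(tester, reference)
```

With these made-up numbers only P3 passes both checks, mirroring the use of the reference probe as a control for consistent protein and probe levels.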
After selection of positive pools, each such pool (containing about 500
clones) is further sub-divided, for instance, into about 100 subfractions (e.g., into
96 wells of a microculture plate), amplified in the microbial host and tested for
expression in separate microcultures of an appropriate vertebrate expression host. Positive subfractions or pools containing a clone expressing a desired binding
protein are again identified by complex formation with the tester probe, with or
without comparison to reference probe, which is less likely to be needed for these subfractions, which contain an average of only five clones each and are therefore less
prone to some artifacts that the reference probe is designed to detect. Individual
positive clones are then subjected to clone verification, including various tests to
identify and characterize the protein forming complexes with the tester probe.
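The arithmetic of the two selection rounds, and the bookkeeping of pools on 96-well plates, can be sketched as follows. This is a minimal sketch under stated assumptions: clones are taken to be distributed evenly among pools, and the row-major plate layout (rows A to H, columns 1 to 12) and both helper functions are assumptions made for illustration.

```python
ROWS = "ABCDEFGH"  # assumed layout: 8 rows x 12 columns = 96 wells

def clones_per_pool(n_clones, n_pools):
    """Average number of clones per pool, assuming an even split."""
    return n_clones / n_pools

def pool_to_well(pool_index, wells_per_plate=96, columns=12):
    """Map a 0-based pool index to (plate number, row letter, column number)."""
    plate, offset = divmod(pool_index, wells_per_plate)
    row, col = divmod(offset, columns)
    return plate + 1, ROWS[row], col + 1

first_round = clones_per_pool(500_000, 1000)      # about 500 clones per pool
second_round = clones_per_pool(first_round, 100)  # about 5 clones per subfraction
last_well = pool_to_well(959)                     # last of 960 pools on ten plates
```

Under these assumptions, one clone in 500,000 is narrowed to roughly one in five after two rounds of subdivision and testing.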
Gel Preparation. The gel matrix is preferably agarose, but also may
comprise dextrins or any other non-polyacrylamide-based gel matrix with suitably
large pores, in which a nucleic acid molecule and a protein or the complex of the
two (or a complex of the two bound to an antibody) migrate freely under
electrophoretic conditions, free of any ionic or covalent immobilization or retardation activities. The gel preferably has a thickness of about 3 mm or less,
more preferably about 2 to 2.5 mm.
After melting agarose in 1 x TBE buffer (45 mM Tris-borate and 10 mM
EDTA, pH 8.0), the gel matrix is allowed to solidify for 12-24 hrs at 22°C. Then the gel is melted again by bringing the gel to a boil, and the liquid is placed at room
temperature and allowed to cool to 65 °C. The agarose thin-layers are then
prepared by pouring the still liquid gel solution into gel cassettes. After solidification at room temperature, the gel is placed at 4°C for at least 2 hr, or sealed to prevent evaporation and stored at 4°C for a maximum of four weeks.
The above repeated cycle of melting and solidification increases the firmness of the gel matrix and improves the sharpness and resolution of the migrating
DNA-protein complexes.
FIG. 2 illustrates the sensitivity of the invention method for detection of
clones expressing two different DNA binding proteins, including detection of
expression of a protein from a clone in a host cell that endogenously produces the
same protein. Expression vectors for either the Mus musculus SATB1 gene (Dickinson, et al., "A tissue-specific MAR/SAR DNA-binding protein with unusual binding site recognition", Cell, 70, 631-645 (1992)) or the Rattus
norvegicus HMGI(Y) embryonic brain-specific cDNA (Dobi, unpublished) were
mixed with library pools in 1:50, 1:150 and 1:500 ratios and complexes of DNA
and DNA binding proteins were detected by electrophoresis as described above. As shown in the autoradiograph (panel A), both proteins were clearly detected by
the invention method even at the lowest 1:500 ratio of desired clone to library
clones. Expression of the HMGI(Y) protein above the endogenous level in the host
kidney epithelial cell line (BCGR) is clearly detectable, even at the 1:500 ratio of
desired to library clones. For panel (B), a clone expressing the mouse SATB1
protein was mixed into an expression library in a 1:500,000 ratio. After dividing the library into 960 pools, proteins of each pool were expressed in kidney epithelial
cells (PEAKrapid, Edge BioSystems), and the protein pools were tested for the
presence of SATB1 DNA-binding activity according to the invention method. A band of complexes of the SATB1 protein with labeled tester DNA is visible in the
analysis of plate 7, row E, lane 13. The analysis also shows that the intensity of reference probe binding in this sample does not deviate from other reference reaction intensities, indicating that the pool identified as containing the SATB1 clone is a "true-positive" pool (that is, not an artifact due to different amounts of protein extracts or DNAs in different samples).
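The relationship between the dilutions in panels A and B can be checked with a short calculation. This is a rough sketch that assumes the library clones are divided evenly among the pools and that the desired clone ends up in exactly one pool; the function name is hypothetical.

```python
def ratio_in_positive_pool(library_size, n_pools, copies_of_desired_clone=1):
    """Library clones per desired clone within the pool that contains it."""
    pool_size = library_size / n_pools
    return pool_size / copies_of_desired_clone

# One desired clone per 500,000 library clones, divided into 960 pools.
ratio = ratio_in_positive_pool(500_000, 960)  # roughly 1:521 in the positive pool
```

Under these assumptions the desired clone is present at roughly 1:520 in its pool, which is close to the 1:500 ratio at which both proteins were detected in panel A.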
EXAMPLE 2: Detection and characterization of a desired clone using antibodies
This example illustrates the use of an antibody to detect and identify a particular nucleic acid binding protein, for instance, where a library is known to
contain at least one protein that binds to a tester probe but may also contain one or
more additional binding protein(s) specific for that same tester probe.
In particular, FIG. 3 demonstrates identification of a SATB1 protein-
expressing clone added to a cDNA clone library at a ratio of 1:500,000, using an antibody-protein-DNA complex in a "supershift" gel electrophoresis assay of the
invention. The protein pool that contained the expressed SATB1 protein was
mixed with anti-SATB1 antibody and the tester probe. Upon specific recognition,
an antibody-protein-DNA complex is formed that is observed as a "supershifted" band that migrates in the thin-layer agarose gel (lane 5) more slowly than the
SATB1 protein-DNA complex alone ("complex 1"). Antibody that does not
recognize the SATB1 protein does not produce this supershift (lane 6). As a control, a specific DNA competitor (W, "wildtype") is shown to compete for SATB1 protein (complex 1). This competitor is specific for SATB1 and, therefore,
does not compete for endogenous HMGI(Y) (complex 2). The letter M indicates
mutant competitor DNA that does not bind to SATB1 or HMGI(Y) proteins.
EXAMPLE 3: Detection and characterization of a novel rat forebrain binding protein
FIG. 4 illustrates the capability of the invention methods for identifying a
clone for an unknown binding protein from an expression library of cDNA clones
from a selected tissue. The cDNA library originated from embryonic rat forebrain and, therefore, embryonic rat forebrain nuclear proteins were tested in the gel shift
and supershift assays to determine whether an antibody can specifically bind to the
protein-DNA complex of the tester DNA and rat forebrain nuclear proteins.
Complexes of tester DNA and an unknown binding protein are detected in plate 6, row G, pool 8 (panel A, upper gel). The clone was isolated and sequenced and
identified as an isoform of the hnRNPDO gene. Arginine dimethylation, specific
phosphorylation and splicing significantly alter the nucleic acid binding
characteristics and subcellular localization (availability) of hnRNPDO gene products. Therefore, successful expression cloning of this isoform in a non-
mammalian expression system is unlikely. Inclusion of the reference DNA probe
for a known binding protein in each sample, which produces a relatively constant
intensity at the appropriate gelshift position (lower gel), shows that the higher intensity at the hnRNPDO gelshift position in the upper gel is not a "false-positive"
identification resulting from a higher protein concentration than in surrounding samples. After identification of the coding sequence of the hnRNPDO gene, antibody specific to this protein was obtained and formation of antibody-protein-
DNA complex was tested in a supershift assay (panel B). The specific antibody caused "supershifting" of protein-probe complex bands into several slower moving
bands, perhaps indicating complexes containing multiple binding protein and/or antibody molecules, while control antibody that is not specific for the binding protein does not produce any such supershift.
* * *
All publications and patents cited herein are hereby incorporated by
reference herein in their entirety.
While the foregoing specification teaches the principles of the present invention, with examples provided for the purpose of illustration, it will be
appreciated by one skilled in the art from reading this disclosure that various
changes in form and detail can be made without departing from the concept, spirit
and scope of the invention.