
WO2004005547A2 - Method - Google Patents


Info

Publication number
WO2004005547A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequences
sequence
consensus sequences
chromatin
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2003/002895
Other languages
English (en)
Other versions
WO2004005547A3 (fr
Inventor
Robert Otto Johannes Weinzierl
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ip2ipo Innovations Ltd
Original Assignee
Imperial College Innovations Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0215547A external-priority patent/GB0215547D0/en
Priority claimed from PCT/GB2002/003080 external-priority patent/WO2003004702A2/fr
Application filed by Imperial College Innovations Ltd filed Critical Imperial College Innovations Ltd
Priority to AU2003281288A priority Critical patent/AU2003281288A1/en
Publication of WO2004005547A2 publication Critical patent/WO2004005547A2/fr
Publication of WO2004005547A3 publication Critical patent/WO2004005547A3/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P43/00: Drugs for specific purposes, not provided for in groups A61P1/00-A61P41/00
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present invention relates to methods, their uses and products obtained therefrom.
  • the present invention relates to Hypersensitive Sites (HSs) and methods for identifying HS consensus sequences, HS sequences and HS core sequences.
  • the large amount of DNA present in eukaryotic cells needs to be efficiently stored into a small space, the cell nucleus. This is achieved by packaging DNA molecules into chromatin, which involves looping DNA molecules around histones to create nucleosomal DNA-protein complexes. Subsequent coiling of these nucleosomal complexes into solenoid and higher order structures increases the packaging density further.
  • nuclease hypersensitive sites are genomic regions that are up to two orders of magnitude more accessible to nuclease digestion in purified nuclei preparations in comparison to bulk nuclear DNA (Nedospasov and Georgiev, 1980; Wu, 1980).
  • HSs are highly specific localized DNA access points for a variety of factors involved in transcription, replication, repair, recombination and attachment to the nuclear matrix.
  • Some HSs are permanently present ('constitutive' HSs), whereas other HSs are only formed in response to specific endogenous or exogenous stimuli ('regulated' HSs).
  • With the advent of extensive sequence data from a variety of eukaryotic genome projects there is renewed interest in bioinformatic tools suitable for identifying functional regulatory elements on the DNA sequence level.
  • the present invention relates to novel and useful aspects concerning HSs.
  • the present invention is based upon the surprising finding that HS consensus sequences can be identified from a plurality of HSs.
  • the present invention relates to HS consensus sequences derived from a plurality of HS sequences - such as HS core sequences.
  • HS consensus sequences may be used to allow the prediction of other HS sequences using bioinformatic tools rather than using exclusively experimental tools.
  • the availability of large portions of the human genome sequence presents an opportunity for identifying, mapping and analysing HSs on a large scale using computational approaches.
  • the present invention relates to a method for identifying one or more HS consensus sequences comprising the steps of: (a) providing a plurality of HS core sequences; (b) using a search algorithm to search for a plurality of motifs that are shared by the HS core sequences; and (c) returning one or more HS consensus sequences comprising a plurality of motifs identified in step (b).
  • the method according to the first aspect may be implemented in a variety of ways.
  • the principle is that the availability of a plurality of HS core sequences, which may have been identified conventionally using known experimental tools (or even previous bioinformatic tools), can be used to generate large numbers of HS core sequences (the numbers of which will increase in the future) allowing the definition and extraction of sequence-based rules that can be used to identify other sites in genomes that also fulfil these rules and are therefore candidate HSs.
  • the search algorithm includes a statistical model such as a Gibbs-statistical model, a Markov-statistical model, a Gaussian-statistical model, a Poisson-statistical model and a Monte Carlo-statistical model.
  • the search algorithm comprises a word counting method or a probabilistic method.
  • the HS consensus sequences are returned as a regular expression or a sequence logo. More preferably, the HS consensus sequences are returned as a weight matrix. Most preferably, the weight matrix is a position specific scoring matrix (PSSM).
  • returning the HS consensus sequences as a PSSM comprises the step of computing a score for finding a matching sequence in the plurality of HS consensus sequences.
  • the plurality of HS core sequences are identified using Global Analysis of Chromatin Topology or Hypergenomic Display.
  • the present invention relates to a method for identifying one or more HS sequences comprising the steps of: (a) identifying a plurality of HS core sequences; (b) using a search algorithm to search for a plurality of motifs that are shared by the plurality of HS core sequences; (c) returning one or more HS consensus sequences comprising a plurality of the motifs identified in step (b); and (d) searching for one or more HS sequences comprising one or more HS consensus sequences.
  • step (c) returns the HS consensus sequences as a PSSM comprising the steps of: (i) providing a plurality of HS consensus sequences; (ii) computing the score for finding a matching sequence in the plurality of HS consensus sequences; and (iii) identifying HS sequences in one or more DNA sequences that were not part of the plurality of HS core sequences using the PSSMs identified.
  • the search algorithm is used in a word counting method or a probabilistic method.
  • the search algorithm includes a statistical model such as a Gibbs-statistical model, a Markov-statistical model, a Gaussian-statistical model, a Poisson-statistical model and a Monte Carlo-statistical model.
  • the HS consensus sequences are returned as a regular expression or a sequence logo. More preferably, the HS consensus sequences are returned as a weight matrix. Most preferably, the weight matrix is a position specific scoring matrix (PSSM).
  • returning the HS consensus sequences as a PSSM comprises the step of computing a score for finding a matching sequence in the plurality of HS consensus sequences.
  • the plurality of HS core sequences are identified using Global Analysis of Chromatin Topology or Hypergenomic Display.
  • the DNA sequences are from a database of DNA sequences.
  • one or more HS sequences comprising the HS consensus sequences are searched by searching for clusters of cis-elements.
  • the most probable arrangement of cis-elements in the cluster is determined using the Viterbi algorithm.
  • a forward-backward algorithm to consider the sum of all paths through a hidden Markov model is used.
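The Viterbi step mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical two-state hidden Markov model ("background" versus "cluster") and a toy observation string in which "m" marks a motif hit and "-" marks no hit. The states, probabilities and observation encoding are illustrative assumptions and are not taken from the patent or from any specific program.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `obs`, computed in log space."""
    # best[t][s] = log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical model parameters (assumptions for illustration only)
STATES = ("background", "cluster")
START = {"background": 0.9, "cluster": 0.1}
TRANS = {"background": {"background": 0.9, "cluster": 0.1},
         "cluster": {"background": 0.1, "cluster": 0.9}}
EMIT = {"background": {"-": 0.9, "m": 0.1},
        "cluster": {"-": 0.3, "m": 0.7}}

path = viterbi("---mmmmm---", STATES, START, TRANS, EMIT)
print(path)
```

A forward-backward computation would use the same tables but sum over predecessor states instead of taking the maximum.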
  • the plurality of HS core sequences are identified using Global Analysis of Chromatin Topology or Hypergenomic Display.
  • the present invention relates to a method for identifying an HS core sequence comprising the steps of: (a) providing a DNA sequence in the sense or antisense orientation that is not part of the plurality of HS core sequences; (b) providing an HS sequence; and (c) searching the DNA sequence for the presence of a hypersensitive restriction site.
  • the HS sequence is between about 50 nucleotides to about 200 nucleotides in length.
  • the DNA sequence is 1 kb in length.
  • the method for identifying an HS core sequence comprises the additional step of using the identified HS consensus sequences or HS sequences to prepare a nucleic acid construct.
  • the methods according to the present invention comprise the additional step of using the identified HS consensus sequences or HS sequences in an assay (or assay development program) and/or a pharmaceutical (or in the preparation of or development of a pharmaceutical).
  • the present invention relates to a method of treating a disease associated with chromatin structure in a subject, the method comprising administering to the subject an effective amount of a chromatin modulating (e.g. modifying) agent capable of modulating (e.g. modifying) the chromatin structure to a non-diseased form.
  • the present invention relates to a pharmaceutical composition
  • a pharmaceutical composition comprising a chromatin modulating agent and a pharmaceutically acceptable carrier, diluent, excipient or adjuvant or any combination thereof.
  • the present invention relates to a method of preventing and/or treating a disorder comprising administering a chromatin modulating agent wherein said chromatin modulating agent is capable of modulating an HS to cause a beneficial preventative and/or therapeutic effect.
  • the present invention relates to the use of a chromatin modulating agent in the preparation of a pharmaceutical composition for the treatment of an HS related disorder.
  • the present invention relates to one or more HS consensus sequences identifiable, preferably identified using the methods of the present invention or a variant, derivative, or homologue thereof.
  • the present invention relates to an HS sequence identifiable, preferably identified using the methods of the present invention or a variant, derivative, or homologue thereof.
  • the present invention relates to weight matrices identifiable, preferably identified using the methods of the present invention. More preferably, the weight matrices are PSSMs.
  • the present invention relates to a recording medium bearing machine readable instructions for implementing the first to the third aspects of the invention.
  • the present invention relates to a computer system loaded with machine readable instructions for implementing the first to the third aspects of the invention
  • Figure 1 is a diagrammatic representation of a HS core sequence comprising 100 nucleotides of genomic sequence immediately adjacent to a hypersensitive Mbo I target site (204 bp in total).
  • Figure 2 is a diagrammatic representation of the identification of a HS sequence in Human β-globin constitutive HS5 (Genbank Accession No. AF064190). A single strong signal, indicated by a distinct peak of predicted HS potential centered around position 6,200 in the nucleotide sequence, is detected and coincides precisely with the experimentally mapped constitutive HS5 (Dhar et al., 1990).
  • Figure 3 is a diagrammatic representation of the identification of HS sequences in Mouse mammary tumour virus 3' long terminal repeat (MMTV-3' LTR; Genbank Accession No. MMTPRO). Both experimentally mapped HSs, including a constitutive and a glucocorticoid-inducible site, are reliably detected as indicated by distinct peaks of predicted HS potential centered around positions 200 and 1300 in the nucleotide sequence (Zaret and Yamamoto, 1984).
  • Figure 4 is a diagrammatic representation of the identification of HS sequences in Human vascular endothelial growth factor A promoter (Genbank Accession No. AF005785). A strong signal is detected at position 2600. Experimental data shows the presence of two HSs in this area (Liu et al, 2001). The presence of a single broad peak suggests that in some cases the clustering algorithm of CISTER causes artifactual merging of motif clusters from adjacent HSs. Also, two other experimentally mapped sites are only weakly detected.
  • Figure 5 is a diagrammatic representation of the identification of HS sequences in Human erythropoietin (embedded in 13 kb of human genome sequence). Strong signals are detected from an experimentally mapped regulated 5' located HS and from two HSs located at the 3' end of the gene (Zhang et al, 2000). Some merging of the predicted HS signals from the two separate 3' HSs is observed in the computer prediction due to the CISTER algorithm. The program also shows a HS signal within the transcribed region, which is compatible with the experimentally observed emergence of hypersensitivity of the gene at the onset of active expression.
  • Figure 6 is a diagrammatic representation of the identification of HS sequences in human c-Myc (embedded in 55 kb of human genome sequence). This is the largest region yet analysed. HSs surrounding the 5' and 3' end of the c-Myc gene (Mautner et al, 1995) are reliably detected. There are some additional strong signals in the surrounding regions for which no experimental data is currently available. This indicates the high signal/noise ratio achievable with the current set-up.
  • Figure 7 is a diagrammatic representation of the result of a computer-based experiment.
  • the DNA sequence tested consists of a continuous string of the 59 HS core sequences (shown in blue or light grey and dark grey), preceded by the same sequence randomised (shown in red or medium grey). This procedure therefore creates a test sequence (TestSeq) which contains two halves: the first half is random (and thus should lack HS-specific motifs), whereas the other half is 'packed' with all the HS sequences.
  • PSSMs were compiled from two non-overlapping subsets of the HS core sequences: motifs were separately derived from 'collection A' (shown in light blue or light grey) and 'collection B' (shown in dark blue or dark grey).
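The construction of the Figure 7 test sequence can be sketched as follows. The text only says that the joined core sequences are "randomised" and that collections A and B do not overlap, so the single-nucleotide shuffle and the alternating split used here are assumptions, as are the function names and the toy sequences.

```python
import random

def build_test_sequence(core_sequences, seed=0):
    """Return (test_seq, boundary): shuffled copy of the joined cores, then the joined cores."""
    joined = "".join(core_sequences)
    rng = random.Random(seed)      # fixed seed so the shuffle is reproducible
    shuffled = list(joined)
    rng.shuffle(shuffled)          # randomises order while preserving base composition
    randomised = "".join(shuffled)
    return randomised + joined, len(randomised)

def split_collections(core_sequences):
    """Split the cores into two non-overlapping subsets (alternating split, an assumption)."""
    return core_sequences[0::2], core_sequences[1::2]

# Toy stand-ins for HS core sequences (not real data)
cores = ["GATCAAGGTTCC", "GATCTTGACCAA", "GATCGGCATTGA", "GATCCCTTAGGA"]
test_seq, boundary = build_test_sequence(cores)
collection_a, collection_b = split_collections(cores)
print(boundary, len(test_seq), collection_a, collection_b)
```

Motifs derived from one collection can then be scored against the whole test sequence; under this design, hits should concentrate after `boundary`.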
  • FIG. 8 schematically illustrates a general purpose computer (132) of the type that may be used to perform the methods in accordance with the present invention.
  • the computer (132) includes a central processing unit (134), a read only memory (136), a random access memory (138), a hard disk drive (140), a display driver (142), a display (144), a user input/output circuit (146), a keyboard (148) and a mouse (150).
  • the central processing unit (134) may execute program instructions stored within the ROM (136), the RAM (138) or the hard disk drive (140) to carry out processing of signal values that may be stored within the RAM (138) or the hard disk drive (140).
  • the program may be written in a wide variety of different programming languages.
  • the computer program itself may be stored and distributed on a recording medium, such as a compact disc, or may be downloaded over a network link (not illustrated).
  • the general purpose computer (132) when operating under control of an appropriate computer program effectively forms an apparatus for performing aspects of the present invention - such as identifying one or more HS consensus sequences, HS sequences and HS core sequences.
  • Figure 9 is a Table listing SEQ ID No. 3 to 55.
  • Figure 10 is a Table listing HS consensus sequences.
  • Figure 11 is a Table listing HS consensus sequences, which are shown in bold. Instead of a more precise PSSM they are written in code to indicate redundancies in certain positions.
  • Figure 12 is a Table listing YEBIS PSSMs.
  • Figure 13 is a Table listing YEBIS-MATRIX PSSMs.
  • HYPERSENSITIVE SITES
  • HSs are genomic sites that are highly susceptible to nuclease attack under experimental conditions - typically by approximately two orders of magnitude as compared to bulk chromatin (see Stalder et al, 1980; Wu, 1980). All available data suggests that HSs are mostly free of nucleosomes, but contain a number of transcription factor complexes that are bound to specific sequence motifs present in the genomic DNA.
  • HSs can be viewed as the gateways to the genome for the vast majority of molecules involved in regulating gene expression and many other important genomic functions, such as DNA replication, repair, recombination and insertion of retroviral genomes (reviewed in Gross and Garrard, 1988). They expose or hide gene regulatory signals and therefore constitute one of the most important epigenetic regulatory layers that are superimposed on the genome to control and direct its expression (Bonifer 2000).
  • HSs can be present in a number of forms - such as constitutive HSs, developmentally regulated HSs, tissue-specific HSs and cell type-specific HSs.
  • Such constitutive HSs could, in addition to regulating the expression of adjacent genes, serve as border elements to define functional chromatin domains, or could facilitate the precise folding patterns of individual chromatin fibres (Filipsky et al, 1990).
  • the continuous reconfiguration of chromatin architecture is an essential prerequisite for directing the changing gene expression patterns during embryonic development and cell type-specific differentiation.
  • Many HSs - such as developmentally regulated HSs - are created near defined subsets of genes in a tissue- and stage-specific manner (see e.g. Gross and Garrard, 1988) due to the local activity of transcription factors and chromatin remodeling machineries (reviewed in Wolffe and Hayes, 1999).
  • HSs near genes are one of several steps in the pathway that prepares a regulatory sequence to become functionally active in chromatin.
  • One of the best-understood model systems is the chicken lysozyme gene, where the HS configuration on its promoter has been shown to be highly dynamic.
  • Several distinct HSs appear and disappear over different promoter elements as cells progress through haemopoietic development (Huber et al., 1995; see Kontaraki et al., 2000). In many cases, a direct correlation between the appearance and disappearance of HSs and known biological functions has been shown.
  • HS consensus sequence refers to a plurality of motifs, i.e. nucleotides that are common, although not necessarily identical, to other nucleotides in an HS sequence.
  • an HS consensus sequence is an idealised sequence that represents the most likely motif to occur at each position within an HS sequence.
  • a plurality of motifs refers to at least about 2 to 200 or more motifs; more preferably, at least about 2 to 100 or more motifs; more preferably, at least about 2 to 50 or more motifs; more preferably, at least about 2 to 20 or more motifs; more preferably, at least about 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 motifs; most preferably, 7, 8, 12, 13 or 15 motifs; or any suitable combination of start or end points, for example, at least about 6 to 50 or more motifs.
  • the present invention demonstrates that it is possible to identify and extract HS consensus sequences comprising motifs that are shared by different HS sequences. There may be different functional types of HSs that may contain different sets of shared HS consensus sequences. It is possible to identify HS consensus sequences in other DNA sequence as a bioinformatic tool for the in silico prediction of HSs in these sequences in the absence of experimentally derived information.
  • HS core sequence refers to motifs, i.e. nucleotides that are typically within about 100 to 200 base pairs of a hypersensitive target site. Motifs may be fragmented at hypersensitive target sites by various entities including chemical or physical agents such as bleomycin, bromoacetaldehyde, chloracetaldehyde, cobalt chiral complex, copper phenanthroline, diethyl pyrocarbonate, dimethyl sulfate, iron(II)-EDTA, methidiumpropyl-EDTA, neocarzinostatin, psoralen and ultraviolet light.
  • the entity may be an enzyme such as a sequence specific nuclease, a non-sequence specific nuclease, Bal-31, DNase I, DNase II, an endogenous nuclease, exonuclease III, lambda exonuclease, micrococcal nuclease, mung bean nuclease, Neurospora crassa nuclease, a restriction enzyme including type I, II and III restriction enzymes, S1 nuclease or a topoisomerase such as topoisomerase I or II.
  • the entity is an enzyme. More preferably, the entity is a restriction enzyme.
  • the restriction enzyme recognises at least a 4 base pair (bp) target sequence.
  • the restriction enzyme is selected from the group consisting of DpnII, MboI (Figure 1), NlaIII, SauIIIA and Tsp509I.
  • the methods of the present invention may involve the use of one or more such entities.
  • two different entities may be used - such as a restriction enzyme that recognises a 4 bp target sequence and a restriction enzyme that recognises a 6 bp target sequence.
  • a "plurality of HS core sequences" refers to at least about 3 HS core sequences and preferably is selected from the group comprising: about at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 HS core sequences; about at least 21-100 HS core sequences; about at least 101-1000 HS core sequences; about at least 1001 to 5000 HS core sequences; about at least 5001 to 10000 HS core sequences; about at least 10001 to 50000 HS core sequences; and about at least 50001 to 100,000 HS core sequences; or any suitable combination of start or end points, for example, at least about 15 to about 100,000 HS core sequences.
  • the HS core sequence may comprise 200 or more nucleotides.
  • the HS core sequence may comprise 199 or fewer nucleotides. Whilst reducing the size definition of the HS core sequence may not affect the validity of the approach described here, it may reduce the amount of 'background noise' in the motif-extraction step.
  • the size of the HS core sequence may be optimised depending on the size of the HS data set and the amount of background noise that is detected.
  • the HS core sequence may even comprise 150 or fewer nucleotides, 100 or fewer nucleotides, 50 or fewer nucleotides or even 25 or fewer nucleotides.
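As a minimal sketch of how an HS core sequence of adjustable size could be cut out of a genomic sequence, the code below takes a restriction site plus a flank on each side. With the default flank of 100 nucleotides and a 4 bp site this gives the 204 bp layout described for Figure 1. The function names and the toy genome are assumptions, and the code finds every GATC occurrence, whereas which sites are actually hypersensitive must come from experimental data.

```python
def find_sites(genome, target="GATC"):
    """Yield the start position of every occurrence of `target` (overlaps allowed)."""
    pos = genome.find(target)
    while pos != -1:
        yield pos
        pos = genome.find(target, pos + 1)

def extract_core(genome, site_start, flank=100, site_len=4):
    """Return the site plus `flank` nucleotides on each side, clipped at the sequence ends."""
    left = max(0, site_start - flank)
    right = min(len(genome), site_start + site_len + flank)
    return genome[left:right]

# Toy genome with a single GATC site (not real data)
genome = "A" * 150 + "GATC" + "C" * 150
sites = list(find_sites(genome))
core = extract_core(genome, sites[0])
short_core = extract_core(genome, sites[0], flank=25)
print(sites, len(core), len(short_core))
```

Lowering `flank` is one way to try the smaller core sizes discussed above when background noise in motif extraction is a concern.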
  • the present invention relates to a method for identifying one or more HS consensus sequences comprising the steps of: (a) providing a plurality of HS core sequences; (b) using a search algorithm to search for a plurality of motifs that are shared by the HS core sequences or subsets thereof; and (c) returning one or more HS consensus sequences comprising a plurality of motifs identified in step (b).
  • Hypertag Display: the principle of Hypertag Display is as follows. DNA present in HSs is selectively cut with the restriction enzyme MboI, which recognises the 4 bp target sequence 5'-GATC-3'. Ligation of a compatible BamHI adapter molecule to the cleaved ends results in the selective tagging of each cleaved MboI site with a fragment of predetermined and known sequence. The tagged fragment is subsequently amplified by PCR using an oligonucleotide complementary to the adapter molecule and a second oligonucleotide (the 'Hypertag' primer) that is complementary to a sequence located next to a previously mapped HS. The required local sequence information can be derived from data obtained through the HS library approach.
  • This step covalently joins the DNA sequences adjacent to the MboI cleavage site to the plasmid DNA, but does not yet result in the formation of a functional recombinant DNA molecule; the other end of the genomic fragment will usually be tens of kilobases away, may also be randomly sheared during the genomic DNA extraction step and will thus not be suitable for specifically joining the other BamHI site in the linearised plasmid.
  • the ligation mixture is cut to completion with EcoRI. This enzyme cuts at a defined site within the polylinker of the plasmid vector and also cuts every target site present in the genomic DNA.
  • this step creates the condition for specifically joining the other end of the construct through intramolecular ligation. Due to the random orientation of the plasmid vector relative to the MboI fragment during the first ligation approximately 50% of the clones are lost at this stage, but the other 50% of ligation products will contain specifically cloned MboI-EcoRI genomic fragments. Transfection of the ligation products results in the creation of a library of genomic DNA fragments derived from a large variety of HSs. Determination of the insert sequence adjacent to the BamHI-MboI junction of each clone establishes, after a search against human genome databases, the precise genomic location of the MboI site and thus allows the positioning of a specific HS surrounding it.
  • Various search algorithms may be used to search a plurality of motifs that are shared by HS core sequences.
  • Several methods to search for over-represented motifs in the upstream region of a set of coregulated genes have been developed and tested as described by Ohler & Niemann (2001) Trends Genet. 17, 56-60. These methods can be divided into two different types: (1) methods based on word counting and (2) methods based on probabilistic sequence models.
  • Word counting methods are based on the frequency analysis of oligonucleotides in sequences and overrepresentation is measured by comparing the counted number of occurrences of a word to the expected number of occurrences. A common motif is then compiled by grouping similar words. Word counting methods have been described by Jensen & Knudsen (2001) Bioinformatics 16, 326-333 and Van Helden et al. (1998) J. Mol. Biol. 281, 827-842.
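The word counting idea described above can be sketched in a few lines: count every overlapping word of length k, estimate the expected count, and rank words by the observed/expected ratio. The background model used here (independent bases with frequencies taken from the input itself) and the `min_ratio` cut-off are simplifying assumptions, and the step of grouping similar words into a common motif is not shown.

```python
from collections import Counter

def count_words(sequences, k):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def expected_count(word, sequences):
    """Expected occurrences of `word` under an independent-base background model."""
    total = sum(len(s) for s in sequences)
    base_freq = Counter("".join(sequences))
    prob = 1.0
    for base in word:
        prob *= base_freq[base] / total
    positions = sum(max(0, len(s) - len(word) + 1) for s in sequences)
    return prob * positions

def over_represented(sequences, k, min_ratio=2.0):
    """Return (word, observed, expected) tuples sorted by observed/expected ratio."""
    rows = []
    for word, obs in count_words(sequences, k).items():
        exp = expected_count(word, sequences)
        if exp > 0 and obs / exp >= min_ratio:
            rows.append((word, obs, exp))
    return sorted(rows, key=lambda r: r[1] / r[2], reverse=True)

# Toy sequences sharing the word TTGACG (not real data)
seqs = ["ACGTTGACGT", "TTGACGAAAC", "CCTTGACGGA"]
hits = over_represented(seqs, k=5)
print(hits[:3])
```

On a data set this small almost every word looks over-represented, so the ratio is only meaningful with realistic sequence counts and a better background model.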
  • In probabilistic methods, the motif model is represented as a position probability matrix and the motif is assumed to be hidden in a noisy background sequence.
  • maximum likelihood estimation is used.
  • the most frequent methods used for this are Expectation Maximisation (a maximum likelihood algorithm for estimating the parameters of a probabilistic model) and Gibbs Sampling (a stochastic equivalent of Expectation Maximisation).
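A minimal sketch of Gibbs sampling for motif finding is given below, assuming exactly one motif occurrence per sequence, a fixed motif width and a uniform background. These are simplifications chosen for brevity and do not describe how any particular program works; all function names and the toy sequences are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"

def profile(motifs, width, pseudo=1.0):
    """Position probability matrix (with pseudocounts) from equal-length motifs."""
    total = len(motifs) + 4 * pseudo
    cols = []
    for i in range(width):
        counts = Counter(m[i] for m in motifs)
        cols.append({b: (counts[b] + pseudo) / total for b in BASES})
    return cols

def gibbs_sample(seqs, width, iterations=200, seed=0):
    """One-site-per-sequence Gibbs sampler; returns the sampled motif start positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        held = rng.randrange(len(seqs))    # sequence whose site is re-sampled
        others = [s[p:p + width]
                  for j, (s, p) in enumerate(zip(seqs, starts)) if j != held]
        ppm = profile(others, width)
        seq = seqs[held]
        weights = []
        for p in range(len(seq) - width + 1):
            w = 1.0
            for i, base in enumerate(seq[p:p + width]):
                w *= ppm[i][base]
            weights.append(w)
        # Draw the new start in proportion to its probability under the profile
        starts[held] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy sequences with a planted TTGACG motif (not real data)
seqs = ["ACCATTGACGTA", "GGTTGACGCCAT", "CATACTTGACGG", "TTGACGATATCC"]
starts = gibbs_sample(seqs, width=6)
print([s[p:p + 6] for s, p in zip(seqs, starts)])
```

Because the procedure is stochastic, the result depends on the seed and the number of iterations; several independent runs are normally compared.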
  • the search algorithm may include a statistical model such as a Gibbs-statistical model, a Markov-statistical model, a Gaussian-statistical model, a Poisson-statistical model or a Monte Carlo-statistical model.
  • HS consensus sequences may be identified using YEBIS or MOTIFSAMPLER.
  • YEBIS (Yada et al, 1998) is available at www-scc.jst.go.jp/YEBIS/MotifExtraction. This program is capable of extracting a set of sequence motifs without any a priori knowledge from a number of related DNA sequences.
  • YEBIS uses an algorithm based upon a Markov statistical model and may be applied to a large number of unaligned sequences.
  • MOTIFSAMPLER (Thijs et al, 2001) is available at www.esat.kuleuven.ac.be/~thijs/Work/MotifSampler.html. This software package tries to find over-represented motifs in the upstream region of a set of co-regulated genes.
  • This motif finding algorithm uses Gibbs sampling to find the position probability matrix that represents the motif. Higher-order background models are used to improve the robustness of the motif finding.
  • the Motif Sampler comes with background models for several organisms but is also suitable for other organisms since the background model can also be calculated from the input sequences. This programme differs from YEBIS because the length of the motifs and number of detected motifs can be entered as part of the search criteria.
  • MotifSampler requires four search parameters, including motif length and copy number.
  • the HS core sequences were analysed by specifying the expected lengths of motifs as 8, 12 and 15 in three independent runs. Consensus sequences shared by different members of the HS core sequences were successfully identified. For some aspects of the present invention, MOTIFSAMPLER motifs of length 12 are preferred.
  • both YEBIS and MotifSampler are applications for searching for motifs, such as those that may be characteristic of HS sequences.
  • the motifs are extracted and can be sorted into groups.
  • An algorithm including a statistical model is applied and a matrix is returned that is derived by scoring the motifs at each position; this matrix can be used to define the variability between motifs.
  • the plurality of HS core sequences may be aligned prior to searching such that correspondences are assigned to preserve the order of the residues within the HS core sequences by identifying a start point, and if necessary introducing gaps.
  • step (c): returning one or more HS consensus sequences comprising a plurality of motifs identified in step (b).
  • the HS consensus sequences are returned as a regular expression, or a sequence logo, i.e. a graphic method of illustrating consensus information comprising coloured letters of different sizes, where the letters indicate different proportions of motifs.
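As an illustration of returning a consensus as a regular expression, the sketch below collapses a set of aligned, equal-length motifs into an IUPAC degenerate string and into an equivalent regular expression. The motifs are toy data and the function names are hypothetical; only the standard IUPAC nucleotide codes are assumed.

```python
import re

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def iupac_consensus(motifs):
    """One IUPAC letter per alignment column, covering every base seen there."""
    return "".join(IUPAC[frozenset(col)] for col in zip(*motifs))

def consensus_regex(motifs):
    """Regular expression accepting any base observed at each column."""
    parts = []
    for col in zip(*motifs):
        bases = "".join(sorted(set(col)))
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

# Toy aligned motifs (not real data)
motifs = ["TGACGT", "TGACGA", "TGATGT"]
consensus = iupac_consensus(motifs)
pattern = consensus_regex(motifs)
print(consensus, pattern, bool(re.fullmatch(pattern, "TGATGA")))
```

Both forms treat every observed base at a position as equally acceptable, which is the loss of information that weight matrices are meant to avoid.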
  • the HS consensus sequences are returned as a weight matrix.
  • Background teachings on weight matrices have been presented by Frech et al. (1997). The following information concerning weight matrices has been extracted from that source:
  • a weight matrix uses the complete composition of nucleotides for each position of an alignment to achieve a more differentiated rating of a matching sequence. For example, a single position of an alignment of 12 sequences containing TTTTTTTAAACC (each letter representing one sequence at this position) would be assigned T in the IUPAC consensi. A new sequence with a T at this position would be considered a match while an A at the same place would cause the whole sequence to be dismissed as no match. Even a simple nucleotide distribution matrix would assign a weight score (in this case proportional to the percentage of the nucleotide) of 0.58 to the T and still 0.25 to an A. Thus, weight matrices represent the similarity of the tested sequence to all of the sequences in the alignment much better than
  • IUPAC consensi Most weight matrix-based methods add some more weighting by comparison of the actual nucleotide distribution with random values or by other statistical measures eg. information content.
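The worked example above (a column of 12 sequences reading TTTTTTTAAACC) can be reproduced with a short sketch. The function name is an assumption introduced for illustration; the calculation is simply the fraction of each nucleotide in the column.

```python
from collections import Counter

def column_weights(column: str) -> dict[str, float]:
    """Return the fraction of each nucleotide observed in one alignment column."""
    counts = Counter(column.upper())
    total = len(column)
    return {base: counts.get(base, 0) / total for base in "ACGT"}

# Each letter represents one of the 12 aligned sequences at this position.
weights = column_weights("TTTTTTTAAACC")
print({base: round(w, 2) for base, w in weights.items()})
# {'A': 0.25, 'C': 0.17, 'G': 0.0, 'T': 0.58}
```

Under this simple distribution matrix, a T scores 0.58 and an A still scores 0.25, rather than being rejected outright as it would be by the consensus.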
  • Comput. Appl. Biosci. 11, 563-566 uses a matrix library containing more than 200 matrices.
  • MatInspector (Ghosh (1993) Nucleic Acid Res. 21, 3117-3118) allows testing of individually selected matrices. ConsInspector (Kondrakhin et al. (1995) Comput. Appl. Biosci. 11, 477-
  • Weight matrices are advantageously used in the present invention because they are much less sensitive to sequence selection and provide a quantitative score. Even a single mismatch at a critical position will reduce the score of the match.
  • the weight matrix is a position specific scoring matrix (PSSM).
  • PSSMs as described by Frech et al. (1997) use the complete composition of nucleotides for each position of the alignment to achieve a more differentiated rating of a matching sequence.
  • the HS consensus sequences are returned as a PSSM comprising the steps of: (a) computing a score for finding a matching sequence in the plurality of HS consensus sequences; and (b) returning the variability in the plurality of HS consensus sequences as a PSSM.
  • HS consensus sequences may be returned as a PSSM using various methods known in the art such as E-matrix maker (Thomas et al. (1999) Journal of Computational Biology 6: 219-235, 1999; Thomas et al. Bioinformatics 16: 233-244, 2000) which is available at http://motif.stanford.edu/ematrix-maker
  • the present invention relates to a method for identifying an HS sequence comprising the steps of: (a) identifying a plurality of HS core sequences; (b) using a search algorithm to search for a plurality of motifs that are shared by the plurality of HS core sequences; (c) returning HS consensus sequences comprising a plurality of the motifs identified in step (b); and (d) searching for an HS sequence comprising one or more HS consensus sequences.
  • HS sequences comprising HS consensus sequences may be searched using various methods known in the art.
  • DNA sequences that are not part of the plurality of HS core sequences are searched for the presence of HS consensus sequences by searching for clusters of cis-elements.
  • the most probable arrangement of cis-elements in the cluster is integrated using the Viterbi algorithm.
  • Cluster Of Motifs E-value Tool (http://zlab.bu.edu/~mfrith/comet/form.html)
  • COMET assigns a positive score to each motif using the standard method of log likelihood ratios, and subtracts a 'gap penalty' linearly proportional to the distances between motifs.
  • each motif cluster receives a score, which is higher if the individual motifs are stronger, but lower if they are further apart.
  • the scoring scheme corresponds to a log likelihood ratio of explaining the data given a cluster model versus a background model.
  • the cluster model is for cis-elements to occur in a uniform distribution, with some intensity, whereas the background model consists of random nucleotides.
  • the gap penalty corresponds in a one-to-one fashion with the intensity parameter of the cluster model.
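The scoring idea described above (positive motif scores minus a gap penalty that grows linearly with the distance between motifs) can be sketched as follows. This is a simplified illustration only and is not the COMET implementation; the function name, the tuple layout of the hits and the penalty value are assumptions.

```python
def cluster_score(motif_hits, gap_penalty):
    """Score a cluster of motif hits.

    motif_hits: iterable of (start, end, score) tuples for non-overlapping
    hits, where score is a log likelihood ratio (hypothetical values here).
    gap_penalty: penalty per nucleotide of spacing between adjacent hits.
    """
    hits = sorted(motif_hits)
    total = sum(score for _, _, score in hits)
    # Subtract a penalty linearly proportional to each gap between neighbours.
    for (_, prev_end, _), (next_start, _, _) in zip(hits, hits[1:]):
        total -= gap_penalty * max(0, next_start - prev_end)
    return total

# Two hypothetical clusters with identical motif scores but different spacing.
tight = [(0, 8, 6.0), (12, 20, 5.0)]
spread = [(0, 8, 6.0), (112, 120, 5.0)]
tight_score = cluster_score(tight, gap_penalty=0.05)
spread_score = cluster_score(spread, gap_penalty=0.05)
print(tight_score > spread_score)  # True
```

With the same motifs, the tightly spaced cluster receives the higher score, which matches the behaviour described for stronger and closer motifs.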
  • a forward-backward algorithm is used to consider the sum of all paths through a hidden Markov model.
  • CISTER (Frith et al., 2001) detects cis-element clusters by using a statistical model (a hidden Markov model) of what it expects these clusters to look like.
  • the parameters allow the user to vary some aspects of the model, and it is quite possible that different model parameters are suitable for different types of motif cluster.
  • Parameters include (i) the distance between neighbouring cis-elements within a cluster is assumed to be geometrically distributed with mean a; (ii) the number of cis-elements in a cluster is assumed to be geometrically distributed with mean b; and (iii) the distance between regulatory cis-element clusters is assumed to be geometrically distributed with mean g.
  • the background states are programmed to represent the local abundances of the 4 bases in the query sequence. Examining local abundances accounts for the biological reality of heterogeneous base composition, and prevents, for example, many spurious GC-rich motifs being detected in a part of the sequence that happens to be generally GC-rich. Cister uses the technique of posterior decoding, with this hidden Markov model.
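The idea of a background that follows local base abundances can be illustrated with a sliding-window estimate. This sketch is not the CISTER implementation; the window size, the pseudocount and the function name are assumptions made for illustration.

```python
def local_base_frequencies(sequence, centre, window=100, pseudocount=1.0):
    """Estimate A/C/G/T frequencies in a window centred on one position."""
    half = window // 2
    start = max(0, centre - half)
    end = min(len(sequence), centre + half + 1)
    region = sequence[start:end].upper()
    # Pseudocounts avoid zero probabilities in short or biased windows.
    counts = {base: region.count(base) + pseudocount for base in "ACGT"}
    total = sum(counts.values())
    return {base: count / total for base, count in counts.items()}

# Hypothetical sequence: a GC-rich stretch followed by an AT-rich stretch.
sequence = "GC" * 50 + "AT" * 50
gc_region = local_base_frequencies(sequence, centre=20, window=20)
at_region = local_base_frequencies(sequence, centre=180, window=20)
print(gc_region["G"] + gc_region["C"] > 0.8)  # True
print(at_region["A"] + at_region["T"] > 0.8)  # True
```

Scoring a GC-rich motif against the first background rather than a uniform one lowers its log likelihood ratio in the GC-rich stretch, which is how spurious hits in such regions are suppressed.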
  • a reformatting program may be used to convert the PSSMs into a format that a program subsequently used to process the PSSMs - such as CISTER - can understand.
  • various search algorithms may be used to search a plurality of motifs that are shared by HS core sequences.
  • Some programs - such as YEBIS - provide a fractional value for each of the four nucleotides for each position of the PSSM.
  • the reformatting program typically extracts data that show the aligned motifs by providing the actual numbers of occurrences of each nucleotide in each position of the PSSM. The results that are returned are actual numbers rather than fractions.
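A reformatting step of the kind described might look like the following sketch, which converts per-position fractions into whole-number occurrence counts. The input layout (a list of per-position dictionaries), the function name and the example values are assumptions; the actual file formats of YEBIS and CISTER are not reproduced here.

```python
def fractions_to_counts(fraction_matrix, n_sequences):
    """Convert per-position nucleotide fractions into occurrence counts.

    fraction_matrix: list of dicts mapping A/C/G/T to a fraction (0 to 1).
    n_sequences: number of aligned motifs from which the fractions derive.
    """
    counts = []
    for column in fraction_matrix:
        # Rounding recovers the integer counts behind the reported fractions.
        counts.append({base: round(column[base] * n_sequences) for base in "ACGT"})
    return counts

# Hypothetical single-position matrix derived from 12 aligned motifs.
fractions = [{"A": 0.25, "C": 0.1667, "G": 0.0, "T": 0.5833}]
counts = fractions_to_counts(fractions, n_sequences=12)
print(counts)  # [{'A': 3, 'C': 2, 'G': 0, 'T': 7}]
```

If the reported fractions are heavily rounded, the counts in a column may not sum exactly to the number of sequences, so a check on the column totals may be worthwhile.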
  • the methods of the present invention may be used to define high-resolution stochastic models to optimise the recognition rate of HSs. Accordingly, it may be possible to determine whether certain consensus sequences occur in a particular combination, whether the spacing between certain motifs is important, the relative frequency of each motif and whether some of the consensus sequences are more diagnostic than others.
  • step (c) returns the HS consensus sequences as a PSSM comprising the steps of: (i) providing a plurality of HS consensus sequences; (ii) computing the log-odds score for finding a matching sequence in the plurality of HS consensus sequences; (iii) returning the variability in the plurality of HS consensus sequences as a PSSM; and (iv) identifying HS sequences in one or more DNA sequences that were not part of the plurality of HS core sequences using the PSSMs identified.
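Steps (ii) to (iv) above can be sketched as follows: a log-odds PSSM is built from aligned motifs and then used to scan a sequence that was not part of the training set. This is a minimal illustration under stated assumptions (a uniform background of 0.25 per base, a pseudocount of 1, sequences containing only A, C, G and T, and hypothetical motifs and threshold); it is not the E-matrix maker method.

```python
import math

BASES = "ACGT"

def build_pssm(motifs, pseudocount=1.0, background=0.25):
    """Build a log-odds (base 2) PSSM from equal-length aligned motifs."""
    width = len(motifs[0])
    if any(len(m) != width for m in motifs):
        raise ValueError("all motifs must have the same length")
    pssm = []
    for i in range(width):
        column = [m[i].upper() for m in motifs]
        total = len(column) + pseudocount * len(BASES)
        pssm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pssm

def score_window(pssm, window):
    """Sum the log-odds scores of a window of the same width as the PSSM."""
    return sum(col[base] for col, base in zip(pssm, window))

def scan(pssm, sequence, threshold):
    """Return (position, score) for every window scoring at least threshold."""
    sequence = sequence.upper()
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score_window(pssm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits

# Hypothetical motifs and query sequence, for illustration only.
motifs = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
pssm = build_pssm(motifs)
hits = scan(pssm, "GGTGACTCAGG", threshold=5.0)
print([position for position, _ in hits])  # [2]
```

Unlike a consensus match, each window receives a quantitative score, so the threshold can be tuned to trade sensitivity against specificity.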
  • HS consensus sequences may be returned as a PSSM using various methods known in the art such as E-matrix maker (Thomas et al. (1999) Journal of Computational Biology 6: 219-235, 1999; Thomas et al. Bioinformatics 16: 233-244, 2000) which is available at http://motif.stanford.edu/ematrix-maker.
  • the present invention provides a method for identifying an HS core sequence comprising the steps of: (a) providing a DNA sequence in the sense or antisense orientation that is not part of the plurality of HS core sequences; (b) providing an HS sequence; and (c) searching the DNA sequence for the presence of a hypersensitive restriction site.
  • the DNA sequence may be from a database of DNA sequences.
  • the DNA sequence is about 10 kb in length.
  • the sequence may be converted to capitals and the ">" characters and whitespace may be removed. If the alignment in the sense orientation fails then the DNA sequence may be converted to an antisense orientation using routine methods known in the art.
  • An HS sequence is provided which may be converted to capitals and the ">" characters and whitespace may be removed.
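The clean-up and orientation steps described above can be sketched as follows. The function names are assumptions; the sketch follows the text literally by removing ">" characters and whitespace, so it assumes the input contains no other header text.

```python
def clean_sequence(raw: str) -> str:
    """Remove '>' characters and all whitespace, then convert to capitals."""
    without_markers = raw.replace(">", "")
    return "".join(without_markers.split()).upper()

# Translation table for complementing DNA; N is left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(sequence: str) -> str:
    """Return the antisense (reverse-complement) orientation of a sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]

cleaned = clean_sequence(">acgt ga\ntc\n")
print(cleaned)                     # ACGTGATC
print(reverse_complement("AAGC"))  # GCTT
```

If a search in the sense orientation fails, the same search can be repeated on the output of `reverse_complement`.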
  • the HS sequence is between about 50 and about 200 nucleotides in length, for example the HS sequence may be about 50 nucleotides in length.
  • the DNA sequence is then searched for the presence of a hypersensitive site - such as a hypersensitive enzyme site - in the sense or antisense orientation.
  • the enzyme may be a sequence specific nuclease or a restriction enzyme including type I, II and III restriction enzymes.
  • the enzyme is a restriction enzyme - such as a restriction enzyme that recognises at least a 4 base pair (bp) target sequence. More preferably, the restriction enzyme is selected from the group consisting of DpnII, MboI, NlaIII, Sau3A and Tsp509I.
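Searching a DNA sequence for the recognition sites of the listed enzymes can be sketched as below. The recognition sequences (GATC for DpnII, MboI and Sau3A; CATG for NlaIII; AATT for Tsp509I) are the commonly documented ones and are not stated in the present text; the function name and the test sequence are hypothetical. The sketch reports where each recognition sequence starts, not the cut position.

```python
# Commonly documented 4 bp recognition sequences (an assumption of this sketch).
RECOGNITION_SITES = {
    "DpnII": "GATC",
    "MboI": "GATC",
    "NlaIII": "CATG",
    "Sau3A": "GATC",
    "Tsp509I": "AATT",
}

def find_sites(sequence: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each enzyme's recognition sequence."""
    sequence = sequence.upper()
    positions = {}
    for enzyme, site in RECOGNITION_SITES.items():
        found = []
        start = sequence.find(site)
        while start != -1:
            found.append(start)
            # Advance by one base so that overlapping sites are also reported.
            start = sequence.find(site, start + 1)
        positions[enzyme] = found
    return positions

sites = find_sites("TTGATCAACATGAATTGATC")
print(sites["DpnII"])    # [2, 16]
print(sites["NlaIII"])   # [8]
print(sites["Tsp509I"])  # [12]
```

Because these recognition sequences are palindromic, a search of the sense strand finds the same sites as a search of the antisense strand.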
  • Examples include the inhibition of transcription of mutated proto-oncogenes - such as erbB-2 and bcr-abl (cancer); activation of fetal haemoglobin (sickle cell anemia); growth hormone (dwarfism); and erythropoietin and vascular endothelial growth factor (cancer therapy, diabetes); and the regulation of human telomerase reverse transcriptase to control aging and cancer proliferation. Therefore, the ability to predict the locations of HSs by bioinformatic means as described herein has numerous implications for biotechnological and medical applications. Much of the current experimental work in biotechnology relates to the identification of various human gene regulatory sequences.
  • enhancers contain clusters of transcription factor binding sites, and can stimulate the activity of adjacent genes substantially. Enhancers also play an important role in directing tissue-specific gene expression programmes. Other gene regulatory regions, such as silencers, are involved in switching off the expression of nearby genes. Research carried out over the last two decades has established a strong link between the locations of enhancers and silencers with HSs (reviewed in Gross and Garrard, 1988; Bonifer, 2000). The ability to detect HSs using a bioinformatic approach in various eukaryotic genome sequences (especially the human genome) may have the potential to identify systematically numerous constitutive and tissue-specific enhancer and silencer sequences. These identified enhancer and silencer sequences may be suitable for numerous applications in gene therapy.
  • Constitutive enhancers that are active in a wide spectrum of cell types can be used to promote the expression of a target gene in any cell type.
  • Tissue-specific enhancers provide an enhanced level of specificity and can be used to promote the expression of target genes in a particular tissue, or in a restricted range of cell types.
  • silencers can be used to switch off the expression of unwanted genes (e.g. oncogenes) or to silence the expression of parasitic genomes (e.g. during viral infections). Examples to illustrate this approach can be found in Smith et al. (2000) and Phylactides et al. (2002).
  • HSs cystic fibrosis transmembrane conductance regulator
  • the identified enhancers can be used to direct the tissue- and stage-specific expression of a synthetic CFTR gene in future gene therapeutic applications.
  • Harland et al. (2002) specifically set out to identify an enhancer that confers ubiquitous expression to adjacent genes. They successfully analysed the promoter of the universally expressed transcription factor TATA-binding protein (TBP) for the presence of a DNase I hypersensitive site indicative of the location of such enhancers.
  • bioinformatic tools for the predictions of HSs may be applied with ease to large regions of sequenced genomes and to identify and select HS candidate sequences located near a multitude of different genes. These candidate sequences can then be experimentally verified, thus providing substantial savings in cost and time.
  • HSs identified by bioinformatic means near any gene known or suspected to cause genetic and other diseases may therefore be useful for the prediction of genomic regions that are important candidates for the development of diagnostic tests. Sequencing of such bioinformatically identified regions in genomes derived from normal individuals and patients will rapidly and efficiently identify such mutations and lead to possible therapeutic interventions (eg. by gene therapy).
  • HSs by bioinformatic means also has numerous implications for transcription factor-based therapeutic applications.
  • Ma et al. (2002) identified two transcription factors binding to DNA sequences present in an HS near the platelet-derived growth factor (PDGF)-A gene (implicated in tumorigenesis, metastasis and tumour progression). These transcription factors are repressors and are capable of diminishing the transcription of the PDGF-A gene.
  • the identified transcription factors may play an important role in dampening the expression of this oncogenic growth factor.
  • the bioinformatic identification of HSs (especially those located near disease-causing genes) may be the starting point of large scale screens to identify transcription factors capable of interacting with them.
  • the bioinformatically identified HS core sequences can be chemically synthesised as single- and double-stranded oligonucleotides and used to isolate transcription factors binding to them (e.g. using the cDNA expression library screening approach used by Ma et al. (2002)).
  • the oligonucleotides derived from predicted HSs may be used to prepare DNA affinity columns (Kadonaga and Tjian, 1986).
  • the identified transcription factors may then be used as targets for the isolation and development of drugs capable of modulating their functional characteristics in a therapeutically useful manner.
  • HSs are an important regulatory access point for external agents to act upon the genome. It will be appreciated that the methods of the present invention have many applications in biotechnology and medicine, which may be carried out faster, and more economically using the methods described herein. The methods of the present invention are broadly applicable to all eukaryotic genomes.
  • nucleotide sequence is synonymous with the term “polynucleotide”.
  • aspects of the present invention involve the use of nucleotide sequences, which are available in databases.
  • the nucleotide sequence may be DNA or RNA of genomic or synthetic or recombinant origin.
  • the nucleotide sequence may be double-stranded or single-stranded whether representing the sense or antisense strand or combinations thereof.
  • the nucleotide sequence may be prepared by use of recombinant DNA techniques (e.g. recombinant DNA).
  • the nucleotide sequence may be the same as the naturally occurring form, or may be derived therefrom.
  • amino acid sequence is synonymous with the term “polypeptide” and/or the term “protein”. In some instances, the term “amino acid sequence” is synonymous with the term “peptide”. In some instances, the term “amino acid sequence” is synonymous with the term “protein”.
  • aspects of the present invention concern the use of amino acid sequences, which may be available in databases.
  • amino acid sequence may be isolated from a suitable source, or it may be made synthetically or it may be prepared by use of recombinant DNA techniques.
  • the present invention encompasses the use of variants, homologues and derivatives of nucleotide sequences.
  • the term “homologue” means an entity having a certain homology with nucleotide sequences.
  • the term “homology” can be equated with "identity”.
  • An homologous sequence is taken to include a nucleotide sequence which may be at least 75, 85 or 90% identical, preferably at least 95 or 98% identical to the subject sequence.
  • Homology comparisons can be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs can calculate % homology between two or more sequences.
  • % homology may be calculated over contiguous sequences, i.e. one sequence is aligned with the other sequence and each nucleotide in one sequence is directly compared with the corresponding nucleotide in the other sequence, one residue at a time. This is called an "ungapped" alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues.
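An ungapped comparison of this kind can be sketched in a few lines. The function name and the example sequences are assumptions; the sketch compares two equal-length sequences one residue at a time and reports the percentage of identical positions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return % identity of two equal-length sequences in an ungapped alignment."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Compare the sequences position by position, ignoring case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)

identity = percent_identity("ACGTACGT", "ACGTTCGT")
print(identity)  # 87.5
```

Because no gaps are introduced, a single insertion or deletion shifts all downstream residues out of register, which is why this approach is usually restricted to short stretches.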
  • BLAST and FASTA are available for offline and online searching (see Ausubel et al, 1999 ibid, pages 7-58 to 7-60). However, for some applications, it is preferred to use the GCG Bestfit program.
  • a new tool, called BLAST 2 Sequences is also available for comparing nucleotide sequences (see FEMS Microbiol Lett 1999 174(2): 247-50; FEMS Microbiol Lett 1999 177(1): 187-8)
  • a scaled similarity score matrix is generally used that assigns scores to each pairwise comparison based on chemical similarity or evolutionary distance.
  • An example of such a matrix commonly used is the BLOSUM62 matrix - the default matrix for the BLAST suite of programs.
  • GCG Wisconsin programs generally use either the public default values or a custom symbol comparison table if supplied (see user manual for further details). For some applications, it is preferred to use the public default values for the GCG package, or in the case of other software, the default matrix, such as BLOSUM62.
  • % homology preferably % sequence identity.
  • the software typically does this as part of the sequence comparison and generates a numerical result.
  • Nucleotide sequences may include within them synthetic or modified nucleotides.
  • a number of different types of modification to oligonucleotides are known in the art. These include methylphosphonate and phosphorothioate backbones and/or the addition of acridine or polylysine chains at the 3' and/or 5' ends of the molecule. Such modifications may be carried out to enhance the in vivo activity or life span of nucleotide sequences.
  • the present invention relates to a method comprising the additional step of using the identified HS consensus sequences to prepare one or more nucleic acid constructs.
  • the present invention also relates to a nucleic acid construct comprising one or more HS consensus sequences.
  • construct is synonymous with the term “vector” and includes expression vectors, transformation vectors and shuttle vectors.
  • expression vector means a construct capable of in vivo or in vitro expression.
  • transformation vector means a construct capable of being transferred from one entity to another entity - which may be of the same species or may be of a different species. If the construct is capable of being transferred from one species to another - such as from an Escherichia coli plasmid to a bacterium such as of the genus Bacillus - then the transformation vector is sometimes called a "shuttle vector". It may even be a construct capable of being transferred from an E. coli plasmid to an Agrobacterium to a plant.
  • Vectors may be transformed into a suitable host cell as described below to provide for expression of a polypeptide encompassed in the present invention.
  • the vectors may be for example, plasmid, virus or phage vectors provided with an origin of replication, optionally a promoter for the expression of the said polynucleotide and optionally a regulator of the promoter.
  • Vectors may be used in vitro, for example for the production of RNA or used to transfect or transform a host cell.
  • polynucleotides for use in the present invention may be incorporated into a construct — such as a recombinant vector (typically a replicable vector), for example a cloning or expression vector.
  • the vector may be used to replicate the nucleic acid in a compatible host cell.
  • quantities of polynucleotides may be made by introducing a polynucleotide into a replicable vector, introducing the vector into a compatible host cell, and growing the host cell under conditions, which bring about replication of the vector.
  • the vector may be recovered from the host cell. Suitable host cells are described below in connection with expression vectors.
  • Genetically engineered host cells may be used to express an amino acid sequence (or variant, homologue, fragment or derivative thereof) in screening methods for the identification of agents and antagonists. Such genetically engineered host cells could be used to screen peptide libraries or organic molecules. Antagonists and agents such as antibodies, peptides or small organic molecules will provide the basis for pharmaceutical compositions.
  • the present invention relates to a method comprising the additional step of using the identified HS consensus sequences in an assay (or assay development program).
  • Any one or more appropriate targets - such as a nucleotide sequence of an HS consensus sequence - may be used for identifying a chromatin modulating (e.g. modifying) agent according to the present invention.
  • the target employed in such a test may be free in solution, affixed to a solid support, borne on a cell surface, or located intracellularly.
  • the abolition of target activity or the formation of binding complexes between the target and the chromatin modulating (e.g. modifying) agent being tested may be measured.
  • the methods of the present invention may be a screen, whereby a number of chromatin modulating (e.g. modifying) agents are tested.
  • Techniques for drug screening may be based on the method described in Geysen, European Patent Application 84/03564, published on September 13, 1984.
  • large numbers of different small peptide test compounds are synthesized on a solid substrate, such as plastic pins or some other surface.
  • the peptide test compounds are reacted with a suitable target or fragment thereof and washed. Bound entities are then detected - such as by appropriately adapting methods well known in the art.
  • a purified target may also be coated directly onto plates for use in drug screening techniques.
  • non-neutralising antibodies may be used to capture the peptide and immobilise it on a solid support.
  • chromatin modulating agent may refer to a single entity or a combination of entities.
  • the chromatin modulating agent may be an organic compound or other chemical.
  • the chromatin modulating agent may be a compound, which is obtainable from or produced by any suitable source, whether natural or artificial.
  • the chromatin modulating agent may be an amino acid molecule, a polypeptide, or a chemical derivative thereof, or a combination thereof.
  • the chromatin modulating agent may even be a polynucleotide molecule - which may be a sense or an anti-sense molecule.
  • the chromatin modulating agent may even be an antibody.
  • the chromatin modulating agent may be designed or obtained from a library of compounds, which may comprise peptides, as well as other compounds, such as small organic molecules.
  • the chromatin modulating (e.g. modifying) agent may be a natural substance, a biological macromolecule, or an extract made from biological materials such as bacteria, fungi, or animal (particularly mammalian) cells or tissues, an organic or an inorganic molecule, a synthetic agent, a semi-synthetic agent, a structural or functional mimetic, a peptide, a peptidomimetic, a derivatised agent, a peptide cleaved from a whole protein, a peptide synthesised synthetically (such as, by way of example, either using a peptide synthesizer or by recombinant techniques) or combinations thereof, a recombinant agent, an antibody, a natural or a non-natural agent, a fusion protein or equivalent thereof and mutants, derivatives or combinations thereof.
  • the chromatin modulating (e.g. modifying) agent may be an organic compound.
  • the organic compounds may comprise two or more hydrocarbyl groups.
  • hydrocarbyl group means a group comprising at least C and H and may optionally comprise one or more other suitable substituents. Examples of such substituents may include halo-, alkoxy-, nitro-, an alkyl group, a cyclic group etc.
  • a combination of substituents may form a cyclic group. If the hydrocarbyl group comprises more than one C then those carbons need not necessarily be linked to each other. For example, at least two of the carbons may be linked via a suitable element or group.
  • the hydrocarbyl group may contain hetero atoms. Suitable hetero atoms will be apparent to those skilled in the art and include, for instance, sulphur, nitrogen and oxygen.
  • the chromatin modulating (e.g. modifying) agent may comprise at least one cyclic group.
  • the cyclic group may be a polycyclic group, such as a non-fused polycyclic group.
  • the chromatin modulating (e.g. modifying) agent may comprise at least one of said cyclic groups linked to another hydrocarbyl group.
  • the chromatin modulating (e.g. modifying) agent may contain halo groups.
  • halo means halogen compounds eg. halides and includes fluoro, chloro, bromo or iodo groups.
  • the chromatin modulating (e.g. modifying) agent may contain one or more of alkyl, alkoxy, alkenyl, alkylene and alkenylene groups - which may be unbranched- or branched-chain.
  • the chromatin modulating (e.g. modifying) agent may be in the form of a pharmaceutically acceptable salt - such as an acid addition salt or a base salt - or a solvate thereof, including a hydrate thereof.
  • the chromatin modulating (e.g. modifying) agent of the present invention may be capable of displaying other therapeutic properties.
  • the chromatin modulating (e.g. modifying) agent may be used in combination with one or more other pharmaceutically active agents.
  • if combinations of active agents are administered, then they may be administered simultaneously, separately or sequentially.
  • the present invention relates to a method comprising the additional step of using the identified HS consensus sequence(s) in a pharmaceutical (or in the preparation of or development of a pharmaceutical).
  • compositions useful in the present invention may comprise a therapeutically effective amount of chromatin modulating (e.g. modifying) agent(s) and pharmaceutically acceptable carrier, diluent or excipient (including combinations thereof).
  • compositions may be for human or animal usage in human and veterinary medicine and will typically comprise any one or more of a pharmaceutically acceptable diluent, carrier, or excipient.
  • Acceptable carriers or diluents for therapeutic use are well known in the pharmaceutical art, and are described, for example, in Remington's Pharmaceutical Sciences, Mack Publishing Co. (A. R. Gennaro edit. 1985).
  • the choice of pharmaceutical carrier, excipient or diluent may be selected with regard to the intended route of administration and standard pharmaceutical practice.
  • Pharmaceutical compositions may comprise as - or in addition to - the carrier, excipient or diluent any suitable binder(s), lubricant(s), suspending agent(s), coating agent(s) or solubilising agent(s).
  • Preservatives, stabilizers, dyes and even flavoring agents may be provided in pharmaceutical compositions.
  • preservatives include sodium benzoate, sorbic acid and esters of p-hydroxybenzoic acid.
  • Antioxidants and suspending agents may be also used.
  • compositions useful in the present invention may be formulated to be administered using a mini-pump or by a mucosal route, for example, as a nasal spray or aerosol for inhalation or ingestible solution, or parenterally, in which case the composition is formulated in an injectable form for delivery by, for example, an intravenous, intramuscular or subcutaneous route.
  • the formulation may be designed to be administered by a number of routes.
  • Chromatin modulating (e.g. modifying) agents may also be used in combination with a cyclodextrin.
  • Cyclodextrins are known to form inclusion and non-inclusion complexes with drug molecules. Formation of a drug-cyclodextrin complex may modify the solubility, dissolution rate, bioavailability and/or stability property of a drug molecule. Drug-cyclodextrin complexes are generally useful for most dosage forms and administration routes.
  • the cyclodextrin may be used as an auxiliary additive, e.g. as a carrier, diluent or solubiliser.
  • Alpha-, beta- and gamma-cyclodextrins are most commonly used and suitable examples are described in WO-A-91/11172, WO-A-94/02518 and WO-A-98/55148.
  • the chromatin modulating (e.g. modifying) agent is a protein
  • said protein may be prepared in situ in the subject being treated.
  • nucleotide sequences encoding said protein may be delivered by use of non-viral techniques (e.g. by use of liposomes) and/or viral techniques (e.g. by use of retroviral vectors) such that the said protein is expressed from said nucleotide sequence.
  • the chromatin modulating (e.g. modifying) agents may exist as stereoisomers and/or geometric isomers - e.g. they may possess one or more asymmetric and/or geometric centres and so may exist in two or more stereoisomeric and/or geometric forms.
  • the present invention contemplates the use of the entire individual stereoisomers and geometric isomers of those chromatin modulating (e.g. modifying) agents, and mixtures thereof.
  • the terms used in the claims encompass these forms, provided said forms retain the appropriate functional activity (though not necessarily to the same degree).
  • the chromatin modulating (e.g. modifying) agent may be administered in the form of a pharmaceutically acceptable salt.
  • Suitable acid addition salts are formed from acids which form non-toxic salts and include the hydrochloride, hydrobromide, hydroiodide, nitrate, sulphate, bisulphate, phosphate, hydrogenphosphate, acetate, trifluoroacetate, gluconate, lactate, salicylate, citrate, tartrate, ascorbate, succinate, maleate, fumarate, gluconate, formate, benzoate, methanesulphonate, ethanesulphonate, benzenesulphonate and p-toluenesulphonate salts.
  • suitable pharmaceutically acceptable base addition salts can be formed from bases which form non-toxic salts and include the aluminium, calcium, lithium, magnesium, potassium, sodium, zinc and pharmaceutically-active amine (such as diethanolamine) salts.
  • a pharmaceutically acceptable salt of a chromatin modulating (e.g. modifying) agent may be readily prepared by mixing together solutions of a chromatin modulating (e.g. modifying) agent and the desired acid or base, as appropriate. The salt may precipitate from solution and be collected by filtration or may be recovered by evaporation of the solvent.
  • a chromatin modulating (e.g. modifying) agent may exist in polymorphic form.
  • a chromatin modulating (e.g. modifying) agent may contain one or more asymmetric carbon atoms and therefore exist in two or more stereoisomeric forms. Where a chromatin modulating (e.g. modifying) agent contains an alkenyl or alkenylene group, cis (Z) and trans (E) isomerism may also occur.
  • the present invention includes the individual stereoisomers of a chromatin modulating (e.g. modifying) agent and, where appropriate, the individual tautomeric forms thereof, together with mixtures thereof.
  • Separation of diastereoisomers or cis- and trans-isomers may be achieved by conventional techniques, e.g. by fractional crystallisation, chromatography or H.P.L.C. of a stereoisomeric mixture of an agent or a suitable salt or derivative thereof.
  • An individual enantiomer of a chromatin modulating (e.g. modifying) agent may also be prepared from a corresponding optically pure intermediate or by resolution, such as by H.P.L.C. of the corresponding racemate using a suitable chiral support or by fractional crystallisation of the diastereoisomeric salts formed by reaction of the corresponding racemate with a suitable optically active acid or base, as appropriate.
  • the present invention also encompasses all suitable isotopic variations of a chromatin modulating (e.g. modifying) agent or a pharmaceutically acceptable salt thereof.
  • An isotopic variation of a chromatin modulating (e.g. modifying) agent or a pharmaceutically acceptable salt thereof is defined as one in which at least one atom is replaced by an atom having the same atomic number but an atomic mass different from the atomic mass usually found in nature. Examples of isotopes that may be incorporated into a chromatin modulating (e.g. modifying) agent and pharmaceutically acceptable salts thereof include isotopes of hydrogen, carbon, nitrogen, oxygen, phosphorus, sulphur, fluorine and chlorine such as 2H, 3H, 13C, 14C, 15N, 17O, 18O, 31P, 32P, 35S, 18F and 36Cl, respectively.
  • Certain isotopic variations of a chromatin modulating (e.g. modifying) agent and pharmaceutically acceptable salts thereof, for example, those in which a radioactive isotope such as 3H or 14C is incorporated, are useful in drug and/or substrate tissue distribution studies. Tritiated, i.e. 3H, and carbon-14, i.e. 14C, isotopes are particularly preferred for their ease of preparation and detectability.
  • isotopic variations of chromatin modulating (e.g. modifying) agents and pharmaceutically acceptable salts thereof can generally be prepared by conventional procedures using appropriate isotopic variations of suitable reagents.
  • a chromatin modulating (e.g. modifying) agent may be derived from a prodrug.
  • prodrugs include entities that have certain protected group(s) and which may not possess pharmacological activity as such, but may, in certain instances, be administered (such as orally or parenterally) and thereafter metabolised in the body to form an agent of the present invention which is pharmacologically active.
  • pro-moieties, for example as described in "Design of Prodrugs" by H. Bundgaard, Elsevier, 1985 (the disclosure of which is hereby incorporated by reference), may be placed on appropriate functionalities of chromatin modulating (e.g. modifying) agents. Such prodrugs are also included within the scope of the invention.
  • the present invention also includes the use of zwitterionic forms of a chromatin modulating (e.g. modifying) agent of the present invention.
  • the terms used in the claims encompass one or more of the forms just mentioned.
  • a chromatin modulating (e.g. modifying) agent may be administered as a pharmaceutically acceptable salt.
  • a pharmaceutically acceptable salt may be readily prepared by using a desired acid or base, as appropriate. The salt may precipitate from solution and be collected by filtration or may be recovered by evaporation of the solvent.
  • the chromatin modulating (e.g. modifying) agent may be prepared by chemical synthesis techniques. It will be apparent to those skilled in the art that sensitive functional groups may need to be protected and deprotected during synthesis of a chromatin modulating (e.g. modifying) agent. This may be achieved by conventional techniques, for example as described in "Protective Groups in Organic Synthesis" by T W Greene and P G M Wuts, John Wiley and Sons Inc. (1991), and by P.J.Kocienski, in "Protecting Groups", Georg Thieme Verlag (1994).
  • any stereocentres present could, under certain conditions, be racemised, for example if a base is used in a reaction with a substrate having an optical centre comprising a base-sensitive group. This is possible during e.g. a guanylation step. It should be possible to circumvent potential problems such as this by choice of reaction sequence, conditions, reagents, protection/deprotection regimes etc. as is well-known in the art.
  • the compounds and salts may be separated and purified by conventional methods.
  • Separation of diastereomers may be achieved by conventional techniques, e.g. by fractional crystallisation, chromatography or H.P.L.C. of a stereoisomeric mixture of a compound or a suitable salt or derivative thereof.
  • An individual enantiomer of a compound may also be prepared from a corresponding optically pure intermediate or by resolution, such as by H.P.L.C. of the corresponding racemate using a suitable chiral support or by fractional crystallisation of the diastereomeric salts formed by reaction of the corresponding racemate with a suitably optically active acid or base.
  • the chromatin modulating (e.g. modifying) agent or variants, homologues, derivatives, fragments or mimetics thereof may be produced using chemical methods to synthesise the chromatin modulating (e.g. modifying) agent in whole or in part.
  • Where the chromatin modulating (e.g. modifying) agent is a peptide, the peptide can be synthesized by solid phase techniques, cleaved from the resin, and purified by preparative high performance liquid chromatography (e.g., Creighton (1983) Proteins Structures And Molecular Principles, WH Freeman and Co, New York NY).
  • the composition of the synthetic peptides may be confirmed by amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra).
  • The term "derivative" or "derivatised" as used herein includes chemical modification of a chromatin modulating (e.g. modifying) agent. Illustrative of such chemical modifications would be replacement of hydrogen by a halo group, an alkyl group, an acyl group or an amino group.
  • the chromatin modulating (e.g. modifying) agent may be a chemically modified agent.
  • the chemical modification of a chromatin modulating (e.g. modifying) agent may either enhance or reduce hydrogen bonding interaction, charge interaction, hydrophobic interaction, van der Waals interaction or dipole interaction.
  • the chromatin modulating (e.g. modifying) agent may act as a model (for example, a template) for the development of other compounds.
  • the present invention provides a method of modulating (e.g. modifying) chromatin structure in a subject comprising administering to the subject an effective amount of one or more chromatin modulating (e.g. modifying) agents identified according to the methods of the present invention.
  • the chromatin modulating (e.g. modifying) agents of the present invention may be administered alone but will generally be administered as a pharmaceutical composition comprising one or more components - e.g. when the components are in admixture with a suitable pharmaceutical excipient, diluent or carrier selected with regard to the intended route of administration and standard pharmaceutical practice.
  • the components may be administered (e.g. orally) in the form of tablets, capsules, ovules, elixirs, solutions or suspensions, which may contain flavouring or colouring agents, for immediate-, delayed-, modified-, sustained-, pulsed- or controlled-release applications.
  • the tablet may contain excipients such as microcrystalline cellulose, lactose, sodium citrate, calcium carbonate, dibasic calcium phosphate and glycine, disintegrants such as starch (preferably corn, potato or tapioca starch), sodium starch glycollate, croscarmellose sodium and certain complex silicates, and granulation binders such as polyvinylpyrrolidone, hydroxypropylmethylcellulose (HPMC), hydroxypropylcellulose (HPC), sucrose, gelatin and acacia. Additionally, lubricating agents such as magnesium stearate, stearic acid, glyceryl behenate and talc may be included.
  • Solid compositions of a similar type may also be employed as fillers in gelatin capsules.
  • Preferred excipients in this regard include lactose, starch, a cellulose, milk sugar or high molecular weight polyethylene glycols.
  • the chromatin modulating (e.g. modifying) agent may be combined with various sweetening or flavouring agents, colouring matter or dyes, with emulsifying and/or suspending agents and with diluents such as water, ethanol, propylene glycol and glycerin, and combinations thereof.
  • the routes for administration include, but are not limited to, one or more of: oral (e.g. as a tablet, capsule, or as an ingestible solution), topical, mucosal (e.g. as a nasal spray or aerosol for inhalation), nasal, parenteral (e.g. by an injectable form), gastrointestinal, intraspinal, intraperitoneal, intramuscular, intravenous, intrauterine, intraocular, intradermal, intracranial, intratracheal, intravaginal, intracerebroventricular, intracerebral, subcutaneous, ophthalmic (including intravitreal or intracameral), transdermal, rectal, buccal, vaginal, epidural, sublingual.
  • If the composition comprises more than one active component, then those components may be administered by different routes.
  • If a component is administered parenterally, then examples of such administration include one or more of: intravenously, intra-arterially, intraperitoneally, intrathecally, intraventricularly, intraurethrally, intrasternally, intracranially, intramuscularly or subcutaneously administering the component; and/or by using infusion techniques.
  • For parenteral administration, the component is best used in the form of a sterile aqueous solution which may contain other substances, for example, enough salts or glucose to make the solution isotonic with blood.
  • aqueous solutions should be suitably buffered (preferably to a pH of from 3 to 9), if necessary.
  • The preparation of suitable parenteral formulations under sterile conditions is readily accomplished by standard pharmaceutical techniques well-known to those skilled in the art.
  • the component(s) useful in the present invention may be administered intranasally or by inhalation and is conveniently delivered in the form of a dry powder inhaler or an aerosol spray presentation from a pressurised container, pump, spray or nebuliser with the use of a suitable propellant, e.g. dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, a hydrofluoroalkane such as 1,1,1,2-tetrafluoroethane (HFA 134A™) or 1,1,1,2,3,3,3-heptafluoropropane (HFA 227EA™), carbon dioxide or other suitable gas.
  • the dosage unit may be determined by providing a valve to deliver a metered amount.
  • the pressurised container, pump, spray or nebuliser may contain a solution or suspension of the active compound, e.g. using a mixture of ethanol and the propellant as the solvent, which may additionally contain a lubricant, e.g. sorbitan trioleate.
  • Capsules and cartridges (made, for example, from gelatin) for use in an inhaler or insufflator may be formulated to contain a powder mix of the agent and a suitable powder base such as lactose or starch.
  • the component(s) may be administered in the form of a suppository or pessary, or it may be applied topically in the form of a gel, hydrogel, lotion, solution, cream, ointment or dusting powder.
  • the component(s) may also be dermally or transdermally administered, for example, by the use of a skin patch. They may also be administered by the pulmonary or rectal routes. They may also be administered by the ocular route.
  • the compounds may be formulated as micronised suspensions in isotonic, pH adjusted, sterile saline, or, preferably, as solutions in isotonic, pH adjusted, sterile saline, optionally in combination with a preservative such as a benzalkonium chloride.
  • they may be formulated in an ointment such as petrolatum.
  • the component(s) may be formulated as a suitable ointment containing the active compound suspended or dissolved in, for example, a mixture with one or more of the following: mineral oil, liquid petrolatum, white petrolatum, propylene glycol, polyoxyethylene polyoxypropylene compound, emulsifying wax and water.
  • it may be formulated as a suitable lotion or cream, suspended or dissolved in, for example, a mixture of one or more of the following: mineral oil, sorbitan monostearate, a polyethylene glycol, liquid paraffin, polysorbate 60, cetyl esters wax, cetearyl alcohol, 2-octyldodecanol, benzyl alcohol and water.
  • the term "administered" also includes delivery by viral or non-viral techniques.
  • Viral delivery mechanisms include but are not limited to adenoviral vectors, adeno-associated viral (AAV) vectors, herpes viral vectors, retroviral vectors, lentiviral vectors, and baculoviral vectors.
  • Non-viral delivery mechanisms include lipid mediated transfection, liposomes, immunoliposomes, lipofectin, cationic facial amphiphiles (CFAs) and combinations thereof.
  • a physician will determine the actual dosage which will be most suitable for an individual subject.
  • the specific dose level and frequency of dosage for any particular patient may be varied and will depend upon a variety of factors including the activity of the specific compound employed, the metabolic stability and length of action of that compound, the age, body weight, general health, sex, diet, mode and time of administration, rate of excretion, drug combination, the severity of the particular condition, and the individual undergoing therapy.
  • the component(s) may be formulated into a pharmaceutical composition, such as by mixing with one or more of a suitable carrier, diluent or excipient, by using techniques that are known in the art.
  • the present invention employs, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA and immunology, which are within the capabilities of a person of ordinary skill in the art. Such techniques are explained in the literature. See, for example, J. Sambrook, E. F. Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Books 1-3, Cold Spring Harbor Laboratory Press; Ausubel, F. M. et al. (1995 and periodic supplements; Current Protocols in Molecular Biology, ch. 9, 13, and 16, John Wiley & Sons, New York, N.Y.); B. Roe, J. Crabtree, and A.
  • HeLa S3 cells (obtained from the European Collection of Cell Cultures; ECACC Ref. No. 87110901) are grown to 80% confluency in 150 cm² flasks at 37°C in Dulbecco's Minimal Essential Medium/10% newborn calf serum (Sigma) in a 5% CO2 humidified atmosphere. Before carrying out the procedure, the appearance of the cells is visually checked and their overall viability (>97%) assessed by trypan blue staining. After removing the medium, the adherent cells are rinsed in Dulbecco's PBS (-Ca²⁺/Mg²⁺) and around 75% of the cells are detached by trypsin treatment. Isolation of nuclei is carried out using established protocols (Protocol
  • NP40 lysis buffer: 10 mM Tris-HCl [pH 7.5]; 10 mM NaCl; 3 mM MgCl2; 0.5% NP-40; 0.15 mM spermine-tetrachloride; 0.15 mM spermidine-trichloride
  • the nuclei are purified from the lysate by low speed centrifugation and washed once in
  • Restriction enzyme buffer* 50 mM Tris-HCl [pH 8.0]; 100 mM NaCl; 3 mM MgCl2; 0.15 mM spermine-tetrachloride; 0.15 mM spermidine-trichloride.
  • the purified nuclei are resuspended in 500 µl NEB Buffer 3 (100 mM NaCl, 50 mM Tris-HCl [pH 7.9], 10 mM MgCl2, 1 mM dithiothreitol) to yield a final volume of around 800 µl.
  • Six 100 µl aliquots of the nuclei suspension are distributed into six separate microcentrifuge tubes which are subjected to the following treatments:
  • Reaction 2: 100 units Mbo I (recombinant; 5 units/µl; New England Biolabs)
  • Reaction 3 50 units Mbo I
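The enzyme amounts listed above are given in units, while pipetting is done in volumes. As a minimal sketch, the conversion can be written as follows. The function name is hypothetical, and the sketch assumes that Reaction 3 uses the same 5 units/µl stock that is quoted for Reaction 2, which the text does not state explicitly.

```python
def enzyme_volume_ul(units_required: float, stock_units_per_ul: float) -> float:
    """Volume of enzyme stock (in µl) that contains the requested number of units."""
    if stock_units_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    return units_required / stock_units_per_ul


# Assumption: both reactions draw from the 5 units/µl Mbo I stock quoted for Reaction 2.
STOCK_UNITS_PER_UL = 5.0

for label, units in [("Reaction 2", 100), ("Reaction 3", 50)]:
    volume = enzyme_volume_ul(units, STOCK_UNITS_PER_UL)
    print(f"{label}: {units} units = {volume:.0f} µl of stock")
```

Under that assumption, 100 units corresponds to 20 µl and 50 units to 10 µl of stock added to each 100 µl aliquot.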
  • There are at least two requirements for oligonucleotide adapters: i) they need a single-stranded cohesive end that will base pair specifically with the DNA fragment ends produced by the genomic restriction nuclease fragmentation reaction, and ii) they need to have a unique sequence that is different from any other sequence found in eukaryotic genomes and can therefore act as a specific primer binding site during the PCR reaction.
  • the residue underlined corresponds to the 5'-OH group of the top strand oligonucleotide that can be labeled with, for example, radioactive and/or fluorescent labels to facilitate the subsequent detection of the PCR products.
  • the residue underlined may be phosphorylated to ensure the formation of a covalent link between the adapter oligonucleotide and the cleaved genomic DNA during the ligation reaction.
  • sequences shown in italics are derived from bacteriophage M13 and used extensively as a 'universal' sequencing primer in standard plasmids and bacteriophage cloning vectors.
  • the motif does not display any obvious homologies to any sequenced eukaryotic genome and will therefore not cross-hybridize to endogenous loci during PCR reactions. Any other unique oligonucleotide sequence not occurring in the tested genome is suitable for this task.
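The two adapter requirements described above (a cohesive end that pairs with the genomic fragment ends, and a primer site absent from the tested genome) can be checked programmatically. The following is a minimal sketch with hypothetical function names. The primer shown is a commonly quoted M13 universal primer used purely for illustration and is not necessarily the adapter sequence of this document; the genome string is a toy placeholder. The GATC overhang is the 5' overhang left by Mbo I (and by BamHI).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def overhang_is_compatible(adapter_overhang: str, fragment_overhang: str) -> bool:
    """Two 5' overhangs can anneal when one is the reverse complement of the other."""
    return adapter_overhang.upper() == reverse_complement(fragment_overhang)


def primer_site_is_unique(primer: str, genome: str) -> bool:
    """True if the primer sequence occurs on neither strand of the genome."""
    genome = genome.upper()
    primer = primer.upper()
    return primer not in genome and reverse_complement(primer) not in genome


# Illustrative inputs only (assumptions, not taken from the source).
example_primer = "GTAAAACGACGGCCAGT"
toy_genome = "ACGT" * 10

print(overhang_is_compatible("GATC", "GATC"))
print(primer_site_is_unique(example_primer, toy_genome))
```

For a real genome the exact-substring test would normally be replaced by an alignment search that also tolerates mismatches, since near-matches can still prime during PCR.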
  • The oligonucleotide is purified away from unincorporated rATP using a G25 microspin column (Amersham-Pharmacia). 8 pmoles of the 'top strand' oligonucleotide are added to the column eluate and, after heating to 75°C for 5 minutes, the mixture is allowed to cool to room temperature to anneal the two strands. 2 µl of the annealed BamHI adapter is ligated to 1 µg of hypersensitive site Mbo I-cleaved genomic DNA at 16°C for 1 hour in a final volume of 10 µl (note that due to the small number of cleaved Mbo I sites in the genomic DNA the adapter is likely to be in substantial excess and therefore no alkaline phosphatase treatment of the genomic DNA is required to prevent ligation of genomic fragments to each other). 1 µl of the ligation reaction containing the adapter-tagged DNA is used for each PCR reaction.
  • PCR reactions are carried out in a total reaction volume of 50 µl with a Stratagene Robocycler using the M ⁇ P 'Easy Start' system (obtained from Merck). Amplification is carried out for 40 cycles (45 seconds at 95°C; 45 seconds at 55°C; 1 minute at 72°C). 10 µl of each PCR reaction are analyzed on a 0.7% agarose/TBE gel.
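As a small worked example, the programmed hold time of the cycling profile quoted above can be computed directly. The step labels (denaturation, annealing, extension) are the conventional reading of the three temperatures and are an assumption; the figure excludes any time spent moving between temperatures.

```python
# Cycling profile quoted in the text: 40 cycles of 45 s at 95°C, 45 s at 55°C, 60 s at 72°C.
# The step names are conventional labels added here for readability (assumption).
STEPS = [
    ("denaturation", 95, 45),
    ("annealing", 55, 45),
    ("extension", 72, 60),
]
CYCLES = 40


def total_hold_seconds(steps, cycles):
    """Sum of programmed hold times over all cycles, ignoring transfer/ramp time."""
    return cycles * sum(seconds for _name, _temp_c, seconds in steps)


total = total_hold_seconds(STEPS, CYCLES)
print(f"{total} s = {total / 60:.0f} min of programmed hold time")
```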
  • the HS data set identified in Example 1 and shown in Figure 9 is analysed for the presence of HS consensus sequences using the computer program YEBIS (www-scc.jst.go.jp/YEBIS/MotifExtraction/; Yada et al., 1998) with default parameters. 17 different sequences, ranging in length from 7 to 13 nucleotides, are identified (Figure 10).
  • Some of the sequences (e.g. 'Motif 1') are relatively short, but are present as identical copies in several different HS sequences. Other sequences are substantially longer and display some degree of variability, but consensus sequences with highly conserved residues in particular positions emerge clearly in all cases.
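The observation that short motifs occur as identical copies in several HS sequences can be illustrated with a simple exact k-mer search. This is a minimal sketch and is not the YEBIS algorithm; the function name, the parameters and the toy sequences are all hypothetical. It reports every k-mer that is present in at least `min_sequences` of the input sequences.

```python
from collections import Counter


def shared_kmers(sequences, k, min_sequences=2):
    """Map each k-mer to the number of input sequences that contain it,
    keeping only k-mers found in at least `min_sequences` sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Use a set so that a k-mer repeated within one sequence counts once.
        kmers_in_seq = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers_in_seq)
    return {kmer: n for kmer, n in counts.items() if n >= min_sequences}


# Toy sequences for illustration only; they are not HS core sequences from the data set.
toy_sequences = ["TTGACGTCAAT", "CCGACGTCAGG", "AAAAAAAAAAA"]
print(shared_kmers(toy_sequences, k=7))
```

Exact matching only recovers motifs of the first kind described above. The longer, variable motifs require a method that tolerates mismatches, which is what dedicated motif-extraction programs provide.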
  • the HS data set was also processed using the MOTIFSAMPLER algorithm incorporating a higher-order background model (www.esat.kuleuven.ac.be/~thijs/Work/MotifSampler.html; Thijs et al., 2001).
  • This program differs from YEBIS because the length of the motifs and number of detected motifs can be entered as part of the search criteria.
  • Analysis of the HS data set was carried out by specifying the expected lengths of motifs as 8, 12 and 15, respectively, in three independent runs. Again, motifs shared by different members of the HS data set were successfully identified.
  • the MOTIFSAMPLER motifs of length 12 appeared to be most effective.
  • the MOTIFSAMPLER output is shown in Figure 11.
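A "higher-order background model" in this context generally means that the probability of each base is conditioned on the preceding bases rather than being a single fixed frequency. The sketch below estimates such a model as an order-k Markov chain with pseudocounts. It is an illustration of the general idea under stated assumptions, not the MOTIFSAMPLER implementation; the function name, the default order and the pseudocount are hypothetical choices.

```python
from collections import defaultdict

BASES = "ACGT"


def markov_background(sequences, order=3, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from background sequences.

    Returns a dict mapping each observed context to a dict of base probabilities.
    """
    counts = defaultdict(lambda: {b: 0 for b in BASES})
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous characters such as N.
            if set(context + base) <= set(BASES):
                counts[context][base] += 1
    model = {}
    for context, base_counts in counts.items():
        total = sum(base_counts.values()) + len(BASES) * pseudocount
        model[context] = {b: (base_counts[b] + pseudocount) / total for b in BASES}
    return model


# Toy background sequence for illustration only.
model = markov_background(["ACGTACGTACGT"], order=2)
print(model["AC"])
```

A motif finder can then score a candidate site by comparing its probability under the motif model with its probability under this background, which reduces spurious motifs that merely reflect local sequence composition.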
  • The variability in the various consensus sequences identified in Example 3 is encoded as 'position-specific scoring matrices' ('PSSMs'; Frech et al., 1997).
  • the PSSMs obtained using the data obtained from YEBIS are shown in Figure 12.
  • the PSSMs are rearranged in a different format from the aligned sequences using a custom-written PERL script, YEBIS-MATRIX.
  • the resulting PSSMs are listed in Figure 13.
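To make the notion of a position-specific scoring matrix concrete, the sketch below builds a log-odds PSSM from a set of aligned motif instances and uses it to score a candidate window. It is a generic textbook construction, not the YEBIS-MATRIX script; the function names, the pseudocount, the uniform background and the toy aligned sites are all assumptions.

```python
import math

BASES = "ACGT"


def build_pssm(aligned_sites, pseudocount=1.0, background=None):
    """Build a log2-odds PSSM (one dict per position) from equal-length aligned sites."""
    background = background or {b: 0.25 for b in BASES}
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    pssm = []
    for pos in range(length):
        column = [site[pos].upper() for site in aligned_sites]
        total = len(column) + len(BASES) * pseudocount
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pssm


def score_window(pssm, window):
    """Sum of per-position log-odds scores for a window of the same width as the PSSM."""
    return sum(column[base] for column, base in zip(pssm, window.upper()))


# Toy aligned motif instances for illustration only.
toy_sites = ["GACGTCA", "GACGTCA", "GACGTTA", "GACATCA"]
toy_pssm = build_pssm(toy_sites)
print(round(score_window(toy_pssm, "GACGTCA"), 2), round(score_window(toy_pssm, "TTTTTTT"), 2))
```

Positions where all instances agree receive a strongly positive score for the conserved base, while variable positions contribute less, which is how the matrix encodes the variability described above.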
  • DNA sequences are analysed for high-density clusters of consensus sequences identified in the HS data set using PSSMs based on the YEBIS and MOTIFSAMPLER results (described above).
  • the relatively small HS data set available at this time does not yet allow the definition of high-resolution stochastic models to optimise the recognition rate of HSs.
  • the program CISTER, which incorporates a hidden Markov model to enhance the rate of detection of biologically significant cis-element clusters (Frith et al., 2001), is used.
  • the HS-PSSMs are fed into the program CISTER (sullivan.bu.edu/~mfrith/cister.shtml) to locate the occurrence of similar motif clusters in new DNA sequences that were not part of the initial training set.
  • the resulting program identifies with a high degree of accuracy a number of constitutive HSs present in viral and cellular gene promoters. This result indicates that the prediction of novel HSs is feasible using information extracted from our HS data set.
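The idea of locating high-density clusters of motif matches can be sketched with a plain sliding-window count. This is a deliberately simplified stand-in and does not reproduce the hidden Markov model used by CISTER. All function names, the match/mismatch scoring, the window size and the thresholds are hypothetical, and the test sequence is a toy construct.

```python
BASES = "ACGT"


def consensus_pssm(consensus, match=1.0, mismatch=-1.0):
    """Toy scoring matrix: `match` for the consensus base, `mismatch` otherwise."""
    return [{b: (match if b == c else mismatch) for b in BASES} for c in consensus.upper()]


def find_hits(sequence, pssm, threshold):
    """Start positions of windows whose summed score reaches the threshold."""
    sequence = sequence.upper()
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set(BASES):
            if sum(col[b] for col, b in zip(pssm, window)) >= threshold:
                hits.append(start)
    return hits


def dense_windows(hit_positions, window=100, min_hits=3):
    """Return (start, count) for each hit that opens a window of `window` bp
    containing at least `min_hits` hit start positions."""
    positions = sorted(hit_positions)
    clusters = []
    for i, start in enumerate(positions):
        count = sum(1 for p in positions[i:] if p < start + window)
        if count >= min_hits:
            clusters.append((start, count))
    return clusters


# Toy sequence: three copies of a hypothetical motif embedded in poly-T flanks.
toy_seq = "T" * 20 + "GACGTCA" + "TT" + "GACGTCA" + "TT" + "GACGTCA" + "T" * 20
hits = find_hits(toy_seq, consensus_pssm("GACGTCA"), threshold=7.0)
print(hits, dense_windows(hits, window=30, min_hits=3))
```

In practice several PSSMs would be scanned at once and their hits pooled before clustering. A probabilistic model additionally weighs hit strength and spacing instead of applying hard thresholds.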
  • Human β-globin constitutive HS5 (Genbank Accession No. AF064190).
  • Mouse mammary tumour virus 3' long terminal repeat (MMTV-3' LTR; Genbank Accession No. MMTPRO).
  • the DNA sequence tested consists of a continuous string of the 59 HS core sequences (HS Core Sequences Group A & B), preceded by the same sequence randomised (10 bp Randomized HS Core Sequences) (Figure 7).
  • This procedure creates a test sequence which contains two halves: the first half is random (and thus should lack HS-specific motifs), whereas the other half is 'packed' with all the HS sequences.
  • the randomisation is carried out in blocks of 10 bp, so any local variation in base composition is preserved, whereas regulatory motifs get effectively scrambled.
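The construction of the two-part test sequence described above can be sketched as follows: the concatenated HS core sequences are shuffled within consecutive 10 bp blocks, and the shuffled copy is placed in front of the unshuffled one. The function names and the use of a seeded random generator are assumptions made for reproducibility, and the input sequences are toy placeholders.

```python
import random


def shuffle_in_blocks(sequence, block_size=10, seed=None):
    """Shuffle bases within each consecutive block, preserving each block's composition."""
    rng = random.Random(seed)
    shuffled_blocks = []
    for start in range(0, len(sequence), block_size):
        block = list(sequence[start:start + block_size])
        rng.shuffle(block)
        shuffled_blocks.append("".join(block))
    return "".join(shuffled_blocks)


def build_test_sequence(hs_core_sequences, block_size=10, seed=None):
    """Randomised first half followed by the concatenated HS core sequences."""
    packed = "".join(hs_core_sequences)
    return shuffle_in_blocks(packed, block_size, seed) + packed


# Toy stand-ins for HS core sequences (illustration only).
toy_cores = ["ACGTACGTAA", "GGGGCCCCTT"]
test_sequence = build_test_sequence(toy_cores, seed=1)
print(len(test_sequence), test_sequence[20:])
```

Because each block keeps exactly the same bases, GC-rich and AT-rich stretches stay where they were, so any difference in motif hits between the two halves cannot be attributed to base composition alone.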
  • PSSMs are compiled from two non-overlapping subsets of HS core sequences: motifs were separately derived from 'Group A' and 'Group B'. This has two purposes:
  • the signal to noise ratio is assessed by comparing the specific signal obtained in the right half against the randomised sequence.
  • Computer-based rules derived from a set of newly identified HSs are defined that enable the prediction of the occurrence and positions of HSs in DNA sequences with a high degree of accuracy. Further increases in the size of the proprietary HS database will help to refine the search for HS-consensus sequences and improve the reliability of this approach even further. Accordingly, the positions of HSs can now be predicted using bioinformatic tools, rather than using exclusively experimental tools; it may be possible to define the HS consensus sequences with even higher accuracy and identify further HS consensus sequences once there is access to a larger HS data set, i.e. a larger number of HS sequences; it is possible to apply bioinformatic tools to predict the presence and positions of HSs near key genes for biotechnological and medical interventions.
  • a computer automated method for identifying one or more HS consensus sequences comprising the steps of: (a) providing a plurality of HS core sequences; (b) using a search algorithm to search for a plurality of motifs that are shared by the HS core sequences; and (c) returning one or more HS consensus sequences comprising a plurality of motifs identified in step (b).
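One simple way to express the comparison between the HS-containing half and the randomised half is a ratio of hit counts. The sketch below assumes that hits are given as start positions in a test sequence whose first half is randomised and whose second half contains the HS core sequences. The function name and the decision to return `None` when the randomised half has no hits are hypothetical choices, and the source does not specify how the ratio was actually computed.

```python
def signal_to_noise(hit_positions, sequence_length):
    """Count hits in the second (HS) half and the first (randomised) half.

    Returns (signal, noise, ratio); ratio is None when the noise count is zero.
    """
    midpoint = sequence_length // 2
    noise = sum(1 for p in hit_positions if p < midpoint)
    signal = sum(1 for p in hit_positions if p >= midpoint)
    ratio = signal / noise if noise else None
    return signal, noise, ratio


# Illustrative hit positions in a 200 bp test sequence (not real data).
print(signal_to_noise([5, 120, 130, 150, 190], 200))
```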
  • a computer automated method for identifying one or more HS sequences comprising the steps of: (a) identifying a plurality of HS core sequences; (b) using a search algorithm to search for a plurality of motifs that are shared by the plurality of HS core sequences; (c) returning one or more HS consensus sequences comprising a plurality of the motifs identified in step (b); and (d) searching for one or more HS sequences comprising one or more HS consensus sequences.
  • a computer automated method for identifying an HS core sequence comprising the steps of: (a) providing a DNA sequence in the sense or antisense orientation that is not part of the plurality of HS core sequences; (b) providing an HS sequence; and (c) searching the DNA sequence for the presence of a hypersensitive restriction site.
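The search step of the methods above, applied to a DNA sequence in either the sense or the antisense orientation, can be sketched as a consensus scan on both strands. The consensus is written with IUPAC ambiguity codes and translated to a regular expression; antisense matches are reported in forward-strand coordinates. The function names and the example consensus are hypothetical, and this is an illustration rather than the claimed implementation.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_consensus(sequence, consensus):
    """Return sorted (start, strand) pairs for consensus matches on both strands.

    Starts are 0-based positions on the forward strand; a lookahead is used so
    that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus.upper()) + "))")
    seq = sequence.upper()
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    antisense = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(antisense)]
    return sorted(hits)


# Hypothetical consensus and toy sequences for illustration.
print(find_consensus("AAGACGTCATT", "GACGTYA"))
print(find_consensus("CCTGACGTCGG", "GACGTYA"))
```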
  • Chromatin fine structure profiles for a developmentally regulated gene reorganization of the lysozyme locus before trans-activator binding and gene expression.
  • NM23-H1 and NM23-H2 repress transcriptional activities of nuclease-hypersensitive elements in the platelet-derived growth factor-A promoter. J. Biol. Chem. 277, 1560-1567. Mautner, J., Joos, S., Werner, T., Eick, D., Bornkamm, G.W., and Polack, A. (1995). Identification of two enhancer elements downstream of the human c-myc gene. Nucl. Acids Res. 23, 72-80.


Abstract

The present invention relates to a method for identifying one or more hypersensitive site (HS) consensus sequences. The method comprises (a) providing a plurality of HS core sequences, (b) using a search algorithm to search for a plurality of motifs that are shared by the HS core sequences, and then returning one or more HS consensus sequences comprising a plurality of motifs identified in step (b).
PCT/GB2003/002895 2002-07-04 2003-07-04 Method Ceased WO2004005547A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003281288A AU2003281288A1 (en) 2002-07-04 2003-07-04 Method for identifying hypersensitive site consensus sequences

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0215547A GB0215547D0 (en) 2002-07-04 2002-07-04 Method
GB0215547.1 2002-07-04
PCT/GB2002/003080 WO2003004702A2 (fr) 2001-07-05 2002-07-04 Method
GBPCT/GB02/003080 2002-07-04

Publications (2)

Publication Number Publication Date
WO2004005547A2 true WO2004005547A2 (fr) 2004-01-15
WO2004005547A3 WO2004005547A3 (fr) 2004-03-25

Family

ID=30117081

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2003/002895 Ceased WO2004005547A2 (fr) 2002-07-04 2003-07-04 Method

Country Status (2)

Country Link
AU (1) AU2003281288A1 (fr)
WO (1) WO2004005547A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1692262B1 (fr) * 2003-10-27 2018-08-15 Merck Sharp & Dohme Corp. Method for designing siRNAs for gene silencing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0116453D0 (en) * 2001-07-05 2001-08-29 Imp College Innovations Ltd Method

Also Published As

Publication number Publication date
WO2004005547A3 (fr) 2004-03-25
AU2003281288A8 (en) 2004-01-23
AU2003281288A1 (en) 2004-01-23

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP